The Transformer is powerful, but it has clear limitations when processing time series data, such as high computational complexity and poor efficiency on long sequences.
In the data-driven era, time series forecasting has become indispensable in many fields.
Against this backdrop, Ant Group and Tsinghua University have jointly released TimeMixer, a pure MLP architecture model that surpasses Transformer models in both performance and efficiency for time series forecasting.
By combining the decomposition of seasonal and trend characteristics of time series with a multi-scale mixing design, the model not only greatly improves long- and short-range forecasting performance, but also, thanks to its pure MLP architecture, achieves efficiency close to that of a linear model.
Let's take a look at how it was done.
Pure MLP architecture outperforms Transformer
The TimeMixer model adopts a multi-scale mixing architecture to address the problem of complex temporal variations in time series forecasting.
The model is built entirely on an MLP (multi-layer perceptron) architecture and consists of two major blocks, Past Decomposable Mixing (PDM) and Future Multipredictor Mixing (FMM), which allow it to make effective use of multi-scale sequence information.
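As a rough illustration of the multi-scale setup, the input series can be repeatedly average-pooled along the time axis to obtain progressively coarser views. The following PyTorch snippet is a minimal sketch under assumed choices (three extra scales, a pooling window of 2), not the official implementation:

```python
import torch
import torch.nn as nn

def multiscale_views(x: torch.Tensor, num_scales: int = 3, window: int = 2):
    """Build coarser views of a series by repeated average pooling.

    x: [batch, length, channels] input series.
    Returns a list [x_0, x_1, ...] from finest to coarsest scale.
    """
    views = [x]
    pool = nn.AvgPool1d(kernel_size=window, stride=window)
    for _ in range(num_scales):
        # AvgPool1d expects [batch, channels, length], so transpose around it.
        coarser = pool(views[-1].transpose(1, 2)).transpose(1, 2)
        views.append(coarser)
    return views
```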
The PDM module is responsible for extracting past information and mixing seasonal and trend components at different scales.
Driven by the mixing of seasonal and trend components, PDM gradually aggregates fine-grained seasonal information from fine to coarse scales and uses coarser-scale prior knowledge to dig deeper into macroscopic trend information, ultimately achieving multi-scale mixing during past-information extraction.
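The PDM idea described above could be sketched roughly as follows. Every concrete choice here is an assumption made for illustration (the moving-average decomposition, the per-scale linear mixing layers, the residual additions); the paper's actual layers differ in detail:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def decompose(x: torch.Tensor, kernel: int = 25):
    """Split a series [batch, length, channels] into seasonal and trend parts,
    using a simple moving average as the trend estimate."""
    pad = kernel // 2
    xt = x.transpose(1, 2)  # [batch, channels, length]
    trend = F.avg_pool1d(F.pad(xt, (pad, pad), mode="replicate"),
                         kernel_size=kernel, stride=1).transpose(1, 2)
    return x - trend, trend  # seasonal part, trend part

class PastDecomposableMixing(nn.Module):
    """Mix seasonal parts fine-to-coarse and trend parts coarse-to-fine."""

    def __init__(self, lengths):
        super().__init__()
        # One linear map per adjacent pair of scales, in each direction;
        # `lengths` holds the time length of each scale, finest first.
        self.down = nn.ModuleList(
            nn.Linear(lengths[i], lengths[i + 1]) for i in range(len(lengths) - 1))
        self.up = nn.ModuleList(
            nn.Linear(lengths[i + 1], lengths[i]) for i in range(len(lengths) - 1))

    def forward(self, views):
        seasons, trends = map(list, zip(*(decompose(v) for v in views)))
        # Bottom-up: push fine-grained seasonal detail into coarser scales.
        for i in range(len(views) - 1):
            seasons[i + 1] = seasons[i + 1] + self.down[i](
                seasons[i].transpose(1, 2)).transpose(1, 2)
        # Top-down: use coarse trends as a prior for finer scales.
        for i in reversed(range(len(views) - 1)):
            trends[i] = trends[i] + self.up[i](
                trends[i + 1].transpose(1, 2)).transpose(1, 2)
        return [s + t for s, t in zip(seasons, trends)]
```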
FMM is an ensemble of multiple predictors, each of which draws on past information at a different scale, so that FMM can combine the complementary forecasting capabilities of the mixed multi-scale sequences.
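A matching sketch of the FMM side, again with assumed details (here each scale simply gets its own linear head that maps its time length onto the forecast horizon, and the heads' outputs are summed):

```python
import torch
import torch.nn as nn

class FutureMultipredictorMixing(nn.Module):
    """One predictor per scale; their forecasts are aggregated into the output."""

    def __init__(self, lengths, horizon: int):
        super().__init__()
        # Each scale maps from its own (coarser) time length to the horizon.
        self.heads = nn.ModuleList(nn.Linear(L, horizon) for L in lengths)

    def forward(self, views):
        # views: list of [batch, length_i, channels] tensors coming out of PDM.
        forecasts = [head(v.transpose(1, 2)).transpose(1, 2)  # predict along time
                     for head, v in zip(self.heads, views)]
        return torch.stack(forecasts).sum(dim=0)  # [batch, horizon, channels]
```

Chained together, multiscale_views → PastDecomposableMixing → FutureMultipredictorMixing reproduces the flow described above at a toy level; the real model stacks several PDM layers and adds embedding and normalization steps around them.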
Experimental results
To verify TimeMixer's performance, the team ran experiments on 18 benchmark datasets covering long-term forecasting, short-term forecasting, multivariate time series forecasting, and spatiotemporal graph data, with tasks such as power load forecasting, weather forecasting, and stock price prediction.
Experimental results show that TimeMixer surpasses state-of-the-art Transformer models on multiple metrics. The specific results are as follows:
Prediction Accuracy: TimeMixer achieves higher prediction accuracy on all tested datasets. Taking power load forecasting as an example, TimeMixer reduces mean absolute error (MAE) by about 15% and root mean square error (RMSE) by about 12% compared to the Transformer model (both metrics are defined after this list).
Computational efficiency: Thanks to the efficient computing characteristics of the MLP structure, TimeMixer significantly outperforms the Transformer model in both training time and inference time. Experimental data shows that under the same hardware conditions, TimeMixer's training time is reduced by about 30% and inference time is reduced by about 25%.
Model Interpretability: By introducing Past Decomposable Mixing and Future Multipredictor Mixing techniques, TimeMixer can better explain the information contribution at different time scales, making the decision-making process of the model more transparent and easy to understand.
Generalization: TimeMixer was tested on several different types of datasets and showed good generalization ability, adapting to different data distributions and characteristics. This indicates that TimeMixer is widely applicable in practice.
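For reference, the two error metrics cited above are the standard ones: MAE = (1/n) Σᵢ |yᵢ − ŷᵢ| and RMSE = √((1/n) Σᵢ (yᵢ − ŷᵢ)²), where yᵢ is the observed value and ŷᵢ the forecast; lower is better for both.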
Long-term forecasting: To ensure a fair comparison between models, experiments were conducted with standardized settings, unifying the input length, batch size, and number of training epochs. In addition, since reported results often come from hyperparameter optimization, the study also includes results from a comprehensive parameter search.
Short-range forecasting: multivariate data
Short-range forecasting: univariate data
Ablation experiments: To verify the effectiveness of each TimeMixer component, the team ran experiments on all 18 benchmarks, with a detailed ablation study of every possible design of the Future Multipredictor Mixing module.
Model efficiency: The team compared training-phase memory usage and running time against the latest state-of-the-art models. TimeMixer consistently showed good GPU memory and running-time efficiency across a range of sequence lengths (from 192 to 3072), while maintaining consistent state-of-the-art performance on both long-term and short-term forecasting tasks.
It is worth noting that, despite being a deep model, TimeMixer achieves efficiency close to that of fully linear models. This makes it promising for scenarios that demand high model efficiency.
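An efficiency comparison of this kind can be reproduced with a simple profiling loop. The sketch below is a hypothetical measurement setup (the authors' actual benchmarking script and settings are not described here); it times one training step and reads the peak GPU memory:

```python
import time
import torch

def profile_one_step(model: torch.nn.Module, seq_len: int, channels: int = 7,
                     batch_size: int = 32, device: str = "cuda"):
    """Measure wall-clock time and peak GPU memory for one training step."""
    model = model.to(device)
    x = torch.randn(batch_size, seq_len, channels, device=device)
    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)
    start = time.time()
    out = model(x)
    out.sum().backward()  # include the backward pass, as in training
    torch.cuda.synchronize(device)
    elapsed = time.time() - start
    peak_mb = torch.cuda.max_memory_allocated(device) / 2**20
    return elapsed, peak_mb

# Hypothetical usage: sweep the sequence lengths mentioned in the article.
# for seq_len in (192, 384, 768, 1536, 3072):
#     print(seq_len, profile_one_step(model, seq_len))
```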
In short, TimeMixer brings new ideas to the field of time series forecasting and demonstrates the potential of pure MLP architectures on complex tasks.
In the future, with the introduction of more optimization techniques and application scenarios, TimeMixer is likely to further advance time series forecasting technology and bring greater value to a wide range of industries.
This project has received support from NextEvo, the AI innovation and R&D department of Ant Group’s Intelligent Engine Division.
Ant Group's NextEvo Optimization Intelligence Team is responsible for Ant's technical directions in operations optimization, time series forecasting, and intelligent decision-making that combines prediction with optimization. The team's work spans algorithm research and development, platform services, and solutions.
Paper: https://arxiv.org/abs/2405.14616v1
Code: https://github.com/kwuking/TimeMixer