PaperHub
Average rating: 5.3 / 10
Poster · 3 reviewers
Ratings: 5, 5, 6 (min 5, max 6, std 0.5)
Confidence: 4.0
Correctness: 2.0 · Contribution: 2.7 · Presentation: 3.0
NeurIPS 2024

Introducing Spectral Attention for Long-Range Dependency in Time Series Forecasting

OpenReview · PDF
Submitted: 2024-05-11 · Updated: 2024-11-06
TL;DR

Introduce low-pass filter based spectral attention to address long-range dependency in time series forecasting

Abstract

Keywords
Time series forecasting · Long-range dependency · Low-pass filter · Spectral attention · Long-term trend

Reviews and Discussion

Review
Rating: 5

This paper focuses on exploring long-term dependency across the whole time series sequence to address the challenge of the short look-back window, which is interesting. Based on this observation, the paper proposes Batched Spectral Attention to enable parallel training across multiple timesteps.

Strengths

  • The attention analysis is sufficient, as shown by the FFT graphs.

  • The method is designed as an easy plug-in module, which benefits various base models.

Weaknesses

  • The writing and logical presentation could be strengthened; parts are hard to understand, e.g., Fig. 2. (I spent a lot of time understanding the formulations and Fig. 2.)

  • It is not clear why only the low-frequency components are used just for long-term dependency, and why discarding the high-frequency components does not affect the short-term fluctuations.

  • The comparison in Table 1 may be a little unfair since more computation and memory are introduced by the BSA. Thus, please provide the computation and memory comparison in Table 1 to further evaluate the superiority of BSA.

  • I am curious whether the performance will still improve when applying BSA as the look-back window T increases or decreases. That is, validating the effectiveness of BSA in more scenarios would demonstrate its generalization ability.

Questions

See weaknesses.

Limitations

Not sufficient. I can't find a discussion of the potential negative societal impact of their work in the Conclusion.

Author Response

We thank you very much for the insightful comments and suggestions. We have addressed each of your questions below. Please also review the global comment and the attached PDF, as they form part of our answers to your questions.


W1 We sincerely apologize for not providing a clear enough presentation of our model and logical framework. Following your valuable comment, we have thoroughly revised Fig. 2. Additionally, we have updated the main text, adding diagrams and explanations to make our methodology easier to understand. The revised version has been added to the attached PDF (Fig. R1 and Fig. R2). Please kindly review these updates in the PDF file from the global comment.

W2 Thank you for your comment. Our BSA module leverages both low-frequency and high-frequency components. The Spectral Attention mechanism focuses on specific frequencies during training, targeting the essential information frequency in the data. If and only if the data's prominent information frequency is low-frequency, the BSA module attends more to low-frequency information.

Our BSA module stores the low-frequency component (M) of the feature (F), obtained through EMA. When computing F', a replacement for F, the attention involves three components: M (low-frequency component), F (identity), and F-M (high-frequency component). These components are combined into F' through a weighted sum using the SA-Matrix (in practice, multiple momentums are employed, resulting in multiple Ms).
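To make this concrete, the following is a minimal, illustrative sketch of the mechanism described above. The class name (SpectralAttentionSketch), the buffer and parameter names, and the momentum values are placeholders of our own choosing, not our released BSA implementation.

```python
import torch
import torch.nn as nn


class SpectralAttentionSketch(nn.Module):
    """Illustrative sketch: multiscale EMA low-pass momentums M_k plus an SA-Matrix-style weighted sum."""

    def __init__(self, feat_dim, momentums=(0.9, 0.99, 0.999)):
        super().__init__()
        self.momentums = momentums
        for k in range(len(momentums)):
            # One running low-frequency estimate M_k per momentum (multiscale EMA).
            self.register_buffer(f"ema_{k}", torch.zeros(feat_dim))
        # Learnable weights over the components [F, F - M_k, M_k, ...] (SA-Matrix analogue).
        self.sa_logits = nn.Parameter(torch.zeros(1 + 2 * len(momentums), feat_dim))

    def forward(self, F):                        # F: (..., feat_dim)
        reduce_dims = tuple(range(F.dim() - 1))  # average over all but the feature dimension
        components = [F]                         # identity (full-band) component
        for k, m in enumerate(self.momentums):
            M = getattr(self, f"ema_{k}")
            with torch.no_grad():                # update the running low-pass estimate
                M.mul_(m).add_((1 - m) * F.mean(dim=reduce_dims))
            components.append(F - M)             # high-frequency residual
            components.append(M.expand_as(F))    # low-frequency component
        stacked = torch.stack(components)                        # (C, ..., feat_dim)
        w = torch.softmax(self.sa_logits, dim=0)                  # (C, feat_dim)
        w = w.view(len(components), *([1] * (F.dim() - 1)), -1)   # make broadcastable
        return (w * stacked).sum(dim=0)                           # F' replaces F
```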

The learning results indicate that for data with long-term dependencies, the BSA module prioritizes low-frequency components, effectively capturing long-range patterns. This can be seen in Fig. 3 and 4 of the main text.

W3 Thank you for your insightful comment. As you mentioned, BSA, a plug-in module, may increase computational complexity. Therefore, we measured the running time, peak memory, and number of parameters across various models. Full results are reported in the global comment Table E2~E5.

Table E5 Total Average Additional Cost of BSA in Percentage(%). We report the average value of Table E2, E3 and E4.

| Dataset | Time | Memory | Num_Param |
|---------|------|--------|-----------|
| Weather | 6.1212 | 8.6826 | 1.7774 |
| PEMS03 | 0.0312 | 0.8081 | 2.3110 |

Each number represents the increased cost due to BSA as a percentage. Experiments on the lightweight weather dataset with 21 channels and the complex PEMS dataset with 358 channels demonstrate the low computational cost and scalability of our model. Our model has constant complexity with respect to input length, making it highly applicable.
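For reference, here is a rough sketch of how such overhead figures can be collected with PyTorch (an illustrative setup, not our actual benchmarking script): wall-clock time for one training step, peak GPU memory, and parameter count, measured with and without BSA.

```python
import time
import torch


def measure_cost(model, batch, device="cuda"):
    """Return (seconds per step, peak GPU memory in bytes, parameter count) for one training step."""
    model = model.to(device)
    batch = batch.to(device)
    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)
    start = time.perf_counter()
    loss = model(batch).sum()   # placeholder objective, used only for timing
    loss.backward()
    torch.cuda.synchronize(device)
    elapsed = time.perf_counter() - start
    peak_mem = torch.cuda.max_memory_allocated(device)
    n_params = sum(p.numel() for p in model.parameters())
    return elapsed, peak_mem, n_params


# Relative overhead in percent, e.g. for time: 100 * (t_bsa - t_base) / t_base
```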

W4 Thank you for your inspiring comment. To demonstrate that BSA consistently maintains high performance regardless of changes in the look-back window size, we conducted experiments by modifying the original 96 look-back window to 48 and 192. This table shows the percentage performance gain:

| Dataset | Look-back | Dlinear MSE | Dlinear MAE | RLinear MSE | RLinear MAE | iTransformer MSE | iTransformer MAE |
|---------|-----------|-------------|-------------|-------------|-------------|------------------|------------------|
| Weather | 48 | 20.8490 | 16.5494 | 4.3724 | 2.6808 | 12.0786 | 5.2078 |
| Weather | 96 | 10.1386 | 8.7108 | 4.5913 | 2.5039 | 7.7836 | 3.0341 |
| Weather | 192 | 7.4182 | 6.1351 | 4.3021 | 1.6094 | 6.8710 | 3.3374 |
| PEMS03 | 48 | 18.7263 | 11.4883 | 18.8634 | 11.0712 | 29.4545 | 17.3504 |
| PEMS03 | 96 | 11.8983 | 7.4829 | 32.0424 | 18.0321 | 24.5469 | 13.6611 |
| PEMS03 | 192 | 19.3270 | 10.5795 | 21.7945 | 13.7005 | 9.6336 | 5.1841 |

The results show that BSA provides consistent performance improvements across all input sizes, with particularly notable gains as the look-back window size decreases. This clearly indicates that BSA enables the base model to learn long-range information beyond the look-back window, achieving high prediction performance even when the look-back window is limited.

Comment

Thank you for the response. I raise my score to 5.

Comment

Dear Reviewer weYb,

We are immensely thankful for your constructive feedback. Your helpful and insightful comments have indeed made our paper stronger. In response, we have:

  • Significantly improved the figures in the paper to make them easier to understand and ensure a more natural flow.
  • Demonstrated the high efficiency of BSA through a comprehensive study of its computation and memory cost.
  • Verified that BSA consistently delivers high performance even when the look-back window changes.

While we hope that our response properly addressed your concerns, we would be happy to provide any additional clarification if necessary. We also welcome any further advice that could strengthen our paper.

Thank you for your time and effort.

Sincerely, Authors

Comment

Hope you have a good day.

Comment

Dear Reviewer weYb,

Thank you for your positive response to our rebuttal. Your review has been instrumental in helping us address areas for improvement, resulting in a more robust and higher-quality paper.

Thank you.

Sincerely, Authors

Review
Rating: 5

The paper presents a Spectral Attention mechanism to address the challenge of capturing long-range dependencies in time series forecasting. By preserving temporal correlations among samples and leveraging frequency domain information, the proposed method enhances the forecasting performance of various baseline models. Extensive experiments demonstrate the efficacy of Spectral Attention, achieving state-of-the-art results on multiple real-world datasets.

Strengths

  1. Efficacy: The proposed Spectral Attention mechanism significantly improves forecasting performance across multiple real-world datasets.
  2. Versatility: The method can be integrated into various baseline models, showcasing its adaptability and broad applicability.
  3. Experimental validation: The experiments show consistent performance improvements.

Weaknesses

  1. Novelty and Contribution/Missing Related Work
  • While the paper introduces Spectral Attention as a novel method, using frequency domain analysis in time series forecasting is not a new concept. Notably, there is an existing paper [1] with the same title, "Spectral Attention," also focused on time series forecasting, which is not referenced in the manuscript. This referenced paper employs a global/local perspective: the global Spectral Attention provides comprehensive information about the random process the time series are considered part of, thereby addressing some limitations mentioned in the paper, such as the inability to model long-range dependencies. Furthermore, the referenced work includes spectral filtering, differing from the authors' approach by learning a cut-off frequency while filtering each frequency component independently. Their solution integrates easily into pre-existing architectures. Given the significant similarities and the relevance of this other paper as a precursor to the authors' idea, why is it not cited in the manuscript? The authors must include a thorough discussion of [1], acknowledging its relation and relevance to their work.
  • Additionally, recent SSM-based approaches should be covered in the related work section.
  2. Differentiation from Existing Methods
  • The paper would benefit from a clearer distinction between Spectral Attention and existing frequency-based methods such as WaveForM, FEDformer, or the Spectral Attention Autoregressive Model. The unique advantages and innovations of Spectral Attention compared to these methods should be emphasized more explicitly.
  3. Computational Complexity
  • The paper claims that Spectral Attention adds minimal computational overhead. However, the analysis of computational complexity and memory requirements is somewhat superficial. A detailed comparison of training and inference times, as well as memory usage, with and without Spectral Attention, should be provided to substantiate these claims.
  4. Integration with Different Models
  • While the paper demonstrates that Spectral Attention can be integrated with various TSF models, the integration process lacks detail. Specific guidelines or algorithms for integrating Spectral Attention with different architectures should be included. This would facilitate practitioners in applying the proposed method to their models.

[1] Moreno-Pino, Fernando, Pablo M. Olmos, and Antonio Artés-Rodríguez. "Deep autoregressive models with spectral attention." Pattern Recognition 133 (2023): 109014.

Questions

  1. How does the proposed method specifically improve upon the referenced Spectral Attention Autoregressive Model [1]?
    • A detailed comparative analysis and, if possible, empirical validation are necessary to substantiate the claims of improvement.
  2. Can you provide a more detailed explanation of the complexity involved in the spectral attention mechanism?
  3. Given that the proposed model operates in the frequency domain, is it capable of handling irregularly sampled data even if the underlying architecture cannot?

I am open to raising my score if the authors adequately address these concerns.

Limitations

Yes

Author Response

We thank you very much for the insightful comments and suggestions. We have addressed your questions below.


W1-a & Q1 Thank you for introducing this significant work. We will add this research to the related work section (a summary is reported in the global comment). While this study is remarkable, we think there are significant differences between their work and ours in terms of scenarios and methods.

From the perspective of the model's objective, SAAM aims to enable autoregressive models to better capture the trend of signals within the input sequence (look-back window). In contrast, our BSA is the first attempt to learn temporal correlations across multiple input samples beyond the look-back window.

In terms of the learning methodology, SAAM can only be applied to autoregressive models. However, many recent state-of-the-art models use non-autoregressive architectures that inherently excel at learning the trend of input signals. BSA, on the other hand, is agnostic to the model structure and can be applied to various types of neural network-based TSF models.

Additionally, SAAM requires performing FFT and calculating autocorrelation on the input signal, which results in quadratic memory complexity in the input length. Experimental results show that on the 137-channel solar dataset, SAAM required an additional 18.79% training time. In contrast, BSA does not perform Fourier transforms and maintains a minimal computation cost that is constant with respect to input length. Experiments on BSA's computational complexity are provided below in the answer to W3.

W1-b Thank you for your insightful comment. As you suggested, we will add related works such as the State Space Model[1], Structured State Space Model[2], and Mamba[3] to our related work section. Please review the global comment.

[1] Koller and Friedman, 2009.

[2] Gu et al., 2021 (S4).

[3] Gu and Dao, 2023 (Mamba).

W2 We apologize if it was unclear how our BSA method differs from WaveForM and FEDformer, which are discussed in the later part of Section 2, "Related Works: Frequency-Utilizing Models".

WaveForM and FEDformer apply the wavelet transform or FFT to decompose the information in a single input into a multi-frequency representation. Therefore, if we tried to find the long-term dependency beyond the current look-back window using wavelet or FFT decomposition, we would need the whole data of the multiple inputs preceding the current input, which is intractable.

However, our BSA method aims to learn the long-term dependency beyond the look-back window by extracting temporal correlation information between the sequential input samples using multiscale EMA, which is suited for streaming time series inputs. To learn the long-term dependency, BSA does not need all the data from previous sequences; it only needs the momentum values of the selected feature and the current input sequence. This approach is highly efficient and tractable.
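As a tiny illustration of why the EMA state suffices (a sketch of ours, not code from the paper), processing a stream of inputs only requires keeping one running value per momentum, never the history of previous look-back windows:

```python
momentum = 0.99                       # assumed smoothing constant
ema = 0.0
feature_stream = [0.2, 0.5, 0.1]      # stand-in for sequentially arriving feature values
for feature in feature_stream:
    # Constant memory: only the running value `ema` is kept, never the past windows.
    ema = momentum * ema + (1 - momentum) * feature
```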

To briefly recap the Related Works, WaveForM, FEDformer, and FreTS all utilize frequency-domain transformations and are limited to information within the length of the look-back window. Our BSA is a model-agnostic module that allows these frequency-utilizing models to learn long-term dependency beyond the look-back window, as shown by our experiments with FreTS in the main Table 1.

Please refer to our response above on W1-a for detailed differences with SAAM to avoid repetition.

W3 & Q2 Thank you for pointing out this important aspect. To measure the computational cost of BSA, we conducted comprehensive experiments on peak memory usage, running time, and number of parameters. We presented the full table in the global comment Table E2~E5. The average results are as follows:

Table E5 Average Additional Cost of BSA in Percentage(%).

| Dataset | Time | Memory | Num_Param |
|---------|------|--------|-----------|
| Weather | 6.1212 | 8.6826 | 1.7774 |
| PEMS03 | 0.0312 | 0.8081 | 2.3110 |

For the weather dataset, which has a small number of channels (21), we observed a 6.1% increase in running time, an 8.7% increase in memory usage, and a 1.8% increase in the number of parameters. In contrast, for the PEMS dataset, which has a large number of channels (358), there was only a 0.03% increase in running time, a 0.80% increase in memory usage, and a 2.3% increase in the number of parameters. This demonstrates the high scalability and applicability of the BSA.

W4 We agree that providing a thorough explanation of the integration process is necessary, and we will strengthen this part accordingly. As you pointed out, this is crucial for facilitating practitioners.

Our method can be applied to any TSF model that satisfies the problem statement in the manuscript and can be used simply by plugging it into any arbitrary activation within the model (For any arbitrary activation F within the model, BSA can be used as a plugin by adding F = BSA(F) in the forward function). The integration is completed by subsequently changing the sampling method to sequential sampling.
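As an illustration of this plug-in pattern (a hypothetical base model of our own construction, not a specific architecture from the paper), the integration amounts to one extra line in the forward pass plus sequential sampling:

```python
import torch
import torch.nn as nn


class BaseForecasterSketch(nn.Module):
    """Hypothetical TSF base model; `bsa` is any module mapping an activation F to F' of the same shape."""

    def __init__(self, look_back, horizon, bsa, d_model=128):
        super().__init__()
        self.encoder = nn.Linear(look_back, d_model)
        self.bsa = bsa                        # the plug-in module
        self.decoder = nn.Linear(d_model, horizon)

    def forward(self, x):                     # x: (batch, channels, look_back)
        F = torch.relu(self.encoder(x))       # an arbitrary intermediate activation F
        F = self.bsa(F)                       # the only change to the model: F = BSA(F)
        return self.decoder(F)                # (batch, channels, horizon)


# Integration is completed by switching to sequential (non-shuffled) sampling, e.g.:
# loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=False)
```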

In Section 4.3 of the main text, we conducted experiments to determine which part of the model benefits most from the application of the BSA module. We observed performance improvements regardless of where the module was inserted, with the highest performance gains observed when it was applied to the queries or the activations before the decoder.

To make BSA easy to use, we will provide convenient plugin code along with the experimental code in the final version of the manuscript.

Q3 Thank you for your insightful comment. Since BSA is applied as a plug-in to the underlying architecture, we hypothesize that BSA would also be constrained if the underlying architecture cannot handle irregular input. However, if the underlying architecture can process irregular input, we believe BSA could also handle irregular input by using momentum proportional to the irregular time intervals. We consider this to be a valuable topic for future research.
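To illustrate this hypothesis (purely speculative on our part, not part of the paper), the EMA decay could be tied to the elapsed time between samples instead of being a fixed constant:

```python
def interval_momentum(base_momentum: float, dt: float, base_dt: float = 1.0) -> float:
    """Decay the running estimate more when the gap since the previous sample is larger."""
    return base_momentum ** (dt / base_dt)


# ema = m * ema + (1 - m) * feature, with m = interval_momentum(0.99, dt)
```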

Comment

We appreciate your time and effort in reviewing our submission. We kindly request if you could please take a moment to review our rebuttal and provide any further feedback. Your insights are invaluable to us.

Thank you

Comment

Thank you for the new experiments carried out by the authors, which have improved the paper. I am raising the score.

Comment

Thank you for being a reviewer for NeurIPS2024, your service is invaluable to the community!

The authors have submitted their feedback.

Could you check the rebuttal and other reviewers' comments and start a discussion with the authors and other reviewers?

Regards, Your AC

Comment

Dear Reviewer pdEA,

We sincerely appreciate your valuable feedback and the time you spent reviewing our work. Your helpful and insightful comments have indeed made our paper stronger. In response, we have:

  • Added recommended related studies to the "related work" section and clarified the distinctions from our research.
  • Clarified the novelty of our BSA method compared to other frequency-utilizing methods.
  • Conducted a thorough analysis of BSA's computational cost and memory efficiency, demonstrating its high efficiency.
  • Provided a detailed description of the BSA implementation process.

Again, we thank you for your response to our rebuttal. While we hope that our response properly addressed your concerns, we would be happy to provide any additional clarification if necessary. We also welcome any additional advice you may have that could strengthen our paper.

Thank you for your time and effort.

Sincerely, Authors

Review
Rating: 6

This paper introduces a new mechanism called "Spectral Attention" designed to address the challenge of long-term dependencies in time series prediction. Traditional models such as linear and Transformer-based predictors face limitations in handling long-term dependencies due to fixed input sizes and the shuffling of samples during training. Spectral Attention enhances model performance by preserving temporal correlations and facilitating long-term information processing. It utilizes low-pass filters to capture long-term trends and supports parallel training across multiple time steps, thus expanding the effective input window. Experimental results demonstrate that this mechanism achieves state-of-the-art prediction performance on multiple real-world datasets, opening new avenues for exploration in time series analysis.

Strengths

  1. To address the problem of long-term dependence in time series prediction, this paper proposes a novel Spectral Attention mechanism.
  2. The method can be applied to multiple prediction models and achieves performance improvements.

Weaknesses

  1. The specific meaning of some formulas in the paper is not explained.
  2. The source code of the paper is not released, so the experimental results cannot be verified.

Questions

  1. Does the use of sequential sampling instead of random sampling affect the model's ability to generalize to new data or data from a different distribution, i.e., is there a risk of overfitting?
  2. The specific meaning of some formulas in the paper is not explained. For example, "The base model can be reformulated as P = f_2(F, E) and F, E = f_1(X)." What is the meaning of "E"?
  3. How is the formula "f_SA = f_2 · SA · f_1" derived in the sentence "Our Spectral Attention module takes feature F as input and transforms it into F' of the same size, F' = SA(F), modifying the base model as follows: f_SA = f_2 · SA · f_1"?
  4. Why was the experiment not conducted on the common ILI dataset?
  5. Do spectral attention mechanisms significantly increase the computational cost of model training and inference?

Limitations

No.

Author Response

We thank you very much for the insightful comments and suggestions. We have addressed each of your questions below. Also, please review the global comment and the attached PDF.


Weaknesses

W1 We apologize for not providing sufficient specific meanings for some of the formulas in our paper. We have thoroughly addressed this issue based on your comments and revised our manuscript. Additionally, we have included more detailed diagrams to facilitate understanding of each step of our formulas and to provide a more detailed explanation. Kindly refer to Figures R1 and R2 in the attached PDF.

W2 We will definitely share a GitHub link containing the complete code in the final version of the manuscript so that all experiments can be fully reproduced.


Questions

Q1 Thank you for an inspiring question. We have pondered the same issue. Our sequential sampling may fit more strongly to the distribution of the latter part of the training sequence rather than the overall distribution. This topic has been extensively researched in the field of continual learning, where the data distribution changes over time. To address this, we added a regularization term to prevent the distribution shift and conducted experiments. We found that using well-known methods such as EWC [1], LwF [2], and L2 regularization actually resulted in decreased performance on the test set. This indicates that the validation set, which follows the training sequence, plays a sufficient role in preventing such overfitting.

[1] Elastic Weight Consolidation, 2017

[2] Learning without Forgetting, 2017
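As a rough illustration of the kind of anchoring regularizer referred to above (our own sketch of a common L2-style variant; the exact form used in our experiments may differ), the loss penalizes parameter drift from a snapshot taken before the distribution shift:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)                                         # toy stand-in for a TSF model
old_params = [p.detach().clone() for p in model.parameters()]   # snapshot of earlier parameters
x, y = torch.randn(4, 8), torch.randn(4, 1)                     # toy batch

forecast_loss = nn.functional.mse_loss(model(x), y)
l2_lambda = 1e-3                                                # assumed coefficient
drift = sum(((p - p0) ** 2).sum() for p, p0 in zip(model.parameters(), old_params))
loss = forecast_loss + l2_lambda * drift                        # anchored objective
loss.backward()
```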

Q2 We apologize for the insufficient explanation. To aid understanding, we have added detailed diagrams to the manuscript, which can be found in Fig. R1 of the attached PDF. We aimed to illustrate that BSA can be applied to arbitrary activations of the TSF model. For intermediate activations [E, F] of the model, SA can only be applied to a subset, namely F. E represents intermediate activations where BSA is not applied.

Q3 The notation we used was unconventional. We have revised it, as shown in Fig. R1 in the attached PDF.

Q4 Thank you for your valuable comment. We excluded the ILI dataset due to its short sequence length, which did not suit our work on long-range dependencies beyond the look-back window. Following your advice, we conducted experiments and obtained the following results:

| Illness | base MSE | base MAE | BSA MSE | BSA MAE |
|---------|----------|----------|---------|---------|
| Dlinear | 2.8488 | 1.1109 | 2.7971 | 1.1135 |
| RLinear | 2.6369 | 1.0133 | 2.6049 | 1.0050 |
| FreTS | 5.0733 | 1.6030 | 4.9304 | 1.5907 |
| TimesNet | 2.7188 | 0.9420 | 2.7196 | 0.9381 |
| iTransformer | 2.4539 | 0.9469 | 2.5402 | 0.9571 |
| Crossformer | 4.9456 | 1.5175 | 4.8094 | 1.4887 |
| PatchTST | 2.0157 | 0.8804 | 1.9738 | 0.8656 |

BSA showed an average performance improvement of 1.0% in MSE and 0.6% in MAE, and achieved SOTA performance with an MSE of 1.97 and an MAE of 0.866 with the PatchTST model.

Q5 Thank you for pointing out this important aspect. To measure the computational cost of BSA, we conducted comprehensive experiments on the additional cost of peak memory usage, running time, and parameter numbers. The full results can be found in Table E2, E3, E4, and E5 in the global comment.

Table E5 Total Average Additional Cost of BSA in Percentage(%). We report the average value of Table E2, E3 and E4.

| Dataset | Time | Memory | Num_Param |
|---------|------|--------|-----------|
| Weather | 6.1212 | 8.6826 | 1.7774 |
| PEMS03 | 0.0312 | 0.8081 | 2.3110 |

For the small weather dataset, we observed a 6.1% increase in running time, an 8.7% increase in memory usage, and a 1.8% increase in the number of parameters. In contrast, for the large PEMS dataset, there was only a 0.03% increase in running time, a 0.80% increase in memory usage, and a 2.3% increase in the number of parameters. This demonstrates that our module has minimal cost and excellent scalability.

Comment

Thank you for your detailed response. I have decided to keep my score at 6.

Comment

We appreciate your time and effort in reviewing our submission. We kindly request if you could please take a moment to review our rebuttal and provide any further feedback. Your insights are invaluable to us.

Thank you

Comment

Thank you for being a reviewer for NeurIPS2024, your service is invaluable to the community!

The authors have submitted their feedback.

Could you check the rebuttal and other reviewers' comments and start a discussion with the authors and other reviewers?

Regards, Your AC

Comment

Dear Reviewer A63Y,

We sincerely appreciate your valuable feedback and the time you dedicated to reviewing our work. Your insightful comments have been very helpful in enhancing the quality of our paper:

  • Improved the formulas in the paper for better clarity and also revised Figure 2 (as shown in the attached PDF)
  • Discussed whether the model overfits to the latter part of the training dataset with BSA.
  • Demonstrated BSA's consistent improvements on the ILI dataset.
  • Conducted extensive experiments on computational cost and memory, demonstrating the high efficiency of BSA.

We are also pleased to demonstrate several important additional experimental results:

  • BSA consistently demonstrates high performance even when the original 96 look-back window is changed to 192 or 48 (reviewer weYb)
  • BSA also consistently showed high performance on three additional well-known datasets (PEMS03, Energy-Data, Solar) (global comment)

While we hope that our response properly addressed your concerns, we would be happy to provide any additional clarification if necessary. We also welcome any additional advice you may have that could strengthen our paper.

Sincerely, Authors

Author Response

We thank all the reviewers for their valuable comments and constructive suggestions to strengthen our work, as well as for the positive and encouraging remarks: the paper introduces a novel Spectral Attention mechanism that addresses long-term dependence in time series prediction (A63Y); the method is designed as an easy plug-in module (weYb) that can be applied to various models regardless of the underlying architecture (A63Y, weYb); the methodology demonstrates high efficacy (pdEA); the proposed Spectral Attention mechanism led to significant improvements in forecasting performance across various real-world datasets (A63Y, pdEA); the model exhibits versatility (pdEA) and can be integrated into various baseline models, showcasing its adaptability and broad applicability (A63Y, pdEA, weYb); and the experiments validated that BSA consistently delivers performance improvements (A63Y, pdEA).


Literature reviews

We cited and discussed the following papers recommended by the reviewers.

SAAM[1] This paper proposes the Spectral Attention Autoregressive Model (SAAM), which can be applied as a plug-in to auto-regressive models (e.g., DeepAR) to improve the performance of time series forecasting. It is known that autoregressive models, due to their structure, are less capable of addressing long-range trends compared to structures such as Transformers. SAAM compensates for this by calculating the DFT and correlation matrix for the input signal. SAAM is used only with recurrent models and differs from our BSA in that it operates within the model's look-back window. We will include this paper in the "related work" section and describe its relevance and differences compared to our research.

SSM[2], S4[3] Existing recurrent models were computationally inefficient as they required forward passes and backpropagation through time proportional to the length of the input sequence due to their dependence on previous states. S4 enabled parallel computation by efficiently calculating the Linear State-Space Layer, which consists of matrix powers in the computational process of the discretized SSM. This allowed the model to consider long-range dependencies and can be seen as a form that takes advantage of both convolutional and recurrent approaches.

Mamba[4] Traditional SSMs are linear time-invariant (LTI) models with finite hidden states for the entire sequence, which are computationally very efficient but failed in tasks such as Selective Copying. To improve the model's ability to understand sequence context, the authors introduced additional learnable parameters and proposed Mamba, a time-varying model.


New Figures (Attached PDF)

We revised the figure and elaborated on the caption to make it easier to understand the principles of SA and BSA modules.

SA module We provided detailed illustrations and explanations of (A) how the SA module is applied to the model in a plug-in manner, (B) how the SA module utilizes sequential input samples and momentum, and (C) how the SA module internally performs spectral attention from features and momentum and updates the momentum.

BSA module We revised Fig. 2 in the manuscript to make it clearer. We intuitively modified the sequential input of data and the flow of each component.


Experiments

Table E1 Additional Datasets Performance Evaluation. We report the average value of all prediction lengths. We can see a performance increase on all datasets, with especially significant gains on the PEMS03 and Energy-Data datasets.

| Model | Setting | Metric | PEMS03 | Energy-Data | Solar |
|-------|---------|--------|--------|-------------|-------|
| Dlinear | base | MSE | 0.4364 | 0.8737 | 0.3451 |
| Dlinear | base | MAE | 0.5018 | 0.6829 | 0.4189 |
| Dlinear | BSA | MSE | 0.3845 | 0.8375 | 0.3112 |
| Dlinear | BSA | MAE | 0.4643 | 0.6670 | 0.3949 |
| RLinear | base | MSE | 0.9922 | 0.8163 | 0.3806 |
| RLinear | base | MAE | 0.7405 | 0.6302 | 0.3657 |
| RLinear | BSA | MSE | 0.6743 | 0.7968 | 0.3462 |
| RLinear | BSA | MAE | 0.6069 | 0.6221 | 0.3488 |
| FreTS | base | MSE | 0.2594 | 0.9764 | 0.2439 |
| FreTS | base | MAE | 0.3535 | 0.7349 | 0.2921 |
| FreTS | BSA | MSE | 0.2289 | 0.9399 | 0.2423 |
| FreTS | BSA | MAE | 0.3236 | 0.7103 | 0.2899 |
| iTransformer | base | MSE | 0.2618 | 0.8332 | 0.2550 |
| iTransformer | base | MAE | 0.3453 | 0.6395 | 0.2763 |
| iTransformer | BSA | MSE | 0.1975 | 0.7859 | 0.2558 |
| iTransformer | BSA | MAE | 0.2981 | 0.6189 | 0.2775 |

Table E2 Additional Time Cost of BSA in Percentage(%). We report the average value of all prediction lengths.

| Dataset | TimesNet | iTransformer | Crossformer | PatchTST |
|---------|----------|--------------|-------------|----------|
| Weather | -0.0033 | 15.8550 | 5.5010 | 3.1320 |
| PEMS03 | 0.2129 | 2.2388 | -2.0413 | -0.2854 |

Table E3 Additional Memory Cost of BSA in Percentage(%). We report the average value of all prediction lengths.

| Dataset | TimesNet | iTransformer | Crossformer | PatchTST |
|---------|----------|--------------|-------------|----------|
| Weather | 0.8237 | 32.3081 | 1.2534 | 0.3453 |
| PEMS03 | 0.3412 | 2.3414 | 0.1708 | 0.3791 |

Table E4 Additional Parameter Cost of BSA in Percentage(%). We report the average value of all prediction lengths.

| Dataset | TimesNet | iTransformer | Crossformer | PatchTST |
|---------|----------|--------------|-------------|----------|
| Weather | 1.1757 | 0.0418 | 5.7333 | 0.1587 |
| PEMS03 | 0.1594 | 4.8778 | 1.9611 | 2.2458 |

Table E5 Total Average Additional Cost of BSA in Percentage(%). We report the average value of Table E2, E3 and E4.

| Dataset | Time | Memory | Num_Param |
|---------|------|--------|-----------|
| Weather | 6.1212 | 8.6826 | 1.7774 |
| PEMS03 | 0.0312 | 0.8081 | 2.3110 |

References

[1] Moreno-Pino, Fernando, Pablo M. Olmos, and Antonio Artés-Rodríguez. "Deep autoregressive models with spectral attention." Pattern Recognition 133 (2023): 109014.

[2] Koller, Daphne, and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009.

[3] Gu, Albert, Karan Goel, and Christopher Re. "Efficiently Modeling Long Sequences with Structured State Spaces." International Conference on Learning Representations (2022).

[4] Gu, Albert, and Tri Dao. "Mamba: Linear-time sequence modeling with selective state spaces." arXiv preprint arXiv:2312.00752 (2023).

Final Decision

The paper received two Borderline Accepts and one Weak Accept after the rebuttal and discussion period. The reviewers stated that the proposed spectral attention mechanism is capable of resolving long-term dependence in time series forecasting and is designed as a simple plug-in module that can be applied to a variety of models regardless of the underlying architecture. The AC did not find any reason to overturn the reviewers' positive evaluations. Therefore, considering the submitted paper, the reviewers' comments, the authors' rebuttals, and the discussion, the AC considers it appropriate to rate this paper as acceptable. However, the AC recommends that the reviewers' comments be incorporated into the camera-ready version.