PaperHub
Overall rating: 7.8/10 · Spotlight
4 reviewers (lowest 4, highest 5, std 0.4)
Ratings: 5, 5, 5, 4
Confidence: 3.3
Novelty: 2.8 · Quality: 2.8 · Clarity: 2.8 · Significance: 2.5
NeurIPS 2025

Enhancing Time Series Forecasting through Selective Representation Spaces: A Patch Perspective

OpenReview · PDF
Submitted: 2025-04-07 · Updated: 2025-11-30

Abstract

Keywords
Time Series Forecasting

Reviews and Discussion

Review
Rating: 5

This work explores a Selective Representation Space (SRS) module, using Selective Patching and Dynamic Reassembly to adaptively select and shuffle patches for better context exploitation. The SRSNet, combining SRS and an MLP head, achieves state-of-the-art results on multi-domain datasets. As a plug-and-play module, SRS also boosts existing patch-based models.

Strengths and Weaknesses

Strengths:

  1. The SRS module is a plug-and-play component that enhances time series forecasting, providing a general improvement for patch-based models.

  2. The simple baseline SRSNet also demonstrates competitive performance against current strong baselines.

Weaknesses:

  1. Please elaborate on the definition or concept of the Selective Representation Space in more detail and more clearly. What is the main difference between patches and representations?

  2. Though the authors have considered the SRS as a simple baseline and as a plugin, the datasets used are not sufficient. The "ETT long term forecasting benchmark" is often criticized for flaws such as limited domain coverage and the practice of forecasting at unreasonable horizons (e.g., 720 days into the future for exchange rates, or the oil temperature of a transformer at a specific hour months into the future). Every new model somehow beats this benchmark; however, there is still barely any absolute progress, only an illusion of it. Please refer to the talk (and paper) from Christoph Bergmeir [1, 2], where he discusses the limitations of this benchmark and of current evaluation practices. A very recent position paper [3] also conducted a comprehensive evaluation of models on this benchmark, showing that there is no obvious winner. One (not so difficult) way to improve the quality of evaluation is to include results on better benchmarks [4].

[1] https://neurips.cc/virtual/2024/workshop/84712#collapse108471

[2] Hewamalage, Hansika, Klaus Ackermann, and Christoph Bergmeir. "Forecast evaluation for data scientists: common pitfalls and best practices." Data Mining and Knowledge Discovery 37.2 (2023): 788-832.

[3] Brigato, Lorenzo, et al. "Position: There are no Champions in Long-Term Time Series Forecasting." arXiv preprint arXiv:2502.14045 (2025).

[4] Qiu X, Hu J, Zhou L, et al. "TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods." Proceedings of the VLDB Endowment, 2024, 17(9): 2363-2377.

  3. Why are the datasets used in Table 4 (Efficiency Analysis) inconsistent with those in Table 2 and Table 3? Can you provide more details regarding efficiency?

  4. I'm still confused about the mechanism of gradient propagation. Please explain the details of formulas (3)-(5).

Questions

See the Weaknesses

Limitations

Yes

Final Justification

The author's rebuttal, especially the additional experiments, addressed all of my concerns. Taking all the reviewers' comments into account, I believe this work is solid. I support its acceptance.

Formatting Concerns

No problem

Author Response

Reply to W1

Thanks for the question. In time series forecasting, all patch-based models follow the computation chain: Patching → Embedding (preliminary representations) → Modeling (senior representations) → Forecasting (dimensionality reduction). The patching step does not directly produce representations, but it determines where the representations come from (the candidate patches). Therefore, we argue that the patching technique effectively determines the structure of the representation space and thus affects the representations. In terms of representations, the embeddings are the preliminary representations produced directly from the patches, and the hidden representations across model layers are the senior ones.
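To make this chain concrete, here is a minimal PyTorch sketch of the patching stage alone (our own illustration with arbitrary sizes, not the paper's code), contrasting adjacent patches with the stride-1 candidate pool that Selective Patching scores over:

```python
import torch

T, P = 96, 16                     # context length and patch length (illustrative)
x = torch.randn(T)                # one univariate context window

# Conventional Adjacent Patching: fixed, non-overlapping windows.
adjacent = x.unfold(0, P, P)      # shape (T // P, P) = (6, 16)

# Candidate pool for Selective Patching: every stride-1 window is a candidate,
# so a scorer may pick overlapping, non-adjacent patches anywhere in the context.
candidates = x.unfold(0, P, 1)    # shape (T - P + 1, P) = (81, 16)
```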

Reply to W2

Thanks for the suggestion. To provide more empirical evidence, we conduct additional experiments on six more datasets from TFB. Their statistical information is as follows:

| Datasets | NASDAQ | NYSE | AQShunyi |
| --- | --- | --- | --- |
| Shifting Values | 0.932 (Strong) | 0.620 (Modest) | 0.019 (Weak) |

| Datasets | PEMS08 | ZafNoo | AQWan |
| --- | --- | --- | --- |
| Seasonality Values | 0.850 (Strong) | 0.757 (Modest) | 0.119 (Weak) |

To comprehensively evaluate the capability of SRS, we conduct plugin experiments with all 5 models from our paper on the above 6 datasets; the mean results over 4 forecasting horizons are reported as follows:

| Models | NASDAQ (MSE) | NASDAQ (MAE) | NYSE (MSE) | NYSE (MAE) | AQShunyi (MSE) | AQShunyi (MAE) |
| --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.972 | 0.721 | 0.483 | 0.437 | 0.703 | 0.507 |
| + SRS | 0.757 | 0.569 | 0.418 | 0.379 | 0.641 | 0.461 |
| Improved | 21.99% | 21.03% | 13.50% | 13.50% | 8.83% | 9.14% |
| Crossformer | 1.737 | 0.992 | 0.983 | 0.909 | 0.694 | 0.504 |
| + SRS | 1.256 | 0.709 | 0.854 | 0.785 | 0.635 | 0.455 |
| Improved | 27.35% | 28.10% | 13.48% | 13.84% | 8.64% | 9.71% |
| PatchMLP | 1.352 | 0.810 | 0.467 | 0.440 | 0.722 | 0.516 |
| + SRS | 1.037 | 0.606 | 0.406 | 0.380 | 0.666 | 0.475 |
| Improved | 23.31% | 25.30% | 13.77% | 13.82% | 7.78% | 8.04% |
| xPatch | 1.060 | 0.708 | 0.545 | 0.474 | 0.735 | 0.509 |
| + SRS | 0.850 | 0.563 | 0.469 | 0.408 | 0.678 | 0.464 |
| Improved | 19.63% | 20.32% | 14.56% | 14.63% | 7.68% | 8.75% |
| MLP | 1.200 | 0.803 | 0.463 | 0.432 | 0.717 | 0.513 |
| + SRS | 0.925 | 0.600 | 0.395 | 0.366 | 0.662 | 0.467 |
| Improved | 23.50% | 25.50% | 15.56% | 15.94% | 7.68% | 8.89% |

| Models | PEMS08 (MSE) | PEMS08 (MAE) | ZafNoo (MSE) | ZafNoo (MAE) | AQWan (MSE) | AQWan (MAE) |
| --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.332 | 0.322 | 0.509 | 0.454 | 0.812 | 0.498 |
| + SRS | 0.312 | 0.300 | 0.441 | 0.386 | 0.661 | 0.395 |
| Improved | 6.02% | 6.89% | 13.29% | 15.08% | 18.49% | 20.70% |
| Crossformer | 0.265 | 0.282 | 0.494 | 0.456 | 0.786 | 0.490 |
| + SRS | 0.254 | 0.267 | 0.429 | 0.387 | 0.592 | 0.369 |
| Improved | 4.23% | 5.30% | 13.02% | 15.05% | 24.86% | 24.74% |
| PatchMLP | 0.306 | 0.283 | 0.531 | 0.466 | 0.829 | 0.505 |
| + SRS | 0.294 | 0.270 | 0.457 | 0.395 | 0.637 | 0.388 |
| Improved | 3.93% | 4.63% | 14.09% | 15.23% | 23.26% | 23.18% |
| xPatch | 0.269 | 0.242 | 0.593 | 0.479 | 0.849 | 0.506 |
| + SRS | 0.258 | 0.232 | 0.521 | 0.417 | 0.701 | 0.406 |
| Improved | 3.61% | 4.11% | 12.19% | 12.87% | 17.37% | 19.65% |
| MLP | 0.303 | 0.278 | 0.523 | 0.455 | 0.826 | 0.505 |
| + SRS | 0.283 | 0.259 | 0.439 | 0.371 | 0.611 | 0.362 |
| Improved | 6.59% | 6.62% | 16.18% | 18.46% | 26.11% | 28.44% |

Based on the above results, we have two main observations:

  1. The SRS excels on datasets with high shifting values (NASDAQ) and low seasonality values (AQWan), because these datasets contain more special samples involving changeable periods, anomalies, and shifting. The SRS improves all patch-based models significantly, by 17%–28%, on these datasets.
  2. When the datasets are closer to normal cases, with higher seasonality values and lower shifting values, the improvement brought by the SRS gradually decreases. On NYSE and ZafNoo, performance improves by 12%–18%. Note that even on AQShunyi and PEMS08 the improvement persists, at about 4%–10%. This provides evidence that the SRS can handle both normal and special cases well.

Reply to W3

Thanks for the suggestion. To provide more details regarding efficiency, we conduct additional experiments. Our experiments consist of two parts:

(1) The efficiency comparison between SRSNet and other baselines:

| Models | ETTm2 Memory | ETTm2 Training Time | Solar Memory | Solar Training Time |
| --- | --- | --- | --- | --- |
| DLinear | 573 | 6.9988 | 151 | 5.6563 |
| Amplifer | 555 | 10.827 | 151 | 4.3796 |
| TimesNet | 2,413 | 86.553 | 13,141 | 1812.4588 |
| FEDformer | 917 | 202.9685 | 3,751 | 227.446 |
| Stationary | 18,373 | 147.157 | 18,529 | 156.269 |
| Crossformer | 7,523 | 118.3208 | 16,375 | 205.602 |
| PatchTST | 867 | 17.622 | 6,777 | 137.60377 |
| iTransformer | 579 | 14.545 | 1,015 | 20.663125 |
| TimeMixer | 1,099 | 20.378 | 20,602 | 107.149 |
| TimeKAN | 1,207 | 27.108 | 13,109 | 326.378 |
| SRSNet | 1,059 | 16.293 | 6,301 | 56.149 |

(2) The complexity introduced by the SRS module:

| Datasets | Variants | Memory (MB) | Params (M) | Inference Time (s/batch) | Training Time (s/batch) | MACs (G) |
| --- | --- | --- | --- | --- | --- | --- |
| ETTh1 | PatchTST | 2837 | 11.276 | 5.076 | 5.131 | 16.214 |
| | SRS | 2907 | 11.298 | 5.722 | 5.763 | 16.905 |
| | Overhead | 2.47% | 0.19% | 12.73% | 12.31% | 4.26% |
| | Crossformer | 4011 | 11.101 | 27.503 | 32.613 | 56.280 |
| | SRS | 4159 | 11.112 | 30.311 | 35.276 | 56.625 |
| | Overhead | 3.69% | 0.10% | 10.21% | 8.17% | 0.61% |
| Solar | PatchTST | 27822.08 | 14.429 | 84.231 | 88.714 | 600.261 |
| | SRS | 29767.68 | 14.451 | 95.200 | 101.981 | 613.790 |
| | Overhead | 6.99% | 0.15% | 13.02% | 14.95% | 2.25% |
| | Crossformer | 17355 | 0.711 | 79.031 | 82.472 | 61.822 |
| | SRS | 18978 | 0.717 | 86.674 | 90.268 | 62.174 |
| | Overhead | 9.35% | 0.88% | 9.67% | 9.45% | 0.57% |

Based on the above tables, we have two main observations:

  1. SRSNet makes accuracy and efficiency meet: although it is not as efficient as linear-based models, it is more efficient than most Transformer-based baselines while achieving stronger performance.
  2. The overhead introduced by the SRS is relatively trivial: Memory, MACs, and Params increase little, and the Training Time and Inference Time remain controllable in the end-to-end supervised setting, while the performance is consistently improved (see the measurement sketch below).
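As a minimal illustration of how such per-model figures can be collected (our own sketch, not the authors' tooling; `profile_model` and its arguments are hypothetical, and MACs are usually counted with an external profiler and omitted here):

```python
import time
import torch

def profile_model(model, batch, device="cuda"):
    """Collect peak GPU memory (MB), parameter count (M), and forward time (s)."""
    model = model.to(device).eval()
    batch = batch.to(device)
    params_m = sum(p.numel() for p in model.parameters()) / 1e6
    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)
    start = time.perf_counter()
    with torch.no_grad():
        model(batch)
    torch.cuda.synchronize(device)
    elapsed = time.perf_counter() - start
    memory_mb = torch.cuda.max_memory_allocated(device) / 2**20
    return memory_mb, params_m, elapsed
```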

Reply to W4

Thanks for the question. We devised this mechanism to preserve gradient propagation through the Argmax operation in Equation (2) of the paper:

$$\mathcal{S}^s = \text{Scorer}^s(\mathcal{P}^\prime), \quad \mathcal{I}^s = \text{Argmax}(\mathcal{S}^s)$$

Since the Argmax operation finds the patches with the maximal scores and returns their indices, it breaks the gradient flow. To avoid this, we need to attach the gradients back to the selected patches. Note that the gradients live on the scores generated by the scorer, so we first retrieve the scores corresponding to the selected patches through Equation (3):

$$\mathcal{S}^s_{max} = \mathcal{S}^s[\mathcal{I}^s], \quad \mathcal{S}^s_{inv} = \text{detach}(1/\mathcal{S}^s_{max})$$

We then normalize the scores to the value 1 while keeping their gradients: the key detail is that we detach the reciprocal $\mathcal{S}^s_{inv}$, so the product below equals 1 numerically but preserves the gradients of $\mathcal{S}^s_{max}$.

$$\mathcal{P}^s_{max} = \mathcal{P}^\prime[\mathcal{I}^s], \quad E^s = \mathcal{S}^s_{max} \odot \mathcal{S}^s_{inv}$$

Finally, we attach the normalized scores to the patches through the Hadamard product:

$$\tilde{\mathcal{P}}^s_{max} = \mathcal{P}^s_{max} \odot E^s$$
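For concreteness, a minimal PyTorch sketch of Equations (2)-(5) as described above (the tensor shapes and function name are illustrative assumptions, not the paper's exact implementation):

```python
import torch

def select_with_gradient(patches, scores):
    """Differentiable top-1 selection per slot, following Eqs. (2)-(5).

    patches: (K, D) embedded candidate patches P'.
    scores:  (K, n) scorer outputs S^s, one column per selection slot.
    Returns (n, D) selected patches; forward values are unchanged because
    E^s equals 1 numerically, but gradients flow back into the scorer.
    """
    idx = scores.argmax(dim=0)                             # I^s = Argmax(S^s), shape (n,)
    s_max = scores.gather(0, idx.unsqueeze(0)).squeeze(0)  # S^s_max = S^s[I^s]
    s_inv = (1.0 / s_max).detach()                         # S^s_inv = detach(1 / S^s_max)
    e = s_max * s_inv                                      # E^s: value 1, grad of s_max kept
    return patches[idx] * e.unsqueeze(-1)                  # selected patches scaled by E^s
```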

Thanks again for your sincere suggestions! We will release the full results when the discussion phase starts. If you have any additional questions, we are happy to discuss further!

Comment

Dear Reviewer Gwnu, since the discussion phase has started, we have enough space to show our full experimental results. We first report the full results of the plugin experiments here, including four forecasting horizons for each dataset. We hope this can be treated as additional evidence. In this comment, we list the results of NASDAQ and NYSE.

NASDAQ (horizons 24/36/48/60):

| Models | MSE (24) | MAE (24) | MSE (36) | MAE (36) | MSE (48) | MAE (48) | MSE (60) | MAE (60) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.649 | 0.567 | 0.821 | 0.682 | 1.169 | 0.793 | 1.247 | 0.843 |
| SRS | 0.497 | 0.444 | 0.666 | 0.571 | 0.882 | 0.580 | 0.984 | 0.680 |
| Improved | 23.40% | 21.73% | 18.93% | 16.22% | 24.53% | 26.80% | 21.10% | 19.37% |
| Crossformer | 1.149 | 0.745 | 1.414 | 0.885 | 2.108 | 1.136 | 2.276 | 1.201 |
| SRS | 0.818 | 0.545 | 1.087 | 0.670 | 1.565 | 0.825 | 1.557 | 0.795 |
| Improved | 28.85% | 26.84% | 23.15% | 24.33% | 25.78% | 27.41% | 31.61% | 33.82% |
| PatchMLP | 0.794 | 0.648 | 1.185 | 0.768 | 1.774 | 0.945 | 1.653 | 0.878 |
| SRS | 0.584 | 0.454 | 0.973 | 0.620 | 1.383 | 0.713 | 1.210 | 0.636 |
| Improved | 26.48% | 29.94% | 17.93% | 19.24% | 22.03% | 24.51% | 26.82% | 27.52% |
| xPatch | 0.587 | 0.536 | 0.885 | 0.666 | 1.262 | 0.786 | 1.506 | 0.845 |
| SRS | 0.486 | 0.449 | 0.704 | 0.514 | 0.973 | 0.588 | 1.236 | 0.702 |
| Improved | 17.22% | 16.31% | 20.45% | 22.81% | 22.91% | 25.25% | 17.95% | 16.92% |
| MLP | 0.882 | 0.727 | 1.156 | 0.796 | 1.243 | 0.819 | 1.519 | 0.870 |
| SRS | 0.645 | 0.512 | 0.915 | 0.622 | 0.883 | 0.563 | 1.256 | 0.703 |
| Improved | 26.91% | 29.64% | 20.85% | 21.86% | 28.95% | 31.25% | 17.29% | 19.25% |

NYSE (horizons 24/36/48/60):

| Models | MSE (24) | MAE (24) | MSE (36) | MAE (36) | MSE (48) | MAE (48) | MSE (60) | MAE (60) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.226 | 0.296 | 0.380 | 0.389 | 0.575 | 0.492 | 0.749 | 0.572 |
| SRS | 0.202 | 0.259 | 0.316 | 0.328 | 0.479 | 0.424 | 0.676 | 0.504 |
| Improved | 10.82% | 12.55% | 16.79% | 15.70% | 16.62% | 13.91% | 9.75% | 11.84% |
| Crossformer | 0.820 | 0.841 | 0.942 | 0.904 | 1.049 | 0.955 | 1.121 | 0.937 |
| SRS | 0.672 | 0.679 | 0.805 | 0.778 | 0.935 | 0.836 | 1.004 | 0.846 |
| Improved | 18.03% | 19.25% | 14.55% | 13.92% | 10.88% | 12.41% | 10.45% | 9.76% |
| PatchMLP | 0.242 | 0.317 | 0.359 | 0.381 | 0.543 | 0.485 | 0.725 | 0.578 |
| SRS | 0.206 | 0.274 | 0.299 | 0.323 | 0.483 | 0.420 | 0.635 | 0.503 |
| Improved | 14.85% | 13.56% | 16.72% | 15.25% | 11.05% | 13.44% | 12.45% | 13.04% |
| xPatch | 0.205 | 0.292 | 0.364 | 0.386 | 0.529 | 0.465 | 1.082 | 0.752 |
| SRS | 0.170 | 0.236 | 0.302 | 0.315 | 0.477 | 0.429 | 0.927 | 0.653 |
| Improved | 16.92% | 19.26% | 17.12% | 18.27% | 9.92% | 7.83% | 14.28% | 13.17% |
| MLP | 0.208 | 0.288 | 0.353 | 0.372 | 0.517 | 0.474 | 0.775 | 0.593 |
| SRS | 0.175 | 0.235 | 0.282 | 0.298 | 0.443 | 0.416 | 0.680 | 0.515 |
| Improved | 15.68% | 18.36% | 20.05% | 19.88% | 14.25% | 12.28% | 12.25% | 13.22% |
Comment

In this comment, we list the results of AQShunyi and PEMS08.

AQShunyi (horizons 96/192/336/720):

| Models | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.646 | 0.478 | 0.688 | 0.498 | 0.710 | 0.513 | 0.768 | 0.539 |
| SRS | 0.602 | 0.440 | 0.620 | 0.450 | 0.632 | 0.462 | 0.710 | 0.491 |
| Improved | 6.87% | 8.03% | 9.85% | 9.72% | 11.03% | 9.88% | 7.57% | 8.92% |
| Crossformer | 0.652 | 0.484 | 0.674 | 0.499 | 0.704 | 0.515 | 0.747 | 0.518 |
| SRS | 0.600 | 0.435 | 0.615 | 0.452 | 0.612 | 0.437 | 0.712 | 0.498 |
| Improved | 8.03% | 10.22% | 8.82% | 9.48% | 13.02% | 15.23% | 4.67% | 3.89% |
| PatchMLP | 0.668 | 0.492 | 0.711 | 0.511 | 0.732 | 0.524 | 0.776 | 0.537 |
| SRS | 0.602 | 0.441 | 0.659 | 0.466 | 0.679 | 0.488 | 0.724 | 0.505 |
| Improved | 9.82% | 10.47% | 7.35% | 8.84% | 7.22% | 6.94% | 6.72% | 5.92% |
| xPatch | 0.674 | 0.486 | 0.715 | 0.506 | 0.741 | 0.520 | 0.808 | 0.523 |
| SRS | 0.642 | 0.451 | 0.631 | 0.440 | 0.680 | 0.479 | 0.760 | 0.487 |
| Improved | 4.82% | 7.29% | 11.71% | 12.95% | 8.26% | 7.91% | 5.91% | 6.86% |
| MLP | 0.674 | 0.487 | 0.704 | 0.504 | 0.723 | 0.518 | 0.768 | 0.542 |
| SRS | 0.629 | 0.444 | 0.669 | 0.467 | 0.642 | 0.464 | 0.708 | 0.494 |
| Improved | 6.72% | 8.92% | 4.95% | 7.28% | 11.27% | 10.52% | 7.79% | 8.85% |

PEMS08 (horizons 96/192/336/720):

| Models | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.248 | 0.287 | 0.319 | 0.310 | 0.361 | 0.324 | 0.399 | 0.368 |
| SRS | 0.234 | 0.274 | 0.299 | 0.288 | 0.343 | 0.305 | 0.371 | 0.332 |
| Improved | 5.82% | 4.68% | 6.29% | 7.22% | 4.86% | 5.82% | 7.12% | 9.82% |
| Crossformer | 0.230 | 0.260 | 0.239 | 0.264 | 0.272 | 0.289 | 0.320 | 0.316 |
| SRS | 0.223 | 0.252 | 0.235 | 0.257 | 0.250 | 0.259 | 0.309 | 0.300 |
| Improved | 3.26% | 2.97% | 1.87% | 2.74% | 8.22% | 10.28% | 3.55% | 5.21% |
| PatchMLP | 0.203 | 0.242 | 0.314 | 0.273 | 0.334 | 0.288 | 0.373 | 0.329 |
| SRS | 0.197 | 0.232 | 0.308 | 0.267 | 0.311 | 0.272 | 0.358 | 0.307 |
| Improved | 2.96% | 4.22% | 1.85% | 2.04% | 6.92% | 5.42% | 3.98% | 6.82% |
| xPatch | 0.171 | 0.214 | 0.260 | 0.233 | 0.305 | 0.246 | 0.340 | 0.274 |
| SRS | 0.169 | 0.210 | 0.252 | 0.226 | 0.287 | 0.231 | 0.325 | 0.260 |
| Improved | 1.02% | 2.05% | 3.20% | 2.88% | 5.92% | 6.22% | 4.28% | 5.29% |
| MLP | 0.184 | 0.241 | 0.306 | 0.268 | 0.341 | 0.281 | 0.380 | 0.320 |
| SRS | 0.171 | 0.226 | 0.291 | 0.254 | 0.313 | 0.258 | 0.357 | 0.298 |
| Improved | 7.22% | 6.03% | 4.99% | 5.21% | 8.11% | 8.35% | 6.02% | 6.87% |
Comment

In this comment, we list the results of ZafNoo and AQWan.

ZafNoo (horizons 96/192/336/720):

| Models | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.429 | 0.405 | 0.494 | 0.449 | 0.538 | 0.475 | 0.573 | 0.486 |
| SRS | 0.372 | 0.337 | 0.421 | 0.382 | 0.474 | 0.413 | 0.497 | 0.410 |
| Improved | 13.22% | 16.67% | 14.82% | 14.99% | 11.82% | 12.96% | 13.28% | 15.68% |
| Crossformer | 0.430 | 0.418 | 0.479 | 0.449 | 0.505 | 0.464 | 0.560 | 0.494 |
| SRS | 0.382 | 0.376 | 0.398 | 0.360 | 0.451 | 0.398 | 0.485 | 0.414 |
| Improved | 11.06% | 9.95% | 16.92% | 19.72% | 10.67% | 14.29% | 13.42% | 16.25% |
| PatchMLP | 0.443 | 0.416 | 0.515 | 0.459 | 0.563 | 0.484 | 0.603 | 0.505 |
| SRS | 0.393 | 0.351 | 0.405 | 0.382 | 0.491 | 0.410 | 0.537 | 0.438 |
| Improved | 11.28% | 15.58% | 21.28% | 16.79% | 12.84% | 15.25% | 10.96% | 13.28% |
| xPatch | 0.512 | 0.436 | 0.582 | 0.474 | 0.603 | 0.487 | 0.675 | 0.517 |
| SRS | 0.456 | 0.381 | 0.482 | 0.383 | 0.544 | 0.432 | 0.602 | 0.474 |
| Improved | 10.88% | 12.68% | 17.25% | 19.25% | 9.79% | 11.26% | 10.85% | 8.29% |
| MLP | 0.434 | 0.403 | 0.498 | 0.441 | 0.554 | 0.471 | 0.604 | 0.503 |
| SRS | 0.368 | 0.335 | 0.399 | 0.342 | 0.464 | 0.380 | 0.524 | 0.426 |
| Improved | 15.22% | 16.83% | 19.96% | 22.46% | 16.26% | 19.28% | 13.26% | 15.27% |

AQWan (horizons 96/192/336/720):

| Models | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.745 | 0.468 | 0.793 | 0.490 | 0.819 | 0.502 | 0.890 | 0.533 |
| SRS | 0.611 | 0.379 | 0.633 | 0.380 | 0.714 | 0.430 | 0.685 | 0.388 |
| Improved | 17.95% | 18.92% | 20.17% | 22.35% | 12.83% | 14.25% | 23.02% | 27.26% |
| Crossformer | 0.750 | 0.465 | 0.762 | 0.479 | 0.802 | 0.504 | 0.829 | 0.512 |
| SRS | 0.533 | 0.347 | 0.530 | 0.329 | 0.648 | 0.405 | 0.656 | 0.395 |
| Improved | 28.88% | 25.27% | 30.44% | 31.27% | 19.25% | 19.58% | 20.87% | 22.85% |
| PatchMLP | 0.771 | 0.484 | 0.815 | 0.499 | 0.835 | 0.510 | 0.896 | 0.526 |
| SRS | 0.595 | 0.365 | 0.587 | 0.365 | 0.658 | 0.414 | 0.707 | 0.407 |
| Improved | 22.86% | 24.54% | 27.92% | 26.87% | 21.20% | 18.75% | 21.04% | 22.54% |
| xPatch | 0.775 | 0.474 | 0.824 | 0.494 | 0.857 | 0.510 | 0.938 | 0.546 |
| SRS | 0.657 | 0.389 | 0.649 | 0.384 | 0.723 | 0.417 | 0.776 | 0.436 |
| Improved | 15.28% | 17.92% | 21.26% | 22.26% | 15.67% | 18.26% | 17.28% | 20.17% |
| MLP | 0.767 | 0.476 | 0.806 | 0.495 | 0.831 | 0.511 | 0.901 | 0.538 |
| SRS | 0.527 | 0.318 | 0.581 | 0.352 | 0.674 | 0.399 | 0.664 | 0.378 |
| Improved | 31.29% | 33.28% | 27.92% | 28.88% | 18.92% | 21.82% | 26.29% | 29.78% |
Comment

The author's rebuttal, especially the additional experiments, addressed all of my concerns. I will raise my score from 4 to 5 and support the acceptance of this paper.

Comment

Dear Reviewer Gwnu, we are thrilled that our responses have effectively addressed your questions and comments. We would like to express our sincerest gratitude for taking the time to review our paper and provide us with such detailed feedback.

Review
Rating: 5

This work surfaces potential constraints with the standard approach of considering adjacent time series partitions as patches, which may be susceptible to including anomalies, misrepresenting distribution shifts, and irregular periodicity within the time series. The authors hypothesize that this inflexibility impairs the forecasting model's ability to learn truly representative spaces, which in turn impacts accuracy. In order to address this issue, the authors present a novel patching approach that (i) selects informative patches that can overlap and no longer be adjacent, and (ii) concatenates the partitions in a manner that no longer assumes them to maintain their original adjacency. Acknowledging the value of the original patching approach, the authors then introduce a fusion module that combines adjacent patches with those resulting from the selective patching/dynamic reassembly module described in this work, along with position embeddings. Experiments against competing techniques indicate nearly consistent performance improvements across all datasets, while ablation studies teasing apart the various components included in the final architecture demonstrate that these all complement each other.

Strengths and Weaknesses

The paper makes an incremental yet meaningful contribution to the literature on patch-based forecasting models, and I appreciated the clarity with which the authors presented the various ways in which standard patching strategies are inflexible to potential anomalies or distribution shifts (both of which are highly prevalent in real-world datasets). The diagrams are especially effective for communicating the described techniques, and there is an appropriate level of technical detail throughout the main paper. I also appreciated the variety of competing techniques evaluated in the experiments section, along with the various datasets. The ablation study validating the incremental value of each component, along with the analysis on computational and memory complexity, make the paper even more complete.

One aspect I found lacking was more insight into where this approach may not have the desired effect, or into the value that the weight parameter in Eq. 13 tends to settle on across different datasets after training. Including a dedicated "Discussion and Limitations" section before the Conclusion could address this gap, although these insights could also be integrated within the methodology and experiments sections themselves.

Questions

See note in section above on adding more insights/examples of where the approach may not be as beneficial as adjacent partitioning

Limitations

N/A

Final Justification

The paper is well-written, with clearly-motivated and illustrated contributions as well as extensive experiments. The authors have demonstrated they are committed to incorporate the feedback that emerged in the review period, giving me confidence in supporting the paper for acceptance.

Formatting Concerns

N/A

Author Response

Thanks for your in-depth insights and suggestions. Through Adaptive Fusion, the SRS integrates the representations from both Adjacent Patching and Selective Patching with learnable weights, and can thus combine the advantages of both methods.

As you mentioned, the weight parameter in Eq. 13 can indeed reflect which scenarios the two techniques separately excel at. We study this by devising experiments on specific datasets and reporting the weight values. Intuitively, Selective Patching may excel on datasets with shifting phenomena, and Adjacent Patching may excel on datasets with strong seasonality. Therefore, we conduct additional experiments on datasets with varying shifting and seasonality values from TFB, a well-recognized time series forecasting benchmark. Specifically, we choose NASDAQ, NYSE, and AQShunyi with descending shifting values in TFB, and PEMS08, ZafNoo, and AQWan with descending seasonality values in TFB. Their statistical details are as follows:

| Datasets | NASDAQ | NYSE | AQShunyi |
| --- | --- | --- | --- |
| Shifting Values | 0.932 (Strong) | 0.620 (Modest) | 0.019 (Weak) |

| Datasets | PEMS08 | ZafNoo | AQWan |
| --- | --- | --- | --- |
| Seasonality Values | 0.850 (Strong) | 0.757 (Modest) | 0.119 (Weak) |

We conduct comprehensive plugin experiments on them, reporting the average MSE and MAE metrics and how the weight values converge.

| Models | NASDAQ (MSE) | NASDAQ (MAE) | NYSE (MSE) | NYSE (MAE) | AQShunyi (MSE) | AQShunyi (MAE) |
| --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.972 | 0.721 | 0.483 | 0.437 | 0.703 | 0.507 |
| + SRS | 0.757 | 0.569 | 0.418 | 0.379 | 0.641 | 0.461 |
| Improved | 21.99% | 21.03% | 13.50% | 13.50% | 8.83% | 9.14% |
| Crossformer | 1.737 | 0.992 | 0.983 | 0.909 | 0.694 | 0.504 |
| + SRS | 1.256 | 0.709 | 0.854 | 0.785 | 0.635 | 0.455 |
| Improved | 27.35% | 28.10% | 13.48% | 13.84% | 8.64% | 9.71% |
| PatchMLP | 1.352 | 0.810 | 0.467 | 0.440 | 0.722 | 0.516 |
| + SRS | 1.037 | 0.606 | 0.406 | 0.380 | 0.666 | 0.475 |
| Improved | 23.31% | 25.30% | 13.77% | 13.82% | 7.78% | 8.04% |
| xPatch | 1.060 | 0.708 | 0.545 | 0.474 | 0.735 | 0.509 |
| + SRS | 0.850 | 0.563 | 0.469 | 0.408 | 0.678 | 0.464 |
| Improved | 19.63% | 20.32% | 14.56% | 14.63% | 7.68% | 8.75% |
| MLP | 1.200 | 0.803 | 0.463 | 0.432 | 0.717 | 0.513 |
| + SRS | 0.925 | 0.600 | 0.395 | 0.366 | 0.662 | 0.467 |
| Improved | 23.50% | 25.50% | 15.56% | 15.94% | 7.68% | 8.89% |

| Models | PEMS08 (MSE) | PEMS08 (MAE) | ZafNoo (MSE) | ZafNoo (MAE) | AQWan (MSE) | AQWan (MAE) |
| --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.332 | 0.322 | 0.509 | 0.454 | 0.812 | 0.498 |
| + SRS | 0.312 | 0.300 | 0.441 | 0.386 | 0.661 | 0.395 |
| Improved | 6.02% | 6.89% | 13.29% | 15.08% | 18.49% | 20.70% |
| Crossformer | 0.265 | 0.282 | 0.494 | 0.456 | 0.786 | 0.490 |
| + SRS | 0.254 | 0.267 | 0.429 | 0.387 | 0.592 | 0.369 |
| Improved | 4.23% | 5.30% | 13.02% | 15.05% | 24.86% | 24.74% |
| PatchMLP | 0.306 | 0.283 | 0.531 | 0.466 | 0.829 | 0.505 |
| + SRS | 0.294 | 0.270 | 0.457 | 0.395 | 0.637 | 0.388 |
| Improved | 3.93% | 4.63% | 14.09% | 15.23% | 23.26% | 23.18% |
| xPatch | 0.269 | 0.242 | 0.593 | 0.479 | 0.849 | 0.506 |
| + SRS | 0.258 | 0.232 | 0.521 | 0.417 | 0.701 | 0.406 |
| Improved | 3.61% | 4.11% | 12.19% | 12.87% | 17.37% | 19.65% |
| MLP | 0.303 | 0.278 | 0.523 | 0.455 | 0.826 | 0.505 |
| + SRS | 0.283 | 0.259 | 0.439 | 0.371 | 0.611 | 0.362 |
| Improved | 6.59% | 6.62% | 16.18% | 18.46% | 26.11% | 28.44% |

Based on the above results, we have two main observations:

  1. The SRS excels on datasets with high shifting values (NASDAQ) and low seasonality values (AQWan), because these datasets contain more special samples involving changeable periods, anomalies, and shifting. The SRS improves all patch-based models significantly, by 17%–28%, on these datasets.
  2. When the datasets are closer to normal cases, with higher seasonality values and lower shifting values, the improvement brought by the SRS gradually decreases. On NYSE and ZafNoo, performance improves by 12%–18%. Note that even on AQShunyi and PEMS08 the improvement persists, at about 4%–10%. This provides evidence that the SRS can handle both normal and special cases well.

We then report the converged mean weight values $(1-\alpha) \in [0, 1]$ on these datasets (higher values indicate more dependence on representations from Selective Patching; lower values indicate more dependence on representations from Adjacent Patching).

| Models | NASDAQ | NYSE | AQShunyi | PEMS08 | ZafNoo | AQWan |
| --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.79 | 0.58 | 0.42 | 0.33 | 0.56 | 0.69 |
| Crossformer | 0.76 | 0.55 | 0.34 | 0.32 | 0.58 | 0.73 |
| PatchMLP | 0.72 | 0.57 | 0.38 | 0.29 | 0.61 | 0.74 |
| xPatch | 0.81 | 0.63 | 0.45 | 0.28 | 0.55 | 0.72 |
| MLP | 0.79 | 0.62 | 0.48 | 0.36 | 0.57 | 0.67 |

We also have two main observations:

  1. When the seasonality values are higher and the shifting values are lower, models depend less on representations from Selective Patching. This implies that Adjacent Patching already handles these common cases well, or that Selective Patching gradually converges toward Adjacent Patching.
  2. When the seasonality values are lower and the shifting values are higher, models depend more on representations from Selective Patching. This is because Adjacent Patching may not process these special cases well, while our proposed Selective Patching can adaptively select the patches carrying more useful information for forecasting. A sketch of the fusion gate behind these weights follows below.
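For illustration, a minimal sketch of an adaptive fusion layer in the spirit of Eq. 13 (the sigmoid-gated scalar parameterization here is our assumption; the paper's exact form may differ):

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Blend representations from Adjacent Patching and Selective Patching."""

    def __init__(self):
        super().__init__()
        self.logit = nn.Parameter(torch.zeros(1))  # learnable gate; alpha = sigmoid(logit)

    def forward(self, h_adjacent, h_selective):
        alpha = torch.sigmoid(self.logit)          # alpha in (0, 1)
        # (1 - alpha) is the Selective Patching weight reported in the table above.
        return alpha * h_adjacent + (1.0 - alpha) * h_selective
```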

Thanks again for your sincere suggestions! We will release the full results when the discussion phase starts. If you have any additional questions, we are happy to discuss further!

Comment

Dear Reviewer 7m7L, since the discussion phase has started, we have enough space to list the full results. We first report the full weights $1-\alpha$ over all 4 forecasting horizons for each dataset.

| Models | Horizon | NASDAQ | NYSE | AQShunyi | PEMS08 | ZafNoo | AQWan |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 96/24 | 0.79 | 0.57 | 0.42 | 0.31 | 0.56 | 0.68 |
| | 192/36 | 0.77 | 0.58 | 0.42 | 0.36 | 0.58 | 0.68 |
| | 336/48 | 0.81 | 0.60 | 0.42 | 0.29 | 0.55 | 0.71 |
| | 720/60 | 0.78 | 0.57 | 0.43 | 0.32 | 0.54 | 0.70 |
| Crossformer | 96/24 | 0.75 | 0.55 | 0.32 | 0.33 | 0.59 | 0.75 |
| | 192/36 | 0.76 | 0.56 | 0.36 | 0.33 | 0.60 | 0.73 |
| | 336/48 | 0.74 | 0.56 | 0.35 | 0.30 | 0.57 | 0.72 |
| | 720/60 | 0.78 | 0.54 | 0.34 | 0.32 | 0.57 | 0.73 |
| PatchMLP | 96/24 | 0.73 | 0.57 | 0.38 | 0.29 | 0.58 | 0.73 |
| | 192/36 | 0.72 | 0.55 | 0.38 | 0.31 | 0.63 | 0.75 |
| | 336/48 | 0.72 | 0.58 | 0.38 | 0.29 | 0.60 | 0.77 |
| | 720/60 | 0.72 | 0.57 | 0.37 | 0.28 | 0.59 | 0.72 |
| xPatch | 96/24 | 0.80 | 0.62 | 0.43 | 0.28 | 0.54 | 0.70 |
| | 192/36 | 0.80 | 0.61 | 0.46 | 0.27 | 0.54 | 0.77 |
| | 336/48 | 0.81 | 0.65 | 0.44 | 0.27 | 0.56 | 0.69 |
| | 720/60 | 0.81 | 0.64 | 0.42 | 0.31 | 0.53 | 0.71 |
| MLP | 96/24 | 0.79 | 0.64 | 0.50 | 0.36 | 0.58 | 0.67 |
| | 192/36 | 0.80 | 0.63 | 0.49 | 0.38 | 0.58 | 0.66 |
| | 336/48 | 0.78 | 0.64 | 0.47 | 0.37 | 0.56 | 0.69 |
| | 720/60 | 0.81 | 0.55 | 0.48 | 0.35 | 0.57 | 0.68 |

(Horizons are 24/36/48/60 for NASDAQ and NYSE, and 96/192/336/720 for the remaining datasets.)
Comment

Dear Reviewer 7m7L, we report the full results of the plugin experiments here, including four forecasting horizons for each dataset. We hope this can be treated as additional evidence. In this comment, we list the results of NASDAQ and NYSE.

NASDAQ (horizons 24/36/48/60):

| Models | MSE (24) | MAE (24) | MSE (36) | MAE (36) | MSE (48) | MAE (48) | MSE (60) | MAE (60) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.649 | 0.567 | 0.821 | 0.682 | 1.169 | 0.793 | 1.247 | 0.843 |
| SRS | 0.497 | 0.444 | 0.666 | 0.571 | 0.882 | 0.580 | 0.984 | 0.680 |
| Improved | 23.40% | 21.73% | 18.93% | 16.22% | 24.53% | 26.80% | 21.10% | 19.37% |
| Crossformer | 1.149 | 0.745 | 1.414 | 0.885 | 2.108 | 1.136 | 2.276 | 1.201 |
| SRS | 0.818 | 0.545 | 1.087 | 0.670 | 1.565 | 0.825 | 1.557 | 0.795 |
| Improved | 28.85% | 26.84% | 23.15% | 24.33% | 25.78% | 27.41% | 31.61% | 33.82% |
| PatchMLP | 0.794 | 0.648 | 1.185 | 0.768 | 1.774 | 0.945 | 1.653 | 0.878 |
| SRS | 0.584 | 0.454 | 0.973 | 0.620 | 1.383 | 0.713 | 1.210 | 0.636 |
| Improved | 26.48% | 29.94% | 17.93% | 19.24% | 22.03% | 24.51% | 26.82% | 27.52% |
| xPatch | 0.587 | 0.536 | 0.885 | 0.666 | 1.262 | 0.786 | 1.506 | 0.845 |
| SRS | 0.486 | 0.449 | 0.704 | 0.514 | 0.973 | 0.588 | 1.236 | 0.702 |
| Improved | 17.22% | 16.31% | 20.45% | 22.81% | 22.91% | 25.25% | 17.95% | 16.92% |
| MLP | 0.882 | 0.727 | 1.156 | 0.796 | 1.243 | 0.819 | 1.519 | 0.870 |
| SRS | 0.645 | 0.512 | 0.915 | 0.622 | 0.883 | 0.563 | 1.256 | 0.703 |
| Improved | 26.91% | 29.64% | 20.85% | 21.86% | 28.95% | 31.25% | 17.29% | 19.25% |

NYSE (horizons 24/36/48/60):

| Models | MSE (24) | MAE (24) | MSE (36) | MAE (36) | MSE (48) | MAE (48) | MSE (60) | MAE (60) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.226 | 0.296 | 0.380 | 0.389 | 0.575 | 0.492 | 0.749 | 0.572 |
| SRS | 0.202 | 0.259 | 0.316 | 0.328 | 0.479 | 0.424 | 0.676 | 0.504 |
| Improved | 10.82% | 12.55% | 16.79% | 15.70% | 16.62% | 13.91% | 9.75% | 11.84% |
| Crossformer | 0.820 | 0.841 | 0.942 | 0.904 | 1.049 | 0.955 | 1.121 | 0.937 |
| SRS | 0.672 | 0.679 | 0.805 | 0.778 | 0.935 | 0.836 | 1.004 | 0.846 |
| Improved | 18.03% | 19.25% | 14.55% | 13.92% | 10.88% | 12.41% | 10.45% | 9.76% |
| PatchMLP | 0.242 | 0.317 | 0.359 | 0.381 | 0.543 | 0.485 | 0.725 | 0.578 |
| SRS | 0.206 | 0.274 | 0.299 | 0.323 | 0.483 | 0.420 | 0.635 | 0.503 |
| Improved | 14.85% | 13.56% | 16.72% | 15.25% | 11.05% | 13.44% | 12.45% | 13.04% |
| xPatch | 0.205 | 0.292 | 0.364 | 0.386 | 0.529 | 0.465 | 1.082 | 0.752 |
| SRS | 0.170 | 0.236 | 0.302 | 0.315 | 0.477 | 0.429 | 0.927 | 0.653 |
| Improved | 16.92% | 19.26% | 17.12% | 18.27% | 9.92% | 7.83% | 14.28% | 13.17% |
| MLP | 0.208 | 0.288 | 0.353 | 0.372 | 0.517 | 0.474 | 0.775 | 0.593 |
| SRS | 0.175 | 0.235 | 0.282 | 0.298 | 0.443 | 0.416 | 0.680 | 0.515 |
| Improved | 15.68% | 18.36% | 20.05% | 19.88% | 14.25% | 12.28% | 12.25% | 13.22% |
Comment

In this comment, we list the results of AQShunyi and PEMS08.

AQShunyi (horizons 96/192/336/720):

| Models | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.646 | 0.478 | 0.688 | 0.498 | 0.710 | 0.513 | 0.768 | 0.539 |
| SRS | 0.602 | 0.440 | 0.620 | 0.450 | 0.632 | 0.462 | 0.710 | 0.491 |
| Improved | 6.87% | 8.03% | 9.85% | 9.72% | 11.03% | 9.88% | 7.57% | 8.92% |
| Crossformer | 0.652 | 0.484 | 0.674 | 0.499 | 0.704 | 0.515 | 0.747 | 0.518 |
| SRS | 0.600 | 0.435 | 0.615 | 0.452 | 0.612 | 0.437 | 0.712 | 0.498 |
| Improved | 8.03% | 10.22% | 8.82% | 9.48% | 13.02% | 15.23% | 4.67% | 3.89% |
| PatchMLP | 0.668 | 0.492 | 0.711 | 0.511 | 0.732 | 0.524 | 0.776 | 0.537 |
| SRS | 0.602 | 0.441 | 0.659 | 0.466 | 0.679 | 0.488 | 0.724 | 0.505 |
| Improved | 9.82% | 10.47% | 7.35% | 8.84% | 7.22% | 6.94% | 6.72% | 5.92% |
| xPatch | 0.674 | 0.486 | 0.715 | 0.506 | 0.741 | 0.520 | 0.808 | 0.523 |
| SRS | 0.642 | 0.451 | 0.631 | 0.440 | 0.680 | 0.479 | 0.760 | 0.487 |
| Improved | 4.82% | 7.29% | 11.71% | 12.95% | 8.26% | 7.91% | 5.91% | 6.86% |
| MLP | 0.674 | 0.487 | 0.704 | 0.504 | 0.723 | 0.518 | 0.768 | 0.542 |
| SRS | 0.629 | 0.444 | 0.669 | 0.467 | 0.642 | 0.464 | 0.708 | 0.494 |
| Improved | 6.72% | 8.92% | 4.95% | 7.28% | 11.27% | 10.52% | 7.79% | 8.85% |

PEMS08 (horizons 96/192/336/720):

| Models | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.248 | 0.287 | 0.319 | 0.310 | 0.361 | 0.324 | 0.399 | 0.368 |
| SRS | 0.234 | 0.274 | 0.299 | 0.288 | 0.343 | 0.305 | 0.371 | 0.332 |
| Improved | 5.82% | 4.68% | 6.29% | 7.22% | 4.86% | 5.82% | 7.12% | 9.82% |
| Crossformer | 0.230 | 0.260 | 0.239 | 0.264 | 0.272 | 0.289 | 0.320 | 0.316 |
| SRS | 0.223 | 0.252 | 0.235 | 0.257 | 0.250 | 0.259 | 0.309 | 0.300 |
| Improved | 3.26% | 2.97% | 1.87% | 2.74% | 8.22% | 10.28% | 3.55% | 5.21% |
| PatchMLP | 0.203 | 0.242 | 0.314 | 0.273 | 0.334 | 0.288 | 0.373 | 0.329 |
| SRS | 0.197 | 0.232 | 0.308 | 0.267 | 0.311 | 0.272 | 0.358 | 0.307 |
| Improved | 2.96% | 4.22% | 1.85% | 2.04% | 6.92% | 5.42% | 3.98% | 6.82% |
| xPatch | 0.171 | 0.214 | 0.260 | 0.233 | 0.305 | 0.246 | 0.340 | 0.274 |
| SRS | 0.169 | 0.210 | 0.252 | 0.226 | 0.287 | 0.231 | 0.325 | 0.260 |
| Improved | 1.02% | 2.05% | 3.20% | 2.88% | 5.92% | 6.22% | 4.28% | 5.29% |
| MLP | 0.184 | 0.241 | 0.306 | 0.268 | 0.341 | 0.281 | 0.380 | 0.320 |
| SRS | 0.171 | 0.226 | 0.291 | 0.254 | 0.313 | 0.258 | 0.357 | 0.298 |
| Improved | 7.22% | 6.03% | 4.99% | 5.21% | 8.11% | 8.35% | 6.02% | 6.87% |
Comment

In this comment, we list the results of ZafNoo and AQWan.

ZafNoo (horizons 96/192/336/720):

| Models | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.429 | 0.405 | 0.494 | 0.449 | 0.538 | 0.475 | 0.573 | 0.486 |
| SRS | 0.372 | 0.337 | 0.421 | 0.382 | 0.474 | 0.413 | 0.497 | 0.410 |
| Improved | 13.22% | 16.67% | 14.82% | 14.99% | 11.82% | 12.96% | 13.28% | 15.68% |
| Crossformer | 0.430 | 0.418 | 0.479 | 0.449 | 0.505 | 0.464 | 0.560 | 0.494 |
| SRS | 0.382 | 0.376 | 0.398 | 0.360 | 0.451 | 0.398 | 0.485 | 0.414 |
| Improved | 11.06% | 9.95% | 16.92% | 19.72% | 10.67% | 14.29% | 13.42% | 16.25% |
| PatchMLP | 0.443 | 0.416 | 0.515 | 0.459 | 0.563 | 0.484 | 0.603 | 0.505 |
| SRS | 0.393 | 0.351 | 0.405 | 0.382 | 0.491 | 0.410 | 0.537 | 0.438 |
| Improved | 11.28% | 15.58% | 21.28% | 16.79% | 12.84% | 15.25% | 10.96% | 13.28% |
| xPatch | 0.512 | 0.436 | 0.582 | 0.474 | 0.603 | 0.487 | 0.675 | 0.517 |
| SRS | 0.456 | 0.381 | 0.482 | 0.383 | 0.544 | 0.432 | 0.602 | 0.474 |
| Improved | 10.88% | 12.68% | 17.25% | 19.25% | 9.79% | 11.26% | 10.85% | 8.29% |
| MLP | 0.434 | 0.403 | 0.498 | 0.441 | 0.554 | 0.471 | 0.604 | 0.503 |
| SRS | 0.368 | 0.335 | 0.399 | 0.342 | 0.464 | 0.380 | 0.524 | 0.426 |
| Improved | 15.22% | 16.83% | 19.96% | 22.46% | 16.26% | 19.28% | 13.26% | 15.27% |

AQWan (horizons 96/192/336/720):

| Models | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.745 | 0.468 | 0.793 | 0.490 | 0.819 | 0.502 | 0.890 | 0.533 |
| SRS | 0.611 | 0.379 | 0.633 | 0.380 | 0.714 | 0.430 | 0.685 | 0.388 |
| Improved | 17.95% | 18.92% | 20.17% | 22.35% | 12.83% | 14.25% | 23.02% | 27.26% |
| Crossformer | 0.750 | 0.465 | 0.762 | 0.479 | 0.802 | 0.504 | 0.829 | 0.512 |
| SRS | 0.533 | 0.347 | 0.530 | 0.329 | 0.648 | 0.405 | 0.656 | 0.395 |
| Improved | 28.88% | 25.27% | 30.44% | 31.27% | 19.25% | 19.58% | 20.87% | 22.85% |
| PatchMLP | 0.771 | 0.484 | 0.815 | 0.499 | 0.835 | 0.510 | 0.896 | 0.526 |
| SRS | 0.595 | 0.365 | 0.587 | 0.365 | 0.658 | 0.414 | 0.707 | 0.407 |
| Improved | 22.86% | 24.54% | 27.92% | 26.87% | 21.20% | 18.75% | 21.04% | 22.54% |
| xPatch | 0.775 | 0.474 | 0.824 | 0.494 | 0.857 | 0.510 | 0.938 | 0.546 |
| SRS | 0.657 | 0.389 | 0.649 | 0.384 | 0.723 | 0.417 | 0.776 | 0.436 |
| Improved | 15.28% | 17.92% | 21.26% | 22.26% | 15.67% | 18.26% | 17.28% | 20.17% |
| MLP | 0.767 | 0.476 | 0.806 | 0.495 | 0.831 | 0.511 | 0.901 | 0.538 |
| SRS | 0.527 | 0.318 | 0.581 | 0.352 | 0.674 | 0.399 | 0.664 | 0.378 |
| Improved | 31.29% | 33.28% | 27.92% | 28.88% | 18.92% | 21.82% | 26.29% | 29.78% |
Comment

Thank you for preparing the rebuttal and conducting additional experiments during this period.

After reading the other reviews and associated rebuttals, I maintain my positive view of the paper. As per my original suggestion, I would suggest including a dedicated section in the updated version of the paper where these deeper insights into how the method performs under different scenarios are more clearly illustrated.

Comment

Dear Reviewer 7m7L, thank you very much for your positive and encouraging feedback! Following your advice, we have carefully integrated all the clarifications and supporting results into the revised manuscript. Thank you once again for your valuable time and guidance, which have been instrumental in improving our paper.

Comment

Dear Reviewer 7m7L,

Please go through the rebuttal and participate in the discussions with the authors. Thank you!

Best regards, AC

Review
Rating: 5

This paper proposes a plug-and-play module named SRS for time series forecasting. SRS considers the selection and sequence of patches, which can construct representation spaces with richer semantics, thus enhancing the performance of time series forecasting. The authors provide comprehensive experiments to demonstrate the effectiveness of SRSNet, and also demonstrate the capability of SRS to work as a plugin on patch-based models.

Strengths and Weaknesses

Strengths:

S1. The SRS module provides a novel perspective on the representation learning process in time series forecasting. By introducing the Selective Patching and Dynamic Reassembly techniques, SRS can construct selective patches and consider their appropriate ordering.

S2. The SRS module devises a gradient-based mechanism to ensure efficient patch selection and ordering decisions, with lightweight MLP structures.

S3. Comprehensive experiments are conducted to demonstrate the effectiveness of the SRSNet and the SRS module.

Weaknesses:

W1. Why do the authors retain the representations generated by Conventional Adjacent Patching instead of only using the representations constructed by the SRS module? Please provide reasonable explanations.

W2. Explain in detail the specific meaning of the matrix below Selective Patching in Figure 3. What is the main difference from the mechanism in Dynamic Reassembly?

W3. Please ensure that all citations are of the officially accepted versions.

Questions

Please see W1-W3.

Limitations

Yes

Final Justification

The article is particularly good, and the rebuttal perfectly resolved my concerns. I suggest accepting this article.

Formatting Concerns

N/A

Author Response

Reply to W1:

Thanks for the question. There are two main reasons we retain the representations generated by the Conventional Adjacent Patching:

(1) Since the SRS is a trainable plugin, the scorers may exhibit some randomness in the early stage of training and may not yet accurately find suitable patches. Retaining the representations generated by Conventional Adjacent Patching and performing adaptive fusion therefore effectively enhances training stability. As training continues, the scorer's ability gradually improves, enabling it to select better patches and to gain weight during fusion, thereby improving the end-to-end prediction performance.

(2) The SRS is proposed for special cases such as changeable periods, anomalies, and shifting, where Selective Patching and Dynamic Reassembly help retrieve useful information from the contextual time series. In common scenarios where the sample is relatively normal, Conventional Adjacent Patching also performs well, so we retain the representations it generates. By adaptively fusing the two kinds of representations, the SRS can handle both the common and the special cases.

Reply to W2:

Thanks for the question. We first explain the details of the matrix in Selective Patching. The shape of the matrix is $\mathbb{R}^{K \times n}$, where $K$ denotes the number of candidate patches (generated with stride = 1) and $n$ denotes the number of patches we need to pick from the $K$ candidates. Since Selective Patching supports selection with replacement, meaning one patch can be selected multiple times, we let $\text{Scorer}^s$ produce $n$ scores for each patch, forming the matrix of shape $\mathbb{R}^{K \times n}$. To select the $n$ appropriate patches, we only need to check the scores of all $K$ patches in each of the $n$ columns and pick the highest one.

We then shed light on the main difference from Dynamic Reassembly. Unlike Selective Patching, the essence of Dynamic Reassembly is to sort the selected patches and then reassemble them in the determined order, so its scores form a list rather than a matrix.
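A small sketch may help contrast the two mechanisms (shapes and names are illustrative assumptions, not the paper's code):

```python
import torch

K, n, P = 81, 6, 16                   # candidates, selections, patch length (illustrative)
candidates = torch.randn(K, P)        # stride-1 candidate patches

# Selective Patching: a (K, n) score matrix; each column picks one patch,
# with replacement, so the same patch may win several columns.
select_scores = torch.rand(K, n)
picked = candidates[select_scores.argmax(dim=0)]            # (n, P)

# Dynamic Reassembly: a single score list over the n picked patches;
# sorting it fixes the order in which they are reassembled.
order_scores = torch.rand(n)
reassembled = picked[torch.argsort(order_scores, descending=True)]
```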

Reply to W3:

Thanks for the reminder; we will fix these citations.

Thanks again for your sincere suggestions! If you have any additional questions, we are happy to discuss further!

Comment

Thank you for the detailed response. Your clarifications on the motivations behind retaining the Conventional Adjacent Patching representations and the distinctions between Selective Patching and Dynamic Reassembly fully address my concerns. I will maintain my positive score.

Comment

Dear Reviewer Hryh, we are very excited that you maintain your positive assessment! Thank you again for your recognition of our work!

Review
Rating: 4

This paper proposes the Selective Representation Space (SRS) module, a plug-and-play component for enhancing patch-based time series forecasting models. Unlike conventional adjacent patching that divides time series into fixed adjacent patches, SRS introduces adaptive patch selection through two key techniques: Selective Patching (which adaptively selects the most informative patches) and Dynamic Reassembly (which determines the optimal ordering of selected patches). The authors demonstrate the effectiveness of their approach through SRSNet, a simple model combining SRS with an MLP head, which achieves state-of-the-art performance on multiple benchmarks.

优缺点分析

Strengths: (1) The paper identifies a genuine limitation in existing patch-based methods - the fixed nature of adjacent patching - and proposes an innovative solution through adaptive patch selection. (2) The gradient-friendly implementation of Selective Patching and Dynamic Reassembly through clever use of score-based selection and gradient propagation techniques is well-designed. (3) The paper provides extensive experimental validation across 8 datasets from multiple domains, with thorough ablation studies demonstrating the effectiveness of each component.

Weaknesses: (1) While the motivation for selective patching is intuitive, the paper lacks theoretical analysis of why and when selective patching should outperform adjacent patching. The claim that "all information useful for forecasting is evenly distributed" is not necessarily true for adjacent patching. (2) Although consistent, the performance improvements are relatively modest (typically 2-10%) compared to existing methods. For a NeurIPS paper, more substantial gains would be expected. (3) The additional complexity introduced by the scoring networks and dynamic reassembly may not be justified by the marginal performance gains in many cases.

Questions

(1) How sensitive is the method to the initialization of the scoring networks? Have you experimented with different initialization strategies? (2) Can you provide more analysis on the computational overhead introduced by the patch selection process during training and inference? (3) How does the method perform on datasets with strong seasonal patterns where adjacent patching might naturally align with the periodicity? (4) Could you provide more detailed analysis of the types of patches that are typically selected across different datasets and forecasting horizons?

Limitations

Yes.

Final Justification

Based on the positive feedback from other reviewers and the author's meaningful responses, I have decided to raise my score.

Formatting Concerns

NO.

Author Response

Reply to W1

Thanks for your in-depth insights and suggestions. We admit that some expressions are not quite appropriate, and we will modify them later.

We first provide some intuition. Assume the contextual time series is a continuous interval $[1, T]$ covering $T$ points. In normal cases (like strong seasonality), the useful informative interval is the whole $[1, T]$, and Adjacent Patching can handle these cases well. In our work, we mainly consider cases where not all of the contextual time series is useful for forecasting, such as anomalies, changeable periods, and shifting. In these cases, the contextual time series is interrupted as $[1, T_1] \cup [T_1, T_2] \cup \cdots \cup [T_{k-1}, T]$, containing $k$ informative sub-intervals. If the $k$ sub-intervals cover x% of the contextual time series, only x% of the representations created by Adjacent Patching are useful for forecasting. In contrast, our proposed Selective Patching can choose the useful patches within the $k$ sub-intervals and supports re-selection to amplify useful patches. Dynamic Reassembly then introduces positional information for better representation learning. We also fuse the representations from both Selective Patching and Adjacent Patching to ensure the ratio of useful information is greater than x%, thus handling both the normal and the special cases.

To provide more empirical evidence about when the SRS outperforms Adjacent Patching, we conduct additional experiments on datasets from TFB, a well-recognized time series forecasting benchmark. Since datasets with strong seasonality are close to normal cases while those with strong shifting are special ones, we choose two groups of datasets with descending seasonality values and descending shifting values, respectively. The higher the value, the stronger the seasonality/shifting of the time series data. We list the statistical information:

| Datasets | NASDAQ | NYSE | AQShunyi |
| --- | --- | --- | --- |
| Shifting Values | 0.932 (Strong) | 0.620 (Modest) | 0.019 (Weak) |

| Datasets | PEMS08 | ZafNoo | AQWan |
| --- | --- | --- | --- |
| Seasonality Values | 0.850 (Strong) | 0.757 (Modest) | 0.119 (Weak) |
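As a rough intuition for what such statistics capture, here is a toy autocorrelation-based seasonality proxy (TFB computes its shifting/seasonality values differently; this sketch is only an illustration of scoring a dataset property):

```python
import numpy as np

def lag_autocorrelation(x, period):
    """Toy seasonality proxy: autocorrelation of the series at lag `period`."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = (x * x).sum()
    return 0.0 if denom == 0 else float((x[:-period] * x[period:]).sum() / denom)

t = np.arange(2000)
print(lag_autocorrelation(np.sin(2 * np.pi * t / 24), period=24))  # close to 1.0
```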

We conduct plugin experiments on these datasets and report the mean results:

| Models | NASDAQ (MSE) | NASDAQ (MAE) | NYSE (MSE) | NYSE (MAE) | AQShunyi (MSE) | AQShunyi (MAE) |
| --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.972 | 0.721 | 0.483 | 0.437 | 0.703 | 0.507 |
| + SRS | 0.757 | 0.569 | 0.418 | 0.379 | 0.641 | 0.461 |
| Improved | 21.99% | 21.03% | 13.50% | 13.50% | 8.83% | 9.14% |
| Crossformer | 1.737 | 0.992 | 0.983 | 0.909 | 0.694 | 0.504 |
| + SRS | 1.256 | 0.709 | 0.854 | 0.785 | 0.635 | 0.455 |
| Improved | 27.35% | 28.10% | 13.48% | 13.84% | 8.64% | 9.71% |
| PatchMLP | 1.352 | 0.810 | 0.467 | 0.440 | 0.722 | 0.516 |
| + SRS | 1.037 | 0.606 | 0.406 | 0.380 | 0.666 | 0.475 |
| Improved | 23.31% | 25.30% | 13.77% | 13.82% | 7.78% | 8.04% |
| xPatch | 1.060 | 0.708 | 0.545 | 0.474 | 0.735 | 0.509 |
| + SRS | 0.850 | 0.563 | 0.469 | 0.408 | 0.678 | 0.464 |
| Improved | 19.63% | 20.32% | 14.56% | 14.63% | 7.68% | 8.75% |
| MLP | 1.200 | 0.803 | 0.463 | 0.432 | 0.717 | 0.513 |
| + SRS | 0.925 | 0.600 | 0.395 | 0.366 | 0.662 | 0.467 |
| Improved | 23.50% | 25.50% | 15.56% | 15.94% | 7.68% | 8.89% |

| Models | PEMS08 (MSE) | PEMS08 (MAE) | ZafNoo (MSE) | ZafNoo (MAE) | AQWan (MSE) | AQWan (MAE) |
| --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.332 | 0.322 | 0.509 | 0.454 | 0.812 | 0.498 |
| + SRS | 0.312 | 0.300 | 0.441 | 0.386 | 0.661 | 0.395 |
| Improved | 6.02% | 6.89% | 13.29% | 15.08% | 18.49% | 20.70% |
| Crossformer | 0.265 | 0.282 | 0.494 | 0.456 | 0.786 | 0.490 |
| + SRS | 0.254 | 0.267 | 0.429 | 0.387 | 0.592 | 0.369 |
| Improved | 4.23% | 5.30% | 13.02% | 15.05% | 24.86% | 24.74% |
| PatchMLP | 0.306 | 0.283 | 0.531 | 0.466 | 0.829 | 0.505 |
| + SRS | 0.294 | 0.270 | 0.457 | 0.395 | 0.637 | 0.388 |
| Improved | 3.93% | 4.63% | 14.09% | 15.23% | 23.26% | 23.18% |
| xPatch | 0.269 | 0.242 | 0.593 | 0.479 | 0.849 | 0.506 |
| + SRS | 0.258 | 0.232 | 0.521 | 0.417 | 0.701 | 0.406 |
| Improved | 3.61% | 4.11% | 12.19% | 12.87% | 17.37% | 19.65% |
| MLP | 0.303 | 0.278 | 0.523 | 0.455 | 0.826 | 0.505 |
| + SRS | 0.283 | 0.259 | 0.439 | 0.371 | 0.611 | 0.362 |
| Improved | 6.59% | 6.62% | 16.18% | 18.46% | 26.11% | 28.44% |

We have two main observations:

  1. The SRS excels on datasets with high shifting values (NASDAQ) and low seasonality values (AQWan), because these datasets contain more special samples involving changeable periods, anomalies, and shifting. The SRS improves all models significantly, by 17%–28%, on these datasets.
  2. When the datasets are closer to normal cases, with higher seasonality values and lower shifting values, the improvement brought by the SRS gradually decreases. On NYSE and ZafNoo, performance improves by 12%–18%. Note that even on AQShunyi and PEMS08 the improvement persists, at about 4%–10%. This demonstrates that the SRS excels at both normal and special cases.

Reply to W2

Since the SRS is proposed for special cases like changeable periods, anomalies, and shifting, we believe the proportion of special samples in a dataset affects the performance gain. In the Reply to W1, we run additional experiments on datasets with more special samples (NASDAQ, AQWan, NYSE, ZafNoo) and find that the improvement from the SRS reaches 12%–28%, which is quite non-trivial. We hope this can be considered additional evidence.

Reply to Q3

As you mentioned, on datasets with strong seasonality, Adjacent Patching can handle the information in the contextual time series well and form useful representations. In the reply to W1, our experiments reveal that the improvement from the SRS decreases as seasonality increases, yet it persists at about 4%–10%. This is because Dynamic Reassembly still reassembles the patches for better positional information, and Adaptive Fusion adaptively integrates the representations, which secures an improvement even in normal cases.

Reply to W3 & Q2

We analyze the overhead introduced by the SRS on the above 6 datasets. Due to space limitations, we report the results on PEMS08 (largest) and NYSE (smallest) here, and will release the full results when the discussion phase starts. Specifically, we report the Number of Parameters, Max GPU Memory, MACs, Training Time, and Inference Time:

| Datasets | Variants | Memory (MB) | Params (M) | Inference Time (s/batch) | Training Time (s/batch) | MACs (G) |
| --- | --- | --- | --- | --- | --- | --- |
| NYSE | PatchTST | 900 | 0.716 | 6.392 | 7.941 | 2.490 |
| | SRS | 928 | 0.720 | 7.246 | 8.784 | 2.524 |
| | Overhead | 3.11% | 0.64% | 13.36% | 10.63% | 1.36% |
| | Crossformer | 654 | 0.129 | 19.862 | 21.569 | 0.251 |
| | SRS | 678 | 0.130 | 20.568 | 22.151 | 0.258 |
| | Overhead | 3.67% | 0.57% | 3.55% | 2.70% | 2.67% |
| PEMS08 | PatchTST | 2216 | 12.672 | 7.673 | 7.802 | 25.653 |
| | SRS | 2522 | 12.682 | 8.766 | 8.934 | 26.584 |
| | Overhead | 13.81% | 0.08% | 14.25% | 14.52% | 3.63% |
| | Crossformer | 10414 | 11.085 | 45.100 | 45.320 | 161.376 |
| | SRS | 10698 | 11.093 | 50.372 | 47.478 | 162.014 |
| | Overhead | 2.73% | 0.07% | 11.69% | 4.76% | 0.40% |

It is observed that the overhead introduced by the SRS is relatively trivial: Memory, MACs, and Params increase little, and the Training and Inference Time remain controllable.

Reply to Q1

We further study the initialization strategies, including Constant, Xavier, Kaiming, and Randn. We also report the standard deviation over 5 runs:

| Initialization | ETTh1 (MSE) | ETTh1 (MAE) | ETTm2 (MSE) | ETTm2 (MAE) | Solar (MSE) | Solar (MAE) | Traffic (MSE) | Traffic (MAE) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Constant | 0.413±0.002 | 0.435±0.002 | 0.259±0.001 | 0.324±0.001 | 0.187±0.002 | 0.243±0.001 | 0.392±0.001 | 0.271±0.001 |
| Xavier | 0.405±0.001 | 0.425±0.001 | 0.252±0.001 | 0.315±0.001 | 0.184±0.002 | 0.241±0.002 | 0.394±0.001 | 0.272±0.001 |
| Kaiming | 0.405±0.001 | 0.424±0.001 | 0.254±0.002 | 0.316±0.002 | 0.183±0.002 | 0.238±0.002 | 0.393±0.001 | 0.271±0.001 |
| Randn | 0.404±0.001 | 0.424±0.001 | 0.252±0.001 | 0.314±0.001 | 0.183±0.002 | 0.239±0.002 | 0.392±0.001 | 0.270±0.001 |

The results show that random-based initialization strategies such as Xavier, Kaiming, and Randn (ours) are relatively stable, while the Constant initialization strategy causes unstable training in some cases.
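For reference, a minimal sketch of how these strategies can be applied to a scorer's linear layers in PyTorch (the constant value and the Randn standard deviation are our assumptions, not the paper's settings):

```python
import torch.nn as nn

def init_scorer(module, strategy="randn"):
    """Apply one of the compared initialization strategies to a linear layer."""
    if not isinstance(module, nn.Linear):
        return
    if strategy == "constant":
        nn.init.constant_(module.weight, 0.01)              # assumed constant value
    elif strategy == "xavier":
        nn.init.xavier_uniform_(module.weight)
    elif strategy == "kaiming":
        nn.init.kaiming_uniform_(module.weight)
    elif strategy == "randn":
        nn.init.normal_(module.weight, mean=0.0, std=0.02)  # assumed std
    if module.bias is not None:
        nn.init.zeros_(module.bias)

# Usage: scorer.apply(lambda m: init_scorer(m, "xavier"))
```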

Reply to Q4

We have provided some showcases in Appendix A.2 of the paper. For clarity, we label the sub-figures with (a)-(f) in the order from left to right and from top to bottom.

Figure 5 (d), (e), (f), and Figure 7 (c) show the selected patches under changeable periods, which accurately picks up the correct periods most related to the forecasting horizons.

Figure 6 (a), (b), and Figure 7 (a), (e), (f) show the selected patches when anomalies occur, which avoid including the abnormal patches.

Figure 5 (b), (c), and Figure 6 (c), (e), (f) show the selected patches when shifting occurs, which selects the in-distribution patches for forecasting.

We also showcase some normal cases such as Figure 5 (a), Figure 6 (d), and Figure 7 (d), where the selected patches are similar to those produced by adjacent patching, demonstrating the flexibility of the Selective Patching.

Thanks for your valuable suggestions again! We will release the full results when the discussion phase starts. If you have any additional question, we can make further discussion!

Comment

Dear Reviewer xNzv, since the discussion phase has started, we have enough space to show our full experimental results. We first report the full results of the plugin experiments here, including four forecasting horizons for each dataset. We hope this can be treated as additional evidence. In this comment, we list the results of NASDAQ and NYSE.

NASDAQ (horizons 24/36/48/60):

| Models | MSE (24) | MAE (24) | MSE (36) | MAE (36) | MSE (48) | MAE (48) | MSE (60) | MAE (60) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.649 | 0.567 | 0.821 | 0.682 | 1.169 | 0.793 | 1.247 | 0.843 |
| SRS | 0.497 | 0.444 | 0.666 | 0.571 | 0.882 | 0.580 | 0.984 | 0.680 |
| Improved | 23.40% | 21.73% | 18.93% | 16.22% | 24.53% | 26.80% | 21.10% | 19.37% |
| Crossformer | 1.149 | 0.745 | 1.414 | 0.885 | 2.108 | 1.136 | 2.276 | 1.201 |
| SRS | 0.818 | 0.545 | 1.087 | 0.670 | 1.565 | 0.825 | 1.557 | 0.795 |
| Improved | 28.85% | 26.84% | 23.15% | 24.33% | 25.78% | 27.41% | 31.61% | 33.82% |
| PatchMLP | 0.794 | 0.648 | 1.185 | 0.768 | 1.774 | 0.945 | 1.653 | 0.878 |
| SRS | 0.584 | 0.454 | 0.973 | 0.620 | 1.383 | 0.713 | 1.210 | 0.636 |
| Improved | 26.48% | 29.94% | 17.93% | 19.24% | 22.03% | 24.51% | 26.82% | 27.52% |
| xPatch | 0.587 | 0.536 | 0.885 | 0.666 | 1.262 | 0.786 | 1.506 | 0.845 |
| SRS | 0.486 | 0.449 | 0.704 | 0.514 | 0.973 | 0.588 | 1.236 | 0.702 |
| Improved | 17.22% | 16.31% | 20.45% | 22.81% | 22.91% | 25.25% | 17.95% | 16.92% |
| MLP | 0.882 | 0.727 | 1.156 | 0.796 | 1.243 | 0.819 | 1.519 | 0.870 |
| SRS | 0.645 | 0.512 | 0.915 | 0.622 | 0.883 | 0.563 | 1.256 | 0.703 |
| Improved | 26.91% | 29.64% | 20.85% | 21.86% | 28.95% | 31.25% | 17.29% | 19.25% |

NYSE (horizons 24/36/48/60):

| Models | MSE (24) | MAE (24) | MSE (36) | MAE (36) | MSE (48) | MAE (48) | MSE (60) | MAE (60) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.226 | 0.296 | 0.380 | 0.389 | 0.575 | 0.492 | 0.749 | 0.572 |
| SRS | 0.202 | 0.259 | 0.316 | 0.328 | 0.479 | 0.424 | 0.676 | 0.504 |
| Improved | 10.82% | 12.55% | 16.79% | 15.70% | 16.62% | 13.91% | 9.75% | 11.84% |
| Crossformer | 0.820 | 0.841 | 0.942 | 0.904 | 1.049 | 0.955 | 1.121 | 0.937 |
| SRS | 0.672 | 0.679 | 0.805 | 0.778 | 0.935 | 0.836 | 1.004 | 0.846 |
| Improved | 18.03% | 19.25% | 14.55% | 13.92% | 10.88% | 12.41% | 10.45% | 9.76% |
| PatchMLP | 0.242 | 0.317 | 0.359 | 0.381 | 0.543 | 0.485 | 0.725 | 0.578 |
| SRS | 0.206 | 0.274 | 0.299 | 0.323 | 0.483 | 0.420 | 0.635 | 0.503 |
| Improved | 14.85% | 13.56% | 16.72% | 15.25% | 11.05% | 13.44% | 12.45% | 13.04% |
| xPatch | 0.205 | 0.292 | 0.364 | 0.386 | 0.529 | 0.465 | 1.082 | 0.752 |
| SRS | 0.170 | 0.236 | 0.302 | 0.315 | 0.477 | 0.429 | 0.927 | 0.653 |
| Improved | 16.92% | 19.26% | 17.12% | 18.27% | 9.92% | 7.83% | 14.28% | 13.17% |
| MLP | 0.208 | 0.288 | 0.353 | 0.372 | 0.517 | 0.474 | 0.775 | 0.593 |
| SRS | 0.175 | 0.235 | 0.282 | 0.298 | 0.443 | 0.416 | 0.680 | 0.515 |
| Improved | 15.68% | 18.36% | 20.05% | 19.88% | 14.25% | 12.28% | 12.25% | 13.22% |
Comment

In this comment, we list the results of AQShunyi and PEMS08.

AQShunyi (horizons 96/192/336/720):

| Models | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.646 | 0.478 | 0.688 | 0.498 | 0.710 | 0.513 | 0.768 | 0.539 |
| SRS | 0.602 | 0.440 | 0.620 | 0.450 | 0.632 | 0.462 | 0.710 | 0.491 |
| Improved | 6.87% | 8.03% | 9.85% | 9.72% | 11.03% | 9.88% | 7.57% | 8.92% |
| Crossformer | 0.652 | 0.484 | 0.674 | 0.499 | 0.704 | 0.515 | 0.747 | 0.518 |
| SRS | 0.600 | 0.435 | 0.615 | 0.452 | 0.612 | 0.437 | 0.712 | 0.498 |
| Improved | 8.03% | 10.22% | 8.82% | 9.48% | 13.02% | 15.23% | 4.67% | 3.89% |
| PatchMLP | 0.668 | 0.492 | 0.711 | 0.511 | 0.732 | 0.524 | 0.776 | 0.537 |
| SRS | 0.602 | 0.441 | 0.659 | 0.466 | 0.679 | 0.488 | 0.724 | 0.505 |
| Improved | 9.82% | 10.47% | 7.35% | 8.84% | 7.22% | 6.94% | 6.72% | 5.92% |
| xPatch | 0.674 | 0.486 | 0.715 | 0.506 | 0.741 | 0.520 | 0.808 | 0.523 |
| SRS | 0.642 | 0.451 | 0.631 | 0.440 | 0.680 | 0.479 | 0.760 | 0.487 |
| Improved | 4.82% | 7.29% | 11.71% | 12.95% | 8.26% | 7.91% | 5.91% | 6.86% |
| MLP | 0.674 | 0.487 | 0.704 | 0.504 | 0.723 | 0.518 | 0.768 | 0.542 |
| SRS | 0.629 | 0.444 | 0.669 | 0.467 | 0.642 | 0.464 | 0.708 | 0.494 |
| Improved | 6.72% | 8.92% | 4.95% | 7.28% | 11.27% | 10.52% | 7.79% | 8.85% |

PEMS08 (horizons 96/192/336/720):

| Models | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.248 | 0.287 | 0.319 | 0.310 | 0.361 | 0.324 | 0.399 | 0.368 |
| SRS | 0.234 | 0.274 | 0.299 | 0.288 | 0.343 | 0.305 | 0.371 | 0.332 |
| Improved | 5.82% | 4.68% | 6.29% | 7.22% | 4.86% | 5.82% | 7.12% | 9.82% |
| Crossformer | 0.230 | 0.260 | 0.239 | 0.264 | 0.272 | 0.289 | 0.320 | 0.316 |
| SRS | 0.223 | 0.252 | 0.235 | 0.257 | 0.250 | 0.259 | 0.309 | 0.300 |
| Improved | 3.26% | 2.97% | 1.87% | 2.74% | 8.22% | 10.28% | 3.55% | 5.21% |
| PatchMLP | 0.203 | 0.242 | 0.314 | 0.273 | 0.334 | 0.288 | 0.373 | 0.329 |
| SRS | 0.197 | 0.232 | 0.308 | 0.267 | 0.311 | 0.272 | 0.358 | 0.307 |
| Improved | 2.96% | 4.22% | 1.85% | 2.04% | 6.92% | 5.42% | 3.98% | 6.82% |
| xPatch | 0.171 | 0.214 | 0.260 | 0.233 | 0.305 | 0.246 | 0.340 | 0.274 |
| SRS | 0.169 | 0.210 | 0.252 | 0.226 | 0.287 | 0.231 | 0.325 | 0.260 |
| Improved | 1.02% | 2.05% | 3.20% | 2.88% | 5.92% | 6.22% | 4.28% | 5.29% |
| MLP | 0.184 | 0.241 | 0.306 | 0.268 | 0.341 | 0.281 | 0.380 | 0.320 |
| SRS | 0.171 | 0.226 | 0.291 | 0.254 | 0.313 | 0.258 | 0.357 | 0.298 |
| Improved | 7.22% | 6.03% | 4.99% | 5.21% | 8.11% | 8.35% | 6.02% | 6.87% |
Comment

In this comment, we list the results of ZafNoo and AQWan.

ZafNoo (horizons 96/192/336/720):

| Models | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.429 | 0.405 | 0.494 | 0.449 | 0.538 | 0.475 | 0.573 | 0.486 |
| SRS | 0.372 | 0.337 | 0.421 | 0.382 | 0.474 | 0.413 | 0.497 | 0.410 |
| Improved | 13.22% | 16.67% | 14.82% | 14.99% | 11.82% | 12.96% | 13.28% | 15.68% |
| Crossformer | 0.430 | 0.418 | 0.479 | 0.449 | 0.505 | 0.464 | 0.560 | 0.494 |
| SRS | 0.382 | 0.376 | 0.398 | 0.360 | 0.451 | 0.398 | 0.485 | 0.414 |
| Improved | 11.06% | 9.95% | 16.92% | 19.72% | 10.67% | 14.29% | 13.42% | 16.25% |
| PatchMLP | 0.443 | 0.416 | 0.515 | 0.459 | 0.563 | 0.484 | 0.603 | 0.505 |
| SRS | 0.393 | 0.351 | 0.405 | 0.382 | 0.491 | 0.410 | 0.537 | 0.438 |
| Improved | 11.28% | 15.58% | 21.28% | 16.79% | 12.84% | 15.25% | 10.96% | 13.28% |
| xPatch | 0.512 | 0.436 | 0.582 | 0.474 | 0.603 | 0.487 | 0.675 | 0.517 |
| SRS | 0.456 | 0.381 | 0.482 | 0.383 | 0.544 | 0.432 | 0.602 | 0.474 |
| Improved | 10.88% | 12.68% | 17.25% | 19.25% | 9.79% | 11.26% | 10.85% | 8.29% |
| MLP | 0.434 | 0.403 | 0.498 | 0.441 | 0.554 | 0.471 | 0.604 | 0.503 |
| SRS | 0.368 | 0.335 | 0.399 | 0.342 | 0.464 | 0.380 | 0.524 | 0.426 |
| Improved | 15.22% | 16.83% | 19.96% | 22.46% | 16.26% | 19.28% | 13.26% | 15.27% |

AQWan (horizons 96/192/336/720):

| Models | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PatchTST | 0.745 | 0.468 | 0.793 | 0.490 | 0.819 | 0.502 | 0.890 | 0.533 |
| SRS | 0.611 | 0.379 | 0.633 | 0.380 | 0.714 | 0.430 | 0.685 | 0.388 |
| Improved | 17.95% | 18.92% | 20.17% | 22.35% | 12.83% | 14.25% | 23.02% | 27.26% |
| Crossformer | 0.750 | 0.465 | 0.762 | 0.479 | 0.802 | 0.504 | 0.829 | 0.512 |
| SRS | 0.533 | 0.347 | 0.530 | 0.329 | 0.648 | 0.405 | 0.656 | 0.395 |
| Improved | 28.88% | 25.27% | 30.44% | 31.27% | 19.25% | 19.58% | 20.87% | 22.85% |
| PatchMLP | 0.771 | 0.484 | 0.815 | 0.499 | 0.835 | 0.510 | 0.896 | 0.526 |
| SRS | 0.595 | 0.365 | 0.587 | 0.365 | 0.658 | 0.414 | 0.707 | 0.407 |
| Improved | 22.86% | 24.54% | 27.92% | 26.87% | 21.20% | 18.75% | 21.04% | 22.54% |
| xPatch | 0.775 | 0.474 | 0.824 | 0.494 | 0.857 | 0.510 | 0.938 | 0.546 |
| SRS | 0.657 | 0.389 | 0.649 | 0.384 | 0.723 | 0.417 | 0.776 | 0.436 |
| Improved | 15.28% | 17.92% | 21.26% | 22.26% | 15.67% | 18.26% | 17.28% | 20.17% |
| MLP | 0.767 | 0.476 | 0.806 | 0.495 | 0.831 | 0.511 | 0.901 | 0.538 |
| SRS | 0.527 | 0.318 | 0.581 | 0.352 | 0.674 | 0.399 | 0.664 | 0.378 |
| Improved | 31.29% | 33.28% | 27.92% | 28.88% | 18.92% | 21.82% | 26.29% | 29.78% |
Comment

Dear Reviewer xNzv, we then report the full efficiency experiments here, covering all 6 datasets, i.e., AQShunyi, AQWan, NASDAQ, NYSE, PEMS08, and ZafNoo.

| Datasets | Variants | Memory (MB) | Params (M) | Inference Time (s/batch) | Training Time (s/batch) | MACs (G) |
| --- | --- | --- | --- | --- | --- | --- |
| AQShunyi | PatchTST | 1822 | 6.206 | 5.915 | 6.390 | 10.120 |
| | SRS | 1998 | 6.214 | 6.637 | 7.278 | 11.008 |
| | Overhead | 9.66% | 0.14% | 12.19% | 13.89% | 8.78% |
| | Crossformer | 6182 | 11.085 | 42.276 | 45.336 | 161.376 |
| | SRS | 6310 | 11.093 | 45.163 | 47.519 | 162.014 |
| | Overhead | 2.07% | 0.07% | 6.83% | 4.81% | 0.40% |
| AQWan | PatchTST | 1946 | 6.207 | 6.896 | 7.396 | 10.141 |
| | SRS | 2202 | 6.215 | 7.827 | 8.463 | 11.029 |
| | Overhead | 13.16% | 0.14% | 13.50% | 14.43% | 8.76% |
| | Crossformer | 3446 | 11.101 | 28.341 | 32.824 | 81.341 |
| | SRS | 3548 | 11.112 | 31.723 | 34.827 | 81.884 |
| | Overhead | 2.96% | 0.10% | 11.94% | 6.10% | 0.67% |
| NASDAQ | PatchTST | 560 | 0.023 | 4.514 | 5.972 | 0.064 |
| | SRS | 606 | 0.026 | 5.133 | 6.801 | 0.073 |
| | Overhead | 8.21% | 14.67% | 13.73% | 13.88% | 13.98% |
| | Crossformer | 1122 | 7.913 | 27.814 | 28.966 | 15.646 |
| | SRS | 1144 | 7.915 | 30.667 | 31.744 | 15.657 |
| | Overhead | 1.96% | 0.03% | 10.26% | 9.59% | 0.07% |
| NYSE | PatchTST | 900 | 0.716 | 6.392 | 7.941 | 2.490 |
| | SRS | 928 | 0.720 | 7.246 | 8.784 | 2.524 |
| | Overhead | 3.11% | 0.64% | 13.36% | 10.63% | 1.36% |
| | Crossformer | 654 | 0.129 | 19.862 | 21.569 | 0.251 |
| | SRS | 678 | 0.130 | 20.568 | 22.151 | 0.258 |
| | Overhead | 3.67% | 0.57% | 3.55% | 2.70% | 2.67% |
| PEMS08 | PatchTST | 2216 | 12.672 | 7.673 | 7.802 | 25.653 |
| | SRS | 2522 | 12.682 | 8.766 | 8.934 | 26.584 |
| | Overhead | 13.81% | 0.08% | 14.25% | 14.52% | 3.63% |
| | Crossformer | 10414 | 11.085 | 45.100 | 45.320 | 161.376 |
| | SRS | 10698 | 11.093 | 50.372 | 47.478 | 162.014 |
| | Overhead | 2.73% | 0.07% | 11.69% | 4.76% | 0.40% |
| ZafNoo | PatchTST | 66724 | 29.543 | 265.138 | 281.013 | 2132.979 |
| | SRS | 73809 | 29.557 | 282.721 | 301.127 | 2148.672 |
| | Overhead | 10.62% | 0.05% | 6.63% | 7.16% | 0.74% |
| | Crossformer | 63549 | 2.330 | 223.509 | 238.386 | 432.868 |
| | SRS | 68076 | 2.336 | 236.595 | 252.900 | 442.390 |
| | Overhead | 7.12% | 0.26% | 5.85% | 6.09% | 2.20% |
Comment

Dear Reviewer xNzv, we then report the full results of the initialization strategy study here.

ETTh1:

| Initialization | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Constant | 0.370±0.002 | 0.398±0.002 | 0.406±0.001 | 0.427±0.002 | 0.435±0.002 | 0.442±0.001 | 0.441±0.001 | 0.472±0.001 |
| Xavier | 0.364±0.001 | 0.394±0.001 | 0.401±0.001 | 0.415±0.001 | 0.424±0.001 | 0.430±0.001 | 0.432±0.001 | 0.461±0.001 |
| Kaiming | 0.365±0.001 | 0.392±0.001 | 0.402±0.001 | 0.416±0.001 | 0.425±0.001 | 0.433±0.002 | 0.426±0.001 | 0.455±0.001 |
| Randn | 0.366±0.001 | 0.394±0.001 | 0.400±0.001 | 0.415±0.001 | 0.424±0.002 | 0.430±0.001 | 0.426±0.001 | 0.455±0.001 |

ETTm2:

| Initialization | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Constant | 0.166±0.001 | 0.261±0.001 | 0.226±0.001 | 0.299±0.001 | 0.282±0.001 | 0.342±0.001 | 0.360±0.001 | 0.392±0.002 |
| Xavier | 0.162±0.002 | 0.250±0.003 | 0.220±0.001 | 0.296±0.001 | 0.268±0.002 | 0.326±0.002 | 0.357±0.002 | 0.388±0.003 |
| Kaiming | 0.165±0.001 | 0.258±0.001 | 0.222±0.001 | 0.297±0.001 | 0.273±0.001 | 0.327±0.002 | 0.355±0.001 | 0.380±0.001 |
| Randn | 0.164±0.001 | 0.254±0.001 | 0.220±0.001 | 0.296±0.001 | 0.271±0.001 | 0.327±0.001 | 0.353±0.001 | 0.380±0.002 |

Solar:

| Initialization | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Constant | 0.167±0.002 | 0.223±0.001 | 0.182±0.002 | 0.238±0.001 | 0.197±0.002 | 0.249±0.002 | 0.203±0.002 | 0.262±0.001 |
| Xavier | 0.167±0.002 | 0.222±0.001 | 0.184±0.002 | 0.241±0.002 | 0.194±0.002 | 0.252±0.001 | 0.191±0.002 | 0.248±0.002 |
| Kaiming | 0.168±0.002 | 0.224±0.001 | 0.181±0.002 | 0.234±0.002 | 0.188±0.003 | 0.244±0.002 | 0.193±0.001 | 0.248±0.002 |
| Randn | 0.167±0.002 | 0.222±0.001 | 0.182±0.003 | 0.237±0.001 | 0.188±0.002 | 0.245±0.003 | 0.195±0.002 | 0.251±0.002 |

Traffic:

| Initialization | MSE (96) | MAE (96) | MSE (192) | MAE (192) | MSE (336) | MAE (336) | MSE (720) | MAE (720) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Constant | 0.363±0.001 | 0.258±0.001 | 0.382±0.001 | 0.265±0.001 | 0.390±0.001 | 0.269±0.001 | 0.434±0.002 | 0.293±0.002 |
| Xavier | 0.370±0.001 | 0.259±0.001 | 0.380±0.001 | 0.263±0.001 | 0.390±0.001 | 0.271±0.002 | 0.435±0.001 | 0.295±0.001 |
| Kaiming | 0.363±0.001 | 0.256±0.001 | 0.381±0.002 | 0.263±0.001 | 0.391±0.001 | 0.269±0.001 | 0.436±0.001 | 0.297±0.001 |
| Randn | 0.361±0.001 | 0.254±0.001 | 0.380±0.001 | 0.263±0.001 | 0.392±0.001 | 0.270±0.001 | 0.434±0.001 | 0.293±0.001 |
Comment

Dear Reviewer xNzv,

Please go through the rebuttal and participate in the discussions with the authors. Thank you!

Best regards, AC

Comment

Dear Reviewer xNzv,

Since the discussion phase ends soon, we sincerely invite you to participate in the discussion. We hope our responses have addressed your previous concerns; if you have any other questions, we are happy to discuss them further!

Best wishes!

Comment

The responses have addressed my concerns, and I will raise my score to Borderline Accept.

Comment

Dear Reviewer xNzv, we are thrilled that our responses have effectively addressed your questions and comments. We would like to express our sincerest gratitude for taking the time to review our paper and provide us with such detailed feedback!

Final Decision

This paper proposes the Selective Representation Space (SRS) module, a plug-and-play component that adaptively selects and reorders patches for time series forecasting, addressing the rigidity of conventional adjacent patching. The strengths lie in its novel and gradient-friendly design, consistent performance improvements across benchmarks, and the ability to enhance both a simple baseline and existing patch-based models. After the rebuttal, the authors further clarified implementation details, provided more experiments, and addressed concerns on efficiency. All the reviewers were satisfied with the rebuttal. Therefore, I would like to recommend accepting this paper.