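PaperHub
Rating: 5.8/10 · Withdrawn · 4 reviewers
Ratings: 8, 5, 5, 5 (lowest 5, highest 8, standard deviation 1.3)
Confidence: 4.3
Soundness: 2.5 · Contribution: 2.8 · Presentation: 2.8
ICLR 2025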

XXLTraffic: Expanding and Extremely Long Traffic forecasting beyond test adaptation

Submitted: 2024-09-26 · Updated: 2025-02-24

Abstract

Keywords
spatio-temporal, traffic forecasting, time-series

Reviews and Discussion

Official Review
Rating: 8

This paper presents XXLTraffic, a traffic dataset spanning up to 23 years across different countries, which can be useful for other researchers. The authors have also evaluated many advanced approaches on the datasets to confirm their range and to explore different experimental directions.

Strengths

  1. This paper introduces XXLTraffic, a dataset that spans up to 23 years across different regions, which means the dataset includes sufficient information about traffic features.
  2. This paper presents many potential scenarios to which the dataset could be applied, such as short-term traffic prediction, long-term multivariate traffic prediction, and extremely long-term prediction with gaps. All of these are worthwhile problems to focus on.

Weaknesses

  1. The dataset is valuable, but performing extremely long-term predictions is challenging, as traffic patterns can change significantly over the years, and even the road infrastructure itself may be reshaped over a 23-year span. Such factors could potentially reduce the impact of the dataset's contribution, because many open datasets are enough for normal short and long term prediction.
  2. Some typos, e.g., ‘beyongd’ should be ‘beyond’ around line 130.

Questions

How is the gap dataset collected, given that "gap" typically refers to missing data? Can this dataset be directly used to address traditional missing data scenarios?

Details of Ethics Concerns

NA

Comment

We appreciate the reviewer's acknowledgment of the dataset’s impact and the careful review of our manuscript.

Q1: Road infrastructure may influence the long time span dataset.

The dataset is valuable, but performing extremely long-term predictions is challenging, as traffic patterns can change significantly over the years, and even the road infrastructure itself may be reshaped over a 23-year span. Such factors could potentially reduce the impact of the dataset's contribution, because many open datasets are enough for normal short and long term prediction.

We appreciate your feedback and understand your concern that traffic patterns may change over a 23-year span. To account for this, we set the gap to several years, choosing its length to reflect the time it takes for a highway to be planned and completed. Predicting traffic volume at such locations several years in advance is of great practical significance. As Table 1 in the paper shows, all baselines perform relatively poorly, which indicates how difficult this setting is. Our dataset, spanning up to 23 years, therefore provides more training samples and more room for models to learn temporal features.

Q2: Some typos.

Some typos, e.g., ‘beyongd’ should be ‘beyond’ around line 130.

Thank you for your feedback. We will carefully review the entire manuscript and correct any grammatical errors and typos. Some corrections from our self-review are listed below (the full table can be found in the authors' global rebuttal):

(1) In Section 1.1, "Our dataset are evolving and longer than existing datasets." should be "Our dataset is...".

(2) In Section 4.1, "tfNSW is an open-source data platform provided by Transport for NSW" should be "The tfNSW is...." because "tfNSW" is the first word of the sentence.

(3) In Section 5.4, "SOTA" should be "the state of the art" because this is its first use.

Additionally, based on Reviewer ytp5’s suggestion, we have added five baselines, shown in the following table:

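| Dataset | Gap | Length | STGCN MSE | STGCN MAE | ASTGCN MSE | ASTGCN MAE | GWN MSE | GWN MAE | AGCRN MSE | AGCRN MAE | PDFormer MSE | PDFormer MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PEMS03_gap | 1-year gap | 96 | 0.556 | 0.536 | 0.765 | 0.618 | 0.676 | 0.574 | 0.596 | 0.562 | 0.621 | 0.576 |
| | | 192 | 0.562 | 0.539 | 0.764 | 0.626 | 0.581 | 0.546 | 0.565 | 0.543 | 0.545 | 0.536 |
| | | 336 | 0.561 | 0.538 | 0.717 | 0.598 | 0.582 | 0.543 | 0.580 | 0.552 | 0.574 | 0.548 |
| | 1.5-year gap | 96 | 1.256 | 0.884 | 1.441 | 0.959 | 1.250 | 0.875 | 1.168 | 0.866 | 1.256 | 0.899 |
| | | 192 | 1.187 | 0.872 | 1.395 | 0.947 | 1.194 | 0.856 | 1.122 | 0.843 | 1.117 | 0.842 |
| | | 336 | 1.167 | 0.853 | 1.273 | 0.877 | 1.1015 | 0.832 | 1.176 | 0.862 | 1.182 | 0.871 |
| | 2-year gap | 96 | 2.059 | 1.176 | 2.236 | 1.234 | 1.987 | 1.148 | 1.950 | 1.150 | 2.068 | 1.189 |
| | | 192 | 2.055 | 1.184 | 2.180 | 1.213 | 1.877 | 1.117 | 1.746 | 1.070 | 1.897 | 1.131 |
| | | 336 | 2.022 | 1.169 | 1.889 | 1.117 | 1.847 | 1.107 | 1.930 | 1.140 | 1.982 | 1.161 |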

Based on Reviewer 6qSU’s suggestion, we have provided statistics on training times for the largest and smallest datasets shown in the following table, demonstrating that our dataset and the settings we propose can be easily trained on a standard single V100 GPU.

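| Dataset | Gap | Length | Mamba MSE | Mamba MAE | iTransformer MSE | iTransformer MAE | DLinear MSE | DLinear MAE | Autoformer MSE | Autoformer MAE | Informer MSE | Informer MAE | FEDFormer MSE | FEDFormer MAE | MICN MSE | MICN MAE | PatchTST MSE | PatchTST MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tfNSW_gap | 1-year gap | 96 | 1.480 | 0.845 | 1.236 | 0.822 | 1.166 | 0.748 | 1.320 | 0.879 | 1.354 | 0.777 | 1.377 | 0.895 | 1.219 | 0.778 | 0.946 | 0.778 |
| | | 192 | 1.543 | 0.867 | 1.088 | 0.733 | 1.184 | 0.758 | 1.599 | 0.961 | 1.229 | 0.757 | 1.312 | 0.890 | 1.262 | 0.799 | 1.026 | 0.821 |
| | | 336 | 1.459 | 0.847 | 1.099 | 0.739 | 1.196 | 0.763 | 1.477 | 0.934 | 1.242 | 0.776 | 1.219 | 0.838 | 1.366 | 0.833 | 0.949 | 0.774 |
| | 1.5-year gap | 96 | 1.745 | 0.944 | 1.355 | 0.877 | 1.372 | 0.846 | 1.658 | 1.016 | 1.615 | 0.879 | 1.440 | 0.919 | 1.204 | 0.778 | 1.025 | 0.807 |
| | | 192 | 1.784 | 0.968 | 1.260 | 0.818 | 1.367 | 0.846 | 1.666 | 0.997 | 1.484 | 0.854 | 1.452 | 0.932 | 1.251 | 0.789 | 1.058 | 0.823 |
| | | 336 | 1.628 | 0.939 | 1.284 | 0.835 | 1.385 | 0.854 | 1.515 | 0.945 | 1.500 | 0.883 | 1.346 | 0.871 | 1.203 | 0.770 | 0.995 | 0.788 |
| | 2-year gap | 96 | 1.364 | 0.857 | 1.019 | 0.782 | 1.008 | 0.742 | 1.319 | 0.907 | 1.101 | 0.713 | 1.318 | 0.879 | 1.033 | 0.745 | 0.848 | 0.727 |
| | | 192 | 1.195 | 0.783 | 0.884 | 0.701 | 1.024 | 0.749 | 1.319 | 0.891 | 1.054 | 0.721 | 1.222 | 0.855 | 1.094 | 0.737 | 0.909 | 0.757 |
| | | 336 | 1.588 | 0.903 | 0.907 | 0.730 | 1.166 | 0.748 | 1.233 | 0.849 | 1.787 | 0.639 | 1.056 | 0.766 | 0.959 | 0.691 | 0.865 | 0.714 |

Comment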

Thanks for the authors' reply. They have answered most of my questions. I will maintain my score.

Official Review
Rating: 5

This paper proposes large-scale datasets, named XXLTraffic, to address limitations of existing traffic forecasting datasets. It provides extensive real-world datasets with long timespans to support research on extremely long forecasting and on adaptive model development for evolving traffic patterns.

Strengths

  1. The proposed XXLTraffic is the largest publicly available traffic dataset so far. The PEMS part spans up to 23 years and the NSW part also has around 11 years’ data.

  2. The authors evaluated the performance of existing baselines on the proposed datasets with various settings. The overall results verify the importance of introducing large-scale datasets with extremely long time spans.

Weaknesses

  1. The most significant weakness is that this paper has no novelty. Although this conference has a datasets and benchmarks area, I do not think that proposing a large-scale dataset without any model contribution is enough for a top-tier conference. My concern is stronger because existing large-scale datasets are already available and differ from XXLTraffic only in having fewer nodes or shorter time spans.

  2. Some motivations remain unclear. In the introduction section, the authors mentioned ‘beyond test adaptation’. However, it is not easy to understand how this relates to the proposed datasets. Is the concept of ‘beyond test adaptation’ better or more realistic than test time adaptation?

  3. The setting of Extremely Long Prediction with Gaps seems a bit strange to me. The authors use highway route planning as an example to justify the need for the gap. For road planning, we may need to simulate the traffic flows of uninstalled sensors. The ideal setting would then be to use existing sensors to predict the traffic flows of uninstalled sensors, plus the gap. However, the authors seem to still use the historical signals of the uninstalled sensors in this setting. If we already knew their historical readings, we would not need to plan anything.

Questions

The authors can address the above concerns.

Details of Ethics Concerns

NA

Comment

We sincerely thank the reviewer for taking the time to provide valuable suggestions and detailed comments. We are pleased that the reviewer acknowledges the importance of introducing large-scale datasets with extremely long time spans.

Q1: Lack of model contribution

The most significant weakness is that this paper has no novelty. Although this conference has a datasets and benchmarks area, I do not think that proposing a large-scale dataset without any model contribution is enough for a top-tier conference. My concern is stronger because existing large-scale datasets are already available and differ from XXLTraffic only in having fewer nodes or shorter time spans.

We appreciate your feedback and understand your concern about the novelty of our submission. However, many esteemed dataset contributions in the field have focused primarily on the datasets themselves and the associated baseline experiments, without necessarily introducing a new model. Examples include the Large-ST dataset [1] from NeurIPS 2023 and the DL-Traffic benchmark [2], which was a runner-up for the best paper award at CIKM 2021. Neither proposed its own model, yet both are recognized for their significant contributions to the research community.

We firmly believe that the data processing, experimental setup, and the extensive baseline comparisons we have presented hold practical research value. In response to the concerns raised by Reviewer ytp5 and Reviewer 6qSU, we have added five additional baselines and included statistics on training times for the largest and smallest datasets, demonstrating that our dataset and the settings we propose can be easily trained on a standard single V100 GPU.

We maintain that our work provides a substantial contribution to the field by offering a unique dataset that enables long-term traffic forecasting with gaps, a challenging domain that has not been sufficiently explored. The practical implications of our dataset and experimental framework are significant, and we are confident that they will facilitate further research and innovation in traffic forecasting.

[1] Liu X, Xia Y, Liang Y, et al. LargeST: A benchmark dataset for large-scale traffic forecasting[J]. Advances in Neural Information Processing Systems, 2024, 36.

[2] Jiang R, Yin D, Wang Z, et al. DL-Traff: Survey and benchmark of deep learning models for urban traffic prediction[C]//Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021: 4515-4525.

Q2: Unclear description of some motivations

Some motivations remain unclear. In the introduction section, the authors mentioned ‘beyond test adaptation’. However, it is not easy to understand how this relates to the proposed datasets. Is the concept of ‘beyond test adaptation’ better or more realistic than test time adaptation?

Thank you for your suggestion. We have clarified the concepts of 'beyond test adaptation' and 'test time adaptation' in the introduction to distinguish our approach from test time adaptation. Our aim is to provide a dataset whose time span can accommodate gap settings of various durations, such as 1 year or 5 years. In this work, we recommend gaps of 1, 1.5, and 2 years for the baseline comparison experiments. Each gap setting can be regarded as a different distribution. In subsequent versions, we will expand Figure 1 to include the gap settings and make the concept clearer.

Comment

Q3: Unclear description of some motivations

The setting of Extremely Long Prediction with Gaps seems a bit strange to me. The authors use highway route planning as an example to justify the need for the gap. For road planning, we may need to simulate the traffic flows of uninstalled sensors. The ideal setting would then be to use existing sensors to predict the traffic flows of uninstalled sensors, plus the gap. However, the authors seem to still use the historical signals of the uninstalled sensors in this setting. If we already knew their historical readings, we would not need to plan anything.

Our experiments with gap settings use the historical data of installed sensors, because only a dataset with an extremely long time span can support such configurations. Using historical data from existing sensors to predict their traffic flows across the gap years is a relatively simple scenario. Even so, our baseline comparisons show that accurate prediction is quite challenging in this simplified setting. Using data from installed sensors to predict traffic flows at uninstalled sensors (Table 9, Appendix) gives results several times worse than the settings in Table 1 of our work. The extremely long time span of our dataset offers more potential to address these challenges in two ways. It enriches the data samples along the temporal dimension, and it provides more possibilities for temporal feature extraction. Table 4 of the manuscript demonstrates the second point: lengthening the input improves forecasting performance.
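One plausible way to build training samples for the gap setting is sketched below, assuming hourly sampling and 365-day years. Each sample pairs an input window with a target window that starts a fixed number of steps after the input ends, so the model never sees the years in between. This is an illustrative reading, not the authors' released code. The helper names `gap_steps` and `make_gap_samples` are hypothetical.

```python
from typing import List, Sequence, Tuple


def gap_steps(gap_years: float, steps_per_hour: int = 1) -> int:
    """Convert a gap in years into a number of time steps.

    Assumption: 365-day years and a fixed sampling rate (hourly by default).
    """
    return int(round(gap_years * 365 * 24 * steps_per_hour))


def make_gap_samples(series: Sequence[float], input_len: int, gap_len: int,
                     horizon: int, stride: int = 1
                     ) -> List[Tuple[List[float], List[float]]]:
    """Build (input, target) pairs separated by `gap_len` unseen steps.

    For a window starting at index s:
      input  = series[s : s + input_len]
      target = series[s + input_len + gap_len : s + input_len + gap_len + horizon]
    """
    span = input_len + gap_len + horizon  # steps covered by one sample
    samples = []
    for s in range(0, len(series) - span + 1, stride):
        x = list(series[s:s + input_len])
        t0 = s + input_len + gap_len  # first target step, just after the gap
        samples.append((x, list(series[t0:t0 + horizon])))
    return samples


if __name__ == "__main__":
    # Toy series: 20 steps, 4-step input, 5-step gap, 3-step horizon.
    pairs = make_gap_samples(list(range(20)), input_len=4, gap_len=5, horizon=3)
    print(len(pairs), pairs[0])  # 9 ([0, 1, 2, 3], [9, 10, 11])
    print(gap_steps(1.5))        # 13140 hourly steps for a 1.5-year gap
```

Under these assumptions, the 1-, 1.5-, and 2-year gaps correspond to 8,760, 13,140, and 17,520 hourly steps. A finer sampling rate scales these counts proportionally through `steps_per_hour`.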

Additionally, based on Reviewer ytp5’s suggestion, we have added five baselines, shown in Table 1 of the global rebuttal. Based on Reviewer 6qSU’s suggestion, we have also provided training-time statistics for the largest and smallest datasets, shown in Table 2 of the global rebuttal. These show that our dataset and the proposed settings can be trained easily on a single standard V100 GPU.

Comment

The authors clarified the motivations and some ambiguities of this paper, which is helpful. However, my major concern about novelty still cannot be (fully) addressed.

Large-ST was published in the NeurIPS Datasets and Benchmarks track, and DL-Traffic was published in the CIKM resource track. These tracks specifically target dataset papers, with their own review criteria and submission channels, and the proceedings indicate the track explicitly. ICLR, however, only lists datasets and benchmarks as a submittable topic, and all papers are submitted through the same channel. Therefore, citing Large-ST and DL-Traffic is not enough to justify the contributions of an ICLR submission.

This is my main reason for the current score. If the paper is rejected, I recommend that the authors submit it to a better-suited venue.

Thanks.

Comment

Dear reviewer YRBw, we thank you again for your constructive review and comments. We hope we have addressed all your concerns. If you believe our comments and revisions have satisfied your concerns, would you please reconsider raising your score?

Official Review
Rating: 5

The authors present XXLTraffic, the public traffic dataset with the longest available timespan. It was collected from Los Angeles, USA, and New South Wales, Australia, and curated to support research on extremely long forecasting beyond test adaptation. The benchmark includes typical time-series forecasting settings with hourly and daily aggregated data. It also includes novel configurations that introduce gaps and down-sample the training size to better simulate practical constraints.

Strengths

  1. A brand-new dataset is proposed, which is the largest available public traffic dataset in extremely long traffic forecasting.

  2. Numerous comparative experiments validate the differences in performance of different models on this dataset, providing a good guide for peers to understand the latest technological developments.

Weaknesses

  1. The maximum number of nodes in the proposed dataset is smaller than in Large-ST, which seems to need improvement; after all, real road networks are very large.

  2. I think it is necessary for the authors to explain in more depth the motivation for constructing such a long-term dataset, because as transportation infrastructure advances and human travel modes change, traffic data that is too ancient is not always helpful enough to understand future transportation patterns.

  3. Due to the sheer size of the proposed dataset, it requires more computational resources. Therefore, whether this dataset can be popularized and whether peers can easily adopt it in their own research is also a concern.

Questions

See Weaknesses.

Comment

We sincerely thank the reviewer for the constructive comments and suggestions. We are encouraged that the reviewer finds our work a good guide for peers to understand the latest technological developments.

Q1: Spatial size of the dataset

The maximum number of nodes in the proposed dataset is smaller than in Large-ST, which seems to need improvement; after all, real road networks are very large.

Thank you for your review and feedback. Through the link in our manuscript, we have released the raw dataset, which contains a substantial number of nodes. However, our focus is extremely long-term forecasting (i.e., our gap settings, such as gaps of 1, 1.5, and 2 years) based on nodes whose records span more than twenty years. Most nodes in the raw data do not have records that long, and such records are essential for our gap settings. After data analysis and preprocessing, each district retains between 100 and 800 nodes. This is why our spatial node count does not reach the numbers seen in the Large-ST dataset.

Although each district has relatively few nodes, we offer datasets from various geographical areas, including 9 districts from the PeMS dataset and 1 traffic dataset from the state of New South Wales, Australia. This regional diversity allows a comprehensive exploration of extremely long-term traffic forecasting across different spatial scales.
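The node selection described in this answer can be sketched as a simple span filter that keeps a sensor only if its first and last valid readings are at least 20 years apart. This is a minimal sketch under stated assumptions. Each sensor is summarized by a hypothetical `(first, last)` timestamp pair, and years are taken as 365.25 days. Any further criteria the authors may apply, such as limits on missing data, are not modeled. The function name `select_long_span_nodes` is illustrative.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def select_long_span_nodes(spans: Dict[str, Tuple[datetime, datetime]],
                           min_years: float = 20.0) -> List[str]:
    """Return the sorted IDs of sensors whose records cover >= min_years.

    `spans` maps a sensor ID to the timestamps of its first and last valid
    readings (hypothetical input format). A year is taken as 365.25 days.
    """
    min_span = timedelta(days=365.25 * min_years)
    return sorted(node for node, (first, last) in spans.items()
                  if last - first >= min_span)


if __name__ == "__main__":
    spans = {
        "sensor_a": (datetime(2001, 1, 1), datetime(2023, 12, 31)),  # ~23 years
        "sensor_b": (datetime(2010, 1, 1), datetime(2023, 12, 31)),  # ~14 years
    }
    print(select_long_span_nodes(spans))  # ['sensor_a']
```

Sorting the IDs keeps the output deterministic. A real pipeline would more likely derive each span from the raw readings (for example, the first and last non-missing timestamps) before applying the same threshold.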

Q2: explain in more depth the motivation for dataset

I think it is necessary for the authors to explain in more depth the motivation for constructing such a long-term dataset, because as transportation infrastructure advances and human travel modes change, traffic data that is too ancient is not always helpful enough to understand future transportation patterns.

Our motivation for creating an extremely long-term dataset is to provide a robust basis for future traffic pattern analysis, which is essential for strategic planning in urban development and infrastructure investment. Combined with our gap settings, the dataset allows traffic volumes to be assessed over extended periods, which supports better planning of future road construction and traffic management. This extremely long-term perspective is crucial for aligning transportation infrastructure with future demands and ensuring the sustainability of urban environments.

Q3: computational resources for the baselines experiments

**Due to the sheer size of the proposed dataset, it requires more computational resources. Therefore, whether this dataset can be popularized and whether peers can easily adopt it in their own research is also a concern.**

As noted in our response to Q1, our dataset keeps only nodes whose records span more than 20 years, which results in a relatively small spatial dimension. The dataset is therefore large mainly along the temporal dimension, which keeps the training cost acceptable. We have measured the training time for one epoch of every baseline on the largest dataset, PEMS04_gap, and the smallest dataset, tfNSW, as shown in the table below. (Following Reviewer ytp5’s suggestion, we have also included five traffic forecasting baselines, STGCN, ASTGCN, GWN, AGCRN, and PDFormer, as shown in Table 1 of the global comments.)

Table 2. Training time per epoch of all the baselines on the largest (PEMS04_gap) and smallest (tfNSW) datasets

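| Dataset | Mamba | iTransformer | DLinear | Autoformer | Informer | FEDFormer | MICN | PatchTST | STGCN | ASTGCN | GWN | AGCRN | PDFormer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PEMS04_gap | 1396.3 | 1310.2 | 237.9 | 1656.7 | 652.5 | 900.9 | 409.8 | 3649.4 | 2425.9 | 7921.9 | 4110.4 | 13369.1 | 8013.9 |
| tfNSW_gap | 113.7 | 44.1 | 7.4 | 148.2 | 73.3 | 380.1 | 101.2 | 55.3 | 37.3 | 228.8 | 50.4 | 658.6 | 47.9 |
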
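To read Table 2 in relative terms, the short script below divides each baseline's PEMS04_gap time by its tfNSW_gap time. It reuses only the numbers reported in the table, so the ratio is unit-free whatever unit the per-epoch times use. It is an illustrative calculation, not part of the authors' analysis, and the names `time_ratios`, `PEMS04_GAP`, and `TFNSW_GAP` are hypothetical.

```python
from typing import Dict

# Per-epoch training times copied from Table 2 (same units as the table).
BASELINES = ["Mamba", "iTransformer", "DLinear", "Autoformer", "Informer",
             "FEDFormer", "MICN", "PatchTST", "STGCN", "ASTGCN", "GWN",
             "AGCRN", "PDFormer"]
PEMS04_GAP = [1396.3, 1310.2, 237.9, 1656.7, 652.5, 900.9, 409.8, 3649.4,
              2425.9, 7921.9, 4110.4, 13369.1, 8013.9]
TFNSW_GAP = [113.7, 44.1, 7.4, 148.2, 73.3, 380.1, 101.2, 55.3, 37.3,
             228.8, 50.4, 658.6, 47.9]


def time_ratios() -> Dict[str, float]:
    """Map each baseline to (PEMS04_gap time) / (tfNSW_gap time)."""
    return {name: big / small
            for name, big, small in zip(BASELINES, PEMS04_GAP, TFNSW_GAP)}


if __name__ == "__main__":
    ratios = time_ratios()
    # Print baselines from the smallest to the largest slowdown.
    for name, r in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:>12}: {r:6.1f}x")
    slowest_time, slowest_name = max(zip(PEMS04_GAP, BASELINES))
    print("Slowest on PEMS04_gap:", slowest_name, slowest_time)
```

On these numbers, the ratio ranges from roughly 2.4 (FEDFormer) to roughly 167 (PDFormer), and AGCRN has the longest per-epoch time on PEMS04_gap.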
Comment

Dear reviewer 6qSU, we thank you again for your constructive review and comments. We hope we have addressed all your concerns. If you believe our comments and revisions have satisfied your concerns, would you please reconsider raising your score?

Comment

Thanks for your detailed responses. After reviewing the comments of other reviewers, I think my current score is appropriate.

Comment

Thank you for your response. We believe we have addressed all your concerns, and we have also added the baselines requested by Reviewer ytp5. We would greatly appreciate it if you could review our rebuttal and consider adjusting your score based on our updates.

Official Review
Rating: 5

This paper presents a traffic dataset containing extensive data records from multiple regions over a long-term period, aimed at supporting research in ultra-long-term traffic prediction.

Strengths

(1) A new traffic dataset is proposed, comprising data from numerous regions across California and New South Wales over an extended time period.

(2) The temporal distribution evolution of selected data was visualized, revealing the evolutionary characteristics of the data.

(3) Tests were conducted on relevant prediction tasks, including scenarios with time gaps and varying input lengths.

Weaknesses

(1) Compared to existing works like LargeST, this work's contribution is insufficient as it merely expands the temporal and spatial scope of data collection. These data can be obtained through the open-source PEMS system in the same way.

(2) Although the number of regions and time span are substantial, the vast majority of data is limited to California. Including data from more cities and countries might have been a better approach.

(3) Most baselines in the experiments are specifically designed for time series prediction, lacking models specifically designed for traffic prediction/spatiotemporal prediction, such as PDFormer[1], ASTGCN[2], STGCN[3], AGCRN[4], GWN[5], etc. It is important to explore the performance of these types of baselines on traffic datasets.

[1] PDFormer: Propagation Delay-Aware Dynamic Long-Range Transformer for Traffic Flow Prediction. AAAI 2023.

[2] Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting. AAAI 2019.

[3] Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. IJCAI 2018.

[4] Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting. NeurIPS 2020.

[5] Graph WaveNet for Deep Spatial-Temporal Graph Modeling. IJCAI 2019.

Questions

Please refer to the Weaknesses.

Comment

Q3: Additional traffic forecasting baselines

We evaluated the five baselines you suggested. Most traffic forecasting baselines indeed perform worse than the long-term forecasting baselines, and their performance deteriorates as the gap increases, as shown in the following table (Table 2 of the global rebuttal also gives training-time statistics that cover both the aforementioned baselines and your suggested baselines):

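| Dataset | Gap | Length | STGCN MSE | STGCN MAE | ASTGCN MSE | ASTGCN MAE | GWN MSE | GWN MAE | AGCRN MSE | AGCRN MAE | PDFormer MSE | PDFormer MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PEMS03_gap | 1-year gap | 96 | 0.556 | 0.536 | 0.765 | 0.618 | 0.676 | 0.574 | 0.596 | 0.562 | 0.621 | 0.576 |
| | | 192 | 0.562 | 0.539 | 0.764 | 0.626 | 0.581 | 0.546 | 0.565 | 0.543 | 0.545 | 0.536 |
| | | 336 | 0.561 | 0.538 | 0.717 | 0.598 | 0.582 | 0.543 | 0.580 | 0.552 | 0.574 | 0.548 |
| | 1.5-year gap | 96 | 1.256 | 0.884 | 1.441 | 0.959 | 1.250 | 0.875 | 1.168 | 0.866 | 1.256 | 0.899 |
| | | 192 | 1.187 | 0.872 | 1.395 | 0.947 | 1.194 | 0.856 | 1.122 | 0.843 | 1.117 | 0.842 |
| | | 336 | 1.167 | 0.853 | 1.273 | 0.877 | 1.1015 | 0.832 | 1.176 | 0.862 | 1.182 | 0.871 |
| | 2-year gap | 96 | 2.059 | 1.176 | 2.236 | 1.234 | 1.987 | 1.148 | 1.950 | 1.150 | 2.068 | 1.189 |
| | | 192 | 2.055 | 1.184 | 2.180 | 1.213 | 1.877 | 1.117 | 1.746 | 1.070 | 1.897 | 1.131 |
| | | 336 | 2.022 | 1.169 | 1.889 | 1.117 | 1.847 | 1.107 | 1.930 | 1.140 | 1.982 | 1.161 |
| PEMS04_gap | 1-year gap | 96 | 1.233 | 0.904 | 1.311 | 0.929 | 1.310 | 0.87 | 1.290 | 0.865 | 1.101 | 0.851 |
| | | 192 | 1.398 | 1.025 | 1.443 | 1.023 | 1.255 | 0.879 | 1.318 | 0.907 | 1.078 | 0.816 |
| | | 336 | 1.505 | 1.104 | 1.325 | 0.938 | 1.292 | 0.865 | 1.302 | 0.899 | 1.085 | 0.821 |
| | 1.5-year gap | 96 | 1.369 | 1.082 | 1.350 | 0.937 | 1.684 | 1.063 | 1.222 | 0.961 | 1.097 | 0.849 |
| | | 192 | 1.642 | 1.035 | 1.549 | 1.075 | 1.501 | 0.988 | 1.346 | 0.989 | 1.214 | 0.903 |
| | | 336 | 1.368 | 0.862 | 1.199 | 0.832 | 1.584 | 0.997 | 0.231 | 0.963 | 1.130 | 0.854 |
| | 2-year gap | 96 | 1.653 | 1.074 | 1.247 | 0.894 | 1.669 | 1.057 | 1.236 | 0.901 | 1.099 | 0.861 |
| | | 192 | 1.545 | 1.023 | 1.356 | 0.948 | 1.554 | 1.041 | 1.269 | 0.915 | 1.159 | 0.891 |
| | | 336 | 1.556 | 1.011 | 1.174 | 0.842 | 1.128 | 0.869 | 1.314 | 0.947 | 1.032 | 0.816 |
| PEMS08_gap | 1-year gap | 96 | 2.531 | 1.325 | 3.119 | 1.404 | 2.185 | 1.280 | 2.158 | 1.220 | 1.921 | 1.148 |
| | | 192 | 2.560 | 1.337 | 2.405 | 1.276 | 2.223 | 1.311 | 2.144 | 1.213 | 2.248 | 1.260 |
| | | 336 | 2.482 | 1.317 | 2.086 | 1.181 | 2.228 | 1.312 | 2.036 | 1.185 | 1.994 | 1.179 |
| | 1.5-year gap | 96 | 1.772 | 1.084 | 2.370 | 1.192 | 1.309 | 0.896 | 1.166 | 0.808 | 1.664 | 1.055 |
| | | 192 | 1.522 | 0.979 | 1.730 | 1.043 | 1.074 | 0.805 | 1.144 | 0.810 | 1.518 | 1.012 |
| | | 336 | 1.495 | 0.966 | 1.444 | 0.948 | 1.533 | 0.999 | 1.078 | 0.784 | 1.049 | 0.801 |
| | 2-year gap | 96 | 1.336 | 0.868 | 1.962 | 1.069 | 1.789 | 1.017 | 1.393 | 0.856 | 1.071 | 0.792 |
| | | 192 | 1.218 | 0.819 | 1.292 | 0.795 | 1.007 | 0.761 | 1.229 | 0.813 | 1.171 | 0.826 |
| | | 336 | 1.181 | 0.806 | 1.227 | 0.842 | 1.181 | 0.813 | 1.246 | 0.814 | 1.003 | 0.768 |
| tfNSW_gap | 1-year gap | 96 | 1.176 | 0.776 | 1.404 | 0.874 | 1.451 | 0.871 | 1.283 | 0.824 | 1.070 | 0.756 |
| | | 192 | 1.495 | 0.880 | 1.547 | 0.889 | 1.580 | 0.904 | 1.267 | 0.828 | 1.019 | 0.753 |
| | | 336 | 1.517 | 0.901 | 1.617 | 0.910 | 1.534 | 0.892 | 1.048 | 0.786 | 0.822 | 0.710 |
| | 1.5-year gap | 96 | 1.220 | 0.796 | 1.286 | 0.813 | 1.598 | 0.940 | 1.249 | 0.809 | 1.153 | 0.790 |
| | | 192 | 1.624 | 0.934 | 1.615 | 0.902 | 1.620 | 0.950 | 1.228 | 0.808 | 1.095 | 0.792 |
| | | 336 | 1.621 | 0.945 | 1.621 | 0.900 | 1.220 | 0.814 | 1.163 | 0.811 | 1.049 | 0.775 |
| | 2-year gap | 96 | 1.201 | 0.785 | 1.331 | 0.823 | 1.414 | 0.862 | 1.124 | 0.749 | 0.969 | 0.702 |
| | | 192 | 1.287 | 0.831 | 1.061 | 0.726 | 1.109 | 0.755 | 1.058 | 0.722 | 0.941 | 0.737 |
| | | 336 | 1.260 | 0.810 | 0.996 | 0.693 | 1.011 | 0.695 | 0.960 | 0.749 | 0.917 | 0.715 |

Comment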

Thank you for the authors' response, which addressed some of my concerns. The authors supplemented extensive experimental results of spatiotemporal baselines on the datasets, making the work more complete, and I will increase my score to acknowledge this improvement. However, I still believe that the main contribution of this paper lies in expanding the temporal range of data collection and releasing some new datasets. Compared to technical papers at ICLR, the contribution seems insufficient. As a dataset paper, I think it might be more suitable for publication in dedicated tracks (such as NeurIPS's dataset and benchmark track).

Comment

We thank the reviewer for taking the time to assess our manuscript and offering valuable suggestions.

Q1: Insufficient contribution

Compared to existing works like LargeST, this work's contribution is insufficient as it merely expands the temporal and spatial scope of data collection. These data can be obtained through the open-source PEMS system in the same way.

We acknowledge your concern regarding the comparison with existing works like LargeST and the perceived insufficiency in our contribution. It is true that our dataset, like LargeST, is derived from the open-source PEMS system. However, our primary contribution lies in the unprecedented expansion of the temporal dimension, which is a significant departure from datasets that focus more on spatial expansion, such as LargeST. Our dataset's unique value proposition is its ability to support extremely long-term traffic forecasting with gaps, a capability not commonly found in other datasets. This feature is particularly relevant for practical applications such as long-range urban planning and infrastructure development, which require traffic volume assessments years in advance. By providing data that spans over two decades, we enable researchers and planners to assess traffic patterns and trends over an extended period, which is crucial for strategic planning and decision-making. Furthermore, we have enriched our dataset with various baselines to highlight the challenges of the traffic forecasting domain, especially with our gap setting. This approach not only demonstrates the dataset's utility but also contributes to the field by providing a benchmark for evaluating the performance of different models in long-term forecasting scenarios. In summary, while our dataset may not exceed LargeST in terms of spatial dimensions, its extensive temporal coverage and the inclusion of gap settings offer a novel and valuable resource for traffic forecasting research and applications.

Q2: Regional limitation of the dataset

Although the number of regions and time span are substantial, the vast majority of data is limited to California. Including data from more cities and countries might have been a better approach.

Yes, we agree with your perspective that focusing solely on traffic data from California is limiting. Datasets from different countries and regions would indeed enrich the understanding of traffic conditions in various cities. Therefore, our dataset also includes a 13-year traffic dataset from New South Wales, Australia, which is complemented by the same set of baselines to ensure comparability and analysis across different geographical contexts.

Q3: Additional traffic forecasting baselines

Most baselines in the experiments are specifically designed for time series prediction, lacking models specifically designed for traffic prediction/spatiotemporal prediction, such as PDFormer[1], ASTGCN[2], STGCN[3], AGCRN[4], GWN[5], etc. It is important to explore the performance of these types of baselines on traffic datasets.

We agree with your suggestion. The reason we chose baselines based on the long-term forecasting domain for our experiments is that our experimental setup is more closely aligned with multivariate long-term forecasting. Since the introduction of Informer, the development of baselines in this field has been particularly vibrant, and we have benefited from the widely available open-source repositories that facilitate easy migration and reproduction. However, we acknowledge that it is also necessary to consider baselines from the traffic forecasting domain (12-step forecasting).

Comment

We sincerely thank all the reviewers for their comments, which have greatly improved the quality of the paper. We have taken every comment seriously: we have added new baselines and new analysis, and clarified several confusing points. The following global rebuttal provides five more baselines and the computational statistics obtained during the rebuttal period.

  • (1) Following Reviewer ytp5’s suggestion, we have added five baselines, shown in Table 1.

Table 1. Comparisons with different traffic forecasting baselines

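| Dataset | Gap | Length | STGCN MSE | STGCN MAE | ASTGCN MSE | ASTGCN MAE | GWN MSE | GWN MAE | AGCRN MSE | AGCRN MAE | PDFormer MSE | PDFormer MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PEMS03_gap | 1-year gap | 96 | 0.556 | 0.536 | 0.765 | 0.618 | 0.676 | 0.574 | 0.596 | 0.562 | 0.621 | 0.576 |
| | | 192 | 0.562 | 0.539 | 0.764 | 0.626 | 0.581 | 0.546 | 0.565 | 0.543 | 0.545 | 0.536 |
| | | 336 | 0.561 | 0.538 | 0.717 | 0.598 | 0.582 | 0.543 | 0.580 | 0.552 | 0.574 | 0.548 |
| | 1.5-year gap | 96 | 1.256 | 0.884 | 1.441 | 0.959 | 1.250 | 0.875 | 1.168 | 0.866 | 1.256 | 0.899 |
| | | 192 | 1.187 | 0.872 | 1.395 | 0.947 | 1.194 | 0.856 | 1.122 | 0.843 | 1.117 | 0.842 |
| | | 336 | 1.167 | 0.853 | 1.273 | 0.877 | 1.1015 | 0.832 | 1.176 | 0.862 | 1.182 | 0.871 |
| | 2-year gap | 96 | 2.059 | 1.176 | 2.236 | 1.234 | 1.987 | 1.148 | 1.950 | 1.150 | 2.068 | 1.189 |
| | | 192 | 2.055 | 1.184 | 2.180 | 1.213 | 1.877 | 1.117 | 1.746 | 1.070 | 1.897 | 1.131 |
| | | 336 | 2.022 | 1.169 | 1.889 | 1.117 | 1.847 | 1.107 | 1.930 | 1.140 | 1.982 | 1.161 |
| PEMS04_gap | 1-year gap | 96 | 1.233 | 0.904 | 1.311 | 0.929 | 1.310 | 0.87 | 1.290 | 0.865 | 1.101 | 0.851 |
| | | 192 | 1.398 | 1.025 | 1.443 | 1.023 | 1.255 | 0.879 | 1.318 | 0.907 | 1.078 | 0.816 |
| | | 336 | 1.505 | 1.104 | 1.325 | 0.938 | 1.292 | 0.865 | 1.302 | 0.899 | 1.085 | 0.821 |
| | 1.5-year gap | 96 | 1.369 | 1.082 | 1.350 | 0.937 | 1.684 | 1.063 | 1.222 | 0.961 | 1.097 | 0.849 |
| | | 192 | 1.642 | 1.035 | 1.549 | 1.075 | 1.501 | 0.988 | 1.346 | 0.989 | 1.214 | 0.903 |
| | | 336 | 1.368 | 0.862 | 1.199 | 0.832 | 1.584 | 0.997 | 0.231 | 0.963 | 1.130 | 0.854 |
| | 2-year gap | 96 | 1.653 | 1.074 | 1.247 | 0.894 | 1.669 | 1.057 | 1.236 | 0.901 | 1.099 | 0.861 |
| | | 192 | 1.545 | 1.023 | 1.356 | 0.948 | 1.554 | 1.041 | 1.269 | 0.915 | 1.159 | 0.891 |
| | | 336 | 1.556 | 1.011 | 1.174 | 0.842 | 1.128 | 0.869 | 1.314 | 0.947 | 1.032 | 0.816 |
| PEMS08_gap | 1-year gap | 96 | 2.531 | 1.325 | 3.119 | 1.404 | 2.185 | 1.280 | 2.158 | 1.220 | 1.921 | 1.148 |
| | | 192 | 2.560 | 1.337 | 2.405 | 1.276 | 2.223 | 1.311 | 2.144 | 1.213 | 2.248 | 1.260 |
| | | 336 | 2.482 | 1.317 | 2.086 | 1.181 | 2.228 | 1.312 | 2.036 | 1.185 | 1.994 | 1.179 |
| | 1.5-year gap | 96 | 1.772 | 1.084 | 2.370 | 1.192 | 1.309 | 0.896 | 1.166 | 0.808 | 1.664 | 1.055 |
| | | 192 | 1.522 | 0.979 | 1.730 | 1.043 | 1.074 | 0.805 | 1.144 | 0.810 | 1.518 | 1.012 |
| | | 336 | 1.495 | 0.966 | 1.444 | 0.948 | 1.533 | 0.999 | 1.078 | 0.784 | 1.049 | 0.801 |
| | 2-year gap | 96 | 1.336 | 0.868 | 1.962 | 1.069 | 1.789 | 1.017 | 1.393 | 0.856 | 1.071 | 0.792 |
| | | 192 | 1.218 | 0.819 | 1.292 | 0.795 | 1.007 | 0.761 | 1.229 | 0.813 | 1.171 | 0.826 |
| | | 336 | 1.181 | 0.806 | 1.227 | 0.842 | 1.181 | 0.813 | 1.246 | 0.814 | 1.003 | 0.768 |
| tfNSW_gap | 1-year gap | 96 | 1.176 | 0.776 | 1.404 | 0.874 | 1.451 | 0.871 | 1.283 | 0.824 | 1.070 | 0.756 |
| | | 192 | 1.495 | 0.880 | 1.547 | 0.889 | 1.580 | 0.904 | 1.267 | 0.828 | 1.019 | 0.753 |
| | | 336 | 1.517 | 0.901 | 1.617 | 0.910 | 1.534 | 0.892 | 1.048 | 0.786 | 0.822 | 0.710 |
| | 1.5-year gap | 96 | 1.220 | 0.796 | 1.286 | 0.813 | 1.598 | 0.940 | 1.249 | 0.809 | 1.153 | 0.790 |
| | | 192 | 1.624 | 0.934 | 1.615 | 0.902 | 1.620 | 0.950 | 1.228 | 0.808 | 1.095 | 0.792 |
| | | 336 | 1.621 | 0.945 | 1.621 | 0.900 | 1.220 | 0.814 | 1.163 | 0.811 | 1.049 | 0.775 |
| | 2-year gap | 96 | 1.201 | 0.785 | 1.331 | 0.823 | 1.414 | 0.862 | 1.124 | 0.749 | 0.969 | 0.702 |
| | | 192 | 1.287 | 0.831 | 1.061 | 0.726 | 1.109 | 0.755 | 1.058 | 0.722 | 0.941 | 0.737 |
| | | 336 | 1.260 | 0.810 | 0.996 | 0.693 | 1.011 | 0.695 | 0.960 | 0.749 | 0.917 | 0.715 |

  • (2) Following Reviewer 6qSU’s suggestion, we have provided training-time statistics for the largest and smallest datasets, shown in Table 2. These show that our dataset and the proposed settings can be trained easily on a single V100 GPU.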

Table 2. Training time per epoch on the largest (PEMS04_gap) and smallest (tfNSW) datasets

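| Dataset | Gap | Length | Mamba MSE | Mamba MAE | iTransformer MSE | iTransformer MAE | DLinear MSE | DLinear MAE | Autoformer MSE | Autoformer MAE | Informer MSE | Informer MAE | FEDFormer MSE | FEDFormer MAE | MICN MSE | MICN MAE | PatchTST MSE | PatchTST MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tfNSW_gap | 1-year gap | 96 | 1.480 | 0.845 | 1.236 | 0.822 | 1.166 | 0.748 | 1.320 | 0.879 | 1.354 | 0.777 | 1.377 | 0.895 | 1.219 | 0.778 | 0.946 | 0.778 |
| | | 192 | 1.543 | 0.867 | 1.088 | 0.733 | 1.184 | 0.758 | 1.599 | 0.961 | 1.229 | 0.757 | 1.312 | 0.890 | 1.262 | 0.799 | 1.026 | 0.821 |
| | | 336 | 1.459 | 0.847 | 1.099 | 0.739 | 1.196 | 0.763 | 1.477 | 0.934 | 1.242 | 0.776 | 1.219 | 0.838 | 1.366 | 0.833 | 0.949 | 0.774 |
| | 1.5-year gap | 96 | 1.745 | 0.944 | 1.355 | 0.877 | 1.372 | 0.846 | 1.658 | 1.016 | 1.615 | 0.879 | 1.440 | 0.919 | 1.204 | 0.778 | 1.025 | 0.807 |
| | | 192 | 1.784 | 0.968 | 1.260 | 0.818 | 1.367 | 0.846 | 1.666 | 0.997 | 1.484 | 0.854 | 1.452 | 0.932 | 1.251 | 0.789 | 1.058 | 0.823 |
| | | 336 | 1.628 | 0.939 | 1.284 | 0.835 | 1.385 | 0.854 | 1.515 | 0.945 | 1.500 | 0.883 | 1.346 | 0.871 | 1.203 | 0.770 | 0.995 | 0.788 |
| | 2-year gap | 96 | 1.364 | 0.857 | 1.019 | 0.782 | 1.008 | 0.742 | 1.319 | 0.907 | 1.101 | 0.713 | 1.318 | 0.879 | 1.033 | 0.745 | 0.848 | 0.727 |
| | | 192 | 1.195 | 0.783 | 0.884 | 0.701 | 1.024 | 0.749 | 1.319 | 0.891 | 1.054 | 0.721 | 1.222 | 0.855 | 1.094 | 0.737 | 0.909 | 0.757 |
| | | 336 | 1.588 | 0.903 | 0.907 | 0.730 | 1.166 | 0.748 | 1.233 | 0.849 | 1.787 | 0.639 | 1.056 | 0.766 | 0.959 | 0.691 | 0.865 | 0.714 |

Comment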

Dear Reviewers,

Thank you very much for your effort. As the discussion period is coming to an end, please acknowledge the author responses and adjust the rating if necessary.

Sincerely, AC

Comment

Dear Reviewers,

As you are aware, the discussion period has been extended until December 2. Therefore, I strongly urge you to participate in the discussion as soon as possible if you have not yet had the opportunity to read the authors' response and engage in a discussion with them. Thank you very much.

Sincerely, Area Chair

AC Meta-Review

This paper presents XXLTraffic, the dataset with the longest timespan, collected from Los Angeles, USA, and New South Wales, Australia, and curated to support research on extremely long forecasting beyond test adaptation. The reviewers and I believe that providing this benchmark is a very meaningful contribution to the research community. However, as some reviewers pointed out, a benchmark itself should have enough novelty to open up or support a new research problem, and the reviewers feel that extending the temporal range of a benchmark dataset may not be sufficient from that perspective. I agree with the reviewers' opinions. When I have submitted benchmark papers to the NeurIPS Datasets & Benchmarks track, I have often received similar comments. Overall, considering the reviewers' opinions, I regret to recommend rejection.

Additional Comments on Reviewer Discussion

Reviewers ytp5 and YRBw pointed out a lack of contribution, but the authors disagreed with their points. I understand the perspectives of both the reviewers and the authors. I also believe that a benchmark paper should be novel in terms of the problems the dataset can be used to address.

Final Decision

Reject

Withdrawal Notice

withdraw