A Large-scale Dataset and Benchmark for Commuting Origin-Destination Flow Generation
This paper provides a large-scale dataset including commuting origin-destination matrices of over 3000 areas in the United States for training and benchmarking origin-destination flow modeling methods.
Abstract
Reviews and Discussion
This paper introduces a large-scale dataset, LargeCommutingOD, which captures commuting origin-destination (OD) flows for 3,333 diverse regions across the United States, spanning 9,372,610 square kilometers and encompassing a variety of urban environments. Compared to existing datasets, LargeCommutingOD offers broader and more comprehensive coverage and is made publicly available. The authors benchmark several baseline models using a standardized evaluation protocol, uncovering that network-based generative models could be a promising avenue for future research in this domain.
Strengths
- The authors present a large-scale and comprehensive dataset, LargeCommutingOD, which includes commuting OD flows for 3,333 regions, covering 9,372,610 square kilometers across diverse urban environments in the United States. This dataset provides more extensive coverage than existing resources and is openly accessible, promoting transparency and reproducibility.
- By comparing LargeCommutingOD to previous datasets, the authors demonstrate that it captures a broader range of urban characteristics and types, further supporting its utility for various research purposes.
- The authors implement and evaluate several baseline models to demonstrate the feasibility and relevance of their dataset. Notably, they identify that network-based generative models show considerable potential, which opens avenues for further investigation.
- Upon reviewing the provided code, I observed that the authors have implemented a comprehensive set of baselines, which could even serve as a robust library for OD flow generation. I hope the authors can commit to releasing this code if the paper is accepted.
Weaknesses
- Given the dataset's coverage of 9,372,610 square kilometers and its focus on approximately 3,000 regions, the OD metrics may primarily reflect macro-level commuting patterns rather than fine-grained dynamics within individual cities. This limitation may restrict the dataset's applicability for studies requiring detailed intra-city analyses.
- The OD matrix, whose dimension exceeds 3,000, may be sparse. Providing a visual representation, such as a heatmap of the OD matrix, would clarify its density and structure, which could help readers better understand the data distribution.
- The authors included a non-anonymized GitHub link (https://github.com/tsinghuafib-lab/CommutingODGen-Dataset) in the submission, potentially compromising anonymity. And the authors have specified that only their names need to be anonymized.
Questions
Please explain the questions.
Thank you for appreciating our work and for your careful reading of the GitHub link. We apologize for the oversight in not anonymizing this link.
After double-checking ICLR's double-blind submission policy, we can confirm that, although the link leads to a page that is not completely anonymized, its contents are consistent with the anonymized repository and do not reveal our identity. Therefore, the information on that page does not violate the guidelines on the double-blind rule, which state: "Submissions will be double blind: reviewers cannot see author names when conducting reviews, and authors cannot see reviewer names. Having papers on arxiv is allowed per the dual submission policy outlined below.". As such, we believe that our submission is still entitled to a review.
If, after reviewing the policy, you have no further concerns regarding anonymization, we would like to request your full review.
Sorry for the misunderstanding. I have replaced the review with the correct one, and I have increased my score since you addressed my concerns about anonymity.
Thank you for your timely comment. We will upload a new review as soon as possible.
Thank you for your valuable time and careful review of our manuscript. We have carefully considered your insightful comments and look forward to further discussions. The revisions made to the manuscript based on your suggestions have been marked in red for clarity.
W1: The dataset reflects macro-level commuting patterns but may lack detail for intra-city analyses.
Thank you for your insightful comment. Our dataset indeed primarily captures macro-level commuting OD flows, which describe peak-period mobility patterns between sub-regions within urban areas. Specifically, the dataset includes over 3,000 urban areas, further subdivided into more than 70,000 internal sub-regions. The commuting OD flows detail the interactions between these sub-regions within individual cities, providing a comprehensive representation of intra-city commuting dynamics.
However, as you pointed out, our dataset currently lacks finer-grained data on other types of intra-city dynamics beyond commuting, such as movements for shopping, leisure, healthcare, and dining. This limitation may restrict the dataset's applicability for studying these detailed urban activities. Following your suggestion, we have explicitly discussed this limitation in Section 3.4 of the manuscript. We also plan to address this gap in future work by incorporating data that captures a broader range of urban mobility behaviors, further enhancing the dataset's utility.
W2: Heatmaps of sparse OD matrices could clarify density and structure, aiding data understanding.
Thank you for the valuable suggestion. We have visualized the OD matrices of different types of cities using heatmaps, which indeed highlight their sparsity. These visualizations and corresponding analyses are now included as Figure 8 in Appendix A.1 of the revised version.
Thank you for your response. Your response addressed most of my concerns. Regarding W2, I would like to further discuss the sparsity of the collected OD matrices. Specifically, are the other datasets you evaluated also sparse? My concern is that if the OD matrices are inherently sparse, generating full OD matrices directly could result in increased time and space complexity compared to directly generating OD transactions. This observation might point toward a more flexible and efficient generation paradigm worth exploring.
We sincerely appreciate this insightful and essential question. We discuss it from the following two perspectives:
- Not all OD matrices are inherently sparse. The sparsity of OD matrices primarily arises in large cities, where many distant region pairs lack flow interactions. However, not all cities are that large. Smaller cities, with fewer regions and closer spatial proximity, often have denser OD matrices. As shown in Figure 2(a) of the manuscript, city sizes follow a long-tail distribution, where most cities are small or medium-sized rather than large metropolitans. Consequently, most OD matrices are not sparse, and sparsity depends significantly on city size.
- You are absolutely correct, and your suggestion is highly valuable. For sparse OD matrices, directly generating the entire matrix may indeed be less efficient, as many zero flows carry limited significance but are treated equally with large OD flows during training. Addressing efficient modeling for sparse OD matrices and developing a flexible approach that considers cities of varying sizes are promising directions for future research. However, these challenges are beyond the scope of this work.
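The trade-off discussed in the two points above can be made concrete with a small sketch. It assumes an OD matrix stored as a dense NumPy array; the function names `od_density` and `to_od_triples` and the toy numbers are illustrative and are not taken from the paper's code. The snippet measures the fraction of region pairs with nonzero flow and converts the matrix into (origin, destination, flow) triples, whose storage grows with the number of nonzero flows rather than with the square of the number of regions.

```python
import numpy as np

def od_density(od: np.ndarray) -> float:
    """Fraction of origin-destination pairs with nonzero flow."""
    return float(np.count_nonzero(od)) / od.size

def to_od_triples(od: np.ndarray) -> np.ndarray:
    """Return a (k, 3) array of (origin, destination, flow) rows for nonzero flows."""
    origins, dests = np.nonzero(od)
    return np.column_stack([origins, dests, od[origins, dests]])

# Toy 4-region city (hypothetical numbers, not taken from the dataset).
od = np.array([
    [0, 12, 0, 0],
    [7,  0, 3, 0],
    [0,  0, 0, 0],
    [0,  5, 0, 0],
])
density = od_density(od)      # share of region pairs that carry any flow
triples = to_od_triples(od)   # sparse "transaction" view of the same data
```

For a small, dense city the triple list offers little saving, which matches the authors' point that sparsity depends on city size.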
We greatly appreciate your thoughtful feedback. Please do not hesitate to share any further questions or concerns.
Thank you for your response. My concerns are addressed.
This paper constructs a large-scale dataset (LargeCommuingOD) containing commuting OD flows for 3,333 diverse areas around the United States, covering 9,372,610 km² and including a wide range of urban environments. Based on LargeCommuingOD, the authors benchmark existing widely used models for commuting OD flow generation. These efforts make up for the lack of a comprehensive dataset and the absence of a unified and systematic evaluation. Furthermore, the paper finds that network-based modeling of commuting OD flows, supported by the dataset, gives promising performance. Thus, training on a large number of commuting OD networks can help generative models capture both universal and distinct mobility patterns at the city level, and therefore enhance their generalization ability.
Strengths
S1. The paper constructs a dataset covering 3,333 areas around the United States, providing a much broader spatial scale than existing datasets.
S2. The dataset covers metropolitan areas, towns, and rural areas, making it more comprehensive than existing datasets, which usually focus on a single type of urban environment.
S3. Based on LargeCommuingOD, the authors benchmark existing widely used models for commuting OD flow generation, which makes up for the absence of a unified and systematic evaluation.
Weaknesses
W1. Please clarify whether WEDAN is a newly proposed model or whether the idea has already been proposed elsewhere.
W2. In Sec. 2.2, the link to the released dataset is given in plaintext, which may violate the anonymity policy.
Questions
Please refer to the Weaknesses.
Thank you for your valuable time and careful review of our manuscript. We have carefully considered your insightful comments and look forward to further discussions. The revisions made to the manuscript based on your suggestions have been marked in red for clarity.
W1: Is WEDAN a newly proposed model or an existing idea?
Thank you for your question. We agree that our paper could benefit from a more detailed discussion of the work related to the WEDAN model. Following your advice, we have added the following discussion to the revised version. WEDAN (Weighted Edges Diffusion condition on Attributed Nodes) is a novel and exploratory model that applies denoising diffusion-based graph generation models from a network perspective to commuting OD flow generation. The key novelties of WEDAN lie in two aspects:
- It models all OD flows within a city as a directed weighted network, considering the entire OD network as a single data sample.
- It utilizes the features of all regions (nodes) in the OD network as guidance for the diffusion model, enabling the generation of all edges and their corresponding weights.
It is worth noting that GraphMaker [1] also generates attributed graphs, but they differ significantly: WEDAN is specifically designed for the commuting OD flow generation task, emphasizing that each OD flow is influenced by the attributes of its origin and destination nodes, resulting in continuous flow volumes. In contrast, [1] focuses on generating large, sparse graphs by determining the existence of edges between nodes. Additionally, some works [2,3] generate both nodes and edge weights simultaneously, emphasizing the coupling between nodes and edges rather than using node attributes to guide edge generation.
[1] Li Mufei, et al. "Graphmaker: Can diffusion models generate large attributed graphs?." arXiv preprint arXiv:2310.13833 (2023).
[2] Jo, Jaehyeong, Seul Lee, and Sung Ju Hwang. "Score-based generative modeling of graphs via the system of stochastic differential equations." International conference on machine learning. PMLR, 2022.
[3] Clement Vignac, et al. Digress: Discrete denoising diffusion for graph generation. ICLR 2023.
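To illustrate the "one city, one graph" view described in the response above, the following sketch packs a city's region features and OD matrix into a single sample and lists its directed weighted edges. The class `CityODGraph`, its field names, and the toy values are hypothetical and are not taken from the WEDAN implementation; the diffusion model itself is omitted.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CityODGraph:
    """One city as one data sample: node attributes plus a weighted adjacency."""
    node_features: np.ndarray  # shape (n_regions, n_features); the conditioning input
    flows: np.ndarray          # shape (n_regions, n_regions); flows[i, j] = commuters i -> j

    def __post_init__(self):
        n = self.node_features.shape[0]
        if self.flows.shape != (n, n):
            raise ValueError("flows must be square and match the number of regions")

    def edges(self):
        """Directed weighted edges (origin, destination, weight) with nonzero flow."""
        src, dst = np.nonzero(self.flows)
        return [(int(i), int(j), float(self.flows[i, j])) for i, j in zip(src, dst)]

# Hypothetical 3-region city with 2 features per region.
city = CityODGraph(
    node_features=np.array([[1.0, 0.2], [0.5, 0.9], [0.1, 0.4]]),
    flows=np.array([[0.0, 30.0, 0.0], [10.0, 0.0, 5.0], [0.0, 0.0, 0.0]]),
)
edge_list = city.edges()
```

Note that the graph is directed: the flow from region 0 to region 1 and the flow from region 1 to region 0 are separate edges with different weights.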
W2: The plaintext dataset link in Sec. 2.2 may violate the anonymity policy.
Thank you for carefully reviewing and pointing out our oversight. We apologize for forgetting to anonymize the link in the main text in Sec. 2.2.
After thoroughly re-examining ICLR's double-blind submission policy, we have confirmed that, although the link leads to a page that is not fully anonymized, its content aligns with the anonymized repository and does not reveal the identity of authors. Therefore, the information on the linked page does not violate the double-blind policy, which states: "Submissions will be double blind: reviewers cannot see author names when conducting reviews, and authors cannot see reviewer names. Having papers on arxiv is allowed per the dual submission policy outlined below." Consequently, we believe our submission follows the guidelines and its evaluation should not be affected by this matter.
Thank you for your understanding. Please let us know if you have any further concerns or aspects you would like to discuss in more detail. We would greatly appreciate the opportunity for a deeper discussion.
Thanks for your response, and my concerns have been addressed. I have updated my score.
Commuting OD flows are critical inputs for urban planning and transportation, while the high data collection cost results in a lack of high-quality datasets. This paper introduces a large-scale dataset containing commuting OD flows for 3,333 areas including a wide range of urban environments around the United States. Based on this dataset, authors further benchmark widely used models for commuting OD flow generation. They find that, owing to the rich information contained in the constructed OD dataset, a generative method that considers each OD matrix collected from a county/city as a network and learns to generate it given conditional information can significantly outperform other methods. This may point to a new direction in this domain, which relies on collection of high-quality OD datasets.
Strengths
- Commuting OD data is valuable for building smart cities, but such data is generally lacking in many areas around the world. Synthetic data generation is a promising solution to this problem, but it requires high-quality, high-volume data for training ML models. This paper makes a good contribution towards this objective.
- According to Table 1, the constructed dataset is more useful than existing ones, in terms of size and richness. The data analysis is sufficient, as in Figure 2-4.
- The benchmark is designed in a reasonable manner. It covers a wide range of mainstream approaches in this field and evaluates model performance comprehensively, including both flow generation accuracy and property distribution similarity.
- The observation based on the benchmark results is insightful. Based on the constructed large-scale OD dataset, which is unseen in previous works, the authors demonstrate the power of graph generative modeling for generating OD flows for diverse cities. This may point to a new direction in this domain.
Weaknesses
- The presented results are informative, but the explanations are rather insufficient. For example, Figures 6 and 7 should be explained in detail.
- Although authors discuss the limitations of the constructed dataset, the spatial scale (limited to the United States) and temporal scale (one year) can degrade its usefulness in applications.
Questions
- Can authors demonstrate that models developed based on this US dataset can manage to transfer to other countries?
- In Figure 1, why are Commuting Flows processed to generate Regional Socio-demographics?
Q1: Can the models trained on this US dataset transfer to other countries?
Thank you for your question. To answer this, we conducted a transfer experiment from the US to the UK using representative models. The details of the experiment are given below and have been added to Appendix C.4 of the revised manuscript. Specifically, we generated commuting OD flows for 326 Local Authority Districts (LADs), where the flows are between Middle Layer Super Output Areas (MSOAs). The ground truth was obtained from the Office for National Statistics (ONS) of the UK. It is very difficult to obtain regional features in the UK with the same format and semantics as in the US. Therefore, we used satellite images of regions to represent the input features consistently across the two countries.
| Model | CPC↑ | RMSE↓ | NRMSE↓ |
|---|---|---|---|
| GM-P | 0.240 | 101.6 | 1.752 |
| RF | 0.334 | 223.2 | 3.847 |
| DGM | 0.359 | 157.0 | 2.706 |
| GMEL | 0.362 | 149.1 | 2.570 |
| NetGAN | 0.331 | 198.9 | 3.429 |
| WEDAN | 0.485 | 72.68 | 1.253 |
The results indicate that models trained on the US dataset exhibit some transferability to other countries, particularly to developed countries like the UK. WEDAN benefits from graph generative modeling, achieving the best performance. However, the transferability cannot always be guaranteed, as there may be significant differences between countries. We aim to explore this direction in future work.
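For readers less familiar with the columns in the table above, the following is a minimal sketch of how such metrics are commonly computed on a pair of OD matrices. CPC is the Common Part of Commuters. The thread does not state how NRMSE is normalized, so dividing by the standard deviation of the ground-truth flows is an assumption here; the roughly constant RMSE/NRMSE ratio across the rows of the table is at least consistent with a fixed, model-independent normalizer. Function names and toy values are illustrative.

```python
import numpy as np

def cpc(true_od, pred_od) -> float:
    """Common Part of Commuters: 2 * sum(min(T, P)) / (sum(T) + sum(P)), in [0, 1]."""
    t = np.asarray(true_od, dtype=float)
    p = np.asarray(pred_od, dtype=float)
    total = t.sum() + p.sum()
    if total == 0:
        return 1.0  # convention chosen here for two all-zero matrices (an assumption)
    return float(2.0 * np.minimum(t, p).sum() / total)

def rmse(true_od, pred_od) -> float:
    """Root mean squared error over all origin-destination pairs."""
    diff = np.asarray(true_od, dtype=float) - np.asarray(pred_od, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def nrmse(true_od, pred_od) -> float:
    """RMSE divided by the std of the ground-truth flows (assumed normalization)."""
    return rmse(true_od, pred_od) / float(np.std(np.asarray(true_od, dtype=float)))

# Toy 2-region example (hypothetical numbers).
true_od = [[0, 10], [20, 0]]
pred_od = [[0, 10], [10, 10]]
cpc_value = cpc(true_od, pred_od)    # overlap of 20 out of 30 + 30 commuters
rmse_value = rmse(true_od, pred_od)
```

Higher CPC and lower RMSE or NRMSE indicate better agreement with the ground truth, matching the arrows in the table header.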
Q2: In Figure 1, why are Commuting Flows processed to generate Regional Socio-demographics?
Thank you for raising this point. We apologize if Figure 1 led to any misunderstanding. As detailed in lines 193–195 of the manuscript, we conducted regression analysis using OD flows as a reference to identify the most relevant and representative indicators from a large pool of potential mobility-related urban attributes. This process ensures that the selected indicators best capture the underlying mobility patterns.
To address this potential confusion, we have updated Figure 1 by adding "correlation & selection" to the arrow. This adjustment more accurately reflects the data collection pipeline and clarifies the meaning of Figure 1.
Thank you for your response. My concerns are addressed. It would be great to add this content to a later version.
Thank you for your valuable time and careful review of our manuscript. We have carefully considered your insightful comments and look forward to further discussions. The revisions made to the manuscript based on your suggestions have been marked in red for clarity.
W1: The results are informative, but the explanations, such as for Figures 6 and 7, need more detail.
Thanks for your valuable suggestion. We have carefully reviewed Figures 6 and 7 and would like to provide more detailed explanations in Appendix C.5 and C.6, which are as follows:
- Figure 6: From Figure 6(a), we observe that all models tend to perform better in smaller cities in terms of CPC, with performance declining as city size increases. This trend can be attributed to the increasing heterogeneity in OD flow distributions in larger cities. Smaller cities often have more homogeneous region-pairs with short-distance flows, making predictions relatively easier. In contrast, larger cities have both short-distance and long-distance commuting, leading to a long-tailed distribution of OD flows and higher heterogeneity, which increases prediction difficulty. Figure 6(b) further illustrates that smaller cities tend to have higher RMSE values. This is because smaller cities typically exhibit higher flow volumes due to a prevalence of short-distance commuting, which increases the absolute prediction error. Conversely, larger cities often have sparser OD flows between many distant regions, with certain extreme flows contributing large values but overall lower flow volumes, resulting in smaller RMSE. Figures 6(c) and (d) support similar conclusions for cities of varying structures. For larger monocentric and polycentric cities, models like DiffODGen, which incorporate hierarchical designs for large cities, perform well. However, DiffODGen struggles with the "others" category, typically smaller cities, where its performance is less reliable. In contrast, WEDAN, benefiting from large-scale training data, demonstrates robust performance across all city sizes and structures.
- Figure 7: Figure 7 reveals that cities of different types share certain common human mobility patterns, supporting the feasibility of using a unified model to learn mobility patterns across diverse cities. Modeling both commonalities and distinctions between cities helps enhance the model's generalization capability. Figures 7(a) and (b) show that both monocentric and polycentric cities achieve high performance during training, likely because these city types cover a wide range of human mobility patterns. However, Figures 7(c) and (d) highlight that training solely on small or large cities fails to achieve strong transferability across each other. This result demonstrates the existence of differences in human mobility patterns across city types while also highlighting the value of training on a diverse set of city types. Such diverse training data enables the model to effectively capture both the differences and the shared mobility patterns between cities.
Based on these detailed insights, we hope our findings clarify the results and enhance their presentation.
W2: The spatial scale (limited to US) and temporal scale (one year) can limit the applicability.
Thank you for your insightful comments. While our dataset already represents a significant advancement over existing datasets in terms of scale and diversity, we agree that including data from other countries and multiple years would greatly enhance the dataset’s applicability, such as enabling global OD flow generation and long-term trend analysis. Therefore, these limitations have been discussed in Section 3.4 of the manuscript.
We will continuously maintain and expand this dataset and benchmark by incorporating data from additional countries, covering multiple years, and integrating the latest models. This will make the dataset more accessible and applicable to a wider range of fields over time.
This paper introduces a large-scale dataset called LargeCommuingOD for commuting Origin-Destination (OD) flow generation and provides a benchmark for evaluating models in this domain. The key contributions are:
- Construction of a comprehensive dataset covering 3,333 diverse areas across the United States, including metropolitan areas, towns, and rural regions. The dataset spans 9,372,610 km² and contains commuting OD flows, sociodemographic data, and point-of-interest distributions for each area.
- Development of a benchmark framework to evaluate and compare different commuting OD flow generation models, addressing the lack of standardized evaluation in existing research.
- Benchmarking of 9 existing models, including physical models, classical machine learning approaches, and graph neural network models, using the new dataset.
- Introduction of a preliminary adaptation of graph diffusion models called WEDAN (Weighted Edges Diffusion condition on Attributed Nodes) for commuting OD flow generation.
- Analysis of model performance in terms of precision and generalizability across different urban environments.
The paper demonstrates that network-based generative models, particularly those leveraging graph diffusion techniques, achieve the best performance in both precision and generalization ability. This finding suggests new research directions in graph generative modeling for commuting OD flow generation.
The authors argue that their dataset and benchmark provide a valuable resource for researchers in urban planning, transportation, and related fields, enabling more comprehensive evaluation and development of commuting OD flow generation models.
Strengths
Originality:
- The creation of a large-scale, comprehensive dataset (LargeCommuingOD) for commuting Origin-Destination (OD) flow generation covering 3,333 diverse areas across the United States is highly original. This dataset significantly expands on previous efforts in terms of scale and diversity.
- The paper introduces a novel adaptation of graph diffusion models called WEDAN (Weighted Edges Diffusion condition on Attributed Nodes) for commuting OD flow generation, exploring a new paradigm in this field.
Quality:
- The dataset construction process is rigorous, combining multiple reliable data sources including the U.S. Census Bureau, American Community Survey, and OpenStreetMap.
- The analysis is thorough, examining both precision and generalizability of the models across different urban environments.
Clarity:
- The paper is well-structured, clearly defining the problem, describing the dataset, and presenting the benchmark results.
- Figures and tables effectively illustrate the dataset characteristics and model comparisons.
Weaknesses
- Limited Exploration of Model Interpretability: The paper emphasizes model performance but lacks a thorough examination of model interpretability, particularly for the network-based generative models that achieve the best results. Understanding the reasons behind their success could offer valuable insights for urban planners and policymakers.
- Insufficient Exploration of Edge Cases: The paper would benefit from a more in-depth analysis of how the models perform in extreme or atypical urban environments within the dataset. This could reveal potential limitations or areas for improvement.
- Lack of Discussion on Model Fairness and Bias: Given the diverse nature of the dataset, an analysis of potential biases in the models' predictions across different urban environments (e.g., rural vs. urban, high-income vs. low-income areas) would be beneficial.
- Basic Benchmark Methods: Many of the selected benchmark methods are too basic. The paper should include more advanced models based on transformers and diffusion, such as those discussed in this paper and this one.
- Questionable Significance of Claims: Are the claims regarding the significance of the research truly justified? Without this dataset, researchers in the field might only need about a week to conduct their own benchmarks. Additionally, there are already open-source frameworks, like LibCity, that have collected these benchmarks (see LibCity).
Questions
What do the authors see as the most promising directions for future research based on their findings? Are there specific areas where they believe the dataset and benchmark could be most impactful?
Q1: What do the authors see as the most promising directions for future research based on their findings?
We appreciate the reviewer’s question regarding future research directions. Based on our findings, we identify the following promising directions for future exploration:
- Advancing Graph Generative Modeling for OD Flow Generation: The results with WEDAN trained on our large scale dataset, which includes 3,333 commuting OD networks, demonstrate that leveraging graph generative methods to perform OD network modeling significantly enhances performance. Future research can further refine these techniques, exploring advanced generative models to improve both precision and scalability based on our dataset.
- Toward Larger and More Diverse Datasets: Our study reveals that training models on larger scale and more diverse datasets can substantially improve their ability to generalize across different urban environments. This represents a key direction for future work, focusing on building even richer datasets to push the boundaries of model performance.
- Enabling Global OD Flow Modeling: An exciting future direction lies in developing models capable of generating OD flows around the world. This requires integrating data from multiple continents, as well as enhancing model architectures to handle such heterogeneous spatial contexts.
Q2: Are there specific areas where they believe the dataset and benchmark could be most impactful?
Thank you for this insightful question. We believe our dataset and benchmark have the potential to make significant contributions across multiple domains:
- Our dataset is a valuable resource for advancing commuting OD flow generation techniques, providing both training data and a standardized benchmark for model evaluation. It fills the gap of benchmarks in this domain, enabling researchers to develop more accurate and generalizable methods under a fair and standardized framework.
- The dataset also contributes to graph generation research by serving as a large-scale collection of directed weighted graphs. Unlike many existing directed weighted graph datasets, such as web visitation networks (e.g., Google PageRank Graphs) and social networks (e.g., Epinions Trust Network, Twitter Retweet Graph), which focus on digital user interaction-based relationships, our dataset is rooted in real-world transportation systems, offering a rich resource for studying transportation and connectivity. Our dataset bridges the gap between theoretical graph research and practical urban applications. Compared to existing weighted transportation networks like OpenFlights, our dataset includes a larger number of networks, as it is constructed at the city level, with each city represented as an independent graph. This structure not only increases the sample size but also enables more granular and diverse analyses, offering significant value to graph generation and learning research.
- Beyond flow generation, the dataset facilitates broader SDGs (Sustainable Development Goals) research on mobility, such as urban mobility resilience [1], community detection [2], and epidemic control[5,6]. The commuting OD flows, which capture the most routine and frequent human movements within cities, provide an intuitive basis for tracing and inferring interactions between communities and the spread of diseases. Moreover, improving the generalization ability of OD generation tasks to unknown urban areas expands the boundaries of data accessibility for researchers. This breaks the limitations of studies relying solely on existing data and enables broader research to support the future development of previously understudied regions.
[1] Boyeong Hong, et al. Measuring inequality in community resilience to natural disasters using large-scale mobility data. Nature Communications 2021.
[2] Hamed Nilforoshan, et al. Human mobility networks reveal increased segregation in large cities. Nature 2023.
[3] Jayson S. Jia, et al. Population flow drives spatio-temporal distribution of COVID-19 in China. Nature 2020.
Thank you for your valuable time and careful review of our manuscript. We have carefully considered your insightful comments and look forward to further discussions. The revisions made to the manuscript based on your suggestions have been marked in red for clarity.
Ongoing Experiments on Model Interpretability, Edge Cases, and Fairness
Thank you for your valuable feedback. In response to your suggestions, we are planning to include additional experiments on model interpretability, edge case analyses, and fairness. Due to the substantial amount of work involved, we are currently conducting these experiments on a priority basis and will provide the results within the next three days.
W4: Basic Benchmark Methods
Many of the selected benchmark methods are too basic. The paper should include more advanced models based on transformers and diffusion, such as this paper and this one.
Response: Thank you for pointing out the need to include more advanced models, such as those based on transformers and diffusion, to strengthen the benchmark's comprehensiveness. Regarding the specific references you provided (e.g., Comparing Fairness of Generative Mobility Models), we have read this paper carefully. While it focuses on fairness evaluation and introduces metrics such as Demographic Parity, the models evaluated in that work (e.g., Gravity, Radiation, Deep Gravity, and Non-linear Gravity) are already included in our benchmark.
Following your valuable suggestion, we have added TransFlower. The experimental results have been added to Table 4, and the corresponding introduction and analysis of the model have been added to Section 4. For your convenience, we include part of Table 4 below.
| Model | CPC ↑ | RMSE ↓ | NRMSE ↓ | inflow ↓ | outflow ↓ | ODflow ↓ |
|---|---|---|---|---|---|---|
| DGM | 0.431 | 92.9 | 1.186 | 0.469 | 0.561 | 0.230 |
| TransFlower | 0.488 | 97.8 | 1.249 | 0.356 | 0.337 | 0.269 |
| WEDAN | 0.593 | 68.6 | 0.876 | 0.291 | 0.269 | 0.147 |
We can see that TransFlower incorporates a Transformer network structure based on DeepGravity, resulting in a significant performance improvement on CPC compared to DeepGravity. However, WEDAN, which adopts a graph generative modeling framework combined with a Transformer-based denoising network, still demonstrates a substantial performance advantage over TransFlower.
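The inflow, outflow, and ODflow columns in the table above compare distributions rather than individual flows. The thread does not say which divergence is used, so the sketch below assumes the Jensen-Shannon divergence (JSD) between histograms, shown here for inflows only; the function names, the bin count, and the toy values are illustrative assumptions, not the benchmark's implementation.

```python
import numpy as np

def jsd(p_counts, q_counts) -> float:
    """Jensen-Shannon divergence (base 2) between two histograms: 0 = identical, 1 = disjoint."""
    p = np.asarray(p_counts, dtype=float)
    q = np.asarray(q_counts, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing to the sum
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def inflow_jsd(true_od, pred_od, bins: int = 10) -> float:
    """JSD between the histograms of per-region inflow (column sums of the OD matrix)."""
    true_in = np.asarray(true_od, dtype=float).sum(axis=0)
    pred_in = np.asarray(pred_od, dtype=float).sum(axis=0)
    hi = max(true_in.max(), pred_in.max(), 1e-12)  # shared bin range for both histograms
    edges = np.linspace(0.0, hi, bins + 1)
    return jsd(np.histogram(true_in, edges)[0], np.histogram(pred_in, edges)[0])

# Toy 2-region example (hypothetical numbers).
true_od = [[0, 10], [20, 0]]
same_city = inflow_jsd(true_od, true_od)
```

The outflow variant would use row sums (`axis=1`) instead, and the ODflow variant would histogram the matrix entries directly.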
W5: Questionable Significance of Claims
Are the claims regarding the significance of the research truly justified? Without this dataset, researchers in the field might only need about a week to conduct their own benchmarks. Additionally, there are already open-source frameworks, like LibCity that have collected these benchmarks.
Response: We sincerely appreciate the reviewer’s thoughtful comments and the opportunity to clarify the significance of our work. Below, we address the concerns regarding the justification of our claims:
- Extensive Effort in Dataset Construction and Benchmarking: The dataset we introduce involves substantial effort in data collection, cleaning, alignment, and preprocessing. It covers the entire United States, spanning diverse geographic areas (covering 9,372,610 km²) and urban environments (including 3,333 diverse urban areas), and integrates multiple reliable data sources. The final dataset exceeds 3.5GB. Constructing this dataset required addressing numerous technical and logistical challenges to align, clean, and filter the data semantically and ensure its accuracy and usability. Additionally, we designed and implemented a comprehensive benchmark that evaluates a wide range of models, including physical models, statistical models, deep learning approaches, and generative models. This process demanded not only significant domain expertise but also considerable time and computational resources. Replicating the scale and comprehensiveness of our work would be highly challenging within such a short time.
- Limitations of Existing Frameworks like LibCity in OD Flow Generation: While works like LibCity offer valuable contributions in related domains, they primarily aggregate fragmented datasets from various works that address different problems. These datasets are not specifically curated or structured for commuting OD flow generation. Specifically, LibCity includes only one dataset with origin and destination information, NYCTAXI_OD, which is limited to a single city (New York City) and a single mode (taxi). In contrast, our dataset provides a systematically organized, large-scale collection of commuting OD flows covering the entire United States, including diverse urban areas and a variety of commuting modes. This comprehensive scope fills a critical gap in the field and enables research that existing benchmarks cannot support.
W2: Insufficient Exploration of Edge Cases
The paper would benefit from a more in-depth analysis of how the models perform in extreme or atypical urban environments within the dataset. This could reveal potential limitations or areas for improvement.
Response: We agree that analyzing edge cases is crucial to understanding model robustness. Following your suggestion, we designed experiments to evaluate how robustly the models in our benchmark handle extremely large OD flows. These flows account for only a small fraction of all flows but are highly impactful in capturing anomalous or uncommon mobility patterns. Specifically, we computed CPC on the top 5% largest OD flows in the test set and report it as a percentage of the overall CPC in Table 4. The results are presented in Figure 15, and detailed analyses have been added to Appendix B.2 of the revised manuscript.
From the results, we observe that the ability to handle edge cases improves as model complexity increases, likely due to stronger nonlinear fitting capabilities. Graph generative models perform best on the edge cases. However, these models represent the distribution of commuting OD flows continuously and smoothly in the latent space, so some performance degradation on edge cases is still observed. This is partly due to the strongly long-tailed distribution of OD flows: only a small number of extremely large flows exist, making it difficult to collect sufficient training data for these cases. We conclude that robustness on edge cases remains a challenge for such continuous modeling approaches in this field.
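The edge-case measurement described above can be sketched as follows. This is an illustrative reading of the description, not the authors' implementation: it assumes the top 5% is selected by observed flow size over flattened OD pairs, and the helper name `top_fraction_relative_cpc` and the toy data are hypothetical.

```python
def cpc(generated, observed):
    """Common Part of Commuters: 2 * sum(min(g, o)) / (sum(g) + sum(o))."""
    overlap = sum(min(g, o) for g, o in zip(generated, observed))
    total = sum(generated) + sum(observed)
    return 2.0 * overlap / total if total > 0 else 0.0

def top_fraction_relative_cpc(generated, observed, fraction=0.05):
    """CPC on the largest observed flows, as a percentage of the overall CPC.

    Assumption: "top 5% largest OD flows" means the 5% of OD pairs with the
    largest observed (ground-truth) flow.
    """
    # Sort OD pairs by observed flow, largest first.
    pairs = sorted(zip(observed, generated), key=lambda p: p[0], reverse=True)
    k = max(1, int(fraction * len(pairs)))  # keep at least one pair
    top_observed = [o for o, _ in pairs[:k]]
    top_generated = [g for _, g in pairs[:k]]
    overall = cpc(generated, observed)
    if overall == 0:
        return 0.0
    return 100.0 * cpc(top_generated, top_observed) / overall

# Toy example: 20 OD pairs; the model underestimates the single largest flow.
observed = list(range(1, 21))          # flows 1, 2, ..., 20
generated = list(range(1, 20)) + [10]  # exact everywhere except the largest

relative = top_fraction_relative_cpc(generated, observed)
```

A value well below 100 indicates that accuracy on the largest flows lags behind overall accuracy, which is the degradation pattern the response describes.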
W3: Lack of Discussion on Model Fairness and Bias
Given the diverse nature of the dataset, an analysis of potential biases in the models' predictions across different urban environments (e.g., rural vs. urban, high-income vs. low-income areas) would be beneficial.
Response: Thank you for highlighting this important point. We agree that fairness is a crucial yet often overlooked aspect in the previous research on commuting OD flow modeling. Following your suggestion, we conducted experiments to evaluate model performance across regions with different income levels for all benchmark methods, using metrics such as Demographic Parity as proposed in the paper you referenced in W4. Detailed descriptions of the experiments and in-depth discussions have been added to Appendix B.3 in the revised manuscript.
We observed that, based on the Demographic Parity metric, data-driven methods exhibited better fairness performance. This could be attributed to their ability to incorporate more diverse urban features, thereby being less influenced by population-based confounding factors.
However, when visualizing the CPC distributions across regions with different income levels in Figure 16, we found that although the accuracy metrics do not indicate that a model is more accurate for any one group of samples, the overall CPC distributions still differ between groups. As such, fairness evaluations based solely on Demographic Parity may not provide conclusive insights. Ensuring fairness in OD flow models is therefore both an interesting and a highly challenging research direction.
W1: Limited Exploration of Model Interpretability
The paper emphasizes model performance but lacks a thorough examination of model interpretability, particularly for the network-based generative models that achieve the best results. Understanding the reasons behind their success could offer valuable insights for urban planners and policymakers.
Response: We appreciate the reviewer's feedback on model interpretability and agree that deeper interpretability analyses strengthen the work. Following your feedback, we have added a discussion of model interpretability and conducted additional experiments to provide insight into how each model in our benchmark operates. Specifically, for the physical models we discuss the physical intuition behind them, focusing on the meaning of the variables in their formulas and the interpretation of their parameters. For SVR, we visualize the support vectors to reveal the intuition behind its predictions. For tree-based models, we quantify feature importance by calculating each feature’s contribution to reducing impurity (e.g., Gini index or information gain) during splits, a widely used approach. For NN-based predictive models, we use the SHapley Additive exPlanations (SHAP) method, along with commonly used feature visualization techniques, to provide a deeper understanding of their behavior. Finally, for graph diffusion models, we visualize the denoising process, a widely used technique, to trace how they generate OD flow data. The discussion and experimental details have been added in Appendix B.1 of the revised manuscript. We summarize the conclusions as follows:
- Simpler and more elegant models tend to have stronger interpretability, but their performance is often limited.
- Complex modeling can improve performance but at the cost of reduced interpretability.
- Exploring interpretable complex models or techniques, such as DeepGravity and TransFlower, to balance performance and interpretability is an important direction for future research.
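As an illustration of the impurity-based feature importance mentioned above for tree-based models, here is a minimal sketch. It assumes a regression setting, so variance is used as the impurity measure in place of the Gini index or information gain; the split records and the feature names `distance` and `population` are hypothetical and do not come from the paper.

```python
def variance(values):
    """Population variance, used here as the impurity of a regression node."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def impurity_decrease(parent, left, right, n_total):
    """Weighted impurity reduction of one split.

    (n_node / n_total) * (I(parent) - w_left * I(left) - w_right * I(right)),
    where the child weights are their shares of the parent's samples.
    """
    n = len(parent)
    children = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return (n / n_total) * (variance(parent) - children)

def feature_importance(splits, n_total):
    """Sum impurity decreases per feature and normalize them to sum to 1."""
    raw = {}
    for feature, parent, left, right in splits:
        gain = impurity_decrease(parent, left, right, n_total)
        raw[feature] = raw.get(feature, 0.0) + gain
    total = sum(raw.values())
    return {f: v / total for f, v in raw.items()} if total > 0 else raw

# Hypothetical two-split tree over four training flows.
splits = [
    ("distance", [1, 2, 9, 10], [1, 2], [9, 10]),  # root split
    ("population", [1, 2], [1], [2]),              # split of the left child
]
importance = feature_importance(splits, n_total=4)
```

In this toy tree almost all of the importance goes to `distance`, because the root split removes most of the variance in the flows.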
We attribute the strong performance of network-based generative models to their ability to capture city-level characteristics more effectively, thereby enhancing the consistency of OD flows across the entire city. Specifically, training these models on our large-scale dataset, which includes data from numerous cities, enables them to model the commonalities and differences in mobility patterns across cities at a higher, holistic level. This can be observed in Table 4, where graph diffusion-based models demonstrate distributions at both the node level and edge level that closely align with the real-world data while others do not. This will help the models achieve greater consistency without being affected by conflicting data from different cities. For example, OD pairs with similar features might exhibit different OD flows across cities due to city-wide influences, and these models are better equipped to handle such variations.
Dear Reviewer Zri5,
As the rebuttal period draws to a close in the next two days, we wanted to kindly check whether our responses have adequately addressed your concerns. Following your suggestions, we have incorporated experiments with the suggested model, TransFlower, and evaluated all models in the benchmark along three dimensions: interpretability, robustness, and fairness. Additionally, we have elaborated on the significance of the dataset and benchmark and on the insights the experimental results offer for future research directions.
If you have any remaining concerns, please let us know at your earliest convenience. Otherwise, we kindly ask if you would consider revisiting your score.
Thank you once again for your valuable time and efforts in helping us improve our work.
We look forward to hearing from you!
Best regards,
Authors
Dear Reviewer Zri5,
We hope this message finds you well. Your thoughtful feedback has been immensely helpful in improving our work, and we sincerely appreciate the valuable insights you have shared with us.
Nearly a week has passed, and as the rebuttal period is nearing its end, we would greatly value the opportunity to engage in a deeper discussion. Any additional suggestions or questions you might have would be truly invaluable in helping us further refine our research.
Best regards,
Authors
We sincerely thank all the reviewers for their valuable and constructive comments, as well as their detailed suggestions that have significantly improved the quality of our work. We are pleased that most reviewers acknowledged our key contributions, including the introduction of the large-scale dataset LargeCommutingOD, the comprehensive benchmarking of commuting OD flow generation models, and the insights into the potential of introducing network-based generative models.
Contributions and Significance of the Dataset and Benchmark
Reviewers (e.g., Reviewers Zri5, KbsU, and Rw5X) emphasized the originality and value of our dataset. We are glad the reviewers appreciated that LargeCommutingOD fills a critical gap by covering diverse urban environments across the U.S., enabling comprehensive research in commuting OD flow modeling. Additionally, as Reviewer KbsU highlighted, the benchmark offers a systematic and standardized framework for evaluating a variety of models, paving the way for more robust and generalizable techniques.
Detailed Responses to Key Concerns
- Inclusion of Advanced Models (Reviewer Zri5) We have incorporated an additional advanced model, TransFlower, into our benchmark. Results indicate that while TransFlower improves over traditional models, WEDAN still achieves superior generalization and precision because it models OD flows from a network perspective, as shown in Table 4 of the revised manuscript.
- Model Interpretability, Robustness, and Fairness (Reviewer Zri5) We have conducted extensive experiments to enhance interpretability, robustness, and fairness analyses for the models in our benchmark. For interpretability, we employed SHAP and visualization from diverse perspectives to analyze model behavior; for edge cases, we evaluated model performance on extreme OD flows; and for fairness, we analyzed biases across income levels, finding that data-driven models generally performed better but still showed CPC distribution disparities. These findings underscore the need for further research in balancing performance, interpretability, and fairness.
- Deeper Dataset Analysis (Reviewer NxhM) For data granularity, we have clarified the dataset’s spatial division and, as suggested, added to the manuscript the limitation regarding the lack of fine-grained dynamics beyond commuting. For sparsity, we visualize OD matrices using heatmaps and analyze variations across different city types, incorporating these insights as Figure 8 in Appendix A.1.
- Deeper Model Analysis (Reviewer KbsU) We provided detailed explanations for model performance analyses shown in Figures 6 and 7, highlighting how city size and structure affect model performance (Appendix C.5 and C.6).
- Global Transferability (Reviewer KbsU) To explore whether models trained on our comprehensive dataset transfer across countries, we conducted experiments using U.K. data. Results showed that models trained on U.S. data exhibit reasonable performance, and these insights are detailed in Appendix C.4.
We hope these updates address the reviewers' concerns and further clarify the significance of our contributions. Once again, we extend our heartfelt gratitude to all reviewers for their insightful feedback and constructive suggestions.
Best regards,
Authors
This paper introduces a large-scale dataset called LargeCommutingOD for commuting Origin-Destination (OD) flow generation and provides a benchmark for evaluating models. All the reviewers think this is an important work in this domain. The paper is mostly well written and clearly presented. The reviewers also provided some comments for the authors to further improve the paper. The authors should follow the comments and make modifications accordingly in the camera-ready version.
Additional Comments on Reviewer Discussion
The authors' rebuttal has addressed some concerns of the reviewers, and the reviewers raised their scores accordingly.
Accept (Poster)