SPOT-Trip: Dual-Preference Driven Out-of-Town Trip Recommendation
Abstract
Reviews and Discussion
The work proposes SPOT-Trip, a novel framework for out-of-town trip recommendation, whose goal is to generate personalized POI trajectories for users traveling from their hometowns to unfamiliar regions with contextual constraints such as origin, destination, and trip duration. The paper first identifies two major challenges: the sparsity of out-of-town check-in data and the dual nature of user preferences—comprising both static and dynamic interests. To mitigate the data sparsity and model static preferences, a POI attribute knowledge graph is constructed, enabling relation-aware aggregation. To capture dynamic and time-evolving preferences, SPOT-Trip leverages neural ODEs and temporal point processes. Finally, a static-dynamic fusion module is introduced to integrate both aspects of user preference for effective recommendation.
Strengths and Weaknesses
Strengths:
S1. The paper addresses a practical and unexplored problem, out-of-town trip recommendation, with clear consideration of real-world constraints such as origin, destination, and trip duration, which has meaningful implications for enhancing user travel planning.
S2. The proposed SPOT-Trip framework is technically sound and novel. It introduces several innovative components, including a POI attribute knowledge graph for static preferences and neural ODEs with temporal point processes for modeling dynamic preferences.
S3. The manuscript is well-written and logically organized, which facilitates understanding.
S4. The experiments are comprehensive and well-structured, covering five different types of evaluations that assess various aspects of the proposed method. Notably, the case study is presented in an intuitive and visually engaging manner, effectively demonstrating the practical benefits of the model. The implementation details are clearly described to ensure reproducibility.
Weaknesses:
W1. The authors employ different activation functions (e.g., LeakyReLU in Eqs. (1) and (10) and SiLU in Eq. (2)), but the reason behind these choices is not clearly explained. It is recommended that the authors provide further justification or empirical evidence to support the use of these specific activations.
W2. The manuscript contains some redundant statements that should be removed. For instance, lines 599-600 contain redundant descriptions regarding baseline methods. It would be better to conduct a thorough proofreading of the entire manuscript to eliminate such errors.
W3. More recent related work should be included, for example [1,2,3].
[1] Evidential stochastic differential equations for time-aware sequential recommendation. NeurIPS 2024.
[2] Unleashing the power of knowledge graph for recommendation via invariant learning. WWW 2024.
[3] DR-VAE: Debiased and Representation-enhanced Variational Autoencoder for Collaborative Recommendation. AAAI 2025.
Questions
Q1. Could the authors provide more detailed information about the datasets used, particularly the time span of the Yelp dataset, to help readers better understand and preprocess the raw data?
Q2. Is it possible to extend the SPOT-Trip framework to generate top‑K candidate trajectories, which may help enhance user choice and recommendation diversity?
Limitations
Yes
Final Justification
The authors have satisfactorily addressed my concerns.
Formatting Issues
There are no formatting concerns in the paper.
Thank you for your thoughtful and encouraging review. We sincerely appreciate your recognition of the novelty, technical soundness, and practical relevance of our work, as well as your positive comments on the manuscript’s clarity, experimental rigor, and reproducibility. We also thank you for your valuable time and effort in reviewing our paper. Below, we carefully address the weaknesses and questions you raised.
Response to Weaknesses:
W1: Thank you for your suggestion. We have added a replacement-based ablation study to clarify the activation function choices. Specifically, we treat the activations in Eqs. (1), (2), and (10) as three separate choices and conduct exhaustive experiments by replacing each with LeakyReLU, SiLU, or GELU. The results are summarized in the three tables below. Although some alternative combinations slightly outperform our original settings on individual metrics, our chosen configuration still achieves consistently strong performance across both datasets and metrics. Therefore, we retain the original setup.
Activation in Eq. (1) = LeakyReLU:
| Eq. (2) | Eq. (10) | Foursquare F1 | Foursquare Pairs-F1 | Yelp F1 | Yelp Pairs-F1 |
|---|---|---|---|---|---|
| LeakyReLU | LeakyReLU | 0.0386 | 0.0094 | 0.0378 | 0.0175 |
| LeakyReLU | SiLU | 0.0391 | 0.0101 | 0.0384 | 0.0181 |
| LeakyReLU | GELU | 0.0407 | 0.0097 | 0.0391 | 0.0178 |
| SiLU | LeakyReLU | 0.0400 | 0.0109 | 0.0399 | 0.0190 |
| SiLU | SiLU | 0.0397 | 0.0105 | 0.0394 | 0.0182 |
| SiLU | GELU | 0.0391 | 0.0103 | 0.0389 | 0.0179 |
| GELU | LeakyReLU | 0.0381 | 0.0085 | 0.0387 | 0.0193 |
| GELU | SiLU | 0.0387 | 0.0100 | 0.0391 | 0.0181 |
| GELU | GELU | 0.0381 | 0.0097 | 0.0392 | 0.0185 |
Activation in Eq. (1) = SiLU:
| Eq. (2) | Eq. (10) | Foursquare F1 | Foursquare Pairs-F1 | Yelp F1 | Yelp Pairs-F1 |
|---|---|---|---|---|---|
| LeakyReLU | LeakyReLU | 0.0378 | 0.0095 | 0.0381 | 0.0187 |
| LeakyReLU | SiLU | 0.0389 | 0.0101 | 0.0394 | 0.0185 |
| LeakyReLU | GELU | 0.0389 | 0.0104 | 0.0398 | 0.0181 |
| SiLU | LeakyReLU | 0.0402 | 0.0107 | 0.0380 | 0.0184 |
| SiLU | SiLU | 0.0401 | 0.0103 | 0.0389 | 0.0181 |
| SiLU | GELU | 0.0392 | 0.0108 | 0.0379 | 0.0178 |
| GELU | LeakyReLU | 0.0384 | 0.0102 | 0.0385 | 0.0184 |
| GELU | SiLU | 0.0397 | 0.0103 | 0.0381 | 0.0175 |
| GELU | GELU | 0.0391 | 0.0103 | 0.0394 | 0.0191 |
Activation in Eq. (1) = GELU:
| Eq. (2) | Eq. (10) | Foursquare F1 | Foursquare Pairs-F1 | Yelp F1 | Yelp Pairs-F1 |
|---|---|---|---|---|---|
| LeakyReLU | LeakyReLU | 0.0395 | 0.0108 | 0.0391 | 0.0179 |
| LeakyReLU | SiLU | 0.0381 | 0.0103 | 0.0394 | 0.0187 |
| LeakyReLU | GELU | 0.0394 | 0.0101 | 0.0397 | 0.0195 |
| SiLU | LeakyReLU | 0.0384 | 0.0107 | 0.0378 | 0.0178 |
| SiLU | SiLU | 0.0389 | 0.0102 | 0.0394 | 0.0184 |
| SiLU | GELU | 0.0394 | 0.0095 | 0.0388 | 0.0190 |
| GELU | LeakyReLU | 0.0381 | 0.0084 | 0.0398 | 0.0185 |
| GELU | SiLU | 0.0375 | 0.0098 | 0.0387 | 0.0183 |
| GELU | GELU | 0.0391 | 0.0105 | 0.0391 | 0.0188 |
We will include this new set of results and corresponding discussion in the revised appendix.
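The replacement-based ablation above amounts to a full grid over three activation slots (3 × 3 × 3 = 27 configurations, matching the 27 table rows). A minimal sketch of how such a grid can be enumerated, assuming the standard textbook definitions of the three activations; the slot names `eq1`, `eq2`, `eq10` and the LeakyReLU negative slope of 0.01 are illustrative assumptions and are not taken from the paper:

```python
import math
from itertools import product


def leaky_relu(x, negative_slope=0.01):
    # negative_slope=0.01 is an assumed default, not the paper's setting.
    return x if x > 0 else negative_slope * x


def silu(x):
    # SiLU (swish): x * sigmoid(x).
    return x / (1.0 + math.exp(-x))


def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


ACTIVATIONS = {"LeakyReLU": leaky_relu, "SiLU": silu, "GELU": gelu}
SLOTS = ("eq1", "eq2", "eq10")  # hypothetical names for the three slots


def activation_grid():
    """Return every assignment of an activation name to each slot."""
    return [dict(zip(SLOTS, combo))
            for combo in product(ACTIVATIONS, repeat=len(SLOTS))]


grid = activation_grid()
print(len(grid))  # 27
```

Each dictionary in `grid` would then drive one training run; the training loop itself is outside the scope of this sketch.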
W2: Thank you for pointing this out. We will revise the mentioned lines and carefully proofread the entire manuscript to eliminate redundant statements and improve clarity.
W3: Thank you for the helpful suggestions. We acknowledge that the suggested recent works exhibit partial similarities with components of our framework. We will include these works [1–3] in the revised version and elaborate on their connections and distinctions with our method in the related work section.
Response to Questions:
Q1: Thank you for the valuable suggestion. The Yelp dataset covers data up to the year 2021, and the timestamps in the dataset (e.g., 1559768993) are recorded as Unix epoch time in seconds, which can be converted to standard date-time format for preprocessing. To improve clarity for readers, we will include these detailed descriptions in the revised manuscript.
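As an illustration of the conversion mentioned in Q1, the sketch below turns the example timestamp into a date-time using only the Python standard library. Interpreting the value in UTC is an assumption made here; the time zone of the original check-ins is not stated in the response.

```python
from datetime import datetime, timezone


def epoch_to_utc(ts):
    """Convert a Unix epoch timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


dt = epoch_to_utc(1559768993)
print(dt.isoformat())  # 2019-06-05T21:09:53+00:00
```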
Q2: Thank you for the insightful suggestion. Extending the SPOT-Trip framework to generate top-K candidate trajectories would indeed require incorporating appropriate search and trajectory scoring strategies to balance diversity and relevance. We consider this an important direction for future work and plan to explore it in subsequent research.
Thank you for the detailed response. The authors have satisfactorily addressed my concerns. I have also reviewed the authors’ latest reply to reviewer Mree and believe that the additional concerns have been adequately resolved. In particular, the authors reaffirmed the main contributions of their work, clarified how time information is handled at different stages, highlighted the advantage of using Neural ODEs under such conditions, and explained why traditional ranking-based metrics are not suitable for their sequence-level generation task. I have no further questions regarding the paper and therefore maintain my strongly positive assessment of it.
Thank you very much for your thoughtful and supportive response. We truly appreciate your recognition of our efforts to address the raised concerns, as well as your constructive suggestions throughout the review process. Your feedback has been invaluable in helping us improve the clarity and quality of our work.
This paper proposes SPOT-Trip, a dual-preference (static and dynamic) recommendation framework for out-of-town travel, which addresses the challenges of data sparsity and user interest drift. Extensive experiments on real-world datasets demonstrate that SPOT-Trip significantly outperforms previous baseline models.
Strengths and Weaknesses
Strengths:
- The motivation of the paper is reasonable.
- The methodology is described in detail.
- The case study effectively demonstrates the effectiveness of the proposed method.
Weaknesses:
- The paper needs to be carefully reviewed for details. For example, symbols like d are not explained, and formulas such as (3) and (8) are missing punctuation marks.
- Some redundant content in Sections 1 and 3 could be streamlined. Completely removing the Related Work section from the main body is generally unacceptable. In addition, important statistics such as the size of the knowledge graph should be presented in the main text.
- The proposed method is relatively complex, and it is unclear how efficient it is computationally. Some components also seem unnecessarily complicated; for example, the Static Aggregator could be replaced by simple mean pooling. It's unclear why a new concept is introduced here. Furthermore, the paper lacks comparisons with some standard out-of-town recommendation baselines.
- I could not find the PPROC method mentioned in [21].
- There appears to be no ablation study for the Static-Dynamic Preference Fusion module.
Questions
- See weakness.
- The paper claims this is a new task. Does that mean both trip recommendation and out-of-town recommendation have been studied individually, but their combination has not been explored before?
- According to Figure 1, I don’t quite understand the motivation for separating hometown check-in sequences from out-of-town ones. Doesn’t this split a complete trajectory into disjoint parts? For instance, if a user lives in the city and commutes to the suburbs for work, would those commuting behaviors be treated as part of the out-of-town trips? Or did the authors filter out such cases?
- “However, due to the irregular sampling [32] of check-in data, these approaches fail to capture the dynamic evolution of user preferences over actual time. Given the success of neural ODE [8] in other research fields [30, 21], we develop a neural ODE to model the continuous dynamic drift of user out-of-town preferences in latent space.” I’m not sure I fully understand this. For instance, wouldn’t it be simpler and more intuitive to compute the time gap and append it to the input in an RNN or Transformer-based model? Why is an ODE necessary here? Is it used just because other works have used it? What exactly are the advantages of using ODEs in this context?
- From Table 1, the best performance is only around 0.04. Is this practically meaningful? In prior work on next POI prediction based on Foursquare, the typical Acc@1 can reach around 20%. Why didn’t the paper use accuracy as a metric? Also, why not report standard recommendation metrics such as NDCG or MRR?
- How do you distinguish between hometown and out-of-town regions? How is a user’s hometown identified? Did the authors use the Foursquare category name (e.g., “home”)?
- I'm a bit confused. Based on the statistics, the number of hometown check-ins is far greater than that of out-of-town check-ins. Does this mean traditional next POI recommendation methods are sufficient for hometown data, while a separate method must be developed specifically for out-of-town cases? Where exactly does the necessity of this distinction come from?
- I’m curious whether it is possible to visualize where SPOT-Trip underperforms, for example in particular types of trips or specific regions. People’s behavior likely differs significantly across countries and regions.
- For trip recommendation, a natural thought is to add feasibility constraints, such as start time or economic viability of transportation. Does this mean current methods are still far from being truly applicable in real-world use cases?
Limitations
The paper only discusses technical limitations but does not address potential negative societal impacts. Since this work requires identifying users' hometowns, I believe it could potentially pose certain negative implications for user privacy.
Final Justification
Only some of the concerns were resolved.
Formatting Issues
N/A
Thank you for your constructive and detailed feedback on our paper. We sincerely appreciate your time and effort in reviewing our manuscript. We are grateful for your recognition of the motivation, methodology, and case study design. Below, we will address the weaknesses and questions you raised.
Response to Weaknesses:
W1: Thanks for pointing this out. We apologize for the oversight. In the revised version, we have carefully proofread the manuscript and added clear definitions for all symbols, including d (i.e., the dimension of the hidden representation), and corrected the punctuation in all equations, including Eq. (3) and Eq. (8).
W2: Thanks for the insightful suggestions. In the revised version, we will streamline Sections 1 and 3 to remove redundant content and improve clarity. Additionally, we agree that the Related Work section is important and will move it from the appendix to the main text. The statistics of the knowledge graph are already provided in the appendix--we will also relocate them to the main text for better visibility.
W3: Thanks for pointing this out. As you suggested, the proposed Static Aggregator indeed functions as a mean pooling mechanism. To avoid introducing an unnecessary new concept, we will rename it as Mean Aggregator in the revised version. Regarding the baseline comparison, many existing out-of-town recommendation models [1][2][3] focus on next POI prediction rather than full trip generation, and thus are not directly applicable to our task. KDDC [3] is a relatively suitable and SOTA approach among these methods, so we have adopted its knowledge-enhanced module in our Base framework (Base+KDDC) as a comparative baseline.
W4: We apologize for the confusion. The paper [4] does not explicitly name its method, so we referred to it as PPROC based on the name used in its official GitHub repository. We will clarify this naming choice in the baseline descriptions to avoid misunderstanding.
W5: In fact, removing the Static-Dynamic Preference Fusion module is equivalent to discarding both the Knowledge-Enhanced Static Preference Learning and the ODE-Based Dynamic Preference Learning components (i.e., the Base baseline reported in our paper). To address your concern, we will explicitly clarify this equivalence in the revised version.
Response to Questions:
Q2: Yes, your understanding is correct. Existing trip recommendation methods [4][5] generally do not consider user check-in behaviors in their hometown, while existing out-of-town recommendation works [1][2][3] are mostly limited to next POI prediction in an unfamiliar city. To the best of our knowledge, no prior work has addressed the generation of a full trip in an out-of-town scenario. This is the gap our work aims to fill.
Q3: Thanks for the question. Our motivation for separating hometown and out-of-town check-in sequences follows prior out-of-town recommendation works [1][2][3], which emphasize the interest drift between users' hometown preferences and their behaviors in unfamiliar cities. The goal of out-of-town recommendation is to suggest POIs that users might be interested in when traveling in out-of-town regions. Moreover, the regions in our study are defined exactly as annotated in the original datasets (Foursquare and Yelp), without any modification. For example, users whose hometown is Los Angeles and who travel to San Diego are considered to be on out-of-town travel. This case reflects actual travel behavior rather than routine suburban commutes. Therefore, daily commuting, such as traveling from a city to nearby suburbs, is unlikely to be captured as out-of-town behavior, given the dataset's regional granularity. For more details on how we distinguish the hometown and out-of-town regions, please refer to our response to Q6.
Q4: Thanks for the question. In our task, the time intervals between check-ins during a user’s out-of-town trip are irregular and unknown. Therefore, we cannot directly use actual time gaps as input. RNNs and Transformers generally assume that input sequences are equally spaced or have known time gaps. When we use step indices corresponding to the steps to be predicted (e.g., POI 1 → POI 2 → POI 3), these models may implicitly assume a constant rate of preference change, which does not reflect the true dynamics of user behavior during travel. In contrast, our use of Neural ODEs allows the model to learn nonlinear, continuous preference transitions over a latent time dimension. This enables more flexible modeling of preference drift at variable speeds between steps. Empirically, this design also leads to better performance (see below or our response to reviewer 8nVL Q3), confirming the advantage of Neural ODEs in handling such irregular, implicit temporal dynamics.
| Method | Foursquare F1 | Foursquare Pairs-F1 | Yelp F1 | Yelp Pairs-F1 |
|---|---|---|---|---|
| SPOT-Trip | 0.0400 | 0.0109 | 0.0399 | 0.0190 |
| Base+KSPL+Transformer | 0.0362 | 0.0063 | 0.0370 | 0.0172 |
| Base+KSPL+GRU | 0.0341 | 0.0070 | 0.0358 | 0.0169 |
Q5: Thank you for the question. The value may appear low at first glance. However, our task is fundamentally more challenging than next-POI prediction. Rather than predicting just the next check-in point, our model must generate an entire out-of-town trip and match multiple POIs in both content and exact position. This position-wise alignment across a sequence significantly increases difficulty, especially under real-world data sparsity. Despite the difficulty, compared to prior trip recommendation approaches, our method achieves a significant performance improvement under the same evaluation metrics. Regarding ACC, NDCG, and MRR used in traditional next POI prediction: These metrics usually rely on negative sampling, which is not applied in trip recommendation evaluation. More importantly, they are ranking-based and focus on whether the correct POI appears near the top of a list, not on position-wise sequence matching, which is essential in our trip recommendation task. Our goal is to assess whether the predicted sequence aligns step-by-step with the ground-truth travel trajectory. Therefore, following prior trip recommendation work [5][6], we use F1 and Pairs-F1, which better reflect sequential prediction quality.
Q6 & concern about user privacy in Limitations: Thanks for the question. As mentioned in the dataset section of our paper, we follow the hometown discovery approach introduced in prior work [2] to distinguish between a user's hometown and out-of-town regions. In brief, it first extracts a user's check-in sequences across different regions over continuous time windows. Then, based on factors such as the number of check-ins and time span, it assigns a score to each region and selects the highest-scoring one as the user's hometown. Detailed examples can be found in the dataset of our released code. As for location categories, we do make use of semantic POI types from the dataset (e.g., Restaurant, Mall), but a ‘home’ category does not appear in the available taxonomy and is not used for identifying hometowns. Regarding the concern about user privacy, we would like to clarify that, consistent with prior out-of-town recommendation works [1][2][3], all user data used in this study has been fully anonymized. No personally identifiable information is involved, and the hometown identification is performed at the region level based solely on anonymous user IDs and aggregated check-in behavior.
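To make the region-scoring idea in the Q6 response concrete, here is a deliberately simplified sketch. The scoring rule (check-in count multiplied by one plus the active span in days) is a hypothetical stand-in chosen for illustration; it is not the rule used in [2] or in the released code, and it ignores the continuous time-window extraction step.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400.0


def infer_hometown(checkins):
    """Pick the highest-scoring region from (region, unix_timestamp) pairs.

    Returns None when there are no check-ins.
    """
    stats = defaultdict(lambda: {"count": 0, "first": None, "last": None})
    for region, ts in checkins:
        s = stats[region]
        s["count"] += 1
        s["first"] = ts if s["first"] is None else min(s["first"], ts)
        s["last"] = ts if s["last"] is None else max(s["last"], ts)
    if not stats:
        return None

    def score(region):
        s = stats[region]
        span_days = (s["last"] - s["first"]) / SECONDS_PER_DAY
        # Hypothetical score: more check-ins over a longer span rank higher.
        return s["count"] * (1.0 + span_days)

    return max(stats, key=score)


day = 86400
checkins = [("Los Angeles", d * day) for d in (0, 10, 20, 40, 60)]
checkins += [("San Diego", d * day) for d in (30, 31, 32)]
print(infer_hometown(checkins))  # Los Angeles
```

Only region labels and timestamps enter the computation, which is in line with the region-level, anonymized processing described in the response.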
Q7: As noted in our response to Q3, the motivation for treating out-of-town recommendation as a separate task lies in the drift between users’ preferences in their hometown and in unfamiliar regions. Traditional next-POI methods [7][8] typically model a user's check-ins within a single region (e.g., New York) and recommend the next POI in that same region. However, when users travel to a new region (e.g., from Los Angeles to San Diego), these models often fail to generalize due to region-specific patterns. While such methods may work for hometown prediction, they are insufficient for out-of-town scenarios, where behavior is driven more by travel preferences. Hence, distinguishing between hometown and out-of-town regions is necessary to capture this interest shift, as emphasized in prior works [1][2][3].
Q8: Thank you for the suggestion. We agree that user behaviors vary across regions, and visualizing underperformance is important. In the appendix (Case Study), we show that SPOT-Trip may generate repetitive POIs in some trips, reflecting a limitation in diversity modeling. While we do not conduct a systematic regional error analysis in this work, we see this as a valuable direction for future research.
Q9: Our work focuses on modeling user preferences in trip recommendation. Incorporating feasibility constraints (e.g., time, cost) is important for real-world use but typically requires extra data sources. We consider this an important direction and believe that combining preference learning with constraint-aware modeling is a promising future path.
References:
[1] Xin, Haoran, et al. "Out-of-town recommendation with travel intention modeling." AAAI. 2021.
[2] Xin, Haoran, et al. "Captor: A crowd-aware pre-travel recommender system for out-of-town users." SIGIR. 2022.
[3] Liu, Yinghui, et al. "KDDC: Knowledge-driven disentangled causal metric learning for pre-travel out-of-town recommendation." IJCAI. 2024.
[4] Iakovlev, Valerii, and Harri Lähdesmäki. "Learning Spatiotemporal Dynamical Systems from Point Process Observations." ICLR, 2025.
[5] Zhang, Jiale, et al. "Encoder-decoder based route generation model for flexible travel recommendation." TSC (2024).
[6] Shu, Wenzheng, et al. "Analyzing and mitigating repetitions in trip recommendation." SIGIR. 2024.
[7] Qin, Yifang, et al. "A diffusion model for POI recommendation." TOIS (2023).
[8] Zhou, Hailun, et al. "Disentangled Graph Debiasing for Next POI Recommendation." SIGIR. 2025.
Thank you for your response. The authors have partially addressed my concerns. However, regarding Q3, although the authors follow the settings of [1,2,3], I personally consider this a simplification of the task. What the authors should do is improve upon prior work, not merely follow it.
I also disagree with the response to Q4. While it is true that the time intervals between user check-ins during trips can be irregular, they are not unknown. The Foursquare dataset provides precise timestamps for each check-in. Many POI recommendation models, such as GETNext [a] and GraphFlashback [b], also use Transformers or RNNs without assuming equal or known time intervals between input steps.
As for Q5, the statement “Regarding ACC, NDCG, and MRR used in traditional next POI prediction: These metrics usually rely on negative sampling” is simply incorrect. I confirm that these metrics do not rely on negative sampling.
In conclusion, I will maintain my original score.
[a] Yang, Song, Jiamou Liu, and Kaiqi Zhao. "GETNext: Trajectory flow map enhanced transformer for next POI recommendation." Proceedings of the 45th International ACM SIGIR Conference on research and development in information retrieval. 2022.
[b] Rao, Xuan, et al. "Graph-flashback network for next location recommendation." Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining. 2022.
We sincerely thank the reviewers for their timely responses. We would like to take this opportunity to clarify a few remaining concerns raised as follows.
More concerns about Q3: Thank you for the feedback. In the current study, we adopt the established distinction between hometown and out-of-town regions primarily to maintain consistency and comparability with prior research, which also facilitates fair evaluation. Our main contribution lies in extending the out-of-town recommendation task to the more challenging out-of-town trip recommendation task, and proposing a novel method to address it. Exploring more fine-grained user behavior segmentation methods--such as leveraging multimodal large models--would be highly beneficial for advancing this field, and we plan to consider such directions in our future work.
More concerns about Q4: Thank you for your comment. We believe some clarification is needed, as our original explanation may have been too brief due to space limitations.
Our task consists of two stages: model training and recommendation (inference). During training, the time intervals between check-ins in both regions are known and can be used as input. However, during recommendation, only the user’s hometown history is accessible. The timestamps of future out-of-town check-ins are unknown, making it impossible to use real time gaps as input. Consequently, the model must rely on discrete step indices (i.e., equal intervals) when generating out-of-town recommendations.
This leads to a discrepancy: RNNs or Transformers may leverage actual time gaps during training, but must operate with uniform out-of-town step indices at inference time. Such inconsistency can introduce temporal mismatch and potentially degrade performance, as these models are sensitive to the encoding of time.
In contrast, Neural ODEs model user preference transitions as continuous functions over latent time, which enables smoother generalization even when explicit intervals are unavailable during recommendation. This inductive bias toward continuous dynamics proves empirically beneficial in our task, as demonstrated in our ablation studies.
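The idea of evolving a latent preference state over continuous latent time can be sketched as follows. This is a toy illustration, not the authors' model: the `dynamics` function with fixed weights `W` is a hypothetical stand-in for a learned network, and a hand-written fixed-step Runge-Kutta integrator replaces whatever ODE solver the paper uses.

```python
import math


def rk4_step(f, z, t, h):
    """One classical Runge-Kutta step for dz/dt = f(t, z); z is a list of floats."""
    k1 = f(t, z)
    k2 = f(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k1)])
    k3 = f(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k2)])
    k4 = f(t + h, [zi + h * ki for zi, ki in zip(z, k3)])
    return [zi + h / 6 * (a + 2 * b + 2 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]


def odeint(f, z0, times, substeps=20):
    """Return the state at each query time; times must be increasing and start at z0's time."""
    states, z, t = [list(z0)], list(z0), times[0]
    for t_next in times[1:]:
        h = (t_next - t) / substeps
        for _ in range(substeps):
            z = rk4_step(f, z, t, h)
            t += h
        t = t_next  # remove accumulated floating-point drift
        states.append(list(z))
    return states


# Hypothetical stand-in for a learned dynamics network (one tanh layer).
W = [[-0.5, 0.8], [-0.8, -0.5]]


def dynamics(t, z):
    return [math.tanh(sum(w * zj for w, zj in zip(row, z))) for row in W]


z0 = [1.0, 0.0]
uniform = odeint(dynamics, z0, [0.0, 1.0, 2.0, 3.0])    # step indices as latent time
irregular = odeint(dynamics, z0, [0.0, 0.3, 2.0, 3.0])  # same ODE, uneven query times
print(uniform[2], irregular[2])  # both are the state at latent time 2.0
```

The state at latent time 2.0 agrees up to solver error whichever intermediate times are queried, because the trajectory is a function of time rather than of step count. The sketch shows only this mechanism and says nothing about the empirical benefit claimed in the response.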
More concerns about Q5: Thank you for the comment. We would like to further clarify some misunderstandings. Acc@k, NDCG, and MRR indeed do not inherently rely on negative sampling. What we intended to convey is that some prior works often employ negative sampling when dealing with large candidate sets, in order to compute these metric values. Nevertheless, this was also not our main reason for not choosing these metrics. Our core point is that these ranking-based metrics are not well-suited for evaluating sequence-level prediction, which is the focus of our task. Specifically, our model, like the baselines we compare against, generates entire POI sequences through sampling-based methods (e.g., top-p (nucleus) sampling), rather than producing explicit top-k ranked lists. Therefore, sequence-level metrics such as F1 and PairsF1 provide a more appropriate and fair evaluation of the generated trajectories in this study.
We again sincerely thank you for carefully reading our article and providing us with useful comments and helpful criticism. We will be glad to address any further concerns that arise.
Thank you for your further feedback. For Q3 and Q5, I double-checked the references in your rebuttal. Based on your rebuttal, [1][2][3] are out-of-town recommendation models and [4][5] (should be [5][6]) are trip recommendation methods. Only paper [5] uses the same metrics, F1 and pairs-F1, as this paper. The other papers use a variety of different metrics such as Acc, NDCG, and Recall. In this paper, the pairs-F1 score is very low, and I think it is meaningless. I cannot understand why other papers can use Acc and NDCG while these metrics are considered unsuitable for this paper. The rebuttal shows that the novelty of this paper is the combination of out-of-town recommendation and trip recommendation. Therefore, I cannot agree with your explanation. I think the paper lacks a comparison on the relevant metrics. At the same time, I also noticed that other papers, such as [1] and [3], report results above 20% on certain metrics. I believe that under some task settings, such values indicate that the model is reasonably effective. However, in this paper, both the F1 and pairs-F1 scores, for the baselines and the proposed method alike, are very low. It gives me the impression that simply guessing one or two results correctly might outperform the proposed method. To put it another way, does this task setup even make sense? With such low values, the results hardly seem meaningful or practically useful. But for the existing works [1–6], based on the values they report, I think their results are, to some extent, usable in real-world applications.
As for Q4, I’m a bit confused. Isn’t the input supposed to include the starting and ending POIs? Then why can't time information be used during recommendation? Even in the test set, given the check-ins for the start and end POIs, wouldn’t their timestamps still be available? As I understand it, the goal of this paper is to recommend the intermediate POIs. So, I believe time information can and should be incorporated during inference as well.
Thanks for the responses. We would like to further clarify the fundamental differences between out-of-town POI recommendation and trip recommendation as follows.
Traditional out-of-town POI recommendation often predicts the next POI located outside a user’s hometown. Models designed for this task generally produce a ranked list of candidate POIs for the next location, thereby enabling the use of ranking-based evaluation metrics, such as Acc@k, NDCG@k. These metrics all rely on the availability of a ranked list. For instance, NDCG@k evaluates the quality of the recommendation ranking by assigning higher gains to correct items that appear earlier in the list, thus emphasizing the importance of placing relevant items in top positions.
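For readers less familiar with these ranking metrics, the sketch below shows Acc@k and NDCG@k for the next-POI setting with a single ground-truth item, where the ideal DCG is 1 and NDCG reduces to 1 / log2(rank + 1). The POI names are made up for illustration.

```python
import math


def acc_at_k(ranked, target, k):
    """1.0 if the ground-truth item is among the top-k ranked candidates, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked, target, k):
    """NDCG@k with one relevant item: 1 / log2(rank + 1) if it is in the top k."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item == target:
            return 1.0 / math.log2(rank + 1)
    return 0.0


ranked = ["museum", "cafe", "park", "mall"]  # hypothetical ranked candidates
print(acc_at_k(ranked, "cafe", 1))     # 0.0
print(ndcg_at_k(ranked, "museum", 3))  # 1.0
print(ndcg_at_k(ranked, "park", 3))    # 0.5
```

Both functions take a ranked candidate list as input, which is the dependency on ranked output that the response points to.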
In contrast, trip recommendation requires the generation of intermediate POIs forming a complete trip. These models typically employ sequence-level decoding strategies (e.g., greedy or beam search) over position-wise POI logits, producing the whole sequence in one shot rather than ranking the next POI in an autoregressive manner. Consequently, rank-based metrics are not applicable in this setting, as the goal is to evaluate the overall quality of the generated sequence. This study focuses on end-to-end trip recommendation, with key innovations in leveraging hometown check-ins and addressing the distributional shift between hometown and out-of-town behaviors.
In addition to the study ‘[5]’, recent trip recommendation works [1, 2, 3] also employ the F1 and pairs-F1 metrics similar to ours. These metrics evaluate the content-level (F1) and order-level (Pairs-F1) alignment between the generated and ground-truth POI sequences, and are better suited for trip recommendation tasks.
Regarding the metric values, prior trip recommendation works such as [2] and [3] report evaluation results over the entire POI sequence, which includes known start and end points. This setup can be confirmed by examining the evaluation codes provided in their official GitHub repositories. The inclusion of ground-truth POIs at the boundaries naturally leads to higher scores. In our paper, we also report results under this evaluation setting (i.e., Full-F1 and Full-Pairs-F1), where our method achieves scores of no less than 19%.
However, we argue that including known endpoints during evaluation does not faithfully assess the model's ability to recommend the intermediate POIs, which are the core challenge of the task. Therefore, we report real results using stricter F1 and Pairs-F1, which exclude endpoints. As expected, these scores are lower but offer a more realistic and rigorous evaluation of model performance.
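The effect of including or excluding the known endpoints can be illustrated with a small sketch. It uses one common formulation from the trip-recommendation literature (set-overlap F1 over POIs, and F1 over ordered POI pairs); the exact variant used in the paper, for example its handling of repeated POIs or positions, may differ, and the trips below are invented.

```python
from itertools import combinations


def _f1(overlap, n_pred, n_truth):
    if overlap == 0 or n_pred == 0 or n_truth == 0:
        return 0.0
    precision, recall = overlap / n_pred, overlap / n_truth
    return 2 * precision * recall / (precision + recall)


def f1_score(pred, truth):
    """Content-level F1: overlap between the sets of visited POIs."""
    p, t = set(pred), set(truth)
    return _f1(len(p & t), len(p), len(t))


def ordered_pairs(seq):
    """All (earlier, later) POI pairs; repeated POIs keep their first position."""
    return set(combinations(list(dict.fromkeys(seq)), 2))


def pairs_f1(pred, truth):
    """Order-level F1: overlap between ordered POI pairs."""
    p, t = ordered_pairs(pred), ordered_pairs(truth)
    return _f1(len(p & t), len(p), len(t))


truth = ["start", "a", "b", "c", "end"]
pred = ["start", "x", "b", "y", "end"]

print(f1_score(pred, truth), pairs_f1(pred, truth))  # with endpoints included
print(f1_score(pred[1:-1], truth[1:-1]),
      pairs_f1(pred[1:-1], truth[1:-1]))             # intermediate POIs only
```

In this toy case the endpoint-inclusive scores are 0.6 and 0.3, while the intermediate-only scores drop to about 0.33 and 0.0, which mirrors the gap between the two evaluation settings described above.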
Moreover, our task setup corresponds to a cold-start scenario. Comparable cold-start settings in conventional recommendation tasks--such as those explored in [4] and [5]--also yield reduced metric values. This highlights the intrinsic difficulty and real-world relevance of our proposed setting.
As for Q4, we would like to clarify that our method can indeed incorporate the timestamps of the start and end POIs, and we have conducted corresponding experiments. Following [2] and [3], we encode the timestamps as hour embeddings, which are then concatenated with the query embedding as input to the recommender. However, our experiments below show that incorporating this information yields only a marginal improvement in performance. Therefore, we chose not to rely on it, in order to make our model more flexible and broadly applicable--since such timestamp information may not always be provided by users in real-world scenarios.
| Method | Foursquare F1 | Foursquare Pairs-F1 | Yelp F1 | Yelp Pairs-F1 |
|---|---|---|---|---|
| SPOT-Trip | 0.0400 | 0.0109 | 0.0399 | 0.0190 |
| SPOT-Trip(w time) | 0.0412 | 0.0115 | 0.0387 | 0.0194 |
When we mention that time information is unknown during recommendation, we specifically refer to the intermediate POIs. For instance, given a trip like POI0 → POI1 → POI2 → POI3, the timestamps of POI1 and POI2 are inherently unknown at inference. What we can access are only their positional indices (e.g., 1 and 2), not their exact visit times.
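The timestamp variant described above (hour embeddings for the start and end POIs, concatenated with the query embedding) might look roughly like the sketch below. The embedding dimension, the randomly initialised lookup table, and the names `hour_table`, `hour_of`, and `query_with_time` are assumptions for illustration; in a real model the table would be a learned parameter.

```python
import random

HOUR_DIM = 4  # assumed embedding size
_rng = random.Random(0)
# One vector per hour of day; a stand-in for a learned embedding table.
hour_table = [[_rng.uniform(-0.1, 0.1) for _ in range(HOUR_DIM)] for _ in range(24)]


def hour_of(unix_ts):
    """Hour of day (0-23) for a Unix timestamp, interpreted in UTC."""
    return (unix_ts // 3600) % 24


def query_with_time(query_emb, start_ts, end_ts):
    """Concatenate the query embedding with start/end hour embeddings."""
    return list(query_emb) + hour_table[hour_of(start_ts)] + hour_table[hour_of(end_ts)]


query_emb = [0.5, -0.2, 0.1]
augmented = query_with_time(query_emb, start_ts=9 * 3600, end_ts=18 * 3600)
```

Only the two endpoint timestamps enter the input; the intermediate positions still carry no time information, which matches the clarification about POI1 and POI2.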
We hope our response addresses your concerns, and we are happy to discuss further.
Reference:
[1] Kuo, Ai-Te, Haiquan Chen, and Wei-Shinn Ku. "BERT-TRIP: Effective and scalable trip representation using attentive contrast learning." ICDE. 2023.
[2] Gao, Qiang, et al. "Dual-grained human mobility learning for location-aware trip recommendation with spatial-temporal graph knowledge fusion." Information Fusion (2023).
[3] Shu, Wenzheng, et al. "Analyzing and mitigating repetitions in trip recommendation." SIGIR. 2024.
[4] Wang, Xinfeng, et al. "EEDN: Enhanced encoder-decoder network with local and global context learning for poi recommendation." SIGIR. 2023.
[5] Yang, Yuhao, et al. "Knowledge graph self-supervised rationalization for recommendation." KDD. 2023.
Once again, we sincerely thank you for your review and the time you have devoted to our work. As the discussion period is drawing to a close, we would like to kindly ask whether our responses have addressed your concerns. We truly hope our clarifications have been helpful, and we would be glad to provide any further details should you have remaining questions.
We understand that the discussion period is short, and we sincerely appreciate your time and help!
Thank you for your feedback. I am not fully convinced by the authors’ response, as they did not add experiments regarding the metric, and the practical usefulness remains unclear. I think this is a borderline paper, but given the authors’ sincere attitude in the rebuttal, I have decided to raise my score.
Thank you very much for your thoughtful consideration and kind acknowledgment of our sincere attitude during the rebuttal process! We are grateful for the increased score and will work diligently to address the concerns you raised in the revised version.
This paper addresses the problem of recommending a sequence of places to visit in a region where they might not have travelled before. This problem is challenging due to the sparsity of data about user preferences in the new region. Their preference learning scheme models both long-term (static) preferences and short-term (dynamic) preferences, where the dynamic preferences are influenced by factors such as the time and location of the user. A method for static preference learning is proposed, which uses a knowledge-graph approach that can aggregate and align user preferences from different regions, thus addressing the data scarcity challenge. A method for dynamic preference learning is also proposed, using a novel combination of neural ODE and temporal point process approaches. Finally, a fusion framework is proposed to combine the static and dynamic preferences in order to recommend a sequence of places to visit, given the origin and destination of the requested tour. Empirical analysis shows an improvement in F1 score for the recommended places and pairs of places, in comparison to several other learning based methods.
Strengths and Weaknesses
Strengths:
The paper presents a novel approach to modelling long-term and short-term user preferences when recommending points-of-interest to a user.
The paper provides an interesting knowledge-graph and embedding alignment approach to address the problem of making out-of-town recommendations, where data on point-of-interest preferences is scarce.
Weaknesses:
The paper claims to be the first study of a new problem of out-of-town trip recommendation. While that might be the case for this particular style of deep learning recommenders, the paper does not seem to consider prior work on this problem that uses an optimisation approach, for example, Huang et al, “Multi-Task Travel Route Planning With a Flexible Deep Learning Framework” IEEE Transactions on Intelligent Transportation Systems 2020 or Lim et al, “Personalized Tour Recommendation based on User Interests and Points of Interest Visit Durations” IJCAI 2015, and others, which can make use of point of interest categories.
While the paper provides an empirical comparison of performance against several learning based approaches, it does not provide a comparison against some of these optimisation-based approaches that learn categories of interest and hence can be applied to new cities.
Questions
Can you more carefully delineate the novelty of your work with respect to earlier methods that learn point of interest categories, which enable preferences to be learned from one region and applied to another region?
Can you empirically compare your approach against such systems in order to demonstrate its advantages?
You mention in Limitations section that your method sometimes makes repeated recommendations at proximate locations. Some previous approaches like those listed above take into consideration the diversity and distance between recommended locations, and try to design tours that reflect a given budget for time and distance. While such budgets are potentially out of scope for your current paper, can you quantify and discuss the quality of the tours that could be constructed from the recommended locations, in terms of diversity, time and distance travelled?
Limitations
Please see suggestions in questions.
Final Justification
While the authors have made a new contribution through this work, based on the discussion I consider that compared to the body of prior work in this field, the current paper is an incremental advance.
Formatting Issues
Formatting is fine.
A minor typo at line 10: to explicitly learns -> to explicitly learn
Thank you for your detailed comments and feedback. We sincerely appreciate your time and effort in reviewing our manuscript. We will address the weaknesses and questions you raised below.
Response to Weaknesses and Questions:
W2&Q1: We agree that prior optimization-based approaches leveraging POI categories provide a way to generalize user preferences across regions. In fact, our method also makes use of POI category information, but in a more structured and expressive way. Specifically, as described in line 553 of our Appendix, we incorporate POI categories in the construction of a knowledge graph, where we generate entity-specific relations using multiple types of auxiliary information--including categories, star ratings, user reviews, and associated regions. This allows us to go beyond simple category matching and instead learn richer semantic representations of POIs through knowledge-enhanced embedding.
Moreover, relying solely on POI categories, as done in previous work, is insufficient to capture the nuanced interest drift [1] that occurs when users travel across regions--i.e., the same user may exhibit different check-in behaviors in different cities. To address this, we introduce a knowledge-enhanced preference alignment module that explicitly learns and adapts to such cross-region preference shifts. We hope this clarifies the distinction between our work and earlier methods, and highlights the novelty of our approach in addressing the unique challenges of out-of-town trip recommendation. We will include illustrative examples in the revised main text to clarify that POI categories have been incorporated into our framework.
W1&Q2: With respect to the PersTour method [2] mentioned by the reviewer, we acknowledge its relevance. However, due to its relatively early publication and the fact that it has been consistently outperformed by more recent methods such as GraphTrip [4], MatTrip [5], and AR-Trip [6] (which we already included as baselines), we initially did not include it in our comparison. Nevertheless, in response to the reviewer's suggestion, we have now additionally implemented and compared PersTour with our approach. The results are presented in the updated table below. The results further confirm the effectiveness of our proposed method over earlier approaches. We will also update Table 1 in the revised manuscript to include these new comparison results.
As for the MDTRP method [3] mentioned by the reviewer, we note that the authors have not released the source code, either in the paper or through public platforms such as GitHub or Kaggle, which makes it difficult to reproduce their approach within a short timeframe. However, we observed that the overall architecture of MDTRP is highly similar to that of MatTrip [5]--a baseline we have already included in our experiments. Both models adopt LSTM-based encoders and attention mechanisms, and both learn user interests over POI categories. As shown in Table 1 of our paper, our method consistently outperforms MatTrip, which indirectly demonstrates the advantage of our approach over methods in the style of MDTRP.
| Method | Foursquare F1 (↑) | Foursquare Pairs-F1 (↑) | Foursquare Full-F1 (↑) | Foursquare Full-Pairs-F1 (↑) | Yelp F1 (↑) | Yelp Pairs-F1 (↑) | Yelp Full-F1 (↑) | Yelp Full-Pairs-F1 (↑) |
|---|---|---|---|---|---|---|---|---|
| PersTour [2] | 0.0258 | 0.0016 | 0.4421 | 0.1572 | 0.0251 | 0.0066 | 0.5059 | 0.2074 |
| SPOT-Trip | 0.0400 | 0.0109 | 0.4723 | 0.1960 | 0.0399 | 0.0190 | 0.5261 | 0.2347 |
Q3: Thank you for this thoughtful question. In our current task formulation, we focus on proactively generating personalized trip trajectories based on a user's historical hometown records and their issued query, which specifies a starting point, an ending point, and the desired number of intermediate POIs. While we acknowledge the importance of considering travel time and distance, not all users provide explicit budget constraints for these factors in practice. As such, our current framework does not assume such inputs, and instead recommends POI sequences based primarily on user preference modeling. Nevertheless, we agree that evaluating and optimizing tour quality in terms of diversity, total travel distance, and estimated time is a valuable future direction. In future work, we plan to incorporate additional modules to account for user-specified other constraints (e.g., time or distance budgets) and to design corresponding evaluation metrics that better capture the practical quality of generated trips.
A minor typo: Thank you for pointing out this typo. We will correct it in the revised version.
Reference:
[1] Xin, Haoran, et al. "Out-of-town recommendation with travel intention modeling." AAAI. 2021.
[2] Lim, Kwan Hui, et al. "Personalized tour recommendation based on user interests and points of interest visit durations." IJCAI. 2015.
[3] Huang, Feiran, Jie Xu, and Jian Weng. "Multi-task travel route planning with a flexible deep learning framework." TITS (2020).
[4] Gao, Qiang, et al. "Dual-grained human mobility learning for location-aware trip recommendation with spatial-temporal graph knowledge fusion." Information Fusion (2023).
[5] Zhang, Jiale, et al. "Encoder-decoder based route generation model for flexible travel recommendation." TSC (2024).
[6] Shu, Wenzheng, et al. "Analyzing and mitigating repetitions in trip recommendation." SIGIR. 2024.
Dear Reviewer q38N,
We sincerely appreciate the time and effort you have dedicated to reviewing our work. In response to your valuable feedback, we have provided detailed explanations addressing the issues you raised.
As the discussion period progresses, we would be eager to hear your thoughts on our responses--particularly whether they have adequately addressed your concerns. If our revisions and discussions indicate the potential for a score adjustment, we would be very grateful for your consideration.
We remain committed to incorporating all of your suggestions to further improve the quality of our manuscript, and we look forward to further discussions.
Best regards,
The Authors
Thank you for responding to all of my questions.
Q1: Thank you for the clarification to this question.
Q2: The new comparison to existing work such as PersTour demonstrates the improvements of the proposed approach.
Q3: It would have been helpful to see results on metrics such as tour diversity, total travel distance, and estimated time.
Given that the response to Q3 is to consider it as future work, I am inclined to keep my scores the same.
Thank you for your response and time reviewing our work. We appreciate your positive feedback on our clarifications for Q1 and the new comparisons in Q2.
Regarding Q3, current public datasets lack information on users’ stay durations at POIs, travel distances between them, and transportation modes used, making it infeasible to compute metrics like estimated time or distance in a reliable way. Such evaluation would necessitate speculative assumptions, potentially leading to misleading conclusions about model performance. We see this as a promising future direction once higher-quality datasets become available.
As for tour diversity, there is currently no established benchmark offering standardized definitions, metrics, or annotations to support its consistent and reliable evaluation. In the absence of such resources, any assessment of diversity will increase the risk of subjective bias or misinterpretation. Therefore, in line with prior trip recommendation studies [1][2][3], our work prioritizes the generation of realistic, preference-aligned trips under practical constraints, leaving diversity evaluation for future work once better-supported benchmarks are available.
We hope our response can address your concerns. If there is potential for a score adjustment, we would be very grateful for your consideration. We also welcome any suggestions or references you might have for better ways to incorporate such evaluations under current data constraints.
Reference:
[1] Gao, Qiang, et al. "Dual-grained human mobility learning for location-aware trip recommendation with spatial-temporal graph knowledge fusion." Information Fusion (2023).
[2] Zhang, Jiale, et al. "Encoder-decoder based route generation model for flexible travel recommendation." TSC (2024).
[3] Shu, Wenzheng, et al. "Analyzing and mitigating repetitions in trip recommendation." SIGIR. 2024.
Once again, we sincerely thank you for your review and the time you have devoted to our work. As the discussion period is drawing to a close, we would like to kindly ask whether our responses have addressed your concerns. We truly hope our clarifications have been helpful, and we would be glad to provide any further details should you have remaining questions.
We understand that the discussion period is short, and we sincerely appreciate your time and help!
This paper addresses the out-of-town trip recommendation problem, which is of practical importance since users typically lack travel histories in the cities they plan to visit. The authors propose a method that first learns user representations from the source city, then transfers them to infer representations in the target city for recommendation. Temporal dynamics are encoded using Neural ODEs. The proposed model achieves SOTA results and is supported by extensive experiments and ablation studies. The paper is well-written, and the visualizations are of high quality. However, I have major concerns regarding novelty and over-claiming, particularly in comparison with a recent work KDDC [1], which appears to tackle a highly similar problem with overlapping methodology and visualizations. In its current form, this paper reads as an incremental extension of KDDC, and the claim of being the “first systematic study” seems inaccurate. [1] Liu, Yinghui, et al. "Kddc: Knowledge-driven disentangled causal metric learning for pre-travel out-of-town recommendation." Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, Jeju Island, Republic of Korea. 2024.
Strengths and Weaknesses
Strengths:
- The paper targets a practically important and realistic problem: users usually lack trajectory data in out-of-town scenarios.
- The proposed framework is well-designed and technically sound, with each component thoughtfully integrated.
- The model achieves SOTA performance and is evaluated through extensive experiments and ablations.
- The paper is well-written, and its figures are aesthetically pleasing and professionally rendered.
- The released codebase is of high quality and supports reproducibility of the main results.
Weaknesses:
- The paper addresses the same problem as KDDC [1], which was published in IJCAI 2024. Despite this, it claims to be “the first systematic study to learn out-of-town trip recommendation,” which could be viewed as an over-claim.
- Several figures and contents, including portions of the method diagram, bear a striking resemblance to KDDC, raising concerns about the level of novelty. As it stands, the paper feels like an incremental extension of KDDC.
Questions
- Please explicitly clarify the differences between your work and KDDC [1] in terms of problem formulation, methodology, and contributions.
- The claim that this is “the first systematic study” on out-of-town recommendation is questionable. Please justify this statement carefully, or consider revising it to avoid misleading claims.
- Neural ODEs can be computationally expensive to train. Have you explored alternative temporal encoding mechanisms (e.g., RNNs, transformers)? The current ablation study only removes the component but does not compare alternative encoders. This would help justify the use of Neural ODEs beyond novelty.
Limitations
Yes
Formatting Issues
NO
Thank you for the positive and encouraging evaluation of our work. We are especially grateful for your recognition of the practical value of the problem, the technical soundness of our framework, the thoroughness of our experiments, and the quality of both the writing and codebase. We also sincerely appreciate your constructive feedback and the time and effort you dedicated to reviewing our manuscript. Below, we carefully address the weaknesses and questions you raised.
Response to Weaknesses:
W1: While both our work and KDDC [1] address out-of-town scenarios, we would like to emphasize that our task formulation is fundamentally different from that of KDDC in terms of input conditions and recommendation granularity. Specifically, KDDC addresses a pre-travel decision-making problem, where the system recommends the next out-of-town POI from unknown regions to users who have not yet selected a travel destination. The goal is to suggest which POI to visit next, given the user's historical hometown records. In contrast, our work targets a post-destination trip recommendation task. We aim to recommend a sequence of POIs (i.e., a full trip trajectory) to users who have already chosen (or arrived at) a specific destination region. Our goal is to help these users plan meaningful intra-region journeys in response to user-issued queries that specify a starting point, an ending point, and a desired number of POIs to visit. As illustrated in Fig.1 of our paper, our setting reflects a significantly different application scenario that prior out-of-town recommendation works [2][3], including KDDC, do not cover, and the two problems serve users with distinct needs at different stages of their travel decision pipeline. We hope this clarification addresses the concern regarding the novelty of our problem definition.
W2: We acknowledge that a small portion of our method illustration (the design and color scheme of the semantic knowledge aggregation module) bears some resemblance to KDDC. This is because such an aggregation module is a common and useful component in knowledge-driven recommender systems [4][5]. Beyond this overlap, our methodology in technical implementation differs from KDDC's. While KDDC employs a segmented pre-training scheme and a knowledge aggregation module to enhance POI representations before applying disentangled causal learning to recommend the next out-of-town POI, our approach takes a different route. We adopt a more lightweight strategy by using TransE-based alternate training to integrate knowledge, and we restrict this component solely to modeling users’ static preferences.
More importantly, our framework introduces a novel ODE-based module to capture dynamic user preference transitions in out-of-town regions, another critical capability for our sequential trip recommendation setting. Compared to KDDC’s task, which only requires identifying a single POI that matches the user's static out-of-town preference, our task involves modeling how user preferences evolve over time and location within a destination region, making dynamic modeling essential. To better illustrate the difference, a table is included in our response to Q1.
Moreover, in Tables 1, 2, and 4 of our paper, we include several variants where we replace our knowledge-enhanced module with the one from KDDC. The consistent improvements across both datasets show that our approach is more effective.
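For readers unfamiliar with TransE, the objective referred to in the response to W2 can be sketched as follows. This shows only the standard TransE translation distance and margin ranking loss; the alternate-training schedule is not specified in this thread, and the toy vectors, the margin value, and the function names are assumptions for illustration.

```python
import math


def transe_distance(head, relation, tail):
    """L2 distance ||head + relation - tail||; small means a plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def margin_loss(positive, negative, margin=1.0):
    """Hinge loss pushing a true triple closer than a corrupted one by `margin`."""
    return max(0.0, margin + transe_distance(*positive) - transe_distance(*negative))


# Hypothetical 2-d embeddings: a POI, a "has_category" relation, and two categories.
poi = [1.0, 0.0]
has_category = [0.0, 1.0]
true_category = [1.0, 1.0]   # poi + relation lands exactly here
wrong_category = [0.0, 0.0]  # corrupted tail

good = (poi, has_category, true_category)
bad = (poi, has_category, wrong_category)

loss_ok = margin_loss(good, bad)       # true triple already closer by more than the margin
loss_violated = margin_loss(bad, good)  # roles swapped, so the margin is violated
```

A triple built from POI attributes (category, rating, region, and so on) is scored the same way, which is what lets attribute knowledge shape the POI embeddings used for static preferences.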
Response to Questions:
Q1: To supplement our responses to W1 and W2 above, we summarize the distinctions below:
| Aspect | KDDC [1] | SPOT-Trip (Ours) |
|---|---|---|
| Problem Definition | Pre-travel out-of-town recommendation: Recommend an out-of-town POI matching user interest | Out-of-town trip recommendation: Plan a full POI sequence |
| User Input | Historical check-ins in hometown | Historical check-ins in hometown and user query (start/end points, number of POIs) |
| Recommendation Output | Next POI | Intermediate POI(s) |
| Knowledge Graph Training | Segmented pre-training | Alternate training using TransE |
| Preference Modeling | Static only | Static + dynamic |
| Technical Innovations | Serial modeling: semantic preference learning first, then disentangled causal inference | Parallel modeling: static semantic preference learning + ODE-based dynamic preference learning |
To further address your concern, we will incorporate explicit discussions of these differences in the Related Work and Baseline sections of the revised manuscript.
Q2: The comparison above supports our claim from the out-of-town recommendation perspective: we are the first to study the out-of-town trip recommendation problem. We further justify our novelty from the trip recommendation perspective. Prior trip recommendation methods have not distinguished local users from those arriving from other regions, nor have they addressed the preference drift that occurs when a user travels outside their hometown. Therefore, we introduce the out-of-town trip recommendation task and devise a dedicated method to address it.
Q3: Thanks for the question. We choose Neural ODEs not for novelty, but because they offer continuous-time modeling capabilities that are critical in our task. Unlike RNNs and Transformers that operate over discrete and often fixed time steps, Neural ODEs, when integrated with point process tools, allow us to model travel trajectories with irregular time sampling from a probabilistic perspective, enabling a flexible characterization of continuous-time user out-of-town dynamic preferences.
Following the reviewer’s suggestion, we have implemented two additional temporal modeling variants: Base+KSPL+GRU (RNN-based) and Base+KSPL+Transformer. In both variants, we retain the overall architecture of our model, replacing only the ODE-based dynamic preference learning component with GRU and Transformer decoders, respectively. Specifically, both variants autoregressively generate the user’s dynamic preferences over a fixed number of time steps. As shown in the following table, our ODE-based method (SPOT-Trip) consistently outperforms both the RNN variant (Base+KSPL+GRU) and the Transformer variant (Base+KSPL+Transformer) on both datasets across all evaluation metrics:
| Method | Foursquare F1 | Foursquare Pairs-F1 | Yelp F1 | Yelp Pairs-F1 |
|---|---|---|---|---|
| SPOT-Trip | 0.0400 | 0.0109 | 0.0399 | 0.0190 |
| Base+KSPL+Transformer | 0.0362 | 0.0063 | 0.0370 | 0.0172 |
| Base+KSPL+GRU | 0.0341 | 0.0070 | 0.0358 | 0.0169 |
This demonstrates the advantage of Neural ODEs in modeling continuous-time dynamics, which is particularly suitable for sparse and irregularly sampled POI trajectories in out-of-town scenarios. We will include these additional results in Table 4 of our revised manuscript to substantiate our design choices. Thank you once again for this valuable and constructive suggestion.
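The continuous-time argument can be illustrated with a toy sketch: a latent state is evolved by an ODE and read out at arbitrary, unevenly spaced visit times, whereas a fixed-step recurrent model would see only positions 1, 2, 3, and so on. Everything here is an assumption for illustration: the dynamics `dz/dt = -z` stand in for a learned network, a fixed-step Euler solver stands in for whatever solver the authors use, and the temporal point process component is not shown.

```python
import math


def dynamics(z):
    """Stand-in for a learned network f_theta(z); here simple exponential decay."""
    return [-zi for zi in z]


def integrate(z0, t0, t1, max_step=0.001):
    """Euler integration of dz/dt = dynamics(z) from t0 to t1."""
    n_steps = max(1, math.ceil((t1 - t0) / max_step))
    h = (t1 - t0) / n_steps
    z = list(z0)
    for _ in range(n_steps):
        z = [zi + h * dzi for zi, dzi in zip(z, dynamics(z))]
    return z


def states_at(z0, times):
    """Latent state at each (irregularly spaced, increasing) time, starting from t=0."""
    states, z, prev_t = [], list(z0), 0.0
    for t in times:
        z = integrate(z, prev_t, t)
        states.append(z)
        prev_t = t
    return states


# Hypothetical visit times in hours since arrival; gaps range from 3 minutes to 2.5 hours.
visit_times = [0.3, 0.35, 1.7, 4.2]
states = states_at([1.0], visit_times)
```

With these toy dynamics each state is close to `exp(-t)`, so the two visits three minutes apart receive nearly identical states while the later visits differ substantially, which is the behaviour a fixed-step encoder cannot express without extra time features.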
Reference:
[1] Liu, Yinghui, et al. "KDDC: Knowledge-driven disentangled causal metric learning for pre-travel out-of-town recommendation." IJCAI. 2024.
[2] Xin, Haoran, et al. "Captor: A crowd-aware pre-travel recommender system for out-of-town users." SIGIR. 2022.
[3] Xin, Haoran, et al. "Out-of-town recommendation with travel intention modeling." AAAI. 2021.
[4] Yang, Yuhao, et al. "Knowledge graph self-supervised rationalization for recommendation." KDD. 2023.
[5] Yang, Yuhao, et al. "Knowledge graph contrastive learning for recommendation." SIGIR. 2022.
Dear Reviewer 8nVL,
We sincerely appreciate the time and effort you have dedicated to reviewing our work. In response to your valuable feedback, we have provided detailed explanations addressing the issues you raised.
As the discussion period progresses, we would be eager to hear your thoughts on our responses--particularly whether they have adequately addressed your concerns. If our revisions and discussions indicate the potential for a score adjustment, we would be very grateful for your consideration.
We remain committed to incorporating all of your suggestions to further improve the quality of our manuscript, and we look forward to any additional comments you may have.
Best regards,
The Authors
Thank you for your feedback.
However, I still think the shift in problem definition from KDDC to this work is not novel enough, and neither is the framework. As NeurIPS is a top-tier conference, unlike IJCAI or AAAI, incremental work is not sufficient for a positive score, even though the work itself is of good quality. Therefore, I have decided to maintain my score.
Thank you for your continued feedback and for recognizing the overall quality of our work.
We would like to clarify that our paper addresses a fundamentally new task with substantial technical innovations. Existing studies often focus on next-POI recommendation. In contrast, we address the new out-of-town trip recommendation task, which requires generating the intermediate POI(s) under real-world user constraints, including origin, destination, and trip length. Our task is both more complex and more practical than previous tasks.
To solve this new task, we propose a novel framework that distinguishes between static and dynamic user preferences within the task setting. Specifically, we model static preferences using a knowledge graph-enhanced preference alignment method, and for the first time introduce Neural ODEs combined with temporal point processes to learn task-specific users' irregularly evolving dynamic trip preferences. These two types of preferences are then jointly leveraged to generate complete POI sequences through a sampling-based method. None of these components exist in combination in prior works, and together they form a coherent solution tailored to the challenges of the new task.
Some existing papers at NeurIPS [1], [2], [3] advance existing tasks with novel techniques. Nevertheless, we contribute a new task and propose a novel framework to solve it. We are confident that our submission meets the requirements of NeurIPS.
We sincerely appreciate your time and thoughtful review. Your feedback helps us improve our paper, and we remain open to any further suggestions you may have.
Reference:
[1] Dai, Zhihao, et al. "SARAD: Spatial association-aware anomaly detection and diagnosis for multivariate time series." NeurIPS. 2024.
[2] Moreno-Pino, Fernando, et al. "Rough transformers: Lightweight and continuous time series modelling through signature patching." NeurIPS. 2024.
[3] Liu, Qidong, et al. "LLM-ESR: Large language models enhancement for long-tailed sequential recommendation." NeurIPS. 2024.
Thank you for your feedback. However, I also think using Neural ODEs to embed temporal point processes is not a novel contribution in 2025. The framework needs to be revised to meet the novelty standard of NeurIPS. As for the works you mentioned, could you please give a more detailed explanation of how they extend existing tasks? They seem to address well-defined tasks with novel techniques.
Thank you for your comments. We respectfully clarify that our technical novelty does not lie in the standalone use of Neural ODEs or temporal point processes (TPPs), but rather in their task-specific integration to model continuous-time user out-of-town preferences for out-of-town trip recommendation. To the best of our knowledge, this is the first work that leverages Neural ODEs in conjunction with TPPs for user dynamic preference representation learning, going beyond prior uses focused solely on event time prediction [4].
In addition, we complement this with a knowledge-graph-enhanced static preference modeling module, resulting in a dual-view framework that jointly captures both static and dynamic user preferences.
This type of targeted, architectural advancement is consistent with these mentioned NeurIPS works. For instance, SARAD [1] incrementally introduces spatial correlation modeling and a dual-module design to enhance multivariate time-series anomaly detection; Rough Transformers [2] extend the standard Transformer by integrating rough path theory and multi-view attention to better capture irregularly sampled, continuous-time dynamics; LLM-ESR [3] incorporates semantic signals from large language models and dual-view modeling to improve sequential recommendation under long-tail user/item distributions.
Our work follows a similar innovation pattern, introducing a principled adaptation of continuous-time modeling tools to a practically meaningful but technically underexplored recommendation setting.
We hope our response can address your concerns.
Reference:
[1] Dai, Zhihao, et al. "SARAD: Spatial association-aware anomaly detection and diagnosis for multivariate time series." NeurIPS. 2024.
[2] Moreno-Pino, Fernando, et al. "Rough Transformers: Lightweight and continuous time series modelling through signature patching." NeurIPS. 2024.
[3] Liu, Qidong, et al. "LLM-ESR: Large language models enhancement for long-tailed sequential recommendation." NeurIPS. 2024.
[4] Iakovlev, Valerii, and Harri Lähdesmäki. "Learning Spatiotemporal Dynamical Systems from Point Process Observations." ICLR, 2025.
Once again, we sincerely thank you for your review and the time you have devoted to our work. As the discussion period is drawing to a close, we would like to kindly ask whether our responses have addressed your concerns. We truly hope our clarifications have been helpful, and we would be glad to provide any further details should you have remaining questions.
We understand that the discussion period is short, and we sincerely appreciate your time and help!
This paper proposes a new task called out-of-town trip recommendation and presents a novel model, SPOT-Trip, to solve it. The authors clearly explain how their framework uses both static and dynamic user preferences. They model static preferences with a knowledge graph and dynamic preferences with neural ODEs and temporal point processes. Reviewer XSsn strongly supports the paper and considers the problem important and the technical solution well designed. The experiments are well done, and the writing is clear. This reviewer finds the paper a strong contribution.
Other reviewers raised concerns about novelty and metrics. The authors responded in detail and tried to clarify the differences from past work. They explained why some standard metrics are not suitable in their setup and also added new baselines. While not all reviewers changed their scores, I think the paper brings a meaningful contribution to the trip recommendation area. It introduces a new problem, uses a solid framework, and offers practical value. For these reasons, I recommend acceptance.