Neural Combinatorial Optimization for Robust Routing Problem with Uncertain Travel Times
Abstract
Reviews and Discussion
The authors propose a dual multi-head cross-attention mechanism to extract problem features represented by the inputted uncertainty sets for the robust vehicle routing problem with uncertain travel times under the min-max regret criterion. The experimental results on the robust TSP and VRP demonstrate the efficacy of the proposed neural combinatorial optimization method, showcasing its ability to efficiently handle robust routing problems.
Strengths
The authors have proposed an end-to-end neural model to capture the features of the robust routing problem and have used a pre-trained neural model to efficiently calculate the reward with respect to the worst-case scenario during training. They conducted extensive experiments on Robust Traveling Salesman Problem (RTSP) and Robust Capacitated Vehicle Routing Problem (RCVRP) instances. The results confirm the effectiveness of their approach in efficiently handling robust routing problems across diverse scales with shorter computation times.
Weaknesses
[1] The algorithm has not been compared to other existing neural combinatorial optimization works.
[2] It is well known that the POMO algorithm is very effective in solving VRPs. What improvements have you made to the POMO network structure, and have these improvements led to better results?
[3] The encoding method for uncertainty sets proposed in this paper is not very innovative.
Questions
See the weaknesses.
Limitations
Yes.
Q1: The algorithm has not been compared to other existing neural combinatorial optimization works.
A1: To the best of our knowledge, this paper represents the first application of neural combinatorial optimization to the RTSP/RVRP. Nonetheless, we have compared different network architectures within the proposed methodology.
Q2: It is well known that the POMO algorithm is very effective in solving VRPs. What improvements have you made to the POMO network structure, and have these improvements led to better results?
A2: The POMO approach mainly exploits the symmetries in the representation of a combinatorial optimization solution. In our solution framework, we incorporate the principles of the POMO approach by sampling multiple trajectories during the training stage and employing instance augmentation during the inference stage. The results reported in Table 5 and Figure 5 can partly verify the training and inference efficiency.
Q3: The encoding method for uncertainty sets proposed in this paper is not very innovative.
A3: We did our best to study a range of encoding methods to capture the characteristics of uncertainty sets. The ablation experiment presented in Appendix 7.4, "Effects of the Encoding Approaches", indicates that the uncertainty set encoding method employed in this paper is simple yet effective.
Thank you for your considerate reply and supplementary analysis. The majority of my apprehensions have been resolved.
We appreciate the reviewer for acknowledging our work. Please feel free to let us know if you still have any other concerns.
The paper concerns the robust routing problem with uncertain travel times under the min-max regret criterion, which represents an extended, robust version of the classic traveling salesman problem (TSP) and vehicle routing problem (VRP). The authors propose a dual multi-head cross-attention mechanism to extract problem features represented by the inputted uncertainty sets and achieve very good results in experiments.
Strengths
The introduced model seems to be quite innovative, especially in the domain of routing problems. The paper is well-written and the quality of the review of related works, theoretical background, setup of experiments, and the presentation of results are very good. Taking into account that the authors tackle a more realistic variant of TSP/VRP and look for robust solutions, the methodology may have some practical applications.
Weaknesses
I haven't identified significant weaknesses. The only one I have in mind is the fact that neither the code, nor the dataset are available, so it is not possible to verify the results. The authors promised to make the code publicly accessible once the paper is accepted. I am not sure about the datasets. I've also found a minor writing issue: it seems that in definition 3.1, it is not clear what exactly x_ij is (one can guess, but it would be good to clarify).
Questions
I do not have any questions now.
Limitations
Yes, the authors have addressed the limitations in the Conclusions section.
Q: I've also found a minor writing issue: it seems that in Definition 3.1, it is not clear what exactly $x_{ij}$ is (one can guess, but it would be good to clarify).
A: Thank you for your advice. We will include the explanation for $x_{ij}$ in the revised version.
Here, $x_{ij}$ is a binary variable that indicates whether the edge $(i, j)$ is selected for the route: 1 for selection, and 0 for non-selection.
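For context, the role of $x_{ij}$ in the nominal objective can be sketched as follows (our notation here is reconstructed from the discussion rather than copied from the paper):

```latex
f(x, c) = \sum_{(i,j) \in E} c_{ij} \, x_{ij}, \qquad x_{ij} \in \{0, 1\},
```

where $E$ is the edge set and $c_{ij}$ is the (uncertain) travel time of edge $(i, j)$.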
Thank you for the answer. Are you planning to publish the dataset (in addition to the code) once the paper is accepted?
Thank you for your timely response. To preserve anonymity, we will release the datasets, codes, and results on our GitHub after the final decision on the paper has been made.
The min-max regret criterion is employed, and the routing problems are solved when the moving cost of an edge varies (perhaps uniformly) within a certain range. The authors have discussed two classes of uncertainty sets and proposed a theoretical analysis and a learning-based solver (for TSP and CVRP).
Strengths
- [S1] The whole concept and problems are interesting. Interval-based uncertainty is one of the standard formulations and is applied to various real problems.
- [S2] The learning method (i.e., REINFORCE + POMO training approach) achieved a promising performance, though the method is possibly a combination of existing concepts.
- [S3] The proposed method can generate at least better solutions faster than simple solvers (e.g., SA-based).
Weaknesses
- [W1] In contrast to [S1], there appear to be some gaps concerning the uncertainty of edge times (for example, another standard way is modeling them by some distributions).
- [W2] The novelty of the solver is unclear.
- [W3] To my understanding, the MatNet solver is used to compute the reward; however, the reward quality may not be high, or is at least unclear to me (of course, obtaining some solutions at high speed is essential for actual learning, but I wanted a discussion of this area).
Questions
- [Augmentation] I'm curious what you are doing; please explain augmentation. If it is mentioned anywhere, please let me know where.
- Two questions from [W1].
- I feel that it could be generalized to cases where travel times follow some distribution (e.g., a normal distribution) rather than discrete intervals. I think the strength of the method could be used in a setting where the elapsed time of an edge is given by a sample from a certain distribution, as this seems to be a common setting in reality. In fact, I felt that it could be handled well in other settings and theoretical developments.
- From Def. 3.1 and the idea of using intervals, I feel that the interval is regarded everywhere as having the same importance. However, considering actual applications, it seems likely that the closer the realized value is to the upper bound of the interval, the heavier the penalty may be. I was curious about the generalization and future development in this area.
- I am unclear about the quality of the reward from MatNet [W3]. Can you explain this point more?
- I did not understand the treatment of $\Gamma$.
  - $\Gamma$ controls the amount of how much each side is inflated from the nominal value, but since it bounds a sum of deviations, I think it is a sum of real numbers. So I was confused by some explanations that treat it as a natural number, for example l.133 "so exactly $\Gamma$ edges". Not related to the main point, but it bothers me.
- [minor comment].
- I understand that this is a space issue, but Table 1 is placed before its explanation, and it was difficult to understand at first what ×8 and ×128 meant.
After the rebuttal, I have updated my score.
Limitations
In my opinion, the authors have addressed the issue adequately and mentioned these points.
Q1: In contrast to [S1], there appear to be some gaps concerning the uncertainty of edge times (for example, another standard way is modeling them by some distributions).
A1: In the field of robust optimization, the modeling of uncertainty sets can vary significantly. Apart from the classic robust optimization support sets addressed in the paper, there are also approaches such as stochastic programming with known parameter distributions, and distributionally robust optimization, where the random variables' probability distributions lie within ambiguity sets. While these methods are beyond the scope of the present study, they hold promise as potential topics for our future research endeavors.
Q2: The novelty of the solver is unclear.
A2: Through our problem analysis and research, we found that robust problems under the min-max regret criterion have the property that the max-regret value can only be computed once a complete solution is obtained, which aligns with the concept of delayed reward in reinforcement learning. Therefore, our work builds upon this foundation to explore the use of neural combinatorial optimization methods. In contrast to traditional robust optimization approaches, neural combinatorial optimization methods do not require complicated, specialized combinatorial optimization knowledge to reformulate robust optimization objectives and constraints. Furthermore, they provide efficient problem-solving capabilities.
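For concreteness, the delayed-reward property can be made explicit: the max-regret of a complete solution $x$ over an uncertainty set $U$ is conventionally written as (a sketch in standard notation; the paper's exact formulation may differ)

```latex
\operatorname{Reg}(x) = \max_{c \in U} \Big( f(x, c) - \min_{x' \in X} f(x', c) \Big),
```

which cannot be evaluated until $x$ is complete, since the inner minimization compares $x$ against the best solution under the worst-case scenario.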
Q3: To my understanding, the MatNet solver is used to compute the reward; however, the reward quality may not be high, or is at least unclear to me (of course, obtaining some solutions at high speed is essential for actual learning, but I wanted a discussion of this area). I am unclear about the quality of the reward from MatNet [W3]. Can you explain this point more?
A3: Referring to the section "Effects of the Built-in TSP Solvers" in Appendix 7.4, it can be observed that the reward calculation methods can influence the overall performance of the model. The MatNet-based pre-trained solver, ultimately adopted in the study, outperforms approaches such as LKH and CMA-ES in terms of both the average training time per epoch and the final performance of the guided model.
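As a hedged illustration of the role the pre-trained solver plays in the reward (function names here are ours, not the paper's API):

```python
def max_regret_reward(route_cost, solver_cost, worst_case_times):
    """Hypothetical sketch of the reward computation discussed above.

    `route_cost(times)` returns the cost of the agent's finished route under
    a travel-time matrix, and `solver_cost(times)` returns the cost of the
    tour found by the fast pre-trained solver (MatNet-based in the paper)
    on the same worst-case matrix.
    """
    regret = route_cost(worst_case_times) - solver_cost(worst_case_times)
    return -regret  # negated so that maximizing the reward minimizes regret
```

Any suboptimality of the pre-trained solver translates directly into noise in this regret estimate, which is why the comparison against LKH and CMA-ES in Appendix 7.4 matters.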
Q4: [Augmentation] I'm curious what you are doing; please explain augmentation. If it is mentioned anywhere, please let me know where.
A4: Instance augmentation is a standard technique in neural combinatorial optimization. In our study, we discussed it from line 242 to line 252 in Section 4.4. ''Instance augmentation [19] during inference can also be adjusted to improve the solution quality. Concretely, we employ multiple independent randomizations on the initial one-hot node embeddings for each instance to explore various paths towards the optimal solution. Since two matrices are inputted to our model, they share the same randomized one-hot vector. Ultimately, the best solution is selected from the multiple generated solutions.''
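A minimal sketch of this procedure, assuming a hypothetical `model.decode` interface (names and shapes are ours, not the paper's code):

```python
import torch

def augmented_inference(model, mat_a, mat_b, n_aug):
    """Pick the best solution over n_aug randomized one-hot initializations."""
    n = mat_a.size(0)
    best_solution, best_obj = None, float("inf")
    for _ in range(n_aug):
        # Fresh random one-hot initial node embeddings, shared by both
        # input matrices so that their rows stay aligned.
        init_embed = torch.eye(n)[torch.randperm(n)]
        solution, obj = model.decode(mat_a, mat_b, init_embed)
        if obj < best_obj:
            best_solution, best_obj = solution, obj
    return best_solution, best_obj
```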
Q5: I feel that it could be generalized to cases where travel times follow some distribution (e.g., a normal distribution) rather than discrete intervals. I think the strength of the method could be used in a setting where the elapsed time of an edge is given by a sample from a certain distribution, as this seems to be a common setting in reality. In fact, I felt that it could be handled well in other settings and theoretical developments.
A5: In classic robust optimization, when dealing with uncertainty following specific probability distributions, the approach typically involves incorporating this probabilistic information into the optimization model. Instead of focusing solely on worst-case scenarios, the optimization process takes into account the likelihood or probability of different outcomes based on the known distributions of the uncertain parameters.
There are several ways to cope with uncertainty following specific distributions, e.g., stochastic programming, distributionally robust optimization, and chance-constrained optimization. However, our current research is limited to classic robust optimization under the interval uncertainty framework.
Q6: From Def. 3.1 and the idea of using intervals, I feel that the interval is regarded everywhere as having the same importance. However, considering actual applications, it seems likely that the closer the realized value is to the upper bound of the interval, the heavier the penalty may be. I was curious about the generalization and future development in this area.
A6: Classic robust optimization considers the impact of uncertainty by focusing on the most unfavorable situations. By accounting for these worst-case scenarios during the optimization process, the resulting solutions are designed to be robust and resilient, offering a level of performance guarantees even under adverse conditions.
We acknowledge the possibility of associating different penalties with different interval segments. However, our methodology is centered on achieving a resilient solution across all adverse scenarios. It appears that alternative robust modeling techniques may be better suited to address your specific concerns.
Q7: I did not understand the treatment of $\Gamma$. $\Gamma$ controls the amount of how much each side is inflated from the nominal value, but since it bounds a sum of deviations, I think it is a sum of real numbers. So I was confused by some explanations that treat it as a natural number, for example l.133 "so exactly $\Gamma$ edges". Not related to the main point, but it bothers me.
A7: In practical applications, $\Gamma$ is a parameter commonly designed to be an integer controlling the number of affected edges in the worst case. Interestingly, if $\Gamma$ is set to a real number, the worst-case scenario can still be identified with a minor adjustment to Theorem 1: $\lfloor\Gamma\rfloor$ edges reach their upper bound values, one edge takes a proportional value within its interval, and the remaining edges reach their lower bound values.
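A small sketch of this adjusted worst case for the edges of a fixed route (our own illustration of the construction described in A7, assuming the worst case inflates the largest deviations first):

```python
def worst_case_times(lower, upper, gamma):
    """floor(gamma) edges at their upper bounds, one edge inflated by the
    fractional remainder of the budget, the rest at their lower bounds."""
    deviations = [u - l for l, u in zip(lower, upper)]
    order = sorted(range(len(deviations)), key=lambda i: -deviations[i])
    times, budget = list(lower), float(gamma)
    for i in order:
        if budget >= 1.0:        # a full unit of budget: hit the upper bound
            times[i] = upper[i]
            budget -= 1.0
        elif budget > 0.0:       # fractional remainder: proportional value
            times[i] = lower[i] + budget * deviations[i]
            budget = 0.0
        else:                    # budget exhausted: stay at the lower bound
            break
    return times
```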
Dear authors, I appreciate the detailed responses.
From some Q&A related to uncertainty modeling (e.g., robust optimization), I understood that the authors focused on a different concept; the responses clarified my concerns. Some uncertain parts (e.g., Q3, Q4, and Q7) have been addressed by the responses as well. Particularly from A7, my concerns have been explained.
After reading the responses, I'll update my score.
Thank you very much for recognizing our work. We will endeavor to expand our research based on your feedback to broaden the scope of our study.
The paper proposes a neural network based approach and applies it to the robust versions of the traveling salesman problem (TSP) and the related vehicle routing problem (VRP), where interval uncertainty is taken into account and the goal is to minimise the max-regret value. The architecture (encoder, decoder) is described as well as the training process, and the POMO approach is used for fast inference with a pre-trained model. Then experiments are conducted to compare the proposed approach to existing approaches from the literature.
Strengths
- The robust versions of TSP and VRP have been widely studied in the literature, so it is good to see a paper that continues the existing work
- The proposed approach has been implemented and used to conduct experiments
- The paper is written in good English
Weaknesses
- The paper is not easy to follow; at the end of the introduction (i.e., Section 1) there should be an outline of the organisation of the paper to give the reader orientation
- The discussion of related work is too brief to give a comprehensive overview of the state of the art of ML-based approaches, including neural network based methods. The paper below is a recent survey that discusses supervised as well as reinforcement learning-based approaches: Bogyrbayeva et al.: Machine Learning to Solve Vehicle Routing Problems: A Survey, IEEE Trans. on Intelligent Transportation Systems, 2023
- As emphasised by the authors, their proposed approach is a bit similar to the approach in MatNet described in reference [19]; so the novelty of the proposed method is limited; the paper's contribution is more the application to the robust TSP and VRP
- The font sizes in Table , Table 3 and Fig. 2 are far too small
- Careful proofreading is needed as the paper contains a range of typos, for example: page 3, line -2: "budge"
Questions
- The formulation of the robust TSP presented in Section 3 seems to be the same as the one studied in reference [22]; should this just be made clear to the reader?
- On page 9, what exactly is meant by "blended" and "fusion" in the ablation study?
- The choice of the baseline methods (BC - branch and cut, BD - Benders decomposition, SA - simulated annealing, iDS - iterated dual substitution, EGA - edge generation) used for the experiments is not well justified; why just these methods? why no others?
Limitations
- The datasets are just chosen from reference [22]; but it is unclear whether the observations made in the paper are still valid for other datasets; how well do they generalise?
Thank you for your comments. We will strive to broaden the literature review to incorporate more recent studies. Additionally, we will carefully review and adjust the table font size and correct any errors.
Q1: The formulation of the robust TSP presented in Section 3 seems to be the same as the one studied in reference [22]; should this just be made clear to the reader?
A1: In Section 3, we provide a detailed mathematical modeling and description of the optimization problem to help readers understand the issue this paper aims to address. This also serves as the motivation for our work moving forward. While the mathematical definition originates from reference [22], we have extended it to a broader class of budget uncertainty sets, with the interval uncertainty set mentioned in reference [22] being a special case of this broader framework.
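As a hedged illustration of this relationship (our notation, in the style of Bertsimas–Sim budget uncertainty; the paper's exact definition may differ):

```latex
U_{\Gamma} = \Big\{ c \;\Big|\; c_{ij} = \bar{c}_{ij} + \delta_{ij} d_{ij},\;
\delta_{ij} \in [0, 1] \;\forall (i,j) \in E,\;
\sum_{(i,j) \in E} \delta_{ij} \le \Gamma \Big\},
```

where $\bar{c}_{ij}$ is the nominal travel time and $d_{ij}$ the maximum deviation. Taking $\Gamma \ge |E|$ makes the budget constraint vacuous and recovers the pure interval case of [22], while $\Gamma = 0$ collapses the set to the nominal times.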
Q2: On page 9 what exactly is meant by blended and fusion in the ablation study?
A2: In the ablation study, the terms "blended" and "fusion" refer to distinct methods of combining two matrices. Their specific definitions are elaborated in Appendix 7.4, under the section titled "Effects of the Encoding Approaches". For a more detailed visual representation of these methods, please refer to Figure 4.
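Without restating Appendix 7.4, the two terms can be read as two generic ways of combining the input matrices; the following is a purely hypothetical sketch (these helpers are ours, not the paper's definitions, which are given in Figure 4):

```python
import torch

def blended(mat_a, mat_b, alpha=0.5):
    # One reading of "blended": mix the two matrices into a single input
    # before encoding (alpha is a hypothetical mixing weight).
    return alpha * mat_a + (1.0 - alpha) * mat_b

def fused(enc_a, enc_b):
    # One reading of "fusion": encode each matrix separately, then merge
    # the embeddings, e.g., by concatenating along the feature dimension.
    return torch.cat([enc_a, enc_b], dim=-1)
```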
Q3: The choice of the baseline methods (BC - branch and cut, BD - Benders decomposition, SA - simulated annealing, iDS - iterated dual substitution, EGA - edge generation) used for the experiments is not well justified; why just these methods? why no others?
A3: To the best of our knowledge, these methods currently stand as representative and effective approaches for RTSP/RVRP. As highlighted in reference [29] from the INFORMS Journal on Computing (2022), their iDS algorithm has been benchmarked against state-of-the-art results, demonstrating significant advancements by surpassing several existing records.
Q4: The datasets are just chosen from reference [22]; but it is unclear whether the observations made in the paper still valid for other datasets; how well do they generalise?
A4: Our approach is not limited to just the interval uncertainty set issue as detailed in reference [22], but extends to a wider range of budget uncertainty sets. Moreover, the hyperparameter $\Gamma$ can be leveraged to adjust the robustness of the uncertainty set. Furthermore, we conducted a study on problem scale generalization in Section 5.2.
Dear authors, Many thanks for your answers that helped in clarifying several points. I am more happy now concerning the contribution.
Regarding Q1. The difference between the problem definition in reference [22] and yours is clear now. For future readers of your paper a small example might help. As far as I can see your example in Fig. 3 is also about interval uncertainty. Perhaps you can change this small example to illustrate more general budget uncertainty, and move this example away from the appendix and into the main text.
Regarding Q2. The meaning of “blended” and “fusion” is clear now. Unfortunately the explanation of these terms is hidden in the appendix. Perhaps you can briefly explain these terms in the text so that the future reader is not required to look into the online appendix and try to digest Fig. 4.
Regarding Q3. As far as I can see, reference [29] (the INFORMS paper where iDS is introduced) does not contain experiments to compare iDS against other methods for the RTSP. This is pointed out in [12] (which is co-authored by one of the authors of [29]), and the following reason is given: “However, due to the difficulty of representing the TSP as a binary integer programming model with a polynomial number of constraints, the iDS method has not been examined for the RTSP.” [12] then gives an experimental comparison between EGA, iDS and other methods from your list ([22], BC, BD).
Regarding Q4. This is clearer now. This point is related to my comment regarding Q1. It would be helpful for the future reader to illustrate the differences a bit more. With respect to scalability, [12] studies problem instances with up to 1000 vertices. How is this in your paper?
Thank you for your valuable suggestions regarding the details and structure of our paper. We will incorporate your recommendations in the revised version, such as including illustrations of problem instances and providing an overview of "blended" and "fusion" concepts in the main text.
We acknowledge your point that while iDS was initially proposed in [29], it was experimentally implemented on RTSP in [12].
Regarding the scaling-up challenge, we regret that we did not have enough time to study instances with up to 1000 vertices due to resource limitations, memory constraints, and the inherent restrictions of neural network methodologies. We are committed to addressing these challenges in our future research endeavors.
Nevertheless, we have conducted supplementary experiments to validate the effectiveness of our approach under general budget uncertainty conditions. Table 1 presents the comparative results for different $\Gamma$ values at a fixed problem scale. Our method demonstrates highly encouraging outcomes within a remarkably short time when contrasted with the leading solver EGA.
| Method | Obj (Γ=⌊C(n,2)/2⌋) | Time (s) | Obj (Γ=⌊C(n,2)/4⌋) | Time (s) | Obj (Γ=0) | Time (s) |
|---|---|---|---|---|---|---|
| EGA | 0.7870 | 55.9 | 0.3175 | 56.7 | 0.0000 | 38.4 |
| ours ×128 | 0.7870 | 11.2 | 0.3180 | 10.9 | 0.0005 | 11.3 |
| ours ×8 | 0.7945 | 0.8 | 0.3305 | 0.79 | 0.0365 | 0.8 |
This paper proposes a neural combinatorial optimization (NCO) method to solve robust routing problems with uncertain travel times under the min-max regret criterion - a common setting in real-life scenarios in which one wants to hedge against worst-case outcomes. The proposed approach employs the transformer-based MatNet architecture, which can model edge features such as uncertain travel times and project them into the latent space. Moreover, a secondary pre-trained model is employed to efficiently evaluate the optimal tour under the max-regret condition. The approach can obtain near-optimal solutions with much faster solution times compared to baselines.
Strengths
To my knowledge, this paper is the first to apply a neural approach to the specific application of uncertain travel times with min-max regret, which is an important practical application. This includes the use of edge features, which is very important in practice but understudied in NCO for VRPs. The proposed model is a reasonable approach and shows good performance while cutting down considerably on computational requirements both in training and inference.
I have no major concerns regarding the paper's quality and experimental results. While the paper is specific to an application, I would recommend acceptance based on its quality, originality, and practical application.
Weaknesses
I have spotted some weaknesses here and there, but no major ones.
- This paper's main weakness may be its specificity to a task. However, as described above, given its importance and overall quality, I believe this is not a major flaw.
- Equation 6: why is the loss function minimizing the expected reward? Shouldn't it maximize it? Moreover, the gradient of the loss is defined in Eq. 10, which is different from the loss (with a different notation).
- No code is provided at the time of submission.
Questions
- Why did you choose MatNet instead of e.g. BQ-NCO [1r] as your backbone?
- Could the uncertainty in travel times be seen as a sort of "adversarial attack" on the solution - in other words, connecting your approach to ROCO [2r]?
References
[1r] Drakulic, Darko, et al. "BQ-NCO: Bisimulation Quotienting for Efficient Neural Combinatorial Optimization." Advances in Neural Information Processing Systems 36 (2023).
[2r] Lu, Han, et al. "ROCO: A General Framework for Evaluating Robustness of Combinatorial Optimization Solvers on Graphs." The Eleventh International Conference on Learning Representations (2023).
Limitations
See the above weaknesses. The authors describe the main limitation of the approach, namely that the scale is currently pretty small, although this is expected from a seminal paper.
Q1: Equation 6: why is the loss function minimizing the expected reward? Shouldn’t it maximize it? Moreover, the gradient of the loss is defined in Eq. 10, which is different from the loss (with a different notation).
A1: In order to maximize the expected reward, we negate the reward so that training aligns with loss minimization. Furthermore, we utilize a shared baseline technique in Eq. 10 to compute the gradient of the loss.
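A minimal sketch of this convention with a POMO-style shared baseline (variable names are ours; the exact equations are Eq. 6 and Eq. 10 in the paper):

```python
import torch

def reinforce_loss(log_probs, rewards):
    """log_probs, rewards: tensors of shape (batch, n_trajectories), with
    rewards defined as negated max-regret values, so minimizing this loss
    maximizes the expected reward."""
    baseline = rewards.mean(dim=1, keepdim=True)  # shared baseline per instance
    advantage = rewards - baseline
    return -(advantage * log_probs).mean()        # minimized by gradient descent
```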
Q2: No code is provided at the time of submission.
A2: We will make our code publicly accessible once the paper is accepted as stated in line 706.
Q3: Why did you choose MatNet instead of e.g. BQ-NCO [1r] as your backbone?
A3: BQ-NCO presents an exceptional contribution by introducing a novel formulation for COPs as MDPs to improve out-of-distribution generalization. It leverages tail recursion properties and Bisimulation Quotienting to tackle this challenge. However, similar to AM and POMO methods, it treats "node features" as inputs. In robust scenarios where edge uncertainty is a concern, node features may become irrelevant or of limited use in this specific problem context. Hence, we choose MatNet, which handles edge features more effectively, as the backbone network.
Q4: Could the uncertainty in travel times be seen as a sort of “adversarial attack” on the solution - in other words, connecting your approach to ROCO [2r]?
A4: ROCO generates new instances from the original problem using adversarial networks (e.g., by relaxing constraints or lowering some edges) to ensure that these new instances do not lead to worse optimal costs. While both our method and ROCO offer flexibility, as they both accommodate neural networks or built-in worst-case solvers, our approach differs in that it only requires describing the uncertainty set for robust optimization, without needing to modify the policy to generate new altered solutions.
Moreover, as noted by the authors of ROCO, "Our paper also differs from the so-called `robust optimization', where the expected objective score is optimized based on a known data distribution.” ROCO addresses the black-box robustness issue of combinatorial optimization solvers, while we focus on robustness in the actual application environment of the problem.
Thanks for your answers. Here is some further feedback:
> However, similar to AM and POMO methods, it treats "node features" as inputs. In robust scenarios where edge uncertainty is a concern, node features may become irrelevant or of limited use in this specific problem context.
I was referring to the version of BQ-NCO that uses a graph neural network at each step to model the edge features. But I agree with you that MatNet is also a good choice, and faster, given that it does not re-encode at each step.
> We will make our code publicly accessible once the paper is accepted as stated in line 706.
I see; I was referring to it at the time of submission. While I trust your results and paper, I believe releasing an anonymized source code to be reviewed at the time of submission is important. Some reviewers also tend to give higher scores to papers that do so. Hence, I suggest you consider doing so in future works.
That said, I believe this is a solid paper worthy of acceptance. Good job!
Thank you very much for acknowledging our work! We will finalize our research based on the reviewers' feedback once the final decision on the paper has been reached.
This paper studies the robust routing problem with uncertain travel times. The proposed method leverages a dual multi-head cross-attention mechanism and incorporates a pre-trained model to efficiently compute regret values during training, demonstrating strong performance in terms of both computational efficiency and solution quality.
Reviewers generally agree that this paper offers a valuable contribution to the field, particularly in applying deep learning methods to robust combinatorial optimization problems. The authors have satisfactorily addressed concerns raised during the review process, including clarifications on the novelty of the approach and improvements made over existing methods. All reviewers consistently view this paper as strong in terms of methodological innovation, practical relevance, and experimental validation.