A Learning-based Capacitated Arc Routing Problem Solver Comparable to Metaheuristics While with Far Less Runtimes
Abstract

We introduce a learning-based CARP solver to significantly narrow the gap with advanced metaheuristics while requiring far less runtime.

Reviews and Discussion
The paper introduces a learning-based method to address the Capacitated Arc Routing Problem (CARP). It involves breaking undirected edges into directed arcs and utilizing a graph attention network to build a Direction-aware Attention Model. In the training process, supervised learning is used to create the initial policy, followed by policy-gradient reinforcement learning with Proximal Policy Optimization (PPO) to refine the policy. Lastly, dynamic programming is applied to optimize depot placements for path enhancement. Experimental results show notable advantages of the algorithm across the evaluation criteria.
Strengths
In general, the paper exhibits a well-organized structure with detailed experimental outcomes showcased through graphs and tables, facilitating readers in comprehending and visualizing the results effortlessly. The dataset employed comprises real-world scenarios, thereby boosting its practical relevance.
Weaknesses
Converting the graph G from arcs to nodes represents a common approach in many heuristics for addressing CARP. This process adds complexity to the problem and increases its scale. The proposed method appears to lack sufficient novelty, with most components bearing resemblance to neural models designed for CVRP.
Questions
In the experimental section, it is highlighted that supervised learning was employed for generating the initial policy using MAENS to produce real labels. Nonetheless, supervised learning may face challenges with large-scale datasets. Is supervised learning indispensable for this aspect? Would the final model's performance be affected if the initial policy was randomly generated instead?
Is path optimization applied during both the training and inference/testing phases, or is it specifically used for the inference phase only? Moreover, the time specified in Figure 2 likely pertains to the inference time for sample testing. I am also curious about the model's training time—approximately how long does it typically take?
Limitations
It appears that the paper focuses on an unlimited number of vehicles. How would the approach adapt to a specific set of vehicles?
Q1: Converting the graph G from arcs to nodes represents a common approach in many heuristics for addressing CARP. This process adds complexity to the problem and increases its scale. The proposed method appears to lack sufficient novelty, with most components bearing resemblance to neural models designed for CVRP.
A1: Thanks for your insightful question. However, our method actually differs from previous studies. Most earlier methods employed a common technique known as line graph conversion from edges to nodes, which completely ignores the directionality of edge traversal, hindering the application of learning-based CARP solvers. Our work specifically addresses this issue, aiming to eliminate barriers for learning-based CARP solvers. Additionally, we have combined supervised learning and reinforcement learning to jointly optimize a single strategy, achieving better results than using either approach independently.
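To make the decomposition concrete, here is a minimal sketch of the edge-splitting step (illustrative only, not the code from our implementation; a NetworkX-style graph carrying CARP attributes on each edge is assumed):

```python
import networkx as nx

def to_directed_arcs(g: nx.Graph) -> nx.DiGraph:
    # Split each undirected edge (u, v) into two directed arcs (u, v) and
    # (v, u), so each traversal direction gets its own embedding and the
    # policy can explicitly select a direction at every decision step.
    dg = nx.DiGraph()
    dg.add_nodes_from(g.nodes(data=True))
    for u, v, attrs in g.edges(data=True):
        dg.add_edge(u, v, **attrs)  # forward arc
        dg.add_edge(v, u, **attrs)  # reverse arc, same cost/demand
    return dg
```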
Q2: In the experimental section, it is highlighted that supervised learning was employed for generating the initial policy using MAENS to produce real labels. Nonetheless, supervised learning may face challenges with large-scale datasets. Is supervised learning indispensable for this aspect? Would the final model's performance be affected if the initial policy was randomly generated instead?
A2: Thanks for the valuable comments. The main purpose of using supervised learning is to accelerate the convergence of the reinforcement learning algorithm. While reinforcement learning can be initialized with a random policy, it often requires more training time to converge, or fails to converge at all. Therefore, supervised learning is indispensable. To validate the effect of random initialization, we conducted experiments, as shown in Table R2. As seen, Task 20 can converge in about 2.5 hours, but for the other task sizes, we spent a significant amount of training time with difficulty achieving convergence. Additionally, for Task 20, which did converge, the cost of training with random initialization was approximately 15% higher than that of the combined SL+RL approach. In fact, since supervised learning is only performed on Task20, the SL pre-training does not take much time.
Table R2. Training time until convergence: supervised pre-training (SL+RL) vs. random initialization.

|        | Task20 | Task40 | Task60 | Task80 |
|:-------|:-------|:-------|:-------|:-------|
| SL+RL  | 0.6h   | 0.8h   | 1.5h   | 2h     |
| Random | 2.5h   | 8h+    | 15h+   | 20h+   |
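To illustrate the two stages, the following is a simplified sketch (hypothetical names; the PPO part shows only the standard clipped surrogate loss, not our full training loop):

```python
import torch
import torch.nn.functional as F

def sl_pretrain_step(policy, states, expert_actions, optimizer):
    # Stage 1: supervised warm-start that imitates MAENS decisions.
    logits = policy(states)                        # (batch, num_arcs)
    loss = F.cross_entropy(logits, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def ppo_loss(new_logp, old_logp, advantages, clip_eps=0.1):
    # Stage 2: standard PPO clipped surrogate used for fine-tuning.
    ratio = torch.exp(new_logp - old_logp)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```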
Q3: Is path optimization applied during both the training and inference/testing phases, or is it specifically used for the inference phase only? Moreover, the time specified in Figure 2 likely pertains to the inference time for sample testing. I am also curious about the model's training time—approximately how long does it typically take?
A3: Thanks for the insightful comment. PO is only utilized in the inference stage, and the training times of the DaAM models are given in the General Response A2.
Q4: It appears that the paper focuses on an unlimited number of vehicles. How would the approach adapt to a specific set of vehicles?
A4: Thanks for the thought-provoking question. The solution to the CARP includes multiple routes, each of which must adhere to vehicle limits (not exceeding the vehicle capacity). The number of vehicles does not affect the total travel cost, but it does impact the overall execution time of the task: the more vehicles available, the more routes can be traversed simultaneously. Therefore, restricting the number of vehicles during the CARP solving process would extend the overall task execution time.
The authors skillfully address challenges posed by non-Euclidean graphs, traversal direction, and capacity constraints with their novel NN-based solver for the capacitated arc routing problem. The introduction of the direction-aware attention model and a supervised reinforcement learning scheme is particularly commendable. These innovations significantly narrow the gap with advanced metaheuristics, achieving superior efficiency and competitive decision quality.
Strengths
- The manuscript employs numerous innovative methods to solve the capacitated arc routing problem, achieving impressive results.
- It also shows promising performance in generalizing to larger problem instances.
- The combination of supervised and reinforcement learning is quite interesting. Using supervised learning for pre-training followed by fine-tuning with reinforcement learning is a noteworthy approach.
- The qualitative comparisons in real street scenes presented in Figure 4 are particularly interesting.
Weaknesses
- It's better to redraw the first part of Figure 1 to enhance its aesthetic quality.
- The baselines are not very recent. After S2V-DQN, there are still some excellent works that can be used to address the CARP problem.
- Some writing errors have been identified, such as in line 2 of Algorithm 1. Please review the entire manuscript to check.
- The completeness of the manuscript still requires supplementation and refinement.
Questions
The method proposed by the authors is innovative and interesting, but I still have some questions that I hope the authors can address:
- In the manuscript, what is the significance of transforming the initial undirected edges into fully connected directed edges at the beginning? If edges' directions are required, why not just convert them into bidirectional directed edges?
- In section 4.2, how is ϵ selected?
- In section 4.3, the authors remove all the depot arcs initially and then add them back. What is the purpose of these steps? Why does this not affect the calculation of vehicle capacity?
- In the Solution Visualization section, the authors have visualized the streets of Beijing. How does the model handle the non-Euclidean space in real-world scenarios?
- The detailed structure of the proposed model and algorithm should be further completed and supplemented in the appendix.
Limitations
The approach of decomposing undirected edges into directed ones introduces additional decision elements, which complicates the problem. It would be better if the authors could find a more efficient graph processing method.
Q1: It's better to redraw the first part of Figure 1 to enhance its aesthetic quality.
A1: Thank you for the great suggestion; we will carefully redraw the figure if the paper is accepted. We would also welcome any concrete suggestions for improving its aesthetic quality.
Q2: The baselines are not very recent. After S2V-DQN, there are still some excellent works that can be used to address the CARP problem.
A2: Thanks for the insightful suggestion! The comparison with more methods is given in the General Response A1. As far as we know, there are indeed many excellent recent works that can solve CARP-like problems. However, we found that most of them focus on variants of CARP, such as uncertain CARP or time CARP, while PS-Efficiency is still the advanced algorithm among constructive heuristics for solving CARP.
Q3: Some writing errors have been identified, such as in line 2 of Algorithm 1. Please review the entire manuscript to check.
A3: Thanks for the valuable suggestion. These writing errors do confuse readers. We promise to correct all writing errors in the final version.
Q4: The completeness of the manuscript still requires supplementation and refinement.
A4: Thanks for the valuable suggestion. We have realized that the absence of some content, such as more recent methods, experiments on public datasets, details of the training time, and the parameters of the PS variants, indeed leads to incompleteness. We will add all these details in the final version.
Q5: In the manuscript, what is the significance of transforming the initial undirected edges into fully connected directed edges at the beginning? If edges' directions are required, why not just convert them into bidirectional directed edges?
A5: Thanks for the insightful comment. In our method, by splitting the undirected edge into two directed edges, DaAM can calculate the embeddings for the two edges with opposite directions separately and explicitly choose one directed edge at each decision. If using bidirectional edges, the directionality contained in each bidirectional edge cannot be determined after computing the embedding.
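A plausible sketch of the decision step under this design (our assumption for illustration: scores are computed per directed arc, `reverse_of[i]` indexes arc i's opposite twin, and serving an edge in either direction marks both twins as served):

```python
import torch

def select_arc(logits, served, reverse_of):
    # logits: (2E,) one score per directed arc; served: (2E,) bool mask.
    masked = logits.masked_fill(served, float("-inf"))
    arc = int(torch.argmax(masked))
    served[arc] = True               # the chosen direction is served...
    served[reverse_of[arc]] = True   # ...and so is its reverse twin
    return arc
```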
Q6: In section 4.2, how is ϵ selected?
A6: Thanks for the comment. This parameter is obtained by searching. Based on the experience of previous related papers, we set its search range to (0, 0.2] with a step size of 0.05. We use values of 0.05, 0.1, 0.15, and 0.2 as training parameters and select the optimal value by comparing the final training loss.
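In code, this selection amounts to a simple sweep (sketch; `train_and_get_final_loss` is a hypothetical stand-in for a full training run):

```python
def train_and_get_final_loss(clip_eps: float) -> float:
    # Hypothetical stand-in: runs SL+RL training with the given epsilon
    # and returns the final training loss.
    ...

candidates = [0.05, 0.10, 0.15, 0.20]  # search range (0, 0.2], step 0.05
final_losses = {eps: train_and_get_final_loss(clip_eps=eps) for eps in candidates}
best_eps = min(final_losses, key=final_losses.get)  # lowest final loss wins
```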
Q7: In section 4.3, the authors remove all the depot arcs initially and then add them back. What is the purpose of these steps? Why does this not affect the calculation of vehicle capacity?
A7: Thanks for the insightful comment. PO aims to optimize the depot arcs' positions in the result sequence, i.e., determining when to return. Therefore, all depot arcs need to be removed initially, which temporarily disrupts the capacity constraints. The capacity constraints are then restored by re-inserting the depot arcs back into suitable positions within the result sequence. This process can be understood as a "Route First, Cluster Second" mechanism, as mentioned in the answer to question 7 of Reviewer h6eF.
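A minimal sketch of this split under stated assumptions (`demand[t]` gives a task's demand and `route_cost(seq)` the depot-to-depot cost of serving `seq` in order; this is the classic Prins-style split, shown for illustration rather than our exact code):

```python
def split_giant_tour(giant_tour, demand, route_cost, capacity):
    # best[j]: minimum cost of serving the first j tasks with feasible routes.
    n = len(giant_tour)
    best = [float("inf")] * (n + 1)
    best[0] = 0.0
    pred = [0] * (n + 1)              # where the route ending at j starts
    for i in range(n):                # a new route begins after task i
        load = 0.0
        for j in range(i + 1, n + 1):
            load += demand[giant_tour[j - 1]]
            if load > capacity:       # capacity constraint restored here
                break
            cost = best[i] + route_cost(giant_tour[i:j])
            if cost < best[j]:
                best[j], pred[j] = cost, i
    return best[n], pred              # follow pred back to place depot returns
```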
Q8: In the Solution Visualization section, the authors have visualized the streets of Beijing. How does the model handle the non-Euclidean space in real-world scenarios?
A8: Thanks for the valuable comment. In our experiments, we first obtain road network data from map SDKs such as OpenStreetMap, then extract the topology to create a non-Euclidean graph. The core steps are already provided by the map SDKs.
Q9: The detailed structure of the proposed model and algorithm should be further completed and supplemented in the appendix.
A9: Thank you for your valuable suggestion. Since the details of network structures such as GAT and AM are not shown in this paper, this indeed has a certain impact on understanding. If the paper is accepted, we are willing to describe all the model structures in the supplementary materials to help readers fully understand every detail of our algorithm.
Q10: The approach of decomposing undirected edges into directed ones introduces additional decision elements, which complicates the problem. It would be better if the authors could find a more efficient graph processing method.
A10: Thanks for the insightful comment. In previous learning-based solvers, decisions were made directly on undirected edges. Actually, since decisions are made continuously, the directionality of the edges is also determined at the same time. Thus, these methods implicitly encoded the directionality of edges. In contrast, our approach explicitly encodes the direction of the edges, reducing the difficulty for the model in learning edge directionality. Although our transformation process indeed creates redundant edges and increases the amount of input data, we can further reduce the complexity through graph pruning.
The paper proposes a new learning-based constructive heuristic for capacitated arc routing problems. In contrast to node routing problems such as the TSP and VRP, arc routing problems received comparably little attention. To address the specific challenges of capacitated arc routing problems, the authors propose a Neural Network-based approach that uses a graph attention model considering arc directionality, and a reinforcement learning approach with supervised pre-training and PPO-based fine-tuning. In order to improve solutions obtained by the RL-based construction approach, they propose a beam search approach for path optimization which, after turning the set of routes into a giant tour, splits the tour into routes by adding returns to the depot. A set of experiments shows that the proposed approach consistently yields better results than traditional hand-crafted constructive heuristics, and that its solutions almost match the quality of a time-consuming memetic algorithm that is only capable of solving small instances in a reasonable amount of time.
Strengths
The authors propose one of the first learning-based approaches for the capacitated arc routing problem (CARP). Their approach, in particular their graph embedding, explicitly addresses one of the challenges of learning-based construction algorithms for this problem by explicitly replacing the undirected edges by directed arcs. This idea is original and turns out to be helpful to create a well-performing heuristic.
The online performance of the approach surpasses hand-crafted constructive heuristics both in terms of runtime and efficiency. While for small instances other metaheuristic approaches are better, it can be assumed that for large-scale instances the proposed approach surpasses the state of the art of heuristic approaches. This is a significant result, since for node routing problems such as the CVRP, researchers have been struggling for years to design learning-based heuristics that achieve a performance comparable to hand-crafted heuristics. It should be mentioned, though, that in general, arc routing problems receive much less attention than node routing problems in the literature.
The paper comprises several insightful and, as far as I can tell, reasonably designed experiments, in particular showing the generalization capability to larger instances.
The paper provides both code and instances.
Weaknesses
The presentation of the CARP, the solution approaches, and the related work lacks clarity in many places. As an example, in the abstract we find that the CARP consists in finding "the minimum-cost tour that covers all required edges on a graph while within capacity constraints". This is a bit misleading, since we look for a set of routes instead of a single tour.
The paper distinguishes "heuristics" and "metaheuristics", while clearly metaheuristics are a type of heuristic. Actually, what the authors appear to have in mind is "constructive heuristics", which sequentially construct a solution by adding edges to form routes. I suggest formulating this more precisely.
Similarly, it would enhance the understanding of the paper to introduce the notion of "route-first, cluster-second" and the related notion of a "giant tour", which are commonplace in routing applications, to characterize the respective existing work. It would even clarify the presented path optimization, which actually turns the presented approach into a route-first, cluster-second approach.
The computational results are convincing, but the discussion should emphasize that a fair comparison can only be made between their approach without path optimization and the other constructive heuristics. It would indeed be interesting to see how far the path optimization is able to improve the results of the other constructive heuristics.
The claims "NN-based approaches tend to lag behind advanced metaheuristics" (abstract), "NN-based methods usually lags far behind the traditional ones in solving CARP", and "they still lag significantly behind traditional methods" are not valid. Actually, (Ramamoorthy et al. 2024) (reference 20) report that they improve upon the memetic algorithm by 11% on average.
When it comes to the evaluation of the path scanning approaches in the experiments, it is unclear how they are parameterized. From reading the paper (Arakaki 2019), one sees that the parameter alpha and the number of iterations have a considerable impact on both solution time and solution quality, and (albeit on different instances) the average gaps of the path scanning approaches to the optimum (and to the memetic algorithm) reported in (Arakaki 2019) are smaller than those found in the submission.
The description of the path improvement is not very clear; in particular the definition of the state used in the Dynamic Programming algorithm. Is it a path? Is it the length of a path? Also, the statement "f(*) denotes a state featuring dynamic programming" is hard to decipher.
Training time is not discussed at all.
Questions
Please address the weaknesses mentioned above. Also:
- How long does the training take?
- Where does the implementation of the memetic algorithm come from?
- How were the PS approaches parameterized?
Limitations
Limitations are mostly addressed in a reasonable way. I suggest to add the following aspects:
I think that for small instances, the approach by (Ramamoorthy et al. 2024) may surpass the results reported here, which should be mentioned.
Also, you should at least briefly mention the training time, since this makes it easier to assess the trade-off between offline effort and online performance.
Q1(abbreviation): The abstract inaccurately describes the CARP as finding a "minimum-cost tour" rather than finding a set of routes.
A1: Thanks for the valuable suggestion. This vague description does confuse readers when they are trying to understand. We promise to correct all ambiguities in the representation of the final version.
Q2(abbreviation): The paper incorrectly separates "heuristics" and "metaheuristics".
A2: Thanks for the valuable suggestion. The "heuristic" in the paper is indeed "constructive heuristics" as you said. We will adopt this expression in the final version and give a specific definition.
Q3(abbreviation): Introducing the "route-first, cluster-second" strategy and "giant tour" concept could clarify and improve the paper's approach to path optimization.
A3: Excellent suggestion! This is very helpful for improving the presentation. These two concepts are indeed closely related to the proposed method. The core idea of "route-first, cluster-second" is to construct a continuous tour (i.e., a "giant tour") including all the edges that need to be served, and then split it into multiple feasible sub-routes. We will restructure the "path optimization" section using these two concepts to explain the mechanism.
Q4(abbreviation): The paper should clarify that fair comparisons are between their basic approach and other constructive heuristics, and assess the impact of path optimization on the other constructive heuristics.
A4: Thanks for the interesting suggestions. Firstly, we mainly compare the constructive heuristics with DaAM without PO. As shown in Tables 3 and 4 of the original paper, DaAM exceeds them on problem instances of Task300 and smaller, except for Task40. In particular, although DaAM is trained on Task100, it still surpasses them on the larger problem instances Task200 and Task300. On Task400, Task500 and Task600, although DaAM does not lead due to the lack of effective training at these scales, its gap to the PS variants is very small. Secondly, we further leveraged PO to optimize all constructive heuristics, as shown in Table 4 of the submitted PDF. Overall, PO consistently enhances the routes obtained by PS across all scales. On Task20 and Task60, PO achieves a gap reduction of 2%-5%, while the decrease in the gap is more modest at other scales.
Q5(abbreviation): The paper's claims that NN-based methods lag in CARP are refuted by Ramamoorthy et al. (2024), who report an 11% improvement over memetic algorithms.
A5: Thanks for the insightful comment. The paper (Ramamoorthy et al. [1]) does not provide its implementation code publicly, nor does it outline the data partitioning scheme. Therefore, we are unable to reproduce its results or conduct a fair comparison. However, we will thoroughly describe its method and rectify any discrepancies in the final version of the paper.
[1] M. Ramamoorthy and V. R. Syrotiuk. Learning heuristics for arc routing problems. Intelligent Systems with Applications, 21:200300, 2024.
Q6&Q11(abbreviation): How were the PS approaches parameterized?
A6&A11: Thanks for the insightful comment. In this paper, we apply the official settings for the parameter alpha of these PS variants. To be specific, PS-Ellipse's alpha is 1.5, PS-Efficiency's is 3.0, PS-Alt1's is 3.0, and PS-Alt2's is 1.0. In terms of the iteration number, the paper [2] stated that "For all sets of instances, the average deviations obtained by PS-Efficiency and PS-Ellipse with k = 1000 are better than those obtained by PS-RC and PS-RE with k = 20000." Based on this, we employ 1000 iterations for the smaller problem instances from Task20 to Task100. Therefore, on smaller-scale data, the parameters we adopted for the PS variants are exactly the same as those in the paper [2]. For the larger problem instances from Task200 to Task600, we adjust the iterations to 100 to balance running time and solution quality.
[2] Rafael Kendy Arakaki and Fabio Luiz Usberti. An efficiency-based path-scanning heuristic for the capacitated arc routing problem. Computers & Operations Research, 103:288–295, 2019.
Q7(abbreviation): The description of path improvement and the state definition in the Dynamic Programming algorithm are unclear, particularly what "f(*)" denotes.
A7: Thanks for the valuable comments. The overall mechanism of path optimization can be understood as a "Route First, Cluster Second" mechanism. In the path optimization, we initially disregard capacity constraints and remove all depots from the original path sequence to obtain a depot-free path. Subsequently, depots are replanned through dynamic programming to ensure the result meets the capacity constraints. In the dynamic programming, the state f(*) represents the maximum cost saving (in terms of travel cost) that can be achieved by replanning the depots' positions within the corresponding path sequence, and its value is derived from the optimized results of that path's subpaths. We will describe this point clearly in the final version.
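For concreteness, the recurrence can be written as follows, with assumed notation and stated as a cost minimization (maximizing the saving is equivalent): let t_1, ..., t_n be the depot-free giant tour, d(t_k) the demand of task t_k, Q the vehicle capacity, and C(i, j) the depot-to-depot cost of a single route serving t_{i+1}, ..., t_j. Then f(n) gives the cost of the best re-insertion of depot returns:

```latex
\[
  f(0) = 0, \qquad
  f(j) = \min_{\substack{0 \le i < j \\ \sum_{k=i+1}^{j} d(t_k) \le Q}}
         \bigl( f(i) + C(i, j) \bigr).
\]
```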
Q8&Q9&Q13(abbreviation): Training time is not discussed at all.
A8&A9&A13: Thanks for the insightful suggestion! The training times and related discussion are given in the General Response (General A2).
Q10: Where does the implementation of the memetic algorithm come from?
A10: For MAENS, we use the code released by the original author of the paper on GitHub last year: https://github.com/meiyi1986/MAENS.
Q12: I think that for small instances, the approach by (Ramamoorthy et al. 2024) may surpass the results reported here, which should be mentioned.
A12: As mentioned in A5 above, the absence of a data partitioning scheme makes it difficult to make a fair comparison. Looking ahead, we aim to develop a learning-based solver inspired by Ramamoorthy et al.'s method, which we hope will outperform MA on small-scale CARP instances.
This paper presents a novel neural network-based CARP solver that uses a direction-aware attention model to incorporate directionality into the embedding process. It then applies supervised reinforcement learning for subsequent fine-tuning.
Strengths
- A learning-based CARP solver is proposed.
- The performance of the proposed solver on large-scale data is discussed.
Weaknesses
- The comparison algorithms were published five years ago, and there is no discussion of existing methods aimed at big data.
- The experiments only tested the self-constructed dataset and did not evaluate on public datasets.
Questions
1. How is the runtime for different algorithms defined in this paper? Is the training time for the learning-based algorithms taken into account?
2. How can one assess the effectiveness of the arc features used in this paper in a more intuitive way?
Limitations
The amount of data required for algorithm training needs to be discussed.
Q1: The comparison algorithms were published five years ago, and there is no discussion of existing methods aimed at big data.
Q2: The experiments only tested the self-constructed dataset and did not evaluate on public datasets.
A1&A2: The suggestions are greatly appreciated! The experimental results and discussion of more comparison methods and public datasets are given in the General Response A1.
Q3-1: How is the runtime for different algorithms defined in this paper?
A3-1: Thanks for the insightful comment. The mean runtimes per CARP instance for MAENS and PS variants have already been displayed in Fig 2 of the original paper. Additionally, we have tested the runtimes of S2V-DQN and VRP-DL implemented in our experimental section, as shown in Table R1.
Firstly, on smaller-scale problem instances (Task 100 and below), DaAM (blue line) costs less than 10% of the computation time of the advanced PS variants and VRP-DL. On larger-scale problem instances (Task 200 and above), DaAM's computation time still retains a large advantage over them, while the time consumed by S2V-DQN is much higher than that of all these methods. It should be noted that S2V-DQN and VRP-DL did not originally support this task, so we modified them to be compatible with CARP; thus, their low efficiency may be due to an improper implementation. When using the CPU-dependent PO, the advantage in computation time relative to the PS variants is reduced, but DaAM still maintains an advantage on problem instances above Task 60.
To prove fairness, we will release all the data and source codes of the mentioned comparisons.
Table R1. Runtime (seconds) of different methods on our dataset.

| Instance | MAENS | PS | PS-Ellipse | PS-Efficiency | PS-Alt1 | PS-Alt2 | VRP-DL* | S2V-DQN* | DaAM | DaAM+PO |
|:---------|------:|---:|-----------:|--------------:|--------:|--------:|--------:|---------:|-----:|--------:|
| Task20   |    17 |  1 |          1 |             1 |       1 |       1 |       4 |      113 |    1 |       2 |
| Task40   |   775 |  1 |          2 |             5 |       3 |       2 |       9 |      221 |    1 |       5 |
| Task60   |  2671 |  1 |          7 |            12 |       8 |       7 |      13 |      310 |    1 |       7 |
| Task80   |  5021 |  1 |         16 |            22 |      19 |      17 |      20 |      588 |    2 |      12 |
| Task100  |  8587 |  1 |         25 |            35 |      30 |      26 |      28 |      721 |    3 |      19 |
Q3-2: Does the training time for the learning-based algorithms take into account?
A3-2: Thank you for the insightful comment! Indeed, the performance of learning-based solvers like DaAM, VRP-DL and S2V-DQN heavily depends on the training time, which determines whether the model is adequately trained.
In terms of training time, the General Response A2 provides the training times of the DaAM models. For the training time of the other learning-based solvers, we first ensure that all models are trained until convergence, so that they are adequately trained for fairness. As a result, the training time of S2V-DQN on the different instance sizes is 0.5h, 1.5h, 2.5h, 3.5h, and 6h respectively, and the training times of VRP-DL on the different instance scales are 1 to 2 hours. In terms of technique, we carefully considered ways to reduce training time in the design of DaAM. As mentioned in the General Response A2, we adopt the idea of curriculum learning to reduce the training time at each training scale, so that DaAM can be trained sufficiently at an affordable time cost.
Q4: How can one assess the effectiveness of the arc features used in this paper in a more intuitive way?
A4: Thanks for the insightful comment. The efficacy of our arc feature design can be demonstrated through ablation studies. Our arc feature comprises two components. The first encompasses essential information pertinent to CARP tasks, as detailed in the referenced literature:
- is_depot(a): indicates whether arc a is the depot arc.
- c(a): the traversal cost associated with a.
- d(a): the demand of a.
- w(u, v): the edge weight from node u to node v.
- e_t(a): the service eligibility of a at time t.
The second component uses MDS to enrich the features with low-dimensional positional information. The effectiveness of the arc features is verified by Table 5 and Figure 3: removing MDS or the other CARP descriptors leads to slower model convergence, poorer performance, and larger fluctuations in solution quality.
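For illustration, such a feature vector could be assembled as follows (a sketch under assumptions: edge attributes named `cost`, `demand`, and `weight`, with scikit-learn's MDS applied to the all-pairs shortest-path matrix; not our exact implementation):

```python
import numpy as np
import networkx as nx
from sklearn.manifold import MDS

def arc_features(g: nx.Graph, n_components: int = 2):
    # Embed nodes via MDS on the shortest-path distance matrix, then build
    # one feature vector per directed arc (both directions of each edge).
    nodes = list(g.nodes())
    idx = {u: i for i, u in enumerate(nodes)}
    dist = np.zeros((len(nodes), len(nodes)))
    for u, lengths in nx.all_pairs_dijkstra_path_length(g, weight="weight"):
        for v, d in lengths.items():
            dist[idx[u], idx[v]] = d
    coords = MDS(n_components=n_components,
                 dissimilarity="precomputed").fit_transform(dist)
    feats = {}
    for u, v, a in g.edges(data=True):
        for s, t in ((u, v), (v, u)):  # one vector per traversal direction
            feats[(s, t)] = np.concatenate([
                [float(a.get("is_depot", 0)), a["cost"], a["demand"], a["weight"]],
                coords[idx[s]], coords[idx[t]],
            ])
    return feats
```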
Q5: The amount of data required for algorithm training needs to be discussed.
A5: Thanks for the valuable comment. Due to the nature of deep learning, which requires extracting decision-making knowledge from data, the amount of training data is crucial for learning-based methods. In our implementation, all training data must be loaded into the GPU memory before training. As the problem scale increases, we have been compelled to reduce the number of instances used for training.
In our experiments, for problem instances of Task 20, Task 40, and Task 60, we trained the DaAM model using 5,000 to 10,000 instances. For Task 80 and Task 100, we used approximately 2,000 to 4,000 instances for training.
General Response:
Great thanks to all the reviewers for their time and effort in reviewing this paper. Most reviewers wanted to see experiments with more comparison algorithms (Reviewers NUZV, DrmY) and on more public datasets (Reviewer NUZV), and several raised questions about the training time of the proposed method (Reviewers NUZV, h6eF, UWHC). Therefore, below we answer these common questions one by one. Please do not hesitate to let us know if you have any further questions; we are willing to resolve them in time.
General Q1: More comparison algorithms and more public datasets in the experiments.
General A1: Thanks for all the valuable comments! To more comprehensively demonstrate the effectiveness of our approach, we searched for more recent works. However, we find that PS-Efficiency still exhibits benchmark performance among constructive heuristic algorithms, while MAENS remains competitive among meta-heuristic algorithms. Most of the work in this area involves meta-heuristic algorithms and, unfortunately, either no code is made available or only partial code is released, preventing us from testing these methods on our own datasets. Therefore, we employed publicly available datasets, specifically the BCCM [1] and EGL [2] datasets. Note that, due to the lack of a distinction between training and test sets in the public datasets, we were only able to use the best model trained from Task20 to Task100 to handle the EGL and BCCM datasets.
The experiment on the BCCM dataset is shown in Table 1 of the submitted PDF. It is evident that our method performs comparably to both PS-Efficiency and PS-Ellipse, performing better on some instances and worse on others. On the BCCM dataset, the larger the instance number, the larger the scale of the scenario. Our method demonstrates a clear advantage on larger-scale instances (9A and larger scenarios) but performs slightly weaker than PS-Efficiency on smaller-scale instances (8C and smaller scenarios). This is primarily because small-scale data is often too sparse to allow the model to capture important features in the test scenarios, and it is more susceptible to randomness. Conversely, large-scale data is denser, which dilutes the impact of randomness, allowing the learning-based solver to perform more stably. We conducted comparisons on the EGL dataset with more methods and larger-scale instances, as shown in Table 2 of the submitted PDF. The results show that our method, PS-Ellipse, and PS-Efficiency each have their strengths and weaknesses. However, their performance is significantly lower than that of meta-heuristic algorithms that require more optimization operations, such as MAENS, QICA [3], MANSP [4], CARPET [5], LMA [6], MA-ABC [7], and HMA.
These experiments illustrate that learning-based solvers could gain advantages from larger-scale and more diverse data. Therefore, in future work, we will attempt to enlarge the training set to include multiple cities, synthetic and real data, and multi-scale data, which might further improve the generalization of using one model to handle all CARP instances.
[1] E. Benavent, V. Campos, A. Corberan, and E. Mota. The capacitated arc routing problem: lower bounds. Networks, 22:669–690, 1992.
[2] R. W. Eglese. Routeing winter gritting vehicles. Discrete Applied Mathematics, 48(3):231–244, 1994.
[3] R. Shang, B. Du, K. Dai, L. Jiao, A. M. G. Esfahani, and R. Stolkin. Quantum-inspired immune clonal algorithm for solving large-scale capacitated arc routing problems. Memetic Computing, 10(1):81–102, 2018.
[4] R. Li et al. Memetic algorithm with non-smooth penalty for capacitated arc routing problem. Knowledge-Based Systems, 220:106957, 2021.
[5] A. Hertz, G. Laporte, and M. Mittaz. A tabu search heuristic for the capacitated arc routing problem. Operations Research, 48(1):129–135, 2000.
[6] P. Lacomme, C. Prins, and W. Ramdane-Cherif. Competitive memetic algorithms for arc routing problems. Annals of Operations Research, 131(1–4):159–185, 2004.
[7] M. Ramamoorthy, S. Forrest, and V. R. Syrotiuk. MA-ABC: a memetic algorithm optimizing attractiveness, balance, and cost for capacitated arc routing problems. In Proceedings of the Genetic and Evolutionary Computation Conference, 2021.
General Q2: How long is the training time of the proposed method?
General A2: Thanks for all the valuable suggestions! To more clearly explain the training duration of the DaAM model, it is essential to clarify its training method. As mentioned in lines 260-261 of the paper: "We utilize the training results obtained from the preceding smaller-scale dataset to initialize the model." This approach is a type of curriculum learning, where the model first learns from simpler instances (small-scale) before progressively tackling more challenging ones (large-scale). For example, the DaAM model for Task 20 is trained on Task 20's training data using SL+RL (Supervised Learning + Reinforcement Learning), whereas the model for Task 40 is trained using reinforcement learning on the already trained Task 20 model. In the same way, the model for Task 100 is trained on the Task 80 model using reinforcement learning.
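The schedule can be summarized as follows (hypothetical names throughout; only the structure mirrors the description above):

```python
import torch

scales = [20, 40, 60, 80, 100]
model = DaAM()                       # hypothetical model class
sl_pretrain(model, load_data(20))    # SL warm-start on Task 20 only
prev = None
for cur in scales:
    if prev is not None:             # start from the previous scale's weights
        model.load_state_dict(torch.load(f"daam_task{prev}.pt"))
    rl_finetune(model, load_data(cur))   # PPO fine-tuning at this scale
    torch.save(model.state_dict(), f"daam_task{cur}.pt")
    prev = cur
```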
Under this training strategy, the training time for DaAM is shown in Table 3 of the submitted PDF. The '+' symbol indicates the additional training time required at each stage, building upon the previous model. It is evident that as the scale of instances increases, the training duration for the current stage also increases. However, this remains within an acceptable range and is significantly less than the training times observed for the other learning-based solvers. It takes only 0.5h to train a DaAM for Task 20, and 7.8h in total to train a DaAM for Task 100.
The reviewers generally side against this paper, although one reviewer gives it a weak accept. I propose to reject this paper for the following reasons: the paper is not well written, in fact the awkward phrasing even starts with the odd title. The novelty is disputed between the reviewers. I tend to side with those who say the paper is not novel enough -- adding the edge direction to the embedding is basically the only real contribution of the work. Finally, the work is very specific to a single optimization problem with no real hope of generality to other areas of AI, making it not of significant interest to the NeurIPS community.