Complexity Scaling Laws for Neural Models using Combinatorial Optimization
Neural scaling laws fix the problem and scale the model, compute, and data; we fix the model and scale the problem.
Abstract
Reviews and Discussion
Using the TSP as a target problem, this study extends existing work on data, parameter, and compute scaling laws to scaling laws based on problem complexity. The problem complexity is varied along both solution space size and input space size. For both SFT and RL trained models, the suboptimality gap smoothly decays approximately according to a power law with more model parameters or more compute. The authors find superlinear suboptimality growth with problem size, but plateauing suboptimality growth with spatial dimensions.
Strengths and Weaknesses
Strengths:
- When comparing the neural laws with local search laws, there are interpretable findings and interesting hypotheses (e.g. similarity of cost landscapes when neural models have constrained capacity).
- Rigorous and reasonable experimental setup.
- Release of the TSP solution datasets would be a good contribution.
- Good discussion of results in the context of other neural scaling laws. It is quite interesting that SFT and RL here both scale according to power laws, and that for RL the scaling is with respect to the used reward metric.
- Very interesting discussion that is grounded in theory of how superlinearity of suboptimality gap in number of nodes must end at some point, meaning that fit scaling laws would be pessimistic at large problem sizes. In general, nice theoretical results that are relevant to the empirical analysis.
Weaknesses:
- The studied model and problem (TSP) is not that relevant to the rest of the deep learning literature. Even when considering prior works on deep learning for TSP, there are a few changes to e.g. the Kool et al. model that make it difficult to compare with prior work there.
Questions
Does your work point to any data / compute / problem size regimes where deep learning approaches to solving TSP would be practically useful in some sense?
Limitations
yes
Final Justification
Great work overall, and my questions are generally answered. Accept!
Formatting Issues
n/a
Thank you for the detailed review and your thorough decomposition. We appreciate your recognition of the interest and interpretability of our findings, the rigor of our evaluations, and the value of our dataset for future work.
Good discussion of results in the context of other neural scaling laws. It is quite interesting that SFT and RL here both scale according to power laws, and that for RL the scaling is with respect to the used reward metric.
We are also interested in the similarity of the RL/SFT curves and would be excited about future work further evaluating algorithm insensitivity under a model-capacity bottleneck.
Tasks often don’t support scaling laws for RL directly w.r.t. return, making TSP (and speculatively, combinatorial optimization at large) a great testbed for researchers who want to study return as a natural performance metric.
The studied model and problem (TSP) is not that relevant to the rest of the deep learning literature. Even when considering prior works on deep learning for TSP, there are a few changes to e.g. the Kool et al. model that make it difficult to compare with prior work there.
We agree that the model architecture presents such challenges. We explain the motivation and discuss design details in Appendix G (due to space limitations in the main paper). We also provide hyperparameter optimization results in Appendix H and are open sourcing the code to help mitigate these concerns.
Does your work point to any data / compute / problem size regimes where deep learning approaches to solving TSP would be practically useful in some sense?
We plan to add a short "Practical implications" paragraph to the discussion in the camera-ready version to address this important question.
Algorithms that excel at solving TSP usually also excel at other routing problems like the Vehicle Routing Problem (VRP) and Orienteering Problem (OP) [1]; the Kool et al. paper briefly discusses the relevance of these routing problems to real-world applications. More recent work integrating the advantages of classical Operations Research (OR) algorithms with deep learning also emphasizes practical application [2].
Regarding parameter-constrained complexity scaling laws, we foresee some practical use cases, although they take more consideration to identify given that training until convergence is usually compute-inefficient. One use case would be predicting the limit of performance when primarily constrained by model size, e.g. edge applications requiring very fast inference.
References:
[1] Wouter Kool, Herke van Hoof, and Max Welling. Attention, Learn to Solve Routing Problems! In International Conference on Learning Representations, 2019.
[2] Wouter Kool, Herke van Hoof, Joaquim Gromicho, and Max Welling. Deep Policy Dynamic Programming for Vehicle Routing Problems. In International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research, 2022.
Thanks! Once again, great, well-executed work, and thanks for the discussion. I definitely support accepting this paper.
This work proposes scaling laws in terms of tasks and solutions, in contrast to the common use of training-setting details such as FLOPs or parameters. The authors demonstrate their process on a common graph-based problem at multiple operating points. This analysis can serve as an alternative and complementary process in the development of agent-esque decision frameworks.
Strengths and Weaknesses
Strengths
- Principled and novel perspective on a salient problem in broader optimisation, particularly given recent concerns regarding the efficacy of PPO etc.
- Method is thoroughly investigated with several operating points
- Findings are well formed and explored deeply
Weaknesses
- I have some concerns about quantifying the solution space for real-world problems akin to set shattering. Could the authors further qualify this point as a constraint on the work with respect to real-world application?
- The domain is narrow, though deep, so broader insights are hindered; potentially, if parameter size is fixed, another synthetic investigation could be conducted.
Questions
Results appear to use a single seed per model size. How stable are the fitted exponents across seeds? The authors note the super-linear node law must “break” before ~40k cities. Can any empirical evidence beyond the small plotted frontier show where deviation begins? Figure 3 suggests SFT achieves the same gap with 3 times less compute. Have the authors performed statistical tests or quantified intervals to confirm this advantage is statistically reliable?
Limitations
yes
Final Justification
I had already given a positive score, and outstanding points primarily serve to probe the broader application of the approach. As such, I maintain my score as I see little concern to change my current judgement.
Formatting Issues
None
Thank you for the review and your thoughtful connections and questions. We appreciate your recognition of the novelty and salience of our work, the rigor of our evaluations, and the depth to which we investigated this topic.
I have some concerns about quantifying the solution space for real-world problems akin to set shattering. Could the authors further qualify this point as a constraint on the work with respect to real-world application?
Would you be willing to elaborate on this request? We are very interested in discussing how our work connects to approaches for quantifying solution spaces in real world problems.
The domain is narrow, though deep, so broader insights are hindered; potentially, if parameter size is fixed, another synthetic investigation could be conducted.
We agree broader insights are hindered by focusing on Euclidean TSP. An important next step for future work is demonstrating that complexity scaling laws are more broadly applicable across domains. Evaluating similar combinatorial optimization problems is straightforward, but there is further value in determining how to evaluate real-world problems where solution space complexity and representation space complexity are not independently scalable.
We plan to make two revisions for the camera-ready version which further emphasize the importance of this future work.
- Include specific examples of relevant future work at the end of the Beyond TSP scaling paragraph in Section 7.
- Elaborate on “interpreting and relating findings between domains will be more difficult” in the beginning of the Complexity scaling law theory paragraph to more precisely specify those gaps (along with how future work can address them).
The authors note the super-linear node law must “break” before ~40k cities. Can any empirical evidence beyond the small plotted frontier show where deviation begins?
We observe no sign of deviation at the evaluated node scales, so we expect that scaling law not to break immediately after 50 nodes. We also do not expect the break to occur suddenly at ~40k nodes. So, to limit compute requirements, future work could empirically estimate where the breakdown occurs using a binary search of that node space, perhaps in log-space to provide more resolution at smaller node scales, which require less training. We can add a brief mention of this to the final paper.
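To make that proposal concrete, here is a minimal Python sketch of the search we have in mind. Everything named here is illustrative: `measure` stands in for a full training-and-evaluation run at a given node scale, `law` is the fitted power law, and the tolerance and monotonicity assumption are ours rather than results from the paper.

```python
import math

def find_breakdown(measure, law, lo, hi, tol=0.1, iters=8):
    """Bracket the node scale where a fitted scaling law stops holding,
    taking midpoints in log-space so smaller (cheaper-to-train) scales
    are probed at finer resolution.

    measure(n) -> suboptimality from a training run at n nodes (expensive)
    law(n)     -> suboptimality predicted by the fitted power law
    Assumes relative deviation from the law grows monotonically with n.
    """
    for _ in range(iters):
        mid = round(math.exp((math.log(lo) + math.log(hi)) / 2))
        rel_dev = abs(measure(mid) - law(mid)) / law(mid)
        if rel_dev > tol:
            hi = mid  # law already broken here; search smaller scales
        else:
            lo = mid  # law still holds here; search larger scales
    return lo, hi     # bracketing interval around the breakdown scale

# Illustration with synthetic curves (not real training results):
# observed = lambda n: 0.01 * n**1.5 * (1 + (n / 2e4) ** 3)  # bends late
# fitted = lambda n: 0.01 * n**1.5
# print(find_breakdown(observed, fitted, lo=50, hi=40_000))
```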
Figure 3 suggests SFT achieves the same gap with 3 times less compute. Have the authors performed statistical tests or quantified intervals to confirm this advantage is statistically reliable?
Providing confidence intervals is difficult here given our single-seed compute limitations. Figure 8 in Appendix B quantifies what this sample efficiency advantage looks like temporally (note that SFT’s plot has different x-axis scaling), demonstrating that the advantage is consistent over training. We believe this is an important piece of future work. After obtaining multi-seed results, one could measure the updates or compute required to reach successive target suboptimality scores, then provide confidence intervals for when those target scores are achieved, comparing between RL and SFT. We can add a summary of this future work proposal during revisions.
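As a sketch of that proposal (with hypothetical multi-seed learning curves as input; the target score and the percentile-bootstrap choice are our assumptions, not results from the paper):

```python
import numpy as np

def compute_to_target(curve, target):
    """First index (update step or compute bucket) at which a learning
    curve of suboptimality values reaches the target; NaN if never."""
    hits = np.flatnonzero(np.asarray(curve) <= target)
    return float(hits[0]) if hits.size else np.nan

def bootstrap_ci(samples, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean across seeds."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    means = [rng.choice(samples, samples.size, replace=True).mean()
             for _ in range(n_boot)]
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

# With multi-seed curves rl_curves and sft_curves (hypothetical data):
# rl = [compute_to_target(c, 0.02) for c in rl_curves]
# sft = [compute_to_target(c, 0.02) for c in sft_curves]
# print(bootstrap_ci(rl), bootstrap_ci(sft))  # do the intervals overlap?
```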
Results appear to use a single seed per model size. How stable are the fitted exponents across seeds?
Training on multiple seeds per run would be preferred, though this is a common shortcoming of neural scaling laws research due to computational constraints. To address this question, we performed jackknife (leave-one-out) resampling for node scaling, spatial dimension scaling, and parameter scaling regression fits. We plan to include relevant statistics in Appendix A of the camera-ready version. Leave-one-out distributions are distinct from those obtained using multiple seeds. However, this addition documents each fit’s sensitivity to individual training runs. Node-scaling exponent fits are relatively sensitive to the suboptimalities obtained at the largest node scales (as expected). Otherwise, fitted constants have low variance. Numerical details are provided below. For the camera-ready version we will also report leave-one-out statistics for compute scaling and optimal model size regression fits.
For node scaling, when fixing the offset, we obtain RL exponents of compared to the full-data when holding out node scales of , respectively. We obtain SFT exponents of compared to the full-data when holding out node scales of . Thus, is sensitive to values obtained at the largest node scales. Fitting the offset yields comparable results for both and over intermediate node scales but yields larger differences when holding out the edges; for example, for 5-node SFT and for 45-node RL. However, these fits shift the offset by 2-3 nodes and we know near optimality is achieved at 5 nodes (roughly suboptimality for both RL and SFT).
Leave-one-out fits for spatial dimension scaling and parameter scaling are quite stable. asymptotes in the former are nearly constant (roughly ±0.001). Base and exponent fits have the following mean and standard deviation of absolute difference (w.r.t. the full-data fit after ‘=’):
- RL 10-node dimension scaling: =1.18 ± 0.005 (0.006 SD)
- RL 20-node dimension scaling: =1.66 ± 0.016 (0.017 SD)
- RL parameter scaling: =0.582 ± 0.008 (0.011 SD)
- SFT parameter scaling: =0.712 ± 0.012 (0.011 SD)
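The resampling loop itself is straightforward; below is a minimal Python sketch, where `power_law` is shorthand for the fit form (the exact parameterization shown is an assumption for illustration) and `nodes`/`subopt` are the per-scale training results:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, n0):
    # suboptimality ~ a * (n - n0)^b; the offset n0 may be fixed or fitted
    return a * (n - n0) ** b

def jackknife_exponents(nodes, subopt, fix_offset=None):
    """Refit the power law with each (node scale, suboptimality) point held
    out in turn; the spread of fitted exponents quantifies the fit's
    sensitivity to any single training run."""
    nodes = np.asarray(nodes, dtype=float)
    subopt = np.asarray(subopt, dtype=float)
    exponents = []
    for i in range(len(nodes)):
        keep = np.arange(len(nodes)) != i
        if fix_offset is not None:
            f = lambda n, a, b: power_law(n, a, b, fix_offset)
            p0 = [1.0, 1.0]
        else:
            f, p0 = power_law, [1.0, 1.0, 0.0]
        popt, _ = curve_fit(f, nodes[keep], subopt[keep], p0=p0, maxfev=20_000)
        exponents.append(popt[1])  # fitted exponent b for this held-out point
    return np.array(exponents)
```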
Thank you for the thorough response. To clarify the first point:
I have some concerns about quantifying the solution space for real-world problems akin to set shattering. Could the authors further qualify this point as a constraint on the work with respect to real-world application?
In estimating generalisation, one can look to VC theory and the VC dimension of a function class, being the size of the largest set of points it can shatter; in rough terms, with respect to inference, the problems it can model. Your existing approach assumes such a value can be computed or parameterised. Still, the solution space of complex problems modelled by complex models can often become very large or can only be loosely bounded. I wanted to discuss this point as it may limit the practical application; however, if you can incorporate some discussion of this point, the practical applicability may be increased.
My review was already positive, so I will maintain my score and hope the above is helpful in clarifying my point.
Yes, this is helpful context, thank you for clarifying. Viewing parameter-constrained solution space scaling through the lens of the model’s VC dimension is an interesting connection. We will include relevant considerations alongside the generalization challenges discussed toward the end of the Beyond TSP scaling paragraph.
This paper extends neural-scaling analysis to task complexity by quantifying how solution quality degrades as (i) the TSP instance size grows and (ii) the embedding dimension increases. Across extensive supervised and RL experiments on Euclidean TSP, error follows a power law with node count and converges exponentially with dimensionality. Furthermore, the paper analyzes the classic 2-opt heuristic, which mirrors these trends.
Strengths and Weaknesses
Pros:
- It is interesting to explore the scaling law in the context of COPs.
- The dataset of 128 million optimal TSP solutions will be helpful for future research.
Cons:
- The paper is relatively hard to understand.
- While the findings offer useful guidance for model capacity, they rest on a single benchmark (Euclidean TSP) and sparsely seeded runs, so the proposed "complexity-scaling laws" remain provisional for richer combinatorial tasks.
- Although this paper provides some insights, I feel the empirical evidence is too limited and the comparisons too uneven to substantiate the strong, general claims made.
- To make the 2-opt curves resemble those of the neural models, the authors cap the number of swaps. However, removing this cap would change the error trajectory entirely, so the observed similarity is an artifact of the experimental setup, which may not be evidence that both methods share the same optimization landscape.
- The paper repeatedly infers that similar error curves imply shared underlying mechanisms and hence general laws. It would be helpful to provide some theoretical justifications and controlled counter-examples.
- Any justification for the relation between the TSP problem size and complexity?
- It would be better to conduct the significance tests and provide variance across seeds.
Questions
Listed above
Limitations
Limitations are discussed in Section 7.
Final Justification
The topic is novel, and it is indeed interesting to explore the scaling law in combinatorial optimization.
One limitation of this paper lies in the generality of the proposed scaling law, as also pointed out by Reviewer 3C61. As a researcher in combinatorial optimization, particularly on complex routing problems, I often encounter situations where conclusions that hold well for the very simple TSP break down completely when even a single constraint is added. I believe this is a fundamental concern that the current manuscript cannot fully address.
In addition, there are some concerns regarding the writing of the paper, although the authors promise substantial rewriting.
For these reasons, I lean toward a borderline rejection of this paper.
Formatting Issues
NA
Thank you for the detailed review, this is useful feedback. We agree that exploring scaling laws using combinatorial optimization is interesting and we are excited for future work that can build on top of our dataset.
The paper is relatively hard to understand.
Clarity was a major consideration for the authors. Given that we rely on a combination of empirical evidence and analysis for several claims, there was a tension between readability and conciseness that was difficult to balance in the main paper. Would you be willing to elaborate with example passages or claims that were unclear? Further addressing these concerns and improving the readability of our paper would be a priority for us during camera-ready revisions.
While the findings offer useful guidance for model capacity, they rest on a single benchmark (Euclidean TSP) and sparsely seeded runs, so the proposed "complexity-scaling laws" remain provisional for richer combinatorial tasks.
We agree that the proposed “complexity scaling laws” remain provisional for richer combinatorial tasks, and we can further emphasize this point during revisions. Focusing on Euclidean TSP as a case study allowed us to provide in-depth supporting analysis and reduced the compute required for model training. We intended to answer whether the complexity scaling laws hypothesis should be further explored at all (to which we argue the answer is yes). Future work must demonstrate smooth suboptimality with respect to fundamental measures of complexity in other problems before complexity scaling laws can be argued as a general principle.
We propose two discussion revisions to further emphasize the limitations of using Euclidean TSP as a case study.
- Include specific examples of relevant future work at the end of the Beyond TSP scaling paragraph in Section 7.
- Elaborate on “interpreting and relating findings between domains will be more difficult” in the beginning of the Complexity scaling law theory paragraph to more precisely specify those gaps (along with how future work can address them).
To make the 2-opt curves resemble those of the neural models, the authors cap the number of swaps. However, removing this cap would change the error trajectory entirely, so the observed similarity is an artifact of the experimental setup, which may not be evidence that both methods share the same optimization landscape.
We intentionally sought a local search algorithm that reproduced the scaling forms of parameter-constrained models, arriving at depth-constrained 2-opt. The similarity is indeed dependent on the search depth constraint. We argue that this approach is beneficial for informing future work on scaling law interpretability, given that the depth constraint is critical for alignment when scaling nodes (and the relationship between performance and that depth constraint is interpretable).
It is also important to clarify that we are not claiming that Figure 5 provides evidence of a shared optimization landscape. In the opening paragraph of Section 5: “These similarities do not directly imply that model inference and local search share underlying mechanisms. Instead, they provide potential topics to explore in future work, which we identify throughout this section.” We introduce Section 5 this way since the connections drawn are informal conjectures, and we attempt to appropriately scope related claims. By providing an interpretable reference algorithm with aligned complexity scaling behavior, these results and conjectures may help develop a formal theory for parameter-constrained deep learning in future work. We will further clarify this purpose of the 2-opt experiments for the camera-ready version.
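To make "depth-constrained" concrete, here is a minimal Python sketch of capped 2-opt; the first-improvement sweep order and the exact cap semantics are simplifications for illustration, not a verbatim reproduction of our experimental setup:

```python
import itertools
import math

def tour_length(tour, pts):
    """Total length of a closed tour over a list of coordinate tuples."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt_capped(pts, max_swaps):
    """Greedy 2-opt that stops after max_swaps improving segment reversals,
    or at a local optimum, whichever comes first."""
    tour = list(range(len(pts)))
    swaps, improved = 0, True
    while improved and swaps < max_swaps:
        improved = False
        for i, j in itertools.combinations(range(len(tour)), 2):
            cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            if tour_length(cand, pts) < tour_length(tour, pts) - 1e-12:
                tour, swaps, improved = cand, swaps + 1, True
                if swaps >= max_swaps:
                    break
    return tour
```

Sweeping `max_swaps` plays the role of the search depth constraint discussed above.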
The paper repeatedly infers that similar error curves imply shared underlying mechanisms and hence general laws. It would be helpful to provide some theoretical justifications and controlled counter-examples.
If this comment is referencing the local search analogy in Section 5, we state that similar suboptimality curves do not directly imply shared underlying mechanisms (see previous paragraph).
Regarding Figure 3, when comparing RL and SFT, we agree the term “law” can be misleading (especially for those who are less familiar with neural scaling laws research) given 1) we do not understand the conditions that lead to that scaling form and 2) we do not understand the degree to which the model-capacity bottleneck is algorithm insensitive (our results suggest some degree of insensitivity, but compare only two algorithms with single-seeded runs, hence why we avoid claiming that is strong evidence for a general case).
We argue that this is a broader terminology concern since both points above apply to neural scaling laws as well until the community arrives at consensus on theory explaining them. For a concrete example, Sorscher et al. [1] show that for data-constrained training an exponential decay of test error w.r.t. data can be achieved using data pruning. Thus, power law scaling w.r.t. data is not truly a “law” over arbitrary data distributions. We adopt the term for complexity scaling laws to make the connection to neural scaling laws immediately recognizable: simple macroscopic relationships that appear relatively insensitive to model and algorithm details.
It is also worth noting that several neural scaling law theories partially explain algorithm insensitivity (among gradient descent algorithms) for parameter-constrained convergence: resolution-limited underparameterized parameter scaling [2, 3] and model quanta capacity [4]. Since our proposed complexity scaling laws are similarly underparameterized, it seems reasonable to hypothesize that RL and SFT node-scaling behavior share an underlying mechanism.
Theory and controlled counter-examples are important next steps for establishing stronger claims. Using data pruning as a reference counter-example for neural scaling laws, we propose extending the Complexity scaling law theory paragraph (in Section 7) to suggest investigating counter-examples in problem complexity scaling as future work (along with providing theory explaining the distinction in scaling regimes).
Any justification for the relation between the TSP problem size and complexity?
Would you be willing to clarify this question? For example, do you mean the relationship between TSP node scale and solution space complexity? Or the relationship between solution space complexity and instance difficulty in TSP?
It would be better to conduct the significance tests and provide variance across seeds.
We agree that training on multiple seeds per run would be ideal, however, this is a common shortcoming of neural scaling laws research. Fitting a single seed per scale is typical due to both parallel compute requirements (many runs) and serial compute requirements (expensive runs). We detail compute requirements for this work in Appendix F. We are releasing our code in part to facilitate the reproduction and extension of our experiments using several seeds.
We have also performed jackknife (leave-one-out) resampling for node scaling, spatial dimension scaling, and parameter scaling regression fits. We plan to include relevant statistics in Appendix A of the camera-ready version. Leave-one-out distributions are distinct from those obtained using multiple seeds. However, this addition documents each fit’s sensitivity to individual training runs. Node-scaling exponent fits are relatively sensitive to the suboptimalities obtained at the largest node scales (as expected). Otherwise, fitted constants have low variance.
We provide numerical results at the end of our rebuttal for Reviewer SbNi, if you would like to read the details. For the camera-ready version we also plan to report leave-one-out statistics for compute scaling and optimal model size regression fits.
References:
[1] Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, and Ari Morcos. Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning. In Advances in Neural Information Processing Systems, 35:19523–19536, 2022.
[2] Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, and Utkarsh Sharma. Explaining Neural Scaling Laws. Proceedings of the National Academy of Sciences, 121(27):e2311878121, 2024.
[3] Blake Bordelon, Alexander Atanasov, and Cengiz Pehlevan. A Dynamical Model of Neural Scaling Laws. In Proceedings of the 41st International Conference on Machine Learning, pages 4345–4382, 2024.
[4] Eric Michaud, Ziming Liu, Uzay Girit, and Max Tegmark. The Quantization Model of Neural Scaling. In Advances in Neural Information Processing Systems, 36:28699–28722, 2023.
Thank you for the response.
After reading the authors’ rebuttal and the comments from other reviewers, most of my concerns have been addressed. One of the remaining key concerns is the generality of the proposed scaling law: how well does it extend to other COPs and different experimental settings? While I acknowledge that the authors have touched on this in the discussion section and framed it as future work, I would still encourage them to elaborate further. Although recent work has shown that the TSP landscape exhibits coarse convexity, TSP remains very simple relative to other COPs. Regarding Cons #4, I was referring specifically to the relationship between the scale of TSP instances (i.e., number of nodes) and the complexity of the solution space.
Thank you for the additional feedback, we are glad the rebuttal addressed most of your concerns.
We plan to specify potential future work toward testing the generalization of complexity scaling laws (and elaborate on anticipated challenges) during revisions. Alongside the two bulleted revision proposals provided in our rebuttal, we will also highlight other considerations regarding Euclidean TSP’s relative simplicity. For example, Euclidean TSP’s cost metric satisfies the triangle inequality, and avoiding edge intersections is a necessary condition for optimality in 2D (pruning much of the solution space). We agree that these considerations should be further emphasized beyond the existing discussion at the end of the Limitations paragraph (where we mention polynomial-time approximation schemes).
Regarding Cons #4, I was referring specifically to the relationship between the scale of TSP instances (i.e., number of nodes) and the complexity of the solution space.
Thank you for clarifying. We mention this relationship in Section 2.5’s Problem complexity scaling paragraph, where the solution space size is justified with a brief footnote. We will provide more details in the camera-ready version.
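For reference, the count that footnote relies on is the standard one (our phrasing here; the paper may state it differently): fixing the start city and identifying the two traversal directions of a symmetric n-node instance gives

```latex
% Number of distinct tours of a symmetric n-node TSP (n >= 3):
|S_n| = \frac{(n-1)!}{2}
```

so the solution space grows factorially with the number of nodes.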
Thank you for the response. I will increase my rating.
The paper introduces complexity scaling laws: empirical laws that predict a fixed-capacity model’s sub-optimality as the problem (not the compute budget) grows. Using the Euclidean Traveling-Salesman Problem, the authors separate two axes of complexity—solution-space size (number of nodes) and representation-space size (embedding dimension). They show (i) classical neural scaling laws in this domain, and (ii) smooth, super-linear or bounded trends in sub-optimality as problem complexity increases, holding parameters constant. Comparable trends emerge from simple 2-opt local search, suggesting an interpretable link between learned inference and cost-landscape properties.
Strengths and Weaknesses
Strengths:
1. Using TSP to evaluate scaling laws is interesting and, to me at least, a novel approach to investigating deep learning scaling. This can also enable better interpretability.
2. Somewhat thorough evaluation.
3. Open dataset and code.
Weaknesses: My main skepticism is whether the conclusions are truly transferable to more complex problems. TSP is a nice testbed, but I am often skeptical of the utility of such theoretical investigations for more complex, real-world cases.
Overall though, I enjoyed reading the paper and find that it can be valuable in better understanding scaling - an important (if not the most important) aspect of modern DNNs.
Questions
How can you guarantee extrapolation of your conclusions beyond TSP?
Limitations
N/A
Final Justification
I found this paper to be interesting and novel and I maintain my positive review post-rebuttal.
Formatting Issues
N/A
Thank you for the review and for your thoughtful feedback. To the best of our knowledge, we are indeed the first to investigate scaling laws using combinatorial optimization problems, which we hope will provide a useful testbed for exploring scaling law theory (both neural scaling and complexity scaling). Sharing our dataset and code is intended to assist in that pursuit.
My main skepticism is whether the conclusions are truly transferable to more complex problems. TSP is a nice testbed, but I am often skeptical of the utility of such theoretical investigations for more complex, real-world cases.
We agree that the applicability and utility of complexity scaling laws for real-world tasks is currently speculative. Beyond TSP, our results simply suggest that the complexity scaling laws hypothesis should be further explored. Complexity scaling laws can be argued as a general principle only after demonstrating they broadly apply to a diverse set of tasks, including those where solution space complexity and representation space complexity are not independently scalable.
We plan to make two revisions for the camera-ready version of the paper which further emphasize the importance of addressing this question in future work.
- Include specific examples of relevant future work at the end of the Beyond TSP scaling paragraph in Section 7.
- Elaborate on “interpreting and relating findings between domains will be more difficult” in the beginning of the Complexity scaling law theory paragraph to more precisely specify those gaps (along with how future work can address them).
How can you guarantee extrapolation of your conclusions beyond TSP?
Guaranteeing generalization requires explaining the underlying behavior. Extending the future work mentioned above, one potentially useful avenue would be identifying problems where smooth parameter-constrained complexity scaling laws emerge but with different scaling forms. For example, suppose that scaling nodes and spatial dimensions in the Convex Hull problem resulted in divergent linear growth and convergent power growth of suboptimality, respectively. A theory explaining these distinctions from TSP complexity scaling may explain some (or all) of the distinctions between TSP and next-token prediction.
Thanks for the rebuttal. I maintain my positive score.
This paper proposes the study of complexity scaling laws in deep learning for combinatorial optimization, using the Euclidean traveling salesperson problem (TSP) as a testbed. More specifically, the authors separate solution-space and representation-space complexity, showing empirical suboptimality trends under parameter constraints. Finally, the paper also provides an analysis of parameter-constrained scaling laws through an interesting examination of local search.
This is an interesting and novel paper. Reviewers specifically noted the excellent presentation and the very extensive and rigorous experiments to assess their scaling laws, which I strongly commend the authors for. The reviewers and authors also had a productive discussion, clarifying minor points on the numerical analysis or instance generation. The main limitation identified, however, was the restricted scope of experiments, with concerns that conclusions may not generalize beyond the relatively simple Euclidean TSP. Thus, although most reviewers leaned positively, the paper was still sitting on a borderline score.
After carefully considering the review reports and re-reading the paper, I agree with the positive comments and believe this is an exceptionally well-executed study and a good fit for NeurIPS. In particular, rarely have scaling laws been studied in such depth, and the TSP presents a nice case study with well-known (but complex) structure that allows for a first study of this form. The extensive numerical reports and the reproducibility afforded by sharing code are also a plus here.
I do, however, share reviewer r11d's concern that the reliance on the Euclidean TSP as the sole testbed represents a limitation; it would be important for the authors to include this in their discussion, and I encourage them to extend their framework to more complex problem classes in the future.