Solving the Asymmetric Traveling Salesman Problem via Trace-Guided Cost Augmentation
Summary
Reviews and Discussion
This paper proposes a continuous relaxation of the ATSP's discrete variables and uses differentiable directed-acyclic-graph techniques together with an assignment-problem relaxation to enable gradient-based optimization. Since the relaxation is non-convex, the paper develops an adaptive-step-size projected gradient descent to find high-quality solutions; for cases where the relaxed optimization does not yield a valid tour, a simple greedy post-processing heuristic is introduced to eliminate subtours and construct feasible solutions.
Strengths and Weaknesses
Strengths
- The proposed method is reasonable and sound.
- Extensive experiments show that the proposed method generally outperforms LKH-3 on ATSP benchmarks and generalizes well across problem sizes.
Weaknesses
- It is unclear how often the method finds a valid solution directly versus finding one only after further continuous iterations.
- No ablation experiments are included to analyze the effectiveness of the proposed method components.
- The authors did not report the results without 2-opt.
- The article mentions that Gurobi and Concorde are not included as baselines because they run very slowly on large scales. What about small-scale problems?
Questions

- What are the results on ATSPLIB [1] and real-world ATSP datasets [2]?
- Table 7 is about small-scale symmetric TSP problems. How does the method perform on larger-scale symmetric TSP problems?
- Why is the proposed algorithm not as good as GOAL on ATSP20 (Table 2)?
- Why does the proposed algorithm perform worse than LKH-3 on small-scale problems but better on large-scale ones (Table 1)?
- The method in this paper does not involve learning. Why mention learning-based methods separately in the related work?
- Does the initial solution have a large impact on the solution quality?
- The headers of Tables 1 to 6 should say ATSP instead of TSP. Please do not omit this; it will confuse people at first glance, especially since Table 7 is about TSP.
- Also see the weaknesses.

[1] http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/atsp/
[2] Neural Combinatorial Optimization for Real-World Routing, arXiv 2025.
Limitations
Yes.
Final Justification
I thank the authors for their detailed response, which has addressed most of my concerns. Additionally, I would appreciate it if the author could add a discussion section on how to combine the proposed method with learning-based methods in the field of neural combinatorial optimization. Overall, it is an interesting work, and I would like to increase the score to 4. Nevertheless, since I am not particularly familiar with the field of this paper, I’ll defer to the AC and other reviewers.
Formatting Issues
Table captions should be placed above the tables, but this paper places them in the wrong position.
Thank you for your insightful comments. We appreciate the opportunity to clarify several aspects of our work. Below, we address the concerns raised by the reviewers and provide additional explanations and evidence where appropriate.
Results on real-world datasets
We evaluated our algorithm on large-scale ATSP benchmark instances from TSPLIB, particularly those derived from stacker crane problems. Our method consistently outperforms LKH-3 (1–100) in terms of solution quality while also requiring less computational time. These results demonstrate the practical effectiveness of our approach on real-world structured benchmarks, further validating its scalability and competitiveness beyond synthetic instances.
| Instance | Best Known Result | Algorithm | Running Time (s) | Results |
|---|---|---|---|---|
| rbg323 | 1326 | LKH(1–100) | 1.46 | 1388 |
|  |  | LKH(1–10000) | 3.38 | 1346 |
|  |  | LKH(10–10000) | 20.32 | 1346 |
|  |  | Ours | 0.42 | 1360 |
|  |  | Ours (no 2-opt) | 0.20 | 1365 |
| rbg358 | 1163 | LKH(1–100) | 1.67 | 1294 |
|  |  | LKH(1–10000) | 4.03 | 1175 |
|  |  | LKH(10–10000) | 25.16 | 1175 |
|  |  | Ours | 0.09 | 1180 |
|  |  | Ours (no 2-opt) | 0.04 | 1180 |
| rbg403 | 2465 | LKH(1–100) | 6.27 | 2536 |
|  |  | LKH(1–10000) | 26.99 | 2498 |
|  |  | LKH(10–10000) | 9.06 | 2498 |
|  |  | Ours | 1.53 | 2473 |
|  |  | Ours (no 2-opt) | 1.28 | 2473 |
| rbg443 | 2720 | LKH(1–100) | 5.44 | 2813 |
|  |  | LKH(1–10000) | 7.87 | 2762 |
|  |  | LKH(10–10000) | 30.06 | 2756 |
|  |  | Ours | 0.52 | 2760 |
|  |  | Ours (no 2-opt) | 0.27 | 2760 |
Q: Why does the proposed algorithm perform worse than LKH-3 on small-scale problems, but better on large-scale problems (Table 1)?
The performance difference arises primarily from the nature of the search strategies employed by our method and LKH-3. LKH-3 is a randomized heuristic algorithm whose performance depends on the number of search trials specified by its hyperparameters. On small-scale problems, the search space is relatively small, so even a modest number of trials allows LKH-3 to explore a substantial portion of the space, leading to high-quality solutions.
In contrast, our method can be viewed as a guided search strategy. Rather than exhaustively exploring the space, it prioritizes search efficiency by following a trace-guided optimization pathway. This allows our approach to scale more gracefully with problem size, but also means that on small instances, it may not explore the solution space as thoroughly as LKH-3 under high-budget settings.
As the problem size increases, the search space grows exponentially. On large-scale problems, the coverage of LKH-3's randomized search becomes sparse unless the number of trials is increased substantially, leading to diminishing returns. In such cases, our method's guided search provides a more effective exploration strategy, enabling it to find higher-quality solutions more consistently and efficiently. This explains why our approach begins to outperform LKH-3 on larger ATSP instances.
Q: The method in this paper does not involve learning. Why mention learning-based methods separately in the related work?
While our method is not learning-based, we include learning-based approaches in the related work because they represent a significant and actively growing direction in solving TSP and other combinatorial optimization problems. Many such methods build on heuristic frameworks, with some leveraging learning to optimize components like hyperparameters or search strategies—e.g., tuning parameters for LKH-3 or learning local improvement rules.
Moreover, reinforcement learning-based solvers often do not require access to ground-truth optimal tours, but their performance is heavily influenced by the quality of the heuristic signals they rely on during training. In this context, stronger heuristics—such as our proposed algorithm—can serve as better guidance or high-quality rollouts, potentially improving the training and generalization of learning-based methods. Therefore, our method could be complementary to learning-based approaches and facilitate their further advancement.
Q: Does the initial solution have a big impact on the solution quality?
We experimented with both random and fixed initialization strategies and found that the choice of initialization has minimal impact on the final solution quality. This is likely due to the stability and robustness of our optimization procedure, which consistently converges to high-quality solutions regardless of the starting point. For simplicity and reproducibility, we therefore adopt a fixed initialization rule in all our experiments.
Ablation studies
We have conducted ablation studies to evaluate the contribution of individual components of our method, particularly the 2-opt local refinement step. Additionally, we compared our approach against the exact solver Gurobi on a set of problem instances. Due to time constraints, these experiments were conducted on 10 randomly selected instances. Despite the limited scale, the results provide useful insights into the effectiveness of our design choices and confirm that our algorithm performs competitively even without the local refinement.
From the results and our earlier findings on TSPLIB ATSP problems, we observe that the 2-opt local search strategy significantly improves performance on symmetric TSP problems, but has limited impact on asymmetric TSP problems.
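For context on what the 2-opt refinement does, here is a minimal first-improvement 2-opt sketch. This is an illustrative baseline, not necessarily the exact variant used in the paper; note also that on asymmetric instances a segment reversal changes the cost of every reversed arc, which is one intuition for why its benefit is limited there.

```python
import math

def tour_cost(D, tour):
    # total cost of a closed tour under distance matrix D
    n = len(tour)
    return sum(D[tour[i]][tour[(i + 1) % n]] for i in range(n))

def two_opt(D, tour):
    # first-improvement 2-opt: repeatedly reverse a segment if it lowers the cost
    tour = list(tour)
    improved = True
    while improved:
        improved = False
        n = len(tour)
        for i in range(1, n - 1):
            for j in range(i + 2, n + 1):
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_cost(D, cand) < tour_cost(D, tour) - 1e-12:
                    tour, improved = cand, True
    return tour

# Example: four cities on a unit square; the crossing tour 0-2-1-3
# is uncrossed into the optimal square tour of cost 4.
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
D = [[math.dist(a, b) for b in pts] for a in pts]
refined = two_opt(D, [0, 2, 1, 3])
```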
Performance comparison on 100-node symmetric TSP problems
| Algorithm | Running Time (s) | Ratio to Optimal (Mean) | Ratio to Optimal (Median) |
|---|---|---|---|
| LKH(1–100) | 0.07 | 0.61% | 0.17% |
| LKH(1–10000) | 1.24 | 0.12% | 0.00% |
| LKH(10–10000) | 12.14 | 0.00% | 0.00% |
| Gurobi | 2.06 | 0.00% | 0.00% |
| Ours (no local search) | 0.03 | 15.2% | 14.4% |
| Ours (local search) | 0.05 | 1.47% | 1.35% |
Performance comparison on 500-node asymmetric TSP problems
| Algorithm | Running Time (s) | Ratio to Optimal (Mean) | Ratio to Optimal (Median) |
|---|---|---|---|
| LKH(1–100) | 0.87 | 2.22% | 0.17% |
| LKH(1–10000) | 2.82 | 0.45% | 0.00% |
| LKH(10–10000) | 20.23 | 0.18% | 0.00% |
| Gurobi | 6.36 | 0.00% | 0.00% |
| Ours (no local search) | 0.17 | 0.005% | 0.001% |
| Ours (local search) | 0.30 | 0.005% | 0.001% |
Results of GOAL
We carefully examined the implementation of GOAL and found that it is trained to solve the Open Loop TSP, where the tour does not require returning to the starting node. Additionally, the official evaluation code for GOAL omits the cost of returning to the origin, which leads to an underestimation of the total tour length.
As a result, the reported performance of GOAL is not directly comparable to other methods in our evaluation, which all solve the Closed Loop TSP (i.e., Hamiltonian cycles). We note this discrepancy to ensure a fair and transparent interpretation of the benchmark results.
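To make the discrepancy concrete, the difference is exactly the return arc: an open-loop evaluation omits the final edge back to the starting node. A small sketch (the matrix below is an arbitrary illustrative instance):

```python
def tour_cost(D, tour, closed=True):
    # sum of consecutive arc costs; a closed (Hamiltonian-cycle) evaluation
    # also charges the return arc from the last node back to the first
    cost = sum(D[tour[i]][tour[i + 1]] for i in range(len(tour) - 1))
    if closed:
        cost += D[tour[-1]][tour[0]]
    return cost

# Asymmetric 3-node example: the open-loop cost omits the expensive arc D[2][0]
D = [[0, 1, 4],
     [2, 0, 1],
     [7, 3, 0]]
open_cost = tour_cost(D, [0, 1, 2], closed=False)   # 1 + 1
closed_cost = tour_cost(D, [0, 1, 2])               # 1 + 1 + 7
```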
Table captions
We will update the captions in the revised version.
I thank the authors for their detailed response, which has addressed most of my concerns. However, I notice that Weakness 1 has not been addressed. Additionally, I would appreciate it if the authors could add a discussion section on how to combine the proposed method with learning-based methods in the field of neural combinatorial optimization. Overall, it is an interesting work, and I will increase the score to 4. Nevertheless, since I am not particularly familiar with the field of this paper, I’ll defer to the AC and other reviewers.
The authors present a local search algorithm for finding solutions to the TSP. The algorithm is based upon incorporating a differentiable formulation of the constraints into the objective and subsequently applying gradient descent to find solutions. Additionally, the resulting TSP formulation is relaxed to allow for continuous solutions. Improvements and details regarding runtime and numerical stability are discussed, as well as an approach to recover feasible solutions from invalid assignments. Evaluation is performed on randomly generated TSP instances, and the presented approach is compared to SotA local search heuristics and learning-based approaches.
Strengths and Weaknesses
Strengths
- The key concepts of the paper are laid out clearly and in an easy-to-follow manner.
- The algorithm shows convincing performance on the evaluated random instances with respect to solution quality and needed runtime in comparison to state-of-the-art methods, especially on large instances.
Weaknesses
- The evaluation is restricted to purely randomly generated TSP instances.
- The paper has several grammar and spelling mistakes, e.g.:
- line 71 "auto-aggressive way"
- line 72 "learning approach are applied"
- line 82 "Besides on those approaches"
- line 83 "choose learn better"
Questions
- How is the multiplier chosen? What impact does the choice have on the performance of the algorithm?
- How dependent is the output on the chosen initialization? Would the algorithm benefit from re-running with different initializations?
- How does the presented algorithm perform on benchmark instances from, e.g., TSPLIB?
- How does the approximate gradient computation affect the output of the algorithm? Is the purpose of the approximation purely to improve runtime or does it also improve the final output of the algorithm?
- On line 160 it is stated that the step width is increased to accelerate convergence. Usually in gradient descent, one increases the step width to achieve greater exploration, while decreasing it leads to convergence to a local minimum. Why does increasing the step width accelerate convergence for this algorithm? Furthermore, it is stated that the adaptive strategy helps to balance exploration and stability. It is not entirely clear what is meant by stability here. Does stability mean the ability of the algorithm to converge to a local solution?
- Proposition 1 states that any fractional matrix cannot satisfy the constraint. Does this not imply that the feasible solutions to the relaxed problem defined in Eq. (3) and of the original problem definition in Eq. (1) are the same, and therefore (3) is not a relaxation of (1), but (3) is an equivalent way of stating (1)?
- Additionally, on the small instances (TSP20 and TSP50) a comparison to the optimal solution found by an ILP solver would be interesting. This could also be useful in validating the experimental results as e.g. in Table 2 multiple algorithms consistently find the same solution (indicated by the avg. and median gap both being 0.0%), which can be a sign that they all have found the optimal solution. However, there exists an algorithm that consistently finds better solutions (GOAL in this case). Comparing to the optimal solution found by an ILP solver could be used to verify that no error has been made when taking these measurements.
Limitations
Yes.
Final Justification
I think this is an interesting work and I will maintain my score.
Formatting Issues
n/a
Thank you for your insightful comments. We appreciate the opportunity to clarify several aspects of our work. Below, we address the concerns raised by the reviewers and provide additional explanations and evidence where appropriate.
How is the multiplier chosen? What impact does the choice have on the performance of the algorithm?
Theoretically, the multiplier should be sufficiently large to enforce the acyclicity constraint. However, in practice, since candidate solutions are derived by solving linear assignment problems, the trace term becomes significantly large whenever a loop exists in the solution. As a result, even a moderate value of the multiplier is effective at penalizing cycles.
In our experiments, we fixed the multiplier to a single moderate value, as it provides a good balance between constraint enforcement and optimization stability. We also tested larger values and observed no significant difference in performance. This suggests that our algorithm is relatively insensitive to the exact value of the multiplier, as long as it is reasonably large.
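As an illustration of why a trace term blows up on subtours (the exact penalty in Eq. (6) is defined in the paper; the form below is a simplified stand-in for intuition only): for a permutation matrix $P$, $\mathrm{tr}(P^k)$ counts the nodes lying on cycles whose length divides $k$, so summing the traces of the first $n-1$ powers is zero exactly when $P$ encodes a single Hamiltonian cycle.

```python
import numpy as np

def subtour_penalty(P):
    # sum_{k=1}^{n-1} tr(P^k): zero iff the permutation P is one n-cycle,
    # strictly positive whenever P contains a shorter subtour
    n = P.shape[0]
    M = np.eye(n)
    total = 0.0
    for _ in range(n - 1):
        M = M @ P
        total += np.trace(M)
    return total

def perm_matrix(succ):
    # permutation matrix with arc i -> succ[i]
    n = len(succ)
    P = np.zeros((n, n))
    P[np.arange(n), succ] = 1.0
    return P

full_tour = perm_matrix([1, 2, 3, 0])   # single 4-cycle: penalty 0
two_loops = perm_matrix([1, 0, 3, 2])   # two 2-cycles: penalty > 0
```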
How dependent is the output on the chosen initialization? Would the algorithm benefit from re-running using different initializations?
We experimented with both random and fixed initialization strategies and found that the choice of initialization has minimal impact on the final solution quality. This is likely due to the stability and robustness of our optimization procedure, which consistently converges to high-quality solutions regardless of the starting point. For simplicity and reproducibility, we therefore adopt a fixed initialization rule in all our experiments.
How does the approximate gradient computation affect the output of the algorithm? Is the purpose of the approximation purely to improve runtime or does it also improve the final output of the algorithm?
We have experimentally evaluated the impact of using approximate versus exact gradients. In particular, when applying exact gradient computation on a 20-node symmetric TSP, we observed a performance gap of approximately 1.5% relative to LKH-3—significantly larger than the gap achieved when using the approximate gradient.
This suggests that the purpose of the approximation is not purely for runtime efficiency, but that it also has a positive effect on the final solution quality. One possible explanation is that the use of inexact gradients introduces a form of implicit regularization or stochasticity that encourages better exploration of the solution space. In contrast, exact gradients may lead to faster convergence but can get trapped in poorer local minima.
Therefore, while the approximation does speed up computation, it also empirically improves the robustness and effectiveness of the algorithm.
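For intuition about the hard assignments used here: a doubly stochastic iterate can be rounded to a permutation matrix by solving a linear assignment problem. A minimal sketch using SciPy's assignment solver (illustrative only; the paper's exact rounding rule, and how the hard matrix enters the gradient, are described there):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hard_assignment(X):
    # permutation matrix maximizing <P, X> over all permutations,
    # found by running the min-cost linear assignment solver on -X
    rows, cols = linear_sum_assignment(-X)
    P = np.zeros_like(X)
    P[rows, cols] = 1.0
    return P

X = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
P = hard_assignment(X)  # the diagonal carries the most mass here
```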
On step size adaptation and stability in the optimization process
Thank you for the thoughtful question. We clarify that in our method, the adaptive step size strategy is designed to balance exploration and convergence speed during optimization.
In our setting, decreasing the step size typically causes the algorithm to converge more quickly to a nearby local minimum. While this can improve numerical stability, it may lead to suboptimal solutions due to premature convergence.
Conversely, increasing the step size enables the optimizer to explore a broader region of the solution space, potentially escaping poor local minima and discovering better solutions. However, this also introduces a risk of oscillation or divergence, which is why careful control is required.
Our adaptive rule increases the step size when the current assignment remains unchanged (suggesting that a larger step may help move away from a plateau), and decreases it when a change is detected (to avoid overshooting). This dynamic adjustment aims to maintain a balance between exploration and stability.
Here, by stability, we refer to the algorithm’s ability to maintain progress toward a feasible and high-quality solution without oscillation or divergence—not necessarily convergence to a global or even local minimum in the classical sense.
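The rule described above can be sketched as follows (the growth, shrink, and clamping constants are illustrative assumptions, not the paper's actual values):

```python
def adapt_step(step, assignment_changed, grow=1.5, shrink=0.5,
               step_min=1e-4, step_max=10.0):
    # grow the step size while the rounded assignment is stuck on a plateau,
    # shrink it once the assignment flips, to avoid overshooting;
    # clamp to a safe range for numerical stability
    step *= shrink if assignment_changed else grow
    return min(max(step, step_min), step_max)
```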
We hope this clarifies the intention behind our step size strategy.
How does the presented algorithm perform on benchmark instances from, e.g., TSPLIB?
We evaluated our algorithm on large-scale ATSP benchmark instances from TSPLIB, particularly those derived from stacker crane problems. Our method consistently outperforms LKH-3 (1–100) in terms of solution quality while also requiring less computational time. These results demonstrate the practical effectiveness of our approach on real-world structured benchmarks, further validating its scalability and competitiveness beyond synthetic instances.
| Instance | Best Known Result | Algorithm | Running Time (s) | Results |
|---|---|---|---|---|
| rbg323 | 1326 | LKH(1–100) | 1.46 | 1388 |
|  |  | LKH(1–10000) | 3.38 | 1346 |
|  |  | LKH(10–10000) | 20.32 | 1346 |
|  |  | Ours | 0.42 | 1360 |
|  |  | Ours (no 2-opt) | 0.20 | 1365 |
| rbg358 | 1163 | LKH(1–100) | 1.67 | 1294 |
|  |  | LKH(1–10000) | 4.03 | 1175 |
|  |  | LKH(10–10000) | 25.16 | 1175 |
|  |  | Ours | 0.09 | 1180 |
|  |  | Ours (no 2-opt) | 0.04 | 1180 |
| rbg403 | 2465 | LKH(1–100) | 6.27 | 2536 |
|  |  | LKH(1–10000) | 26.99 | 2498 |
|  |  | LKH(10–10000) | 9.06 | 2498 |
|  |  | Ours | 1.53 | 2473 |
|  |  | Ours (no 2-opt) | 1.28 | 2473 |
| rbg443 | 2720 | LKH(1–100) | 5.44 | 2813 |
|  |  | LKH(1–10000) | 7.87 | 2762 |
|  |  | LKH(10–10000) | 30.06 | 2756 |
|  |  | Ours | 0.52 | 2760 |
|  |  | Ours (no 2-opt) | 0.27 | 2760 |
Proposition 1 states that any fractional matrix cannot satisfy the constraint. Does this not imply that the feasible solutions to the relaxed problem defined in Eq. (3) and of the original problem definition in Eq. (1) are the same, and therefore (3) is not a relaxation of (1), but an equivalent formulation?
Yes, Proposition 1 implies that the continuous formulation in Eq. (3) is an exact representation of the original TSP defined in Eq. (1), in the sense that both share the same set of feasible solutions—namely, permutation matrices corresponding to valid tours.
However, we still refer to Eq. (3) as a relaxation in the context of optimization because it replaces the discrete permutation constraint with continuous doubly stochastic and trace-based constraints. While the feasible set remains unchanged, this formulation enables the use of gradient-based optimization techniques over continuous variables. It is worth noting that, despite being continuous, Eq. (3) remains non-convex and NP-hard in general.
Therefore, Eq. (3) should be viewed as a continuous but exact reformulation of the TSP, offering new algorithmic opportunities while preserving the problem's combinatorial structure.
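One standard way to exploit such a continuous reformulation (a sketch under our own assumptions, not necessarily the paper's exact update) is a multiplicative, i.e. exponentiated, gradient step followed by Sinkhorn normalization to stay close to the set of doubly stochastic matrices:

```python
import numpy as np

def sinkhorn(M, iters=200):
    # alternate row/column normalization of a positive matrix;
    # converges toward a doubly stochastic matrix
    M = M.copy()
    for _ in range(iters):
        M /= M.sum(axis=1, keepdims=True)
        M /= M.sum(axis=0, keepdims=True)
    return M

def exp_grad_step(X, grad, step):
    # multiplicative update keeps all entries positive; Sinkhorn re-projects
    return sinkhorn(X * np.exp(-step * grad))

rng = np.random.default_rng(0)
X0 = np.full((5, 5), 1.0 / 5)                       # uniform doubly stochastic start
X1 = exp_grad_step(X0, rng.standard_normal((5, 5)), step=0.1)
```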
Comparison with ILP Solver
We conducted a preliminary comparison between our method and an exact ILP solver on both symmetric and asymmetric TSP instances. Specifically, we tested the solver on 100-node symmetric problems and 500-node asymmetric problems and found that the ILP solver was able to return optimal solutions within a reasonable time budget.
Due to time constraints, we evaluated both methods on 10 randomly sampled instances for each case. The results are summarized below. While the ILP solver guarantees optimality, our method achieves competitive solution quality with significantly faster runtimes, particularly for larger asymmetric instances—highlighting its scalability and practical utility for large-scale problems.
Performance comparison on 100-node symmetric TSP problems
| Algorithm | Running Time (s) | Ratio to Optimal (Mean) | Ratio to Optimal (Median) |
|---|---|---|---|
| LKH(1–100) | 0.07 | 0.61% | 0.17% |
| LKH(1–10000) | 1.24 | 0.12% | 0.00% |
| LKH(10–10000) | 12.14 | 0.00% | 0.00% |
| Gurobi | 2.06 | 0.00% | 0.00% |
| Ours (no local search) | 0.03 | 15.2% | 14.4% |
| Ours (local search) | 0.05 | 1.47% | 1.35% |
Performance comparison on 500-node asymmetric TSP problems
| Algorithm | Running Time (s) | Ratio to Optimal (Mean) | Ratio to Optimal (Median) |
|---|---|---|---|
| LKH(1–100) | 0.87 | 2.22% | 0.17% |
| LKH(1–10000) | 2.82 | 0.45% | 0.00% |
| LKH(10–10000) | 20.23 | 0.18% | 0.00% |
| Gurobi | 6.36 | 0.00% | 0.00% |
| Ours (no local search) | 0.17 | 0.005% | 0.001% |
| Ours (local search) | 0.30 | 0.005% | 0.001% |
Results of GOAL
We carefully examined the implementation of GOAL and found that it is trained to solve the Open Loop TSP, where the tour does not require returning to the starting node. Additionally, the official evaluation code for GOAL omits the cost of returning to the origin, which leads to an underestimation of the total tour length.
As a result, the reported performance of GOAL is not directly comparable to other methods in our evaluation, which all solve the Closed Loop TSP (i.e., Hamiltonian cycles). We note this discrepancy to ensure a fair and transparent interpretation of the benchmark results.
I thank the authors for their detailed response and for answering my questions. I think this is an interesting work and I will maintain my score.
This paper presents an innovative continuous relaxation framework for the Asymmetric Traveling Salesman Problem (ATSP). The core idea is to combine a differentiable, trace-based Directed Acyclic Graph (DAG) constraint with a doubly stochastic matrix relaxation to formulate the problem. This approach effectively suppresses subtours, enabling the use of gradient-based optimization to find solutions. The proposed method demonstrates superior performance and scalability compared to state-of-the-art methods like LKH-3, especially on large-scale ATSP benchmarks.
Strengths and Weaknesses
Strengths:
- The paper proposes a novel, differentiable, and exact continuous formulation for ATSP.
- On large-scale ATSP instances (with more than 200 nodes), the method's solution quality and runtime are superior to the highly-optimized LKH-3 heuristic.
- The algorithm shows convincing performance with solid evaluation.
Weaknesses:
- The experiments report average and median results but do not provide error bars or confidence intervals.
- A valuable direction for future work would be to explore the algorithm's performance in worst-case scenarios.
Questions
Please refer to the weaknesses.
Limitations
Yes.
Final Justification
Although I am not familiar with this research topic, I find this to be an interesting work.
Formatting Issues
No
We would like to respectfully clarify that the review appears to be based on a misunderstanding of our submission.
Our paper does not address the problem of learning Directed Acyclic Graphs (DAGs) from observational data, nor does it consider Structural Hamming Distance (SHD) as a performance metric. Instead, our work focuses on solving the Asymmetric Traveling Salesman Problem (ATSP) by proposing a trace-based acyclicity constraint within a continuous optimization framework. The trace penalty is used to suppress cyclic subtours in assignment matrices, not to enforce acyclicity in the context of causal discovery or DAG learning.
The mention of concepts such as LiNGAM, statistical identifiability, or post-processing in DAG structure learning (e.g., thresholding of adjacency matrices) is therefore unrelated to our setting and not applicable to our contributions.
We kindly ask the reviewer to reconsider the evaluation in light of the actual scope of our paper, which is a continuous formulation for combinatorial optimization, not causal inference or graphical model learning.
Thank you for the clarification. Compared to traditional integer programming constraints for eliminating ATSP subtours, what are potential weaknesses of your differentiable trace constraint in suppressing different types of subtours? Have you considered other differentiable penalties derived from matrix properties and what would their trade-offs be? What is the peak memory consumption during your experiments?
Traditional integer programming-based approaches to TSP, such as branch-and-cut methods, guarantee optimal solutions but are often computationally expensive due to the NP-hardness of the problem. In contrast, our method is a heuristic that offers high-quality solutions within a reasonable time frame.
To the best of our knowledge, most state-of-the-art TSP solvers rely on linear integer programming. Although these formulations use only linear constraints, the total number of constraints can grow exponentially with the number of nodes in the worst case, particularly due to the subtour elimination constraints.
Regarding memory usage, our algorithm has a space complexity of $O(n^2)$, which is comparable to the storage requirement of the input distance matrix. In practice, this means our method's peak memory consumption remains within a constant factor of the distance matrix itself, making it highly scalable in terms of memory.
Thanks for your further clarification. As a heuristic, does your method have a worst-case performance guarantee? Have you identified specific problem structures where its performance degrades significantly? Or is this a direction for future research?
For the general asymmetric TSP, it is currently unknown whether a constant-factor approximation algorithm with polynomial complexity can be constructed. In contrast, under additional assumptions—such as symmetry or the triangle inequality—such approximation algorithms do exist. Our algorithm is designed for the general case and does not rely on properties like symmetry or the triangle inequality. While specialized algorithms such as LKH-3, which are tailored for symmetric TSP, may achieve slightly better performance in those specific settings, our method offers broader applicability. In future work, we plan to incorporate problem-specific structural assumptions into our framework to further improve performance when applicable.
Thank you for your clarification. Although I am not familiar with your research topic, I find this to be an interesting work. Therefore, I have decided to raise my score to 4 (borderline accept). I’ll defer to the AC and other reviewers.
This paper proposes a new continuous relaxation framework for the ATSP using trace-based acyclicity constraints and doubly stochastic matrix optimization. The method integrates differentiable DAG constraints to eliminate subtours, employs projected exponentiated gradient descent for optimization, and uses a greedy post-processing step to recover valid tours. Experiments demonstrate state-of-the-art performance on large-scale ATSP instances, outperforming LKH-3 and learning-based baselines.
Strengths and Weaknesses
I have to say that I usually work on approximation algorithms for network design problems (e.g., approximating the ATSP). From reading the abstract, I thought this work was related to my field, but it turned out to go beyond my experience after reading the whole paper. I am not the right person to judge the contribution and novelty of the paper.
From my perspective, the paper has the following strengths:

- The idea of using trace-based constraints looks interesting to me. This approach enforces acyclicity in ATSP relaxations, and Proposition 1 establishes that the global minimum of the relaxed problem recovers the exact TSP solution; this is also an interesting idea to me.
- From my view, the gradient approximation (using hard assignments) efficiently addresses the high cost of matrix power computations. However, I don't know whether it's computationally practical (e.g., easy to implement) or not.
- This paper gives competitive results on symmetric TSPs (as shown in Table 7), suggesting potential generality beyond ATSP.
The main weakness I can see is that the non-convex optimization (Equation 6) lacks a theoretical analysis of convergence rates or approximation bounds. The adaptive step-size heuristic is pragmatic but ad hoc, which could become a significant problem on other datasets.
Questions
I don't have specific questions.
Limitations
n/a
Final Justification
The authors addressed my concern during the rebuttal, and thus, I will keep my score. But this paper goes beyond my experience, so please take my review with caution.
Formatting Issues
n/a
Thank you for your insightful comments. We would like to offer the following clarifications regarding the approximation properties of Eq. (6).
Approximation Properties of Eq. (6)
The trace-based constraint term in Eq. (6) is zero if and only if the assignment matrix corresponds to a valid Hamiltonian tour. Therefore, when the Lagrange multiplier is sufficiently large, the relaxation in Eq. (6) becomes exact—i.e., the global minimizer of the relaxed problem coincides with the solution of the original TSP.
However, Eq. (6) defines a non-convex optimization problem. As with other non-convex objectives, gradient-based methods can generally only guarantee convergence to a local minimum. If the exact gradient is used with a suitably chosen step size, the projected exponentiated gradient or Frank–Wolfe method for solving Eq. (6) converges at a rate of $O(1/\sqrt{T})$, measured by the Frank–Wolfe gap [1].
When using approximate gradients—as we do in our implementation for efficiency—the convergence rate becomes more difficult to characterize formally. This remains an open theoretical question, and we acknowledge it as a direction for future work.
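For reference, the non-convex Frank–Wolfe guarantee in [1], restated here in generic notation (our transcription, under the assumption that Eq. (6) is optimized over a compact convex feasible set $\mathcal{M}$ such as the Birkhoff polytope), bounds the smallest Frank–Wolfe gap seen after $T$ iterations:

$$
\min_{0 \le t \le T} g_t \;\le\; \frac{\max\{2 h_0,\; C_f\}}{\sqrt{T+1}},
\qquad
g_t := \max_{s \in \mathcal{M}} \langle\, s - x_t,\; -\nabla f(x_t) \,\rangle,
$$

where $h_0 = f(x_0) - \min_{x \in \mathcal{M}} f(x)$ is the initial suboptimality and $C_f$ is the curvature constant of $f$ over $\mathcal{M}$; $g_t = 0$ certifies that $x_t$ is a stationary point.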
That said, we observe empirically that the use of approximate gradients yields better performance in practice compared to exact gradients. This is likely due to reduced noise in gradient estimation and improved numerical stability during training.
We thank the reviewer again for raising this important point.
References:
[1] Lacoste-Julien, Simon. Convergence rate of Frank-Wolfe for non-convex objectives. arXiv preprint arXiv:1607.00345 (2016).
[2] Conn, Andrew R., et al. Global convergence of a class of trust region algorithms for optimization using inexact projections on convex constraints. SIAM Journal on Optimization 3.1 (1993): 164–221.
Thank you. I appreciate the response. I think this is an interesting paper. Regarding contribution and novelty, I’ll defer to the AC and other reviewers, as I am not sufficiently familiar with this area.
This paper introduces a novel continuous relaxation framework for the ATSP, using differentiable trace-based constraints combined with doubly stochastic formulations and gradient-based optimization. A greedy post-processing step ensures valid tours.
The main strengths are the original formulation and the empirical performance: the method consistently outperforms LKH-3 and other baselines, particularly on large-scale ATSP and TSPLIB benchmarks.
While theoretical convergence guarantees are limited, and some reviewers noted lower confidence due to field mismatch, the work is technically sound, clearly presented, and demonstrates substantial practical impact. The rebuttal provided convincing clarifications and additional results.