PaperHub
6.8/10
Poster · 4 reviewers
Ratings: 4, 4, 5, 4 (min 4, max 5, std 0.4)
Confidence: 2.8
Novelty: 3.0 · Quality: 3.0 · Clarity: 2.8 · Significance: 3.0
NeurIPS 2025

Fractional Langevin Dynamics for Combinatorial Optimization via Polynomial-Time Escape

OpenReview · PDF
Submitted: 2025-05-02 · Updated: 2025-10-29

Abstract

Langevin Dynamics (LD) and its discrete proposals have been widely applied in the field of Combinatorial Optimization (CO). Both sampling-based and data-driven approaches have benefited significantly from these methods. However, LD's reliance on Gaussian noise limits its ability to escape narrow local optima, requires costly parallel chains, and performs poorly in rugged landscapes or with non-strict constraints. These challenges have impeded the development of more advanced approaches. To address these issues, we introduce Fractional Langevin Dynamics (FLD) for CO, replacing Gaussian noise with $\alpha$-stable Lévy noise. FLD can escape from local optima more readily via Lévy flights, and in multi-peak CO problems with high potential barriers it exhibits a polynomial escape time that outperforms the exponential escape time of LD. Moreover, FLD coincides with LD when $\alpha = 2$, and by tuning $\alpha$ it can be adapted to a wider range of complex scenarios in the CO field. We provide theoretical proof that our method offers enhanced exploration capabilities and improved convergence. Experimental results on the Maximum Independent Set, Maximum Clique, and Maximum Cut problems demonstrate that incorporating FLD advances both sampling-based and data-driven approaches, achieving state-of-the-art (SOTA) performance in most of the experiments.
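For intuition, here is a minimal sketch (ours, not the paper's code) of the noise swap at the heart of FLD, using SciPy's `levy_stable`; the draw below reduces to Gaussian noise at $\alpha = 2$ and produces heavy-tailed Lévy flights for $\alpha < 2$:

```python
# Minimal illustration (not the authors' code) of symmetric alpha-stable
# (SaS) noise versus Gaussian noise: at alpha = 2 levy_stable is Gaussian,
# while alpha < 2 yields heavy tails with occasional very large jumps.
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)
gaussian_like = levy_stable.rvs(2.0, 0.0, size=10_000, random_state=rng)
heavy_tailed = levy_stable.rvs(1.5, 0.0, size=10_000, random_state=rng)

# The rare large jumps in the heavy-tailed draw are the mechanism FLD
# uses to hop over high potential barriers.
print(np.max(np.abs(gaussian_like)), np.max(np.abs(heavy_tailed)))
```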
Keywords
Fractional Langevin Dynamics · Combinatorial Optimization

Reviews and Discussion

Review (Rating: 4)

This paper studies combinatorial optimization and Langevin Dynamics and proposes FLD for CO.

Theoretically, the authors demonstrate that their algorithm enhances exploration and improves convergence.

Empirically, results on several problems (Maximum Independent Set, Maximum Clique, and Maximum Cut) further demonstrate that incorporating FLD advances existing approaches, achieving SOTA performance in most of the experiments.

Strengths and Weaknesses

Strengths:

This paper is well structured, clearly written, and easy to follow.

This paper presents solid algorithmic design, theoretical proofs and insights, and comprehensive empirical results.

Weaknesses:

The related work part seems to be too concise. It could be extended a little bit.

Questions

Could you use properly sized (big) parentheses in equation (2), line 76?

The parentheses are not in the correct form in other places, e.g., line 111.

The topic of this paper does not align very well with my expertise, so I am not able to evaluate it with much confidence.

Limitations

n/a

Final Justification

The authors have addressed the questions I raised in my review.

After checking the other reviews and the rebuttal, I believe this work makes a solid contribution.

Formatting Issues

No formatting issues

Author Response

We sincerely appreciate the effort you put into reviewing our paper. Below, we provide our responses to the weaknesses and questions you raised.

W1: The related work part seems to be too concise. It could be extended a little bit.

Thank you for your suggestion. We originally intended to include the Related Work in the main text, but due to space constraints we only provided a partial discussion and subsequently moved it to Appendix A. Unfortunately, submission time pressure prevented us from expanding it further. We apologize for this oversight and will fully supplement the Related Work section in the final version of our paper to provide a more comprehensive background review.

To Q1&Q2: Thank you for catching this error. We will make the corrections and include the corrected parentheses in the final version of our paper.

If you have any additional questions, please don't hesitate to reach out. We would be more than happy to address any concerns.

Comment

Thanks for addressing my questions! I believe this work makes a significant contribution!

Comment

Thank you for reviewing our rebuttal. We’re glad the clarifications addressed your concerns. Please let us know if anything further would be helpful.

Review (Rating: 4)

This paper introduces Fractional Langevin Dynamics (FLD) to address key limitations of traditional Langevin Dynamics (LD) in combinatorial optimization (CO). LD's reliance on Gaussian noise hinders its ability to escape deep local optima, requires expensive parallel chains, and struggles with rugged landscapes or strict constraints. FLD replaces Gaussian noise with α-stable Lévy noise, enabling long jumps that facilitate escape from local minima. Theoretically, FLD achieves a polynomial escape time from local optima, outperforming LD's exponential escape time dependent on barrier height. FLD generalizes LD and adapts to diverse CO scenarios via α-tuning. The authors develop explicit-gradient (FLD-EG) and implicit-gradient/data-driven (FLD-IG) variants. Experiments on Maximum Independent Set, Maximum Clique, and Maximum Cut problems demonstrate that FLD advances both sampling-based and data-driven approaches, achieving state-of-the-art performance in most cases.

Strengths and Weaknesses

Strengths:

  • The paper provides rigorous analysis showing FLD achieves polynomial escape times (vs. LD's exponential dependence on barrier height) and generalizes LD (recovering it when α=2).

  • The work develops both explicit-gradient (FLD-EG) and implicit-gradient/data-driven (FLD-IG) variants, broadening applicability to problems with/without closed-form gradients.

Weaknesses:

  • The energy functions of the proposed method use simple penalty terms (Eq. 6); complex constraints (e.g., multi-objective or conditional) remain unaddressed.

  • The evaluations omit comparisons with evolutionary strategies (e.g., CMA-ES) or modern heuristics tailored to specific problems (e.g., MaxCut solvers).

  • FLD-IG’s training efficiency is noted, but inference scalability for massive graphs (e.g., >10k nodes) needs deeper analysis.

  • The proposed method has good theoretic results. However, the experimental results do not show significant advantage over existing methods.

Questions

See above.

Limitations

See above.

Final Justification

I appreciate the responses in the author rebuttal. They are satisfactory to me.

Formatting Issues

No.

Author Response

We sincerely appreciate the effort you put into reviewing our paper. Below, we provide our responses to the weaknesses and questions you raised.

W1: The energy functions of the proposed method use simple penalty terms (Eq. 6); complex constraints (e.g., multi-objective or conditional) remain unaddressed.

Thanks for your constructive comment. While our paper employs a simple penalty term in the energy function and does not yet incorporate more sophisticated constraints, the scenarios we address are nonetheless highly challenging, and the case studies we present involve standard problems commonly used in prior work—on which we have achieved strong results. We believe that our proposed method can be extended to tackle more complex problems, and in future work we will expand our approach to include the elaborate penalty terms and complex constraints you mentioned.

W2: The evaluations omit comparisons with evolutionary strategies (e.g., CMA-ES) or modern heuristics tailored to specific problems (e.g., MaxCut solvers).

Thank you for raising this question. As far as we know, CMA‑ES is typically applied to continuous optimization problems, whereas our work focuses on discrete combinatorial optimization. If you could kindly point us to any relevant literature, we would be most grateful. Furthermore, our KaMIS baseline employs the ReduMIS algorithm, which conducts an evolutionary search on a reduced graph. As for MaxCut solvers, we present the results of a semi-definite programming method (SDP), which is a widely used baseline for MaxCut, in the table below and will include them in the final version of our paper.

| Method | Problem | Size | Time |
| --- | --- | --- | --- |
| SDP | MaxCut-BA-[200–300] | 700.36 | 35.78m |
| SDP | MaxCut-BA-[800–1200] | 2786.00 | 10.00h |

W3: FLD-IG’s inference scalability for massive graphs (e.g., >10k nodes) needs deeper analysis.

We apologize for any misunderstanding. In our ER‑dataset experiments, training was conducted on ER‑[700–800], while inference was performed on ER‑[9000–11000]. The results suggest that our method may exhibit reasonable inference scalability even on massive graphs.

W4: The experimental results do not show significant advantage over existing methods.

Thank you for your question. We fully acknowledge your point: although our theoretical contributions are strong, the empirical improvements may not appear dramatic. Since prior solvers already approach the optimal solution, we focused on improving performance in the near‑optimal regime—which is substantially more challenging than improving cases far from optimal. Although our method yields improved results over previous approaches on most datasets, these gains are moderate and indicate that there is still plenty of scope for further refinement.

We sincerely hope that our response provides you with a clearer and more thorough understanding of our work. If you have any additional questions, please don't hesitate to reach out. We would be more than happy to address any concerns.

Comment

I appreciate the responses in the author rebuttal. They are satisfactory to me.

Comment

Thank you for taking the time to review our rebuttal. We’re glad the clarifications addressed your concerns. Please let us know if any additional information would be helpful.

Review (Rating: 5)

This paper addresses a known limitation of the application of Langevin dynamics (LD) to Combinatorial Optimization (CO): its poor ability to escape local optima, especially in rugged landscapes. To that end, the authors propose replacing the Gaussian noise with $\alpha$-stable Lévy noise, introducing the Fractional Langevin Dynamics (FLD) framework. This approach allows occasional large steps (Lévy flights) to escape local minima and provides better coverage of the solution space. The authors compare the upper bounds of the mean escape time and show that this quantity scales polynomially under FLD, an improvement over the exponential bound of LD. The authors then show how to derive from FLD a sampling method, where stability is handled via a truncation scheme, and a data-driven method based on [Feng and Yang, 2025]. Performance of the method is evaluated on 3 common CO problems, showing SOTA results in most applications. The impact of the $\alpha$ parameter is also investigated.

Strengths and Weaknesses

Strengths:

  • The paper is mostly clear and well written.
  • Replacing Gaussian noise with $\alpha$-stable Lévy noise is sound and well justified.
  • Claims are theoretically and empirically justified, showing a clear improvement over SOTA.
  • Experiments are clean, with a high number of baselines, and it is appreciated that running times are provided.

Weaknesses:

  • The authors do not dive deep into the truncation strategy introduced to improve stability for low $\alpha$. Does it potentially introduce a bias? Is it difficult to tune?
  • Could the authors add variance across runs to ensure consistency?
  • Is the idea of replacing the Gaussian noise with $\alpha$-stable Lévy noise in LD novel? Has it already been done in adjacent fields? More generally, variants of LD could have been explored in the related work.
  • Section 3.5 is unclear to me. Could the authors provide more details in the main paper to clarify it, especially regarding the PRL procedure?

Questions

See weaknesses

Limitations

The authors have adequately addressed the limitations of their work; claims are theoretically justified and experiments are fair.

Formatting Issues

no

Author Response

We sincerely appreciate the effort you put into reviewing our paper. Below, we provide our responses to the weaknesses and questions you raised.

W1: Authors do not dive deep into the truncation strategy introduced to improve stability for low $\alpha$.

We sincerely apologize for the lack of clarity in our description, which may have caused confusion. Since we initially applied our method to 0-1 integer programming problems, where the variable values are limited to 0 and 1, large outliers can cause the solution to remain stuck at 0 or 1, which goes against our intention of using noise to facilitate escaping from local minima. Therefore, applying reasonable truncation to the outliers allows the iteration process to escape from local optima more easily. We also show in the table below that $d_{\mathrm{noise}}$ is easy to tune, and slight perturbations to $d_{\mathrm{noise}}$ do not significantly affect the performance.

| $d_{\mathrm{noise}}$ | Problem | Size |
| --- | --- | --- |
| 20 | MIS-ER-[700–800] | 44.37 |
| 30 | MIS-ER-[700–800] | 44.34 |
| 40 | MIS-ER-[700–800] | 44.35 |
| 2 | MaxCl-RB-[200–300] | 18.96 |
| 4 | MaxCl-RB-[200–300] | 18.97 |
| 6 | MaxCl-RB-[200–300] | 18.97 |
| 20 | MaxCut-BA-[200–300] | 734.06 |
| 30 | MaxCut-BA-[200–300] | 734.13 |
| 40 | MaxCut-BA-[200–300] | 734.16 |

W2: Could the authors add variance across runs to ensure consistency?

We sincerely apologize for not including the variance across runs in the paper. We evaluated the variance of FLD-EG by running 10 repeated experiments with different seeds; the results are reported below:

| Problem | Size |
| --- | --- |
| MIS-RB-[200–300] | 20.02 ± 0.03 |
| MIS-RB-[800–1200] | 40.24 ± 0.04 |
| MIS-ER-[700–800] | 44.34 ± 0.09 |
| MIS-ER-[9000–11000] | 377.28 ± 1.07 |
| MaxCl-RB-[200–300] | 18.97 ± 0.01 |
| MaxCl-RB-[800–1200] | 40.61 ± 0.06 |
| MaxCut-BA-[200–300] | 734.09 ± 0.25 |
| MaxCut-BA-[800–1200] | 2960.06 ± 1.68 |

We will include the variance for all results of our methods in the final version of our paper.

W3: Is the idea novel? Has it been already done in adjacent fields?

Thank you for your question. We sincerely acknowledge that replacing Gaussian noise with $\alpha$-stable Lévy noise is not an entirely new strategy. However, it has not been explored in the field of CO problems. We recognized the potential of $\alpha$-stable Lévy noise for CO problems and have made preliminary attempts. Moving forward, we plan to extend these experiments to more complex problems. We apologize for not providing a detailed discussion of the application of FLD in other fields due to space limitations. We will include a more comprehensive review of the related work in the final version of our paper.

W4: Section 3.5 is unclear to me. Could the authors provide more details in the main paper to clarify it?

We sincerely apologize for not providing a detailed description of FLD-IG in the main paper. Due to space limitations, we included the specific method details and pseudocode in Appendix E.2. However, we would like to provide a more detailed explanation of the FLD-IG method here:

Based on the differentiability of the energy function, we divide the FLD-based approach to solving CO problems into explicit gradient (FLD-EG) method in Section 3.4 and implicit gradient (FLD-IG) method in Section 3.5. In Section 3.5, when the energy function is not differentiable, we introduce an NN-based framework whose training procedure resembles reinforcement learning, alternating between sampling and update steps, and using auto-grad to approximate the gradient of the energy function.
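A minimal sketch of this implicit-gradient idea (our own illustration, not the paper's code; `surrogate_energy` is an assumed differentiable surrogate for the energy, e.g. an NN-based relaxation):

```python
# Hedged illustration: when H(x) has no closed-form gradient, evaluate a
# differentiable surrogate on a continuous relaxation of x and let autograd
# supply the gradient that the FLD update then consumes.
import torch

def implicit_energy_grad(x, surrogate_energy):
    """Approximate grad H(x) by differentiating a surrogate via autograd."""
    x = x.detach().clone().requires_grad_(True)
    energy = surrogate_energy(x)              # scalar surrogate energy
    (grad,) = torch.autograd.grad(energy, x)  # autograd-approximated grad H
    return grad
```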

We sincerely hope that our response provides you with a clearer and more thorough understanding of our work. If you have any additional questions, please don't hesitate to reach out. We would be more than happy to address any concerns.

Review (Rating: 4)

The paper proposes a novel sampling technique based on fractional Langevin dynamics and incorporates it into a method for solving combinatorial optimization problems with binary variables and equality constraints. The key feature of this new sampling strategy is its ability to escape narrow local minima more quickly and better handle the challenging landscape of the objective function. The main mathematical tool used to develop this sampling is $\alpha$-stable Lévy noise, which replaces Gaussian noise in classical Langevin dynamics. The authors theoretically show a polynomial time for escaping local minima, which provides a solid basis for the presented approach. In experiments, the study demonstrates a better trade-off between the maximum values of objective functions and runtime compared to many competitors. The authors select the MaxCut, MaxClique, and Maximum Independent Set problems as benchmarks for testing the presented approach.

Strengths and Weaknesses

Strengths

  1. The paper is mostly well-written, and the authors' motivation for developing a better sampling strategy is clear.
  2. The experimental evaluation looks convincing and promising since the proposed approach outperforms many learning-based competitors and the standard OR tool Gurobi.
  3. The theoretical introduction clearly explains the basic concepts and describes key advanced concepts, such as the symmetric $\alpha$-stable distribution.

Weaknesses

  1. The description of the FLD-IG and FLD-EG methods is too brief. Since they build upon the central theoretical framework presented in the paper, a clearer exposition of these methods would strengthen the paper's impact.
  2. I do not find an explicit statement about the overall pipeline complexity and memory consumption. Is the total runtime smaller due to a smaller number of iterations or a lower cost of a single iteration?
  3. The authors compare the developed schemes with learning-based methods; however, it is unclear to me whether the proposed methods have a training stage. If yes, then how long is this stage? Is it included in the runtime comparison in Tables 1 and 2?

Questions

  1. Why does Eq. (6) use $+\lambda b(x)$ and not $+\lambda b^2(x)$? A large negative $b(x)$ leads to a small value of $H$, but it violates the constraint in the initial problem.
  2. In line 207, the authors first mention 'gradient' in the context of sampling and do not provide any clarification of what this term means. Please add the necessary remarks about which gradient is used and why it is "explicit". Is it possible to use an "implicit gradient"? I find this term in the next section; however, its meaning remains unclear to me.
  3. What is the effect of the truncation parameters $d$, $d_{\mathrm{noise}}$, and $\lambda$ in Eq. (29)?
  4. What $\alpha$ is used in the ablation study for FLD?
  5. What is $d$ in Eq. (29)?
  6. Could the authors please perform an additional experiment on a toy problem to illustrate the ability of fractional Langevin dynamics to escape from narrow local minima faster than classical Langevin dynamics? I think such a simple benchmark would not require too much space, but would serve as a bridge between the advanced theory and the challenging benchmark problems.

Limitations

A limitation on the binary variables is mentioned in the conclusion section.

Final Justification

The paper proposes a motivated modification to the noise distribution used in the sampling process for solving the combinatorial optimization problem. The study includes both solid theoretical analysis and convincing experimental evaluation of the suggested approach. The authors address my concerns in rebuttal, so I maintain my score and lean towards acceptance of this work.

Formatting Issues

  1. The sentence in lines 190-193 is too long; please split it to improve readability.
Author Response

We sincerely appreciate the effort you put into reviewing our paper. Below, we provide our responses to the weaknesses and questions you raised.

W1: The description of the FLD-IG and FLD-EG methods is too brief.

We sincerely apologize for not providing a detailed description of FLD-IG and FLD-EG in the main paper. Due to space limitations, we included the specific method details and pseudocode in Appendix E. However, we would like to provide a more detailed explanation of the FLD-IG and FLD-EG methods here:

Based on the differentiability of the energy function, we divide the FLD-based approach to solving CO problems into explicit gradient (FLD-EG) and implicit gradient (FLD-IG) methods. For FLD-EG, we provide a recursive expression with gradient and noise truncation, where a multi-step sampling and recursion approach can approximate the optimal solution. For FLD-IG, when the energy function is not differentiable, we introduce an NN-based framework whose training procedure resembles reinforcement learning, alternating between sampling and update steps, and using auto-grad to approximate the gradient of the energy function.

We will include this detailed explanation of the FLD-IG and FLD-EG methods in the final version of our paper.
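To make the explicit-gradient variant concrete, here is a minimal illustrative sketch (ours, not the paper's implementation; `H` and `grad_H` are assumed inputs, and the Top-$d$ truncation of Eq. (29) is omitted for brevity):

```python
# Illustrative FLD-EG-style loop (not the authors' code): iterate gradient
# descent on the energy H plus SaS noise over a [0, 1] relaxation, rounding
# to a 0-1 candidate and keeping the best solution seen.
import numpy as np
from scipy.stats import levy_stable

def fld_eg_sketch(x0, H, grad_H, steps=500, alpha=1.5, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    best = (x > 0.5).astype(float)
    for _ in range(steps):
        noise = levy_stable.rvs(alpha, 0.0, size=x.shape, random_state=rng)
        x = np.clip(x - eta * grad_H(x) + eta ** (1 / alpha) * noise, 0.0, 1.0)
        cand = (x > 0.5).astype(float)  # round the relaxation to a 0-1 solution
        if H(cand) < H(best):
            best = cand
    return best
```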

W2: No explicit statement about the overall pipeline complexity and memory consumption.

We sincerely apologize for not providing the complexity and memory consumption of the method pipeline in the main paper. Compared with the RLSA and RLNN methods, the complexity of the single-step sampling phase of FLD-IG and FLD-EG is similar. However, because FLD with symmetric $\alpha$-stable noise has a stronger escape ability than LD with Gaussian noise, the FLD-IG and FLD-EG methods require fewer iteration steps. In contrast to other learning-based methods, FLD-IG and FLD-EG involve only sampling and updating steps, resulting in a lower cost per iteration.

The memory consumption is summarized in the table below:

| Method | MIS-RB-[200–300] | MaxCl-RB-[200–300] | MaxCut-BA-[200–300] |
| --- | --- | --- | --- |
| FLD-EG | 1576.96 MB | 1413.12 MB | 1372.16 MB |
| FLD-IG | 1761.28 MB | 1536.00 MB | 1556.48 MB |

Here, we report only a single result for each case; in the final version of our paper, we will include the complete set of experimental results.

W3: Do the proposed methods have a training stage?

The FLD-EG method is training-free, while the FLD-IG method involves a training stage. The time consumption is shown in the table below. In Tables 1 and 2, the runtime includes only the model inference time and does not account for the time spent in the training stage. However, the training stage does not take very long, and for each problem it is carried out only once.

| Method | MIS-RB-[200–300] | MaxCl-RB-[200–300] | MaxCut-BA-[200–300] |
| --- | --- | --- | --- |
| FLD-IG | 4.65h | 7.83h | 4.72h |

Here, we report only a single result for each case; in the final version of our paper, we will include the complete set of experimental results.

To Q1: We sincerely apologize for any confusion caused by the phrasing in our paper. Equation (6) represents the Lagrangian function, not a pure penalty function. This linear term naturally arises from the dualized constraints in dual theory, ensuring that when the KKT conditions are satisfied, the primal and dual problems are equivalent (strong duality holds for convex problems).
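To spell out this reasoning (a hedged reading; the decomposition of the energy into an objective $f$ and a constraint function $b$ below is our notation, since Eq. (6) itself appears only in the paper):

```latex
% Lagrangian reading of Eq. (6): the linear term is the dualized equality
% constraint b(x) = 0, not a penalty.
H(x) = f(x) + \lambda\, b(x), \qquad \text{subject to } b(x) = 0 .
% At a KKT point b(x) = 0, so the term \lambda b(x) vanishes and H coincides
% with the original objective; a quadratic term \lambda b^2(x) would instead
% act as a penalty that enforces the constraint only approximately for
% finite \lambda.
```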

To Q2: In Equation (20), the discrete sampling process involves the gradient of the energy function. If the energy function is differentiable, the explicit gradient can be directly incorporated into the recursive expression, and the optimal solution is approximated through iterative updates in the sampling steps. This is referred to as FLD-EG. If the energy function is non-differentiable, the gradient of the energy function is implicitly approximated using auto-grad, which is referred to as FLD-IG.

To Q3&Q5: The truncation parameter $d$ is applied to both the gradient and the $\mathcal{S}\alpha\mathcal{S}$ noise in Equation (29). We apply the Top-$d$ truncated indicator to $\nabla H$, ensuring that only the $d$ components with the largest magnitude influence the update, and apply the Top-$d_{\mathrm{noise}}$ truncated indicator to the sampled noise vector, truncating extreme values to stabilize the sampling process. The second term in Equation (29) regularizes the expected Hamming distance between the two solutions, with $\lambda'$ being the regularization coefficient.
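For illustration, a hypothetical helper implementing such a Top-$d$ truncated indicator might look as follows (our sketch, not the paper's code; the same helper would be applied to $\nabla H$ with budget $d$ and to the sampled noise with budget $d_{\mathrm{noise}}$):

```python
# Keep only the d largest-magnitude components of a vector, zeroing the rest.
import numpy as np

def top_d_truncate(v, d):
    """Top-d truncated indicator: retain the d entries with largest |value|."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-d:]  # indices of the d largest magnitudes
    out[idx] = v[idx]
    return out
```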

To Q4: The value of $\alpha$ in the ablation study is consistent with the one presented in Table 3 and Table 4.

To Q6: We sincerely appreciate your valuable suggestions. In the ablation study, we explored the use of "fractional Langevin dynamics to escape from the narrow local minima faster than the classical Langevin dynamics." Below, we present specific data to demonstrate the escape ability of FLD.

As external links aren’t permitted in the rebuttal phase, we’re unable to include images here. Instead, we provide a textual description below to address your question: For the MIS problem on an ER‑[700–800] instance, LD required 50 steps to reach the Best Energy of –39, whereas FLD did so in just 16 steps. To attain the Best Energy of –41, LD needed 210 steps, compared to only 100 steps for FLD. Likewise, on a BA‑[200–300] instance of the MaxCut problem, LD took 32 steps to reach –800, while FLD achieved the same in just 8 steps.
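For readers who want to reproduce the flavour of this comparison, here is a self-contained toy sketch (ours, not the authors' experiment; the double-well energy $H(x) = (x^2 - 1)^2$ is an assumed stand-in for a rugged CO landscape):

```python
# Toy double-well escape: start in the left well at x = -1 and count the
# steps until the chain crosses the barrier at x = 0 toward the right well,
# under Gaussian-like (alpha = 2) versus heavy-tailed (alpha < 2) noise.
import numpy as np
from scipy.stats import levy_stable

def escape_steps(alpha, eta=0.01, max_steps=100_000, seed=0):
    rng = np.random.default_rng(seed)
    grad = lambda x: 4.0 * x * (x ** 2 - 1.0)  # gradient of H(x) = (x^2 - 1)^2
    x = -1.0                                   # start in the left well
    for t in range(max_steps):
        noise = levy_stable.rvs(alpha, 0.0, random_state=rng)
        x = x - eta * grad(x) + eta ** (1.0 / alpha) * noise
        if x > 0.5:                            # safely past the barrier
            return t + 1
    return max_steps

print(escape_steps(2.0), escape_steps(1.5))    # LD-like vs. FLD-like escape
```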

To Paper Formatting Concerns: Thanks for your kind suggestion. We have rewritten the sentence in lines 190-193: "Similarly, we provide the discrete proposal for the escape time of both LD and FLD. We state upfront that the Markov chains of LD and FLD are reversible if they satisfy the detailed balance conditions. Additionally, $p_{\tau}(x)$ is a positive stationary distribution, given that the symmetric proposal and the Metropolis-Hastings acceptance criterion are satisfied for constructing discrete LD and FLD."

We sincerely hope that our response provides you with a clearer and more thorough understanding of our work. If you have any additional questions, please don't hesitate to reach out. We would be more than happy to address any concerns.

Comment

Dear authors,

Thank you for the detailed response! No more questions from my side. I believe that, after minor revisions in response to the reviewers' comments, the paper makes a significant contribution, so I maintain my score as it is.

Comment

We appreciate your careful review of our rebuttal and are encouraged that the clarifications were satisfactory. We remain available to supply any further information.

Final Decision

This paper proposes Fractional Langevin Dynamics for combinatorial optimization problems. All reviewers acknowledge the technical contributions of this paper and are satisfied with the empirical demonstration. The authors provided sufficient feedback during the rebuttal phase, and the concerns of some reviewers were well addressed. Therefore, I lean toward accepting this paper.