Dual Lagrangian Learning for Conic Optimization
This paper presents a principled methodology for learning dual conic optimization proxies with dual feasibility guarantees.
Abstract
Reviews and Discussion
The authors propose Dual Lagrangian Learning (DLL) for learning solutions of dual conic optimization problems. Dual conic completion, differentiable conic projection layers, and a self-supervised Lagrangian training framework are discussed.
Strengths
- Presents a unique framework for dual optimization. No dedicated framework for dual optimization proxies is known to date.
- Extensive theoretical formulations.
Weaknesses
- The practical use of the framework is unclear, especially with regard to scalability and the computational costs involved.
Questions
- It is unclear how the utility of this framework can be tested for problem-solving in the real world. While DC3 has been selected as a comparable technique, it remains to be seen whether dual optimization is always beneficial -- e.g., assuming a problem can be solved both in the primal and in the dual, are there necessarily computational benefits to solving in the dual? How scalable are the proposed algorithmic frameworks?
Limitations
No.
We thank the reviewer for their valuable feedback and their appreciation of our work.
Benefits of dual optimization
While the paper outlines the theoretical and computational building blocks of DLL, this methodology is meant to be used in tandem with primal techniques, e.g., primal optimization proxies that produce candidate primal solutions. The dual proxies trained with DLL then provide valid dual bounds to certify the quality (i.e. optimality) of primal solutions.
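To make the certification point explicit: under the standard conic form (written here with illustrative notation), a primal-feasible $\hat{x}$ and a dual-feasible $\hat{y}$ produced by a DLL proxy satisfy, by weak duality,
$$ b^{\top}\hat{y} \;\leq\; c^{\top}x^{\star} \;\leq\; c^{\top}\hat{x}, $$
where $x^{\star}$ is an (unknown) optimal primal solution. The relative gap $(c^{\top}\hat{x} - b^{\top}\hat{y})/|c^{\top}\hat{x}|$ therefore certifies the quality of $\hat{x}$ without ever solving the instance to optimality.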
Accurate dual solutions can also be used to speed up primal algorithms, e.g., by identifying inactive constraints (whose associated dual variables are zero), which can be safely removed from the problem.
There are also cases where dual solutions are valuable in their own right. This is the case, for instance, in market-clearing applications, where primal variables are cleared quantities and dual variables are prices. In that context, DLL can offer higher accuracy and guarantees of dual feasibility.
Scalability
The scalability of our proposed framework leverages several key ingredients:
- Efficient conic projection layers: DLL does not require any external solver, nor the resolution of NP-hard problems (when considering the standard cones presented in the paper). We also introduce non-Euclidean conic projections with better complexity. For instance, for the PSD cone, the Euclidean projection requires a full eigendecomposition, which has $\mathcal{O}(n^{3})$ complexity, compared to a lower cost for the proposed radial projection (see the sketch after this list).
- Closed-form dual conic completion: we showcase several efficient dual completion layers for broad classes of problems (see Examples 1--3 and the numerical experiments).
- Self-supervised learning: this removes the need for offline data-generation using standard optimization solvers. This greatly accelerates training, and we have found this approach to often yield improved performance compared to supervised learning settings.
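The first bullet above can be made concrete with a short NumPy sketch (illustrative only: the radial-style operator below shrinks toward the identity, which is one plausible instantiation of a non-Euclidean projection and not necessarily the exact operator defined in the paper):

```python
import numpy as np

def psd_projection_euclidean(Z):
    """Euclidean projection onto the PSD cone: clip negative eigenvalues.
    Requires a full eigendecomposition, i.e. O(n^3) work."""
    w, U = np.linalg.eigh(Z)
    return (U * np.clip(w, 0.0, None)) @ U.T

def psd_projection_radial(Z):
    """Illustrative radial-style projection: shrink Z toward the identity
    (an interior point of the PSD cone) just enough to make it PSD.
    Only the smallest eigenvalue is needed; we compute the full spectrum
    here for brevity (a Lanczos-type method would avoid that)."""
    lam_min = np.linalg.eigvalsh(Z)[0]
    if lam_min >= 0.0:
        return Z
    alpha = -lam_min / (1.0 - lam_min)   # smallest alpha with (1-a)*lam_min + a >= 0
    return (1.0 - alpha) * Z + alpha * np.eye(Z.shape[0])

Z = np.random.randn(5, 5)
Z = (Z + Z.T) / 2
print(np.linalg.eigvalsh(psd_projection_euclidean(Z)).min() >= -1e-9)  # True
print(np.linalg.eigvalsh(psd_projection_radial(Z)).min() >= -1e-9)     # True
```

The Euclidean variant needs the full spectrum, whereas a radial-style variant only needs the smallest eigenvalue, which is what drives the complexity gap mentioned above.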
Finally, DLL-based proxies are expected to offer similar computational burden and benefits as primal optimization proxies, which have been shown in various works to be scalable for large-scale real-life problems.
I thank the authors for their clarifying comments. I choose to keep my original scores for the paper.
This paper addresses the broad category of machine learning (ML) for optimization, where ML-based approaches are used to obtain solutions or bounds on the solutions of optimization problems. In particular, it focuses on a specific subclass of problems, namely conic problems, and even more specifically, on the dual solutions of these problems.
Most "ML-for-optimization" frameworks have thus far focused on finding the solution to primal problems, which is intuitive because the variables in primal problems are typically the variables of interest for most real-world problems. However, learning dual solutions can help certify or bound the primal solution, and in some cases, one may be interested in the dual solution directly (for example, when learning shadow prices in economic optimization problems). This paper uses a conic projection layer in addition to the dual completion step to ensure dual feasibility.
The paper compares the method (DLL) to other existing methods in terms of their ability to learn solutions to dual problems and shows superiority, mostly due to the fact that DLL includes dual optimality conditions in addition to dual feasibility constraints. An alternative to the Euclidean projection (a radial projection) is also provided for the projection layer, which requires only one computation per cone and helps avoid vanishing-gradient issues during training.
Strengths
The paper claims that the model can guarantee dual feasibility, which is significant for yielding bounds on the primal solution. In addition, the authors show a closed-form, analytical solution for conic projections and for dual conic completion, outside of the ML model development.
The proposed DLL framework doesn't require an explicit formulation of the dual problem; instead, the user only has to provide a set of primal constraints that satisfy a certain set of conditions. This is convenient but not a main contribution (if the primal conic problem is known, the dual should be relatively easy to find).
Even though the most challenging and time-consuming optimization problems in practice are nonconvex, many of these problems can be cast as conic problems, and thus the method can be applied to quite a large array of problems that may occur in practice.
Compared to other methods, the DLL framework requires far less hyperparameter tuning and fewer iterative procedures, which contributes to ease of training and increased computational speed.
Weaknesses
One limitation of the method, if a known closed-form optimal dual completion cannot be achieved, is its use of implicit layers. It would be interesting to see how the speedups diminish with very large problems where the implicit layer becomes a bottleneck.
The comparison with existing approaches is fair since those are the state of the art for learning primal solutions, but it is slightly unfair in that those methods were not specifically designed to learn dual solutions. I think this is fine since there are no existing ML-for-optimization methods tailored towards dual solutions in particular, but it is worth noting.
The work in this paper does seem like a more generalized and extended version of the work in reference [19], where reference [19] was the first to provide a dual proxy for a second-order cone relaxation of the AC optimal power flow (OPF) problem. However, in reading [19], it does not seem to be necessarily limited to the AC OPF problem, although those authors only apply it to such. The approach in the present paper is generalized to a broader class of problems, but is not the first to provide dual feasibility guarantees as determined by a learning model.
Questions
It would be interesting to also see indications of whether or not strong duality holds in the test cases, if this is not too challenging to check. If so, the optimality gaps reported also should hold in the primal. The authors may also want to come up with some examples of why learning dual solutions directly may be a contribution on its own, outside of just providing a bound on a primal solution.
I would also want to see more of a discussion about the comparison with reference [19], specifically, is [19] already generalizable but the authors only applied it to the AC OPF problem in the paper?
Regarding the use of implicit layers, a discussion (or, even better, experiments if time allows) of their computational limitations and of how they slow down inference as problem size grows would be welcome.
Limitations
The authors have sufficiently addressed the limitations of the method. I don't see any potential negative societal impacts of the work.
We thank the reviewer for their careful reading of our paper, and their valuable feedback. Please see our general rebuttal for comments regarding how DLL can be used in discrete and non-convex settings. Detailed responses to the reviewer's questions follow.
Implicit layers
We agree with the referee that implicit layers can quickly become a computational bottleneck for large problems. In our experience, this typically happens when dealing with implicit layers for problems with large numbers of variables and constraints.
Nevertheless, we would like to point out the following:
- We have never encountered any problem for which we were not able to design a closed-form completion procedure. Indeed, most real-life problems are naturally bounded, which allows the use of the completion presented in Example 1. In many cases, finite bounds can be derived for all variables using the problem's structure. Furthermore, Problem (8) (see Theorem 1) can often be decomposed into smaller problems that can be solved in parallel.
While this is obviously only a computational conjecture, it should also be noted that even a partial dual completion can help reduce the size of the problem that eventually may need to be solved via an implicit layer.
- As we pointed out in the general rebuttal, subgradients of the Dual Lagrangian Loss in Eq. (19a) can be expressed in closed form in terms of the primal solution of Problem (8). Therefore, it suffices to use a primal-dual solver to solve Problem (8), and to use the dual (resp. primal) solution for the forward (resp. backward) pass; see the sketch below. This would further reduce the computational burden of implicit layers.
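To illustrate the forward/backward point, here is a self-contained PyTorch sketch on a toy bounded LP (the data A, b, c, the closed-form subproblem, and all names are illustrative placeholders, not the paper's Problem (8)): the primal minimizer returned by the "solver" is cached during the forward pass, so the backward pass needs no additional solve.

```python
import torch

torch.manual_seed(0)
m, n = 3, 8
A, b, c = torch.randn(m, n), torch.randn(m), torch.randn(n)

def solve_subproblem(y):
    """Toy Lagrangian subproblem min_{x in [0,1]^n} c'x + y'(b - Ax).
    Closed form: x_i = 1 iff (c - A'y)_i < 0. Returns the bound and a minimizer."""
    reduced_cost = c - A.T @ y
    x_star = (reduced_cost < 0).float()
    return b @ y + reduced_cost @ x_star, x_star

class DualBound(torch.autograd.Function):
    """Dual bound L(y) whose backward pass reuses the primal minimizer:
    a subgradient of L at y is b - A x*, so no extra solver call is needed."""

    @staticmethod
    def forward(ctx, y):
        bound, x_star = solve_subproblem(y)     # "forward" solve
        ctx.save_for_backward(b - A @ x_star)   # subgradient, cached for backward
        return bound

    @staticmethod
    def backward(ctx, grad_out):
        (subgrad,) = ctx.saved_tensors
        return grad_out * subgrad               # chain rule, no solver call

y = torch.zeros(m, requires_grad=True)
bound = DualBound.apply(y)
bound.backward()          # y.grad now holds b - A x*
print(float(bound), y.grad)
```

With a primal-dual conic solver, the same pattern would apply: the dual solution provides the bound, and the primal solution provides the (sub)gradient.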
We will include this discussion (and explicit formulae for gradient computation) in the paper's final version.
Strong duality
Strong duality does hold for all instances considered in the numerical experiments, i.e., the primal and dual problems have the same optimal value. Indeed, note that Slater's constraint qualification condition holds for all considered problems.
The reviewer is thus correct that the gaps reported in Tables 2 and 4 apply with respect to the primal optimal value, whose (average) value is reported in the second column of each table.
The most common example of dual solutions being valuable on their own is in market-clearing applications, where dual variables represent clearing prices. In that context, DLL can provide highly-accurate price predictions that are guaranteed to satisfy dual feasibility constraints.
Relation to Ref. [19]
In reference [19], Dual Conic Proxies for AC Optimal Power Flow, the authors learn dual solutions for a second-order cone relaxation of nonlinear, non-convex AC Optimal Power Flow (OPF) problems. While Ref [19] does utilize a dual completion procedure, its approach is specific to the SOC-OPF problem, and does not generalize to arbitrary conic settings. For instance, Ref [19] utilizes the fact that some primal variables can never be zero to eliminate dual variables, a property that does not hold in general. In addition, the authors use domain knowledge to design input features for their models, which cannot be replicated for arbitrary problems.
In contrast, this paper proposes theoretical and computational tools for general conic optimization problems. In particular, Theorem 1 offers a much stronger foundation for dual completion procedures than the setting of Ref [19]. Furthermore, the paper presents closed-form completion procedures for broader classes of problems, and conic projection layers for all standard cones and their duals. This includes the PSD, exponential and power cones, which were not considered in Ref [19].
Thank you to the authors for their detailed response to my questions and comments, as well as the general comments that they have provided to the overall feedback from all reviewers. I think that I agree with the authors about practical problems vs. edge cases, and it sounds like computational tools and findings in practice can achieve a high performance with this framework. I am happy to change my rating to a 7.
This paper presents the Dual Lagrangian Learning (DLL) framework that utilizes a fully-connected neural network (FCNN) to provide high-quality dual-feasible solutions that can be used to generate valid Lagrangian dual bounds for linear and nonlinear conic optimization problems without the use of costly implicit layers. This FCNN provides a candidate dual point that is passed through a conic projection step to produce a dual feasible point; then a dual completion subproblem is solved to get a complete dual feasible pair $(\hat{z}, \hat{y})$, which is then used to get a valid Lagrangian dual bound. The subgradient of the Lagrangian at the dual feasible point is used to update the FCNN network weights to provide better candidate points. The dual completion subproblem is solved for a variety of conic optimization problems. The experimental evaluation is done on both linear and nonlinear conic optimization problems, and the method is shown to outperform the state-of-the-art method DC3.
Strengths
The strength of the paper is the simplicity and effectiveness of the DLL framework compared to DC3 and commercial interior point solvers. While previous work used dual feasible points as a warm-start, this work shows how dual feasible points can be used to get valid Lagrangian dual bounds.
Weaknesses
The main weakness of the paper is the lack of consideration for general non-convex constraints. A minor weakness of the paper is the lack of detail about how DC3 was tuned. The authors claim that DC3 was difficult to tune and that only limited tuning was therefore done. However, on the positive side, they do provide adequate detail of how the hyperparameters for DC3 were set for easy reproducibility.
Questions
- On Lines 515-516, there seems to be a typo: "do not valid bounds" should be replaced with "do not output valid bounds".
- How was the limited tuning of DC3 done? While DC3 has many parameters and is difficult to tune, it is possible that DC3 may be better with more extensive tuning. It is important to show that a reasonable effort was made to tune DC3 before claiming that its performance is worse than DLL's.
- How does the quality of the candidates provided by DLL change over time? It would be interesting to see how the Lagrangian dual bound changes with each iteration of DLL.
- What considerations did you use when designing the FCNN for DLL? Have you tried other NN architectures besides the FCNN?
Limitations
The authors acknowledge the limitations of their work, which include lack of consideration for general non-convex constraints and suggest Graph Neural Networks (GNNs) as a promising future avenue of research.
We thank the reviewer for their time reviewing our paper, and for their insightful comments. Please see our general rebuttal regarding how DLL can be used in non-convex and discrete settings. Detailed responses to the reviewer's questions follow.
Non-convex constraints
Please see our general rebuttal. In a nutshell:
- Most algorithms for global non-convex optimization leverage convex relaxations. DLL can therefore provide high-quality surrogate models for these convex relaxations, and/or accelerate their resolution.
- DLL can be extended to handle non-convex and discrete constraints by leveraging Lagrangian duality in a similar fashion. However, obtaining valid Lagrangian bounds then requires solving typically NP-hard problems, instead of using the polynomial-time, closed-form formulae available in the conic setting.
Model architecture and tuning
As the reviewer pointed out, DC3 has multiple hyper-parameters, which makes its tuning difficult and resource-intensive.
We ran initial experiments on small datasets to evaluate the sensitivity of DC3's performance with respect to 1) number and size of hidden layers, 2) overall learning rate, 3) number of inequality corrections, 4) learning rate for correction mechanism. These experiments led us to select the hyper-parameters reported in the paper.
We decided not to run systematic hyperparameter tuning for each problem size because of the associated financial and environmental cost. Indeed, for production planning instances, even on the smallest instances, each DC3 training run would typically hit its 1-hour time limit on a V100 GPU (at a cost of about \$100 to produce the results of Table 4).
To ensure a fair comparison with DLL, we nevertheless implemented the following improvements, most of which benefited DC3:
- we implemented batched, analytical gradients for DC3's correction mechanism, avoiding the overhead of AD tools;
- we implemented a learning-rate scheduler to reduce the need for a finely tuned learning rate. We also added a patience mechanism to avoid DC3 getting stuck in a local optimum early on;
- we increased the training limits to 4000 epochs and one hour. In contrast, DLL models typically reached 99.9% of their final performance in at most 5 minutes and a couple hundred epochs.
We agree with the reviewer that there might exist a configuration of DC3 whose performance is greatly improved compared to the results presented in the paper. However, our experience suggests that this comes at a heavy price compared to DLL, be it in developer time, cloud computing budget, or environmental cost.
Finally, while we did not consider GNN architectures in this work, they are a very natural avenue which we intend to explore in future work.
Performance of DLL
As we briefly mention above, the training of DLL models is typically completed in at most 200 epochs and under a few minutes. We recorded a relative improvement of the average Lagrangian bound of less than 0.1% between 5 minutes and 1 hour of training time.
We will include a figure comparing the evolution of the DLL and DC3 bound during training in the final version of the paper.
I thank the authors for their detailed response to my comments and questions.
- Regarding the general rebuttal about lack of consideration for non-convex constraints, I agree with the authors' comments that the convex case is a good starting point, as convex relaxations are typically used for non-convex constraints. I recommend that the authors include a non-convex use case if submitting elsewhere, as it may strengthen the paper.
- Regarding the tuning of DC3, I thank the authors for providing a detailed account of the tuning of DC3 in the comments. It is understandable that extensive tuning costs time and computational resources. Still, I would recommend that the authors put these details in the paper to justify the limited tuning of DC3 and to explain how the final parameters were chosen.
- Regarding the consideration of architectures other than the chosen FCNN architecture, I was more interested in what other architectures the authors experimented with and why this particular architecture was chosen.
- Good to hear that the authors will consider including a figure comparing the evolution of the DLL and DC3 bounds during training. It will be a useful addition to the paper.
With the addressing of the above comments, I would like to raise my score to 7.
We thank the reviewer for their additional feedback.
We will include in the final version of the paper a specific discussion of non-convex and discrete constraints (also addressing comments by reviewer yA4S), and a more detailed explanation and justification of the tuning of DC3. We will also include a figure comparing the evolution of the Lagrangian bound obtained by DLL and DC3 during training.
The paper introduces Dual Lagrangian Learning (DLL), a machine learning framework designed for dual conic optimization problems. DLL utilizes conic duality and machine learning models to produce high-quality, dual-feasible solutions and corresponding Lagrangian dual bounds for both linear and nonlinear conic optimization problems. It introduces techniques like a systematic dual completion procedure, differentiable conic projection layers, and a self-supervised learning framework based on Lagrangian duality. The methodology is demonstrated to significantly outperform existing state-of-the-art methods and provides a faster alternative to traditional interior-point methods used in commercial solvers.
Strengths
The paper introduces Dual Lagrangian Learning (DLL), which combines machine learning (ML) techniques with dual conic optimization to enhance the computation of dual-feasible solutions and corresponding Lagrangian bounds. This integration addresses both linear and nonlinear conic optimization problems, aiming to provide a new approach compared to traditional methods. The methodology applies differentiable programming to optimization, thus enabling gradient-based optimization methods to refine dual solutions directly. The paper is well-structured and articulates complex ideas with clarity.
Weaknesses
Despite its strengths, the paper has several areas that could be improved:
- Scope of Application: The focus on continuous problems seems a step back considering the current trend and the necessity of addressing mixed-integer programming (MIP) in important real-world applications like power systems and manufacturing, which the authors rightfully point out in the beginning. The utility of DLL in these contexts remains unclear, especially without addressing the discrete aspects of such problems.
- Incremental Nature: The contributions, while innovative in combining ML with dual conic optimization, appear incremental when compared to existing research integrating ML with Lagrangian relaxation for MIP. More distinction or comparison with these methods could help in highlighting the unique advantages of DLL.
- Numerical Validation: The experiments provided are based on relatively small problem instances. This limitation might raise concerns about the scalability and practical applicability of DLL. Even by the standards of MIP applications, the sizes of the problems tested appear rather small. Extending these experiments to larger, more complex problem instances could enhance the paper's impact and relevance.
Questions
- Justification for Focus on Continuous Problems: Can the authors provide a stronger rationale for focusing predominantly on continuous problems given the prevalent interest and necessity for solving MIP in real-world applications?
- Comparison with Existing Methods: How does DLL compare in performance and applicability with existing methods that combine ML with Lagrangian relaxation specifically for MIP problems?
- Scalability and Practical Applicability: Could the authors elaborate on the potential scalability of DLL? Are there plans to test the methodology on larger-scale problems, particularly those involving discrete decisions common in industrial applications?
Limitations
The limitations are provided in the paper, but in the view of the above review, they appear to be rather restrictive.
We thank the reviewer for their detailed feedback and their appreciation of our work. Please refer to our general rebuttal for a detailed discussion of how DLL can be used in the non-convex and discrete setting. Detailed answers to the reviewer's questions follow.
Comparison with Lagrangian decomposition for MIP problems
We agree with the referee that MIP problems are of utmost importance in real-life applications. We would nevertheless like to point out that many real-life problems are posed as conic optimization problems, e.g., portfolio optimization problems in finance, model predictive control, day-ahead and real-time electricity market-clearing, etc.
More importantly, convex optimization is at the heart of most algorithms for mixed-integer optimization, e.g., branch-and-cut or outer approximation. Hence, by providing highly efficient tools for convex conic optimization, DLL makes a first step towards scalable learning-based methods for mixed-integer optimization.
In addition, while LD-MIP and DLL both leverage Lagrangian duality, DLL introduces tools that handle nonlinear conic constraints, whereas, to our knowledge, existing work on LD-MIP only considers the dualization of linear constraints. Finally, while obtaining a valid dual bound in LD-MIP requires solving a combinatorial problem, which may be NP-hard, DLL obtains valid bounds using efficient (polynomial-time), closed-form formulae for a very broad class of problems.
Scalability
(this answer is repeated in our response to reviewer H5sy)
The scalability of our proposed framework is supported by several key ingredients:
- Efficient conic projection layers: DLL does not require any external solver, nor the resolution of NP-hard problems (when considering the standard cones presented in the paper). We also introduce non-Euclidean conic projections with better complexity. For instance, for the PSD cone, the Euclidean projection requires a full eigendecomposition, which has $\mathcal{O}(n^{3})$ complexity, compared to a lower cost for the proposed radial projection.
- Closed-form dual conic completion: we showcase several efficient dual completion layers for broad classes of problems (see Examples 1--3 and the numerical experiments).
- Self-supervised learning: this removes the need for offline data-generation using standard optimization solvers. This greatly accelerates training, and we have found this approach to often yield improved performance compared to supervised learning settings.
Finally, DLL-based proxies are expected to offer similar computational burden and benefits as primal optimization proxies, which have been shown in various works to be scalable for large-scale real-life problems.
I thank the authors for clarifying their stance. There are the following points that need to be addressed:
- Presentation-wise, the language needs to be adjusted to make sure that the authors are not overclaiming. For example, a good lower bound can be obtained to certify the solution quality, but the solution itself may not necessarily be obtained.
- The paper's contribution must be put in the context of the existing research integrating ML with Lagrangian relaxation.
- The authors consider a subgradient variant of the method, which may suffer from zigzagging. This loops back to point #2 above, necessitating a more in-depth literature review and, likely, more testing, time permitting.
We thank the reviewer for their additional feedback.
We will update the last sentence in the second paragraph of Section 1
> it becomes possible to design a primal proxy to deliver a high-quality primal solution and an associated dual proxy to obtain a quality certificate.
to make it clear that the paper focuses on the dual side, i.e., the primal proxy is trained separately.
We would like to point out that the contributions listed in Section 1.1 specifically mention only conic problems, from conic duality to conic projection and conic problems in the experiments. We will include, in Section 6, an additional discussion about the non-convex and discrete settings, and will expand the literature review in Section 2 to mention additional works on Lagrangian relaxation in the MIP context.
The dual Lagrangian function is concave (for a minimization problem) and sub-differentiable. Hence, most algorithms for Lagrangian relaxation are based on subgradients, unless a regularizer is used as in, e.g., Ref [20]. The zigzagging issue is problematic, in an optimization context, when solving a single instance. In the ML context, as is the case here, the dual Lagrangian function is used as the training loss function. In particular, it is evaluated (and averaged) across multiple instances in a minibatch, which provides a form of regularization. It is also important to note that 1) popular activation functions like ReLU also yield subgradients, without significant impact on performance, and 2) the models are trained with the Adam optimizer, which uses momentum, thus reducing the risk of zigzagging. Finally, we note that works on Lagrangian relaxation in the MIP setting also use subgradient-based training, e.g., this paper.
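To make the minibatch-averaging point concrete, below is a self-contained toy sketch (illustrative only, not the paper's code: the bounded-LP dual bound, the problem sizes, and the network architecture are all placeholder assumptions). The dual Lagrangian bound is averaged over a minibatch of sampled instances and maximized with Adam; training is self-supervised since no pre-solved labels are needed.

```python
import torch

torch.manual_seed(0)
m, n = 3, 8
A = torch.randn(m, n)   # shared constraint matrix (toy)

def dual_bound(y, b, c):
    """Lagrangian dual bound of min{c'x : Ax = b, 0 <= x <= 1} at multipliers y:
    L(y) = b'y + sum_i min(0, (c - A'y)_i)."""
    return b @ y + torch.clamp(c - A.T @ y, max=0.0).sum()

model = torch.nn.Sequential(
    torch.nn.Linear(m + n, 64), torch.nn.ReLU(), torch.nn.Linear(64, m)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    # Sample a minibatch of instances (b, c): self-supervised, no labels needed.
    b = torch.randn(32, m)
    c = torch.rand(32, n)
    y = model(torch.cat([b, c], dim=1))
    bounds = torch.stack([dual_bound(y[i], b[i], c[i]) for i in range(32)])
    loss = -bounds.mean()   # maximize the average dual bound over the minibatch
    opt.zero_grad()
    loss.backward()
    opt.step()
```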
General rebuttal
We express our gratitude to the area chairs and reviewers for their careful reading of our manuscript, their appreciation of our work, and their valuable feedback. In this general rebuttal, we comment on the applicability of the proposed framework to mixed-integer and non-convex problems.
We would also like to report that the paper's main author was sick with Covid during the past week, and was therefore unable to run additional experiments for this rebuttal. We nevertheless discuss, in individual rebuttals, what such experiments would have looked like and what results we would have expected. We hope to include them in the final version of the paper.
How DLL can be used for discrete / non-convex problems
First, we would like to point out that convex optimization is the workhorse that underlies virtually all global optimization methods for discrete and non-convex problems. This includes, for instance, branch-and-cut algorithms for MIL(N)P, Outer-Approximation algorithms for convex MINLP, and spatial branch-and-bound for non-convex continuous optimization. Namely, all these algorithms employ convex relaxations to obtain valid dual bounds and eventually prove optimality.
Hence, by establishing the theoretical fundamentals of a principled learning framework for (convex) conic optimization, this paper marks a first step towards hybrid frameworks for non-convex optimization, wherein learning-based methods are used to solve (or accelerate the solution of) convex relaxations.
Relation between DLL and Lagrangian decomposition for MIP
More generally, it is important to note that Lagrangian duality is a fundamental principle in constrained optimization, which is applicable to convex, discrete and non-convex problems alike. Formally, consider a problem $\min_{x \in \mathcal{X}} \{ f(x) \ : \ g(x) - b \in \mathcal{K} \}$ and the Lagrangian relaxation $\mathcal{L}(y) = \min_{x \in \mathcal{X}} f(x) - y^{\top}(g(x) - b)$, where $y \in \mathcal{K}^{*}$. It is always the case that
- $\mathcal{L}$ is concave w.r.t. $y$, even if $\mathcal{X}$ is non-convex or discrete,
- $\mathcal{L}(y)$ is a valid dual (lower) bound,
- if $x^{*}$ is an optimal solution of the Lagrangian subproblem, then $b - g(x^{*})$ is a subgradient of $\mathcal{L}$ at $y$.
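For completeness, the lower-bound property follows from a one-line argument (using the illustrative notation above): for any feasible $\bar{x}$, i.e., $\bar{x} \in \mathcal{X}$ and $g(\bar{x}) - b \in \mathcal{K}$, and any $y \in \mathcal{K}^{*}$,
$$ \mathcal{L}(y) \;\leq\; f(\bar{x}) - y^{\top}\big(g(\bar{x}) - b\big) \;\leq\; f(\bar{x}), $$
since $y^{\top}(g(\bar{x}) - b) \geq 0$ by definition of the dual cone; hence $\mathcal{L}(y)$ lower-bounds the optimal value regardless of the structure of $\mathcal{X}$.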
In our paper, $\mathcal{X}$ is described by conic constraints. In Lagrangian decomposition for MIP problems (LD-MIP), e.g., this paper, $\mathcal{X}$ is a discrete set. The two approaches then differ as follows:
- Existing work on LD-MIP only considers the dualization of linear constraints of the form $Ax \geq b$, which yields the simple dual set $\{ y \geq 0 \}$. In contrast, we consider general conic constraints, with associated multiplier $y \in \mathcal{K}^{*}$. Extending LD-MIP to support general conic constraints would require the conic projection layers proposed in our paper to ensure $y \in \mathcal{K}^{*}$.
- Obtaining a valid Lagrangian bound in LD-MIP requires solving typically NP-hard discrete problems, which requires an external solver. In contrast, in the conic setting, dual bounds are typically obtained via efficient (i.e., polynomial-time) formulae.
This is thanks to the elegant conic duality theory, whereby the dual of a conic problem is a conic problem. On the other hand, there is no tractable duality theory for non-convex or discrete problems.
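For reference, in the standard conic form (illustrative notation), the primal-dual pair reads
$$ \min_{x}\ \big\{\, c^{\top}x \ :\ Ax = b,\ x \in \mathcal{K} \,\big\} \qquad\text{and}\qquad \max_{y}\ \big\{\, b^{\top}y \ :\ c - A^{\top}y \in \mathcal{K}^{*} \,\big\}, $$
so the dual is again a conic problem over the dual cone $\mathcal{K}^{*}$, and any $y$ with $c - A^{\top}y \in \mathcal{K}^{*}$ immediately yields the valid bound $b^{\top}y$ in closed form.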
Therefore, DLL and LD-MIP both leverage Lagrangian duality in a learning context. DLL introduces key ingredients that support the dualization of nonlinear conic constraints, compared to the linear constraints considered in existing works on LD-MIP. In addition, its extension to the non-convex or discrete setting is straightforward. The key computational difference is that, in the non-convex / discrete settings, computing a valid Lagrangian bound requires solving typically NP-hard non-convex / discrete problems. In contrast, DLL offers scalable, polynomial-time formulae.
The polynomial-time result applies to all standard cones considered in the paper. However, some classes of conic programs, e.g. copositive programming, are NP-hard to solve.
This paper leverages conic duality to learn to predict dual-feasible solutions and therefore valid Lagrangian dual bounds, for linear and nonlinear conic optimization problems. The paper is well-written and technically sound. Experiments showed that it compares favorably to baselines such as Mosek and DC3.