PaperHub
Rating: 6.1/10 · Decision: Rejected · 4 reviewers
Scores: 1, 4, 4, 4 (min 1, max 4, std. dev. 1.3)
ICML 2025

Learning Multiple Initial Solutions to Optimization Problems

Submitted: 2025-01-23 · Updated: 2025-06-18
TL;DR

Optimizers need a good initial solution. We propose a learning method that predicts multiple initial solutions, and then either selects the best one for a single optimizer, or runs multiple optimizers with different initializations in parallel.

Abstract

Keywords

optimization, initialization, optimal control, robotics, autonomous driving

Reviews and Discussion

Review (Rating: 1)

This paper addresses the setting where a parametric optimization problem must be solved repeatedly, such as in online settings. For these settings, the authors argue that a key concern is the provision of a good initial guess for a local optimization solver. The paper proposes the MISO framework, which uses a transformer model to learn a pool of initial solutions, parameterized by the optimization problem. The models are trained using a loss function that weights both proximity to the true global solution and some diversity metric(s). The proposed method is evaluated on three small-scale control case studies.
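The two usage modes described in this summary can be sketched as follows (a minimal sketch with hypothetical function names, not the authors' implementation; `cost` plays the role of the selection function Λ):

```python
def miso_single(candidates, cost, optimize):
    """Single-optimizer setting: the selection function picks the best
    candidate (here: lowest cost) and one local solve is run from it."""
    best = min(candidates, key=cost)
    return optimize(best)

def miso_parallel(candidates, cost, optimize):
    """Multiple-optimizers setting: every candidate is optimized
    (in practice in parallel) and the best final solution is kept."""
    finals = [optimize(x0) for x0 in candidates]
    return min(finals, key=cost)
```

In MISO the `candidates` pool is produced by the learned transformer; any heuristic initializer (e.g., a warm start) can be appended to the pool as well.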

Questions for Authors

n/a

Claims and Evidence

Most claims are made empirically and are evidenced suitably by the data. The main theoretical claim the authors make is a performance guarantee (Section 4.2) that is slightly confusing. Namely, “in the single-optimizer setting the best initial solution is always equal or better than the default according to the selection function Λ”. However, this is not really a performance guarantee, as the initial solutions are not used. For example, if an initial point from an alternative method is in the pool of candidate initial solutions, but does not have the best objective value (the authors’ choice of Λ), then the point will not be used at all. Therefore, from my understanding, it cannot be guaranteed that the optimized solution will be better than the solution found by using the initial point from the alternative method.
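This concern can be illustrated on a toy double-well objective (a hypothetical example, not taken from the paper): the candidate with the lowest initial cost may sit in a worse basin than a discarded candidate, so selecting by initial objective value guarantees nothing about the optimized outcome.

```python
def f(x):
    # Toy double-well objective: deep minimum near x ~ -1.03 (f ~ -0.30),
    # shallow minimum near x ~ 0.96 (f ~ 0.29).
    return (x * x - 1.0) ** 2 + 0.3 * x

def grad_f(x):
    return 4.0 * x * (x * x - 1.0) + 0.3

def descend(x, lr=0.05, steps=500):
    # Plain gradient descent as a stand-in for a local solver.
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

candidates = [0.9, -0.5]
selected = min(candidates, key=f)  # selection by initial cost picks 0.9,
# but 0.9 descends into the shallow basin, while the discarded
# candidate -0.5 would have reached the deeper minimum.
```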

Methods and Evaluation Criteria

The benchmarks used are quite limited, making the motivation for this work slightly unclear. Specifically, the problems appear low-dimensional (1D or 2D controls), and the control problems are linearized. Therefore, it would be expected that standard methods such as convex quadratic programming or multi-starting solvers from a grid search/LHS would perform well, but these comparisons are not included. Moreover, the authors claim the main problem addressed is that local optimization solvers are trapped by local optima but use as an oracle the same solver with longer run-time. This implies that the main issue is convergence time rather than local optima. This paper requires an analysis of what optimization solver is being tested, what the computational expenses are, and why it fails to find a good optimum.

Theoretical Claims

No theoretical claims or proofs are included.

Experimental Design and Analysis

The experiments are designed well, but are missing proper benchmark comparisons against control baselines and existing (non-learning-based) multi-start methods, as described above. Moreover, implementation details of the optimization solver(s) used are notably lacking.

Supplementary Material

The supplementary material provides an adequate description of the case studies considered, model hyperparameters, and baseline methods. Discussion of the optimization implementations is notably missing. A few sections consider ablation studies for some design choices of the method.

Relation to Prior Literature

The key contribution to the broader literature here appears rather limited: while several works have tried to learn the solution and/or initial guess for a parametric optimization problem, this work learns multiple solutions and provides simple algorithms to integrate them.

Missing Essential References

As the main contributions are in learning multiple, diverse solutions, the paper is missing a discussion of existing research in promoting diversity in a pool of solutions, e.g., quality diversity algorithms.

Other Strengths and Weaknesses

This paper is very well written and is easy to read and follow. The motivations are presented well, and the algorithm is easy to understand. As the “learning” contribution here is minor (see above), this work may be very well suited for a leading journal/venue in control applications.

Other Comments or Suggestions

n/a

Author Response

Thank you for taking the time to review our paper and for your feedback. We appreciate your recognition of our clear writing, motivation, and coherent algorithm.

In the following, we clarify misconceptions regarding our claims, address concerns about benchmark tasks, solver comparisons, and implementation details, and underscore the rigorous empirical validation of our approach, highlighting its suitability for publication within the broader ML community.

Performance Guarantee Clarification. You correctly point out that our claim in Sect. 4.2 focuses on the initial solution quality according to the selection function Λ. Indeed, we emphasize this explicitly in lines 210-215: The guarantee pertains to the initial solutions themselves and does not directly guarantee improved, optimized outcomes. In contrast, MISO is guaranteed to lead to an equal or better final solution in the multiple optimizers setting, as stated in lines 216-219 and empirically supported by results in lines 1071-1099.

Misinterpretation of Main Claims. We would like to clarify that we have not claimed that the primary issue is local optima. Instead, we explicitly state that our setting is optimization under tight runtime constraints (lines 11, 47, 128, 255), for which local optimization solvers' performance is sensitive to initial solutions (line 229). We clearly state that we use a proxy-oracle, which is allowed an extended runtime or multiple initializations to obtain (near-)optimal solutions. Notably, our empirical results (Table 1) show that MISO frequently outperforms the proxy-oracle baseline, validating this claim.

Missing Details. We acknowledge that some details were concise in the main text but stress that comprehensive implementation descriptions, hyperparameters, and computational cost analyses are thoroughly provided in the appendix (lines 607-680), directly addressing this concern.

Benchmark Tasks. Our selected tasks are widely recognized and commonly used benchmarks in optimization and learning-based research. Notably, the nuPlan autonomous driving task is an 80-dim problem featuring realistic and complex urban environments with nonlinear dynamics, making it significantly challenging and relevant. Although some optimizers, e.g., iLQR, internally employ linearization during optimization, the underlying tasks themselves are still inherently nonlinear.

Comparison to Other Solvers. Your suggestion regarding comparisons to methods like convex QP or multi-starting solvers is well taken. However, some selected optimizers (DDP and iLQR) inherently include internal QP iterations (Tassa et al., 2014; Amos et al., 2018). Thus, our chosen baselines already implicitly encompass aspects of these suggested comparisons.

Contribution. We respectfully disagree regarding the perceived limitation of our key contributions. While prior work primarily focused on predicting single initial solutions, MISO represents the first systematic exploration explicitly designed to predict multiple ones. This novel aspect significantly addresses limitations inherent in single-initial-solution approaches. Our contributions are rigorously supported by extensive empirical evaluations demonstrating substantial performance improvements across multiple optimizers and tasks (Tables 1, 2).

Suitability. We appreciate your perspective that the paper is suited for a leading venue in control applications. However, our contribution stands to benefit the broader ML community – especially under "optimization methods" and "application-driven ML" tracks, as it explores new ways to predict and use multiple initial solutions in general optimization problems. Furthermore, the relevance is reinforced by several recent works that similarly bridge ML with optimization tools, including those published at ICML 23–24 (e.g., Ferber et al. 2023, Li et al. 2024, and Huang et al. 2024).

Non-learning-based Approaches. We have indeed compared MISO to several non-learning-based methods, such as warm-start, perturbed warm-start, and strong oracle-proxy baselines (Tables 1, 2), demonstrating the significant improvements that MISO achieves over non-learning-based approaches.

QD. While QD solutions are generally generated iteratively through evolutionary processes, MISO directly predicts multiple high-quality, diverse initial guesses in one shot. Hence, MISO does not depend on iterative, population-based exploration that underlies QD approaches.

We have made every effort to address your concerns comprehensively and clearly in our revision. If you feel your concerns have been resolved, we would greatly appreciate it if you could reflect this positively in your evaluation. Should any remaining questions or points need further clarification, please let us know—we are more than happy to provide additional information.

Again, thank you for your constructive feedback and valuable suggestions, which have significantly contributed to strengthening our manuscript.

Reviewer Comment

Thanks for the helpful clarifying answers regarding the claims of the work. I still believe different baselines are required to properly evaluate this contribution:

  • convex QP solvers should be tested, in order to test the performance against the state of the art, especially for control applications. It is important to use explicit QP solvers, not just solvers that use QP iterations internally or exhibit other "aspects" of QP, because these have convergence guarantees in many relevant control settings.
  • a different multi-start method should be tested, in order to test the performance improvement from learning the multiple starting points. This is crucial to this paper, since multi-start optimization is already a popular technique (e.g., built into many optimization packages). Therefore, the value of learning the starting points should be tested against standard methods, including randomized, grid search, and Latin hypercube sampling for multi-start optimization.
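For reference, the non-learned multi-start baseline requested here can be sketched in a few lines (illustrative, self-contained pure-Python Latin hypercube sampling; not tied to any particular solver):

```python
def latin_hypercube(n, d, rng):
    """n points in the unit hypercube [0, 1]^d: per dimension, one sample
    is drawn from each of n equal strata, then the strata are shuffled."""
    cols = []
    for _ in range(d):
        bins = list(range(n))
        rng.shuffle(bins)
        cols.append([(b + rng.random()) / n for b in bins])
    return [tuple(col[i] for col in cols) for i in range(n)]

def multi_start(cost, starts, optimize):
    """Evaluate all candidate starting points and warm-start the local
    solver from the cheapest one (mirrors best-of-pool selection)."""
    best = min(starts, key=cost)
    return optimize(best)
```

Grid search follows the same `multi_start` pattern, with the starting points placed on a regular lattice instead of being sampled.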
Author Comment

Thank you for your thoughtful additional suggestions and for acknowledging our previous clarifications regarding performance guarantees, runtime constraints, and benchmark relevance.

We remain in the position that the requested additional comparisons are not strictly required for evaluating our proposed method for reasons previously stated. Nevertheless, we recognize that these comparisons can provide additional insights, and therefore, we have provided these results below. All new results fully support our original claims.

Explicit Convex QP Solvers:

You raised a point regarding comparison to explicit state-of-the-art convex QP solvers. We explicitly compared our method against seven popular convex QP solvers available through the CVXPY library on the nuPlan driving task (Sequential setting, single optimizer). The results are summarized as follows:

Method             Mean Cost (± SE)
iLQR (default)     283.86 ± 37.91
OSQP               314.13 ± 53.32
CLARABEL           311.20 ± 55.00
PIQP               289.37 ± 41.60
PROXQP             274.81 ± 39.78
DAQP               307.01 ± 52.10
QOCO               249.20 ± 32.10
MISO WTA (ours)     30.75 ± 2.15

These results indicate that explicit QP-based solvers do not achieve competitive performance in our domain within the fixed runtime limits used by our method.

Comparison to Standard Multi-start Optimization Techniques:

You suggested comparisons to common multi-start optimization methods, specifically grid search and Latin hypercube sampling (LHS). We ran both methods on the nuPlan task, allocating the same runtime budget (5 ms) used by MISO. Both grid search and LHS systematically select multiple initial candidate solutions; we evaluated these candidates and selected the best (lowest-cost) candidate as the initial solution for subsequent optimization.

Results across single and multiple optimizer configurations:

Single Optimizer:

Method             One-off Cost       Sequential Cost
Warm-start         283.86 ± 37.91     283.86 ± 37.91
Grid Search        491.58 ± 60.14     619.28 ± 65.26
LHS                439.57 ± 42.24     577.80 ± 64.97
MISO WTA (ours)     30.17 ± 2.24       30.75 ± 2.15

Multiple Optimizers:

Method             One-off Cost       Sequential Cost
Warm-start         283.86 ± 37.91     283.86 ± 37.91
Grid Search        518.14 ± 57.80     560.23 ± 64.04
LHS                511.00 ± 60.52     520.54 ± 56.67
MISO WTA (ours)     30.87 ± 2.30       30.48 ± 2.07

As expected, due to the significant nonzero computational cost associated with evaluating solutions in our domain, standard multi-start methods like grid search and LHS fail to find sufficiently good initial solutions given equivalent runtime constraints.

We genuinely appreciate your suggestions. To comprehensively address your concerns, we will include these additional empirical comparisons explicitly in the revised manuscript.

Given these clarifications and additional demonstrations aligned with our original claims, we respectfully hope you reconsider your evaluation positively.

Review (Rating: 4)

This paper proposes a method called MISO (Learning Multiple Initial Solutions Optimization) to improve the performance of local optimization algorithms by predicting multiple high-quality initial solutions. The framework leverages a transformer-based architecture and introduces three loss functions (PD, WTA, and MIX) to encourage solution diversity and prevent mode collapse. MISO is evaluated on three optimization tasks—cart-pole balancing (DDP), reacher task (MPPI), and autonomous driving trajectory optimization (iLQR)—demonstrating superior performance over warm-start, regression-based, and perturbation-based initialization methods.

Questions for Authors

I have no additional questions.

Claims and Evidence

The claim that MISO improves the performance of local optimization solvers by learning multiple diverse initial solutions is reasonable. While it lacks theoretical analysis, the motivation is clear, the methodology is well-designed, and empirical results support its effectiveness. In my view, the absence of a theoretical foundation is acceptable as the approach is highly intuitive. However, the scope of the experiments is somewhat limited. Expanding the evaluation to a broader range of optimization problems would strengthen the generalizability of the claim.

Methods and Evaluation Criteria

The proposed methods are well-designed for the problem, and the evaluation framework is generally reasonable. The experiments cover multiple local optimization solvers and tasks, demonstrating the effectiveness of MISO. However, there are still some issues:

  1. The evaluation is focused on control tasks, while the proposed approach could potentially be applied to a broader range of optimization problems, especially larger-scale optimization problems.
  2. The paper claims that MISO improves optimization efficiency, but it lacks an analysis comparing inference time and the number of iterations required for convergence, particularly when MISO-generated initial solutions are used to call the solver.

Theoretical Claims

The paper primarily focuses on an empirical approach rather than a theoretical one and does not provide formal proofs or theoretical guarantees. A more detailed analysis of the PD and WTA loss functions, particularly an explanation of why WTA performs well, would further strengthen the paper.

Experimental Design and Analysis

Overall, while the experimental design is well-structured and supports the paper’s main claims, broadening the range of optimization tasks and increasing the scale of experiments would make the evaluation more compelling.

Supplementary Material

I reviewed the supplementary material in its entirety.

Relation to Prior Literature

The idea of learning multiple diverse initial solutions is particularly valuable, as it addresses a common limitation in local optimization—sensitivity to initial conditions—and can be combined with various existing methods to enhance performance.

Missing Essential References

I did not find any missing essential references.

Other Strengths and Weaknesses

  • Importance: The paper addresses a fundamental challenge in local optimization—the dependence on high-quality initial solutions—making it relevant to many learning-to-optimize tasks.
  • Novelty: The introduction of three loss functions (PD, WTA, MIX) to encourage diversity is a simple yet creative and meaningful contribution.
  • Clarity: The paper is well-structured and clearly written, making it easy to follow the key ideas. The methodology is explained in a logical and intuitive manner, with well-presented figures and empirical results.

Other Comments or Suggestions

I have no additional comments or suggestions.

Author Response

We thank you for your constructive feedback and for recognizing the key contributions of our work. We are pleased that you found the idea of learning multiple diverse initial solutions both valuable and well-motivated and that you appreciated the design of our methodology and the clarity of our presentation.

Below, we respond to your specific concerns and clarify key points that we hope further strengthen our manuscript:

Detailed Analysis of the WTA Loss. We appreciate your suggestion. Winner-Takes-All training (Guzman et al., 2012; Lee et al., 2016) promotes the learning of multiple specialized hypotheses by updating only the best-performing prediction head per training instance, effectively partitioning the output space into local conditional means resembling a centroidal Voronoi tessellation (Rupprecht et al., 2017). This selective feedback encourages hypothesis diversity, mitigates mode averaging, and preserves multi-modal distributions, particularly when combined with auxiliary scoring functions that estimate regional probability masses (Letzelter et al., 2024; Rupprecht et al., 2017). We will include a dedicated section in the appendix in the revised manuscript.
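A generic numerical sketch of the WTA update described here (a squared-error version for illustration, not the authors' exact loss), together with a pairwise-distance (PD-style) diversity term for contrast:

```python
def wta_loss(predictions, target):
    """Winner-takes-all: only the hypothesis closest to the target
    receives the loss (and hence the gradient); the rest are untouched."""
    errors = [sum((p - t) ** 2 for p, t in zip(pred, target))
              for pred in predictions]
    winner = min(range(len(predictions)), key=errors.__getitem__)
    return winner, errors[winner]

def pairwise_distance_penalty(predictions):
    """PD-style diversity term: the negative mean pairwise distance,
    so minimizing it pushes the hypotheses apart."""
    total, pairs = 0.0, 0
    for i in range(len(predictions)):
        for j in range(i + 1, len(predictions)):
            d = sum((a - b) ** 2
                    for a, b in zip(predictions[i], predictions[j])) ** 0.5
            total += d
            pairs += 1
    return -total / pairs
```

Because only the winning head is updated per instance, heads specialize to different regions of the target distribution, which is the partitioning behavior the response describes.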

Scope and Generalizability of Experiments. We agree that demonstrating generalization beyond optimal control problems is valuable. While our current work focuses specifically on optimal control to ensure thorough experimentation, we are actively exploring broader optimization scenarios as part of future work (lines 430-436). We emphasize that our proposed MISO framework is task-agnostic by design, relying only on the structure of the underlying optimization rather than problem-specific details. Consequently, we anticipate strong potential for generalization to diverse optimization domains.

Theoretical Guarantees. Without guarantees on the generator and selection function and considering the runtime constraints, it is difficult to make solid theoretical claims; conversely, with such guarantees, the problem would be significantly simplified and perhaps trivial. Most methods in this research field are heuristic-based, aiming to improve performance rather than provide formal guarantees. Nevertheless, our approach allows for the inclusion of traditional initialization methods, such as warm-start heuristics, as part of the set of initializations (lines 208-219, which we demonstrate empirically in 1071-1099). This means our method does not risk degrading performance compared to established practices, effectively providing a safety net (lines 377-361). Our contribution demonstrates that learning multiple diverse initializations can empirically enhance optimizer performance across various tasks and optimizers.

Analysis of Optimization Efficiency. You raised a valid point regarding the analysis of optimization efficiency in terms of inference time and the number of iterations to convergence. While examining optimizer convergence could offer insights, metrics such as average additional iterations or convergence time are not particularly meaningful in our context. For instance, an optimizer might converge more quickly within the allocated budget but reach a suboptimal local minimum. In contrast, another optimizer might use the entire budget to find a better solution. Faster convergence does not necessarily equate to better performance. Additionally, there are no practical advantages to converging before the budget limit, as the optimizer would wait the remaining time until that solution is required. Furthermore, sample-based optimizers like MPPI operate with a fixed sample budget (line 660), eliminating the concept of iterations. Our main emphasis is on the quality of solutions within the given computational constraints, which is directly reflected in the cost metrics we report. We demonstrate that our method scales efficiently and consistently with the number of predicted initial solutions, K (lines 967-971), and we find that all outputs remain effective even as K increases (lines 896-903).

We have made every effort to address your concerns comprehensively and clearly in our revision. If you feel your concerns have been resolved, we would greatly appreciate it if you could reflect this positively in your evaluation. Should any remaining questions or points need further clarification, please let us know—we are more than happy to provide additional information.

Again, thank you for your constructive feedback and valuable suggestions, which have significantly contributed to strengthening our manuscript.

Review (Rating: 4)

In this work, the authors aim to improve the performance of solving sequential optimization problems, and the key is to improve the quality of initial solutions. To address this issue, the paper introduces MISO that uses a Transformer-based predictor to predict multiple diverse initial solutions conditioned on the problem instance parameters for sequential optimization problems. The output candidate solutions can then be used either to initialize a single optimizer via a selection function or to run multiple optimizers in parallel. To encourage both accuracy and diversity, MISO is trained using a combination of regression and diversity-promoting losses. Extensive experiments on optimal control tasks (including both open-loop and closed-loop evaluation) demonstrate that MISO consistently outperforms traditional warm-start methods, single-output predictors, and ensemble approaches, especially under strict runtime constraints.

Questions for Authors

See the above sections for the details. I would greatly appreciate it if the authors could address some of the concerns I raised as a reviewer outside the control/optimization community.

Claims and Evidence

The claims the author made about inference time and optimality in cost performance are both well supported in the results, although some results are delayed to the appendix.

Specifically, the optimality in cost is supported by the results in Tables 1, 2, and 6 and Figure 4, and the inference speed is supported by Appendix Figures 6, 7, and 8. I would recommend moving some inference-speed results to the main text to make the empirical evidence more comprehensive.

Methods and Evaluation Criteria

I think the methods and criteria are generally clear and make sense to me, the ablations over the loss selection in the methodology are also convincing. However, I have a few questions on them:

Methods

  1. The main method is to use a transformer-based model to predict the initial solutions given the task parameters ψ. However, it is not well motivated why a transformer is used as the network architecture. For instance, is it necessarily better than an MLP or other architectures in the experiments?
  2. The training of this transformer depends on the data collection. How can we justify the accessibility of these 'expert' data? Is the performance of the 'expert' behavior policy reported in the tables, e.g., warm-start in Tables 1 and 2?
    1. If we only learn from the same set of tasks, do not consider generalizability, and some 'expert' policy is affordable, why don't we directly use the oracle or warm-start strategy to get multiple initial solution candidates and then use the single-optimizer or multi-optimizer framework in the MISO pipeline?
    2. If we do consider generalizability and want to use learning-based initialization to achieve some sort of generalization to a family of problems with slightly different problem parameters ψ, the authors should explicitly mention this in the experiment section.
  3. The dataset contains 500,000 instances, according to Appendix A.3. Does this refer to the number of timesteps, or the number of episodes (whole trajectories)?

Criteria

  1. The authors used cost as the evaluation metric and include the exact cost definition for each task in the appendix. This main evaluation criterion is reasonable for optimal control problems.
  2. To my understanding, the cost function c is the same as the objective function J in Equation (1), but it is also a bit confusing that the authors use C for the constraint function in Appendix Equation (6). Please clarify these notations, since they are very closely related to the exact criteria used for the problem formulation and performance evaluation.
  3. It is unclear whether the uncertainty levels the authors present in Tables 1 and 2 are 95% confidence intervals or empirical variance. They only mention that they ran experiments under five random seeds in the appendix.

Theoretical Claims

The paper mainly focuses on the empirical performance of this MISO system compared to other initialization baselines.

Experimental Design and Analysis

The experiment design generally seems clear to me, and the results are reasonable. Here are some questions that I hope the authors could further address:

  1. What is the relationship between the cost function c, the constraint function C in Equation (6), and the objective function J in Equation (1)?
  2. Are the testing domains in all three tasks (the problem parameters ψ) different from the training domain where you collect expert data for transformer training? If so, how are they different?
  3. As noted in the methods section above, the authors should clarify the accessibility and potential limitations of their oracle data policy mentioned in Appendix A.3. Besides, the authors may also want to clarify how generalizable the MISO system is, e.g., to a set of problems with the same parameter structure but a different value range, or to problems with slightly different parameter structures.

Supplementary Material

I reviewed most of the supplementary materials. In fact, the authors seem to put too many details in the supplementary material, and I have raised some of them in the previous sections.

Relation to Prior Literature

I am not very familiar with the optimal control literature. I am familiar with the RL and model-based RL literature, yet the contribution of this paper is quite unique compared to conventional deep RL papers.

Missing Essential References

The references are quite adequate. The authors tackle some problems that do not seem to be very common in the sequential-optimization literature.

Other Strengths and Weaknesses

Strengths

  1. The paper is generally well-written and easy to follow.
  2. The baseline comparison is comprehensive, and it considers both open-loop and closed-loop settings, where error can accumulate in the latter. The results they demonstrate are quite convincing.

Weaknesses

  1. The paper's presentation can be polished and its structure reorganized. For example, the authors repeatedly mention runtime-limit issues, but no empirical evidence is illustrated in the main text.
  2. The paper does not provide theoretical guarantees for the MISO system. A regret bound or sample-complexity bound (e.g., the offline sample size required for the transformer to learn initializations that reach an ε-optimal solution) would be appreciated.
  3. The data collection seems to involve a lot of oracle and warm-start data; as noted in the previous sections, it would be helpful if the authors could explain the exact setting of data collection and their testing domain.
  4. It might be helpful to justify how strong the assumption of the optimization landscape is. If we can use the regression-based loss in most of the control problems in the continuous physical domain, then we can say this initialization strategy can be applicable to a considerable family of tasks. But given the limited scale of the experiments in both the number of tasks and the dimensionality of states, we cannot assert it for now.
  5. The current experiments are simplified to low-dimensional cases. As the authors have addressed in their limitations, it could be interesting to see whether such initialization strategy could be applicable to more high-dimensional setting.

Other Comments or Suggestions

  1. Some notations are a bit confusing, e.g., the control cost, objective function, and constraints: c(x), J(x), C(x).
  2. In general, this paper is well structured, but I would suggest that the authors move some of the results in the appendix to the main text to better support their claims.
Author Response

Thank you for your constructive feedback and insightful comments. We appreciate your recognition of our extensive experiments, clarity in presenting methods and results, comprehensive baseline comparisons, and our unique contributions relative to existing literature.

In the following, we clarify concerns regarding task selection and dimensionality, emphasize the relevance of strict runtime constraints and the use of an oracle-proxy, address ambiguities and supplementary material layout, justify our choice of Transformer architectures, and discuss theoretical guarantees.

Tasks, Dimensionality, and Generalizability. Our study specifically targets sequential optimization tasks under strict runtime constraints commonly encountered in optimal control domains such as robotics and autonomous driving. The selected tasks, with their respective dimensions—Cart-Pole: 10, Reacher: 20, and Driving: 80—are standard benchmarks widely used for evaluating optimization methods. Notably, the nuPlan autonomous driving task features realistic and complex urban environments with nonlinear dynamics. Nonetheless, we acknowledge the value of investigating even higher-dimensional tasks (lines 434-439) and intend to explore such scenarios in future work. Additionally, generalizability is explicitly highlighted in lines 238-243. As stated in lines 281–285, training and testing instances differ in parameters such as initial states, targets, and reference trajectories, ensuring that improvements represent generalization rather than memorization. Moreover, MISO naturally extends to variations like obstacle locations and friction coefficients.

Runtime Constraints and Oracle Data. Our manuscript strongly emphasizes tight runtime constraints as central to our problem setting (lines 11, 47, 128, 255). Given these, directly using the (slow) oracle-proxy at test time is infeasible, underscoring the importance of our fast, learning-based initialization. We recognize obtaining data as a common limitation in BC/IL methods. Our method reduces reliance on an actual oracle by using an oracle-proxy—not necessarily globally optimal, but sufficiently effective (lines 173–177). Due to space limitations, we summarize the key evidence supporting our runtime-aware design choices: in lines 806-839, we compare inference times of ensemble and multi-output architectures, demonstrating the scalability advantages of our approach. We evaluate sampling-based methods in lines 806-839, illustrating their unsuitability under strict runtime constraints.

Supplementary, Layout, and Clarifications. We appreciate your thorough reading of the supplementary materials and suggestions for balancing the content. Given the strict page constraints, we welcome recommendations on specific parts that would be more impactful in the main text. We remain willing to revise accordingly. Additionally, thank you for pointing out several ambiguities; we will revise the manuscript to clarify the following:

  • The "500K instances" refer to time steps, not trajectories.
  • Indeed, the cost, c, matches the objective J (Eq. 1) and is distinct from the constraint C (Eq. 6).
  • The uncertainty values reported represent the standard error (as mentioned in Fig. 4).
  • The oracle-proxy baseline (lines 296, 343, 738) corresponds to our expert policy.

Transformer Architecture. The core novelty of our approach is independent of a specific architecture and centers around the training objectives and the selection function. Transformers were chosen primarily due to their capability to effectively model dependencies within sequences. We conducted experiments (not included in the paper) with MLPs, which performed less robustly than Transformers. Further discussion is provided in lines 1062-1069.

Theoretical Bounds. We agree that such theoretical guarantees can yield valuable insights (Sambharya & Stellato, 2024; Sambharya et al., 2024). These approaches leverage PAC-Bayes methods or carefully designed sample-complexity arguments to provide guarantees—but typically for more specialized problem classes (e.g., convex or contractive operators). However, MISO targets a broader, nonconvex setting with runtime constraints where deriving general bounds typically requires overly restrictive assumptions. We, therefore, believe that providing fully general regret or sample-complexity bounds for MISO without imposing strong structural constraints remains an open research direction.

We have made every effort to address your concerns comprehensively and clearly in our revision. If you feel your concerns have been resolved, we would greatly appreciate it if you could reflect this positively in your evaluation. Should any remaining questions or points need further clarification, please let us know—we are more than happy to provide additional information.

Again, thank you for your constructive feedback and valuable suggestions, which have significantly contributed to strengthening our manuscript.

Review
4

The paper introduces MISO, a novel framework designed to improve the performance of local optimization algorithms by predicting multiple diverse initial solutions to warm start optimization problems. The motivation is to make these solutions diverse so they cover different regions of the solution space. The paper presents three training losses to promote diversity among the predicted solutions: a pairwise distance loss, a WTA loss, and a mixture of the two.  The framework is evaluated on three tasks (cart-pole, reacher, and autonomous driving) using different optimizers (DDP, MPPI, and iLQR). The results show consistent improvement over baseline methods, demonstrating the effectiveness of MISO in both single and multiple optimizer settings.
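To make the three training losses described above concrete, here is a minimal NumPy sketch of a winner-takes-all (WTA) loss, a pairwise-distance (PD) diversity term, and their mixture. This is an illustrative reconstruction, not the paper's exact formulation: the distance metric, normalization, and the `miso_loss`/`alpha` names and weighting are assumptions.

```python
import numpy as np

def wta_loss(preds, target):
    """Winner-takes-all: penalize only the candidate closest to the target,
    so different heads can specialize on different modes.

    preds: (K, d) array of K candidate initial solutions.
    target: (d,) oracle(-proxy) solution.
    """
    errs = np.sum((preds - target) ** 2, axis=1)
    return float(errs.min())

def pd_loss(preds):
    """Pairwise-distance diversity term: the negative mean distance over all
    candidate pairs, so minimizing it pushes the candidates apart."""
    K = preds.shape[0]
    dists = [np.linalg.norm(preds[i] - preds[j])
             for i in range(K) for j in range(i + 1, K)]
    return -float(np.mean(dists))

def miso_loss(preds, target, alpha=0.5):
    """Hypothetical mixture of accuracy and diversity (alpha is illustrative)."""
    return wta_loss(preds, target) + alpha * pd_loss(preds)
```

Note that `pd_loss` evaluates to exactly 0 when all candidates coincide and becomes more negative as they spread out, which is how a PD term rewards diversity when added to a regression objective.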

Questions for the Authors

What are the sizes of the optimization problems used in experiments in terms of numbers of constraints and variables?

Claims and Evidence

The paper provides quite a lot of experiments across the three tasks and optimizers, showing consistent improvements over baseline methods. The ablation studies and detailed analysis of mode diversity strengthen the evidence and provide deeper insight into the behavior of the framework.

The paper mainly focuses on control tasks, but optimization problems span many other domains. While the results are promising, their generalizability to optimization problems outside optimal control and to other real-world scenarios is not fully explored.

Methods and Evaluation Criteria

Overall, the method is sound and correct.

The evaluation is comprehensive. The use of benchmark datasets (cart-pole, reacher, and autonomous driving) is appropriate for evaluating the framework. The selection of optimizers (DDP, MPPI, iLQR) covers a range of optimization techniques.

However, I am interested in seeing this framework evaluated on other types of optimization problems.

Theoretical Claims

N/A

Experimental Design and Analyses

The experiments are well-designed and structured.

One concern is that the problems appear to be solvable relatively quickly, yet it is hard to find runtime measurements for each method presented in the paper.

Supplementary Material

The appendix is comprehensive and contains a great deal of information, but I did not read through it carefully.

Relation to Prior Work

The problem and method studied in this paper are important.

Essential References Not Discussed

The paper discusses a few related papers in combinatorial/discrete optimization. However, most of them are not especially relevant. One that could be relevant is the following: https://ai.meta.com/research/publications/genco-generating-diverse-designs-with-combinatorial-constraints/

Other Strengths and Weaknesses

See above

Other Comments or Suggestions

See above

Author Response

We thank you for your thoughtful and constructive feedback. We greatly appreciate your positive comments regarding the extensive experimentation, soundness of the method, and comprehensive evaluation across multiple optimizers and benchmark tasks. We are particularly pleased you found the ablation studies insightful and agree that the method addresses an important problem in the optimization community.

Below we provide responses to your specific concerns and clarify key points that we hope further strengthen our manuscript:

Sizes of Optimization Problems. Below, we outline the size and constraints of each optimization task in our experiments, as detailed in the appendix (lines 630-681):

  • Constraints: bounded control inputs (e.g., acceleration, steering angle/rate), limited rail or joint ranges, and physical/kinematic constraints (forces, torques, vehicle dynamics).
  • Dimensionality of the optimization variable: Cart-Pole: 10, Reacher: 20, Driving: 80.

Generalizability to Other Optimization Problems. We agree that demonstrating generalization beyond optimal control problems is valuable. While our current work focuses specifically on optimal control to ensure thorough experimentation, we are actively exploring broader optimization scenarios as part of future work (lines 430-436). We emphasize that our proposed MISO framework is task-agnostic by design, relying only on the structure of the underlying optimization rather than problem-specific details. Consequently, we anticipate strong potential for generalization to diverse optimization domains.

Runtime Performance. We appreciate this insightful comment and believe the paper could benefit from clearer explanations regarding runtime performance. To clarify:

  • If your concern pertains to solving a single optimization instance, you are correct that most instances typically converge rapidly (e.g., within 5 ms for the driving task). However, convergence time depends significantly on several factors, notably abrupt environmental changes between optimization steps and the quality of initialization (lines 139-144). We explicitly illustrate this in Figure 5 (left), demonstrating scenarios where certain initializations significantly impact convergence time (lines 363-374).
  • If your comment pertains to comparing the runtime of different architectures (i.e., MISO versus alternative approaches), we include two detailed runtime comparisons in the appendix:
    • Appendix A.6 (lines 806-839) contrasts the inference speed between our multi-output architecture and an ensemble-based approach, demonstrating the superior scalability of MISO.
    • Appendix A.14 (lines 1061-1069) compares our approach against sampling-based (diffusion) methods, highlighting why sampling methods are infeasible for these real-time applications.

Relevant Related Work. Thank you for suggesting this additional reference. Unlike MISO, GENCO assumes access to gradients with respect to design variables, which may not be available in settings involving black-box simulators or non-differentiable objectives. We will incorporate it in the related work section.

Thank you once again for taking the time to carefully review our paper and for your insightful feedback. We have made every effort to address your concerns and have revised the manuscript accordingly. Should you have any remaining questions or require further clarification, we would be more than happy to provide additional details.

If you feel your concerns have been resolved, we would greatly appreciate it if you could reflect this positively in your evaluation.

Final Decision

Dear authors,

thank you for submitting your work to ICML. Most of the reviewers found the main research idea behind the MISO method interesting. In your paper, you aim to train a transformer architecture that -- given the problem parameters -- can propose a diverse set of initial starting points for the subsequent optimization problem (this makes sense mostly for non-convex problems). Reviewers mostly agreed that this problem is very relevant and timely and that the experiments demonstrate the claims made in the paper, and they appreciated your rebuttals.

Despite that, there were some unresolved issues that prevented me from accepting your paper. These include:

  • experiments limited to optimal control (despite this being an important subset of problems in ML)
  • the numerical experiments in the paper focus on comparing variants of the proposed approach and do not include various SOTA baselines
  • the novelty seems partially incremental and lacks any theoretical justification. I strongly believe that a more principled analysis could strengthen your paper.

I would also appreciate more explanation in Section 4.2. How does the PD loss encourage a diverse set of initial solution candidates? If all the candidates are the same, L_PD is 0, right?
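On the Section 4.2 point, the claimed single-optimizer guarantee hinges on the default initialization being ranked alongside the predicted candidates by the selection function. Below is a minimal sketch, assuming the selection function Λ ranks by objective value J and that the default is appended to the candidate pool; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def select_initialization(candidates, default, objective):
    """Evaluate the objective J on every predicted candidate plus the
    default initialization and return the lowest-scoring one. Because
    the default is part of the pool, the selected starting point is
    never worse than the default according to J."""
    pool = list(candidates) + [default]
    scores = [objective(x) for x in pool]
    return pool[int(np.argmin(scores))]
```

Note this only bounds the initialization's score under J; it says nothing about the quality of the optimum the local solver subsequently reaches from that starting point.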