PaperHub
Average rating: 6.3 / 10 · Poster · 3 reviewers (min 3, max 4, std 0.5) · individual ratings: 3, 3, 4
ICML 2025

Differentiable Solver Search for Fast Diffusion Sampling

OpenReview · PDF
Submitted: 2025-01-12 · Updated: 2025-07-24
TL;DR

Differentiable Solver Search for Fast Diffusion Sampling

Abstract

Diffusion models have demonstrated remarkable generation quality but at the cost of numerous function evaluations. Recently, advanced ODE-based solvers have been developed to mitigate the substantial computational demands of reverse-diffusion solving under limited sampling steps. However, these solvers, heavily inspired by Adams-like multistep methods, rely solely on t-related Lagrange interpolation. We show that t-related Lagrange interpolation is suboptimal for diffusion models and reveal a compact search space comprised of time steps and solver coefficients. Building on our analysis, we propose a novel differentiable solver search algorithm to identify a more optimal solver. Equipped with the searched solver, rectified-flow models, e.g., SiT-XL/2 and FlowDCN-XL/2, achieve FID scores of 2.40 and 2.35, respectively, on ImageNet-$256\times256$ with only 10 steps. Meanwhile, the DDPM model DiT-XL/2 reaches a FID score of 2.33 with only 10 steps. Notably, our searched solver outperforms traditional solvers by a significant margin. Moreover, our searched solver demonstrates generality across various model architectures, resolutions, and model sizes.
Keywords

solver, diffusion sampling

Reviews and Discussion

Official Review (Rating: 3)

This paper proposes a novel solver search algorithm for fast sampling of diffusion models, which optimizes both timesteps and solver coefficients. The key idea is to treat the solver design as a learning problem, optimizing solver parameters to minimize the numerical error and improve image quality. Experiments on rectified-flow models (SiT-XL/2, FlowDCN-XL/2) and DDPM (DiT-XL/2) demonstrate that the searched solvers can achieve improved FID with 5-10 steps. The learned time steps and coefficients can generalize to different model architectures and resolutions empirically.

Update after rebuttal

  • Most initial concerns have been addressed or clarified during the rebuttal. I'm supportive of acceptance as it's effective and well-supported by empirical evidence.
  • That said, I would not advocate for a higher rating due to the limited broader impact and significance of the contribution.

Questions to the Authors

  • How does the proposed method compare to the prior work [1] which optimizes the time steps?
  • What is the total computation cost to train the time steps and coefficients? How does it compare to [1]? Could you elaborate on the computational cost and efficiency of the solver search process in more detail, particularly in relation to the performance gains achieved?
  • Are there any techniques or optimization strategies that could be explored to reduce the computational burden of the search process?
  • Since the discretization schemes of the reference trajectory (L steps) and learned trajectory (N steps) are different, how do you compute the MSE loss between these two trajectories?

[1]: Xue, Shuchen, et al. "Accelerating diffusion sampling with optimized time steps." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.

Claims and Evidence

Yes, most claims are well-supported.

Methods and Evaluation Criteria

Yes, the approach and evaluation criteria make sense.

Theoretical Claims

Yes.

Experimental Design and Analysis

Yes, the experimental designs make sense.

Supplementary Material

Yes, I reviewed Appendices A–I.

Relation to Prior Literature

This paper is most related to [1] where only time steps are learned. This paper extends the search space to both time steps and coefficients and employs a different optimization objective and strategy.

[1]: Xue, Shuchen, et al. "Accelerating diffusion sampling with optimized time steps." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.

Missing Important References

A few prior works ([1,2]) have also explored the idea of accelerating the solver via learning. They should also be cited and discussed in the paper.

[1]: Watson, Daniel, et al. "Learning fast samplers for diffusion models by differentiating through sample quality." International Conference on Learning Representations. 2021.

[2]: Dockhorn, Tim, Arash Vahdat, and Karsten Kreis. "Genie: Higher-order denoising diffusion solvers." Advances in Neural Information Processing Systems 35 (2022): 30150-30166.

Other Strengths and Weaknesses

Strengths

  • The proposed method consistently achieves improved FID scores in the few-step regime compared to previous methods like DPM-Solver++ and UniPC.
  • The authors provide theoretical justification for the approach, including error-bound analysis and supporting theorems.
  • The learned time steps and coefficients can generalize to different model architectures and resolutions empirically.

Weaknesses

  • Clarity: The paper currently contains numerous typographical errors, inconsistencies, and formatting issues, which affect readability and clarity. See details in “Other Comments or Suggestions”.
  • The proposed approach requires generating tens of thousands of reference trajectories to learn the time steps and coefficients, which could be computationally expensive both in terms of time and space.
  • The learned solver might become suboptimal for different guidance scales.

Other Comments or Suggestions

Below are examples of specific errors noted during my review. There are more in other parts of the paper. I strongly recommend that the authors thoroughly proofread and revise the manuscript to address these issues comprehensively.

  • Line 191: “with prerained model” should be “with pretrained model”
  • Line 229: “of Our solver” → “of our solver”
  • Line 230: the equation runs off the page; b_i^j is used without definition.
  • Line 260: remove comma from “{1-\sum_{j=0}^{i-1}c_i^j,}_{i=0}^{N-1}” and “{c_i^k, }”.
  • Line 314: “reconstruction error(in Appendix)” → “reconstruction error[need a space](in Appendix~[need to add reference])”
  • Line 315: “Euler-250 steps” → “250-step-Euler”?
  • Line 409: “Of Solver Parameters” → “of Solver Parameters”
  • Line 381: “Comparison with Distillation methods” → “Comparison with distillation methods”
  • The use of capitalization and periods in section headings and table captions is inconsistent, confusing, and distracting:
    • Some sections only capitalize the first word like Section 4 “Optimal search space for a solver” while the other sections capitalize all the words like Section 2 “Related Works”
    • Section 4.2: “Focus on Solver coefficients instead of the interpolation function” capitalize the first character of “focus” and “solver”, which is even more confusing.
    • Section 5: why is there an additional period?
    • Table 1: “Comparsion with Distillation methods” why is the first character of “Distillation” capitalized? Also, “Comparsion” → “Comparison”
Author Response

We would like to express our heartfelt gratitude for the valuable feedback you've provided on our manuscript. Your in-depth analysis and suggestions are of great significance to us, and we are committed to using them to enhance the quality of our work.

Q.1 Writing typos and inconsistent presentations

Thank you for pointing out the detailed writing typos and inconsistent presentations. We sincerely apologize for these inadvertent errors. We will meticulously review and revise every detail to enhance the readability of the text.

Q.2 Comparison with DM-Nonuniform[1]

DM-Nonuniform[1] primarily centers on the theoretical optimal timesteps, yet it fails to take into account the solver coefficients and model statistics. In contrast, our method conducts a statistical search for both the coefficients and timesteps concurrently. Through theoretical analysis, we have demonstrated that our method has a smaller error bound compared to those that neglect coefficients. This shows the superiority of our approach in more comprehensively handling the relevant factors in this context.
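To make the concurrent search concrete, the update rule a searched solver applies can be pictured as a weighted combination of all cached velocity predictions, with one learned coefficient row per step. The sketch below is purely illustrative: the function name, the coefficient layout, and the sum-based combination are our own assumptions, not the paper's exact parameterization.

```python
import numpy as np

def searched_solver_sample(x0, velocity_fn, timesteps, coeffs):
    """Multistep sampling with per-step learned coefficients.

    timesteps: searched time discretization of length N+1.
    coeffs: list of length N; coeffs[i] holds i+1 weights over all
            cached velocity predictions v_0..v_i (a lower-triangular
            table), replacing the fixed Adams/Lagrange weights of
            classical multistep solvers.
    """
    x = x0
    cached_v = []
    for i in range(len(timesteps) - 1):
        # evaluate and cache the model's velocity prediction at t_i
        cached_v.append(velocity_fn(x, timesteps[i]))
        dt = timesteps[i + 1] - timesteps[i]
        # learned weighted combination of every past prediction
        v_hat = sum(c * v for c, v in zip(coeffs[i], cached_v))
        x = x + dt * v_hat
    return x
```

In a real search, both `timesteps` and `coeffs` would be treated as differentiable parameters and optimized against a reference trajectory; here they are plain inputs.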

We compared the performance with DM-Nonuniform [1] in Tab. 2 and Tab. 3. We copy the results here.

| Methods \ NFEs | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- |
| DPM-Solver++ with uniform-λ-opt [1] | 12.53 | 5.44 | 3.58 | 7.54 | 5.97 | 4.12 |
| DPM-Solver++ with uniform-t-opt [1] | 12.53 | 5.44 | 3.89 | 3.81 | 3.13 | 2.79 |
| DPM-Solver++ with EDM-opt [1] | 12.53 | 5.44 | 3.95 | 3.79 | 3.30 | 3.14 |
| UniPC with uniform-λ-opt [1] | 8.66 | 4.46 | 3.57 | 3.72 | 3.40 | 3.01 |
| UniPC with uniform-t-opt [1] | 8.66 | 4.46 | 3.74 | 3.29 | 3.01 | 2.74 |
| UniPC with EDM-opt [1] | 8.66 | 4.46 | 3.78 | 3.34 | 3.14 | 3.22 |
| Searched-Solver | 7.40 | 3.94 | 2.79 | 2.51 | 2.37 | 2.33 |

For DiT-XL/2-R512

| Methods \ NFEs | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- |
| UniPC with uniform-λ-opt [1] | 11.40 | 5.95 | 4.82 | 4.68 | 6.93 | 6.01 |
| UniPC with uniform-t-opt [1] | 11.40 | 5.95 | 4.64 | 4.36 | 4.05 | 3.81 |
| Searched-Solver (searched on DiT-XL/2-R256) | 10.28 | 6.02 | 4.31 | 3.74 | 3.54 | 3.64 |

Q.3 Reduce the search burden

First, a significant amount of computational resources is wasted on constructing the target trajectory. Since this target trajectory can be reused for each step in the search for solvers, we can cache it to prevent redundant recomputation.

Furthermore, we have observed that the solver optimized on base-sized or even small-sized models exhibits a high degree of generalization when applied to XL-sized models. Thus, using a small model as a proxy is a viable and practical choice. This approach not only reduces computational overhead but also provides a more efficient way to achieve good performance across different model scales.
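The caching idea described above might look like the following minimal sketch, where a fine-grained Euler reference trajectory is integrated once per sample and memoized so later search iterations reuse it. The function name and the module-level cache are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

_ref_cache = {}

def reference_trajectory(key, x0, velocity_fn, n_steps=100):
    """Euler-integrate a dense reference trajectory once and cache it.

    The expensive model evaluations are paid only on the first call
    for a given key; subsequent search iterations hit the cache.
    """
    if key in _ref_cache:
        return _ref_cache[key]
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    xs = [np.asarray(x0, dtype=float)]
    for i in range(n_steps):
        # one fine Euler step along the model's velocity field
        xs.append(xs[-1] + (ts[i + 1] - ts[i]) * velocity_fn(xs[-1], ts[i]))
    _ref_cache[key] = (ts, np.stack(xs))
    return _ref_cache[key]
```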

Q.4 Alignment between two trajectories

Since the learned trajectory of length N has its corresponding timesteps, we select a subset of length N from the reference trajectory based on the timesteps of the learned trajectory.

Q.5 Total burden of searching

Searching one solver step with 50,000 samples using FlowDCN-B/2 requires approximately 30 minutes on 8 × H20 computation cards.

[1]: Xue, Shuchen, et al. "Accelerating diffusion sampling with optimized time steps." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.

Reviewer Comment

Thanks for your response. One follow-up question on the trajectory loss: The reference trajectory may not necessarily contain values at the learned time steps, right? Are you using interpolation to obtain values at the learned timesteps from the reference trajectory?

Author Comment

Yes, the reference trajectory may not necessarily contain values at the learned time steps. However, the reference trajectory has many more points (100 reference steps in the default setting), so for each x_s in the source trajectory, we can directly select the closest point from the reference trajectory based on the nearest timestep, which is equivalent to nearest-neighbor interpolation.
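This nearest-neighbor alignment can be sketched in a few lines; the function and argument names below are our own, and the code is only an illustration of the selection rule described above.

```python
import numpy as np

def align_reference(ref_ts, ref_xs, learned_ts):
    """For each learned timestep, pick the nearest point on the
    dense reference trajectory (nearest-neighbor interpolation)."""
    learned = np.asarray(learned_ts)
    # pairwise |t_ref - t_learned|, then argmin over the reference axis
    idx = np.abs(ref_ts[None, :] - learned[:, None]).argmin(axis=1)
    return ref_xs[idx]
```

The selected subset can then be compared against the learned trajectory with a plain MSE loss.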

Official Review (Rating: 3)

The paper aims to accelerate reverse diffusion by integrating a novel differentiable solver search algorithm for better diffusion solvers. The paper demonstrates that a data-driven approach in the post-training scenario can also enable fast sampling. Using a compact search space related to the timesteps and solver coefficients, the proposed method can find the optimal solver parameter for each diffusion model. The experiment shows the effectiveness of the proposed method on multiple models compared to the current solver-based fast sampling method.

Update after rebuttal

The extended visualization and evaluation show the improvement of the solver search. Although the proposed method cannot be directly generalized to the multi-resolution scenario, it still offers a good solution for optimal timestep determination. Thus, my recommendation for this paper is weak accept.

Questions to the Authors

  • What are the CLIP score, aesthetic score, and GenEval score for PixArt-α? It would be helpful if the method could be evaluated on these metrics on large diffusion models, such as SD3.
  • What is the overhead to derive the optimal coefficients?

Claims and Evidence

The paper claims that the error caused by the non-ideal velocity estimation model can be estimated by a function related to the timesteps and coefficients. The claim is verified in the appendix.

Methods and Evaluation Criteria

The proposed method is evaluated on text-to-image generation using multiple metrics. However, these metrics are limited; for example, CLIP score, GenEval, and aesthetic score are not included.

Theoretical Claims

I checked the correctness of Theorem 4.4.

Experimental Design and Analysis

  • The experiments only provide a quantitative comparison for DDPM/VP text-to-image models but not for rectified-flow models; a similar evaluation should also be conducted.
  • Solver-based methods are also included in the comparison with distillation methods in Table 1.
  • The comparison between the proposed method and FlowTurbo is limited; more results should be shown, such as on different models and with metrics other than FID and IS.

Supplementary Material

I reviewed Appendices A–H.

Relation to Prior Literature

The proposed method might help reveal the error of each timestep and identify the importance of each diffusion timestep.

Missing Important References

n/a

Other Strengths and Weaknesses

  • The quality comparison is limited. Only Figure 2 provides a few examples.
  • More quality results would be helpful to demonstrate the effectiveness of the proposed method across different prompts and diffusion models.
  • More comparisons should focus on FlowTurbo, since both are parameterized velocity refiners.

Other Comments or Suggestions

  • Use “×” (\times) instead of “x” in Table 1.
  • There are duplicated parts in the supplementary materials, Sections G and L.
Author Response

We sincerely appreciate your valuable feedback on our manuscript. Your insights are extremely helpful and have provided us with clear directions for improvement.

Q.1 Quality comparison

We plan to expand the quality comparison by including more models, such as SD3, PixArt-α-R512, and PixArt-α-1024. This will provide a more comprehensive evaluation of performance and quality across a wider range of relevant models, enhancing the depth and validity of our analysis.

The anonymous visualization link: https://anonymous.4open.science/r/NeuralSolver-ICML25/README.md

Q.2 More Comparison with FlowTurbo

We presented the performance comparison in Tab. 4. Additionally, we have summarized the sampling and searching complexity relative to FlowTurbo in the table below. Note that the value of n will not exceed 15 steps.

| Method | Steps | NFE | NFE-CFG | Cache Pred | Order | Search samples | Params |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Adam2 | n | n | 2n | – | 2 | – | 2/n |
| Adam4 | n | n | 2n | – | 4 | – | 4/n |
| Heun | n | 2n | 4n | – | 2 | – | 2/n |
| FlowTurbo | n | >>n | >>2n | 2 | 2 | 540,000 (real) | 2.9×10⁷ |
| Ours | n | n | 2n | n | n | 50,000 (generated) | n + 0.5·n·(n−1) |
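For reference, the Params entry of the searched solver in the table above (n searched timesteps plus a lower-triangular coefficient table) can be computed with a small helper. This is our own illustrative reading of the formula, assuming one coefficient per step is fixed by a sum-to-one constraint so only n(n−1)/2 coefficient entries are free.

```python
def searched_solver_params(n):
    """Total searched parameters for an n-step solver:
    n timesteps + 0.5 * n * (n - 1) free coefficients."""
    return n + n * (n - 1) // 2
```

With the stated cap of n ≤ 15, the search space stays tiny, at most `searched_solver_params(15)` parameters, compared with FlowTurbo's 2.9×10⁷-parameter refiner.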

Q.3 PixArt-α on GenEval

We provide results for the solver (searched on DiT-XL/2-R256) applied to PixArt-α on the GenEval benchmark.

Resolution 512 for PixArt-α on the GenEval benchmark (cfg = 1.5)

| Method | Steps | cfg | colors | counting | color_attr | two_object | single_object | position | all |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| dpm++ | 5 | 1.5 | 72.07 | 27.19 | 5.75 | 26.26 | 91.25 | 3.00 | 0.37587 |
| dpm++ | 8 | 1.5 | 77.66 | 32.19 | 6.75 | 36.36 | 94.06 | 4.50 | 0.41921 |
| unipc | 5 | 1.5 | 73.14 | 25.94 | 6.25 | 26.26 | 90.94 | 3.00 | 0.37588 |
| unipc | 8 | 1.5 | 78.72 | 32.50 | 6.50 | 40.15 | 93.75 | 5.50 | 0.42875 |
| ours | 5 | 1.5 | 72.87 | 31.56 | 6.00 | 33.08 | 91.88 | 5.00 | 0.40065 |
| ours | 8 | 1.5 | 76.86 | 33.44 | 7.00 | 40.40 | 94.06 | 5.50 | 0.42878 |

Resolution 512 for PixArt-α on the GenEval benchmark (cfg = 2.0)

| Method | Steps | cfg | colors | counting | color_attr | two_object | single_object | position | all |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| dpm++ | 5 | 2.0 | 76.60 | 30.94 | 6.50 | 33.08 | 91.25 | 4.75 | 0.40519 |
| dpm++ | 8 | 2.0 | 76.86 | 37.19 | 5.25 | 39.65 | 93.75 | 5.75 | 0.43074 |
| unipc | 5 | 2.0 | 77.66 | 31.87 | 6.50 | 34.85 | 92.19 | 5.25 | 0.41387 |
| unipc | 8 | 2.0 | 79.52 | 36.56 | 6.72 | 40.66 | 95.31 | 6.00 | 0.44134 |
| ours | 5 | 2.0 | 77.62 | 33.75 | 5.25 | 37.37 | 92.81 | 4.75 | 0.41933 |
| ours | 8 | 2.0 | 79.52 | 38.44 | 7.25 | 42.68 | 95.00 | 7.50 | 0.45064 |

Q.4 Total burden of searching

Searching one solver step with 50,000 samples using FlowDCN-B/2 requires approximately 30 minutes on 8 × H20 computation cards.

Reviewer Comment

Thanks for the authors' reply. The extended visualization and evaluation show the improvement of the solver search. I have a follow-up question regarding the method: current flow-matching models apply a timestep shift when the sampling resolution changes. Does the method still work for varied resolutions?

Author Comment

Due to the strong coupling between our coefficients and time steps, we can no longer apply a timestep shift on top of them. However, we found that directly transferring the search results still yields satisfactory performance. To pursue the best performance, one would need to conduct a search specifically tailored to the target resolution.

Official Review (Rating: 4)

This paper proposes a differentiable solver search algorithm to find an optimal ODE solver for reverse-diffusion solving of pre-trained diffusion models. The authors use gradient-based optimization to identify solver parameters that lead to improved sample quality with very few function evaluations. The approach is evaluated on both rectified flow models and DDPM/VP frameworks, showing improvements in FID scores on ImageNet benchmarks under 10 sampling steps.

Questions to the Authors

No

Claims and Evidence

The authors claim that their differentiable search method significantly reduces discretization error compared to traditional solvers. This claim is supported by extensive experiments, including comparisons to state-of-the-art methods such as DPM-Solver++ and UniPC, as well as ablation studies examining the impact of search sample size and solver parameterization. The theoretical analysis, detailed in the appendix, provides error bounds that reinforce the empirical findings.​

Methods and Evaluation Criteria

The methodology addresses limitations of t-related Lagrange interpolation in existing fast sampling solvers by reparameterizing solver coefficients and timesteps into a differentiable framework. The evaluation is comprehensive, utilizing FID and other metrics across multiple model architectures and resolutions. The choice of benchmarks, including ImageNet-256 and ImageNet-512, and the inclusion of both rectified flow and DDPM-based models, provide a strong basis for assessing the method’s generality and effectiveness.​

Theoretical Claims

The paper provides theoretical support for its solver-search method by deriving explicit bounds on discretization error. Key results include Theorem 4.4, showing that solver error depends explicitly on solver coefficients and timesteps, and Theorem 4.2, establishing the optimality of expectation-based solver coefficients over traditional Adams-like interpolation. Theorem 4.5 further argues analytically that the proposed solver achieves tighter error bounds than conventional multistep methods. These results justify the approach theoretically; I also checked the proofs of these claims, though not very carefully.

Experimental Design and Analysis

The experimental evaluation seems to be quite comprehensive, with detailed comparisons to recent solver-based methods. The ablation studies are informative, demonstrating how the performance of the searched solver varies with different numbers of search samples and parameter settings.

Supplementary Material

The supplementary material includes extended experimental results, additional metrics (sFID, IS, Precision, Recall), and detailed proofs of the theoretical claims.

Relation to Prior Literature

The authors situate their work within the context of recent advances in fast diffusion sampling and solver-based methods. The paper builds on insights from prior works on DDPM/VP solvers and rectified flow models, providing relevant comparisons to state-of-the-art techniques like DPM-Solver++ and UniPC. This discussion clarifies how the proposed approach advances the current understanding of efficient diffusion sampling.

Missing Important References

The authors reference most relevant works.

Other Strengths and Weaknesses

Strengths:

  1. The paper is well-written and well-structured.
  2. The experiments are quite comprehensive for evaluating the method.

Weakness:

  1. The improvements, while consistent, are incremental compared to existing solvers.
  2. The paper would benefit from a more detailed discussion on the computational overhead of the search process.​

Other Comments or Suggestions

No

Author Response

Thanks for your valuable feedback on our manuscript.

Q.1 Total burden of searching

Searching one solver step with 50,000 samples using FlowDCN-B/2 requires approximately 30 minutes on 8 × H20 computation cards.

Q.2 More Quality comparison

We plan to expand the quality comparison by including more models, such as SD3, PixArt-α-R512, and PixArt-α-1024. This will provide a more comprehensive evaluation of performance and quality across a wider range of relevant models, enhancing the depth and validity of our analysis.

The anonymous visualization link: https://anonymous.4open.science/r/NeuralSolver-ICML25/README.md

Final Decision

All reviewers recommend acceptance (1 accept, 2 weak accepts). After reading the paper, reviews, and discussion, I agree with the reviewers and recommend acceptance.