PaperHub
ICLR 2024
Average rating: 4.3 / 10 · Decision: Rejected · 3 reviewers
Ratings: 5, 3, 5 (min 3, max 5, std 0.9) · Average confidence: 3.3

ERA-Solver: Error-Robust Adams Solver for Fast Sampling of Diffusion Probabilistic Models

OpenReview · PDF
Submitted: 2023-09-22 · Updated: 2024-02-11

Abstract

Though denoising diffusion probabilistic models (DDPMs) have achieved remarkable generation results, the low sampling efficiency of DDPMs still limits further applications. Since DDPMs can be formulated as diffusion ordinary differential equations (ODEs), various fast sampling methods can be derived from solving diffusion ODEs. However, we notice that previous fast sampling methods with fixed analytical form are not able to robust with the various error patterns in the noise estimated from pretrained diffusion models. In this work, we construct an error-robust Adams solver (ERA-Solver), which utilizes the implicit Adams numerical method that consists of a predictor and a corrector. Different from the traditional predictor based on explicit Adams methods, we leverage a Lagrange interpolation function as the predictor, which is further enhanced with an error-robust strategy to adaptively select the Lagrange bases with lower errors in the estimated noise. The proposed solver can be directly applied to any pretrained diffusion models, without extra training. Experiments on Cifar10, CelebA, LSUN-Church, and ImageNet 64 $\times$ 64 (conditional) datasets demonstrate that our proposed ERA-Solver achieves 3.54, 5.06, 5.02, and 5.11 Frechet Inception Distance (FID) for image generation, with only 10 network evaluations.
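The Lagrange-interpolation predictor described in the abstract can be illustrated with a minimal sketch: build the Lagrange polynomial through a few past (timestep, noise-estimate) pairs and evaluate it at the next timestep. This is an illustration of the interpolation idea only, not the authors' implementation; the function name is invented here and scalar NumPy values stand in for noise tensors.

```python
import numpy as np

def lagrange_extrapolate(ts, eps_vals, t_next):
    """Evaluate the Lagrange interpolation polynomial built from the
    (ts[j], eps_vals[j]) pairs at t_next (an extrapolation when t_next
    lies outside the past timesteps).

    ts       : past timesteps (the selected Lagrange bases)
    eps_vals : noise estimates at those timesteps (arrays or scalars)
    """
    result = np.zeros_like(eps_vals[0], dtype=float)
    for j, (tj, ej) in enumerate(zip(ts, eps_vals)):
        # Lagrange basis polynomial l_j evaluated at t_next
        basis = 1.0
        for m, tm in enumerate(ts):
            if m != j:
                basis *= (t_next - tm) / (tj - tm)
        result = result + basis * ej
    return result
```

With three bases the polynomial has degree two, so any quadratic dependence of the noise estimate on the timestep would be extrapolated exactly.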
Keywords
diffusion models

Reviews and Discussion

Review
Rating: 5

The paper proposes a strategy to select a fixed number of estimated Gaussian noises from the buffer per timestep t_i, and then uses the selected ones to estimate the next diffusion state via the implicit Adams numerical method. The selection strategy aims to minimize the prediction error of the estimated Gaussian noises. Various experiments demonstrate the effectiveness of the new sampling method.
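The buffer-selection step summarized above might be sketched as follows, under the hypothetical assumption that each buffered noise estimate carries a scalar error score and the k lowest-error entries are kept; the paper's actual selection criterion may differ, and the function name is illustrative.

```python
import numpy as np

def select_lagrange_bases(buffer_ts, buffer_eps, error_scores, k):
    """Hypothetical error-robust selection: from the buffer of past
    (timestep, noise-estimate) pairs, keep the k entries whose
    associated error score is smallest, preserving chronological
    order. Only illustrates the selection idea, not ERA-Solver's
    exact criterion."""
    idx = np.argsort(error_scores)[:k]  # indices of the k lowest-error entries
    idx = np.sort(idx)                  # restore chronological order
    return [buffer_ts[i] for i in idx], [buffer_eps[i] for i in idx]
```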

Strengths

The paper proposes a new method for selecting a fixed number of the estimated Gaussian noises from the buffer per timestep to better compute the next diffusion state. It seems that the search procedure is performed online for each individual sampling, which is interesting.

Weaknesses

(1) The method needs to introduce a buffer to store all the historical estimated Gaussian noises up to the most recent timestep. As the timestep t_i approaches 0, the buffer size grows increasingly large, which is undesirable from a practical point of view.

(2) As the buffer grows with the number of timesteps, one can imagine that the search would take longer. Therefore, the method requires not only more memory but also more sampling time.

Questions

(1) I think Theorem 2 is not properly formulated because the term "large enough" cannot be quantified.

(2) I don't get which line in Algorithm 1 performs selection of the estimated Gaussian noises from the buffer.

(3) It is not clear from the paper how the search is performed. Is it greedy search?

Review
Rating: 3

Some fast sampling methods for DDPMs rely on the equivalence between sampling and solving an ODE on the noise process, and then leverage various ODE solvers. The paper's main observation is that there is a significant discrepancy between the theoretical noise and the noise actually estimated by the trained diffusion model, and that this error pattern is specific to each dataset. The authors hence propose a technical solution to account for this uncertainty in the ODE solver and show numerically that their approach outperforms existing approaches on several classical datasets.

Strengths

The observations and technical solutions of the authors for accelerating ODE-based fast sampling of DDPMs are novel and well executed.

Weaknesses

There are three weaknesses in this paper:

  • The motivation of the paper is not compelling. The argument is that the main drawback of DDPMs is the sampling time and that there are only two areas of research: 1) fast ODE samplers and 2) distillation or learning-based samplers. The authors need to mention latent diffusion models, which are the go-to solution for fast sampling on the generation tasks considered in the paper, i.e., classical natural-image problems. How does their approach even compare to latent diffusion models? For the authors' argument to be compelling, I suggest they apply their approach to generation tasks where latent diffusion models are hard to leverage, i.e., tasks where we need access to a good latent representation of the data. Alternatively, it would be interesting to see whether their approach can further accelerate the sampling of latent diffusion models. Otherwise, the paper is interesting as a new technical solution but will not be helpful in practice.
  • There are plenty of typos and sentences that lack meaning; with all the error-correction tools available today, this is not very pleasant to read. For instance, in the introduction, spaces between words and citations are almost systematically missing. There is even a sentence in the abstract that is hardly understandable: "are not able to robust with the various error patterns in the noise estimated ...".
  • The numerical experiments only showcase FID scores; why not report other metrics, such as improved recall and precision?

Questions

None

Review
Rating: 5

This paper proposes an error-robust Adams solver (ERA-Solver) that consists of a predictor (similar to the Adams-Bashforth method) and a corrector (similar to the Adams-Moulton method). The authors propose an error-robust selection strategy that selects earlier evaluations, rather than simply the last $k$ evaluations, for use in the predictor. The experimental results show it can achieve good sample quality at a few NFEs.
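For reference, the classical predictor-corrector pattern this summary alludes to (an explicit Adams-Bashforth predictor followed by an implicit Adams-Moulton corrector, in PECE form) can be shown on a toy ODE. This is textbook numerics used for context, not the ERA-Solver itself; the function names are illustrative.

```python
import math

def f(t, y):
    return -y  # toy ODE y' = -y with exact solution y = exp(-t)

def pece_ab2_am2(y0, h, n_steps):
    """Predictor-corrector in PECE form: two-step Adams-Bashforth
    predictor, trapezoidal Adams-Moulton corrector. A single Euler
    step bootstraps the multistep history."""
    t, y = 0.0, y0
    f_prev = f(t, y)
    y = y + h * f_prev          # Euler bootstrap for the second point
    t += h
    for _ in range(n_steps - 1):
        f_curr = f(t, y)
        y_pred = y + h * (3 * f_curr - f_prev) / 2  # AB2: predict
        f_pred = f(t + h, y_pred)                   # evaluate at prediction
        y = y + h * (f_curr + f_pred) / 2           # AM2: correct
        f_prev = f_curr
        t += h
    return y
```

With h = 0.01 over 100 steps the result tracks exp(-1) to second-order accuracy, which is the behavior the implicit corrector buys over a purely explicit method.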

Strengths

  • The writing of this paper is clear and easy to follow.

  • The authors take the score estimation error into consideration in the sampling process, which is novel. The authors conduct some simple experiments to verify that as $t \rightarrow 0$, the error in terms of $\epsilon$ becomes larger.

Weaknesses

  • Some experimental comparisons are unfair. For example, in Table 5 of this paper, on the LSUN-bedroom dataset, DPM-Solver++ [2] achieves 6.04 FID in 20 NFEs. However, in the DPM-Solver [1] paper, DPM-Solver achieves 3.09 ($\epsilon = 10^{-3}$) / 2.60 ($\epsilon = 10^{-4}$) FID in 20 NFEs. This paper and [1] both use the pretrained diffusion models provided in [3]. I guess the reason is that in this paper the authors use the 'time linear space' schedule for timesteps, while [1] uses the 'logSNR linear space' schedule. I suggest the authors compare these methods in a better setting.

[1] DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps, Lu et al.

[2] DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models, Lu et al.

[3] Diffusion Models Beat GANs on Image Synthesis, Dhariwal et al.
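The two timestep schedules contrasted in the weakness above can be sketched for a VP diffusion with a linear beta(t). The beta_0/beta_1 values and the bisection-based inversion are illustrative assumptions made here for the sketch, not details taken from either paper.

```python
import numpy as np

def time_linear_steps(t_start, t_end, n):
    """'time linear' schedule: n timesteps uniformly spaced in t."""
    return np.linspace(t_start, t_end, n)

def logsnr_linear_steps(t_start, t_end, n, beta_0=0.1, beta_1=20.0):
    """'logSNR linear' schedule for a VP diffusion with linear beta(t):
    pick points uniformly spaced in lambda(t) = log(alpha_t / sigma_t),
    then map them back to t. beta_0/beta_1 are common VP defaults,
    used here for illustration."""
    def log_snr(t):
        # log alpha_t^2 = -0.5 * t^2 * (beta_1 - beta_0) - t * beta_0
        log_alpha2 = -0.5 * t**2 * (beta_1 - beta_0) - t * beta_0
        alpha2 = np.exp(log_alpha2)
        return 0.5 * (log_alpha2 - np.log(1.0 - alpha2))

    lams = np.linspace(log_snr(t_start), log_snr(t_end), n)
    # invert lambda(t) by bisection (lambda is decreasing in t)
    ts = []
    for lam in lams:
        lo, hi = min(t_start, t_end), max(t_start, t_end)
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if log_snr(mid) > lam:
                lo = mid
            else:
                hi = mid
        ts.append(0.5 * (lo + hi))
    return np.array(ts)
```

The logSNR-linear schedule concentrates steps where the signal-to-noise ratio changes fastest, which is one plausible reason the same solver can report very different FID under the two schedules.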

Questions

See weaknesses

AC Meta-Review

This article makes an interesting observation that existing fast sampling methods for diffusion models are not robust with respect to various error patterns in the noise. A new integrator is proposed to address this issue. However, the reviewers unanimously raised several concerns regarding both the presentation and the sufficiency of the empirical demonstration. Therefore, I cannot recommend acceptance, but encourage the authors to consider providing further evidence in a future submission.

Why not a higher score

I agree with reviewers' assessment.

Why not a lower score

I agree with reviewers' assessment.

Final Decision

Reject