PaperHub
NeurIPS 2025 · Poster
Overall rating: 7.1/10 (5 reviewers; min 4, max 5, std 0.5)
Individual ratings: 5, 4, 4, 4, 5
Confidence: 3.6
Novelty: 2.8 · Quality: 3.0 · Clarity: 2.4 · Significance: 2.8

DGSolver: Diffusion Generalist Solver with Universal Posterior Sampling for Image Restoration

OpenReview · PDF
Submitted: 2025-05-09 · Updated: 2025-10-29
TL;DR

Fast high-order generalist diffusion solvers with universal compensation mechanisms for high-quality image restoration in a training-free manner

Abstract

Keywords
Image restoration, diffusion generalist solver, universal posterior sampling, deep learning

Reviews and Discussion

Review (Rating: 5)

The paper proposes a training-free solver for diffusion generalist models, termed DGSolver. First, it reformulates the diffusion process as ODEs and exploits the semi-linear integral structure to better approximate the solutions. This avoids the computational overhead of training a series of networks for quality refinement. Second, it integrates the estimated residual term $I_{res}$ into diffusion posterior sampling to construct the universal sampling mechanism, which guides the sampling process universally without prior knowledge of the degradation formulation. These two components enhance restoration accuracy and efficiency across a variety of degradation tasks. Overall, the proposed method demonstrates notable theoretical novelty, practical applicability, and broad adaptability.

Strengths and Weaknesses

Strengths

1. This paper presents a rigorously derived and technically sound ODE integrator with an accelerated sampling strategy for accurate sampling from diffusion models to achieve quality refinement, which can be widely adopted by and benefit other related methods.

2. The proposed UPS is novel and clever, and does not require prior knowledge or estimation of the degradation formulation. It can cope with various types of degradation without complicated designs and has high potential.

3. The whole framework is well motivated and the explanations are clear. The authors provide an effective theoretical derivation and error analysis of the proposed method.

4. Comprehensive experiments are carried out on multiple datasets, various tasks, and different domains, fully demonstrating the proposed method's superior performance over state-of-the-art methods, both quantitatively and qualitatively.

Weaknesses

1. The expressions for $\alpha(t)$, $\beta(t)$, and $\gamma(t)$ in Eq. (4) lack clarity.

2. Some of the notations in the appendix lack definitions, and some typos need to be corrected (e.g., $I_\text{res}^{\theta}$ is misspelled in Appendix F). It would be worthwhile to spend some effort on improving this.

Questions

1. How do you choose the sampling steps or the variable $t$ in the reverse process? What are the relationships between the time step $t$ and the predefined parameters $\alpha(t)$, $\beta(t)$, and $\gamma(t)$ in Eq. (4)? I think this is important for readers to understand the paper and its implementation.

2. Please provide a computational complexity analysis before and after applying the queue-based sampling strategy.

3. When dealing with compound degradation, the model restores degraded images in a specific order, which seems to be determined spontaneously by the unified model. Could the authors share some insights on achieving controllable restoration in a given order, or on handling all degradation types in one execution for compound degraded images, based on the proposed method?

Limitations

Yes, the authors have discussed the related limitations.

Final Justification

After further discussion with the authors, I maintain my acceptance recommendation.

Formatting Issues

None

Author Response

W1&Q1: Expression and selection of the noise schedule

Following the linear noise schedule, the formulations of $\bar{\alpha}_t, \bar{\beta}_t, \bar{\delta}_t$ are:

$\beta_t = \beta_{\min} + (\beta_{\max} - \beta_{\min})\frac{t}{T}$, with $\beta_{\min} = 0.0001$, $\beta_{\max} = 0.02$,

$\delta_t = \delta_{\min} + (\delta_{\max} - \delta_{\min})\frac{t}{T}$, with $\delta_{\min} = 1\times 10^{-6}$, $\delta_{\max} = 0.002$,

$\bar{\beta}_t = \sum_{i=0}^{t} \beta_i$, $\quad \bar{\alpha}_t = 1 - \sqrt{\prod_{i=0}^{t} (1 - \beta_i)}$, $\quad \bar{\delta}_t = \sum_{i=0}^{t} \delta_i$.

Notably, the total number of timesteps is $T=1000$, with $t\in\mathbb{Z}$, $0\le t < T$. Since $\beta_t$ and $\delta_t$ are non-negative and lie within $[0,1]$, the three variables $\bar{\beta}_t, \bar{\alpha}_t, \bar{\delta}_t$ are strictly monotonically increasing.
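For concreteness, a minimal NumPy sketch of the schedule above (a sketch under the stated formulas; variable names are ours, not from the released code):

```python
import numpy as np

T = 1000
t = np.arange(T)

beta_min, beta_max = 1e-4, 0.02
delta_min, delta_max = 1e-6, 0.002

beta = beta_min + (beta_max - beta_min) * t / T      # beta_t
delta = delta_min + (delta_max - delta_min) * t / T  # delta_t

beta_bar = np.cumsum(beta)                     # sum of beta_i up to t
alpha_bar = 1 - np.sqrt(np.cumprod(1 - beta))  # 1 - sqrt(prod of (1 - beta_i))
delta_bar = np.cumsum(delta)                   # sum of delta_i up to t

# All three cumulative schedules increase strictly monotonically in t.
assert np.all(np.diff(beta_bar) > 0)
assert np.all(np.diff(alpha_bar) > 0)
assert np.all(np.diff(delta_bar) > 0)
```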

W2: Typos

We sincerely appreciate your meticulous review, and we will correct the misspelled $I_{res}^{\theta}$ in the revision. We will also fix the remaining typos for readability.

Q2: Complexity of naive and queue-based sampling strategies

We use neural function evaluations (NFEs) to evaluate the computational complexity. Let $k$ denote the solver order and $n$ the number of sampling steps. For a $k$-th order solver, naive sampling from time $s$ to $t$ requires interpolating $k-1$ intermediate points within the interval, resulting in $k$ NFEs per sampling step. Consequently, the overall computational complexity for $n$ steps is $O(nk)$. In contrast, the queue-based sampling strategy precomputes the values at $k-1$ intermediate time points and caches them for reuse in subsequent steps. This reduces the total complexity to $O(k-1+n)$, offering a clear computational advantage for multi-step sampling. For quantitative results, please refer to Tab. r1 in Reviewer (9bRU-W1) and Tab. r3 in Reviewer (1zc3-W2&Q3).
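As a quick sanity check on these counts (a sketch; the numbers match the 8-step NFE entries of the Tab. r3 referenced above):

```python
def naive_nfes(n: int, k: int) -> int:
    # k-1 interpolated intermediate points plus one endpoint per step.
    return n * k

def queue_nfes(n: int, k: int) -> int:
    # k-1 warm-up evaluations are cached once, then one evaluation per step.
    return (k - 1) + n

for k in (1, 2, 3):
    print(f"k={k}: naive={naive_nfes(8, k)}, queue={queue_nfes(8, k)}")
# k=1: naive=8,  queue=8
# k=2: naive=16, queue=9
# k=3: naive=24, queue=10
```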

Q3: Insights about controllable restoration.

Our core idea is to map different degradations into a shared representation space, and then leverage high-order solvers to enhance reconstruction accuracy, the queue-based sampling to improve efficiency, and the UPS mechanism to provide effective guidance for restoration. Hence, our model handles compound degradation through a unified framework, but the restoration order is indeed implicitly learned during training rather than explicitly controlled. Below we clarify some insights and potential extensions (see the sketch after this paragraph): (i) Add task-specific prompts as controllable factors to condition the solver. For example, we can mine degradation priors $c$ either from the image itself or from foundation models, and train the model $I_{res}^{\theta}(I_t, I_{in}, t, c)$ using paired data with a restoration order. With learnable parameters, image restoration becomes controllable. (ii) Project the gradient guidance $\nabla \log p(I_{in}|I_t)$ onto orthogonal subspaces for degradation-specific restoration. For instance, we assign orthogonal bases $v_i, i=1,\dots,k$ for $k$ degradation types during training, and utilize the UPS term to project gradients as $\sum_{i=1}^{k} \theta_i P_{v_i}\nabla \log p(I_{in}|I_t)$, where $\theta_i$ is a switch activating the $i$-th degradation subspace and $P_{v_i}$ is a projection operator. This facilitates controllable and interpretable restoration. We sincerely appreciate your insightful comments, and we will conduct further research in our future work.
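A hypothetical sketch of extension (ii); the bases `v_i`, the gates `theta_i`, and the helper below are illustrative, not part of the paper:

```python
import torch

def projected_guidance(grad: torch.Tensor, bases: list, gates: list) -> torch.Tensor:
    """Project the (flattened) guidance gradient onto the activated
    degradation subspaces: sum_i theta_i * P_{v_i} grad."""
    out = torch.zeros_like(grad)
    for v, theta in zip(bases, gates):
        # P_v grad = (v v^T / v^T v) grad, i.e. the component of grad along v.
        out = out + theta * (torch.dot(grad, v) / torch.dot(v, v)) * v
    return out

# Example: activate only the "rain" subspace among two degradation types.
g = torch.randn(16)
v_rain, v_blur = torch.eye(16)[0], torch.eye(16)[1]  # orthogonal bases
guided = projected_guidance(g, [v_rain, v_blur], [1.0, 0.0])
```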

Comment

I appreciate the authors’ thoughtful reply, which has resolved my concerns. Having considered the other reviewers’ feedback, I continue to support acceptance of this paper.

Comment

We sincerely appreciate your valuable review and kind support for the acceptance of our paper. Your constructive comments have significantly contributed to improving our work.

Review (Rating: 4)

This paper presents DGSolver, a novel framework designed to balance the commonality of degradation representations with restoration quality in diffusion models. DGSolver mitigates accumulated discretization errors through tailored high-order solvers with a universal posterior sampling strategy, and further enhances sampling efficiency by introducing a queue-based accelerated sampling strategy.

Strengths and Weaknesses

Strengths:

  1. The paper provides a thorough and rigorous derivation of the proposed method, demonstrating a solid theoretical foundation.

  2. The proposed high-order solvers and universal posterior sampling strategy are intuitively illustrated in the figure, effectively clarifying their fundamental principle.

  3. Experimental results demonstrate the effectiveness of the proposed method from both qualitative and quantitative perspectives.

Weaknesses:

1. The ablation study should quantitatively compare model parameters, sampling steps, and inference time with the baseline DiffUIR [1] to better demonstrate the superiority of the proposed method.

2. Lack of quantitative comparison with the latest state-of-the-art methods proposed in 2025.

3. Table 4 shows that increasing the number of sampling steps beyond a certain point leads to a decline in PSNR/SSIM. Could the authors provide insight into this counterintuitive result?

4. Why were the experimental settings (datasets and training parameters) from the baseline DiffUIR not adopted?

5. The motivation for designing the forward process as a stochastic differential equation (SDE) is unclear. Its practical significance remains questionable, as no explicit analysis or justification is provided in the experiments.

6. The expression “total M+1 time steps” in line 146 is inconsistent with the notation presented in Figure 2.

7. In Figure 5, the positions of the blue boxes are inconsistent.

8. In Tables 2, 3, and 4, the best results should be highlighted in bold.

Questions

1. The Universal Posterior Sampling (UPS) strategy demonstrates clear performance benefits in this work. Could UPS also be applied to other existing methods?

2. Does the queue-based sampling strategy affect model performance while improving efficiency?

3. Does increasing the order of the high-order solvers influence the efficiency of inference?

Limitations

The limitations have been discussed in the supplementary materials.

Final Justification

I will retain my original recommendation.

Formatting Issues

NA

Author Response

W1: Ablation and comparison with DiffUIR

We sincerely appreciate this constructive suggestion. We perform a detailed efficiency comparison with the baseline DiffUIR, as presented in Tab. r7. Both methods have identical model parameters (7.73M), confirming that our enhancements require no additional network capacity. Our method outperforms the baseline, though the overall time consumption increases. For a fair comparison, we also apply our solver to DiffUIR's inference process, and the results show a significant improvement in restoration quality, as presented in Tab. r2 in Reviewer (9bRU-W3). This verifies that our solvers can be integrated into related methods to enhance restoration accuracy without retraining. For a detailed efficiency comparison, please refer to Tab. r1 in Reviewer (9bRU-W1).

| Metric | DiffUIR | Ours (k=1) | Ours (k=1+UPS) | Ours (k=2) | Ours (k=2+UPS) | Ours (k=3) | Ours (k=3+UPS) |
|---|---|---|---|---|---|---|---|
| Param. (M) | 7.73 | 7.73 | 7.73 | 7.73 | 7.73 | 7.73 | 7.73 |
| Steps | 3 | 8 | 8 | 8 | 8 | 8 | 8 |
| Time (s) | 0.206 | 0.529 | 1.144 | 0.598 | 1.273 | 0.657 | 1.470 |
| Average PSNR (dB) | 29.93 | 30.70 | 31.01 | 31.20 | 31.36 | 31.23 | 31.36 |
| Average SSIM | 0.907 | 0.913 | 0.918 | 0.919 | 0.920 | 0.919 | 0.920 |

Table r7: Ablation and comparison with DiffUIR

W2: Comparisons with 2025 SOTAs

For a fair comparison, we reimplemented and retrained several representative universal image restoration models, namely AwRaCLe [1], MaIR [2], and DeepSN-Net [3], on our collected dataset using their released code and official settings. The quantitative results are presented in Tab. r8. Evidently, our method outperforms the others. Thank you for your constructive comments; we will include these results in the revision.

| Method | Year | Deraining (PSNR/SSIM) | Enhancement (PSNR/SSIM) | Desnowing (PSNR/SSIM) | Dehazing (PSNR/SSIM) | Deblurring (PSNR/SSIM) | Average (PSNR/SSIM) | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|---|---|---|---|
| DA-CLIP | 2024 | 28.63/0.854 | 19.50/0.730 | 28.23/0.934 | 27.26/0.941 | 26.47/0.818 | 27.54/0.881 | 32.96 | 158.14 |
| DiffUIR | 2024 | 30.67/0.887 | 21.21/0.769 | 30.70/0.943 | 30.29/0.944 | 29.00/0.877 | 29.93/0.907 | 7.73 | 32.93 |
| AwRaCLe [1] | 2025 | 29.42/0.868 | 22.22/0.773 | 28.90/0.944 | 24.60/0.883 | 27.54/0.846 | 27.94/0.888 | 94.18 | 165.42 |
| MaIR [2] | 2025 | 29.17/0.859 | 20.09/0.726 | 28.85/0.931 | 26.79/0.936 | 26.74/0.812 | 27.91/0.880 | 20.71 | 110.44 |
| DeepSN-Net [3] | 2025 | 28.97/0.847 | 17.84/0.659 | 30.17/0.927 | 28.00/0.939 | 25.80/0.768 | 28.20/0.865 | 17.31 | 71.79 |
| Ours | – | 31.46/0.896 | 23.84/0.801 | 32.69/0.955 | 31.68/0.946 | 30.15/0.899 | 31.36/0.920 | 7.73 | 32.93 |

Table r8: Quantitative comparisons with latest methods

[1]Rajagopalan S, Patel V M. AWRaCLe: All-weather image restoration using visual in-context learning[C]. AAAI, 2025.

[2] Li B, Zhao H, Wang W, et al. MaIR: A locality- and continuity-preserving Mamba for image restoration[C]. CVPR, 2025.

[3]Deng X, Zhang C, Jiang L, et al. DeepSN-Net: Deep semi-smooth Newton driven network for blind image restoration[J]. TPAMI, 2025.

W3: Counter-intuitive phenomenon between performance and sampling steps

Thanks for your insightful observation. We agree that the non-monotonic performance trend in Tab. 4 appears unintuitive at first glance. This counter-intuitive phenomenon may stem from two main factors. (1) Our approach to image restoration is motivated by the principle of commonality, aiming to handle diverse degradation types within a unified framework. In Appendix E.2, we discuss the model's behavior under compound degradation scenarios. It can be observed that, across multiple sampling steps, the model tends to first remove the primary degradation before addressing secondary ones. Consequently, in cases where samples from the deraining or deblurring datasets also suffer from additional degradations (e.g., low-light conditions), the restored output may diverge from the available reference, which typically reflects only the removal of the primary degradation, leading to reduced evaluation metrics despite better perceptual quality. (2) The stochasticity in both the forward and reverse processes, combined with network estimation errors, can slightly influence performance within an acceptable range.

W4: Dataset settings

Our task setting is closely aligned with that of DiffUIR. However, since each method adopts different dataset configurations for image restoration, we collect and standardize all the datasets used by these methods for fairness, enabling direct and meaningful comparisons. Besides, we strictly follow their open-source code and retrain all of them except AutoDIR.

W5: SDE motivation

The motivation behind adopting forward SDEs is that SDEs offer a broader perspective for analyzing the transition of probability distributions in a continuous manner. As discussed in Appendices A and B, both the perturbed process and deterministic implicit sampling can be viewed as first-order instances of the reverse-time SDE and ODE, respectively. Therefore, defining the diffusion process from the perspective of probability distributions in Eq. (4) is fundamentally equivalent to describing it through continuous SDEs in Eq. (5). Their performance comparisons can be found in Tab. 2, where the results for $k=1$ without UPS correspond to deterministic implicit sampling, while our high-order solvers demonstrate improved performance. Besides, the SDE formulation offers higher scalability, enabling the integration of our high-order solver and UPS, which are tightly coupled with the dynamics of the SDEs.

W6,7,8: Figure and table errors

Thank you for your careful observations regarding inconsistencies in notation, figures, and formatting. We will correct the inconsistency in the description of Figure 2, as well as the misaligned highlighted regions in Figure 5. In addition, Tables 2, 3, and 4 will be revised to include clearly marked highlights for improved readability. We will thoroughly check and address all similar issues in the revision for clarity and accuracy.

Q1: Generalization to other methods

Thank you for the thoughtful question. Within the domain of image restoration, UPS can indeed be generalized to related frameworks such as DiffUIR. We apply UPS to DiffUIR, which yields performance gains, as shown in Tab. r2 in Reviewer (9bRU-W3). Broadly speaking, UPS relies on residual information from a given distribution to provide gradient guidance for the diffusion reverse process. It can be seamlessly extended to other tasks wherein both prior and target data distributions are accessible, such as image translation. Therefore, we believe UPS exhibits strong scalability and generalizability. We leave a more comprehensive exploration of UPS in other vision tasks as promising future work.

Q2: Ablating sampling strategies on performance

In Appendix B.2, we theoretically show that the cumulative error is positively correlated with the sampling interval $\Delta t$. For a $k$-th order solver, naive sampling from time $s$ to $t$ requires interpolating $k-1$ intermediate points, whereas queue-based sampling leverages precomputed points cached in a queue. As a result, the effective sampling interval in the queue-based strategy is larger than that of naive sampling, which may lead to higher theoretical error. To validate this, we conduct experiments, and the results are shown in Tab. r9. The performance of queue-based sampling is slightly lower than that of naive sampling. However, the queue-based approach significantly reduces neural function evaluations (NFEs), achieving 2–3× higher efficiency than naive sampling (please refer to Tab. r1 in Reviewer (9bRU-W1)).

| Sampling | Order | UPS | Deraining (PSNR/SSIM) | Enhancement (PSNR/SSIM) | Desnowing (PSNR/SSIM) | Dehazing (PSNR/SSIM) | Deblurring (PSNR/SSIM) | Average (PSNR/SSIM) |
|---|---|---|---|---|---|---|---|---|
| Naive | k=2 | × | 31.33/0.894 | 23.10/0.799 | 32.52/0.956 | 31.64/0.948 | 30.09/0.898 | 31.22/0.919 |
| Queue | k=2 | × | 31.31/0.894 | 23.08/0.798 | 32.51/0.955 | 31.62/0.948 | 30.07/0.897 | 31.20/0.919 |
| Naive | k=2 | ✓ | 31.47/0.897 | 23.85/0.802 | 32.70/0.955 | 31.69/0.946 | 30.17/0.899 | 31.37/0.920 |
| Queue | k=2 | ✓ | 31.46/0.896 | 23.84/0.801 | 32.69/0.955 | 31.68/0.946 | 30.15/0.899 | 31.36/0.920 |
| Naive | k=3 | × | 31.35/0.895 | 23.12/0.800 | 32.55/0.956 | 31.65/0.948 | 30.11/0.898 | 31.24/0.920 |
| Queue | k=3 | × | 31.33/0.894 | 23.13/0.799 | 32.52/0.955 | 31.71/0.948 | 30.09/0.898 | 31.23/0.919 |
| Naive | k=3 | ✓ | 31.50/0.897 | 23.87/0.803 | 32.72/0.955 | 31.70/0.946 | 30.19/0.900 | 31.39/0.920 |
| Queue | k=3 | ✓ | 31.45/0.896 | 23.84/0.801 | 32.69/0.955 | 31.67/0.946 | 30.15/0.899 | 31.36/0.920 |

Table r9: Performance for different $k$ with different strategies
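To make the caching pattern concrete, a schematic sketch of queue-based sampling; here `model` and `high_order_update` are hypothetical stand-ins for the network evaluation and the paper's $k$-th order update formula, and a real implementation would warm up with lower-order steps:

```python
from collections import deque

def queue_based_sampling(model, x, times, k):
    cache = deque(maxlen=k)            # k most recent network evaluations
    for t in times[:k - 1]:            # warm-up: k-1 evaluations, cached once
        cache.append((t, model(x, t)))
    for s, t in zip(times[k - 1:], times[k:]):
        cache.append((s, model(x, s)))               # one new NFE per step
        x = high_order_update(x, list(cache), s, t)  # hypothetical k-th order
                                                     # update reusing the cache
    return x  # total NFEs: (k - 1) warm-up calls plus one per sampling step
```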

Q3: Solver order and inference efficiency

We use neural function evaluations (NFEs) to measure inference efficiency. Let $k$ denote the solver order and $n$ the number of sampling steps. For a $k$-th order solver, naive sampling from time $s$ to $t$ requires interpolating $k-1$ intermediate points within the interval, resulting in $k$ NFEs per step. Consequently, the overall computational complexity for $n$ steps is $O(nk)$. In contrast, the queue-based sampling strategy precomputes the values at $k-1$ time points and caches them for reuse in subsequent steps, offering high efficiency by reducing the total complexity to $O(k-1+n)$. Therefore, naive and queue-based solvers have the same time complexity $O(k)$ only when $n=1$. When $n > 1$, naive solvers are increasingly less efficient than queue-based solvers, since $nk \ge k-1+n$. Under the queue-based sampling strategy, the difference in NFEs among solvers of different orders equals the difference in their orders, so their efficiency gap scales linearly. For quantitative results, please refer to Tab. r1 in Reviewer (9bRU-W1) and Tab. r3 in Reviewer (1zc3-W2&Q3).

Comment

I would like to thank the authors for their comprehensive rebuttal. I will retain my original recommendation.

Comment

We sincerely appreciate your constructive feedback and acceptance recommendation. Your insights have greatly strengthened our work. Best regards.

Review (Rating: 4)

The paper introduces DGSolver, a novel diffusion generalist solver with universal posterior sampling for image restoration. It addresses the limitations of existing diffusion models in universal image restoration, which suffer from cumulative errors in reverse inference and the trade-off between degradation representation commonality and restoration quality. The authors derive the exact ODE solution for diffusion generalist models and develop high-order solvers with a queue-based accelerated sampling strategy to enhance accuracy and efficiency. Additionally, they integrate universal posterior sampling to improve noise estimation and correct errors in inverse inference. Extensive experiments demonstrate that DGSolver outperforms state-of-the-art methods in restoration accuracy, stability, and scalability across various tasks and datasets.

Strengths and Weaknesses

Strengths:

  1. The reformulation of the generalist diffusion process using ODEs and the development of customized high-order solvers with a queue-based accelerated sampling strategy are novel contributions to the field of image restoration.
  2. The integration of universal posterior sampling provides a versatile and effective way to enhance the accuracy of noise estimation and correct errors, leading to improved restoration quality.
  3. The proposed method is validated through experiments on multiple image restoration tasks, such as deraining, low-light enhancement, etc., and shows superior performance over existing methods.

Weaknesses:

  1. The reformulation of the generalist diffusion process using ODEs and the development of customized high-order solvers have been extensively explored in image generation, and their application to image restoration is a natural extension.
  2. The paper proposes a queue-based accelerated sampling strategy. However, its effectiveness might be limited in few-step sampling scenarios, and the paper lacks ablation studies to verify its performance in such cases.
  3. While $\bar{\delta}_T$ controls degradation commonality, the paper does not systematically explore its effect on specific degradation types.
  4. Although zero-shot results on real-world datasets are presented, the evaluation lacks quantitative metrics for unpaired real data, relying solely on visual comparisons.

Questions

  1. What challenges or opportunities arise from applying the generalist diffusion process reformulated with ODEs and customized high-order solvers, which are common in image generation, to the field of image restoration?
  2. Given that $k=2$ with UPS already achieves near-optimal performance, what is the practical benefit of higher-order solvers ($k=3$) in terms of restoration quality versus computational cost? Can you provide a trade-off analysis (e.g., FLOPs vs. PSNR) for different $k$?
  3. The paper mentions the use of a queue-based accelerated sampling strategy. Could this method remain effective when using fewer sampling steps? Typically, high-order solvers are designed to accelerate sampling with fewer steps.
  4. The optimal $\bar{\delta}_T=1$ is used for complex degradations, but does this generalize to all tasks? For example, does deraining benefit from a different $\bar{\delta}_T$ than deblurring?

Limitations

Yes.

Final Justification

The authors' response during the rebuttal phase has addressed my concerns.

Formatting Issues

No.

Author Response

W1&Q1: Challenges and opportunities arising from image restoration ODE solvers

To the best of our knowledge, in the field of image generation, high-order solvers are primarily employed to accelerate the sampling process from random noise to image samples, with the only requirement being that the generated samples conform to the target distribution. There is no strict constraint enforcing a one-to-one correspondence between samples from the prior and those from the data distribution. Consequently, various solvers can be employed to approximate the data distribution without explicit constraints, and the potential of high-order terms remains largely underexplored due to the weak coupling between the prior and data. In contrast, image restoration involves a strong dependency between the prior (i.e., the degraded image) and the target data (i.e., the high-quality image). Therefore, we argue that the key challenge and opportunity for designing solvers in image restoration lies in effectively exploiting the relationship between these two distributions and embedding this relationship throughout the diffusion process. To this end, we propose DGSolver. Our key insights are: (1) We fully incorporate the residuals derived from both the prior and data distributions into the high-order solver. (2) The potential of residuals in designed solvers is further exploited through a Universal Posterior Sampling (UPS), which reinforces the connection between the prior and data distributions, thereby enhancing restoration performance, inference stability, and sample fidelity.

W2&Q3: Discussion about queue-based sampling

High-order solvers can reduce the number of sampling steps, but they do not necessarily reduce the number of neural function evaluations (NFEs) in few-step sampling scenarios. NFEs directly measure the number of neural network calls and hence the total runtime. For a $k$-th order solver, naive sampling from time $s$ to $t$ requires interpolating $k-1$ intermediate points within the interval, resulting in $k$ NFEs per step. Consequently, the overall computational complexity for $n$ steps is $O(nk)$. In contrast, the queue-based sampling strategy precomputes the values at $k-1$ time points and caches them for reuse in subsequent steps, offering high efficiency by reducing the total complexity to $O(k-1+n)$. Therefore, only when $n=1$ do naive and queue-based solvers have the same time complexity $O(k)$. When $n > 1$, naive solvers are increasingly less efficient than queue-based solvers, since $nk \ge k-1+n$.

To verify this, we report runtime comparisons in Tab. r3 with a fixed image size of 512 and UPS deactivated. Apparently, the queue-based strategy is more efficient than naive sampling. Additionally, as shown in Tab. r1 in Reviewer (9bRU-W1), enabling UPS further increases the computational cost, making the efficiency gap between the two strategies even more pronounced. Moreover, Tab. r9 in Reviewer (iqQ1-Q2) presents a performance comparison between the queue-based and naive strategies. Queue-based solvers possess a 2–3× speed advantage with restoration quality comparable to naive solvers.

In conclusion, queue-based accelerated sampling remains effective even when using fewer sampling steps.

| Solver (NFE / Time s) | Step 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| k=1 | 1/0.063 | 2/0.129 | 3/0.194 | 4/0.263 | 5/0.322 | 6/0.401 | 7/0.466 | 8/0.529 | 9/0.596 | 10/0.661 |
| Naive (k=2) | 2/0.120 | 4/0.247 | 6/0.393 | 8/0.523 | 10/0.644 | 12/0.769 | 14/0.910 | 16/0.989 | 18/1.147 | 20/1.251 |
| Queue (k=2) | 2/0.127 | 3/0.191 | 4/0.262 | 5/0.320 | 6/0.399 | 7/0.476 | 8/0.533 | 9/0.598 | 10/0.659 | 11/0.728 |
| Naive (k=3) | 3/0.201 | 6/0.401 | 9/0.578 | 12/0.769 | 15/0.977 | 18/1.169 | 21/1.322 | 24/1.446 | 27/1.585 | 30/1.814 |
| Queue (k=3) | 3/0.197 | 4/0.258 | 5/0.318 | 6/0.410 | 7/0.479 | 8/0.528 | 9/0.594 | 10/0.657 | 11/0.719 | 12/0.806 |

Table r3: Time efficiency of different strategies

W3&Q4: Discussion about $\bar{\delta}_T$

We appreciate your constructive suggestion. We respectfully clarify that performance comparisons under different values of $\bar{\delta}_T$ are presented in Tab. 3 (Page 8). As observed, the choice of $\bar{\delta}_T$ significantly influences restoration performance across different tasks, with higher values generally being more suitable. Specifically, the model achieves the best performance on the deraining, low-light enhancement, and desnowing tasks with $\bar{\delta}_T = 1.0$; for dehazing, the optimal value is $\bar{\delta}_T = 0.9$; and for deblurring, $\bar{\delta}_T = 0.75$ yields the best results. Nevertheless, considering overall performance across tasks, $\bar{\delta}_T = 1.0$ emerges as the most favorable setting within our framework.

W4: Real-world quantitative evaluation

To systematically evaluate the zero-shot capability of each method on real-world datasets (Appendix E, Tab. A1), we adopt PSNR for paired data and NIQE [1] for all data to assess perceptual quality, as reported in Tab. r4. Our method consistently achieves superior PSNR across various tasks. However, the no-reference metric (NIQE) exhibits patterns that diverge from the objective PSNR trend, where our performance is at a moderate level. This observation is consistent with prior studies [2,3], which highlight the inherent instability of NIQE: it is highly sensitive to content variations and often fails to align with human perception, particularly under complex degradations. In conclusion, our method remains highly competitive, as evidenced by both quantitative metrics and visual results (Appendix E, Fig. A6).

| Method | Deraining (PSNR/NIQE) | Enhancement (PSNR/NIQE) | Desnowing (PSNR/NIQE) | Dehazing (PSNR/NIQE) | Deblurring (PSNR/NIQE) | Average (PSNR/NIQE) |
|---|---|---|---|---|---|---|
| Restormer | 21.19/47.45 | –/13.66 | –/14.13 | 10.46/12.30 | 20.19/11.00 | 20.60/30.34 |
| AirNet | 19.57/50.49 | –/13.63 | –/14.34 | 10.21/12.20 | 13.55/14.13 | 17.67/32.54 |
| Prompt-IR | 21.38/51.34 | –/12.98 | –/14.29 | 10.46/13.06 | 24.83/12.56 | 21.98/32.66 |
| ProRes | 16.26/45.57 | –/13.12 | –/15.99 | 10.54/13.08 | 21.38/12.19 | 17.47/30.12 |
| IDR | 21.06/49.09 | –/13.09 | –/13.82 | 10.28/13.02 | 22.28/11.81 | 21.06/31.25 |
| AutoDIR | 19.60/54.68 | –/13.02 | –/14.47 | 10.41/15.38 | 25.43/12.71 | 20.89/34.48 |
| DA-CLIP | 20.93/51.90 | –/12.76 | –/14.45 | 10.40/14.11 | 22.25/12.52 | 20.96/32.99 |
| DiffUIR | 21.38/51.85 | –/12.57 | –/14.34 | 8.57/12.79 | 24.77/12.21 | 21.91/32.85 |
| Ours | 21.49/53.23 | –/12.63 | –/14.28 | 10.61/13.25 | 25.67/12.34 | 22.29/33.57 |

Table r4: Evaluation on Real-world image restoration tasks

[1] Mittal A, Soundararajan R, Bovik A C. Making a “completely blind” image quality analyzer[J]. IEEE Signal Processing Letters, 2012.

[2] Liu Y, Zhao H, Gu J, et al. Evaluating the generalization ability of super-resolution networks[J]. TPAMI, 2023.

[3] Zhang L, Zhang L, Bovik A C. A feature-enriched completely blind image quality evaluator[J]. TIP, 2015.

Q2: Discussion about solver order

The performance and computational cost of different solvers are summarized in Tab. r5. Notably, UPS increases the computational cost by ~2× due to the additional backward propagation [1]. In our configuration, the solver with $k=2$ and UPS yields the best performance and is thus used as the default setting in the paper. The case of $k=3$ demonstrates the solver's scalability, although performance reaches a saturation point within our framework. Nevertheless, when generalized to related frameworks such as DiffUIR, both $k=3$ and UPS bring additional improvements, as shown in Tab. r2 in Reviewer (9bRU-W3). In terms of efficiency, since the queue-based sampling strategy yields $n+k-1$ neural function evaluations (NFEs) (as proved in W2&Q3), the computational difference between solvers of different orders is determined solely by the order gap. Consequently, FLOPs increase approximately linearly with $k$. In conclusion, solver complexity grows linearly with order, while performance varies depending on the target framework.

| Order | UPS | Deraining (PSNR/SSIM) | Enhancement (PSNR/SSIM) | Desnowing (PSNR/SSIM) | Dehazing (PSNR/SSIM) | Deblurring (PSNR/SSIM) | Average (PSNR/SSIM) | NFEs | Forward FLOPs (G) per NFE | Backward FLOPs (G) per NFE | Total FLOPs (G) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| k=1 | × | 30.69/0.886 | 23.04/0.795 | 32.06/0.952 | 31.47/0.947 | 29.33/0.887 | 30.70/0.913 | 8 | 32.93 | – | 263.50 |
| k=2 | × | 31.31/0.894 | 23.08/0.798 | 32.51/0.955 | 31.62/0.948 | 30.07/0.897 | 31.20/0.919 | 9 | 32.93 | – | 296.44 |
| k=3 | × | 31.33/0.894 | 23.13/0.799 | 32.52/0.955 | 31.71/0.948 | 30.09/0.898 | 31.23/0.919 | 10 | 32.93 | – | 329.37 |
| k=1 | ✓ | 31.10/0.892 | 23.07/0.798 | 32.32/0.954 | 31.54/0.947 | 29.80/0.895 | 31.01/0.918 | 8 | 32.93 | ≈65.86 | ≈790.32 |
| k=2 | ✓ | 31.46/0.896 | 23.84/0.801 | 32.69/0.955 | 31.68/0.946 | 30.15/0.899 | 31.36/0.920 | 9 | 32.93 | ≈65.86 | ≈889.11 |
| k=3 | ✓ | 31.46/0.896 | 23.84/0.801 | 32.69/0.955 | 31.68/0.946 | 30.15/0.899 | 31.36/0.920 | 10 | 32.93 | ≈65.86 | ≈987.91 |

Table r5: FLOPs vs. PSNR for different $k$ with the queue-based sampling strategy (Steps = 8)

[1] Hobbhahn M, Sevilla J. What's the backward-forward FLOP ratio for neural networks?[J]. Published online at epochai.org, 2021.
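The totals in Tab. r5 can be reproduced from the NFE count alone (a sketch; per-NFE costs taken from the table, so small rounding differences remain):

```python
def total_flops(n: int, k: int, ups: bool,
                fwd: float = 32.93, bwd: float = 65.86) -> float:
    # Queue-based sampling: n + k - 1 NFEs; UPS adds a backward pass per NFE.
    nfes = n + k - 1
    return nfes * (fwd + (bwd if ups else 0.0))

print(total_flops(8, 1, False))  # 263.44 ≈ 263.50 G in Tab. r5
print(total_flops(8, 2, False))  # 296.37 ≈ 296.44 G
print(total_flops(8, 2, True))   # 889.11 G
```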

Comment

Thank you for your reply, but regarding Q1 I still have two concerns:

  1. The authors' motivation to leverage the strong dependency between the degraded prior and the target image in restoration tasks is well-founded. However, the central claims of novelty appear to overlook significant and highly relevant prior art, most notably ResShift (Yue et al., 2023).

  2. The contribution of a "high-order solver" is also weakly positioned. This is an active research area. For instance, DoSSR (Cui et al., 2024) also proposes a "domain shift" approach with custom solvers to enhance efficiency. Furthermore, dedicated high-order solvers like DPM-Solver and GENIE are well-established methods for accelerating diffusion models.

To be considered, the paper must rigorously differentiate its contributions from these foundational works (ResShift, DoSSR) and the broader field of high-order diffusion solvers. If you can address these two concerns of mine, I will consider raising the score.

Comment

We appreciate your thoughtful feedback and would like to address the two points raised regarding prior art and the novelty of our proposed "high-order solver."

Regarding prior art, particularly ResShift (Yue et al., 2023) and DoSSR (Cui et al., 2024): In image restoration, the low- and high-quality image distributions serve as critical priors, which may be inevitably incorporated into the diffusion process. Broadly speaking, these diffusion processes are governed by a probability distribution of the form:

$$p(I_t \vert I_0, I_{in}) = \mathcal{N}\!\left(w^1_t I_0 + w^2_t I_{in},\; (w^3_t)^2 \boldsymbol{I}\right), \quad \text{i.e.,}\quad I_t = w^1_t I_0 + w^2_t I_{in} + w^3_t \epsilon,\ \ \epsilon \sim \mathcal{N}(0, \boldsymbol{I}).$$

In Tab. d1, we summarize the forward processes of ResShift, DoSSR, and our DGSolver. ResShift modifies the standard diffusion process by using a Markov chain to shift the residual between LR and HR images. Building upon this, DoSSR retains the default diffusion settings (e.g., the noise schedule) akin to DDPM (Ho et al., 2020) to fully utilize the pre-trained diffusion prior, and integrates the DPM-Solver framework to obtain high-order solvers. However, the key distinctions between our work and related methods are:

(1) General diffusion framework for image restoration: In the forward process, we decouple the coefficients of each term (i.e., $w_t^i, i=1,2,3$) into independent variables, enabling the application of distinct noise schedules to each. This design enhances the flexibility and generalizability of our forward process. By employing different noise schedule settings, our framework can yield different solvers, whereas DoSSR is limited to specific schedules and can be regarded as a special case of DGSolver. Furthermore, we introduce a queue-based sampling strategy, which acts as a plug-and-play strategy to enhance solver efficiency, potentially benefiting the performance of DoSSR and other solvers.

(2) Reuse the solver component for Universal Posterior Sampling (UPS): Our approach not only integrates residual component into the forward process, but also reuses it during the reverse inference process to implement UPS. This ensures the stability and robustness of the solver, marking a key contribution that differentiates our work from previous methods.

In summary, though the diffusion components are similar to those in ResShift and DoSSR, and may be inevitably applied in future work, our SDE solver stands out by integrating the residual component into both the forward and reverse SDEs, and embedding it into the reverse process for UPS with a queue-based sampling strategy, which are generalizable and efficient. Critically, our framework subsumes these methods as special cases under different noise schedule configurations.

Regarding the high-order solver and its position in the broader research field (e.g., DPM-Solver (Lu et al., 2022), GENIE (Dockhorn et al., 2022)): We agree that the solvers explored in DPM-Solver and GENIE are key advances in the field. Both our work and these methods fundamentally improve solver accuracy using Taylor expansions at different orders. Beyond that, we incorporate further innovations:

(1) Queue-based accelerated sampling: DPM-Solver employs a naive sampling strategy, while GENIE accelerates sampling using gradient distillation, which incurs additional computational overhead. In contrast, we introduce a queue-based sampling strategy, serving as a plug-and-play option to enhance efficiency for different solvers.

(2) Utilizing the solver component for Universal Posterior Sampling (UPS): In addition to applying Taylor expansions to the solver component for more accurate approximations, the residual component of the solvers is further explored and utilized for UPS. This enables us to enhance the quality of restored images by reinforcing the coupling between the prior and target data distributions, which is particularly critical for restoration tasks.

In summary, though these solvers share a similar core principle, our method stands out by highly efficient sampling strategies and the full exploitation of solver components to enhance both efficiency and performance.

We sincerely appreciate your constructive feedback, which provides us with valuable insights and significantly enhances the quality and depth of our manuscript. These valuable insights enable us to make clearer comparisons between different methods, allowing us to better position our work within the existing literature. Besides, it helps present a comprehensive overview of the field to the readers. In the revision, we will properly cite and discuss these relevant works, and include a detailed comparative analysis in the appendix.

| Method | $w^1_t$ | $w^2_t$ | $(w^3_t)^2$ |
|---|---|---|---|
| DDPM (Ho et al., 2020) | $\alpha_t$ | 0 | $1-\alpha_t^2$ |
| ResShift | $1-\eta_t$ | $\eta_t$ | $\kappa\eta_t^2$ |
| DoSSR | $\alpha_t (1-\eta_t)$ | $\alpha_t \eta_t$ | $1-\alpha_t^2$ |
| Ours | $1-\alpha_t^*$ | $\alpha_t^* - \delta_t^*$ | $(\beta_t^*)^2$ |

Table d1: Formulation of different diffusion processes for image restoration.
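A minimal sketch of the unified forward step under Tab. d1's parameterizations (a sketch only: the schedule values $\alpha_t, \eta_t, \kappa$ are inputs supplied by each method, and DGSolver's starred schedules follow the same pattern with $w^1_t = 1-\alpha_t^*$, $w^2_t = \alpha_t^*-\delta_t^*$, $w^3_t = \beta_t^*$):

```python
import torch

def forward_sample(I0, Iin, w1, w2, w3):
    # I_t = w1 * I_0 + w2 * I_in + w3 * eps,  eps ~ N(0, I)
    return w1 * I0 + w2 * Iin + w3 * torch.randn_like(I0)

def weights(method, alpha_t, eta_t=None, kappa=None):
    # Returns (w1, w2, w3) per Tab. d1; note the table lists (w3)^2.
    if method == "DDPM":
        return alpha_t, 0.0, (1 - alpha_t**2) ** 0.5
    if method == "ResShift":
        return 1 - eta_t, eta_t, (kappa * eta_t**2) ** 0.5
    if method == "DoSSR":
        return alpha_t * (1 - eta_t), alpha_t * eta_t, (1 - alpha_t**2) ** 0.5
    raise ValueError(f"unknown method: {method}")
```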

Comment

Thank you for the authors' substantial efforts in the rebuttal. It has addressed my concerns, and I will be raising my score from 3 to 4.

Comment

Thank you very much for your insightful and constructive comments throughout the review process. Your feedback has significantly improved the quality of our manuscript, and we deeply appreciate the time and expertise you dedicated to this work.

Review (Rating: 4)

The paper proposes DGSolver, a training-free diffusion-based solver for image restoration that unifies a custom reverse-time ODE, high-order solvers, and universal posterior correction. It introduces a semi-linear forward process using a residual-based degradation model and derives tailored solvers to efficiently recover clean images.

Strengths and Weaknesses

Strengths:

  1. The use of high-order solvers tailored to the semi-linear structure of the reverse process introduces a novel and principled way to reduce the number of sampling steps in diffusion-based restoration.
  2. The solver design is clearly presented, with a queue-based implementation that enables efficient computation of higher-order derivatives without redundant evaluations.
  3. Evaluation is very thorough and shows improvement over comparison methods.
  4. Ablation studies are well-designed and support the impact of key components such as solver order and universal posterior sampling (UPS).

Weaknesses:

  1. The main ideas are valuable but somewhat difficult to follow due to dense presentation and unconventional notation. A more streamlined explanation with clearer separation of key components would improve accessibility.
  2. The forward process relies on the definition $I_{res}=I_{in}-I_0$, which implies additive degradations in aligned domains. While appropriate for denoising-like tasks, this formulation does not generalize to inverse problems such as super-resolution, MRI, or operator-unknown deblurring, where the degradation is either non-additive or defined in a different domain. Given that $I_{res}$ plays a central role in both the forward and reverse processes, this assumption limits the applicability of DGSolver to a narrow set of problems, despite its claim to universality.

Minor comments:

  • The blue zoomed-in insets in Figure 5 appear inconsistent as the regions shown for the rainy input and ground truth differ from those in the restored results.
  • Line 94, “…computationally intractability...” -> computationally intractable
  • Line 111, “when” -> When

Questions

  1. Section 3.3 appears heavily influenced by Diffusion Posterior Sampling (DPS), yet the original paper is not cited or discussed in this section. Can the authors clarify what is novel in their Universal Posterior Sampling (UPS) approach beyond the omission of an explicit measurement operator? Additionally, why is the original DPS work not acknowledged in the relevant discussion?
  2. The results in Table 4 feel a bit unintuitive to me. It’s not clear why using exactly 8 steps leads to better performance than 9 or 10. I would normally expect performance to improve as the number of sampling steps increases. Is there a natural explanation for this behavior?
  3. Can the authors comment on sample variety?

Limitations

Yes, the authors have addressed the limitations in the Appendix F.

Final Justification

The authors have satisfactorily addressed my concerns, and I find the paper to offer a meaningful and well-supported contribution that warrants acceptance.

Formatting Issues

None.

Author Response

W1: Streamlined explanation of our key components

We thank the reviewer for pointing out the issue with presentation clarity. We will add a more streamlined method description for readability in Sec. 3. The simplified content can be summarized as follows: The overview of our DGSolver is illustrated in Fig. 2. In the forward process, diffusion generalist SDEs map degradations into a shared, degradation-agnostic latent space. In the reverse process, our DGSolver integrates diffusion generalist solvers and universal posterior sampling to jointly enhance inference accuracy and stability, corresponding to the red and blue trajectories in Fig. 2. The former component reduces cumulative error by solving a semi-linear ODE, thereby guiding the solution toward the ground truth; the latter further optimizes the solutions through gradient guidance along the learned data manifold.

W2: Residual Formulation and Degradation

From a degradation modeling perspective, residual-based and kernel-based approaches emphasize additive degradation restoration and inverse problem solving, respectively, with each carrying inherent limitations. However, we respectfully clarify that our use of residual modeling is not tied to a strict pixel-wise alignment assumption, but is instead rooted in a signal decomposition perspective. From this broader view, residual decomposition offers a more general and widely applicable framework, extensively adopted across various image restoration tasks, such as super-resolution, deblurring, and denoising [1,2,3]. In contrast, kernel estimation often suffers from ill-posedness and increased complexity, making the decomposition less stable and the kernel estimation harder to solve. In our DGSolver, residual modeling is seamlessly embedded into the diffusion generalist SDEs, enabling the unified design of both high-order solvers and the UPS mechanism. As a result, UPS becomes a plug-and-play component for diffusion-based methods that adopt residual modeling, whereas kernel-based approaches still depend on explicit kernel priors or estimation. That said, we acknowledge the limited capacity of residual modeling to generalize across all inverse problems. For instance, as shown in Tab. 2, UPS yields more substantial improvements in deraining (i.e., 0.15 dB ($k=2$), 0.12 dB ($k=3$)) than in deblurring (i.e., 0.07 dB ($k=2$), 0.06 dB ($k=3$)). We believe that kernel-based modeling may perform well in specific inverse problems, but its applicability to broader restoration scenarios remains somewhat constrained, especially in compound restoration tasks.

In a broad sense, our residual-based modeling exhibits the potential to be applied beyond the realm of image restoration. It can be seamlessly extended to other tasks wherein both prior and target data distributions are accessible, such as image translation.

[1] Zhang Y, Li K, Li K, et al. Residual non-local attention networks for image restoration[J]. ICLR, 2019.

[2] Liang J, Cao J, Sun G, et al. Swinir: Image restoration using swin transformer[C]. CVPR, 2021.

[3] Cui Y, Ren W, Cao X, et al. Revitalizing convolutional network for image restoration[J]. TPAMI, 2024.

W3: Typos and figure errors

We appreciate your attention to detail regarding typos and figure inconsistencies. We will revise “computationally intractability” to “computationally intractable” in line 94 and capitalize “when” to “When” in line 111. Besides, we will ensure that all zoomed-in regions are spatially aligned across input, ground truth, and restored images in Fig. 5. All these issues will be thoroughly checked and addressed in the revision to ensure clarity and accuracy.

Q1: Discussion about UPS and DPS

We thank the reviewer for pointing this out. We sincerely apologize for overlooking the discussion and connection to DPS in Section 3.3. We fully acknowledge and appreciate that UPS is partially inspired by DPS, and in fact, we have dedicated Section 2.2 to introducing the theoretical foundations and related methods of DPS. We will explicitly include a discussion in Section 3.3. The simplified addition is as follows: “DPS [1] circumvents the intractability of posterior sampling in diffusion models via a novel approximation, which is generally applicable to noisy inverse problems. Inspired by this, we propose universal posterior sampling from the perspective of residual modeling.”

Both UPS and DPS aim to incorporate gradient guidance into the diffusion sampling process. By omitting the measurement operator, our residual modeling is generalizable across compound degradations. Beyond the omission of an explicit measurement operator, we believe UPS can be a plug-and-play component for diffusion models in the field of image restoration. Given that the prior distribution (e.g., degraded images) and the data distribution (e.g., high-quality images) are both available, we believe future variants of diffusion models will increasingly integrate residual priors into their formulations; in this context, UPS can be seamlessly coupled into their reverse processes. Moreover, UPS can be extended to other tasks wherein both prior and target data distributions are accessible, such as image translation. In contrast, kernel-based models are often limited by their formulation. In future work, we plan to explore the adaptability of UPS in other tasks.

[1] Chung H, Kim J, Mccann M T, et al. Diffusion Posterior Sampling for General Noisy Inverse Problems[C]. ICLR, 2023.
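For intuition, a minimal PyTorch-style sketch of the shared gradient-guidance idea (names and the `zeta` step size are illustrative; in UPS the explicit `forward_op` is replaced by the learned residual, so no degradation operator is required):

```python
import torch

def guidance_term(x_t, y, estimate_x0, forward_op, zeta=1.0):
    """DPS-style gradient guidance (sketch, not the paper's exact formula).

    x_t         : current noisy sample
    y           : observed degraded image I_in
    estimate_x0 : callable x_t -> clean-image estimate (from the solver)
    forward_op  : callable x0 -> expected observation; in UPS this becomes
                  x0 plus the predicted residual, removing the explicit operator
    """
    x_t = x_t.detach().requires_grad_(True)
    err = (y - forward_op(estimate_x0(x_t))).pow(2).sum()
    grad = torch.autograd.grad(err, x_t)[0]
    return -zeta * grad  # added to the solver update for the next state
```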

Q2: Performance and sampling steps

Thanks for your insightful observation. We agree that the non-monotonic performance trend in Tab. 4 appears unintuitive at first glance. This counter-intuitive phenomenon may stem from two main factors. First, the stochasticity in both the forward and reverse processes, combined with network approximation errors, can cause slight performance fluctuations across neighboring sampling steps, where the variations are within an acceptable range. Second, since we project all degradations into a shared degradation-agnostic representation and train the model on mixed degradation types, the model implicitly learns to handle compound degradations, as discussed in Appendix E.2. Consequently, when the number of sampling steps increases, the model may begin to address secondary degradations present in the image. For example, some samples in the deraining or deblurring datasets also exhibit low-light conditions. After removing the primary degradation, the model may perform low-light enhancement, whereas the available ground truth typically reflects only the removal of the primary degradation. This mismatch means that additional sampling steps do not necessarily bring the output closer to the reference, and may even result in a slight drop in evaluation metrics.

Q3: Sample variety

Thanks for raising the concern about sample variety. In the field of image restoration, the primary focus is on achieving high-fidelity and consistent restoration results, rather than promoting sample diversity. Our work is therefore motivated by fully leveraging diffusion models to recover a unique, high-quality target, as opposed to sampling from a distribution of plausible outputs. As a result, sample variety is expected to be low. However, we consider that our sample variety may arise from two sources: (i) the randomness of the initial states sampled from the Gaussian prior, and (ii) the model's capacity to handle compound degradations. For (i), we conduct a thorough evaluation across 10 random seeds, reporting the averages and variances of the metrics in Tab. r6. The results indicate strong robustness, with highly consistent outcomes across seeds. Notably, the use of UPS further stabilizes the inference process and reduces variance. For (ii), we conduct generalization experiments on compound degradations in Appendix E.2. Since the initial states are degradation-agnostic representations, more sampling steps allow the model to progressively eliminate secondary degradations, resulting in diverse yet plausible restorations.

| UPS | Metric | Deraining | Enhancement | Desnowing | Dehazing | Deblurring | Average |
|---|---|---|---|---|---|---|---|
| × | PSNR | 31.31 ± 0.00118 | 23.08 ± 0.00180 | 32.51 ± 0.00068 | 31.62 ± 0.00655 | 30.07 ± 0.000291 | 31.20 ± 0.00118 |
| × | SSIM | 0.894 ± 0.000004 | 0.798 ± 0.00008 | 0.955 ± 0.00001 | 0.948 ± 0.000012 | 0.897 ± 0.000014 | 0.919 ± 0.000001 |
| ✓ | PSNR | 31.46 ± 0.00008 | 23.84 ± 0.00053 | 32.69 ± 0.00011 | 31.68 ± 0.00187 | 30.15 ± 0.000182 | 31.36 ± 0.000125 |
| ✓ | SSIM | 0.896 ± 0.000002 | 0.801 ± 0.00002 | 0.955 ± 0.00001 | 0.946 ± 0.000002 | 0.899 ± 0.000002 | 0.920 ± 0.000001 |

Table r6: Performance of our DGSolver (k=2) with different seeds

Comment

I would like to thank the authors’ for their clear and well-structured rebuttal, which effectively addressed my concerns. After reading the other reviews as well as the clarifications and additional experiments provided, I believe the paper makes a meaningful contribution and merits acceptance. I will therefore retain my original recommendation.

Comment

We are truly grateful for your insightful review and the generous support for our paper's acceptance. Your constructive and valuable feedback has greatly enhanced the quality of our work.

Review (Rating: 5)

Overall, this is a strong paper with both theoretical depth and clear empirical improvements over previous universal methods. The authors reformulate the restoration process as a semi-linear ODE that can be solved analytically, and they derive a closed-form integral solution for the ODE. This derivation uses an inverse transformation for the nonlinear parts, assuming those parts are monotonic and invertible functions. They expand these nonlinear integrals with Taylor series and retain terms up to a chosen order $k$, obtaining a family of $k$-th order update formulas. The authors show that earlier diffusion restoration methods can be considered as the first-order case of their solver. Overall, the proofs around Proposition 1 are solid and novel in the context of diffusion restoration.

Next, authors provide a theoretical upper bound for the approximation error (Jensen's gap where the outer expectation over the posterior is replaced by inner expectations with respect to the data distribution). They show the error depends on the variance of the measurement noise and the accuracy of the model’s predictions, guaranteeing that the error is controlled (for linear inverse problems with Gaussian noise) in Theorem 1.

The authors' claims are well supported by the empirical results, showing strong SoTA performance among universal methods and on-par performance compared to task-specific methods.

Strengths and Weaknesses

Strengths

  1. The paper proposes the novel idea to combine exact ODE solution + universal posterior sampling, generalizing prior diffusion restoration methods.
  2. Theoretical derivations are clear.
  3. Empirically, they show strong SOTA results across tasks and datasets.

Weaknesses:

  1. Authors don't provide FPS/runtime throughput data on full-resolution images. Also, they don't provide data on actual inference time or memory usage.

Questions

  1. Equation (4) has a typo; $q_{0t}$ should be $q_0$?

  2. In Equation (21), is this the $l_2$ norm $\|\cdot\|^2$?

  3. Could you show a direct ablation against another method like DiffUIR, using their own solver vs. DGSolver, to isolate the solver's contribution?

Limitations

Theorem 1 provides a rigorous bound for the posterior approximation error only for linear measurement models, although many degradation operators are inherently nonlinear and hence not covered.

Final Justification

Authors have addressed my concerns and I continue to support acceptance of this work.

Formatting Issues

NA

Author Response

W1: Efficiency comparisons among universal methods

We appreciate the reviewer's concern regarding inference efficiency. For fairness, we collect and mix the datasets used by the comparison methods, with image resolutions ranging from 256 to 1024 pixels. Accordingly, we evaluate model efficiency under three representative resolution settings, as summarized in Tab. r1. Let $k$ denote the solver order and $n$ the number of sampling steps. Evidently, our method ($n = 3$) and the baseline DiffUIR remain competitive in computational cost and efficiency. When increasing $n$, memory consumption remains stable while the time cost grows proportionally. Activating UPS, which requires gradient backpropagation, introduces per-step computational overhead and additional memory usage. To alleviate these issues, we employ a queue-based sampling strategy that significantly improves efficiency by reducing the computational complexity. Specifically, we use the number of neural function evaluations (NFEs) to evaluate the complexity. For a $k$-th order solver, naive sampling from time $s$ to $t$ requires interpolating $k-1$ intermediate points within the interval, resulting in $k$ NFEs per step. Consequently, the overall computational complexity for $n$ steps is $O(nk)$. In contrast, the queue-based sampling strategy precomputes the values at $k-1$ time points and caches them for reuse in subsequent steps, offering high efficiency by reducing the total complexity to $O(k+n-1)$. Tab. r1 also demonstrates that the queue-based solver achieves approximately a 2× efficiency improvement over naive solvers when $k=2$, and around a 3× improvement when $k=3$.

| Method | 256×256 Mem.(G) | 256×256 Time(s) | 256×256 FPS | 512×512 Mem.(G) | 512×512 Time(s) | 512×512 FPS | 1024×1024 Mem.(G) | 1024×1024 Time(s) | 1024×1024 FPS |
|---|---|---|---|---|---|---|---|---|---|
| Restormer | 1.959 | 0.105 | 9.563 | 6.670 | 0.381 | 2.622 | 25.419 | 1.773 | 0.564 |
| AirNet | 1.039 | 0.194 | 5.159 | 3.480 | 0.738 | 1.355 | 11.244 | 20.499 | 0.049 |
| PromptIR | 2.544 | 0.111 | 8.981 | 7.255 | 0.399 | 2.508 | 26.005 | 1.845 | 0.542 |
| ProRes | 2.027 | 0.318 | 3.149 | 2.514 | 0.766 | 1.305 | 6.025 | 1.715 | 0.583 |
| IDR | 1.340 | 0.052 | 19.253 | 4.313 | 0.136 | 7.373 | 16.110 | 0.615 | 1.626 |
| AutoDIR | 7.023 | 6.266 | 0.160 | 11.021 | 11.986 | 0.083 | - | - | - |
| DA-CLIP | 2.119 | 2.585 | 0.387 | 6.775 | 7.937 | 0.126 | 58.548 | 60.893 | 0.016 |
| DiffUIR (n=3) | 1.563 | 0.118 | 8.450 | 3.528 | 0.206 | 4.862 | 18.060 | 0.911 | 1.098 |
| Ours-L (n=3) | 1.561 | 0.112 | 8.908 | 3.528 | 0.199 | 5.014 | 18.059 | 0.907 | 1.103 |
| **Naive (n=8), k=1, w/o UPS** | | | | | | | | | |
| Ours-T | 0.777 | 0.277 | 3.605 | 2.291 | 0.385 | 2.594 | 15.306 | 1.705 | 0.587 |
| Ours-S | 0.787 | 0.290 | 3.450 | 2.300 | 0.401 | 2.494 | 15.316 | 1.755 | 0.570 |
| Ours-B | 0.942 | 0.291 | 3.431 | 2.907 | 0.492 | 2.033 | 17.438 | 2.280 | 0.439 |
| Ours-L | 1.562 | 0.293 | 3.407 | 3.527 | 0.529 | 1.889 | 18.058 | 2.402 | 0.416 |
| **Naive (n=8), k=1, w/ UPS** | | | | | | | | | |
| Ours-T | 1.764 | 0.500 | 1.998 | 5.793 | 0.829 | 1.206 | 32.581 | 4.454 | 0.225 |
| Ours-S | 1.905 | 0.502 | 1.993 | 6.265 | 0.865 | 1.156 | 34.237 | 4.545 | 0.220 |
| Ours-B | 2.762 | 0.520 | 1.923 | 9.645 | 1.078 | 0.928 | 41.357 | 5.770 | 0.173 |
| Ours-L | 3.613 | 0.535 | 1.868 | 10.593 | 1.144 | 0.874 | 43.815 | 6.022 | 0.166 |
| **Naive (n=8), k=2, w/o UPS** | | | | | | | | | |
| Ours-T | 0.784 | 0.535 | 1.869 | 2.297 | 0.717 | 1.395 | 15.313 | 3.194 | 0.313 |
| Ours-S | 0.794 | 0.548 | 1.824 | 2.308 | 0.747 | 1.339 | 15.323 | 3.296 | 0.303 |
| Ours-B | 0.948 | 0.555 | 1.802 | 2.913 | 0.924 | 1.082 | 17.444 | 4.279 | 0.234 |
| Ours-L | 1.563 | 0.571 | 1.751 | 3.527 | 0.989 | 1.011 | 18.059 | 4.522 | 0.221 |
| **Naive (n=8), k=2, w/ UPS** | | | | | | | | | |
| Ours-T | 1.770 | 0.974 | 1.027 | 5.800 | 1.559 | 0.641 | 32.587 | 8.150 | 0.123 |
| Ours-S | 1.933 | 0.987 | 1.013 | 7.508 | 1.871 | 0.535 | 33.586 | 10.482 | 0.095 |
| Ours-B | 2.803 | 0.990 | 1.010 | 9.688 | 2.089 | 0.479 | 41.399 | 11.276 | 0.089 |
| Ours-L | 3.829 | 1.019 | 0.981 | 10.679 | 2.219 | 0.451 | 44.154 | 11.723 | 0.085 |
| **Queue (n=8), k=2, w/o UPS** | | | | | | | | | |
| Ours-T | 0.780 | 0.303 | 3.301 | 2.294 | 0.431 | 2.321 | 15.310 | 1.905 | 0.524 |
| Ours-S | 0.791 | 0.321 | 3.115 | 2.304 | 0.460 | 2.174 | 15.320 | 1.980 | 0.505 |
| Ours-B | 0.946 | 0.323 | 3.096 | 2.911 | 0.547 | 1.823 | 17.442 | 2.573 | 0.388 |
| Ours-L | 1.563 | 0.324 | 3.086 | 3.527 | 0.598 | 1.673 | 18.059 | 2.694 | 0.371 |
| **Queue (n=8), k=2, w/ UPS** | | | | | | | | | |
| Ours-T | 1.771 | 0.557 | 1.795 | 5.800 | 0.925 | 1.081 | 32.587 | 5.012 | 0.200 |
| Ours-S | 1.920 | 0.561 | 1.782 | 6.279 | 0.961 | 1.041 | 34.250 | 5.101 | 0.196 |
| Ours-B | 2.892 | 0.590 | 1.695 | 9.677 | 1.201 | 0.833 | 41.388 | 6.472 | 0.155 |
| Ours-L | 3.816 | 0.600 | 1.667 | 10.601 | 1.273 | 0.786 | 44.074 | 6.741 | 0.148 |
| **Naive (n=8), k=3, w/o UPS** | | | | | | | | | |
| Ours-T | 0.790 | 0.759 | 1.317 | 2.303 | 1.053 | 0.950 | 15.319 | 4.687 | 0.213 |
| Ours-S | 0.799 | 0.775 | 1.290 | 2.312 | 1.099 | 0.910 | 15.328 | 4.836 | 0.207 |
| Ours-B | 0.954 | 0.780 | 1.283 | 2.919 | 1.347 | 0.742 | 17.450 | 6.253 | 0.160 |
| Ours-L | 1.562 | 0.801 | 1.248 | 3.527 | 1.446 | 0.692 | 18.058 | 6.586 | 0.152 |
| **Naive (n=8), k=3, w/ UPS** | | | | | | | | | |
| Ours-T | 1.782 | 1.414 | 0.707 | 5.812 | 2.338 | 0.428 | 32.599 | 12.384 | 0.081 |
| Ours-S | 1.958 | 1.428 | 0.700 | 6.318 | 2.474 | 0.404 | 34.288 | 13.238 | 0.076 |
| Ours-B | 2.909 | 1.433 | 0.698 | 9.694 | 3.100 | 0.323 | 41.405 | 16.755 | 0.060 |
| Ours-L | 4.048 | 1.496 | 0.668 | 11.050 | 3.302 | 0.303 | 48.273 | 17.461 | 0.057 |
| **Queue (n=8), k=3, w/o UPS** | | | | | | | | | |
| Ours-T | 0.777 | 0.338 | 2.959 | 2.290 | 0.481 | 2.077 | 15.306 | 2.132 | 0.469 |
| Ours-S | 0.787 | 0.364 | 2.744 | 2.301 | 0.502 | 1.991 | 15.316 | 2.195 | 0.455 |
| Ours-B | 0.942 | 0.366 | 2.730 | 2.907 | 0.615 | 1.626 | 17.438 | 2.845 | 0.351 |
| Ours-L | 1.562 | 0.369 | 2.710 | 3.527 | 0.657 | 1.522 | 18.058 | 2.999 | 0.333 |
| **Queue (n=8), k=3, w/ UPS** | | | | | | | | | |
| Ours-T | 1.765 | 0.624 | 1.602 | 5.794 | 1.060 | 0.943 | 32.581 | 5.664 | 0.177 |
| Ours-S | 1.907 | 0.646 | 1.547 | 6.267 | 1.090 | 0.917 | 34.238 | 5.789 | 0.173 |
| Ours-B | 2.765 | 0.661 | 1.512 | 9.648 | 1.369 | 0.730 | 41.359 | 7.339 | 0.136 |
| Ours-L | 3.613 | 0.695 | 1.440 | 10.593 | 1.470 | 0.680 | 43.814 | 7.656 | 0.131 |

Table r1: Efficiency comparisons among universal methods. '-' means out of memory.

Q1&Q2: Definitions of $q_{0t}$ and the norm $\|\cdot\|$

We define $q_0$ as the data distribution of $I_0$. In Eq. (4), $q_{0t} := q(I_t \vert I_0, I_{res}, I_{in})$ denotes, for simplicity, the conditional probability distribution of $I_t$ conditioned on $I_0, I_{res}, I_{in}$. In Eq. (21), $\|\cdot\|$ is the $l_2$-norm. We appreciate your comments and promise to refine the notation to avoid ambiguity in the revision.

Q3: Isolated contribution of our solver

We appreciate your insightful suggestion. Our solver is built upon the prediction of residuals and noise, and is thus applicable to any diffusion-based models that adopt similar formulations. To validate its generality, we adapt our solver to DiffUIR. As shown in Tab. r2, DiffUIR consistently benefits from our solver across various configurations, leading to notable performance improvements.

| Order | UPS | Deraining (PSNR/SSIM) | Enhancement (PSNR/SSIM) | Desnowing (PSNR/SSIM) | Dehazing (PSNR/SSIM) | Deblurring (PSNR/SSIM) | Average (PSNR/SSIM) |
|---|---|---|---|---|---|---|---|
| k=1 | × | 30.67/0.887 | 21.21/0.769 | 30.70/0.943 | 30.29/0.944 | 29.00/0.877 | 29.93/0.907 |
| k=2 | × | 30.76/0.888 | 21.28/0.771 | 30.82/0.946 | 30.58/0.949 | 29.04/0.879 | 30.05/0.910 |
| k=3 | × | 30.81/0.890 | 21.30/0.772 | 30.85/0.948 | 30.61/0.951 | 29.05/0.880 | 30.08/0.911 |
| k=1 | ✓ | 30.72/0.887 | 21.24/0.770 | 30.81/0.945 | 30.43/0.946 | 29.02/0.878 | 30.01/0.908 |
| k=2 | ✓ | 31.17/0.899 | 21.52/0.786 | 31.01/0.951 | 30.76/0.954 | 29.34/0.885 | 30.32/0.917 |
| k=3 | ✓ | 31.21/0.901 | 21.54/0.787 | 31.08/0.953 | 30.80/0.955 | 29.35/0.886 | 30.37/0.918 |

Table r2: Adapting our solvers to DiffUIR (n=3)

Comment

I appreciate authors' efforts in addressing the comments. This is a strong work and I continue to support its acceptance.

Comment

Thank you for your supportive feedback and acceptance recommendation. We truly appreciate your time and valuable insights. Best wishes.

Final Decision

The paper improves diffusion-based image restoration by proposing a training-free solver for diffusion models, treating them as ODEs. The authors demonstrate enhanced restoration quality and efficiency across a wide range of degradations and benchmarks, compared to several existing baselines.

Five expert reviewers refereed the work and found the approach sound and well-motivated, and the empirical evidence generally convincing. On the other hand, the main concerns regarded ablation studies, some similar prior work, the experimental setup, clarity, and computational complexity.

The authors provided a thorough rebuttal which addressed all the main concerns and led to all reviewers eventually suggesting acceptance.

The AC does not see any major flaw in the paper and, on the other hand, finds the contribution novel, with significant empirical evidence. Therefore, the AC suggests acceptance.