5.5

/10

Poster4 位审稿人

最低5最高6标准差0.5

3.5

置信度

正确性2.8

贡献度2.8

表达2.3

NeurIPS 2024

Resfusion: Denoising Diffusion Probabilistic Models for Image Restoration Based on Prior Residual Noise

Zhenning Shi,Haoshuai Zheng,Chen Xu,Changsheng Dong,Bin Pan,Xie xueshuo,Along He,Tao Li,Huazhu Fu

OpenReview PDF

提交: 2024-05-12更新: 2024-11-06

TL;DR

We propose Resfusion, a general framework that incorporates the residual term into the diffusion forward process, starting the reverse process directly from the noisy degraded images.

摘要

关键词

Diffusion based models; Image restoration; Shadow removal; Low-light enhancement; Deraining

评审与讨论

审稿意见

评分: 6置信度: 42024-07-05

The paper presents Resfusion that leverages prior residual noise to improve restoration performance. It introduces a smooth equivalence transformation for learning residual noise and demonstrates the efficacy of Resfusion through extensive experiments and ablation studies.

优点

The paper is well-written and the idea is clearly illustrated. The technique of finding T' to simplify Eq 6 is clever. The experiments are detailed and show non-trivial improvements.

缺点

Since T' is smaller than the usually used T=1000, the sampling process of the proposed method seems to be faster than previous methods. But there are no comparisons on inference speed.
What is the scheduling of \alpahs? The authors are suggested to add more experimental analysis about the error introduced by ignoring terms in Eq 6.

问题

See the weakness.

局限性

The authors have discussed the limitations clearly.

作者回复

2024-08-04

Thank you for the detailed review and thoughtful feedback. Below we address specific questions and comments.

Since T' is smaller than the usually used T=1000, the sampling process of the proposed method seems to be faster than previous methods. But there are no comparisons on inference speed.

We demonstrate the comparison of different methods' inference speed on the ISTD dataset, LOL dataset and Raindrop dataset. When testing inference speed, all images are resized to $256\times256$ , using single NVIDIA RTX A6000, with all configurations being identical to Appendix A.6:

Methods	PSNR $\uparrow$	SSIM $\uparrow$	Inference Time (s) $\downarrow$
ISTD Dataset
Shadow Diffusion [1]	32.33	0.969	0.024 $\times$ 25 = 0.600
Resfusion (ours)	31.81	0.965	0.027 $\times$ 5 = 0.135
LOL Dataset
LLFormer [2]	23.65	0.816	0.092 $\times$ 1 = 0.092
Resfusion (ours)	24.63	0.860	0.027 $\times$ 5 = 0.135
Raindrop Dataset
WeatherDiff $_{64}$ [3]	30.71	0.931	0.328 $\times$ 25 = 8.20
WeatherDiff $_{128}$ [3]	29.66	0.923	0.439 $\times$ 50 = 21.95
Resfusion (ours)	32.61	0.938	0.027 $\times$ 5 = 0.135

We will certainly add the comparison of inference speed into Appendix A.6 of the revised paper.

What is the scheduling of \alpahs?

For all experiments, we used the truncated version of the Linear Schedule from reference [4] as the noise schedule for $\beta_t$ (or $\alpha_t$ ), which we refer to as the Truncated Linear Schedule. We provide a detailed implementation of the Truncated Linear Schedule in Appendix A.7.

The authors are suggested to add more experimental analysis about the error introduced by ignoring terms in Eq 6.

The ground truth of $x_{T'}$ is formulated as Eq. (7):

${x}_{T'}= (2\sqrt{\overline\alpha_{T'}}-1) x_{0}+(1-\sqrt{\overline\alpha_{T'}})\hat{x}_{0}+\sqrt{1-\overline\alpha_{T'}}\epsilon$

And the estimated acceleration point $\hat{x}_{T'}$ is formulated as Eq. (9) or Eq. (15):

$\hat{x}_{T'}= \sqrt{\overline\alpha_{T'}}\hat{x}_{0}+\sqrt{1-\overline\alpha_{T'}}\epsilon$

Since,

$R = \hat{x}_{0} - x_{0}$

The absolute value of the error can be derived as:

$||x_{T'} - \hat{x}_{T'}||$ $=||(2\sqrt{\overline\alpha_{T'}}-1) x_{0} + (1-2\sqrt{\overline\alpha_{T'}})\hat{x}_{0}||$ $=||(1-2\sqrt{\overline\alpha_{T'}})R||$

As shown in Figure 3 (a) of the Author Rebuttal PDF, the error $||(1-2\sqrt{\overline\alpha_{T'}})R||$ exponentially decreases with the increase of T. When T is relatively small, this error is not negligible. Fortunately, we can eliminate this error through the technique of Truncated Schedule when T is small (when T is large, Truncated Schedule is actually consistent with Original Schedule).

The core idea is that the diffusion steps after the acceleration point are not involved in the actual diffusion process. We provide a detailed implementation of the Truncated Linear Schedule in Appendix A.7. We provide a detailed code implementation of the Truncated Linear Schedule in Author Rebuttal Section 3.

As shown in Figure 3 (b) of the Author Rebuttal PDF, when $T'/T=5/12$ , Truncated Schedule can effectively eliminate "residual shadows". The results of PSNR, SSIM, and LPIPS between Truncated Schedule and Original Schedule on ISTD dataset are provided:

Methods	PSNR $\uparrow$	SSIM $\uparrow$	LPIPS $\downarrow$
ISTD Dataset
Truncated Schedule	31.81	0.965	0.030
Original Schedule	29.41	0.964	0.036

We will certainly add the analysis of the error into the revised paper.

References

[1] Guo, Lanqing, et al. Shadowdiffusion: When degradation prior meets diffusion model for shadow removal. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.

[2] Wang, Tao, et al. Ultra-high-definition low-light image enhancement: A benchmark and transformer-based method. Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 37. No. 3. 2023.

[3] Özdenizci, Ozan, and Robert Legenstein. Restoring vision in adverse weather conditions with patch-based denoising diffusion models. IEEE Transactions on Pattern Analysis and Machine Intelligence 45.8 (2023): 10346-10357.

[4] Nichol, Alexander Quinn, and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. International conference on machine learning. PMLR, 2021.

审稿意见

评分: 5置信度: 42024-07-12

This paper presents a general diffusion framework for image restoration, named Resfusion. The main idea is to introduce a residual term to DDPM to directly generate clean images from degraded images. Moreover, the form of the inference process is consistent with the DDPM and allows very few sampling steps. The model is evaluated on ISTD, LOL, and Raindrop dataset. Several ablation experiments on residual terms and loss functions are also conducted. In discussion, the authors show that their method can be used for general image generation by set $\hat{x}_0$ to 0.

优点

The idea of adding a residual term to the diffusion process is interesting.
The proposed smooth equivalence transformation is promising.
The experimental results are good.

缺点

How do you get/design Eq. (3)? I wonder if the term $(1-\sqrt{\alpha_t})R$ is manually designed or derived from a specific equation.
In Eq. (6), $\beta_t$ is not defined in the context.
I can understand that Eq. (8) aims to ensure the coefficient of $x_0$ in Eq. (7) is close to 0. But how can you obtain Eq. (9) in which $\hat{x}_0$ is non-zero?
The overall derivation in Section 2.2 is unclear and confusing. Please make the connections between Equations more smooth. In Eq. (12), how do you get the variance $\Sigma_\theta$ ? And how do you obtain Eq. (13)? Please explain why and add the reference papers.
In experiments, it would be better to unify the evaluation metrics. For example, using PSNR, SSIM, LPIPS for all tasks and datasets (FID or MAE can be also added).
The presentation (especially the derivation) should be improved.

问题

The proposed method seems to be sensitive to noise. Can you train the model for the denoising tasks?

局限性

Please see the Weaknesses.

作者回复

2024-08-04

Thank you for the detailed review and thoughtful feedback. Below we address specific questions and comments.

How do you get/design Eq. (3)? I wonder if the term $(1-\sqrt{\alpha_t})R$ is manually designed or derived from a specific equation.

As shown in Figure 2 in the original paper, the resnoise-diffusion reverse process can be imagined as doing diffusion reverse process from $R+\epsilon$ to $x_{0}$ (as shown by the violet arrow).

$x_{t}$ can be represented as a weighted sum of $x_{0}$ and $R$ (in the forward process, we know the ground truth). At timestep $t$ , to maintain consistency with DDPM [1], the coefficient of $x_{0}$ is determined as $\sqrt{\overline\alpha_t}$ . Following the principles of similar triangles, the coefficient of $R$ at step $t$ is computed as $1-\sqrt{\overline{\alpha}_{t}}$ , and we can derive Eq. (5).

Through the reparameterization technique, we can derive Eq. (3) (or Eq. (4)) and Eq. (5) from each other.

In Eq. (6), $\beta_t$ is not defined in the context.

In Eq. (12), how do you get the variance $\Sigma_{\theta}$ ?

For all constant hyperparameters, our definitions are completely consistent with the DDPM [1].

We will certainly add a detailed definition of the constant hyperparameters in the revised paper.

We provide a detailed definition (which is also declared in Author Rebuttal Section 1):

$\beta_t = 1 - \alpha_t$
$\overline{\alpha}_{t} = \prod_{s=1}^{t}\alpha _s$
the $\Sigma_{\theta}$ is taken fixed as $\widetilde\beta_{t} = \frac{1-\overline{\alpha}_{t-1}}{1-\overline{\alpha}_{t}} \beta_t$

I can understand that Eq. (8) aims to ensure the coefficient of $x_{0}$ in Eq. (7) is close to 0. But how can you obtain Eq. (9) in which $\hat{x}_{0}$ is non-zero?

As defined in the first line of Section 2.1 in the original paper, $\hat{x}_{0}$ represents the input degraded image, which is obtainable during the resnoise-diffusion reverse process.

The overall derivation in Section 2.2 is unclear and confusing. Please make the connections between Equations more smooth.

We provide a detailed logical relationship for the equations in Section 2.2:

To obtain a computable starting point, we introduce the smooth equivalence transformation technique, corresponding to Eq. (7) - Eq. (9).
Because both the forward and backward processes only involve $t \le T'$ , we unify the training and inference processes, corresponding to Eq. (10) - Eq. (14).
We explained the working principle of Resfusion from the perspective of vector intersection, corresponding to Eq. (15).

We will certainly provide a more detailed logical relationship for the equations in Section 2.2 in the revised paper.

And how do you obtain Eq. (13)?

The derivation of $\mu_{\theta}$ corresponds to Eq. (19) - Eq. (23) in Appendix A.1. We provide a detailed explanation of the derivation process in Author Rebuttal Section 2.

We will certainly clarify the connection between Eq. (13) and Eq. (19) - Eq. (23) in the revised paper.

In experiments, it would be better to unify the evaluation metrics. For example, using PSNR, SSIM, LPIPS for all tasks and datasets.

We provide results in terms of PSNR, SSIM, and LPIPS for all tasks:

Methods	PSNR $\uparrow$	SSIM $\uparrow$	LPIPS $\downarrow$
ISTD Dataset
DMTN [2]	30.42	0.965	0.037
RDDM (SM-Res-N) [3]	30.91	0.962	0.031
Resfusion (ours)	31.81	0.965	0.030
LOL Dataset
Restormer [4]	22.37	0.816	0.141
LLFormer [5]	23.65	0.816	0.169
Resfusion (ours)	24.63	0.860	0.107
Raindrop Dataset
IDT [6]	31.87	0.931	0.058
WeatherDiff $_{64}$ [7]	30.71	0.931	0.060
Resfusion (ours)	32.61	0.938	0.061

The proposed method seems to be sensitive to noise. Can you train the model for the denoising tasks?

The LOL-v2-real [8] dataset includes visual degradations such as decreased visibility, intensive noise, and biased color. As shown in Figure 2 of the Author Rebuttal PDF, Compared to Histogram Equalization, Resfusion can significantly reduce noise, while also achieving a better color offset, demonstrating strong denoising capabilities. We provide results in terms of PSNR, SSIM, and LPIPS on LOL-v2-real dataset:

Methods	PSNR $\uparrow$	SSIM $\uparrow$	LPIPS $\downarrow$
LOL-v2-real Dataset
Restormer [4]	18.69	0.834	0.232
LLFormer [5]	20.06	0.792	0.211
Resfusion (ours)	22.06	0.839	0.175

We will certainly add experiments on the LOL-v2-real dataset into the revised paper.

References

[1] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.

[2] - [7] briefed due to word limit restrictions.

[8] Yang, Wenhan, et al. Sparse gradient regularized deep retinex network for robust low-light image enhancement. IEEE Transactions on Image Processing 30 (2021): 2072-2086.

2024-08-09

Thank you very much for the patient response and constructive suggestions. Below we address specific questions and comments.

the derivation in Section 2.2 is still unclear

We provide a detailed logical relationship for the equations in Section 2.2 in our rebuttal:

To obtain a computable starting point, we introduce the smooth equivalence transformation technique, corresponding to Eq. (7) - Eq. (9).
Because both the forward and backward processes only involve $t \le T'$ , we unify the training and inference processes, corresponding to Eq. (10) - Eq. (14).
We explained the working principle of Resfusion from the perspective of vector intersection, corresponding to Eq. (15).

there is no answer to how to obtain Eq. (9)

Since

$\sqrt{\overline{\alpha}_{T'}}$

is closed to 0.5 as in Eq. (8),

$2\sqrt{\overline{\alpha}_{T'}}-1$

is closed to zero in Eq (7).

Then we can derive Eq. (9) from Eq. (7), where $\hat{x}_{0}$ represents the input degraded image.

2024-08-11

Thank you for the discussion. Any other thoughts from other reviewers?

2024-08-13

Thanks. The second question is clear now. Please add these explanations (in all rebuttals) to the revised draft which would definitely improve the readability of your work. From my side, the overall paper is on the borderline so I will keep my original score.

2024-08-13

Thank you very much for the valuable feedback. We greatly appreciate your recognition of our work. We will certainly provide more detailed explanations in the revised paper.

2024-08-09

I appreciate the authors' efforts in the rebuttal. For the notations and definitions in the paper, I want to note that although you use the same notations from other papers (e.g., DDPM), you still need to define/clarify them again in your draft so that the readers can understand them correctly (otherwise it would be confusing). I believe the main idea of this paper is similar to ResShift and RDDM. However, in the rebuttal, the derivation in Section 2.2 is still unclear and there is no answer to how to obtain Eq. (9). Therefore, I choose to maintain my original score i.e. * Borderline accept*.

审稿意见

评分: 5置信度: 32024-07-13

This paper proposes to start the reverse diffusion process from the noisy degraded images for image restoration. It introduces a weighted residual noise as the prediction target and leverage a smooth equivalence transformation to find the starting noise. The experiments shows competitive performance on shalow removal, low-light enhancement and deraining, with shortened sampling steps.

优点

1, It predicts the residual noise to allow for diffusion directly from the noisy degraded images

2, It transforms the learning of the noise term into the renoise term and follows the same inference process as DDPM

3, Shortened inference steps without redesigning the noise schedule; sota results

缺点

1, ResShift also shifts toward the residual term. What are the advantages and disadvantages compared with ResShift?

2, Visualization of the five inference steps is lacking.

3, What is the results with the complete inference steps (e.g., T=1000)?

4, In discussion, what is the model size? Can a 7.7M model achieve such good results in Figure 7?

问题

See weakness.

局限性

See weakness.

作者回复

2024-08-03

Thank you for the detailed review and thoughtful feedback. Below we address specific questions and comments.

ResShift also shifts toward the residual term. What are the advantages and disadvantages compared with ResShift?

We provide the differences between Resfusion and Resshift:

Similar to RDDM [1], Resshift [2]'s forward process also adopts an accumulation strategy for the residual term and the noise term. Therefore, Resshift also requires the design of a complex noise schedule, which is formulated as equation (10) in reference [2]. Resfusion can directly use the existing noise schedule instead of redesigning the noise schedule.
The reverse process of Resshift is inconsistent with DDPM [3]. The form of Resfusion's reverse inference process is consistent with the DDPM, leading to better generalization and interpretability.
The prediction target of Resshift is $x_{0}$ , while the prediction target of Resfusion is $res\epsilon$ . Given that the essence of $res\epsilon$ is noise with an offset, and LDM models mainly predict noise, the loss function of Resfusion is extremely friendly to fine-tuning techniques such as Lora, which helps further scale up.
In terms of the forward process, Resshift only performs shifting on the residual term. Resfusion not only shifts the residual term but also degrades the ground truth $x_{0}$ , which ensures the image normalization.
Resshift diffuses in the latent space, utilizing the powerful encoding capability of models like VQ-GAN. Resfusion, on the other hand, directly diffuses in the RGB space.
Resshift only explores fixed degradations such as image super-resolution. Resfusion explores more complex degradations including shadow removal, low-light enhancement, and deraining.

We will certainly add the differences between Resfusion and Resshift into Appendix A.2 of the revised paper.

Visualization of the five inference steps is lacking.

We present the visualization results of the five sampling steps and use the pretrained model on the LOL dataset to directly infer images from the Internet (without ground truth). The visual results are stunning, as detailed in Figure 1 in the Author Rebuttal PDF.

We will certainly add the visualization in the revised paper.

What is the results with the complete inference steps (e.g., T=1000)?

We try $T'/T=100/272$ on the Raindrop dataset. All hyperparameters are consistent with the original paper. It is worth noting that increasing T will lead to an increase in training time (as the model needs to learn more scales of resnoise) and a decrease in inference speed. The results of PSNR, SSIM, and LPIPS are provided. Increasing T will result in a slight decrease in PSNR and SSIM (this may be due to the training set and test set not being completely i.i.d.), but it will yield better visual perception (LPIPS).

$T'/T$	PSNR $\uparrow$	SSIM $\uparrow$	LPIPS $\downarrow$
Resfusion (ours)
5/12	32.61	0.938	0.061
100/272	32.04	0.923	0.050

A branch out: Will reducing T in traditional diffusion-based models increase PSNR and SSIM?

We try WeatherDiff $_{64}$ [4] on the Raindrop dataset, the answer is definitely NO. The results of PSNR, SSIM, and LPIPS are provided.

$T$	PSNR $\uparrow$	SSIM $\uparrow$	LPIPS $\downarrow$
WeatherDiff $_{64}$ [4]
5	27.24	0.925	0.074
25	30.71	0.931	0.060

In discussion, what is the model size? Can a 7.7M model achieve such good results in Figure 7?

As mentioned in discussion and detailed in Appendix A.4, for image generation on the CIFAR10 ( $32\times32$ ) dataset, we utilize the same U-net structure as DDIM [5]. The parameter size of the denoising backbone is 35.72M.

References

[1] Liu, Jiawei, et al. Residual denoising diffusion models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.

[2] Yue, Zongsheng, Jianyi Wang, and Chen Change Loy. Resshift: Efficient diffusion model for image super-resolution by residual shifting. Advances in Neural Information Processing Systems, 2024.

[3] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.

[4] Özdenizci, Ozan, and Robert Legenstein. Restoring vision in adverse weather conditions with patch-based denoising diffusion models. IEEE Transactions on Pattern Analysis and Machine Intelligence 45.8 (2023): 10346-10357.

[5] Song, Jiaming, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.

2024-08-13

Thanks for your detailed reply. My concerns are addressed.

2024-08-13

Thank you very much for the recognition of our work and rebuttal. We are delighted to have addressed your concerns. We would greatly appreciate it if you could consider raising the score.

审稿意见

评分: 6置信度: 32024-07-19

This paper proposes a method that leverages generative diffusion for image restoration tasks. The authors suggest incorporating the residual term, defined as the difference between the corrupted and clean images, into the forward and reverse diffusion processes. The forward process is described by a Markov chain, where the probability of the image at each time step $t$ is conditioned on the previous step, $t - 1$ , and the residual term. This results in a forward where at each time step $t$ , the image obtained at the previous time step $t - 1$ is mixed with the residual term and an additional white Gaussian noise. Consequently, the image at each time step can be described as a weighted sum of three components: the clean image, the residual term, and an additive white Gaussian noise.

Since the residual term is the difference between the corrupted and clean images, this weighted sum can be rewritten as a weighted sum of the clean image, corrupted image, and Gaussian noise. The forward process stops when the weight of the clean image becomes approximately zero. Thus, the forward process gradually transfers a clean image into a weighted sum of the corrupted image and white Gaussian noise. Correspondingly, the reverse process starts from this weighted sum and incrementally reduces the corruption and noise. Because the corrupted image is available, the initialization for the reverse process is easily obtained by mixing the corrupted image with white Gaussian noise.

The authors derive the expression for the mean of the reverse process and train a network to predict, at each step, the difference between the corrupted noisy image obtained at that step and the clean image. The proposed algorithm evaluated thorough experiments on three reconstruction tasks: shadow removal, low light enhancement, and deraining, comparing the performance with recent competing methods. In all experiments, the authors apply five diffusion steps. The results show that the proposed algorithm generally outperforms the competitors while consuming fewer computational resources (multiplication operations times the number of diffusion steps).

优点

I believe the idea of conditioning on the residual term is original and significant. It allows deriving a reverse process that starts from the corrupted image mixed with noise rather than from pure noise, reducing the number of diffusion steps required for image restoration.

缺点

Several notations are used without being defined. For example, the definitions of $\alpha_t$ , $\beta_t$ , $\bar{\alpha}_t$ are missing. Although the paper generally follows the notations defined in [1], defining these notations in the current paper would make it easier for readers to follow the explanations.
Some mathematical derivations are not detailed enough. For example, the proof in Appendix A.1 is not sufficiently detailed. Another example is derivation of $\mu_{\theta}$ in Equation (13), which is missing. It is completely fine to rely on derivations in [1] or [2], but I believe including the relevant parts of the derivations in the appendix would greatly help readers follow the derivations.

[1] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.

[2] Calvin Luo. Understanding diffusion models: A unified perspective. arXiv preprint arXiv:2208.11970, 2022.

问题

To provide a more comprehensive background, it might be beneficial to cite [3] which introduced a general framework for leveraging generative diffusion for solving image restoration problems where the observed image is contaminated by linear degradation and additive white Gaussian noise.

[3] Bahjat Kawar, Gregory Vaksman, and Michael Elad. SNIPS: Solving noisy inverse problems stochastically. Advances in neural information processing Systems 34:21757-21769, 2021.

局限性

Yes.

作者回复

2024-08-03

Thank you for the detailed review and thoughtful feedback. Below we address specific questions and comments.

Several notations are used without being defined. For example, the definitions of $\alpha_t$ , $\beta_t$ , $\overline{\alpha}_{t}$ are missing.

For all constant hyperparameters, our definitions are completely consistent with the reference [1].

We will certainly add a detailed definition of the constant hyperparameters in the revised paper.

We provide a detailed definition (which is also declared in Author Rebuttal Section 1):

$\beta_t = 1 - \alpha_t$
$\overline{\alpha}_{t} = \prod_{s=1}^{t}\alpha _s$
the $\Sigma_{\theta}$ is taken fixed as $\widetilde\beta_{t} = \frac{1-\overline{\alpha}_{t-1}}{1-\overline{\alpha}_{t}} \beta_t$

Some mathematical derivations are not detailed enough. For example, the proof in Appendix A.1 is not sufficiently detailed.

By simply performing a change of variable ( $x_0 \rightarrow x_0-R$ , $x_t \rightarrow x_t-R$ , $x_{t-1} \rightarrow x_{t-1}-R$ ), the derivation of Eq. (18) is identical in form to (71)-(84) in the reference [2], where line 6-7 of Eq. (18) corresponds to (73).

We will certainly add a detailed derivation of Eq. (18) in the revised paper.

Another example is derivation of $\mu_{\theta}$ in Equation (13), which is missing.

The derivation of $\mu_{\theta}$ corresponds to Eq. (19) - Eq. (23) in Appendix A.1. We provide a detailed explanation of the derivation process in Author Rebuttal Section 2.

We will certainly clarify the connection between Eq. (13) and Eq. (19) - Eq. (23) in the revised paper.

To provide a more comprehensive background, it might be beneficial to cite [3] which introduced a general framework for leveraging generative diffusion for solving image restoration problems where the observed image is contaminated by linear degradation and additive white Gaussian noise.

SNIPS [3] combines annealed Langevin dynamics and Newton's method to arrive at a posterior sampling algorithm, recovering the clean images from the white Gaussian noise. As a pioneer in exploring the generative diffusion processes to solve the general linear inverse problems ( $y=Hx+z$ ), SNIPS has made remarkable contributions to image restoration. By incorporating the residual term into the diffusion forward process, Resfusion recovers the clean images directly from the noisy degraded images. Also, Resfusion explores more complex scenarios where the degradation $H$ and the noise level are unknown, such as shadow removal, low-light enhancement, and deraining, achieving competitive performance.

We will certainly introduce SNIPS [3] in the background of the revised paper.

References

[1] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.

[2] Calvin Luo. Understanding diffusion models: A unified perspective. arXiv preprint arXiv:2208.11970, 2022.

[3] Bahjat Kawar, Gregory Vaksman, and Michael Elad. SNIPS: Solving noisy inverse problems stochastically. Advances in neural information processing Systems 34:21757-21769, 2021.

2024-08-13

Thank you for the answers. After carefully considering the comments from other reviewers, the authors' feedback, and the discussion with reviewer 4Jy9, I have decided to maintain my original rating, i.e., Weak Accept.

2024-08-13

Thank you sincerely for the professional review and recognition of our work.

作者回复

2024-08-03

We sincerely thank the reviewers for their valuable and encouraging comments. Below we address specific questions and comments.

Section 1: Definition of constant hyperparameters.

For all constant hyperparameters, our definitions are completely consistent with the reference [1]. We provide a detailed definition:

$\beta_t = 1 - \alpha_t$
$\overline{\alpha}_{t} = \prod_{s=1}^{t}\alpha _s$
the $\Sigma_{\theta}$ is taken fixed as $\widetilde\beta_{t} = \frac{1-\overline{\alpha}_{t-1}}{1-\overline{\alpha}_{t}} \beta_t$

Section 2: Explanation of the derivation of $\mu_{\theta}$ .

The derivation of $\mu_{\theta}$ corresponds to Eq. (19) - Eq. (23) in Appendix A.1. We provide a detailed explanation of the derivation process:

The ground truth mean $\widetilde\mu$ for the resnoise-diffusion reverse process is derived from Eq. (18) and formalized by Eq. (19). We aim to approximate the unattainable $\widetilde\mu$ by learning a $\mu_{\theta}$ .
According to Eq. (20), we can simplify $\widetilde\mu$ in Eq. (19) to the form in Eq. (21). By simply performing a change of variables ( $x_0 \rightarrow x_0-R$ , $x_t \rightarrow x_t-R$ ), the derivation process becomes exactly identical in form to the derivation of equations (115)-(124) in the reference [2], where Eq. (20) corresponds to (115) and Eq. (19) corresponds to (116).
According to Eq. (22), we can modify Eq. (21) as Eq. (23). Since $x_t$ is obtainable, we only need to learn $res\epsilon$ , and $\mu_{\theta}$ is formulated as Eq. (13).

Section 3: Truncated noise schedule

For all experiments, we used the truncated version of the Linear Schedule from reference [3] as the noise schedule for $\beta_t$ (or $\alpha_t$ ), which we refer to as the Truncated Linear Schedule. We provide a detailed implementation of the Truncated Linear Schedule in Appendix A.7.

It is worth mentioning that in the Supplementary Material, we provide the relationship between $\sqrt{\overline{\alpha}_{t}}$ , $T$ , and $T'$ under the Truncated Linear Schedule in the ./assets/acc_T_change_table.xlsx file. We provide a detailed code implementation for the Truncated strategy (self._alpha_hat represents $\overline{\alpha}$ ):

self._sqrt_alpha_hat = torch.sqrt(self._alpha_hat)
idx = find_closest_index(self._sqrt_alpha_hat, 0.5)
if 0.5 - self._sqrt_alpha_hat[idx].item() > 0.01:
    # The value after the acceleration point is useless and can be discarded
    self._sqrt_alpha_hat = torch.cat((self._sqrt_alpha_hat[:idx], torch.tensor([0.5])))
    self._alpha_hat = torch.cat((self._alpha_hat[:idx], torch.tensor([0.5 ** 2])))
    self._alpha = torch.cat((self._alpha[:idx], torch.tensor([0.5 ** 2]) / self._alpha_hat[idx-1]))
    self._beta = 1.0 - self._alpha

Section 4: Visualization

References

[1] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.

[2] Calvin Luo. Understanding diffusion models: A unified perspective. arXiv preprint arXiv:2208.11970, 2022.

[3] Nichol, Alexander Quinn, and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. International conference on machine learning. PMLR, 2021.

2024-08-09

Dear reviewers, do the authors' responses answer your questions or address your concerns? Thanks.

2024-08-12

Dear reviewers, as we approach the final two days, please take a moment to review the author's responses and join the discussion. Thank you!

2024-08-14

Dear reviewers, the authors are eagerly awaiting your response. The author-reviewer discussion closes on Aug 13 at 11:59 pm AoE. Thanks!

2024-08-14

We sincerely appreciate Area Chair Ydph. Your responsible feedback has greatly inspired us. We would also like to express our gratitude to all the reviewers for their professional review and recognition of our work.

最终决定Accept (poster)

2024-09-25

The paper introduces Resfusion, a novel framework for image restoration using denoising diffusion probabilistic models (DDPM). The key innovation is incorporating a residual term into the diffusion forward process, enabling the reverse process to start directly from noisy degraded images, thereby reducing the number of diffusion steps. The method shows competitive performance across multiple image restoration tasks, including shadow removal, low-light enhancement, and deraining. We are glad the accept this paper because of its all positive ratings. The paper presents a solid contribution to the field of image restoration with innovative ideas. We encourage the authors to include the suggestions from the reviewers into their camera-ready version.

Resfusion: Denoising Diffusion Probabilistic Models for Image Restoration Based on Prior Residual Noise

摘要

评审与讨论

优点

缺点

问题

局限性

References

优点

缺点

问题

局限性

References

优点

缺点

问题

局限性

References

优点

缺点

问题

局限性

References

Section 1: Definition of constant hyperparameters.

Section 2: Explanation of the derivation of μθ\mu_{\theta}μθ​.

Section 3: Truncated noise schedule

Section 4: Visualization

References

Section 2: Explanation of the derivation of $\mu_{\theta}$ .