PaperHub
Overall rating: 4.2/10 (Rejected; 5 reviewers)
Individual ratings: 6, 1, 6, 5, 3 (min 1, max 6, std. dev. 1.9)
Correctness: 3.0 · Contribution: 1.8 · Presentation: 3.0
ICLR 2025

Stochastic Deep Restoration Priors for Imaging Inverse Problems

OpenReview | PDF
Submitted: 2024-09-18 · Updated: 2025-02-05
TL;DR

We introduce ShaRP, a novel method that leverages an ensemble of image restoration priors to regularize inverse problems.

Abstract

Keywords
computational imaging, inverse problems, deep learning

Reviews and Discussion

Review
Rating: 6

This paper introduces Stochastic deep Restoration Priors (ShaRP), a method that leverages an ensemble of deep restoration models to regularize imaging inverse problems. ShaRP improves upon Gaussian denoiser-based methods by handling structured artifacts more effectively, enabling self-supervised training without fully sampled data.

Strengths

  • The paper is well-written and easy to follow.
  • The theoretical analysis of the convergence is thorough and well-explained.

Weaknesses

  1. Comparison to Diffusion-Based Methods: The paper lacks a comprehensive comparison to diffusion-based methods, such as DiffIR [1] and DDRM [2]. Including these comparisons would strengthen the results and provide clarity on how ShaRP performs relative to other leading methods in the field.

  2. Supervised vs. Self-Supervised ShaRP: In line 154, the authors mention a key contribution: "We implement ShaRP with both supervised and self-supervised restoration models as priors and test it on two inverse problems: CS-MRI and SISR." However, the experimental section does not provide a direct comparison between the self-supervised and supervised versions of ShaRP for the same task. This comparison is necessary to assess the benefits of the self-supervised approach.

  3. Self-Supervised Nature: For a restoration network to be trained on a set of tasks, such as a set of blur kernels $H_i$, access to ground truth data is still required. This approach, which involves sampling multiple times from the ground truth, raises questions about whether the method can truly be considered self-supervised. Clarification is needed regarding the self-supervised claim.

  4. Use of Multiple Degradation Operators: The rationale for using a set of degradation operators $H_1, H_2, \ldots, H_k$ in cases where the target problem involves only a single fixed operator (e.g., $H_1$) is unclear. It would be helpful if the authors could explain why introducing multiple degradation operators is necessary or beneficial when solving a fixed-task problem.

References:

  • [1] DiffIR: Efficient Diffusion Model for Image Restoration
  • [2] Denoising Diffusion Restoration Models

Questions

What is the practical inference time of the proposed method in comparison to state-of-the-art (SOTA) methods? Additionally, the visual comparisons presented do not clearly demonstrate significant improvements. It would be beneficial to include more compelling visual examples to better illustrate the advantages of the proposed approach.

Comment

We thank the Reviewer for their time and feedback. Please see below for our point-by-point responses to your comments.

Comparison to Diffusion-Based Methods: The paper lacks a comprehensive comparison to diffusion-based methods, such as DiffIR [1] and DDRM [2]. Including these comparisons would strengthen the results and provide clarity on how ShaRP performs relative to other leading methods in the field.

Thank you for bringing these two methods to our attention. We have cited corresponding papers and included comparisons with these two methods in Section D.3 of the supplement.

Supervised vs. Self-Supervised ShaRP: In line 154, the authors mention a key contribution: "We implement ShaRP with both supervised and self-supervised restoration models as priors and test it on two inverse problems: CS-MRI and SISR." However, the experimental section does not provide a direct comparison between the self-supervised and supervised versions of ShaRP for the same task. This comparison is necessary to assess the benefits of the self-supervised approach.

Prompted by your comment, we revised the sentence to eliminate any confusion and added a table to clearly illustrate the performance difference between both versions of ShaRP. It is important to emphasize that the key advantage of the self-supervised approach is not superior performance, but the ability to learn without requiring fully-sampled MRI data—a scenario where training a Gaussian-denoising network is not feasible.

Self-Supervised Nature: For a restoration network to be trained on a set of tasks, such as a set of blur kernels $H_i$, access to ground truth data is still required. This approach, which involves sampling multiple times from the ground truth, raises questions about whether the method can truly be considered self-supervised. Clarification is needed regarding the self-supervised claim.

Prompted by your comment, we have clarified the term “self-supervised”. Our approach trains an MMSE restoration network using only undersampled measurements, without any access to the clean images. Specifically, for 8x undersampled MRI restoration, we use two 8x undersampled measurements, one as the training target and another as the input to the restoration network. Note that the combination of these two measurements remains undersampled, ensuring no reliance on fully-sampled data. This terminology is well-established in the CS-MRI literature, as can be seen in references [1–3] below.

[1] Millard, Charles, and Mark Chiew. "A theoretical framework for self-supervised MR image reconstruction using sub-sampling via variable density Noisier2Noise." IEEE transactions on computational imaging (2023).

[2] Gan, Weijie, et al. "Self-supervised deep equilibrium models with theoretical guarantees and applications to MRI reconstruction." IEEE Transactions on Computational Imaging (2023).

[3] Akçakaya, Mehmet, et al. "Unsupervised deep learning methods for biological image reconstruction and enhancement: An overview from a signal processing perspective." IEEE Signal Processing Magazine 39.2 (2022): 28-44.
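
As an illustration of the training scheme described above, here is a minimal sketch of one step, under our own assumptions about the details (the loss is computed on the k-space locations observed by the second measurement; the names `restoration_net`, `kspace_a`, `kspace_b`, `mask_b` are hypothetical and not from the paper):

```python
# Hedged sketch: self-supervised training from two undersampled measurements of
# the same object, one used as input (a) and one as target (b). No fully-sampled
# ground truth is used anywhere. All names here are illustrative assumptions.
import torch

def self_supervised_step(restoration_net, kspace_a, kspace_b, mask_b, optimizer):
    x_input = torch.fft.ifft2(kspace_a).abs()        # zero-filled image from measurement a
    x_hat = restoration_net(x_input[None, None])     # network restores the image
    kspace_hat = torch.fft.fft2(x_hat[0, 0])         # re-project the prediction to k-space
    # Supervise only on the k-space samples that measurement b actually acquired.
    loss = (mask_b * (kspace_hat - kspace_b)).abs().pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```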

Use of Multiple Degradation Operators: The rationale for using a set of degradation operators $H_1, H_2, \ldots, H_k$ in cases where the target problem involves only a single fixed operator (e.g., $H_1$) is unclear. It would be helpful if the authors could explain why introducing multiple degradation operators is necessary or beneficial when solving a fixed-task problem.

  1. The strength of our framework lies in its versatility, enabling seamless integration of a wide range of restoration models within a unified formulation in eq. (6). Note how our approach achieves performance improvements over popular methods using Gaussian-denoising priors.
  2. It is worth highlighting that without our framework, the direct application of a restoration model trained for $8\times$ MRI reconstruction to the $4\times$ and $6\times$ scenarios results in a significant performance degradation. This is shown in Section C.2 of the supplementary material. Our framework enables a principled integration of the $8\times$ model as a prior without re-training and without performance degradation.
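
To make the second point concrete, here is a minimal sketch of the kind of iteration described, assuming hypothetical names (`restoration_8x` for the fixed model trained only on 8x-undersampled data, `degradations` for the set of sampled operators) and NumPy arrays for all operators; it mirrors the ShaRP update quoted later in this discussion and is an illustration of the idea, not the authors' implementation:

```python
# Hedged sketch: plugging a restoration model trained for 8x-undersampled MRI in
# as a prior while solving a different inverse problem y = A x + e (e.g., 4x).
# The step sizes and initialization are placeholders.
import numpy as np

def reconstruct(y, A, restoration_8x, degradations, n_iters=200,
                gamma=1e-3, tau=1.0, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = A.T @ y                                              # simple back-projection initialization
    for _ in range(n_iters):
        H = degradations[rng.integers(len(degradations))]    # sample a degradation each iteration
        s = H @ x + sigma * rng.standard_normal(H.shape[0])  # synthetic degraded observation of x
        prior_grad = (tau / sigma**2) * (H.T @ (H @ x - H @ restoration_8x(s, H)))
        datafit_grad = A.T @ (A @ x - y)                     # gradient of 0.5 * ||A x - y||^2
        x = x - gamma * (datafit_grad + prior_grad)
    return x
```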
Comment

Thank you for your detailed responses and for addressing the concerns raised. After reviewing the other reviewers' comments and your rebuttals, I have some additional thoughts.

Firstly, I concur with reviewer XTiX's observations regarding the argument about the inadequacy of Gaussian-denoiser networks for solving general inverse problems, at least from a theoretical standpoint. A Gaussian denoiser is sufficient for learning the image prior, with the data fidelity term addressing noise based on the corruption in measurements. Additionally, to ensure the learned prior is independent of the forward operator and generalizable, a Gaussian denoiser is preferable to a restoration network tied to specific degradation operators.

In this way, it appears that the key novelty in your work is the use of multiple degradation operators. While this approach adds value and has potential benefits, I am not entirely convinced by the arguments presented against the use of Gaussian denoisers in this context.

Thus, I prefer to maintain my original rating of 6.

Comment

Thank you for the additional feedback. We provide additional responses below. Please let us know if there is anything we can provide to increase your score for our paper.

Firstly, I concur with reviewer XTiX's observations regarding the argument about the inadequacy of Gaussian-denoiser networks for solving general inverse problems, at least from a theoretical standpoint. A Gaussian denoiser is sufficient for learning the image prior, with the data fidelity term addressing noise based on the corruption in measurements. Additionally, to ensure the learned prior is independent of the forward operator and generalizable, a Gaussian denoiser is preferable to a restoration network tied to specific degradation operators.

Your statement highlights several misconceptions that can better clarify our work.

  1. We agree with the premise that Gaussian-denoiser networks can learn an image prior (in the sense that they are sufficient). While Gaussian denoisers can be used as image priors, they are only optimal on images corrupted with pure Gaussian noise. Our work shows that by training on more complex (structured) noise types—due to each degradation operator being different—we can further improve the performance on different image reconstruction tasks.
  2. Note that a Gaussian denoiser has its own forward operator, which is H = I. It is by now accepted that Gaussian-denoiser networks trained on a variety of noise levels do better than ones trained on a single noise level. Our work further expands this view by arguing that networks trained on a variety of degradation types, which may include H = I, can do even better. We see no contradiction there with the usefulness of Gaussian denoisers for learning priors.

In this way, it appears that the key novelty in your work is the use of multiple degradation operators. While this approach adds value and has potential benefits, I am not entirely convinced by the arguments presented against the use of Gaussian denoisers in this context.

We disagree with the ‘limited novelty’ assessment. While we recognize that no paper is without limitations, our work introduces several new ideas, theories, and results that we believe warrant recognition as novel contributions.

  1. More complex ensemble of denoisers: It is known that an ensemble of Gaussian denoisers trained at different noise levels does better than a single denoiser trained at one noise level. Our work shows that extending the “ensemble of denoisers” to include more complex (structured) noise types—due to each degradation operator—can further improve the performance. It is rather surprising that this non-trivial extension to a more complex “ensemble of denoisers” can still be used as an implicit prior. It is worth emphasizing that our framework can seamlessly incorporate both Gaussian denoisers and more complex structured denoisers into one closed-form implicit prior. This theoretical and conceptual novelty in our work deserves to be acknowledged.

  2. Training from undersampled measurements: It is not possible to train Gaussian denoisers without fully-sampled measurements. Our framework provides an elegant solution by suggesting to train restoration networks on available undersampled measurements and using them as an implicit prior without retraining. This is a non-trivial contribution that deserves acknowledgement.

  3. State-of-the-art performance: Our claims are backed up by impressive empirical performance on two separate inverse problems, MRI reconstruction and image super-resolution. This improvement due to the incorporation of non-Gaussian denoisers—or as we call them restoration operators—deserves to be acknowledged.

Comment

Dear Authors,

I acknowledge the contributions in your paper and the good writing, which is why I assigned a score of 6. I’d like to clarify my previous comment: "The key novelty in your work appears to be the use of multiple degradation operators, which adds value but does not entirely convince me in the context of arguments against Gaussian denoisers."

It seems the "more complex ensemble of denoisers" is due to using "multiple degradation operators". Your claimed state-of-the-art performance might largely result from these operators sampling more information compared to using a single one, as in most comparison methods. However, this point would benefit from further experimental validation.

Additionally, the contribution of "training from undersampled measurements" is valuable, but it is also a common approach in self-supervised image restoration, making it less novel.

I hope these clarifications help in strengthening your work. Overall, I believe it has potential, and further experiments may help solidify the arguments presented.

Best regards,

Review
Rating: 1

This paper extends the "deep restoration priors" approach from Hu et al 2024c from one degradation operator to multiple ones.

Strengths

The paper is clearly written, for the most part.

Weaknesses

This paper is a trivial extension of Hu et al 2024c. In this paper, the degradation operator is randomly chosen from a set, whereas in Hu et al 2024c there was a single operator. But this change led to no new challenges or questions. For example, the "ShaRP" regularizer in (6) is a trivial extension of (9) in Hu et al 2024c that now takes an expectation over H. Figure 2 in this paper is a direct copy of Figure 1 from Hu et al 2024c. The experiments are conducted on different linear inverse problems, but there is no intellectual contribution there.

Questions

I don't have any questions for the authors.

Comment

We disagree with your assessment of our paper as a "trivial extension" of Hu et al. (2024c) and with the argument that we “directly copied” a figure from Hu et al. (2024c).

None of the figures in our submission are copies from any other paper. All the figures represent different experiments, datasets, and analyses involving our method, and they are neither taken from nor related to Figure 1 in Hu et al. (2024c) or any other figure in previous works. We respectfully request that the reviewer re-read both papers more carefully and withdraw the baseless accusation. We take unfounded accusations of plagiarism very seriously, and the reviewer should too.

ShaRP is a Non-Trivial Extension:

  1. New regularizer: The regularizer in ShaRP is different from that in Hu et al. (2024c) because it promotes solutions whose multiple degraded observations resemble realistic degraded images. This leads to a richer image prior that integrates information from diverse degradation types, leading to more robust restoration capabilities.

  2. New algorithm: (a) ShaRP is a stochastic algorithm, while DRP from Hu et al. (2024c) is deterministic. Saying that ShaRP is a trivial extension due to the use of restoration operators is analogous to saying that SGD is a trivial extension of gradient descent due to the use of the gradient. (b) ShaRP eliminates the need for the scaled proximal operator used in DRP from Hu et al. (2024c). Thus, unlike the DRP algorithm, ShaRP doesn't need the assumption that $H^T H$ is positive definite, making it applicable to many more degradation operators.

  3. New theory: Our Theorem 1 is completely novel. It shows that our novel regularizer can be efficiently minimized using SGD. Hu et al. (2024c) doesn’t have any analogous result.

  4. Training from undersampled measurements: ShaRP doesn't require fully sampled MRI measurements for training, which is not possible using the DRP method in Hu et al. (2024c). This flexibility is essential for MRI and is a significant breakthrough not achieved by any previous method.

  5. Much better performance: ShaRP outperforms DRP from Hu et al. (2024c) across all experiments. ShaRP is on average 3.65 dB better than DRP on CS-MRI in Table 1 and 0.5 dB better than DRP on super-resolution in Table 3.

We hope that this clarification addresses your concerns.

Sincerely, The Authors

Comment

Thanks to the authors for their response.

  1. Yes, I understand that this paper uses multiple degradation operators, whereas Hu'24 used a single operator. My point is that the use of multiple operators is a trivial extension that leads to no new challenges or insights, and thus is insufficient for acceptance in a high-profile conference such as ICLR.

  2. Sure, ShaRP plugs in a different optimization algorithm than Hu'24, and in particular the SNORE approach from (Renaud et al 2024b). But there is no intellectual contribution to selecting a different off-the-shelf algorithm.

  3. Theorem 1 is definitely not "completely novel" with "no analogous result" in Hu'24. It is basically identical to (16) from Hu'24 except for one expectation that accounts for the SNORE stochasticity and another that accounts for the multiple stochastic operators $H$. All steps in the proof are the same. In particular,

    • (5) matches (8) in Hu'24;
    • the equation on line 830 matches (14) in Hu'24;
    • (11) matches (15) in Hu'24;
    • (7) matches (16) in Hu'24 except for the expectations.
  4. The proposed method doesn't require fully sampled measurements only because the existing SNORE approach was applied. Plugging in a different optimization approach from the one chosen in Hu'24 is not a significant contribution.

  5. Sure, the performance is expected to get better as one adds more operators. But again, going from one to several restoration operators (and plugging in different off-the-shelf optimization algorithm) is not sufficiently significant for a publication in ICLR.

Comment

We thank the Reviewer for their time and feedback. Please see below for our point-by-point responses to your comments.

Yes, I understand that this paper uses multiple degradation operators, whereas Hu'24 used a single operator. My point is that the use of multiple operators is a trivial extension that leads to no new challenges or insights, and thus is insufficient for acceptance in a high-profile conference such as ICLR.

We respectfully disagree with the ‘a trivial extension that leads to no new challenges or insights’ assessment. While we recognize that no paper is without limitations, our work introduces several new ideas, theories, and results that we believe warrant recognition as novel contributions.

  1. More complex ensemble of denoisers: It is known that an ensemble of Gaussian denoisers trained at different noise levels does better than a single denoiser trained at one noise level. Our work shows that extending the “ensemble of denoisers” to include more complex (structured) noise types—due to each degradation operator—can further improve the performance. It is rather surprising that this non-trivial extension to a more complex “ensemble of denoisers” can still be used as an implicit prior. It is worth emphasizing that our framework can seamlessly incorporate both Gaussian denoisers and more complex structured denoisers into one closed-form implicit prior. This theoretical and conceptual novelty in our work deserves to be acknowledged.
  2. Training from undersampled measurements: It is impossible to train Gaussian denoisers without fully-sampled measurements. Our framework provides an elegant solution by suggesting to train restoration networks on available undersampled measurements and using them as an implicit prior without retraining. This is a non-trivial contribution that deserves acknowledgement.
  3. State-of-the-art performance: Our claims are backed up by impressive empirical performance on two separate inverse problems, MRI reconstruction and image super-resolution. This improvement due to the incorporation of non-Gaussian denoisers—or as we call them restoration operators—deserves to be acknowledged.

Sure, ShaRP plugs in a different optimization algorithm than Hu'24, and in particular the SNORE approach from (Renaud et al 2024b). But there is no intellectual contribution to selecting a different off-the-shelf algorithm.

There is no direct connection between ShaRP and SNORE. SNORE is a denoiser-prior-based method, similar to PnP, whereas our approach leverages restoration networks. The two corresponding algorithms are very different. The reviewer is pointing out a superficial similarity and conflating this with a real connection without providing specifics. One might suggest that the intellectual contribution is in plain sight if one looks a bit more carefully past the superficial similarity. We’ve endeavored to clarify this point in our various responses with little success. We hope that the reviewer is sufficiently charitable and kind as to look a bit more carefully at the work.

Theorem 1 is definitely not "completely novel" with "no analogous result" in Hu'24. It is basically identical to (16) from Hu'24 except for one expectation that accounts for the SNORE stochasticity and another that accounts for the multiple stochastic operators $H$. All steps in the proof are the same. In particular, (5) matches (8) in Hu'24; the equation on line 830 matches (14) in Hu'24; (11) matches (15) in Hu'24; (7) matches (16) in Hu'24 except for the expectations.

Please revisit our proof and note that ShaRP's proof is not merely a repetition of previous work, nor are 'all steps in the proof the same.' The similarity you mention highlights a key contribution of our paper: we extend the formulation in a mathematically elegant way, generalizing from a single restoration prior to an ensemble of priors. This extension not only allows for a clear mathematical explanation of the ensemble but also introduces a straightforward stochastic approach to implement this regularizer effectively.

The proposed method doesn't require fully sampled measurements only because the existing SNORE approach was applied. Plugging in a different optimization approach from the one chosen in Hu'24 is not a significant contribution.

Unlike methods such as SNORE, ShaRP can leverage MMSE restoration models as priors, even when only undersampled measurements are available as training data. This capability, rigorously established in Theorem 3 and Section B.1.2, stems from ShaRP's ability to directly learn accurate priors from undersampled data. Furthermore, our empirical results demonstrate the exceptional performance of these priors in mismatched settings, surpassing existing denoiser-based approaches. This contrasts with previous work like SNORE, which relies exclusively on denoiser priors and lacks the capacity to learn from undersampled measurements.

Comment

Sure, the performance is expected to get better as one adds more operators. But again, going from one to several restoration operators (and plugging in different off-the-shelf optimization algorithm) is not sufficiently significant for a publication in ICLR.

We disagree with the assertion that our performance improvement was expected. In science, no outcome can be presumed without rigorous proof of concept. Our work introduces a novel approach to solving imaging inverse problems, demonstrating not only superior performance but also a broader generalization of the concept of priors. We believe this represents a significant advancement in the field, warranting recognition at ICLR. Moreover, we are confident that our work will inspire researchers to expand their perspectives on designing and selecting priors tailored to their specific target problems.

We are disappointed by the highly dismissive review and the extremely low score. While we acknowledge the reviewer's concerns, we respectfully disagree with the assessment of our paper's contribution. Our formulation and analysis differ significantly from the referenced literature, as detailed in our response. The reviewer's insistence on trivializing our work and the use of a score of 1 suggests a lack of objectivity. We believe our paper presents a valuable contribution to the field and hope for a more fair and constructive evaluation. We respectfully request that you maintain professionalism and evaluate our work based on facts instead of misinterpretations, especially with confidence of your evaluation being 5.

Furthermore, we have not yet received your response regarding the accusation of 'direct copying'.

Comment

With all due respect, I'm not hearing anything new from the authors here.

"...results that we believe warrant recognition as novel contributions: (1) more complex ensemble of denoisers; (2) training from undersampled measurements: it is impossible to train Gaussian denoisers without fully-sampled measurements; (3) state-of-the-art performance."

  1. Yes, I acknowledge that you extended the idea from Hu'24 from one to multiple denoisers. I just don't find that sufficient for acceptance to ICLR.

  2. There is a large literature on recovering images without fully sampled training data. Using those recoveries, one can train a Gaussian denoiser by adding noise.

  3. Every submitted paper claims their performance is state-of-the-art. Such claims are not sufficient for acceptance.

There is no direct connection between ShaRP and SNORE.

I disagree. First, the connection was stated by the authors themselves on line 151. Second, the connection is obvious after comparing the authors' Alg. 1 to Alg. 2 in the SNORE paper, Renaud'24.

Please revisit our proof and note that ShaRP’s proof is not merely a repetition of previous work or that 'all steps in the proof are the same.'

Not only have I read your proof very carefully, but I have provided line-by-line correspondences between all of its key steps and those in Hu'24, showing that they are essentially identical.

Unlike methods such as SNORE, ShaRP can leverage MMSE restoration models as priors.

Yes, but using MMSE restoration models as priors was the main contribution of Hu'24. Your contribution is just to extend Hu'24 from one to multiple restoration models and swap the proximal-gradient algorithm for SNORE.

We disagree with the assertion that our performance improvement was expected.

No problem. We can agree to disagree here.

Our formulation and analysis differ significantly from the referenced literature, as detailed in our response.

Your formulation materially differs from Hu'24 only in using multiple restoration operators (instead of one) and swapping the use of proximal-gradient for SNORE. I gave a line-by-line correspondence between the steps in your proof and those in Hu'24. I'm not sure what else to say.

We respectfully request that you maintain professionalism and evaluate our work based on facts instead of misinterpretations, especially with confidence of your evaluation being 5.

Sorry but where have I been unprofessional? And where have I not used facts? I've provided explicit line-by-line comparisons of your work to Hu'24 and Renaud'24, showing how little they differ. If those are not facts, then I don't know what is. Based on those facts, my confidence is very high indeed.

In the end, the question is whether your contribution is significant enough to warrant acceptance to ICLR. Based on the high degree of overlap with Hu'24 and Renaud'24, I feel that it is not.

Comment

There is a large literature on recovering images without fully sampled training data. Using those recoveries, one can train a Gaussian denoiser by adding noise.

If you believe this approach can address general inverse problems, we kindly ask for a reference to papers that specifically demonstrates such use. In particular, for the setting where there is no ground-truth for training the recovery algorithm.

I disagree. First, the connection was stated by the authors themselves on line 151. Second, the connection is obvious after comparing the authors' Alg. 1 to Alg. 2 in the SNORE paper, Renaud'24.

While we cited the connection in line 151 of our paper, this was to highlight high-level similarities, not equivalences. Your attempt to turn a citation into an equivalence is disingenuous. Upon comparing Algorithm 1 from our paper to Algorithm 2 from SNORE (Renaud'24), the key differences become evident. To clarify further:

  1. Our ShaRP algorithm: $x_k = x_{k-1} - \gamma \left( \nabla g(x_{k-1}) + (\tau/\sigma^2)\, H^T H \left( x_{k-1} - R(s, H) \right) \right)$.

  2. SNORE algorithm: $x_k = x_{k-1} - \delta \left( \nabla g(x_{k-1}) + (\alpha/\sigma^2) \left( x_{k-1} - D(x_{k-1}) \right) \right)$.

We wish the other researchers could see the difference.

While the forms may look superficially similar, the differences in key terms—such as the role of $R(s,H)$ in ShaRP versus $D(x_{k-1})$ in SNORE—indicate fundamentally distinct approaches. This key difference leads to different regularization and fundamentally different concepts.
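
For readers comparing the two formulas above, a minimal side-by-side sketch (the callables R, D, and grad_g are placeholders, not either paper's code):

```python
# Hedged sketch of the two one-step updates quoted above.
# R(s, H): restoration model; D(x): Gaussian denoiser; grad_g: data-fidelity gradient.
import numpy as np

def sharp_step(x, s, H, R, grad_g, gamma, tau, sigma):
    # x_k = x_{k-1} - gamma * ( grad_g(x_{k-1}) + (tau/sigma^2) * H^T H (x_{k-1} - R(s, H)) )
    return x - gamma * (grad_g(x) + (tau / sigma**2) * (H.T @ H @ (x - R(s, H))))

def snore_step(x, D, grad_g, delta, alpha, sigma):
    # x_k = x_{k-1} - delta * ( grad_g(x_{k-1}) + (alpha/sigma^2) * (x_{k-1} - D(x_{k-1})) )
    return x - delta * (grad_g(x) + (alpha / sigma**2) * (x - D(x)))
```

The structural difference is visible in the prior term: ShaRP pulls the iterate toward the restoration of a synthesized degraded observation, weighted by $H^T H$, whereas SNORE pulls it toward its own denoised version.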

Review
Rating: 6

This paper introduces a novel concept termed ShaRP for stochastic priors for the regularization of inverse problems. At its core, a set of $b$ restoration problems is considered, for which the MMSE is minimized. In this setting, ShaRP is the expectation of the probability of the degraded version of the images given the probability density of the observations. Under certain assumptions, a closed-form solution of the gradient of the regularizer is derived and the convergence of $\nabla f$ is shown. The results are numerically tested for CS-MRI and SISR.

Strengths

The paper introduces a novel concept for stochastic priors, which is complemented by two important theoretical assertions (see summary). The assumptions can be regarded as mild. Both results are highly relevant for numerical experiments. The paper remarkably advances SOTA for CS-MRI and is on par with competing methods in the case of SISR.

Weaknesses

The presentation of the paper could be improved in some places. In particular, I am missing a more concise introduction to the motivation behind the actual concept of ShaRP in Section 4. The parameter $b$ representing the number of restoration problems is essential for the numerical performance. However, I am lacking a substantial discussion of the role of this parameter. In particular, no details about the role of $\alpha$ (see Section 5.1) and its impact on the results were provided.

Questions

  1. Can you please provide more details about the impact of the choice of $b$ on the numerical results? An ablation study might help here. Likewise, you could numerically evaluate the impact of $\alpha$.
  2. What happens if you modify the number of restoration priors in B.1.1?
  3. Please rewrite the motivation in Section 4. In particular, I am missing a good motivation and reasoning for the actual definition in equation (6), which might help the reader to better understand ShaRP. In addition, the object $G_\sigma(s - Hx)$ could be better motivated from a mathematical point of view.
  4. The results in Table 4 do not show a clear tendency for DiffPIR, DRP, or ShaRP. Is there a reason for this available? Are there specific conditions under which each method performs best?
Comment

We thank the Reviewer for their time and feedback. Please see below for our point-by-point responses to your comments.

The presentation of the paper could be improved in some places. In particular, I am missing a more concise introduction to the motivation behind the actual concept of ShaRP in Section 4. The parameter $b$ representing the number of restoration problems is essential for the numerical performance. However, I am lacking a substantial discussion of the role of this parameter. In particular, no details about the role of $\alpha$ (see Section 5.1) and its impact on the results were provided.

We have edited the manuscript to improve the presentation. The essence of our framework is captured by the regularizer in eq. (6), which is maximized if degraded versions of $x$ are highly probable in the distribution $p(s \mid H)$, where $H$ is sampled from $p(H)$. Hence, mathematically our regularizer can incorporate infinitely many degradation operators. While adding the ability to restore from more degradations improves the performance, it increases the computational complexity of training the restoration network. The revised manuscript includes a new discussion and an ablation study in the supplementary material showing the impact of $b$.

Can you please provide more details about the impact of the choice of $\alpha$ on the numerical results? An ablation study might help here. Likewise, you could numerically evaluate the impact of $b$.

The revised manuscript includes the requested ablation studies in Section E.2 (see also below).

What happens if you modify the number of restoration priors in B.1.1?

We added an ablation study to Section E.2 on the performance under different values of $b$.

Please rewrite the motivation in Section 4. In particular, I am missing a good motivation and reasoning for the actual definition in equation (6), which might help the reader to better understand ShaRP. In addition, the object $G_\sigma(s - Hx)$ could be better motivated from a mathematical point of view.

We have edited the corresponding section of the paper. Our regularizer is minimized if degraded versions of $x$ are highly probable in the distribution $p(s \mid H)$, where $H$ is sampled from $p(H)$. In other words, a solution $x$ is considered good if, for all considered $H$, its degraded version $Hx$ matches the degraded versions $Hx^*$ of clean images $x^* \sim p(x)$.
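
In symbols, a hedged sketch of this statement (the precise constants and normalization are those of eq. (6) in the paper): writing $G_\sigma$ for a Gaussian kernel of width $\sigma$, the density of degraded observations under a fixed $H$ is the clean-image prior pushed through $H$ and smoothed by $G_\sigma$,

$$p(s \mid H) \;=\; \int G_\sigma\!\left(s - H x^{*}\right)\, p(x^{*})\, \mathrm{d}x^{*},$$

and the regularizer averages terms of the form $-\log p(\cdot \mid H)$ over $H \sim p(H)$, so it is small exactly when the degraded versions $Hx$ of the candidate solution are likely under this distribution.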

The results in Table 4 do not show a clear tendency for DiffPIR, DRP, or ShaRP. Is there a reason for this available? Are there specific conditions under which each method performs best?

Prompted by your comment, we included a more detailed discussion for this table.

The table shows that ShaRP achieves the best performance in terms of distortion metrics such as PSNR and SSIM, while DiffPIR excels in the perceptual quality metric LPIPS. This trade-off arises because DiffPIR operates as a generative method that inherently favors perceptual quality. ShaRP consistently outperforms DRP in both distortion and perceptual metrics, showing the advantage of using a prior based on multiple degradation operators.

Comment

Thanks for the precise answers and the clarifications in the paper. I appreciate your feedback, but I am not enthusiastic about the manuscript, which is why I do not change my evaluation. One major reason for this judgement is the incremental novelty in this field (e.g., in comparison with Hu et al 2024c) as also outlined by two other reviewers.

Comment

Thank you for the additional feedback. We provide additional responses below. Please let us know if there is anything we can provide to increase your score for our paper.

We respectfully disagree with the ‘incremental novelty’ assessment. While we recognize that no paper is without limitations, our work introduces several new ideas, theories, and results that we believe warrant recognition as novel contributions.

  1. More complex ensemble of denoisers: It is known that an ensemble of Gaussian denoisers trained at different noise levels does better than a single denoiser trained at one noise level. Our work shows that extending the “ensemble of denoisers” to include more complex (structured) noise types—due to each degradation operator—can further improve the performance. It is rather surprising that this non-trivial extension to a more complex “ensemble of denoisers” can still be used as an implicit prior. It is worth emphasizing that our framework can seamlessly incorporate both Gaussian denoisers and more complex structured denoisers into one closed-form implicit prior. This theoretical and conceptual novelty in our work deserves to be acknowledged.
  2. Training from undersampled measurements: It is not possible to train Gaussian denoisers without fully-sampled measurements. Our framework provides an elegant solution by suggesting to train restoration networks on available undersampled measurements and using them as an implicit prior without retraining. This is a non-trivial contribution that deserves acknowledgement.
  3. State-of-the-art performance: Our claims are backed up by impressive empirical performance on two separate inverse problems, MRI reconstruction and image super-resolution. This improvement due to the incorporation of non-Gaussian denoisers—or as we call them restoration operators—deserves to be acknowledged.
Comment

Thanks for pointing this out! I am aware of these contributions. I explicitly disagree with reviewer E3Gi stating that it is a "trivial extension". Nevertheless, I agree with bQ8e in nearly all arguments, which is why my rating is 6.

Review
Rating: 5

The paper proposes a new approach for learning a regularizer for linear imaging inverse problems. While Gaussian denoisers have been successfully used as effective image priors, the authors extend the idea by using more general restoration operators as image priors. In the numerical experiments, the authors show the reconstruction performance with their new “ShaRP” regularizers for two important inverse problems (compressive MRI and single-image super-resolution). The numerical experiments are performed with ShaRP regularizers trained in supervised and self-supervised manners.

Strengths

  1. Using general restoration operators instead of simple Gaussian denoisers is interesting and novel. Theorem 1 also puts this idea on a solid theoretical foundation.

  2. The experimental results are compelling. Comparison of ShaRP with different baseline methods, especially PnP algorithms that utilize Gaussian denoisers, provides evidence for the claim made by the authors in the abstract.

Weaknesses

  1. I am not particularly convinced about the motivation for using a general restoration prior, that arises out of training a deep reconstruction operator for linear inverse problems (with several degradation operators), for solving another linear inverse problem. While learning Gaussian denoisers is a problem that is fundamentally easier than solving a general inverse problem (with an ill-posed operator), this is not the case here.

  2. The novelty in the convergence analysis (namely, the proof of Theorem 2) is unclear to me. To my understanding, the assumptions and the result are along the same lines as in the paper entitled “A Guide Through the Zoo of Biased SGD” by Demidovich et al. (which the authors have cited).

  3. Although the authors show the performance of ShaRP trained in an unsupervised manner, I don't think the training loss used in the unsupervised case (e.g., the loss in Algorithm 3) corresponds to a restoration operator that approximates the conditional expectation $\mathbb{E}[x \mid s, H]$. Consequently, the theoretical analysis does not explain the experimental results with a restoration operator trained in an unsupervised manner.

Questions

  1. Page 1: “...Tweedie’s formula (Robbins, 1956; Efron, 2011) seemingly implies that Gaussian denoising alone might be sufficient for learning priors,...”: That is indeed true and this work does not disprove this fact.

  2. Page 2: “...ShaRP provides a richer and more flexible representation of image priors…”: At this stage, it is rather vague as to what “richer and more flexible” means.

  3. Page 2: “Unlike Gaussian denoisers, the restoration models in ShaRP can often be directly trained in a self-supervised manner”: Even Gaussian denoisers can be trained in an unsupervised manner using only noisy images (e.g., using a SURE loss).

  4. It might be good to put some of the math background in the intro, effectively shortening the material in the first three pages.

  5. Page 3: “We introduce a novel regularization concept for inverse problems that encourages solutions that produce degraded versions closely resembling real degraded images.”: Firstly, this would be the case if the degradation is the same for which the restoration network is trained, not the forward operator “A” for which you want to solve the inverse problem. Secondly, can one not promote this property using a simple Gaussian denoiser together with a data-consistency loss?

  6. Theorem 1: "Assume that the prior density $p_x$ is non-degenerate over $\mathbb{R}^n$": This is almost always not true.

  7. Assumption 3: Define what "$b(x)$" is.

Requested changes:

  1. Provide a comprehensive review of the convergence analysis of biased SGD in terms of the assumptions and results to put your theoretical contributions in perspective.

  2. Clarify whether one can approximate the conditional expectation $\mathbb{E}[x \mid s, H]$ using a restoration operator trained on a self-supervised loss. If not, the interpretation of ShaRP as biased SGD does not hold in this case.

Comment

We thank the Reviewer for their time and feedback. Please see below for our point-by-point responses to your comments.

I am not particularly convinced about the motivation for using a general restoration prior, that arises out of training a deep reconstruction operator for linear inverse problems (with several degradation operators), for solving another linear inverse problem. While learning Gaussian denoisers is a problem that is fundamentally easier than solving a general inverse problem (with an ill-posed operator), this is not the case here.

Prompted by the feedback, we have revised the paper to better motivate our approach. An important point often overlooked in the literature is that Gaussian-denoiser networks are suboptimal when used as priors to restore other, non-Gaussian artifacts. Our framework allows for tailoring the prior to the artifacts of a given inverse problem, providing an advantage over Gaussian-denoising networks. Our experiments show that restoration networks trained to remove MRI artifacts at $8\times$ acceleration can serve as effective priors for MRI reconstruction at $4\times$ and $6\times$ accelerations. Additionally, our framework remains compatible with Gaussian-denoising networks, as they can always be incorporated into our prior using $H = I$.

The novelty in the convergence analysis (namely, the proof of Theorem 2) is unclear to me. To my understanding, the assumptions and the result are along the same lines as in the paper entitled “A Guide Through the Zoo of Biased SGD” by Demidovich et al. (which the authors have cited).

Prompted by your feedback, the revised paper discusses the goal of presenting Theorem 2. Our contribution is not the general analysis of biased SGD, but rather showing that even a restoration network not trained to be an ideal MMSE estimator can maintain stable convergence within our framework (as shown in Figure 2). This analysis is crucial as it provides theoretical grounding for the stability of ShaRP, especially in practical scenarios where the learned restoration model may be imperfect.

Although the authors show the performance of ShaRP trained in an unsupervised manner, I don't think the training loss used in the unsupervised case (e.g., the loss in Algorithm 3) corresponds to a restoration operator that approximates the conditional expectation $\mathbb{E}[x \mid s, H]$. Consequently, the theoretical analysis does not explain the experimental results with a restoration operator trained in an unsupervised manner.

Thanks for pointing it out. We have included the full proof in the new Supplement A.3. As shown in the new Theorem 3, in expectation, learning using the loss in Algorithm 3 is equivalent to learning with the fully-supervised L2 loss.

Page 1: “...Tweedie’s formula (Robbins, 1956; Efron, 2011) seemingly implies that Gaussian denoising alone might be sufficient for learning priors,...”: That is indeed true and this work does not disprove this fact.

Indeed, we do not seek to challenge Tweedie's formula, but rather emphasize that Gaussian-denoiser networks are suboptimal when used as priors to restore images degraded by other, non-Gaussian artifacts. Hence, the question is not whether Gaussian denoising can enable learning priors, but the optimality of these priors when used to solve inverse problems.
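
For reference, Tweedie's formula for Gaussian noise states that the MMSE denoiser is determined by the score of the noisy marginal, which is the sense in which a Gaussian denoiser encodes a prior:

$$\mathbb{E}[x \mid y] \;=\; y + \sigma^{2}\, \nabla_{y} \log p_{y}(y), \qquad y = x + n, \quad n \sim \mathcal{N}(0, \sigma^{2} I).$$

The disagreement in this thread is therefore not about the formula itself, but about whether a denoiser that is optimal for the Gaussian corruption above remains the best prior when the iterates of a reconstruction algorithm carry structured, non-Gaussian artifacts.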

Page 2: “...ShaRP provides a richer and more flexible representation of image priors…”: At this stage, it is rather vague as to what “richer and more flexible” means.

We revised that sentence for clarity: "The key benefit of ShaRP relative to prior work lies in its versatility, enabling seamless integration of a wide range of restoration models trained on multiple degradation types…"

Page 2: “Unlike Gaussian denoisers, the restoration models in ShaRP can often be directly trained in a self-supervised manner”: Even Gaussian denoisers can be trained in an unsupervised manner using only noisy images (e.g., using a SURE loss).

Thanks for pointing this out. We revised that sentence for clarity: “Unlike Gaussian denoisers, the restoration models in ShaRP can often be directly trained without fully sampled measurement data.” Self-supervised Gaussian denoiser training methods, such as SURE, rely on noisy but fully-sampled data. We include an additional discussion highlighting the differences between self-supervised training for denoising and for restoration/reconstruction.

It might be good to put some of the math background in the intro, effectively shortening the material in the first three pages.

There is not much math that can be easily moved into the intro. However, we will make an effort to include more in the camera-ready version.

Comment

Page 3: “We introduce a novel regularization concept for inverse problems that encourages solutions that produce degraded versions closely resembling real degraded images.”: Firstly, this would be the case if the degradation is the same for which the restoration network is trained, not the forward operator “A” for which you want to solve the inverse problem. Secondly, can one not promote this property using a simple Gaussian denoiser together with a data-consistency loss?

We have revised this section for clarity.

  1. Note that the "real degraded images" in the original manuscript did not refer to the measurement $y$. Our regularizer in eq. (6) is minimized if degraded versions of $x$ are highly probable in the distribution $p(s \mid H)$, where $H$ is sampled from $p(H)$. In other words, a solution $x$ is considered good if, for all considered $H$, its degraded version $Hx$ matches the degraded versions $Hx^*$ of clean images $x^{*} \sim p(x)$.

  2. The prevailing approach in the literature is to use Gaussian-denoiser networks as priors. Tables 1 and 3 in our paper show that our prior can do better by leveraging restoration networks better tailored to the inverse problem. This is due to the suboptimality of Gaussian-denoiser networks when applied to the iterates of reconstruction algorithms that are degraded by other, non-Gaussian artifacts.

Theorem 1: "Assume that the prior density $p_x$ is non-degenerate over $\mathbb{R}^n$": This is almost always not true.

This assumption is only for mathematical convenience and our analysis can be generalized to the degenerate case. Note also that, technically, we can always approximate a degenerate density with a filtered version using an arbitrarily small Gaussian kernel to make it fully supported.

Assumption 3: Define what “b(x)” is.

The definition of b(x) is given in eq. (9) of the revised paper.

Comment

Thanks for the response.

I am not quite convinced about the argument pointing at the inadequacy of Gaussian-denoiser networks for solving general inverse problems (at least from a theoretical point of view). As I understand, a Gaussian denoiser is enough to learn the image prior, while the data fidelity term (which should be chosen based on the noise corrupting the measurement) should guide an iterative reconstruction method to deal with the noise in the data. Moreover, if one wants to ensure that the learned prior is independent of the forward operator (so that it generalizes to any forward operator in principle), one should prefer a Gaussian denoiser over a restoration network (corresponding to a set of degradation operators).

Regarding Theorem 2: I understand what the theorem seeks to establish. My question was: Does Theorem 2 follow as a corollary of the existing results on SGD? If so, it would be mildly misleading to claim this as a theoretical contribution of this paper.

Why is the proof in A.3 written for the specific case of undersampled MRI? Is it because of Assumption 4, which is specific to this case? If so, then the statement "the restoration models in ShaRP can often be directly trained without fully sampled measurement data" needs some clarification (that it only applies to a sub-class of linear inverse problems, CS-MRI being one of them).

Thanks for clarifying the phrase "real degraded images". I would argue that it is probably more relevant to promote solutions that are consistent with the measurement $y$, and not measurements generated using a randomly sampled degradation operator.

Overall, I would like to maintain my original evaluation score for the paper, primarily because the contributions are somewhat limited in my view.

Comment

Thank you for the additional feedback. We provide additional responses below. Please let us know if there is anything we can provide to increase your score for our paper.

I am not quite convinced about the argument pointing at the inadequacy of Gaussian-denoiser networks for solving general inverse problems (at least from a theoretical point of view). As I understand, a Gaussian denoiser is enough to learn the image prior, while the data fidelity term (which should be chosen based on the noise corrupting the measurement) should guide an iterative reconstruction method to deal with the noise in the data. Moreover, if one wants to ensure that the learned prior is independent of the forward operator (so that it generalizes to any forward operator in principle), one should prefer a Gaussian denoiser over a restoration network (corresponding to a set of degradation operators).

Your statement reflects several misconceptions that we believe can be addressed to better clarify our work.

  1. We agree with the premise that Gaussian denoiser networks are effective at learning image priors. While Gaussian-denoiser networks can learn the prior, the network is only optimal on images corrupted with pure Gaussian noise. For example, a Gaussian-denoiser network is suboptimal within an iterative reconstruction algorithm due to the image being corrupted by structured artifacts. Our work demonstrates that by training on more complex and structured noise types—specific to different degradation operators—we can achieve significant improvements in performance.
  2. It is important to note that a Gaussian denoiser inherently assumes a forward operator H = I. It is well-known that Gaussian denoiser networks trained on multiple noise levels outperform those trained on a single noise level. Our work builds on this understanding by extending the argument: networks trained on a variety of “generalized noise” types—arising from different degradation operators, including H = I—can deliver even better results. This perspective complements, rather than contradicts, the demonstrated utility of Gaussian denoisers in learning image priors.

Regarding Theorem 2: I understand what the theorem seeks to establish. My question was: Does Theorem 2 follow as a corollary of the existing results on SGD? If so, it would be mildly misleading to claim this as a theoretical contribution of this paper.

Our main claim is Theorem 1. We do not claim that the analysis of SGD is novel, but we see value in its application to show the stability of ShaRP. We will revise the paper to explicitly state that the proof of Theorem 2 is an application of existing results on SGD, and to cite the relevant literature.

Why is the proof in A.3 written for the specific case of undersampled MRI? Is it because of Assumption 4, which is specific to this case? If so, then the statement "the restoration models in ShaRP can often be directly trained without fully sampled measurement data" needs some clarification (that it only applies to a sub-class of linear inverse problems, CS-MRI being one of them).

We believe this is a minor wording issue. We will rephrase the statement to "the restoration models in ShaRP can sometimes be directly trained without fully sampled measurement data". The proof is based on Assumption 4, which is applicable to several important inverse problems, including MRI, inpainting, optical diffraction tomography, and interferometric astronomy.

Thanks for clarifying the phrase "real degraded images". I would argue that it is probably more relevant to promote solutions that are consistent with the measurement, and not measurements generated using a randomly sampled degradation operator.

Your comment seems to suggest a possible misunderstanding of our method. As shown in line 5 of Algorithm 1, the gradient of the data-fidelity term g is explicitly incorporated, ensuring that our algorithm consistently promotes solutions aligned with the measurements $y = Ax + e$. Simultaneously, the prior in Eq. (6) encourages solutions that are highly probable degraded versions of natural images across all considered degradations $H \sim p(H)$.

Comment

Again, I'm afraid I have to disagree with the rebuttal. The optimality of Gaussian denoisers for learning the image prior has nothing to do with the noise being Gaussian or non-Gaussian for the intermediate iterates of an iterative scheme. The optimality of Gaussian denoisers as image prior is a consequence of Tweedie's formula. The misconception is on the part of the authors and not on my part. Further, a Gaussian denoiser trained on multiple noise levels is still independent of any specific degradation operator, which, as one might argue, is a desirable feature if one is interested in the generalizability of the learned prior. I am, however, willing to accept that a more general restoration network can perform better "empirically".

"Your comment seems to suggest a possible misunderstanding of our method. As shown in line 5 of Algorithm 1, the gradient of the data-fidelity term g is explicitly incorporated, ensuring that our algorithm consistently promotes solutions aligned with the measurements y=Ax+ey=Ax+e". There is no misunderstanding here. That is what I referred to in my last communication. I meant to say that it is irrelevant whether the solution is consistent with degraded measurements corresponding to a randomly sampled degradation operator (which is different from the true forward operator).

In short, I don't see any concrete reasons to increase my score.

Comment

Again, I'm afraid I have to disagree with the rebuttal. The optimality of Gaussian denoisers for learning the image prior has nothing to do with the noise being Gaussian or non-Gaussian for the intermediate iterates of an iterative scheme. The optimality of Gaussian denoisers as image prior is a consequence of Tweedie's formula. The misconception is on the part of the authors and not on my part. Further, a Gaussian denoiser trained on multiple noise levels is still independent of any specific degradation operator, which, as one might argue, is a desirable feature if one is interested in the generalizability of the learned prior. I am, however, willing to accept that a more general restoration network can perform better "empirically".

We appreciate the reviewer's time and effort in engaging with us. We'd like to clarify that while Tweedie's formula establishes a connection between a denoiser and the score function (e.g., a possible image prior given by $\nabla \log p(y)$), it does not inherently imply optimality in any general sense. If denoisers were indeed optimal, methods using them as priors would universally yield superior performance on arbitrary reconstruction tasks, which is clearly not the case.

In practice, the effectiveness of a denoiser is tied to how well the noise model it's trained on matches the actual noise in the image. Our approach allows the network to learn a more robust and generalizable prior by training on a diverse set of "generalized noise" (including those from various degradation operators). This enables learning priors from incomplete measurements (Tweedie's formula doesn't say anything about this), a capability that traditional Gaussian-denoiser approaches lack. Furthermore, our empirical results demonstrate the superior performance of our method. We appreciate the reviewer's acknowledgment of our method's empirical advantages. We believe this work offers a valuable contribution to the field by providing a more generalized and effective approach to learning image priors.

Review
Rating: 3

This paper extends the work of [Hu et al., 2024c] for the regularization of inverse problems using a deep restoration prior trained on multiple restoration tasks. The authors theoretically analyze the induced objective function and the convergence of the proposed algorithm. Empirical results demonstrate that the framework outperforms Gaussian denoiser priors in image reconstruction tasks, including MRI and super-resolution.

Strengths

  1. The experiments show that training a deep restoration network on multiple tasks can improve its regularization capability.

  2. Theoretical analysis is also provided.

Weaknesses

  1. The paper seems limited in novelty and closely resembles [Hu et al., 2024c]. The main contribution is the training of a restoration prior across the same task with varying levels of ill-posedness. The authors should clearly explain how the proposed methodology differs from [Hu et al., 2024c].

  2. The authors claim that the proposed prior trained on a general restoration task outperforms Gaussian denoisers; however, there is a lack of sufficient experiments to support this claim. Gaussian denoisers can be used for solving general inverse problems, but the experiments presented here are confined to the same inverse problem and forward operator on which the prior was trained, although with different levels of ill-posedness. For example, in Section 5.1, the restoration prior is trained on MRI with an x8 acceleration and applied to the same problem with x4 and x6 acceleration factors. Although Algorithm 1 suggests that the proposed prior can be used to solve any other inverse problems, the experiments are limited to the same problem type. To validate the claimed superiority over Gaussian denoisers, the authors are requested to include experiments where the trained prior is applied to a different inverse problem from the one it was originally trained on. For example, training the prior on super-resolution and using it for solving MRI reconstruction. You can also train your prior on multiple distinct tasks as suggested by Algorithm 1, for example training the prior on super-resolution and image inpainting tasks, and use it for solving a different task like MRI. This comparison is crucial as Gaussian denoisers can solve any other inverse problem once trained.

Hu, Yuyang, et al. "A Restoration Network as an Implicit Prior." ICLR 2024.

Questions

Please take a look at the comments under Weaknesses.

Comment

We thank the Reviewer for their time and feedback. Please see below for our point-by-point responses to your comments.

The paper seems limited in novelty and closely resembles [Hu et al., 2024c]. The main contribution is the training of a restoration prior across the same task with varying levels of ill-posedness. The authors should clearly explain how the proposed methodology differs from [Hu et al., 2024c].

The revised manuscript explicitly highlights the differences between our work and Hu et al. [2024c]. Our work introduces a new theoretical framework and algorithm, resulting in significant performance gains, as extensively detailed in the paper. The similarities are superficial and do not reflect the underlying novelty of our approach.

Summary of the new contributions of this work:

  1. New regularizer: Our regularizer in eq. (6) is completely novel and different from that in [Hu et al. 2024c]. There is no analogous regularizer in the literature, as ours is the first one to provide a closed-form expression for a prior incorporating multiple restoration networks.

  2. New algorithm: Algorithm 1 is a completely novel stochastic algorithm that is different from the deterministic algorithm in [Hu et al. 2024c]. To the best of our knowledge, no existing iterative reconstruction algorithm in the literature uses a degradation-restoration pair sampled at each iteration (a simplified sketch of one such iteration is given after this list).

  3. New theory: Our Theorem 1 is completely novel. It shows that our regularizer can be efficiently minimized using SGD. [Hu et al. 2024c] doesn’t have any analogous result.

  4. Training from undersampled measurements: Our ShaRP method doesn’t require fully sampled MRI measurements for training, which is not possible using the DRP method in [Hu et al. 2024c]. This flexibility is essential for MRI and is a breakthrough not achieved by any existing PnP method in the literature.

  5. Much better performance: We show that ShaRP outperforms DRP from [Hu et al. 2024c] across all experiments. It is on average 3.65 dB better than DRP on CS-MRI in Table 1 and 0.5 dB better than DRP on super-resolution in Table 3.
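As referenced in contribution 2 above, below is a minimal illustrative sketch of one such stochastic iteration, written with a quadratic data-fidelity term. The function name, the constant step sizes gamma and tau, and the specific form of the restoration-based correction are simplifications we introduce here for exposition only; the exact update and weighting in Algorithm 1 differ in their details.

```python
import numpy as np

def sharp_like_iteration(x, y, A, deg_restore_pairs, gamma=1e-2, tau=1.0, rng=None):
    """One illustrative stochastic update: a data-fidelity gradient step plus a
    correction from a randomly sampled degradation-restoration pair (H_i, R_i).
    Simplified stand-in for exposition; not the exact update of Algorithm 1."""
    rng = rng if rng is not None else np.random.default_rng()

    # Gradient of the data-fidelity term g(x) = 0.5 * ||A x - y||^2
    # for linear measurements y = A x + e.
    grad_g = A.T @ (A @ x - y)

    # Sample one degradation operator H_i and its paired restoration network R_i.
    H, R = deg_restore_pairs[rng.integers(len(deg_restore_pairs))]

    # Stochastic restoration-based correction: push x toward images whose
    # degraded version H x is (approximately) a fixed point of R.
    Hx = H @ x
    prior_term = H.T @ (Hx - R(Hx))

    # Combined gradient-type step with a constant step size.
    return x - gamma * (grad_g + tau * prior_term)
```

With deg_restore_pairs containing only the identity operator paired with a Gaussian denoiser, this reduces to a standard denoiser-prior update; sampling structured degradation operators at each iteration is what distinguishes the stochastic scheme.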

Comment

The authors claim that the proposed prior trained on a general restoration task outperforms Gaussian denoisers; however, there is a lack of sufficient experiments to support this claim. Gaussian denoisers can be used for solving general inverse problems, but the experiments presented here are confined to the same inverse problem and forward operator on which the prior was trained, although with different levels of ill-posedness. For example, in Section 5.1, the restoration prior is trained on MRI with an x8 acceleration and applied to the same problem with x4 and x6 acceleration factors. Although Algorithm 1 suggests that the proposed prior can be used to solve any other inverse problems, the experiments are limited to the same problem type. To validate the claimed superiority over Gaussian denoisers, the authors are requested to include experiments where the trained prior is applied to a different inverse problem from the one it was originally trained on. For example, training the prior on super-resolution and using it for solving MRI reconstruction. You can also train your prior on multiple distinct tasks as suggested by Algorithm 1, for example training the prior on super-resolution and image inpainting tasks, and use it for solving a different task like MRI. This comparison is crucial as Gaussian denoisers can solve any other inverse problem once trained.

  1. Prompted by your feedback, we clarified our claims in the paper. Our claim is not that any restoration network is better than the Gaussian-denoising network for a given inverse problem. Rather, we argue that the versatility of our framework allows for tailoring the prior to better align with the specific characteristics of the inverse problem, providing an advantage over Gaussian-denoising networks. This perspective is supported by our experiments, which show that restoration networks trained to remove MRI artifacts at 8× can serve as effective priors for MRI reconstruction at 4× and 6× accelerations (Fig. 3). Our framework also remains compatible with Gaussian-denoising networks, as they can always be incorporated into our prior using H = I.

  2. Note that without our framework, the direct application of a restoration model trained for 8× MRI reconstruction to 4× and 6× results in a significant performance degradation (see Section C.2 of the supplement). Our framework enables a principled integration of the 8× model as a prior without retraining and without performance degradation.

  3. We have included a new experiment in Section E.1 of the supplement. Per your suggestion, we trained a super-resolution network for MRI and applied it as a restoration prior for solving 4× MRI reconstruction problems. As demonstrated in Section E.1, the SR model can serve as an effective restoration prior for the mismatched CS-MRI task, outperforming denoisers in certain scenarios, such as 4× uniform CS-MRI.

Comment

Thank you for revising the manuscript and conducting the requested experiments. As you explained in the revised manuscript and shown in Table 7, the primary advantage of the proposed algorithm happens when training a prior on the same forward model across different levels of ill-posedness. While this is interesting and can be practical for certain applications, I still believe the paper lacks sufficient novelty for publication at ICLR.

Comment

Thank you for the additional feedback. We provide additional responses below. Please let us know if there is anything we can provide to increase your score for our paper.

Thank you for revising the manuscript and conducting the requested experiments. As you explained in the revised manuscript and shown in Table 7, the primary advantage of the proposed algorithm happens when training a prior on the same forward model across different levels of ill-posedness. While this is interesting and can be practical for certain applications, I still believe the paper lacks sufficient novelty for publication at ICLR.

We respectfully disagree on the novelty, but let us clarify some of your misconceptions:

  1. Mathematically, our proposed prior in eq. (6) can include a wide variety of restoration networks, including Gaussian denoisers, super-resolution networks, inpainting networks, and MRI reconstruction networks. Nothing in our formulation restricts the framework to a single forward model. To the best of our knowledge, no prior work has introduced a prior with this level of versatility.
  2. Section 5.2 shows how our algorithm outperforms denoiser-prior baselines when the prior is trained on a different forward model. For example, a prior trained on deblurring performs better than a denoiser when applied to the single-image super-resolution task.
  3. Your comment overlooks another key advantage of our approach: the ability to train priors without requiring fully-sampled data. This is a significant benefit in scenarios where training Gaussian denoisers is infeasible, giving our framework a clear edge.
Comment

Thank you all for the feedback. We provide detailed answers to all the comments below. To better address some of them, we ran additional experiments. These results were included in the dedicated sections of the revised supplementary material. We have highlighted the revised parts in the paper and the supplementary material with orange.

Below, we address two general questions that highlight the motivation for our work:

Why go beyond Gaussian denoisers to general restoration priors?

An important point often overlooked in the literature is that Gaussian-denoising networks are suboptimal when applied to images degraded by other, non-Gaussian artifacts. In this paper, we derive a closed-form prior that leverages multiple restoration networks, which can better handle the non-Gaussian artifacts induced by the inverse problem. As shown in the paper, this leads to performance that significantly surpasses methods using Gaussian-denoising networks as priors.

Why propose a new prior based on restoring from multiple types of degradations?

The strength of our framework lies in its versatility, enabling seamless integration of a wide range of restoration models within a unified formulation. Note how our approach achieves substantial performance improvements over methods using Gaussian-denoising priors, as well as over the DRP method from Hu et al. (2024c), which relies on a single restoration model (see Tables 1 and 3). Additionally, in principle, our framework remains compatible with Gaussian-denoising networks, as they can be seamlessly incorporated into our prior using H = I. Furthermore, unlike Gaussian denoisers, ShaRP priors can be trained without fully sampled or clean measurement data, making them more versatile (see Table 2).
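As a purely illustrative example of this versatility (the operator constructions and network names below are hypothetical placeholders, not models from the paper), the same ensemble of degradation-restoration pairs can mix the Gaussian-denoising case H = I with operators tailored to the target problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32 * 32  # illustrative flattened image size


def random_undersampling_mask(n, keep_fraction, rng):
    """Diagonal 0/1 operator keeping a random subset of entries
    (a toy stand-in for an MRI undersampling mask)."""
    keep = (rng.random(n) < keep_fraction).astype(float)
    return np.diag(keep)


def gaussian_denoiser(z):
    return z  # placeholder for a trained Gaussian-denoising network


def artifact_removal_net(z):
    return z  # placeholder for a trained artifact-removal network


# Ensemble of degradation-restoration pairs: the identity pair recovers the
# Gaussian-denoiser prior, while the masked pairs encode structured degradations.
deg_restore_pairs = [
    (np.eye(n), gaussian_denoiser),
    (random_undersampling_mask(n, 0.25, rng), artifact_removal_net),
    (random_undersampling_mask(n, 0.125, rng), artifact_removal_net),
]
```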

Comment

Thank you all for the additional feedback. One aspect of the review process that surprised us was the assessment regarding a perceived lack of novelty in our work. We respectfully disagree with this evaluation and emphasize that any resemblance to prior work is purely superficial. While we recognize that no paper is without limitations, our work introduces several new ideas, theories, and results that we believe warrant recognition as novel contributions.

  1. More complex ensemble of denoisers: It is known that an ensemble of Gaussian denoisers trained at different noise levels does better than a single denoiser trained at one noise level. Our work shows that extending the “ensemble of denoisers” to include more complex (structured) noise types—due to each degradation operator—can further improve the performance. It is rather surprising that this non-trivial extension to a more complex “ensemble of denoisers” can still be used as an implicit prior. It is worth emphasizing that our framework can seamlessly incorporate both Gaussian denoisers and more complex structured denoisers into one closed-form implicit prior. This theoretical and conceptual novelty in our work deserves to be acknowledged.
  2. Training from undersampled measurements: It is not possible to train Gaussian denoisers without fully-sampled measurements. Our framework provides an elegant solution by suggesting to train restoration networks on available undersampled measurements and using them as an implicit prior without retraining. This is a non-trivial contribution that deserves acknowledgement.
  3. State-of-the-art performance: Our claims are backed up by impressive empirical performance on two separate inverse problems, MRI reconstruction and image super-resolution. This improvement due to the incorporation of non-Gaussian denoisers—or as we call them restoration operators—deserves to be acknowledged.
AC Meta-Review

This paper addresses deep learning reconstruction for computational imaging. It builds on the work on deep restoration priors by Hu et al. (2024) by introducing multiple degradation operators in place of one, and by using a different optimization algorithm, an adaptation of SNORE (Renaud et al., 2024b). This new combination results in improved numerical results, in particular on subsampled MRI.

Reviewers made positive remarks about the clarity and the numerical results, but there was broad agreement that the novelty of this paper relative to Hu et al. (2024) and other prior work is limited and not sufficient for acceptance at ICLR. While the rebuttal phase was intense, no fundamentally new information emerged from the discussions, and the reviewers made precise arguments to support their position.

The manuscript may benefit from a "full-disclosure" rewrite that proactively addresses the criticism of N69Z, XTiX, and E3Gi. A focus on domain-specific numerics (MRI, medical imaging in general) and practical impact may further strengthen the paper. The reviewers provided precise, detailed remarks that should help guide these revisions effectively.

While the leading-order criticisms were about novelty, part of the dialogue between the authors and the reviewers revolved around the meaning of "prior" and its relation to denoisers via Tweedie's formula. A prior distribution doesn't depend on a specific observation process. Priors learned by Gaussian denoisers (via Tweedie) and "priors" that result from restoration networks, which, as the authors explain, are tuned to a particular problem, are different objects. The authors are justified in stating that "In practice, the effectiveness of a denoiser is tied to how well the noise model it's trained on matches the actual noise in the image". But this is a statement about structured denoising more than about priors. A bona fide prior and a known forward model allow one to compute any estimator, at least in principle. They can be used to generate a dataset and do any kind of ML. The question is how to build effective algorithms that exploit this (with finite samples), and at this point one may obtain improved results with schemes such as the one proposed by the authors. This distinction might benefit from further clarification in the revised manuscript.

For these reasons I am recommending a rejection. Nonetheless, I hope that the authors will find the reviewers' feedback and the discussions valuable as they work on refining their paper.

Additional Comments on Reviewer Discussion

Most reviewers initially pointed out the limited novelty relative to Hu et al. (2024), Renaud et al. (2024b), and other prior work. The authors' rebuttal consisted primarily of amplifying their claims of novelty, but it did not address the reviewers' criticism in a substantial way. E3Gi responded by providing a very detailed argument for their assessment of novelty. The discussion then escalated in intensity, but no new facts emerged that would challenge these arguments. N69Z offered a similar assessment of novelty and in particular pointed out that the method is primarily useful when applied to the same imaging problem with different levels of ill-posedness. XTiX pointed out the universality of Gaussian denoisers in learning priors, to which the authors responded by discussing structured denoising, but not truly priors. bQ8e concurred with XTiX. The authors insisted on their interpretation, and I feel it might have been more productive had they acknowledged the fact that Gaussian denoisers fully characterize prior distributions, whereas structured denoisers don't.

My decision is based on a careful analysis of the discussions and on the fact that the reviewers' arguments were expert, substantial, and correct. The majority of the new information that emerged in the discussion came from the reviewers' side. Finally, there was a clear consensus among reviewers, especially after the rebuttals.

Final Decision

Reject