PaperHub
Score: 8.2 / 10 · Spotlight · 3 reviewers
Ratings: 5, 6, 4 (min 4, max 6, std dev 0.8)
Confidence: 3.3
Novelty: 2.3 · Quality: 3.0 · Clarity: 3.0 · Significance: 2.7
NeurIPS 2025

Zero-shot Denoising via Neural Compression: Theoretical and algorithmic framework

OpenReview · PDF
Submitted: 2025-05-11 · Updated: 2025-10-29
TL;DR

We propose a theoretically-founded compression-based zero-shot image denoising method.

Abstract

Keywords

compression-based denoising, zero-shot image denoising, neural compression

Reviews and Discussion

Review
Rating: 5

The authors propose a novel method for zero-shot image denoising via neural compression in the single-image setting. Specifically, given a single noisy image, an encoder-decoder neural network with a rate constraint is trained to compress and decompress overlapping patches extracted from the image. Once trained, the denoised image is reconstructed by applying the compression-decompression process to all patches and aggregating the outputs. The approach builds on information-theoretic principles, assuming that clean images are more structured and thus more compressible than noisy ones. The rate constraint naturally biases the model toward reconstructing structured (clean) components while suppressing unstructured (noisy) variations.
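To make the pipeline concrete, a minimal sketch is given below. The architecture, hyperparameters, and the crude rate proxy are illustrative assumptions, not the authors' implementation; the sketch only mirrors the train-on-patches, compress/decompress, and aggregate-overlaps structure described above. In the actual method the rate term would come from the neural compressor's rate estimate; the L1 proxy here is only to keep the sketch short.

```python
# Minimal sketch (PyTorch) of the patch-based compress-and-aggregate pipeline described
# above. Architecture, hyperparameters, and the L1 rate proxy are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

PATCH, STRIDE = 8, 1  # overlapping 8x8 patches (patch size assumed)

class PatchCompressor(nn.Module):
    def __init__(self, n_b=32):  # n_b: bottleneck (latent) dimension
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(PATCH * PATCH, 256),
                                 nn.ReLU(), nn.Linear(256, n_b))
        self.dec = nn.Sequential(nn.Linear(n_b, 256), nn.ReLU(),
                                 nn.Linear(256, PATCH * PATCH))

    def forward(self, patches):                    # patches: (B, 1, PATCH, PATCH)
        z = self.enc(patches)
        z_q = z + (torch.round(z) - z).detach()    # quantize with straight-through gradient
        rate = z_q.abs().mean()                    # crude stand-in for an entropy/rate term
        return self.dec(z_q).view_as(patches), rate

def zs_ncd_sketch(noisy, model, lam=1e-2, steps=2000, lr=1e-3):
    """Train on overlapping patches of one noisy (1, 1, H, W) image, then aggregate."""
    patches = noisy.unfold(2, PATCH, STRIDE).unfold(3, PATCH, STRIDE)
    patches = patches.reshape(-1, 1, PATCH, PATCH)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        idx = torch.randint(0, patches.shape[0], (256,))
        recon, rate = model(patches[idx])
        loss = F.mse_loss(recon, patches[idx]) + lam * rate  # distortion + lambda * rate
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                          # compress-decompress every patch
        recon, _ = model(patches)
    out, count = torch.zeros_like(noisy), torch.zeros_like(noisy)
    H, W = noisy.shape[-2:]
    k = 0
    for i in range(0, H - PATCH + 1, STRIDE):      # average overlapping reconstructions
        for j in range(0, W - PATCH + 1, STRIDE):
            out[..., i:i + PATCH, j:j + PATCH] += recon[k, 0]
            count[..., i:i + PATCH, j:j + PATCH] += 1
            k += 1
    return out / count
```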

Strengths and Weaknesses

The proposed method is well-justified theoretically and backed by finite-sample performance guarantees for both AWGN and Poisson noise models. Extensive experiments are conducted on diverse image domains (natural, microscopy, real-world camera images) and across multiple noise types. The results are consistently strong and demonstrate state-of-the-art performance among zero-shot methods. The architecture and optimization procedure are well-detailed, and the authors provide reproducibility information.

Questions

I do not have questions for the authors.

Limitations

The authors address the limitations in section 6.

Final Justification

I have no concerns about this paper.

Formatting Concerns

I do not have formatting concerns.

Author Response

We appreciate the reviewer’s positive feedback.

Review
Rating: 6

This paper is composed of two parts. The first part focuses on the relationship between compression and denoising, and presents several theoretical results that give upper bounds on the denoising performance of a compression-based denoiser. The second part applies compression-based denoising to the "zero-shot" setting, where only a single noisy image is available. The original high-resolution image is broken up into small patches that are used to train a neural compressor. Numerical experiments demonstrate that the method performs comparably to other zero-shot baselines.

Strengths and Weaknesses

Compression-based denoising

The paper makes novel contributions to compression-based denoising through Theorems 1 to 3, which show that better compression performance (as measured by rate and distortion) leads to an improved upper bound on denoising performance. However, it is in my opinion difficult to evaluate these results due to the choice of setting. Indeed, denoising performance is here measured in the worst case over a deterministic class of signals, rather than as average MSE over a signal distribution, as is more common. In particular, the latter is the metric that is used in the numerical comparisons of Table 1. Thus, it is unclear if additional factors compared to optimal MMSE performance (such as $\log_2 n$ in Corollary 1) are due to this choice of setting, looseness of the bounds, or a fundamental underperformance of compression-based denoising.

Related to the above point, the authors propose to select the codeword $c$ via a maximum a posteriori procedure, rather than minimum mean squared error (eqs. (2) vs (3)). Although it is stated that the former is always the best choice, it seems to me that the upper bound in Theorem 3 becomes more advantageous than that of Theorem 2 in the regime of small rates and large distortions.

Application to zero-shot denoising

A strength of the paper is its experimental results, which show superior zero-shot denoising performance to other approaches. I note that there are other very recent concurrent works, which build on ZS-N2N and may be relevant to this paper:

Bai, J., Zhu, D., & Chen, M. (2025). Dual-sampling noise2noise: Efficient single image denoising. IEEE Transactions on Instrumentation and Measurement.

Ma, Q., Jiang, J., Zhou, X., Liang, P., Liu, X., & Ma, J. (2025). Pixel2Pixel: A Pixelwise Approach for Zero-Shot Single Image Denoising. IEEE Transactions on Pattern Analysis and Machine Intelligence.

As defined by NeurIPS policy for concurrent works, these works do not qualify as prior work, but could be of interest to the authors.

A critical component of the method is the chosen patch size. I could not find it mentioned in the paper (though inspection of the figure indicates that it might be $8 \times 8$?). How was it selected? It seems to me that different choices of patch sizes would greatly affect the results. For instance, a small patch size limits the range of spatial dependencies in the signal that can be exploited by the denoiser, effectively limiting it to small noise levels. However, a large patch size leads to a small number of patches available to train the compressor.

Questions

  • Do the authors expect the dependence of their bounds on the rate $R$ and distortion $\delta$ to be tight?
  • The authors propose to select the RD tradeoff parameter $\lambda$ such that the distortion of the compressed noisy patches becomes comparable to the noise level. Another natural choice would be to optimize the upper bounds given in the Theorems along the RD curve. For instance, Theorem 1 suggests setting $\lambda = (8\ln 2)(1 + 2\sqrt{\eta})^2\frac{\sigma_z^2}{n}$. How do these two criteria compare?
  • More broadly, I am surprised that "compression performs better at denoising than denoising". That is, why would compressing $8 \times 8$ patches outperform an approach tailored to denoising, such as SURE or any other mentioned in the paper? The authors mention an increased robustness to overfitting, but it is not intuitive to me why a compressor would overfit less than a denoiser (e.g., in SURE, the divergence of the denoiser is explicitly regularized).

Limitations

yes

Final Justification

I have no remaining concerns.

Formatting Concerns

none

Author Response

We thank the reviewer for their valuable feedback. We first address the specific questions raised, followed by a response to the points noted in the weaknesses section. In our response to Question 1, we also address some of the comments regarding the choice of setting in the theoretical results raised in the Strengths and Weaknesses section.

Q1. Tightness of bounds in Section 3.

Our analysis is framed in a deterministic, high-probability setting: we provide uniform high-probability guarantees on reconstruction error for signals in a fixed class, without assuming any prior distribution. This framework is common in the information-theoretic literature on compression-based inference (e.g., compression-based compressed sensing), and it offers practical advantages, particularly in the zero-shot denoising scenario where the underlying data distribution may be unknown or difficult to model. The deterministic nature of our bounds ensures they hold uniformly over the signal class, rather than in expectation.

Regarding the tightness of the bounds, in the discussion following Corollary 1, we have compared our result with the optimal Bayesian performance (in terms of information dimension). In the revised version, we will also include a comparison to known minimax rates for $k$-sparse signal recovery under Gaussian noise. For the case $k=k_n$ such that $k_n/n\to 0$ as $n$ grows without bound, the minimax risk is known to scale as $\frac{\sigma_z^2 k \log(n/k)}{n}$ [Johnstone2019] [Donoho1994] [Donoho1992]. This further confirms the tightness of our bound in the worst-case setting, and the potential optimality of compression-based denoising.

Finally, while in Table 1 we report average denoising performance, our method is designed to perform well on each individual instance, consistent with the worst-case theoretical guarantees we provide. This aligns with the goals of zero-shot denoising, where robustness to the specific realization is more critical than expected-case optimality.

[Johnstone2019] Johnstone, I. M. Gaussian Estimation: Sequence and Wavelet Models.

[Donoho1994] Donoho, D. L., et al. Minimax risk over $\ell_p$ balls for $\ell_q$ losses.

[Donoho1992] Donoho, D. L., et al. Maximum entropy and the nearly black object.

Q2. Selecting $\lambda$ based on theoretical upper bounds.

Theorem 1 provides a non-asymptotic upper bound on the reconstruction error, assuming that a compression code with a given rate $R$ and distortion $\delta$ is available. The constants appearing in the bound (such as $(8 \ln 2)(1 + 2\sqrt{\eta})^2$) characterize the relationship between rate and MSE, but they are not intended to prescribe a specific choice of the Lagrange multiplier $\lambda$ used in training.

Indeed, in principle, one could consider minimizing the upper bound in Theorem 1 as a function of $\delta$ (or $R$), which in turn could suggest an "optimal" rate-distortion operating point for denoising. However, solving such an optimization is highly non-trivial. Specifically, note that the upper bound in Theorem 1 can be written as

$$u(\delta) = \sqrt{\delta} + \sigma_z \zeta \sqrt{R(\delta)},$$

where $\zeta = 2 \sqrt{\frac{2 \ln 2}{n}}(1 + 2\sqrt{\eta})$ and $R(\delta)$ is the rate corresponding to distortion $\delta$. Differentiating and setting $u'(\delta) = 0$ gives

$$\frac{1}{2\sqrt{\delta}} + \sigma_z \zeta \cdot \frac{R'(\delta)}{2\sqrt{R(\delta)}} = 0.$$

However, since the function $R(\delta)$ is generally not known in closed form for real-world data distributions or learned compressors, this optimization cannot be carried out analytically or even tractably in practice.

In contrast, during training, $\lambda$ is used as a Lagrange multiplier in a rate-distortion objective to implicitly select an operating point on the empirical RD curve of the single image. This RD curve is not analytically accessible and varies from image to image, making the mapping from $\lambda$ to $(R, \delta)$ nontrivial and data-dependent.

Our proposed heuristic, choosing $\lambda$ such that the distortion roughly matches the noise level, is based on signal recovery intuition and performs well empirically, as demonstrated in our experiments and in Section 4, "Setting the hyperparameter $\lambda$".
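As a rough illustration of this heuristic (not the paper's implementation; the trainer is passed in as a hypothetical callable and the candidate grid for $\lambda$ is arbitrary), one could sweep a few values of $\lambda$ and keep the one whose compression distortion on the noisy patches is closest to the noise variance:

```python
import numpy as np

def pick_lambda(train_fn, noisy_patches, sigma, lams=(1e-3, 3e-3, 1e-2, 3e-2, 1e-1)):
    """Pick the lambda whose per-pixel compression distortion is closest to sigma^2.

    train_fn(noisy_patches, lam) is a hypothetical callable that trains a rate-penalized
    compressor at the given lambda and returns the compress-decompress reconstruction.
    """
    best_lam, best_gap = None, float("inf")
    for lam in lams:
        recon = train_fn(noisy_patches, lam)                 # one operating point on the RD curve
        distortion = float(np.mean((recon - noisy_patches) ** 2))
        gap = abs(distortion - sigma ** 2)
        if gap < best_gap:
            best_lam, best_gap = lam, gap
    return best_lam
```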

Q3. Seemingly surprising effectiveness of compression at denoising.

This is an insightful and intuitive question. While it may seem surprising that a compressor, which is not explicitly trained to denoise, can outperform methods designed specifically for denoising, the idea of using compression codes to solve inverse problems such as denoising and compressed sensing has a long history in information theory and signal processing. Prior work has shown that lossy compression codes can be used as a powerful implicit prior for solving inverse problems (see, e.g., [Donoho2002, Weissman2005, Jalali2016, Rezagah2017, Chang1997, Chang2000, Natarajan2002]).

In the zero-shot setting, where only a single noisy observation is available and no clean data is accessible, most learning-based denoisers rely on indirect forms of regularization to prevent overfitting. These include masking strategies (ZS-N2S), architectural bias (DIP), or underparameterized networks (Deep Decoder). While useful, these techniques are often heuristic and do not provide a principled way to control information flow from input to output.

In contrast, compression-based denoising imposes a fundamental information-theoretic constraint via the rate-distortion trade-off. This restricts the number of bits that can pass through the model's bottleneck, acting as a principled regularizer that prevents the model from memorizing noise. The model is therefore implicitly biased toward compressible (structured) reconstructions, which aligns well with the nature of clean signals.
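Schematically, in our own notation (not necessarily the paper's), the objective enforcing this constraint is the usual rate-distortion Lagrangian applied to the noisy patches $y_1,\ldots,y_N$, with encoder $f_\phi$, decoder $g_\theta$, quantizer $Q$, and rate estimate $\hat{R}$:

$$\min_{\phi,\theta}\ \frac{1}{N}\sum_{i=1}^{N}\Big[\big\|y_i - g_\theta\big(Q(f_\phi(y_i))\big)\big\|_2^2 + \lambda\,\hat{R}\big(Q(f_\phi(y_i))\big)\Big].$$

Increasing $\lambda$ tightens the bit budget and strengthens the bias toward structured, compressible reconstructions.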

Although SURE provides an unbiased estimate of the MSE under Gaussian noise with known variance, its effectiveness depends on accurate noise modeling and stable estimation of the divergence term. In practice, especially in the zero-shot setting with a single noisy image and no access to clean data, these conditions may not hold. Moreover, SURE implementations often rely on Monte Carlo approximations of the divergence, which can introduce significant variance, particularly in deep neural networks.

Thus, while traditional denoising methods are explicitly designed for the task, compression-based approaches like ZS-NCD offer a robust and principled alternative in data-scarce regimes, combining inductive bias with effective optimization.

[Chang1997] Chang, S. G., et al. Image denoising via lossy compression and wavelet thresholding.

[Chang2000] Chang, S. G., et al. Adaptive wavelet thresholding for image denoising and compression.

[Donoho2002] Donoho, D. L. The Kolmogorov sampler.

[Natarajan2002] Natarajan, B. K. Filtering random noise from deterministic signals via data compression.

[Weissman2005] Weissman, T., et al. The empirical distribution of rate-constrained source codes.

[Jalali2016] Jalali, S., et al. From compression to compressed sensing.

[Rezagah2017] Rezagah, F. E., et al. Compression-based compressed sensing.

[Soltanayev2018] Soltanayev, S., et al. Training deep learning based denoisers without ground truth data.

[Metzler2018] Metzler, C. A., et al. Unsupervised learning with Stein's unbiased risk estimator.

W1. Better bound in Theorem 3 over Theorem 2.

Note that while the bound in Theorem 2 is on the MSE, the bound in Theorem 3 is on the RMSE. To compare the results of the two theorems, we can square the bound in Theorem 3, which shows that while the MSE in Theorem 2 scales as $\sqrt{R/n}$, the MSE in Theorem 3 scales as $R^2/n$. This suggests that, unlike in Theorem 2, for the bound in Theorem 3 to be small, the rate $R$ cannot scale linearly with $n$. Therefore, minimizing the log-likelihood over a compression code should, in principle, yield better performance under Poisson noise. However, due to the highly non-convex nature of the log-likelihood loss, minimizing the MSE loss may lead to better performance in practice. A brief discussion of this observation, along with empirical results, is provided in Appendix B.2.
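Schematically, the comparison above can be summarized as follows (constants and lower-order terms omitted; this restates the scalings discussed, not the exact statements of the theorems):

$$\text{Thm. 2: } \mathrm{MSE} \lesssim \sqrt{R/n}, \qquad \text{Thm. 3: } \mathrm{RMSE} \lesssim R/\sqrt{n}\ \Longrightarrow\ \mathrm{MSE} \lesssim R^2/n.$$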

W2. Two concurrent zero-shot denoisers.

We thank the reviewer for bringing these results to our attention. The recent works advance the ZS-N2N framework. We will cite both references in the Related Work Section (Section 2) of our manuscript. Additionally, we have tested DS-N2N and Pixel2Pixel on the Kodak24 dataset under AWGN using the public code provided by the authors. Average PSNR/SSIM over the 24 images are reported in the table below.

| Method | $\sigma=15$ | $\sigma=25$ | $\sigma=50$ |
| --- | --- | --- | --- |
| DS-N2N | 32.31 / 0.8803 | 29.64 / 0.8044 | 25.42 / 0.6378 |
| Pixel2Pixel | 31.31 / 0.8707 | 29.89 / 0.8098 | 26.55 / 0.6873 |
| ZS-NCD (ours) | 33.18 / 0.9026 | 30.60 / 0.8144 | 27.89 / 0.7464 |

[Bai2025] DS-N2N: Bai, J., et al. Dual-sampling noise2noise: Efficient single image denoising.

[Ma2025] Pixel2Pixel: Ma, Q., et al. Pixel2Pixel: A Pixelwise Approach for Zero-Shot Single Image Denoising.

W3. Patch size in ZS-NCD.

We thank the reviewer for pointing this out. We indeed use a patch size of $8 \times 8$ in all experiments and will clearly state this in the revised version of the paper.

Empirically, we observe that ZS-NCD is fairly robust to patch size, as long as the size is within a reasonable range, i.e., large enough to contain meaningful structure, but small enough to ensure enough training data per image.

To support this, we include below denoising results of ZS-NCD for different patch sizes, reported as PSNR (dB)/SSIM, on the kodim05 image from the Kodak24 dataset under AWGN.

| $\sigma$ | 4×4 | 8×8 | 16×16 | 32×32 |
| --- | --- | --- | --- | --- |
| 15 | 31.76 / 0.9246 | 31.63 / 0.9286 | 31.66 / 0.9285 | 31.70 / 0.9280 |
| 25 | 28.46 / 0.8439 | 28.81 / 0.8615 | 29.01 / 0.8647 | 28.92 / 0.8621 |
| 50 | 24.79 / 0.7264 | 25.60 / 0.7608 | 25.58 / 0.7608 | 25.60 / 0.7613 |
Comment

Thank you for your detailed reply, which has addressed my questions. It is interesting to see that indeed the optimal patch size increases with the noise level. It could be interesting in future work to study this further.

I will increase my score. This paper makes valuable theoretical and practical contributions to the problem of zero-shot denoising.

Review
Rating: 4

Focus Issue

Zero-shot denoising: Denoising observations without training samples or clean reference images.

Contributions

The paper proposes the Zero-Shot Neural Compression Denoiser (ZS-NCD):
1. It optimizes on patches extracted from a single noisy image. The final reconstruction is obtained by aggregating outputs from the trained model over overlapping patches.
2. It provides novel finite-sample theoretical results describing the achievable upper bound on the reconstruction error for maximum likelihood-based compression denoisers, further establishing the theoretical foundation of compression-based denoising.

Performance

ZS-NCD naturally avoids overfitting and eliminates the need for manual regularization or early stopping. ZS-NCD achieves state-of-the-art performance among zero-shot denoisers for both Gaussian and Poisson noise removal tasks.

Strengths and Weaknesses

Quality and Clarity

The presentation of key issues and solutions is relatively clear, and the figures and tables are displayed in a neat and concise manner. Therefore, I tend to give a positive rating in terms of Quality and Clarity.

Significance and Originality

The proposed problem, "Zero-shot denoising via neural compression," is valuable, and its innovativeness is also recognizable. However, the baseline models used for comparison are somewhat outdated, with most dating back to 2020 or earlier. Additionally, I do not see a significant lead in the performance metrics. Therefore, the claim that "ZS-NCD achieves state-of-the-art performance among zero-shot denoisers for both Gaussian and Poisson noise removal tasks" fails to convince me. Furthermore, the experimental discussion regarding "naturally avoiding overfitting and eliminating the need for manual regularization or early stopping" is overly simplistic and almost glossed over. In summary, I tend to give a relatively conservative rating.

Questions

In addition to the doubts raised in the "Strengths And Weaknesses" section, the article does not present the performance of the proposed method on completely clean images. Specifically, ZS-NCD achieves the desired results by training on overlapping patches from the same image. This raises the following questions:

Will it significantly degrade the quality of clean images?

Is the computational cost acceptable since a model has to be trained for each individual image?

Does a model trained on a specific image possess generalizability? If not, would it be impossible to complete the denoising task for a large number of images within a reasonable timeframe?

If the authors can address these questions, I would consider raising the score.

Limitations

Yes

Final Justification

We still have some unresolved concerns, and the authors have not provided further clarification. Nevertheless, we believe that this method represents the current state-of-the-art (SOTA), and some concerns are inherent issues within the field. Therefore, we have decided to give a score of 4.

Formatting Concerns

The text in figures such as Figure 6 is too small.

The appendix simply lists images without adequate elaboration.

Author Response

We thank the reviewer for their valuable feedback. We first address the specific questions raised, followed by a response to the points noted in the weaknesses section.

Q1. Will it significantly degrade the quality of clean images?

ZS-NCD learns a lossy compression code using the patches extracted from the input noisy image. When the input is a noisy image, enforcing a rate-distortion trade-off via a penalty on the rate effectively suppresses noise and yields a denoised image. The denoising performance depends on the rate constraint. To achieve the best performance, one needs to set the rate penalty such that the operating rate-distortion point is consistent with the noise level. If the input image is a clean, noise-free image, the output may or may not be distorted depending on the rate penalty. In the case of no rate penalty ($\lambda = 0$), ZS-NCD is able to recover an almost lossless reconstruction of the clean image, provided the network bottleneck has sufficient capacity for the required entropy (as the distortion goes to zero, the required rate grows without bound).

This contrasts with methods like ZS-N2N, ZS-N2S and S2S, which inherently cannot recover a clean image due to disjoint-pixel training strategies or masking.

To further clarify this point, we empirically evaluated different methods on a clean image (kodim05 from the Kodak24 dataset). The results, reported as PSNR (dB)/SSIM, are shown in the following table. $n_b$ denotes the dimension of the latent code in the bottleneck of ZS-NCD.

| Method | kodim05 ($\sigma=0$) |
| --- | --- |
| DIP | 42.02 / 0.9913 |
| DD | 28.14 / 0.8736 |
| ZS-N2N | 33.46 / 0.9845 |
| ZS-NCD ($n_b=32$) | 35.52 / 0.9797 |
| ZS-NCD ($n_b=128$) | 54.57 / 0.9997 |

Q2. Is the computational cost acceptable since a model has to be trained for each individual image?

Training a model per noisy image is an inherent aspect of zero-shot learning-based denoisers, which assume no access to training datasets or noisy/clean image pairs. This is true for all zero-shot learning-based denoisers, not just ZS-NCD. It implies that, compared to semi-supervised and supervised methods, such algorithms require higher computational complexity at test time. However, the trade-off is justified in cases where access to data is very limited, as in various biomedical and scientific imaging applications.

We provide timing and hardware details in Appendix C (lines 638–643). Note that ZS-NCD has not yet been optimized for speed. We believe that there is considerable room for reducing the training time through more efficient architecture designs, weight reuse and adaptive training. The goal of this work is to both empirically and theoretically demonstrate the promise of compression-based zero-shot learning-based denoising, as a novel framework.

Q3. Does a model trained on a specific image possess generalizability? If not, would it be impossible to complete the denoising task for a large number of images within a reasonable timeframe?

This is an interesting question. The notion of generalizability is different in zero-shot learning-based denoisers compared to supervised methods. In zero-shot settings, the model is trained and evaluated on a single noisy image. Therefore, generalization beyond the observed image is not a goal. However, exploring generalization across images is an intriguing question. Zero-shot denoising methods, such as DIP and deep decoder, are image-dependent and fail completely when applied to an unseen noisy image. On the other hand, ZS-NCD, due to its compression-based structure and the fact that it is trained on image patches, shows promising generalizability when trained on a noisy image and applied to another unseen noisy image.

To empirically verify this point, in the following two tables, we report the performance of ZS-NCD trained on a noisy image and used to denoise another image, for two different noise levels ($\sigma=15$ and $\sigma=50$). In each case, we also report the performance achieved when ZS-NCD is used in its original form, i.e., using the same training and test image. It can be observed that, despite the expected performance drop, ZS-NCD still shows reasonable and promising generalization. kodim01 and kodim05 are chosen from the Kodak24 dataset.

| AWGN ($\sigma=15$) | Test on kodim01 | Test on kodim05 |
| --- | --- | --- |
| Train on kodim01 | 31.28 / 0.9059 | 26.71 / 0.8944 |
| Train on kodim05 | 30.29 / 0.8874 | 31.63 / 0.9286 |

| AWGN ($\sigma=50$) | Test on kodim01 | Test on kodim05 |
| --- | --- | --- |
| Train on kodim01 | 26.01 / 0.7245 | 23.59 / 0.7300 |
| Train on kodim05 | 25.19 / 0.6966 | 25.60 / 0.7608 |

W1. Comparison with more recent baselines.

The most recent state-of-the-art learning-based zero-shot denoiser we were aware of at the time of submission was ZS-N2N [Mansour2023], published in 2023. ZS-N2N is included in our comparisons, as reported, for example, in Table 1 of the manuscript. We have since become aware of two concurrent zero-shot denoising methods, DS-N2N [Bai2025] and Pixel2Pixel [Ma2025], pointed out to us by Reviewer FvQP. Both algorithms are variations of ZS-N2N [Mansour2023].

In the revised version of our manuscript, we will cite both papers in the Related Work section. Additionally, we have tested DS-N2N and Pixel2Pixel on the Kodak24 dataset using the public code provided by the authors on our own machines, in line with our policy of reproducing all denoisers' performance under the same conditions for fairness. The table below compares these methods with our proposed ZS-NCD under AWGN, reporting the average PSNR (dB) and SSIM over the 24 images in the Kodak24 dataset.

| Method | $\sigma=15$ | $\sigma=25$ | $\sigma=50$ |
| --- | --- | --- | --- |
| ZS-N2N (2023) | 32.30 / 0.8650 | 29.54 / 0.7798 | 25.82 / 0.6151 |
| DS-N2N (2025) | 32.31 / 0.8803 | 29.64 / 0.8044 | 25.42 / 0.6378 |
| Pixel2Pixel (2025) | 31.31 / 0.8707 | 29.89 / 0.8098 | 26.55 / 0.6873 |
| ZS-NCD (ours) | 33.18 / 0.9026 | 30.60 / 0.8144 | 27.89 / 0.7464 |

[Mansour2023] ZS-N2N: Mansour, Y., & Heckel, R. (2023). Zero-shot noise2noise: Efficient image denoising without any data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 14018-14027).

[Bai2025] DS-N2N: Bai, J., Zhu, D., & Chen, M. (2025). Dual-sampling noise2noise: Efficient single image denoising. IEEE Transactions on Instrumentation and Measurement.

[Ma2025] Pixel2Pixel: Ma, Q., Jiang, J., Zhou, X., Liang, P., Liu, X., & Ma, J. (2025). Pixel2Pixel: A Pixelwise Approach for Zero-Shot Single Image Denoising. IEEE Transactions on Pattern Analysis and Machine Intelligence.

W2. Clarification on performance claims of ZS-NCD.

The claim refers to ZS-NCD's performance among learning-based zero-shot denoisers. To ensure that the wording of the claim is clear, in the revised version, we will emphasize that the comparison is within learning-based zero-shot denoising methods. Specifically, we will update that sentence as follows: ZS-NCD achieves state-of-the-art performance among learning-based zero-shot denoisers for both Gaussian and Poisson noise, as supported by the empirical results reported in Table 1 and Table 2 of the manuscript. In both tables, we have included the performance of BM3D as a strong classical non-learning-based baseline to provide broader context.

W3. Clarification on overfitting claims.

The statement in the abstract is further elaborated in Section 5, "Robustness to Overfitting" (lines 282–290), and is empirically supported by Figure 2. Specifically, ZS-NCD leverages the entropy constraint of the compression-based architecture, which inherently limits the amount of information that can pass through the bottleneck. As a result, the model avoids overfitting, despite being trained on all noisy pixels. This is a key advantage of zero-shot compression-based denoising compared to other zero-shot approaches.

This claim is further verified in Figure 2, which shows the PSNR of the denoised image over training iterations. It can be observed that the PSNR of ZS-NCD improves steadily and does not show the degradation that is typical of overfitting. In contrast, overfitting is a known issue in other zero-shot denoising methods and is typically mitigated manually. For example:

  • DIP and Deep Decoder rely on manual early stopping or under-parameterization.
  • ZS-N2S, S2S and ZS-N2N modify the loss function to exclude some pixels, which is another form of manual regularization.

In contrast, ZS-NCD does not require any of these manual heuristics or architectural constraints. In our approach, overfitting is prevented naturally by the compression-based bottleneck, which does not allow the model to reproduce the high-entropy noisy input image. We believe that this is a key advantage of our method.

Paper formatting concerns.

We will update the manuscript to address all the formatting issues and add detailed explanations for the figures in the appendix.

Comment

Thank you for addressing my questions with experimental results and detailed explanations.

The authors selectively chose some images (kodim01, kodim05) to provide new experimental results, which do not comprehensively cover all images in Kodak24, so the results do not fully convince me. Nevertheless, considering that this paper achieves state-of-the-art (SOTA) performance, I have decided to raise my rating at the end.

Comment

We thank the reviewer for recognizing the contribution of our work and deciding to raise the rating. Regarding generalizability, as we explained before, it is outside the scope of the zero-shot denoising problem. However, to address the reviewer's comment, in our initial rebuttal we provided some simulation results that explored the generalization behavior of ZS-NCD by training the model on a noisy image and testing it on a different image.

In the simulation results presented in our initial rebuttal (in response to Q3), we picked two random images, namely kodim01 and kodim05, from the Kodak24 dataset. In the following tables, we provide a comprehensive comparison of the performance of ZS-NCD under two different settings.

The first table reports the performance achieved by ZS-NCD when applied in the original zero-shot setting, i.e., trained and tested on the same image. In other words, for each image in the Kodak24 dataset, we first created its noisy version (according to the desired noise power) and trained a ZS-NCD model. Then, we denoised the same image using the trained model. The reported number is the average performance (PSNR and SSIM) achieved across all 24 images for $\sigma=15$ and $\sigma=50$.

The following two tables present the average performance of ZS-NCD when there is a mismatch between the training and testing images. Column $i$ in the table, $i=1,\ldots,24$, corresponds to the average performance when a noisy version of image $i$ from the Kodak24 dataset is used for training. The reported average performance is computed on the remaining 23 images.

It can be observed that, while the performance drops when ZS-NCD is used in this alternative setting, typically the achieved performance remains reasonable.

Our final table is a $24\times 24$ table. Let $i$ and $j$ denote the row and column indices, respectively. The $(i,j)$-th entry in this table is the PSNR achieved in denoising the $i$-th image in the Kodak24 dataset when the ZS-NCD model is trained on image $j$ in the dataset. Here, the noise level is set as $\sigma = 15$. When comparing the entries within the $i$-th column, we observe that even when there is a mismatch between the training and test image (i.e., $i \neq j$), the model still achieves reasonable denoising performance. Also, for each image $i$, the highest PSNR in row $i$ corresponds to the model that is trained on image $i$, which aligns with the zero-shot setting.

| | AWGN ($\sigma=15$) | AWGN ($\sigma=50$) |
| --- | --- | --- |
| ZS-NCD (ours) | 33.18 / 0.9026 | 27.89 / 0.7464 |
| ($\sigma=15$) kodim## | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSNR | 27.54 | 25.75 | 30.38 | 27.95 | 31.51 | 27.23 | 29.52 | 26.95 | 30.04 | 26.79 | 28.99 | 28.98 | 26.94 | 31.32 | 30.30 | 26.01 | 27.70 | 31.18 | 28.46 | 27.29 | 28.21 | 29.98 | 30.41 | 29.62 |
| SSIM | 0.8607 | 0.8505 | 0.8746 | 0.8723 | 0.8696 | 0.8655 | 0.8749 | 0.8540 | 0.8853 | 0.8607 | 0.8789 | 0.8647 | 0.8378 | 0.8720 | 0.8749 | 0.8633 | 0.8745 | 0.8730 | 0.8752 | 0.8713 | 0.8724 | 0.8797 | 0.8637 | 0.8731 |

| ($\sigma=50$) kodim## | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSNR | 24.55 | 23.30 | 25.84 | 24.97 | 26.56 | 24.71 | 25.53 | 24.29 | 25.96 | 24.49 | 24.98 | 24.64 | 23.91 | 26.55 | 25.79 | 23.45 | 24.55 | 26.39 | 25.00 | 24.46 | 25.10 | 25.88 | 26.31 | 26.11 |
| SSIM | 0.6838 | 0.7008 | 0.7080 | 0.6939 | 0.6731 | 0.6822 | 0.7002 | 0.6496 | 0.7141 | 0.7058 | 0.7079 | 0.6882 | 0.6262 | 0.6906 | 0.6925 | 0.6807 | 0.7085 | 0.6952 | 0.7138 | 0.7098 | 0.6983 | 0.7186 | 0.7020 | 0.7012 |
Comment
| ($\sigma=15$) Test \ Train | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 01 | 31.28 | 27.79 | 28.46 | 29.01 | 30.29 | 29.44 | 29.01 | 28.90 | 29.93 | 27.66 | 29.82 | 28.79 | 26.36 | 29.97 | 28.78 | 26.77 | 28.30 | 30.00 | 29.28 | 26.41 | 29.66 | 29.60 | 27.91 | 29.73 |
| 02 | 29.05 | 33.93 | 32.38 | 28.20 | 32.06 | 30.66 | 32.99 | 24.60 | 29.02 | 22.58 | 32.54 | 30.38 | 15.86 | 32.95 | 31.93 | 18.00 | 21.87 | 31.02 | 30.12 | 19.08 | 24.78 | 29.92 | 31.90 | 24.75 |
| 03 | 23.23 | 23.33 | 35.87 | 23.55 | 32.05 | 23.53 | 31.17 | 22.45 | 29.44 | 21.91 | 24.82 | 27.98 | 24.11 | 31.93 | 29.28 | 23.05 | 25.46 | 31.70 | 27.02 | 25.49 | 23.74 | 28.34 | 32.92 | 26.41 |
| 04 | 26.64 | 26.28 | 32.26 | 34.20 | 32.67 | 26.15 | 31.25 | 26.63 | 30.34 | 28.08 | 28.44 | 31.08 | 19.92 | 32.10 | 32.44 | 21.23 | 23.50 | 30.86 | 27.31 | 21.12 | 25.89 | 31.95 | 32.56 | 27.68 |
| 05 | 26.71 | 25.08 | 28.33 | 26.68 | 31.64 | 26.43 | 28.73 | 26.28 | 28.98 | 25.65 | 27.92 | 26.93 | 26.45 | 30.08 | 29.06 | 25.35 | 27.15 | 29.79 | 27.59 | 26.51 | 27.11 | 28.76 | 28.24 | 28.50 |
| 06 | 29.01 | 28.13 | 29.97 | 26.64 | 30.93 | 32.12 | 29.81 | 27.33 | 30.33 | 27.67 | 29.97 | 29.21 | 28.56 | 30.90 | 29.30 | 29.08 | 28.04 | 30.54 | 28.15 | 27.86 | 29.49 | 30.11 | 28.75 | 30.41 |
| 07 | 28.13 | 28.77 | 33.76 | 29.04 | 33.65 | 29.17 | 35.36 | 26.24 | 31.86 | 26.88 | 30.33 | 31.03 | 26.62 | 33.46 | 32.39 | 27.73 | 28.64 | 32.75 | 30.96 | 28.70 | 28.97 | 32.73 | 33.37 | 30.75 |
| 08 | 26.69 | 23.06 | 24.51 | 26.30 | 28.44 | 24.85 | 25.37 | 31.27 | 27.71 | 27.04 | 27.26 | 26.18 | 27.04 | 27.64 | 27.26 | 24.65 | 26.76 | 28.32 | 25.82 | 26.67 | 27.02 | 27.74 | 26.24 | 28.28 |
| 09 | 28.47 | 25.64 | 32.66 | 29.41 | 33.70 | 26.53 | 31.22 | 28.95 | 35.06 | 29.83 | 30.83 | 30.58 | 30.18 | 33.16 | 32.64 | 29.23 | 31.81 | 33.14 | 31.67 | 31.36 | 31.41 | 31.48 | 33.04 | 32.49 |
| 10 | 30.49 | 26.62 | 31.62 | 32.59 | 33.55 | 27.63 | 31.26 | 31.40 | 33.41 | 34.71 | 32.35 | 30.84 | 31.07 | 33.05 | 32.95 | 30.85 | 31.86 | 33.15 | 31.20 | 31.78 | 32.52 | 31.82 | 32.64 | 33.10 |
| 11 | 28.91 | 27.27 | 30.50 | 30.07 | 31.56 | 28.98 | 30.64 | 27.29 | 30.39 | 28.19 | 32.40 | 29.85 | 27.38 | 31.49 | 30.74 | 27.62 | 28.14 | 31.34 | 28.90 | 27.58 | 28.78 | 30.99 | 30.28 | 30.96 |
| 12 | 30.91 | 26.76 | 33.05 | 30.68 | 33.19 | 29.71 | 30.60 | 29.85 | 30.97 | 30.50 | 30.04 | 34.90 | 29.52 | 32.81 | 33.43 | 28.79 | 29.53 | 32.21 | 27.73 | 27.76 | 30.08 | 30.59 | 32.59 | 32.23 |
| 13 | 26.11 | 23.58 | 25.87 | 25.18 | 27.65 | 26.16 | 25.88 | 24.35 | 26.96 | 23.90 | 26.37 | 25.27 | 28.61 | 27.41 | 26.45 | 26.17 | 26.14 | 27.71 | 26.13 | 26.72 | 26.86 | 26.99 | 25.38 | 27.78 |
| 14 | 24.26 | 23.68 | 30.08 | 23.26 | 31.13 | 24.64 | 29.00 | 23.33 | 29.33 | 21.92 | 25.06 | 27.22 | 25.17 | 31.95 | 28.49 | 22.91 | 26.59 | 30.08 | 28.33 | 26.66 | 24.84 | 27.75 | 29.39 | 26.36 |
| 15 | 26.74 | 24.01 | 30.60 | 30.66 | 31.41 | 26.91 | 27.64 | 26.80 | 27.60 | 29.54 | 29.13 | 30.10 | 21.79 | 31.91 | 34.39 | 22.21 | 24.82 | 30.43 | 25.18 | 22.27 | 26.66 | 28.46 | 31.18 | 27.58 |
| 16 | 31.58 | 28.70 | 32.88 | 31.09 | 32.84 | 29.41 | 31.91 | 30.69 | 33.14 | 31.05 | 32.72 | 31.66 | 32.19 | 32.88 | 31.58 | 34.11 | 32.04 | 32.67 | 32.43 | 32.30 | 33.04 | 32.26 | 31.41 | 32.82 |
| 17 | 31.00 | 28.75 | 32.51 | 31.92 | 33.32 | 29.68 | 32.09 | 30.74 | 32.82 | 30.71 | 32.73 | 31.20 | 31.74 | 33.06 | 32.73 | 31.22 | 34.08 | 33.30 | 32.19 | 32.17 | 32.61 | 32.90 | 32.23 | 33.30 |
| 18 | 26.89 | 26.15 | 28.70 | 26.50 | 30.34 | 27.63 | 29.29 | 25.22 | 29.46 | 24.67 | 28.17 | 27.75 | 27.33 | 29.92 | 29.36 | 26.65 | 27.91 | 31.25 | 28.95 | 27.93 | 27.90 | 30.04 | 28.70 | 29.90 |
| 19 | 28.71 | 27.27 | 29.38 | 28.34 | 31.67 | 27.69 | 30.37 | 28.32 | 31.52 | 27.73 | 31.19 | 29.44 | 29.52 | 30.92 | 30.08 | 29.26 | 30.35 | 31.68 | 33.27 | 30.83 | 31.20 | 31.70 | 30.03 | 31.41 |
| 20 | 25.51 | 24.98 | 29.77 | 27.42 | 29.81 | 29.35 | 26.93 | 27.78 | 29.60 | 27.28 | 27.17 | 30.39 | 31.24 | 32.26 | 31.36 | 25.11 | 29.75 | 31.91 | 24.47 | 34.23 | 27.85 | 29.73 | 32.05 | 30.47 |
| 21 | 28.25 | 23.54 | 30.23 | 28.94 | 31.43 | 24.95 | 28.12 | 29.37 | 30.96 | 28.95 | 29.15 | 28.51 | 29.97 | 30.63 | 30.27 | 28.98 | 29.92 | 31.29 | 29.78 | 29.57 | 32.45 | 30.01 | 29.88 | 31.41 |
| 22 | 26.49 | 25.63 | 30.32 | 27.56 | 31.49 | 26.62 | 30.18 | 24.95 | 30.74 | 26.39 | 29.10 | 29.17 | 27.15 | 30.67 | 30.57 | 26.36 | 27.37 | 31.59 | 28.78 | 27.66 | 27.85 | 32.54 | 30.76 | 30.65 |
| 23 | 21.68 | 21.56 | 33.01 | 21.91 | 32.33 | 22.31 | 27.83 | 20.74 | 27.64 | 20.48 | 23.20 | 25.37 | 21.13 | 31.62 | 26.98 | 20.05 | 22.68 | 31.46 | 25.47 | 22.68 | 21.94 | 26.64 | 36.02 | 24.30 |
| 24 | 27.99 | 25.64 | 27.78 | 27.93 | 29.33 | 27.83 | 27.60 | 27.63 | 28.72 | 27.46 | 28.47 | 27.57 | 29.27 | 29.47 | 28.78 | 27.02 | 28.54 | 30.23 | 27.18 | 28.66 | 28.56 | 29.14 | 27.98 | 31.50 |
Comment

Thank you for addressing my questions with experimental results. These responses have resolved most of my concerns. I will increase my original score.

Final Decision

This paper proposes a new approach for "zero-shot" denoising (i.e., image denoising with machine learning models without requiring any prior training data) based on neural compression. The authors provide a finite-sample theoretical guarantee in terms of an upper bound on the reconstruction error of their proposed approach for Gaussian and Poisson noise models. Experimental evaluation demonstrates the performance of their method, showing state-of-the-art results.

Strengths

  • Nice balance between theoretical analysis and strong-performing method.
  • Intuitively simple approach based on information-theoretic notions.
  • State of the art results for zero-shot denoising.

Weaknesses

  • Somewhat limited and outdated baselines in their experiments.
  • Some gaps between their theoretical analysis and the reported metrics in their results.
  • Some parts of the presentation can be improved (e.g. in images in the appendix).

Discussions and conclusion

This paper received 3 reviews, and reviewers were mildly supportive of this paper initially. However, the discussion period was productive, and the authors provided (i) further clarifications on their theoretical analysis, and (ii) further experimental results on other datasets. This further increased the support. There is consensus that this paper represents an interesting and strong contribution to the field of image denoising in general and that it is worthy of publication, and I concur.