Sample-specific Noise Injection for Diffusion-based Adversarial Purification
Abstract
Reviews and Discussion
This paper focuses on diffusion model-based purification methods. The authors propose SSNI to find the optimal noise level t* within the DiffPure paradigm. The weakness of DiffPure is that the robustness of the purification depends on the choice of t*, i.e., how much Gaussian noise should be injected into the adversarial samples. Too much noise destroys the semantics during the reverse process; too little noise fails to filter out the adversarial noise. SSNI proposes an adaptive way to find t* for each sample, since different samples are injected with different levels of adversarial noise. In this way, SSNI achieves a more robust purification method. The experiments reported on CIFAR-10 and ImageNet show that SSNI increases robust accuracy while maintaining standard accuracy.
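For readers unfamiliar with the setup, the idea can be summarized in a short sketch (all names here — `score_net`, `dbp` — are hypothetical stand-ins, and the reweighting rule below is illustrative; the paper's actual reweighting functions are its Eq. 7-8):

```python
import torch

def purify_ssni(x, score_net, dbp, t_base=100, t_max=200):
    """x: a batch of (possibly adversarial) images; returns purified images."""
    with torch.no_grad():
        s = score_net(x)                          # hypothetical score network
        norms = s.flatten(1).norm(dim=1)          # ||s_theta(x)|| per sample
        # Normalize within the batch and reweight the shared baseline t_base.
        w = (norms - norms.min()) / (norms.max() - norms.min() + 1e-8)
        t_star = (t_base * (0.5 + w)).clamp(max=t_max).long()  # per-sample t*
    # Each sample is diffused and denoised with its own timestep.
    return torch.cat([dbp(x[i:i+1], int(t_star[i])) for i in range(len(x))])
```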
Questions for the Authors
No, please see the contents above.
Claims and Evidence
This paper has two contributions: 1) finding that the norm of the score function can be used as a metric to measure the level of unknown adversarial noise hidden in a given sample; 2) proposing an adaptive way to find t*.
Contribution 1 seems questionable. EPS [1] has already shown that the norm of the score function can serve as an indicator to distinguish natural samples from adversarial samples. Thus, the authors' claim, "Motivated by this, we further investigate how different perturbation budgets ϵ affect score norms under adversarial attacks (Figure 2)", is not a contribution of this paper, and the citation of [1] is missing.
The authors claim that their contributions include proposing a general framework, since SSNI is an adaptive method, which seems questionable. Eq. 7 and Eq. 8, the key of the adaptive method, contain a bias term. The ablation study reported in the Appendix shows that this bias term has a large influence on the overall method and requires a different scale for CIFAR-10 than for ImageNet. In that case, SSNI must be specifically tuned for each dataset, which undercuts the claim of a general framework.
[1] Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score. Shuhai Zhang, Feng Liu, Jiahao Yang, Yifan Yang, Changsheng Li, Bo Han, Mingkui Tan. ICML 2023
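For context, a minimal sketch of the score-norm indicator at issue, assuming a hypothetical pre-trained `score_net`; this is the detection use of score norms established in [1]:

```python
import torch

def score_norm(x, score_net):
    """One deviation estimate per sample: larger ~ further from clean data."""
    with torch.no_grad():
        return score_net(x).flatten(1).norm(dim=1)

# Usage sketch: a threshold on the norm separates natural from adversarial
# samples, which is the detection use-case EPS [1] established.
# is_adv = score_norm(x_batch, score_net) > threshold
```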
Methods and Evaluation Criteria
- SSNI has been evaluated against PGD and BPDA attacks, which seems insufficient. Although Lee et al. [1] suggest that PGD is more threatening to diffusion-based purification, they do not deny the threat of AutoAttack. Therefore, AutoAttack should also be considered as a baseline.
- An ablation study to support SSNI-linear (SSNI-L) is missing. None of the experimental results on robust performance seem to include SSNI-L. I have checked all the results, and there is only an inference-time comparison between SSNI-L and SSNI-nonlinear (SSNI-N) in Table 10 of the Appendix. This makes SSNI-L seem redundant.
[1] Lee, M. and Kim, D. Robust evaluation of diffusion-based adversarial purification. ICCV 2023.
Theoretical Claims
I have checked all the proofs, including those in the Appendix.
There are two main weaknesses:
- Mistakes. For example, the triangle inequality at line 946 is wrong; it should be of the form ||A + E|| ≤ ||A|| + ||E||, where E abbreviates the expectation term.
- Missing main theoretical claims. The key claim of this paper is that the proposed method can compute t* more precisely. However, no theoretical result supports this.
Experimental Design and Analysis
It lacks the latest baselines, such as [1].
[1] Robust Diffusion Models for Adversarial Purification. Guang Lin, Zerui Tao, Jianhai Zhang, Toshihisa Tanaka, Qibin Zhao. arXiv:2403.16067.
Supplementary Material
I have reviewed all parts.
Relation to Prior Literature
No, I think they have cited enough related work.
Missing Important References
Yes, they do not discuss how their score-norm finding differs from EPS [1].
[1] Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score. Shuhai Zhang, Feng Liu, Jiahao Yang, Yifan Yang, Changsheng Li, Bo Han, Mingkui Tan. ICML 2023
Other Strengths and Weaknesses
- The strength is that they use the surrogate process, which provides a more robust gradient approximation for diffusion models.
- An additional weakness is that the overall paper seems to be a combination of EPS, DiffPure, and Lee et al. The evidence is that the adaptive method is mainly based on the metric proposed in EPS. Moreover, the lack of theoretical proof weakens the contribution of this paper.
Other Comments or Suggestions
- The authors should clearly clarify the contribution.
- More meaningful theoretical proofs should be added.
Ethics Review Concerns
No
Q1: Contribution of this study
R1: Due to character limits, please refer to Response to Reviewer ynLT - Q1 where we clarify our contribution.
Discussion on [1]
We acknowledge that score-based metrics, including EPS [1], are established tools for distinguishing between clean and adversarial samples. Though score norms can estimate sample deviation [3], we follow the motivation in [1] and adopt their proposed EPS as it offers more robust estimates.
Fig.2 serves only as motivation for SSNI, building on insights from [1]. It empirically visualizes correlations between score norms and perturbation levels, justifying why the score norm is suitable for guiding SSNI's adaptive mechanism. We do not claim it as a contribution and will cite [3] and [1] near Fig.2 to credit the foundational concepts, ensuring this is clear in our revision.
So, while we leverage EPS from [1], our core contribution is the SSNI framework, enabling sample-specific noise injection to address the accuracy-robustness trade-off problem in DBP.
Q2: Generality of SSNI
R2: We consider SSNI a general framework because its principle - adaptively adjusting the denoising level based on a sample's deviation using a reweighting function - is applicable to various DBP methods and across different datasets.
Eq.7-8 are presented as proof-of-concept instantiations of the reweighting function within this framework. The bias term is a hyperparameter within these instantiations, allowing the reweighted t* to exceed the baseline t* to improve robustness.
Tuning a hyperparameter per dataset is common practice in machine learning and does not invalidate the framework's generality. Many 'general' methods (including baseline DBPs that require tuning t*) have hyperparameters that benefit from dataset-specific tuning for better performance.
In this sense, we argue that SSNI is a general framework for sample-specific noise injection in DBP, rather than a specific method implementation. Reviewer xxTh also acknowledged SSNI's generality.
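A hedged sketch of what such instantiations might look like (the exact forms of the paper's Eq.7-8 are not reproduced here; `b` stands in for the bias hyperparameter discussed above, and `tau` is an illustrative temperature):

```python
import torch

def reweight_linear(t_base, norms, norm_min, norm_max, b=0.1):
    """SSNI-L-style: affine map from normalized score norms to per-sample t*."""
    w = (norms - norm_min) / (norm_max - norm_min + 1e-8)
    return t_base * (w + b)  # the bias b lets t* exceed the shared baseline

def reweight_nonlinear(t_base, norms, norm_min, norm_max, b=0.1, tau=5.0):
    """SSNI-N-style: a saturating (e.g. sigmoid) map instead of a linear one."""
    w = (norms - norm_min) / (norm_max - norm_min + 1e-8)
    return t_base * (torch.sigmoid(tau * (w - 0.5)) + b)
```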
Q3: More Evaluations on AutoAttack and [2]
R3: Thanks for bringing [2] to our discussion, which addresses a similar challenge (accuracy-robustness trade-off) in DBP to SSNI but takes a different approach.
[2] learns adversarial guidance during the reverse diffusion step, requiring training an auxiliary network to modify the diffusion direction. Instead, SSNI is training-free, adaptively adjusting diffusion noise levels per sample before standard diffusion at inference time, based on pre-computed score norms. We will include the discussion of [2] in our revision.
These two methods are thus complementary. In principle, SSNI can be integrated with this method. However, as the code for this paper has not yet been open-sourced, we are unable to obtain the results now but are willing to include them later. See the link for the required results.
Q4: Usefulness of SSNI-L
R4: We included SSNI-L as a first step when exploring training-free reweighting functions for SSNI. Its purpose was to establish a simple baseline with linear mapping before investigating more complex non-linear ones.
However, SSNI-N consistently provided a superior accuracy-robustness trade-off, possibly because a simple linear mapping cannot fully model the complex reweighting operation, justifying our focus on SSNI-N in the main text.
In the revision, we will clearly state the role of SSNI-L as a simpler baseline and summarize its relative performance.
Q5: Triangle inequality
R5: Thank you for carefully reading our proof. We'd like to clarify that the triangle inequality is used correctly.
Q6: Theoretical Claim
R6: We'd like to kindly recall our core contribution is the SSNI framework for sample-specific noise injection in DBP.
Theoretically proving the optimality of t* is challenging, as there is no clear definition of the 'true optimal' noise level. To be clear, we do not claim to derive the 'optimal' noise level; our focus is to emphasize that the noise level should be sample-specific. We will ensure this is clarified in the revision to avoid any potential overclaiming.
The effectiveness of SSNI is empirically validated and 'theoretically' justified (Recall our response to Q1).
We believe identifying the true optimal t* is an interesting open question. We will ensure our revision accurately reflects the scope of our claims and contributions.
[1] Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score. ICML 2023
[2] Robust Diffusion Models For Adversarial Purification. ArXiv 2024
[3] Adversarial purification with Score-based generative models. ICML 2021
Thanks for your review! If our response addresses your concerns, we hope you might consider increasing your score.
Thanks for the authors' rebuttal. I have carefully checked all the content. The authors address my concerns about the contributions and experiments. Although, due to the limited rebuttal time, a comparison with AutoAttack is missing, the paper proposes an interesting method to approximate the optimal t*. I'm therefore willing to increase my score to weak accept.
Dear Reviewer oarL,
Many thanks for your support in increasing your score to 3 (weak accept)! We are happy to see that your concerns are addressed.
(Update 04/06) We have completed the comparison with AutoAttack; see the results below. This table shows the performance of the Rand version of AutoAttack on the CIFAR-10 dataset. We apologize for the late post: AutoAttack requires a lot of time to run.
| WRN-28-10 | clean accuracy (%) | robust accuracy (%) |
|---|---|---|
| Diffpure | 89.71±0.72 | 66.73±0.21 |
| Diffpure-SSNI-N | 93.29±0.37 | 66.94±0.44 |
| GDMP | 92.45±0.60 | 64.48±0.62 |
| GDMP-SSNI-N | 94.08±0.33 | 66.53±0.46 |
| GNS | 90.10±0.18 | 69.92±0.30 |
| GNS-SSNI-N | 93.55±0.55 | 72.27±0.19 |
Best regards,
Authors of Submission 13990
This paper proposes a new perspective on diffusion-based purification (DBP) methods. The authors first show that the score norms of input samples are highly correlated with the level of Gaussian noise that should be injected when performing diffusion-based adversarial purification. They then develop a Sample-specific Score-aware Noise Injection (SSNI) method based on a pre-trained score network to control the level of injected Gaussian noise, which improves clean accuracy and robust accuracy when integrated with existing DBP methods.
Update after rebuttal: We thank the authors for the detailed rebuttal and explanations, which addressed our concerns. Our view of the paper remains unchanged. The paper is acceptable because of the method's novelty, generality, and effectiveness in improving performance in defending against adversarial attacks.
Questions for the Authors
- We are curious about the performance of SSNI against other attack methods, especially diffusion-based methods like Diff-PGD [1].
- Do the authors evaluate their method against adversarial attacks under the black-box setting?
- We are also curious about how the SSNI method can adapt the noise level for unrestricted adversarial attacks, which modify the semantics of images on a large scale (e.g., DiffAttack [2], ACA [3]).
[1] Xue H, Araujo A, Hu B, et al. Diffusion-based adversarial sample generation for improved stealthiness and controllability[J]. Advances in Neural Information Processing Systems, 2023, 36: 2894-2921.
[2] Chen J, Chen H, Chen K, et al. Diffusion models for imperceptible and transferable adversarial attack[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
[3] Chen Z, Li B, Wu S, et al. Content-based unrestricted adversarial attack[J]. Advances in Neural Information Processing Systems, 2023, 36: 51719-51733.
Claims and Evidence
The claims made in the submission are supported by clear and convincing evidence.
Methods and Evaluation Criteria
The proposed methods and evaluation criteria make sense for the problem.
Theoretical Claims
This paper proposes the proof of the relationship between score norms and noise level. We check the correctness of the proofs of Lemma J.1, J.2, J.3, J.4, J.5, and J.6 in Appendix J carefully and find no obvious issues.
Experimental Design and Analysis
The paper conducts experiments on two datasets, CIFAR-10 and ImageNet-1K, and evaluates the robustness of convolutional classifiers against white-box PGD+EOT and BPDA+EOT attacks under the SSNI+DBP framework for each dataset, validating the superiority of the SSNI method in improving classifier accuracy on both normal and adversarial inputs in some circumstances. However, the experiments are not persuasive enough; other model architectures, such as transformer-based classifiers, should be included.
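For reference, a minimal sketch of the PGD+EOT protocol used in these evaluations (the `defense` callable, i.e. a purifier-plus-classifier pipeline, is a hypothetical stand-in; hyperparameters are illustrative):

```python
import torch
import torch.nn.functional as F

def pgd_eot(x, y, defense, eps=8/255, alpha=2/255, steps=20, n_eot=10):
    """L-infinity PGD with Expectation over Transformation (EOT)."""
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.zeros_like(x)
        for _ in range(n_eot):  # average gradients over the defense's randomness
            loss = F.cross_entropy(defense(x_adv), y)
            grad = grad + torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project to the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```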
The paper only shows qualitative results of purified images from CIFAR-10 in the main paper and appendix; visualizations of purified images from ImageNet-1K should be included to show performance on a complex, high-resolution dataset.
The ablation study on hyperparameters validates the framework's hyperparameter selection.
Supplementary Material
We reviewed all parts of the appendix.
Relation to Prior Literature
Prior diffusion-based purification (DBP) methods inject a constant level of Gaussian noise into the input sample and leverage the sampling process of diffusion models to remove possibly existing adversarial perturbations, which achieves good performance in defending against adversarial attacks.
However, such DBP methods may also decrease the clean accuracy of classifiers. This paper attributes this to the fixed Gaussian noise level, since the appropriate noise level differs across input samples (e.g., a high noise level may destroy the semantics of clean samples, reducing clean accuracy; a low noise level may not remove all adversarial perturbations, reducing robust accuracy).
Then this paper solves this problem by proposing the SSNI method, which applies a score network to control the forward diffusion step t* to inject an adaptive level of Gaussian noise into the input samples, improving both clean accuracy and robust accuracy of classifiers.
The SSNI method is general enough to be integrated with existing DBP methods and improve their performance further.
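To make the forward diffusion step concrete, a sketch of standard DDPM noise injection at a (possibly per-sample) timestep t* (the beta schedule values are illustrative, not the paper's):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative \bar{alpha}_t

def diffuse_to(x, t_star):
    """x_t = sqrt(abar_t) * x + sqrt(1 - abar_t) * eps, with eps ~ N(0, I).

    t_star: LongTensor of per-sample timesteps (or a single int)."""
    a = alpha_bar[t_star].view(-1, *([1] * (x.dim() - 1)))
    return a.sqrt() * x + (1.0 - a).sqrt() * torch.randn_like(x)
```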
Missing Important References
To the best of our knowledge, no.
Other Strengths and Weaknesses
Strengths
- The writing of the paper is fluent, the structure is clear, and there are no obvious grammar errors.
Weaknesses
- The authors only consider modifying the noise level of the entire DBP framework in developing their SSNI method. Though this improves the generality of their method, allowing it to be integrated with existing DBP methods, the innovation of the entire paper appears insufficient.
- The SSNI method employs a pre-trained score network to estimate the score norm of input samples. As empirically validated in the study, the framework achieves a 2-3% accuracy improvement on ImageNet-1K while incurring a 5-second time increase per image. This presents a critical trade-off consideration: given that DBP methods are inherently time-consuming, the justification for further escalating computational complexity to pursue marginal performance gains warrants rigorous cost-benefit analysis and domain-specific evaluation.
Other Comments or Suggestions
In the caption of Figure 1, there is a typo in the citation: Nie et al. (2022).
Q1: Evaluation with transformer-based classifiers, Diff-PGD, and unrestricted attacks
R1: We have supplemented the transformer-based model and Diff-PGD experiments. Due to character limits, please see here for results
Regarding unrestricted attacks: In this paper, we primarily focus on defending human-imperceptible adversarial perturbations, while unrestricted attack is a different problem setup, which breaks the assumption of human-imperceptible perturbations in most adversarial defense literature. However, we acknowledge this is an interesting open question to be explored.
[1] Diffusion models for adversarial purification. ICML, 2022.
Q2: Discussion on the innovation
R2: We thank the reviewer for acknowledging SSNI's generality.
We'd like to respectfully argue that customizing a sample-specific noise level is new, because t* is fundamentally critical to DBP performance, and the fixed t* used in existing studies is a core limitation causing suboptimal accuracy-robustness trade-offs.
SSNI's main contribution is the conceptual advance of a sample-adaptive t*, leveraging score norms that represent a sample's deviation from clean data to tailor purification strength to each instance. Extensive experiments confirm this 'simple' modification leads to substantial gains in the accuracy-robustness balance.
We believe the simplicity and effectiveness of SSNI, while being easily integrated into diverse DBP methods (as the reviewer mentioned), is a key strength.
To our knowledge, SSNI is the first framework to systematically implement and validate score-norm-driven adaptive t* for DBP, offering a practical, impactful, and thus novel purification principle.
Q3: Discussion on the performance-time trade-off
R3: Thank you for raising this discussion. We acknowledge that increased inference time is a limitation of SSNI, however, it is primarily attributed to the inherent limitation of diffusion models, which our method relies on in estimating score norms. We look forward to more efficient strategies to accelerate this step.
On the other hand, we argue the benefits often justify the cost. SSNI delivers absolute gains of +2.0-2.5% standard and +1.0-4.8% robust accuracy on ImageNet with 1000 classes under PGD, gains often considered substantial in robustness contexts. Moreover, SSNI improves the overall accuracy-robustness balance, which is a qualitative benefit beyond the numerical results alone.
Ultimately, the cost-benefit analysis is indeed context-dependent. For security-critical offline tasks, the improved accuracy-robustness profile provided by SSNI could well justify the additional inference time. As a modular enhancement, SSNI provides practitioners an option when the computational budget allows for improved DBP effectiveness and accuracy-robustness trade-off. We also note that ongoing advances in efficient score estimation will help reduce this overhead.
Q4: Black-box settings
R4: We focused on adaptive white-box attacks (PGD+EOT, BPDA+EOT), aligning with the standard evaluation protocol for DBP methods (Lee&Kim ICCV 2023), which is common practice in recent DBP literature as it directly stress-tests the defense pipeline against the worst-case threat model. Robustness against these strong attacks typically implies robustness against weaker black-box threats. Thus, we prioritized demonstrating effectiveness against the established challenging white-box benchmarks.
Still, we provide results for a gray-box attack setting with PGD+EOT on the CIFAR-10 dataset (partial results only; we'll report full results in the revision), where the attacker can access the target classifier but not the entire defense system. Due to character limits, please see here for results.
Q5: Visualization of ImageNet-1K
R5: Thank you for the feedback. We will include high-quality ImageNet-1K purification visualizations in the revision.
For the CIFAR-10 images (Fig.5-7), visual differences are indeed subtle due to the low resolution. Our main goal was not to showcase visually superior cleanness, but to show that SSNI maintains semantic integrity. Even when SSNI adaptively uses different (sometimes higher) noise levels per sample, the mechanism driving improved robust accuracy does not corrupt essential semantic information, unlike the failure cases in Fig.1 where an improper t* leads to misclassification. This confirms that more flexible purification does not come at the cost of distorting image content or compromising clean/robust accuracy.
Thank you for your time again! Hope our responses address your concerns.
This paper examines the problem of choosing a sample-dependent number of forward/reverse diffusion steps to use in diffusion-based purification (DBP) adversarial defense. Prior works typically use a fixed number (e.g., t=100) of forward/reverse steps to secure an input before sending it to the classifier. The method is motivated by the intuition that different samples need different numbers of forward/reverse steps for security (more secure samples need less diffusion, less secure samples need more). The key problem then becomes estimating the ideal number of diffusion steps for a sample. The work proposes to use the norm of the input sample's score under the diffusion network to predict the optimal number of steps. Samples with lower score norms are believed to be closer to natural images and require less purification, while samples with higher score norms are believed to be further from natural images and require more purification. Simple linear and non-linear functions reweight a baseline timestep into a sample-adjusted timestep using the score norm. Experimental results show the proposed method reliably increases the natural accuracy and often increases the robust accuracy of existing diffusion defenses compared to baselines using a fixed number of steps.
## After rebuttal: My view of this work remains similar. The proposed method appears to be a reliable and cost-efficient way to provide a modest increase in security for diffusion defense.
Questions for the Authors
N/A
Claims and Evidence
The claims and evidence in this paper are generally solid. The criterion of using the score norm is a reasonable way to judge how easy/difficult a sample will be to classify and builds upon similar observations in prior work. It is intuitively reasonable that adapting the number of diffusion steps based on this uncertainty measure could improve defense performance compared to the scenario of using a fixed number of steps. The increases in natural/robust accuracy from the proposed method compared to baselines is consistent and reasonable.
Methods and Evaluation Criteria
The methodology is suitable for the problem at hand. Use of the score norm is a reasonable way to measure the classification difficulty of an input. The timestep selection function is relatively lightweight and doesn't add unreasonable computational burden. Evaluation is performed according to the same attack protocol as Lee & Kim, which is a representative state-of-the-art attack against diffusion models.
Theoretical Claims
This paper does not make any theoretical claims.
Experimental Design and Analysis
The experimental design and analyses are straightforward and appropriate. Ablations for timestep selection hyperparameters, the method for calculating attack gradients (full checkpointing vs. surrogate vs. DDIM), and the use of single versus multiple score norms are presented.
Supplementary Material
The supplementary material includes code for reproducing the results in the paper. I did not carefully check the code.
Relation to Prior Literature
The key contribution of this work is a relatively fast method for selecting the number of diffusion forward/reverse steps for DBP. The method is quite general and can be incorporated with different DBP variations. The experimental results show that consistent gains can be achieved. While these gains are not especially large, on the other hand I feel fairly convinced that it is still worth using this method rather than using a fixed timestep.
Missing Important References
DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification https://arxiv.org/abs/2311.16124
This attack method produces similar results to Lee & Kim against DBP. It might be worth including the results from this attack with and without the proposed method in Table 4.
Other Strengths and Weaknesses
The main strengths are that the method is straightforward and produces a consistent benefit, more for natural accuracy but usually for robust accuracy as well. The main weakness is that the gains are not especially significant and that there is still a risk of reducing robust accuracy.
Other Comments or Suggestions
N/A
Q1: Discussion on performance gain and robust accuracy
R1: We appreciate the reviewer for raising this concern. We'd like to first clarify our contribution and provide a clearer context.
Contribution
The central goal of this paper (and SSNI) is to achieve a more favorable accuracy-robustness balance, which is a crucial consideration in adversarial training/defense studies [1]. Our contributions are:
- identifying a critical limitation in existing DBPs: they rely on a fixed diffusion noise level injected during the forward pass, forcing a single, yet often suboptimal, denoising effort across all samples. For any clean or adversarial sample, an inappropriate noise level might
  - fail to remove adversarial perturbations sufficiently (hurting robustness), or
  - excessively corrupt sample semantics (hurting accuracy).
- (core contribution) proposing a new framework for sample-specific noise injection (SSNI) to directly address this limitation. Based on the estimated deviation from the clean data distribution, SSNI adaptively sets the diffusion noise level for each sample, assigning lower noise for cleaner samples to preserve accuracy, and potentially higher noise for more perturbed samples to enhance robustness.
- confirming that SSNI has a larger purification capacity and is more flexible than sample-shared-noise DBPs, supported by empirical results and theoretical justification (Appendix A)
- (central implication) SSNI improves the overall accuracy-robustness trade-off in DBP, particularly achieved in a training-free manner.
Significance and risk
Our results show SSNI's success in achieving this better balance. We observe that in most cases, both standard and robust accuracy are improved simultaneously (e.g., GDMP under BPDA+EOT, Table 3: +1.63% Std Acc, +1.11% Rob Acc). Even in rare cases where robust accuracy sees a minor decrease (e.g., -0.06% for DiffPure PGD-L2, Table 1), it is often coupled with a substantial gain in standard accuracy (+2.15% in that case). We argue this often represents a preferable trade-off point, recovering clean performance sacrificed by using a fixed t*.
The consistent improvements on ImageNet (Table 2) further confirm that SSNI can effectively balance these targets on large-scale tasks. We thus argue that SSNI generally enhances the robustness aspect of the accuracy-robustness balance compared to baselines.
In summary, we believe that SSNI represents a principled framework to address the fundamental accuracy-robustness trade-off in DBP via adaptive denoising budgets, providing flexibility and improved overall balance that fixed methods cannot achieve.
[1] Improving Accuracy-robustness Trade-off via Pixel Reweighted Adversarial Training, ICML 2024.
Q2: Results with additional DBP backbone (DiffAttack)
R2: Thank you for the suggestion. We have supplemented experiments and reported these results as requested.
Please see the DiffAttack results. The table provides results of utilizing DiffAttack [2] against the target classifier WRN-28-10 on CIFAR-10 (due to the limited time frame within the rebuttal phase, we can only include partial results therein; we will report full results in the revision).
It is easy to observe consistent performance gains across all DBP baselines when our SSNI is integrated.
[2] DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification, NeurIPS 2023
Thank you very much again for your time! We hope our response addresses your concern. If you have any further questions, please feel free to ask; we'd be happy to provide more clarification.
I read the other reviews and the authors responses. I decided to keep my score the same. Thanks to the authors for their thoughtful rebuttal and additional experiments. The proposed method appears to be a consistent and straightforward way to increase the robustness of diffusion defenses. While it is unlikely to greatly extend the scope of adversarial purification, I feel convinced that applying the proposed method is worthwhile for the defender.
Dear Reviewer ynLt,
Many thanks for your reply! We are glad to hear that your major concerns have been addressed!
We want to thank you again for providing this valuable feedback to us. Your support would definitely play a crucial role for our paper.
Best regards,
Authors of Submission 13990
This paper presents a method to enhance existing diffusion-based adversarial purification techniques. The authors build on the intuitive idea that adversarial samples with higher noise levels require larger diffusion timesteps for effective purification. To explore this, they analyze the output of the diffusion model when processing adversarial samples and observe that samples with greater noise tend to exhibit larger output norms. Leveraging this insight, the authors propose an adaptive approach to selecting the optimal diffusion timestep based on the noise level of each adversarial sample. They categorize noise into sample-shared and sample-specific components. Empirical results on CIFAR-10 demonstrate that the proposed method is compatible with existing diffusion-based adversarial purification techniques and further enhances their performance.
Questions for the Authors
I have two questions directly relevant to the weaknesses:
- Have the authors considered other forms of reweighting functions? For example, can we have optimizable ones using neural networks?
- Do you have alternatives to bypass pre-defined noise levels?
In addition, how sensitive is the performance of SSNI to the choice of score network? If the method is highly dependent on a specific, perfectly trained score network, its practical applicability and robustness could be limited.
Claims and Evidence
Yes, most of the claims are well supported. In the motivation section, the authors highlight three aspects of the relationship between the perturbation and the noise level t*. The first two claims are substantiated with examples. However, I do not find any evidence supporting the intuitive assumption that samples with larger perturbations should require a higher timestep t*.
Methods and Evaluation Criteria
The authors' proposal to leverage pre-trained score networks to estimate the gradient of the log data density is a valid technique from score-based generative modeling, and using score norms as a proxy for deviation from the clean data distribution is a reasonable heuristic.
The authors use standard datasets (CIFAR-10 and ImageNet-1K), and evaluating both clean accuracy and robustness against adversarial attacks is a standard and appropriate evaluation criterion for adversarial defense methods.
Theoretical Claims
This criterion does not apply to this paper, as no theoretical claims are presented in the main text. The mention of score norms and references to [1] hint at a theoretical underpinning in score-based generative modeling, but this is not formalized as theorems or propositions.
[1] Song et al. Score-based generative modeling through stochastic differential equations. ICLR 2021
Experimental Design and Analysis
The experimental design is sound. The authors largely follow the experimental design of existing works, which is comprehensive.
- One issue is that the authors claim they conduct experiments on ImageNet-1K; however, I do not see any visualized results in the paper. Since the visualized CIFAR-10 samples are quite blurry, I cannot extract any useful information from the compared results in Figures 5-7. Can you further explain how to assess the purification effects from the visualizations?
- One thing I am confused about: in the methodology, the authors claim their proposed method addresses sample-specific noise in adversarial purification. However, it seems the authors use the same perturbation level for the whole dataset; for example, in Table 1 the perturbation is set to 8/255 and 0.5, respectively. Where does the sample-specific noise come from?
Supplementary Material
I have reviewed part of the supplementary material, focusing especially on the supplementary experimental results. I did not check the proofs very carefully.
Relation to Prior Literature
The contributions relate to the literature on adversarial purification (AP) and score-based adversarial detection. For the former, the paper directly builds upon the field, specifically DBP. The authors adequately cite representative works in AP and example DBP methods. The paper positions itself as an improvement, offering a more general framework over these existing DBP techniques. For the latter, the paper mentions Yoon et al. (2021)'s use of score norms for adversarial example detection, connecting the proposed method to this line of work and extending the use of score norms beyond detection to noise-level adaptation in purification.
Missing Important References
I believe some prior work [1] has investigated the noise-level problem in diffusion purification. In that work, the authors investigate how the noise level can be connected to the timestep in diffusion purification. The authors are encouraged to discuss this paper and clarify the differences from their work.
[1] Wang et al., Imitation Learning from Purified Demonstrations. ICML 2024.
Other Strengths and Weaknesses
Strengths:
I appreciate that the proposed method is simple enough to understand, and appears to be a general framework, allowing it to be compatible with diverse DBP methods.
Weaknesses:
- The methodology section only covers two reweighting functions; other possibilities could be considered as well.
- Still, a pre-defined noise level needs to be specified before reweighting.
Other Comments or Suggestions
N/A
Q1: Evidence of Claim
R1: Thanks for pointing this out. We now show that samples with larger deviation (caused by larger perturbation, leading to a higher score norm) need stronger denoising (a higher t*).
With a DBP method [1], we assess the robust accuracy of WRN-28-10 against AEs from PGD+EOT attacks with varying perturbation budgets on CIFAR-10, finding that a shared t* leads to poor robust accuracy for the other three groups (results).
[1] Robust evaluation of diffusion-based adversarial purification. ICCV 2023.
Q2: Visualization of ImageNet
R2: Thanks for the advice! We will include visualizations of ImageNet. Due to character limits, we kindly refer you to Response to Reviewer xxTh Q5 where we also discuss this in detail.
Q3: Clarify adversarial perturbation from diffusion noise
R3: This confusion stems from an inconsistency between the motivation and experiment sections.
We now clarify the difference between adversarial perturbation budget and the adaptive diffusion noise applied by SSNI.
Our experiments indeed use a fixed ϵ, defining the attack strength used to generate adversarial examples (AEs) for controlled evaluation. However, SSNI's sample-specific noise refers to the amount of Gaussian noise injected during the DBP process, not the budget of the adversarial attack.
Our motivation is that different samples benefit from sample-specific diffusion noise levels for purification. Fig.1 implies this need. Fig.2 then shows that score norms (our chosen metric) correlate with perturbation intensities (using different attack budgets is a clear way to show this correlation). It establishes score norms as a proxy for deviation from the clean manifold.
We still need sample-specific diffusion noise levels even when evaluating against the same budget ϵ:
- The same attack (shared ϵ) applied to different clean examples (CEs) will produce AEs perturbed to varying degrees relative to the clean manifold, depending on each CE's own properties and its interaction with the attack.
- So, even under a fixed ϵ, AEs have different score norms.
- This leads to different diffusion noise levels tailored to sample-specific deviations (Eq.4).
So, the same ϵ in evaluation does not preclude sample-specific diffusion noise levels t*.
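A self-contained toy illustration of this point, with a stand-in deviation estimate in place of a real score network (all names and values are illustrative, not the paper's):

```python
import torch

torch.manual_seed(0)

def toy_score_norm(x):
    # Stand-in for a real score network: one deviation estimate per sample.
    return x.flatten(1).norm(dim=1)

x_clean = torch.rand(8, 3, 32, 32)
delta = (torch.rand_like(x_clean) * 2 - 1) * (8 / 255)  # one shared budget
x_adv = (x_clean + delta).clamp(0, 1)

norms = toy_score_norm(x_adv)
print(norms)  # the norms differ sample-to-sample despite the shared epsilon
w = (norms - norms.min()) / (norms.max() - norms.min() + 1e-8)
print((100 * (0.5 + w)).long())  # hence different per-sample t*
```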
Q4: NN-based reweighting function
R4: The presented SSNI-L/N are instantiations of the reweighting mechanism within our SSNI framework. Our primary goal was to introduce the core SSNI concept and show its effectiveness using simple and computationally lightweight functions working entirely at inference time without additional training.
Using a NN as the reweighting function is indeed a valid extension. But this would require a training phase for optimization, shifting from the current training-free paradigm and introducing training overhead. We agree that investigating such learnable functions is a promising direction.
Q5: Pre-defined noise level
R5: Yes, we used pre-defined base noise levels for score estimation via EPS (L252). Bypassing these pre-defined levels is challenging within the training-free paradigm. Similar to the reweighting function, one could train a NN to predict a proper level per sample, yet this would also introduce a training phase and associated overhead. Developing a purely analytical, training-free method to determine these levels without any reference is non-trivial.
We appreciate the reviewer's question and leave this interesting avenue for future work.
Q6: Score network choice
R6: Thank you for this valuable advice. We used a standard SDE-based score network [1] pre-trained on CIFAR-10 and a guided diffusion model [2] pretrained on ImageNet, following previous score estimation studies (Zhang et al. 2023).
Moreover, SSNI should not be sensitive to a specific score network, as we use EPS, which averages scores over multiple noise levels for more robust score estimation (L264-274, Eq.6), as supported by Appendix G. Thus, we believe SSNI can generalize to other score networks (e.g., LDM, EDM), as long as they provide reliable score estimates.
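A hedged sketch of such an EPS-style estimate (the paper's exact formulation is its Eq.6; `score_net(x, sigma)` is a hypothetical noise-conditional score network, and the sigma values are illustrative):

```python
import torch

def eps_norm(x, score_net, sigmas=(0.05, 0.1, 0.2, 0.4)):
    """Average score norms over several noise levels instead of one,
    giving a more stable per-sample deviation estimate."""
    norms = []
    with torch.no_grad():
        for sigma in sigmas:
            x_t = x + sigma * torch.randn_like(x)     # perturb the input
            s = score_net(x_t, sigma)                 # noise-conditional score
            norms.append(s.flatten(1).norm(dim=1))
    return torch.stack(norms).mean(dim=0)             # expectation estimate
```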
We are empirically evaluating with other score networks and will report the results once we obtain them.
Q7: Related work
R7: Thanks for providing the reference [3]. Both [3] and SSNI leverage diffusion models and analyze the choice of noise level, but in different contexts. [3] denoises imitation learning demonstrations with an optimal noise level before IL, whereas SSNI targets adversarial defense with sample-specific noise-level selection based on score norms. We will include this discussion in the revision.
[1] Score-Based Generative Modeling through Stochastic Differential Equations ICLR 2021
[2] Diffusion Models Beat GANs on Image Synthesis NeurIPS 2021
[3] Imitation Learning from Purified Demonstrations ICML 2024
All reviewers have provided positive scores for this submission, highlighting its strengths in novelty and experiments. Given the unanimous positive feedback and the recognition of its contribution to the area, the AC carefully reviewed the paper and concurred with the reviewers' assessments, therefore supporting the decision to accept this submission.