PaperHub
Overall rating: 5.3/10 (Poster), 4 reviewers
Ratings: 4, 6, 5, 6 (min 4, max 6, std 0.8)
Confidence: 3.5 · Correctness: 2.5 · Contribution: 2.8 · Presentation: 2.5
NeurIPS 2024

DiffHammer: Rethinking the Robustness of Diffusion-Based Adversarial Purification

Submitted: 2024-05-15 · Updated: 2024-11-09
TL;DR

DiffHammer provides effective and efficient robustness evaluation for diffusion-based purification via selective attack and N-evaluation.

Abstract

Diffusion-based purification has demonstrated impressive robustness as an adversarial defense. However, concerns exist about whether this robustness arises from insufficient evaluation. Our research shows that EOT-based attacks face gradient dilemmas due to global gradient averaging, resulting in ineffective evaluations. Additionally, 1-evaluation underestimates resubmit risks in stochastic defenses. To address these issues, we propose an effective and efficient attack named DiffHammer. This method bypasses the gradient dilemma through selective attacks on vulnerable purifications, incorporating $N$-evaluation into loops and using gradient grafting for comprehensive and efficient evaluations. Our experiments validate that DiffHammer achieves effective results within 10-30 iterations, outperforming other methods. This calls into question the reliability of diffusion-based purification after mitigating the gradient dilemma and scrutinizing its resubmit risk.
Keywords
adaptive adversarial attack, adversarial purification, diffusion

Reviews and Discussion

Review (Rating: 4)

This work proposes a new attack evaluation for diffusion-based purification methods: the 1 + N evaluation, which incorporates an expectation-maximization-based attack and N-time evaluation. This method is helpful for evaluating the worst-case robustness of stochasticity-based defense methods.

Strengths

  1. This paper is well-written, with careful organization and clear illustrations.
  2. This paper gives a theoretical analysis of the advantages of N-time evaluation.

Weaknesses

  1. One of the main conclusions on the advantages of N-evaluation is not that impressive. In fact, I think it is quite intuitive and the innovation is questionable.
  2. This setting will significantly increase the attack cost. If N is large enough, nearly all stochasticity-based defense methods have the risk of being broken. However, I do not think this is a practical and meaningful setting.
  3. The results cannot be compared with other methods directly. In addition to ASR, the absolute robustness should be reported in Table 1 so that these numbers can be directly compared with the results in other related literature [1, 2].
  4. Insufficient experimental results. Only the simple dataset, CIFAR-10, is considered. More datasets, especially ImageNet, should be incorporated.

[1] Robust Evaluation of Diffusion-Based Adversarial Purification, ICCV, 2023.

[2] Robust Classification via a Single Diffusion Model, ICML, 2024.

Questions

Please see comments above.

Limitations

N/A

Author Response

Thank you for the valuable feedback; here is our response to the concerns raised.

Advantages and costs of N-evaluation

Summary: We were the first to propose using $N$-evaluation to enhance evaluation accuracy and improve attack effectiveness and efficiency. In terms of evaluation, we demonstrate that 1-evaluation underestimates resubmission risk. An appropriate $N$ (e.g., 10) is acceptable to attackers, making the risk realistically alarming. We will include a discussion of this in the paper.

In terms of attack, we leverage the byproduct of $N$-evaluation, which incurs minimal additional overhead compared to the $N$-EOT attack. Our algorithm only requires a small $N$ to overcome the gradient dilemma for a sufficient attack.

Current robustness evaluations typically rely on 1-evaluation. However, in diffusion-based purification, the attack success rate for about 50% of samples falls strictly between 0 and 1, suggesting that attackers can undermine defenses by resubmitting. Evaluating the resubmit attack risk depends on estimating the attack probability, which the zero-or-one outcome of 1-evaluation cannot provide. Even for samples with a 10-20% success rate (around 20% of samples in our experiments), attackers can expect more than one successful attack out of 10 resubmissions at a feasible cost. Our analysis shows that 1-evaluation statistically underestimates this risk, as discussed in Section 3.1. Therefore, our proposed $N$-evaluation is practical amid the rise of diffusion-based defenses.
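
As a back-of-the-envelope illustration of this point (our own sketch; the 10-20% success rates and the 10 resubmissions are the figures quoted above, and `resubmit_risk` is a hypothetical helper):

```python
# Sketch: expected impact of N resubmissions for a sample whose per-submission
# attack success probability is p (assuming independent submissions).
def resubmit_risk(p: float, n: int) -> tuple[float, float]:
    """Return (expected number of successes, P[at least one success]) over n submits."""
    return n * p, 1.0 - (1.0 - p) ** n

for p in (0.10, 0.15, 0.20):
    expected, at_least_one = resubmit_risk(p, n=10)
    print(f"p={p:.2f}: E[successes]={expected:.1f}, P(>=1 success)={at_least_one:.2f}")
# A single 1-evaluation returns only 0 or 1 for such a sample and cannot expose this risk.
```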

Another contribution is using $N$-evaluation to aid the attack algorithm, which is not burdensome because (1) increasing $N$ only slightly raises costs, and (2) our algorithm remains stable without a large $N$. We use only the byproducts of $N$-evaluation, with the main overhead being the computation of approximate gradients, which is less costly than full gradients. Our algorithm's computational cost involves $N$ approximate gradients and one full gradient, whereas the EOT-based algorithm requires $N$ full gradients. Comparing computational times for varying $N$, we find that increasing $N$ does not significantly raise costs, as shown in Rebuttal Figure 4. We also assess the effect of different $N$ values on algorithm performance, fixing the number of evaluations at 10 for fairness. Results in Rebuttal Figures 5 and 6 show that a small $N$ (e.g., $N=5$) allows our algorithm to address the gradient dilemma and improve the attack. Thus, $N$-evaluation effectively enhances attack efficiency at minimal cost.

Direct comparison with other papers

We computed robustness metrics as 1 - ASR for direct comparison with other studies. Our differing results arise because the original papers report the best historical robustness under 1-evaluation, while we focus on $N$-evaluation. Let $M$ denote the number of iterations, and $n^{(N)}_m$ the number of successful attacks in the $m$-th iteration's $N$-evaluation. Our Avg.Rob is $1 - \max_m n^{(N)}_m / N$, Wor.Rob is $1 - \max_m 1(n^{(N)}_m > 0)$, while their Rob is $1 - \max_m 1(n^{(1)}_m > 0)$. Their Rob is more akin to our Wor.Rob but is taken over different samples, so it cannot assess model robustness for a given sample. After comparing methods consistently, most attack results align with the paper, except that GDMP shows lower robustness due to the smaller WRN-28-10 classifier compared to DiffPure's WRN-70-16. To ensure fair benchmarking, we use WRN-70-16; replacing it with WRN-28-10 reproduces the GDMP results. Inconsistencies with the LM results are due to numerical explosions in gradient computation, which may disable the attack; we resolved this with gradient clipping.
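
For concreteness, a minimal per-sample sketch of how these metrics could be computed from recorded attack outcomes (our own illustrative code with hypothetical variable names, not the authors' implementation):

```python
import numpy as np

def robustness_metrics(outcomes: np.ndarray) -> tuple[float, float]:
    """Per-sample Avg.Rob and Wor.Rob from an (M iterations x N evaluations)
    0/1 array, where outcomes[m, k] = 1 if the m-th iteration's adversarial
    example fools the defense on its k-th evaluation."""
    n_success = outcomes.sum(axis=1)              # n_m^(N) for each iteration m
    N = outcomes.shape[1]
    avg_rob = 1.0 - n_success.max() / N           # 1 - max_m n_m^(N) / N
    wor_rob = 1.0 - float((n_success > 0).any())  # 1 - max_m 1(n_m^(N) > 0)
    return avg_rob, wor_rob                       # dataset-level numbers average these
```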

Table 1: Comparison with other papers.

| eps | Defense | Attack | Avg.Rob | Wor.Rob | Rob | Rob (reported) |
|-----|---------|--------|---------|---------|-----|----------------|
| 4 | DiffPure | DiffAttack | 70.80 | 40.04 | 65.04 | 67.19 |
| 4 | DiffPure | PGD | 71.27 | 39.06 | 65.82 | N/A |
| 4 | DiffPure | DiffHammer | 69.06 | 35.94 | 63.48 | N/A |
| 4 | GDMP | DiffAttack | 73.98 | 52.54 | 69.73 | N/A |
| 4 | GDMP | PGD | 77.77 | 55.08 | 74.61 | N/A |
| 4 | GDMP | DiffHammer | 72.29 | 51.56 | 68.75 | N/A |
| 4 | LLHD_maximize | AA | 49.22 | 27.34 | 44.14 | N/A |
| 4 | LLHD_maximize | PGD | 59.04 | 33.40 | 54.30 | N/A |
| 4 | LLHD_maximize | DiffHammer | 45.76 | 20.31 | 39.84 | N/A |
| 8 | DiffPure | DiffAttack | 56.00 | 33.01 | 54.56 | 59.38 |
| 8 | DiffPure | PGD | 60.27 | 33.01 | 54.69 | 55.82 |
| 8 | DiffPure | DiffHammer | 51.99 | 27.93 | 45.90 | N/A |
| 8 | GDMP | DiffAttack | 54.90 | 39.65 | 51.56 | N/A |
| 8 | GDMP | PGD | 65.37 | 47.66 | 60.55 | 46.84 |
| 8 | GDMP | DiffHammer | 50.37 | 34.96 | 45.90 | N/A |
| 8 | LLHD_maximize | AA | 36.43 | 18.75 | 31.84 | 71.68 |
| 8 | LLHD_maximize | PGD | 48.48 | 26.76 | 44.34 | N/A |
| 8 | LLHD_maximize | DiffHammer | 30.61 | 14.06 | 26.76 | N/A |

Other datasets

CIFAR10 is the primary dataset for evaluating adversarial defenses, so we focus on it. We also include results on other datasets, like restricted ImageNet and CIFAR100, where restricted ImageNet's 9 superclasses present more challenges. We reduced the attack budget to 4/255. Results show our algorithm's ASR significantly surpasses others, indicating the gradient dilemma's presence across datasets.

Table 2: Experimental results on restricted ImageNet (ASR).

| Method | DiffPure Avg. | DiffPure Wor. | GDMP Avg. | GDMP Wor. | LM Avg. | LM Wor. |
|--------|---------------|---------------|-----------|-----------|---------|---------|
| BPDA | 41.0 | 49.2 | 42.3 | 48.4 | 71.7 | 77.3 |
| DA/AA | 46.9 | 53.9 | 53.4 | 60.9 | 48.6 | 57.8 |
| PGD | 45.7 | 53.1 | 51.7 | 61.7 | 44.7 | 54.7 |
| DH | 61.6 | 68.8 | 66.0 | 71.1 | 73.8 | 78.1 |

Table 3: Experimental results on CIFAR100 (ASR).

| Method | DiffPure Avg. | DiffPure Wor. | GDMP Avg. | GDMP Wor. | LM Avg. | LM Wor. |
|--------|---------------|---------------|-----------|-----------|---------|---------|
| BPDA | 67.2 | 93.6 | 55.2 | 85.4 | 58.6 | 80.7 |
| DA/AA | 69.4 | 94.5 | 58.9 | 85.2 | 52.4 | |
| PGD | 69.5 | 94.3 | 59.0 | 85.4 | 51.4 | 78.9 |
| DH | 72.2 | 95.3 | 62.4 | 87.5 | 54.4 | 80.5 |
Comment

Thanks for the rebuttal. Some of my concerns have been addressed, except for W2 and W4.

W2: I still do not think that N-evaluation is a practical and meaningful setting, considering that nearly all stochasticity-based defense methods have the risk of being broken if N is large enough. In addition, if N=5 or N=10 is recommended, why not use N=100 or even larger to evaluate? Conversely, when we discuss robustness, we often report the accuracy under attacks, which is the expectation of accuracy over the dataset. 1-evaluations naturally meet this perspective.

W4: High-resolution and large-scale datasets, in particular, the ImageNet-1K, should be evaluated to enhance the results.

Based on these concerns, I consider this work a borderline case and will keep my rating.

Comment

Thank you for your questions and feedback; our response to your concerns is as follows:

W2: In scenarios such as login and authentication, the reward for an attacker of obtaining at least one successful attack is high, while attempts are costly or limited. In such scenarios, the defender needs to consider worst-case robustness under a practical $N$-evaluation setting. In such a scenario-dependent setting with a practical $N$, stochastic defenses are not so inevitably attacked and should maintain a certain level of robustness. For example, the attacker may need to pay for API calls to attack, or may be blocked for a while after 5 incorrect authentications are submitted. In these cases, threats with $N=5$ need to be considered, while threats with $N=100$ are extreme/impossible. In short, the choice of $N$ is attack-scenario-dependent, and $N$-evaluation can be adapted to these scenarios by setting a proper $N$.

When considering robustness in the average sense, a stochastic defense yields different outputs for each submission, so its accuracy should be taken in expectation not only over the data but also over the submissions. This notion of robustness corresponds to our Avg.Rob metric, which also requires $N$-evaluation to estimate.

The importance of $N$-evaluation in evaluating stochastic defenses is also discussed in [1]. Our contributions on $N$-evaluation are (1) we statistically prove that 1-evaluation underestimates the risk of resubmits, and (2) attacks based on $N$-evaluation are more effective and efficient than those based on EOT. Overall, $N$-evaluation helps to assess both the average and worst-case robustness of stochastic defenses in a wider range of scenarios, and can be used as an aid for better attacks even in strict scenarios where defenders only consider 1-evaluation threats.

W4: We tested the effectiveness of our algorithm on the high-resolution dataset restricted ImageNet, which is ImageNet reorganized into 9 superclasses. Images are still 256×256 and come from ImageNet-1K, with the only difference being that fewer classes make the attack more challenging; this setting is widely used for evaluating model robustness [2-4]. Otherwise, the 1000 classes in ImageNet-1K make the attack easy and the robustness of the model is almost 0. As shown in Table 2, our method DH performs better across different defenses. The improvement is even more significant for the denoising-based DiffPure and GDMP, which suggests that diffusion models on large-scale data are more likely to introduce gradient dilemmas into the defense. Thus, DiffHammer can be used as a better evaluation tool across datasets.

We appreciate your feedback and will clarify our contribution and add experimental results on other datasets in the revision. We would sincerely appreciate it if you could let us know of any additional questions or concerns; we are keen to provide a satisfactory response.

[1] Lucas K, Jagielski M, Tramèr F, et al. Randomness in ML defenses helps persistent attackers and hinders evaluators. arXiv preprint arXiv:2302.13464, 2023.

[2] Chen H, Dong Y, Wang Z, et al. Robust classification via a single diffusion model. arXiv preprint arXiv:2305.15241, 2023.

[3] Tsipras D, Santurkar S, Engstrom L, et al. Robustness may be at odds with accuracy. arXiv preprint arXiv:1805.12152, 2018.

[4] Yang Y Y, Rashtchian C, Zhang H, et al. A closer look at accuracy vs. robustness. Advances in Neural Information Processing Systems, 2020, 33: 8588-8601.

Comment

Hi, we kindly look forward to receiving your feedback, to give us the opportunity to further improve our paper and address your questions. Thank you!

Review (Rating: 6)

The paper proposes a new adversarial attack framework against diffusion-based purification defenses. The paper first explains the advantage of using N-time evaluation as the metric for randomized defenses. Then, it proposes an EM-based adversarial attack, which empirically shows SOTA ASR.

Strengths

  1. Attacking diffusion-based purification defenses (SOTA defenses) is noteworthy to the community.
  2. The attack framework is novel to me, and the empirical evidence is sufficient.

Weaknesses

I think the randomness of diffusion purification is worth noticing and exploring in an EM framework with distribution-level analysis, which is a viable direction at a high level. However, my major concern is that the paper does not convey the attack analysis in the EM framework very clearly. It might be a problem of presentation, but additional explanations are definitely needed for a fair judgment of the method.

  1. Some definitions are not clear and rigorous. Line 141: missing rigorous definition of $g$. Line 143 says that $g(\phi)$ depends on a cluster, but the formulation does not show dependence on the cluster. Also, the definition of "cluster" is missing.
  2. Since Sec. 3.2 is not quite clearly written to me, I guess that you assume the adversarial perturbations for a given sample $x$ for a given defense $\phi_i$ follow a Gaussian with mean $r$ and covariance $\Sigma$. Is that correct? If so, the Gaussian assumption should be clearly emphasized with motivations for why you use a Gaussian. Will assumptions of other distributions work? Is the Gaussian important here? Empirical evidence would be beneficial.
  3. Here's another question: if you assume $g(\phi_i)$ denotes a Gaussian perturbation regarding $\phi_i$ (according to Equation 6, which includes multiple $\phi_i$), then should there be $N$ Gaussians for $N$ defenses $\phi_1,\dots,\phi_N$? But there is just one $r$ and $\Sigma$ here.
  4. Also, line 156 indicates that $q_i$ denote $N$ independent distributions on $z$; then should there be $N$ Bernoulli distribution parameters $\alpha$ here?
  5. $q^{(t)}$ seems not to be defined in line 159.
  6. Line 160: what does optimizing $r$ with higher $z$ mean? Should it be optimizing $r$ towards a high objective in Eq. (6)?
  7. In line 143, the paper indicates that $g(\phi)$ follows a normal distribution, which means that $g(\phi)$ is a random vector. But in Equation (8), $g(\phi_i)$ is treated like a deterministic vector. Should $g(\cdot)$ be inside the expectation?
  8. In the E-step, the constant $C$ looks quite important in Equation (9); how is it selected? What is the rationale for approximation with a constant?
  9. Logging in is one practical case for N-times evaluation, but is there any practical scenario for image misclassification?

Questions

Please refer to the weaknesses part for concrete questions.

Limitations

Discussed in Appendix A.

Author Response

Thank you for the valuable feedback; here is our response to the concerns raised. We describe our EM framework more clearly and rigorously below, and we will also update this part of the exposition in the paper.

Given a sample $x\in\mathbb{R}^d$ with label $y\in[K]$, a stochastic purification $\phi:\mathbb{R}^d\to\mathbb{R}^d$ and a deterministic classifier $f:\mathbb{R}^d\to\mathbb{R}^K$ classify it as $f[\phi(x)]$. Denoting the normalized gradient $g(\phi):=\nabla_x \mathcal{L}(f[\phi(x)];y)/\lVert \nabla_x \mathcal{L}(f[\phi(x)];y)\rVert_2$, we suspect that $g(\phi)$ may obey a multi-peaked distribution, where the $g(\phi)$ can be divided into clusters that have high intra-cluster similarity and low inter-cluster similarity.

Our goal is to find the cluster center with the highest attack success rate. At each step of the attack, we can observe $N$ gradients $g(\phi_i)$ and outcomes $\mathcal{A}_i$, $i=1,\dots,N$ (whether $f\circ\phi_i$ is successfully attacked or not). We identify the parameters of the distribution of $g(\phi)$ by maximizing the likelihood function of $g(\phi_i)$ and $\mathcal{A}_i$.

We make the following assumptions about the distribution of $g(\phi)$: (1) There exists a master cluster of $g(\phi)$ that obeys a Gaussian distribution. Let the hidden variable $z$ denote whether $g(\phi)$ comes from this cluster; then $p(g(\phi)\mid z=1) = \mathcal{N}(r,\Sigma)$, and the prior distribution of $z$ is a Bernoulli distribution with parameter $\alpha$. The proportion $\alpha$, mean $r$, and variance $\Sigma$ of this cluster are the parameters of the underlying model that we desire. (2) We use a non-informative prior for $g(\phi)$ as a whole, i.e., we assume that $p(g(\phi))=c$, and $c$ is eliminated in subsequent analyses. We make these assumptions because the normal distribution is flexible in modeling similarity between the $g(\phi)$, and the zero-information prior avoids over-designing our model. Furthermore, these assumptions facilitate the subsequent theoretical derivations, and we will show how certain steps of the algorithm can be modified to accommodate other assumptions.
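
Collecting these assumptions in one place, the assumed model for an observed gradient can be written as (a restatement of the text above for readability, not an additional assumption):

```latex
z \sim \mathrm{Bernoulli}(\alpha), \qquad
p\big(g(\phi)\mid z=1\big) = \mathcal{N}(r,\Sigma), \qquad
p\big(g(\phi)\big) = c \ \ \text{(non-informative prior)}.
```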

Since the hidden variable $z_i$ for each $g(\phi_i)$ is unknown, we use $q_i^{(t)}$ as an approximation of the posterior distribution of $z_i$ at each stage $t$. Due to the properties of the Gaussian mixture model, the update involves only the mean of $q_i^{(t)}$ (i.e., the posterior mean of $z_i$: $\mathbb{E}[z_i\mid g(\phi_i)]=p(z_i=1\mid g(\phi_i))$), so in practice, $\mathbb{E}q_i^{(t)}$ can be viewed as an approximation of the posterior probability that $g(\phi_i)$ belongs to the main cluster. Then, in the M-step, $\mathbb{E}q_i^{(t)}$ is fixed and the parameters $r,\Sigma,\alpha$ are updated; in the E-step, $\mathbb{E}q_i^{(t)}$ is updated to approximate $p(z_i=1\mid g(\phi_i))$.

M-step. In the M-step, maximizing our objective (Eq. 6 in the paper) can be achieved by the gradient ascent in Eq. 8 of the paper, where the observed $g(\phi)$ are weighted by $\mathbb{E}q_i^{(t)}$ and $r$ moves towards the weighted center.

E-step. As discussed in Section 3.2.1 of the paper, we set $\Sigma$ as a hyperparameter, so we are only concerned with the update of $r$, where the previously assumed constant $c$ is eliminated.

$$\mathbb{E}q_i^{(t)}=\frac{\alpha\,\mathcal{N}(g(\phi_i);r^{(t)},\lambda^{-1}I)}{c}=A\exp\Big(-\frac{\lVert g(\phi_i)-r^{(t)}\rVert_2^2}{2}\Big)=A\exp\big(\cos\langle g(\phi_i), r^{(t)}\rangle\big)$$

where $A$ denotes (not necessarily the same) constants independent of $i$. In the last term of Eq. 8 in the paper, due to the $\alpha$ in the denominator, it becomes $\sum_i[\exp(\cos\langle g(\phi_i), r^{(t)}\rangle)g(\phi_i)]/\sum_i\exp(\cos\langle g(\phi_i), r^{(t)}\rangle)$. This is an average weighted by a normalized similarity function of $g(\phi_i)$ and $r^{(t)}$, and the choice $\exp\cos\langle\cdot,\cdot\rangle$ is the result of our assumption.
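
A minimal sketch of the resulting update, as we read it (our own illustrative code, not the authors' implementation; gradients are assumed unit-normalized and the covariance $\lambda^{-1}I$ is absorbed into the similarity):

```python
import numpy as np

def em_attack_step(grads: np.ndarray, r: np.ndarray) -> np.ndarray:
    """One E/M-style update of the main cluster center r.

    grads: (N, d) unit-normalized per-purification gradients g(phi_i).
    r:     (d,) current estimate of the main cluster center (unit norm).
    """
    # E-step: weights proportional to exp(cosine similarity to the current center).
    weights = np.exp(grads @ r)
    weights /= weights.sum()
    # M-step: move r towards the similarity-weighted center of the observations.
    r_new = (weights[:, None] * grads).sum(axis=0)
    return r_new / np.linalg.norm(r_new)
```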

Table 1: DiffHammer under different assumptions (ASR).

| Attack | DiffPure Avg. | DiffPure Wor. | GDMP Avg. | GDMP Wor. | LM Avg. | LM Wor. |
|--------|---------------|---------------|-----------|-----------|---------|---------|
| DH2 | 53.5 | 73.4 | 56.0 | 76.9 | 78.0 | 87.9 |
| DH | 53.8 | 75.2 | 57.9 | 72.5 | 83.3 | 91.4 |
  • Response to question 1: We will define $g(\phi)$ as the gradient and call a set with high intra-cluster similarity and low inter-cluster similarity a cluster.
  • Response to questions 2 & 8: We adopt the Gaussian assumption for flexibility and convenience, which provides a theoretical reference for our algorithm. We experimented with other customized similarity functions in the above E-step; e.g., we directly defined the similarity of $g(\phi_i)$ to $r$ as the expectation of the loss boost (the closer the two, the stronger the attack). As shown in Table 1, proper choices of the similarity function lead to good results in practice, which in theory corresponds to similar but different cluster modeling.
  • Response to questions 3 & 4: Since $r$, $\Sigma$, and $\alpha$ are parameters of the underlying model, they are shared and unique among the $g(\phi)$.
  • Response to question 5: $q_i^{(t)}$ is an approximation of the posterior distribution of $z_i$ at each stage $t$, and only $\mathbb{E}q_i^{(t)}$ is involved and updated.
  • Response to question 6: Maximizing the objective in Eq. 6 is equivalent to making $r$ close to the $\mathbb{E}q_i^{(t)}$-weighted average of $g(\phi_i)$, which means preferentially attacking purifications with higher $\mathbb{E}q_i^{(t)}$.
  • Response to question 7: In Eq. 8, $g(\phi_i)$ denotes an observed realization of the random variable and is therefore treated as deterministic.
  • Response to question 9: Misclassification can constrain the deployment of models in safety-critical domains; for example, misclassification in autonomous driving can lead to catastrophic consequences. Diffusion-based purification serves as a promising solution, but may be undermined by resubmit attacks. Therefore, we reveal this risk to inspire subsequent research.
Comment

Thanks for the authors' efforts in the rebuttal!

I still want to make sure I correctly understand the assumptions here.

"call a set with high intraclass similarity and low interclass similarity a cluster" is still not a rigorous and clear definition. Do you assume the gradients for successful attack follow a Guassian? And you just assume a single Gaussian since "r,Σr,\Sigma" are shared?

Comment

Thank you for your question. To define the cluster, we follow the definition in [1], where clusters are defined by "determine $m$ clusters (subsets) of individuals in $I$, and those individuals assigned to the same cluster are similar yet individuals from different clusters are different (not similar)".

Our assumption is that, for a class of purifications that can be attacked by the same adversarial noise, the gradients of the purifications in this class follow a Gaussian distribution with mean $r$. We estimate $r$ as an approximation to the adversarial noise. We set a non-informative prior on other purifications that are attacked (but not by $r$) or that are not attacked. Thus, we have only a single shared set of Gaussian distribution parameters.

In fact, it is not necessary to assume a Gaussian mixture model for all attacked purifications and assign $m$ sets of parameters $(r_1,\dots,r_m;\Sigma_1,\dots,\Sigma_m;\alpha_1,\dots,\alpha_m)$ with $\alpha_1>\alpha_2>\dots>\alpha_m$. The reason is that the EM algorithm is locally convergent and we are only concerned with the cluster mean $r_1$ that has the largest proportion ($\alpha_1$). Purifications from $\mathcal{N}(r_1,\Sigma_1)$ are more likely to occur in the first few batches of observations due to the larger probability $\alpha_1$, which allows the estimated $r$ in the EM algorithm to approach $r_1$ initially and optimize towards $r_1$ thereafter. Although the EM algorithm also has some probability of converging to a suboptimal cluster, this still yields a sufficient attack effect and avoids the additional overhead of maintaining $m$ sets of parameters. Therefore, we only assume one set of Gaussian parameters $r,\Sigma,\alpha$ to keep our algorithm efficient and concise.

[1] Duran B S, Odell P L. Cluster analysis: a survey. Springer Science & Business Media, 2013.

Comment

Thanks for the authors' efforts for further clarification! My concerns are now addressed and I raise the score to 6. I hope the authors can make the method part more clear and rigorous in revision.

Comment

We sincerely appreciate your acknowledgment and your positive feedback on our work! It is extremely valuable for helping us strengthen the paper. We will make sure to clarify and enhance the rigor of the methods section in our revision.

Review (Rating: 5)

Diffusion-based purification methods have gained recognition for their robustness against adversarial attacks. However, concerns arise regarding the adequacy of current evaluation methods, particularly in addressing the gradient dilemma inherent in these techniques. This paper introduces DiffHammer, an advanced evaluation framework utilizing an EM-based attack and N-time evaluation to overcome these challenges. DiffHammer identifies vulnerabilities in purification clusters more effectively than traditional methods, significantly enhancing the assessment of diffusion-based purification robustness. Their experiments demonstrate that DiffHammer outperforms existing approaches by identifying more adversarial samples and highlighting previously underestimated security risks.

Strengths

  1. This paper is well-structured.

  2. They identified the limitations of the N + 1 evaluation for diffusion-based adversarial purification and proposed an EM-based attack and the 1 + N evaluation method, which demonstrated effectiveness.

  3. The theoretical foundation of this paper is comprehensive, with detailed proofs provided in the appendix. Extensive experiments further validate the reasonableness of the authors’ method, revealing underestimated risks of diffusion-based adversarial purification.

Weaknesses

  1. The paper describes that when the purification process contains unshared vulnerabilities, existing EoT-based N-averaging can encounter the gradient dilemma. This gradient dilemma might lead to the inability to generate effective attack samples, resulting in insufficient evaluation of diffusion-based purification methods. However, the paper lacks theoretical proof or experimental validation for this gradient dilemma.

  2. The paper mentions that diffusion model-based purification methods like Diffpure, GDMP, and LM have issues with resisting resubmit attacks. However, to my knowledge, DiffSmooth[1], another diffusion model-based purification method, takes into account multiple denoised outputs for the same input sample when making predictions, and it claims to have a certifiably robust pipeline. This defense method appears to be more powerful, and the authors should consider including such methods when evaluating the robustness of diffusion-based adversarial purification techniques. Additionally, it would be valuable to investigate whether resubmit attacks still pose a problem for such robust defense methods.

[1] Zhang J, Chen Z, Zhang H, et al. {DiffSmooth}: Certifiably robust learning via diffusion models and local smoothing[C]//32nd USENIX Security Symposium (USENIX Security 23). 2023: 4787-4804.

Questions

See weaknesses.

Limitations

The authors have clearly described the limitations.

Author Response

Thank you for the valuable feedback; here is our response to the concerns raised.

Validation of the gradient dilemma

Summary: By conducting cluster analysis on the gradients of the purification process, we observed that the gradients have multiple clustering centers with low inter-cluster similarity, leading to what we term the "gradient dilemma" in attacks. This phenomenon is illustrated using a simple Gaussian mixture model, suggesting that the gradient dilemma may stem from non-clustered features in the data.

For our analysis, we sampled 500 gradients for each of the defenses. We employed a Gaussian mixture model (GMM) as the clustering algorithm, consistent with our paper's methodology. The optimal number of clusters is determined using the Akaike Information Criterion (AIC) within the range of 1 to 5. Our experiment revealed that 47% of the sampled gradients (across 3 defenses) possess more than one clustering center, as shown in Rebuttal Figure 1. For these samples, we further analyzed the distribution of the cosine similarity between the top two cluster centers, as shown in Rebuttal Figure 2; it was often in the range $-0.2$ to $0.3$, thereby confirming the presence of the gradient dilemma.
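
A minimal sketch of this kind of per-sample cluster analysis (our own illustration with scikit-learn; `grads` is a hypothetical array of sampled gradients for one input, and the diagonal covariance is our simplification):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gradient_cluster_analysis(grads: np.ndarray, max_k: int = 5):
    """Select the number of gradient clusters by AIC and, if more than one,
    return the cosine similarity between the two largest clusters' centers."""
    fits = [GaussianMixture(n_components=k, covariance_type="diag",
                            random_state=0).fit(grads)
            for k in range(1, max_k + 1)]
    best = min(fits, key=lambda m: m.aic(grads))
    if best.n_components == 1:
        return 1, None
    order = np.argsort(best.weights_)[::-1]          # clusters by decreasing weight
    c1, c2 = best.means_[order[0]], best.means_[order[1]]
    cos = float(c1 @ c2 / (np.linalg.norm(c1) * np.linalg.norm(c2)))
    return best.n_components, cos
```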

To investigate the origins of the gradient dilemma, we constructed a Gaussian mixture toy example where the two diagonal components belong to the same category. In this scenario, diffusion-based purification pulls samples slightly towards the origin before diffusing them back into the data distribution. Given that the optimal classifier is a heteroscedastic classifier aligned with the coordinate axes, a sample's gradient direction is either horizontal or vertical, depending on which cluster the sample is closer to.

We demonstrated the probability of a sample's most frequent gradient direction in Rebuttal Figure 3, where values nearing 50% indicate the presence of a gradient dilemma. Simulation results revealed that samples along the diagonal experience severe gradient dilemmas, which is consistent with intuition. In this toy example, the gradient dilemma arises from a non-clustered data distribution, which imposes divergent gravitational pulls during purification. Consequently, we hypothesize that real-world data distributions may also exhibit non-clustered features. For instance, in the misclassification of a cat as a dog, considering the existence of various dog breeds, it is more efficient and effective to distort the features of one specific breed rather than all breeds collectively.

Analysis of DiffSmooth vulnerability

Summary: The robustness of DiffSmooth benefits from majority voting, a mechanism that enables certifiable robustness. However, its smoothed classifier remains vulnerable to DiffHammer. Although DiffSmooth is robust to resubmit attacks, this occurs at the expense of significant computational cost.

DiffSmooth employs a smoothing classifier in the inner loop and a certification algorithm in the outer loop. The inner loop's smoothing classifier can be viewed as an aggregation of multiple 1-step DiffPure processes. We evaluated DiffSmooth's performance under the $\ell_2$-norm using a ResNet-110 model trained with Gaussian smoothing, adhering to the paper's parameter settings. As illustrated in Table 1, DiffSmooth's classifier alone is not sufficiently robust; DiffHammer achieves worst-case attack success rates of 57.0% and 86.7% (out of 10 evaluations) with perturbation budgets of 0.5 and 1, respectively.

Table 1: Experimental results on the DiffSmooth classifier (ASR).

| Attack | r=0.5 Avg. | r=0.5 Wor. | r=1 Avg. | r=1 Wor. |
|--------|------------|------------|----------|----------|
| BPDA | 25.6 | 46.9 | 50.1 | 86.9 |
| PGD | 34.0 | 53.9 | 53.0 | 89.8 |
| DA | 25.7 | 45.7 | 49.4 | 86.1 |
| DH | 36.8 | 57.6 | 54.7 | 89.8 |

The outer loop's certification algorithm derives robustness through majority voting, providing a lower bound on DiffSmooth's robustness and resisting resubmitted adversarial samples with a small attack success rate. Although majority voting suppresses resubmit attacks, it encounters several issues:

  1. Achieving theoretical certification requires a vast number of samples (N = 100,000), limiting its practicality.
  2. DiffSmooth can falsely certify a class when DiffHammer submits adversarial samples with higher ASR. Our experiments show that DiffSmooth will even falsely certify DiffHammer-generated adversarial samples with a radius of 0.175.
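
For reference, a minimal sketch of the majority-vote prediction that such a smoothed pipeline relies on (our own simplified illustration, not DiffSmooth's code; `purify_and_classify` is a hypothetical callable returning one class label per stochastic purification draw):

```python
import numpy as np

def smoothed_predict(x, purify_and_classify, n_votes: int = 1000,
                     num_classes: int = 10) -> int:
    """Majority vote over n_votes stochastic purification + classification runs.

    A resubmitted adversarial example with a small per-run success rate rarely
    flips the majority, which is why voting suppresses resubmit attacks, but
    every prediction then costs n_votes purification passes."""
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n_votes):
        counts[purify_and_classify(x)] += 1
    return int(counts.argmax())
```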
Review (Rating: 6)

This paper reveals two limitations of EOT-based attacks in diffusion-based purification: the gradient dilemma and underestimation of the risk of resubmit attacks. The authors introduce N-time evaluation to sufficiently evaluate the risk of resubmitted attacks. Then, they propose an EM-based attack to solve the gradient dilemma, which achieves satisfactory results compared to both EOT-based and transfer-based attacks. Finally, this paper guides the future improvement of diffusion-based purification.

Strengths

  1. This paper has good originality: a novel N-time evaluation is introduced to sufficiently evaluate diffusion-based purification, and an effective EM-based attack is proposed to overcome the shortcomings of existing attacks.
  2. The authors revealed two important obstacles in the evaluation of diffusion-based purification and clearly explained their causes. Moreover, they propose effective evaluation and attack methods to address the revealed obstacles.

Weaknesses

In experiments:

  • In line 232, the authors should explain why they chose "BPDA [1], PGD [21], and AA [6]" as baselines. The publication years of [1], [6], and [21] are 2018, 2020, and 2017, so these methods seem relatively old. The authors should consider a comparison with newer methods, such as "Diffusion models for adversarial purification. ICML 2022, https://arxiv.org/abs/2205.07460".
  • In lines 239 and 240, the authors should provide the measuring unit of time.
  • The authors should add horizontal and vertical labels for the figures in their paper, including Figures 2, 4, 5, and 6.
  • In line 268, "as shown in DiffHammer retains the memory of..." seems like it should be "as shown in Figure 4, DiffHammer retains the memory of...". In addition, the authors should explain why the transfer-based attack DTI only appears in Table 1 and is not compared in Figure 4 and Figure 5. In subsection 4.3, the authors also do not analyze DTI.
  • In line 253, the authors should explain how "14%" and "28%" are calculated or obtained from the experimental results.

Questions

Please check the weaknesses part.

Limitations

I think that the authors have adequately addressed the limitations mentioned in this paper and the potential negative societal impact of their work is controllable.

Author Response

Thank you for the valuable feedback; here is our response to the concerns raised.

Comparison with newer methods

Summary: We adhere to the literature's naming conventions for these methods, but our evaluation incorporates the enhancements proposed in [1, 2] (2023), upgrading PGD and AA to SOTA evaluations with exact gradients. Additionally, we test complementary attacks.

Stochastic and iterative algorithms make the gradients of diffusion-based purification challenging to compute, so a line of work has aimed at approximating gradients in attack algorithms. This includes the Adjoint method [3] and Joint methods (score/full) [4]. [1, 2] obtained exact gradients of DiffPure and GDMP via computational-graph reconstruction, and we further derive exact gradients for the optimization in LM using the Hessian-vector-product trick. It was shown in [1, 2] that approximate gradients lead to inadequate robustness evaluation. Therefore, we use exact-gradient algorithms as a stronger baseline, with complementary experiments on approximate gradients validating this point. We will clarify our evaluation in the paper.
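
As background on the Hessian-vector-product trick mentioned above, a generic PyTorch sketch (our own illustration, not the authors' code; `loss_fn` is a hypothetical scalar-valued function of the input):

```python
import torch

def hvp(loss_fn, x: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Hessian-vector product (d^2 L / dx^2) @ v via double backpropagation,
    without materializing the full Hessian."""
    x = x.detach().requires_grad_(True)
    grad = torch.autograd.grad(loss_fn(x), x, create_graph=True)[0]
    return torch.autograd.grad((grad * v).sum(), x)[0]
```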

| Method | DiffPure (Avg./Wor.) | GDMP (Avg./Wor.) | LM (Avg./Wor.) |
|--------|----------------------|------------------|----------------|
| Adjoint | 41.4 / 68.0 | 42.3 / 59.6 | N/A |
| Joint (full) | 35.7 / 62.7 | 37.3 / 54.9 | 53.1 / 73.2 |
| Joint (score) | 22.3 / 56.4 | 17.5 / 43.0 | 36.0 / 60.9 |
| Exact Gradient | 46.7 / 69.0 | 50.6 / 63.3 | 82.0 / 90.4 |
| DH | 53.8 / 75.2 | 57.9 / 72.5 | 83.3 / 91.4 |

Transfer-based attack DTI

We include a demonstration and discussion of DTI, a representative transfer-based attack. We will compare DTI's performance with the other algorithms in Figures 4 and 5 of the paper. DTI performs well among transfer-based attacks, likely due to its emphasis on translation and transformation (scaling and affine) invariance. Since most stochastic purifications do not share vulnerabilities, assuming global optimality can invalidate some transfer-based algorithms (VMI and CWA). DTI does not rely on this assumption, but it only considers simple transformations such as translation and scaling, which limits its performance.

Typos and presentation issues

  • Since the time spent per step is comparable for the different methods, we directly report the number of iterations required to reach 90% of the optimal metric. We will unify the statements as iterations rather than time to avoid ambiguity.
  • We define the robustness of the defense as the proportion of data that is not misclassified in 10 resubmit attacks, which is 1-Wor.ASR (the proportion of attacks that succeed at least once).
  • We will add horizontal and vertical axis labels to the image to enhance legibility.
  • We will check and correct typos in the paper.

11 Robust Evaluation of Diffusion-Based Adversarial Purification, ICCV, 2023.

22 DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification, NeurIPS, 2023.

33 Diffusion models for adversarial purification, ICML, 2022.

44 Adversarial purification with score-based generative models, ICML, 2021.

Comment

Thank you for your responses. I think they solve most of my concerns, and I will keep my score.

Comment

We sincerely appreciate your acknowledgment and your positive feedback on our work!

Author Response

Thank you for the valuable feedback; here is our response to the concerns raised. We have provided some additional images attached to the PDF.

Final Decision

The authors propose a sufficient and efficient attack named DiffHammer against diffusion-based purification. The reviewers agree that the paper presents interesting analyses and conclusions; please make sure to incorporate the improvements in the final version of the paper.