Certified Defense Against Complex Adversarial Attacks with Dynamic Smoothing
A novel approach based on randomized smoothing to make vision classifiers robust against complex attacks.
Abstract
Reviews and Discussion
This paper proposes a generalization of Randomized-Smoothing-based robustness certification. It first claims a very general result (Theorem 4.3), which states that if one applies randomized smoothing (Cohen et al., 2019) with the added Gaussian noise transformed by an arbitrary (invertible) function $p$, then the resulting smoothed classifier will enjoy certified robustness under a metric determined by $p$. Precisely, for a smoothed classifier
$$g(x) = \arg\max_c \Pr_{\epsilon \sim \mathcal{N}(0, \sigma^2 I)}\big[f(x + p(\epsilon)) = c\big],$$
if
$$\Pr\big[f(x + p(\epsilon)) = c_A\big] \geq \underline{p_A} \geq \overline{p_B} \geq \max_{c \neq c_A} \Pr\big[f(x + p(\epsilon)) = c\big],$$
then, for any perturbation $\delta$ such that $\|p^{-1}(\delta)\|_2 < \frac{\sigma}{2}\big(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\big)$, we have that $g(x + \delta) = c_A$.
This general result is then used to prove a more specific smoothing certificate (Lemma 4.5), claiming that if one applies randomized smoothing with anisotropic Gaussian noise under an arbitrary covariance matrix $\Sigma$, then this results in certified robustness under the Mahalanobis metric defined by $\Sigma$:
$$g(x + \delta) = c_A \quad \text{for all } \delta \text{ such that } \sqrt{\delta^\top \Sigma^{-1} \delta} < \tfrac{1}{2}\big(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\big).$$
This more specific anisotropic smoothing result is proposed as the basis for a practical method, D-Smooth, for certifiably robust classification. In D-Smooth, the smoothing covariance matrix $\Sigma$ is determined as the empirical covariance of adversarial attack perturbations computed from samples in the dataset. Note that because different attacks will result in different covariance matrices, this allows the smoothing distribution to be tailored to a specific attack or threat model. Experiments are conducted on CIFAR-10 and ImageNet, using an attack procedure that combines an $L_2$ and an $L_\infty$ attack.
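To make the pipeline concrete, here is a minimal NumPy sketch of anisotropic smoothing with an attack-derived covariance, as I understand the method; the function names, the ridge term `eps`, and the `base_classifier` interface are my own assumptions rather than the authors' code.

```python
import numpy as np

def estimate_attack_covariance(perturbations, eps=1e-6):
    """Empirical covariance of attack perturbations (one flattened delta per row).

    A small ridge term keeps the estimate positive-definite, since the number
    of attack samples is typically far smaller than the input dimension.
    """
    deltas = perturbations.reshape(len(perturbations), -1)
    sigma = np.cov(deltas, rowvar=False)
    return sigma + eps * np.eye(sigma.shape[0])

def smoothed_predict(base_classifier, x, sigma_matrix, n_samples=1000, rng=None):
    """Majority vote of the base classifier under N(0, Sigma) noise."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(sigma_matrix)        # Sigma = L @ L.T
    z = rng.standard_normal((n_samples, x.size))
    noise = z @ L.T                             # rows are draws from N(0, Sigma)
    votes = base_classifier(x.reshape(1, -1) + noise)  # integer labels, shape (n_samples,)
    return np.bincount(votes).argmax()
```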
Strengths
The particular idea of smoothing using a covariance matrix derived specifically from the empirical distribution of adversarial attack directions is, to my knowledge, novel, and may be promising.
The presentation of the theoretical results is clear for the most part.
Weaknesses
Theoretical Claims
The more general theoretical claim of this paper (Theorem 4.3) is incorrect, while the more specialized claim for anisotropic Gaussian smoothing (Lemma 4.5) can be proved directly with a much simpler proof than the one provided (see the sketch below), and in fact appears in prior works.
- Theorem 4.3 is incorrect: one can construct a counterexample in one dimension, using a deterministic, invertible transformation $p$.
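For completeness, the simpler argument I have in mind for Lemma 4.5 is the standard change of variables (a sketch, with $\Phi$ the Gaussian CDF and $r$ the Cohen et al. $\ell_2$ radius):
$$\hat{f}(z) := f(\Sigma^{1/2} z) \quad\Longrightarrow\quad \Pr_{\epsilon \sim \mathcal{N}(0, I)}\big[\hat{f}(\Sigma^{-1/2}x + \epsilon) = c\big] = \Pr_{\delta \sim \mathcal{N}(0, \Sigma)}\big[f(x + \delta) = c\big],$$
so the $\ell_2$ certificate of radius $r$ for the smoothed $\hat{f}$ at $\Sigma^{-1/2}x$ transfers to $f$ at $x$ for all $\delta$ with $\|\Sigma^{-1/2}\delta\|_2 = \sqrt{\delta^\top \Sigma^{-1} \delta} < r$.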
Questions
See " Experiments/Results Section" above under Weaknesses. There are several aspects of the experiments that could be clarified:
- It is currently unclear what quantity is being reported as "Certified Radius" for each defense method ($L_1$, $L_2$, or Mahalanobis-distance radius). (It would be better to report "certified volumes" as in Pfrommer et al., 2023; Tecot, 2021.)
- What does "maximum perturbation 0.5" mean for the Square Attack?
- Are the attack perturbations used to compute $\Sigma$ based on attacks on the base classifier or the smoothed classifier?
- What models were used for the baseline smoothing techniques in the experimental section? Were these models also adversarially trained?
- How is $\Sigma^{-1}$ computed with a low-rank approximation of $\Sigma$?
Q: It is currently unclear what quantity is being reported as "Certified Radius" for each defense method ($L_1$, $L_2$, or Mahalanobis-distance radius). (It would be better to report "certified volumes" as in Pfrommer et al., 2023; Tecot, 2021.)
A: Thank you for pointing this out. In future work, we will use the volumes of the certified regions for fair experimental comparison.
Q: What does "maximum perturbation 0.5" mean for the Square Attack?
A: We implement the Square Attack following this repository: https://github.com/fra31/auto-attack . Here, 0.5 is the maximum perturbation allowed on the pixels, which are within the range [0, 255].
Q: Are the attack perturbations used to compute $\Sigma$ based on attacks on the base classifier or the smoothed classifier?
A: They are based on attacks on the base classifier.
Q: What models were used for the baseline smoothing techniques in the experimental section? Were these models also adversarially trained?
A: Yes, we always use the same base classifier for all smoothing techniques, which was adversarially trained as described in our submission.
Q: How is $\Sigma^{-1}$ computed with a low-rank approximation of $\Sigma$?
A: If the given matrix is not positive-definite, one can compute the pseudo-inverse (Moore-Penrose inverse) instead.
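As a concrete illustration of this suggestion, here is a small NumPy sketch (my own; the function name and the `rcond` cutoff are assumptions) that evaluates the Mahalanobis norm via the Moore-Penrose pseudo-inverse when $\Sigma$ is rank-deficient:

```python
import numpy as np

def mahalanobis_norm(delta, sigma_matrix, rcond=1e-8):
    """sqrt(delta^T Sigma^+ delta), where Sigma^+ is the Moore-Penrose pseudo-inverse.

    When Sigma is estimated from few attack samples it is rank-deficient;
    pinv zeroes out singular values below rcond instead of inverting them.
    """
    sigma_pinv = np.linalg.pinv(sigma_matrix, rcond=rcond, hermitian=True)
    return float(np.sqrt(delta @ sigma_pinv @ delta))
```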
The paper investigates the effectiveness of dynamic smoothing as a defense mechanism against complex adversarial attacks in machine learning models. The authors assert that existing defense strategies often fall short in providing certified robustness, particularly in the face of sophisticated adversarial inputs. Their key contributions include:
- Theoretical Analysis: The paper establishes a dynamic smoothing framework that adapts the noise level based on input complexity, proving that this approach enhances the model's certified robustness against a variety of adversarial attacks.
- Empirical Findings: The paper demonstrates through extensive experiments that the proposed dynamic smoothing method significantly improves robustness compared to traditional static smoothing techniques, while maintaining performance on both easy and hard samples.
- Proposed Solutions: The paper introduces specific training strategies, including adaptive noise levels and robust certification techniques, which allow for effective defense against complex attacks. These solutions not only enhance certified robustness but also improve overall model performance.
Strengths
The paper introduces an approach to dynamic smoothing that adapts noise levels based on input complexity, distinguishing it from traditional static methods and offering a new direction in randomized smoothing.
Weaknesses
- The scope of this paper is limited. Its primary contribution, anisotropic Gaussian randomized smoothing, has already been addressed by ANCER [1], which generalized Gaussian randomized smoothing to anisotropic Gaussian distributions and provided a robustness certification. This significantly diminishes the novelty of the current work. Moreover, the paper does not reference this closely related work [1].
- The writing lacks clarity, as exemplified by the inclusion of the phrase "Code: [removed for review]" in the abstract; this should be omitted. The experimental setup is inadequately explained. The paper needs to clarify why the Mahalanobis-norm certified accuracy of DSMOOTH is compared to the $L_1$- and $L_2$-norm certified accuracy of other methods. It should also discuss the rationale for using the Mahalanobis distance and identify practical scenarios where its robustness is applicable, assessing DSMOOTH's performance in those contexts.
- The benefits introduced by the modified noise are not well presented. Finally, unnecessary text in the summary should be removed.
[1] ANCER: Anisotropic Certification via Sample-wise Volume Maximization. PMLR 2022
Questions
- Add reference to the work [1].
- Clarify why the Mahalanobis-norm certified accuracy of DSMOOTH is compared to the $L_1$- and $L_2$-norm certified accuracy of other methods.
- Discuss the rationale for using Mahalanobis Distance and identify practical scenarios where its robustness is applicable, assessing DSMOOTH's performance in those contexts.
[1] ANCER: Anisotropic Certification via Sample-wise Volume Maximization. PMLR 2022
Q: Add reference to the work [1]
A: We are happy to add this reference, to compare against [1], and to discuss other relevant references that are not currently covered in our manuscript.
Q: Clarify why the Mahalanobis-norm certified accuracy of DSMOOTH is compared to the $L_1$- and $L_2$-norm certified accuracy of other methods.
A: Our overall aim with the experiments was to show superior performance in terms of the certified accuracy. However, we understand that this comparison is not very significant, since it would have been better to use other metrics, such as the volume, as discussed in [1].
Q: Discuss the rationale for using Mahalanobis Distance and identify practical scenarios where its robustness is applicable, assessing DSMOOTH's performance in those contexts.
A: Thank you for pointing this out. We are happy to improve our submission by adding a clear discussion on this point. Our rationale for using the Mahalanobis distance was aligned with related work, e.g., [1]. Standard $p$-norm certificates represent a worst-case scenario, since they constrain the certificate to the nearest adversary in the $p$-norm. However, the decision boundaries of general classifiers may be complex and nonlinear, and standard norms may be uninformative in terms of the shape of decision boundaries. The Mahalanobis distance, on the other hand, is defined relative to the distribution of the adversarial perturbations, using the covariance matrix to adjust for the spread and correlations of the perturbations. With this submission, our goal was to show that the proposed distance allows us to derive certificates that are more informative w.r.t. the decision boundary. We believe that practical scenarios where this framework is applicable include camera-based smart vision systems, where physical adversarial attacks can successfully mislead perception models. We understand, however, that our work at this stage is insufficient to support these claims.
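To illustrate the point with a toy example (ours, not from the submission): when perturbations concentrate along one axis, two directions of equal $L_2$ length get very different Mahalanobis lengths, so the certified ellipsoid stretches along the dominant attack direction.

```python
import numpy as np

# Toy 2-D covariance: attacks vary strongly along the first axis.
sigma = np.diag([4.0, 0.25])
sigma_inv = np.linalg.inv(sigma)

def mahalanobis(delta):
    return float(np.sqrt(delta @ sigma_inv @ delta))

on_axis = np.array([1.0, 0.0])    # along the dominant attack direction
off_axis = np.array([0.0, 1.0])   # orthogonal to it

print(mahalanobis(on_axis))   # 0.5: a unit L2 step "costs" little here
print(mahalanobis(off_axis))  # 2.0: the same L2 step costs 4x more here
```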
[1] ANCER: Anisotropic Certification via Sample-wise Volume Maximization. PMLR 2022
This paper proposes Dynamic Smoothing (DSMOOTH), an extension of the randomized smoothing framework aimed at enhancing robustness against complex adversarial attacks. Traditional randomized smoothing methods rely on isotropic Gaussian noise, which limits their effectiveness against multi-norm and structured adversarial threats. The authors overcome these limitations by employing a broader range of noise distributions and using the Mahalanobis distance to define probabilistic robustness guarantees, making the approach more adaptable to localized and non-uniform attacks. The authors validate DSMOOTH's effectiveness through experiments on CIFAR-10 and ImageNet, where it demonstrates significantly improved certified accuracy against state-of-the-art baselines in multi-attack scenarios.
Strengths
The core strength of the paper is its strong theoretical foundation, presenting a novel idea of using Mahalanobis distance to extend the randomized smoothing framework, enabling it to handle complex adversarial attacks. It demonstrates originality by expanding randomized smoothing to incorporate a range of noise distributions beyond isotropic Gaussian noise, thereby providing robustness against multi-norm, multi-type adversarial threats, which traditional smoothing methods struggle to defend against. This is a valuable and important direction to explore in the context of randomized smoothing. The experimental results also support the theoretical claims, further adding to the strength of the paper.
Weaknesses
While the paper presents promising theoretical advancements, several issues weaken its overall contribution and lead me to lean towards rejection. Although the experimental results support the theoretical claims, the setup itself lacks rigor. For instance, the evaluation is conducted on only 500 images from the CIFAR-10 test set rather than the full test set, potentially skewing the robustness results and limiting the generalizability of the findings. This choice is not well justified, especially given the scale of CIFAR-10 and the availability of complete test sets. Additionally, the authors choose not to compare against baselines designed for $\ell_p$-norm defenses with $p > 2$, but the reasons for this omission are vague. This exclusion undermines the evaluation, since these defenses are common benchmarks in adversarial robustness research. Further, while the paper aims to address robustness in a multi-attack setting, it only tests a single combination of attacks (Square Attack + FGSM). The limitation to just one multi-attack scenario significantly weakens the experimental results, as the approach's effectiveness under diverse, real-world multi-attack combinations remains unverified. Expanding the experimental setup to include multiple attack combinations and comprehensive baseline comparisons would strengthen the paper's contributions.
One more thing I would like to point out: the authors discuss the isotropic Gaussian distribution (consistently referred to as "isotopic" in the paper, which is incorrect terminology). Something similar has been discussed in a few other works in the context of randomized smoothing [1, 2], but the similarities and/or dissimilarities with those methods are not discussed in the related-work section.
References:
[1] Hanbin Hong and Yuan Hong. Certified adversarial robustness via anisotropic randomized smoothing. arXiv preprint arXiv:2207.05327, 2022.
[2] Francisco Eiras, Motasem Alfarra, M Pawan Kumar, Philip HS Torr, Puneet K Dokania, Bernard Ghanem, and Adel Bibi. ANCER: Anisotropic certification via sample-wise volume maximization. arXiv preprint arXiv:2107.04570, 2021.
Questions
- Could the authors provide the evaluation results of their method on the entire CIFAR-10 test set? Using the full test set is standard in the field and would provide stronger evidence for the method's generalizability.
- The authors exclude baselines based on $\ell_p$-norm defenses with $p > 2$. Could the authors provide a more detailed justification for this choice? Although this is briefly mentioned in Section 5.1 under Baselines, it is not very clear to the reader why this choice was made. Specifically, could the authors expand on the line: "We do not consider randomized smoothing techniques with certification guarantees in terms of $\ell_p$ norms with $p > 2$, since impossibility results are known for increasing $p$."?
- Under the multi-attack setting, only one combination of attacks (Square Attack + FGSM) was evaluated. Testing multiple attack combinations could better demonstrate DSMOOTH's robustness. Are there plans to expand the experimental setup to include additional attack combinations?
- Other works have discussed anisotropic approaches in the context of randomized smoothing, such as Hong and Hong (2022) and Eiras et al. (2021). Could the authors elaborate on the similarities or differences between their approach and these methods, particularly in the related-work section?
Q: Could the authors provide the evaluation results of their method on the entire CIFAR-10 test set? Using the full test set is standard in the field and would provide stronger evidence for the method's generalizability.
A: Our experiments are indeed carried out on the entire CIFAR-10 dataset, as in previous related work. Lines 398-400 are incorrect, and they will be removed from the final version of this submission.
Q: The authors exclude baselines based on $\ell_p$-norm defenses with $p > 2$. Could the authors provide a more detailed justification for this choice? Although this is briefly mentioned in Section 5.1 under Baselines, it is not very clear to the reader why this choice was made. Specifically, could the authors expand on the line: "We do not consider randomized smoothing techniques with certification guarantees in terms of $\ell_p$ norms with $p > 2$, since impossibility results are known for increasing $p$"?
A: This statement refers to Corollary 7.4 in [3], which shows that, for any smoothing scheme satisfying Def. 7.1, the largest possible certified radius decreases w.r.t. $p$. We understand that this argument is vague and that additional discussion is needed.
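For intuition, the norm-conversion argument behind this (our paraphrase; the tight statement is Corollary 7.4 in [3]) is
$$\|\delta\|_2 \le d^{\frac{1}{2}-\frac{1}{p}} \|\delta\|_p \ \ (p > 2) \quad\Longrightarrow\quad r_p = d^{\frac{1}{p}-\frac{1}{2}}\, r_2,$$
so an $\ell_2$ certificate of radius $r_2$ in input dimension $d$ yields an $\ell_p$ certificate that shrinks as $d$ grows, and [3] show that smoothing schemes cannot do substantially better for $p > 2$.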
Q: Under the multi-attack setting, only one combination of attacks (Square Attack + FGSM) was evaluated. Testing multiple attack combinations could better demonstrate DSMOOTH's robustness. Are there plans to expand the experimental setup to include additional attack combinations?
A: Thank you for raising this point. We are happy to incorporate more attacks (or attack combinations) in a future re-submission.
Q: Other works have discussed anisotropic approaches in the context of randomized smoothing, such as Hong and Hong (2022) and Eiras et al. (2021). Could the authors elaborate on the similarities or differences between their approach and these methods, particularly in the related works section?
A: Hong and Hong (2022): the certification guarantees in this work (Hong and Hong (2022), Thm. 4.1) cover the case where the smoothing distribution is of the form $\mathcal{N}(0, \Sigma)$ with $\Sigma$ a diagonal matrix. These guarantees differ from our work, which derives probabilistic guarantees for any positive-definite matrix $\Sigma$. Furthermore, Hong and Hong (2022) differs from our work in how the matrix $\Sigma$ is chosen. Eiras et al. (2021) differs from our work in how the matrix $\Sigma$ is obtained: there, $\Sigma$ is computed by solving an optimization problem, as in their Eq. (1).
[1] Hanbin Hong and Yuan Hong. Certified adversarial robustness via anisotropic randomized smoothing. arXiv preprint arXiv:2207.05327, 2022.
[2] Francisco Eiras, Motasem Alfarra, M Pawan Kumar, Philip HS Torr, Puneet K Dokania, Bernard Ghanem, and Adel Bibi. ANCER: Anisotropic certification via sample-wise volume maximization. arXiv preprint arXiv:2107.04570, 2021.
[3] Greg Yang, et al. Randomized Smoothing of All Shapes and Sizes. ICML 2020.
We would like to thank the reviewers for taking the time to provide insightful feedback. We understand that the current submission has various shortcomings, including lack of novelty, as well as the other issues pointed out by the reviewers. For this reason, we have decided to withdraw this submission.