PaperHub
Overall rating: 6.2 / 10 (Rejected)
5 reviewers; lowest 5, highest 8, standard deviation 1.0
Individual ratings: 6, 5, 8, 6, 6
Confidence: 3.4 · Correctness: 2.6 · Contribution: 2.8 · Presentation: 3.0
ICLR 2025

DDAD: A Two-Pronged Adversarial Defense Based on Distributional Discrepancy

OpenReview · PDF
Submitted: 2024-09-28 · Updated: 2025-02-05

Abstract

Keywords
adversarial defense, adversarial robustness, accuracy-robustness trade-off

Reviews and Discussion

Review
Rating: 6

The paper presents a new adversarial defense mechanism called Distributional-Discrepancy-based Adversarial Defense (DDAD), which leverages statistical adversarial data detection (SADD). Unlike previous SADD-based methods that discard inputs detected as adversarial examples (AEs), DDAD employs a two-pronged process, in which clean examples (CEs) are sent directly to a classifier, and AEs are denoised before classification. The method is founded on the concept of Maximum Mean Discrepancy (MMD), which measures the distributional discrepancy between CEs and AEs. In the training phase, DDAD learns a denoiser and optimizes MMD-based statistics to distinguish between CEs and AEs. Extensive experiments show that the proposed DDAD method outperforms existing adversarial defense techniques in both clean and robust accuracy across CIFAR-10 and ImageNet-1K, including against unseen transfer attacks.

Strengths

  1. The paper provides a theoretical rationale behind the efficacy of minimizing distributional discrepancies for adversarial examples. The clear connection between adversarial risk and distributional discrepancy minimizes arbitrary choices for its design.
  2. Combining statistical methods (MMD) with denoising brings a fresh perspective compared to standard approaches like adversarial training (AT) or adversarial purification (AP), demonstrating improved robustness and clean accuracy.
  3. DDAD effectively combines statistical detection (SADD) and denoising, retaining clean accuracy while defending against adversarial attacks with a straightforward and less computationally intensive method compared to purely denoiser-based defenses.

Weaknesses

  1. Identifying discrepancies using MMD is highly reliant on batch-wise processing, which introduces practical limitations, especially in single-instance or real-time applications. Although this is discussed in the ablation studies and limitations, no immediate solutions are provided beyond recommending larger batch sizes.
  2. The method shows a drop in performance when batches contain a mixture of CEs and AEs. This could be problematic in domains where AEs are sparsely distributed, and include cases where the relative mixture is unknown beforehand.
  3. The method is evaluated only on image classification tasks (CIFAR-10, ImageNet-1K). It would have been valuable to explore the effect of DDAD on more diverse datasets, such as medical imagery, autonomous driving, or natural language processing tasks, which may hinge on differently structured features.
  4. While the method is holistic, it introduces a somewhat complex series of steps (MMD optimization, denoiser training, and dynamic thresholding). Both the denoiser and classifier need careful training and calibration, making reproducibility potentially challenging.

Questions

  1. You propose MMD as the core of your discrepancy measurement. How would alternative statistical measures (e.g., Wasserstein distance, energy distance) perform, and what drawbacks are there to switching from MMD?
  2. The two-pronged method relies on a clearly defined distributional discrepancy threshold (t = 0.05). Could you provide more details regarding the sensitivity of the model to this hyperparameter? Would adaptive or learned thresholds enhance robustness?
Comment

Q1. Identifying discrepancies using MMD is highly reliant on batch-wise processing, which introduces practical limitations, especially in single-instance or real-time applications. Although this is discussed in the ablation studies and limitations, no immediate solutions are provided beyond recommending larger batch sizes.

Reply: Thanks for your insightful comment! The key message we want to deliver here is: batch-wise evaluation is not impractical, but it will have some costs:

  • Proposed solution: for user inference, single samples provided by the user can be dynamically stored in a queue. Once the queue accumulates enough samples to form a batch, our method can then process the batch collectively using the proposed approach.
  • Costs for this solution: a direct cost for this solution is the waiting time, as the system must accumulate enough samples (e.g., 50 samples) to form a batch before processing. However, in scenarios where data arrives quickly (e.g., Google's terminal), the waiting time is typically very short (e.g., less than 2 seconds), making this approach feasible for many real-time applications. For applications with stricter latency requirements, the batch size can be dynamically adjusted based on the incoming data rate to minimize waiting time. For instance, if the system detects a lower data arrival rate, it can process smaller batches to ensure timely responses.
  • Comparison with current SOTA AP methods: diffusion-based AP methods can support single samples as input, but the inference speed of diffusion-based AP is relatively slow (e.g., processing one image can take several seconds) [1] [2]. Thus, diffusion-based AP methods can hardly be applied to a system where data arrives quickly, but our proposed method can, which further demonstrates that batch-wise evaluation is not impractical.

[1] Diffusion Models for Adversarial Purification, ICML 2022.

[2] Robust Evaluation of Diffusion-Based Adversarial Purification, ICCV 2023.
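To make the queue-based solution above concrete, here is a minimal, hypothetical Python sketch (names such as `process_batch` are placeholders, not the released implementation): incoming single samples are buffered, and the buffer is flushed either when it reaches the batch size or when a latency budget expires, which corresponds to the dynamic batch-size adjustment described above.

```python
import time
from collections import deque

class BatchedInferenceQueue:
    """Buffer single requests and run batch-wise detection + classification."""

    def __init__(self, process_batch, batch_size=50, max_wait_s=2.0):
        self.process_batch = process_batch  # e.g., a DDAD-style detect-then-classify call
        self.batch_size = batch_size
        self.max_wait_s = max_wait_s
        self.buffer = deque()
        self.last_flush = time.monotonic()

    def submit(self, sample):
        """Add one sample; return predictions when a batch is flushed, else None."""
        self.buffer.append(sample)
        waited = time.monotonic() - self.last_flush
        if len(self.buffer) >= self.batch_size or waited >= self.max_wait_s:
            return self.flush()
        return None

    def flush(self):
        """Process whatever is buffered as one batch (possibly smaller than batch_size)."""
        batch = [self.buffer.popleft() for _ in range(len(self.buffer))]
        self.last_flush = time.monotonic()
        return self.process_batch(batch) if batch else []
```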

Q2. The method shows a drop in performance when batches contain a mixture of CEs and AEs. This could be problematic in domains where AEs are sparsely distributed, and include cases where the relative mixture is unknown beforehand.

Reply: Thanks for your insightful comment!

  • When batches contain a mixture of CEs and AEs, the performance of all baseline methods will drop. In our original manuscript, we conducted an ablation study on mixed data batches (the proportion of AEs in a batch ranges from 10% to 100%) and we found that our method can still outperform all baseline methods across all mixed proportions. Therefore, even if a batch contains a few adversarial examples, our method can still perform well. Please kindly check Appendix D.2 in our original manuscript for more details.
  • When AEs are sparsely distributed, one possible threat is that the entire batch will be classified as a 'clean batch'. However, we would like to highlight that it can be controlled. One of the benefits of using a non-parametric two-sample test is that the false alarm rate can be controlled by users [1]. In this paper, we set the maximum false alarm rate to be 5% (i.e., we allow a maximum of 5% adversarial examples in a batch). That is to say, we can also set the maximum false alarm rate to be 0%, then if a batch contains only one or two adversarial examples, the entire batch will be classified as an 'adversarial batch'.

[1] Maximum Mean Discrepancy Test is Aware of Adversarial Attacks. ICML 2021.

Comment

Thank you so much for your positive comments! It is our pleasure that our theoretical motivation, the novelty of our method and our experimental results can be recognized. Your thorough review and comments are very important to the improvement of our work! Please find our replies below.

Comment

In the end, we want to thank you for providing this valuable feedback to us, which is always important to hear opinions from other experts in this field. If you feel there are still unclear points regarding our paper, please discuss with us in the author-reviewer discussion phase. If you feel that your concerns are well-addressed, we do hope for an updated rating and new comments (if possible). Thanks again!

Comment

Q6.1 The two-pronged method relies on a clearly defined distributional discrepancy threshold (t = 0.05). Could you provide more details regarding the sensitivity of the model to this hyperparameter?

Reply: Thanks for your question and sorry for the confusion!

We realize that the statement in line 323, 'In practice, we set the threshold t = 0.05 by default', may lead to unnecessary misunderstanding and cause confusion. Here t = 0.05 serves as a general threshold for clean data, but might not be the optimal threshold to separate CEs and AEs. To avoid confusion, we have decided to remove this sentence from our original manuscript.

In our original manuscript, we select the threshold based on the experimental results on the validation data. To provide further clarity and transparency, we evaluated the impact of different threshold values on DDAD's performance. Please kindly check the experiment results below:

Table 1: Sensitivity of DDAD to the threshold values of MMD-OPT on CIFAR-10. We report clean and robust accuracy (%) against adaptive white-box attacks ($\epsilon = 8/255$). The classifier used is WRN-28-10.

| Threshold Value | Clean | PGD+EOT ($\ell_\infty$) | PGD+EOT ($\ell_2$) | AutoAttack ($\ell_\infty$) | AutoAttack ($\ell_2$) |
|---|---|---|---|---|---|
| 0.05 | 94.16 | 66.98 | 73.40 | 72.21 | 85.96 |
| 0.07 | 94.16 | 66.98 | 73.40 | 72.21 | 85.96 |
| 0.1 | 94.16 | 66.98 | 73.40 | 72.21 | 85.96 |
| 0.5 | 94.16 | 66.98 | 84.38 | 72.21 | 85.96 |
| 0.7 | 94.16 | 66.98 | 84.38 | 72.21 | 85.96 |
| 1.0 | 94.16 | 64.75 | 84.38 | 72.21 | 85.96 |

Table 2: Sensitivity of DDAD to the threshold values of MMD-OPT on ImageNet-1K. We report clean and robust accuracy (%) against adaptive white-box attacks ($\epsilon = 4/255$). The classifier used is RN-50.

| Threshold Value | Clean | PGD+EOT ($\ell_\infty$) |
|---|---|---|
| 0.01 | 76.61 | 53.75 |
| 0.015 | 76.61 | 53.75 |
| 0.02 | 78.61 | 53.75 |
| 0.025 | 78.61 | 53.75 |
| 0.03 | 78.61 | 0.46 |
| 0.04 | 78.61 | 0.46 |
| 0.05 | 78.61 | 0.46 |

Please note that generating AEs using attacks with many iterations (e.g., PGD+EOT and AutoAttack) is too time-consuming for a large-scale dataset (i.e., ImageNet-1K), so we only report the results on PGD+EOT ($\ell_\infty$) given the limited time for the rebuttal.

In our work, a threshold value of 0.5 is selected for CIFAR-10 and 0.02 is selected for ImageNet-1K. It is reasonable to use a smaller threshold for ImageNet-1K because the distribution of AEs with $\epsilon = 4/255$ (i.e., AEs for ImageNet-1K) will be closer to CEs than AEs with $\epsilon = 8/255$ (i.e., AEs for CIFAR-10). Intuitively, when $\epsilon$ decreases to 0, AEs are the same as CEs (i.e., the distribution of AEs and CEs will be the same).

Lastly, we apologize that we did not specify the threshold values we used in the experiment settings. We would like to thank you for pointing this issue out to us. We will clarify the selection of the threshold values in the updated version of our manuscript to avoid confusion.
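To illustrate the validation-based selection described above, a hypothetical sketch could look like the following (the candidate grid mirrors Table 1, but the scoring rule and the `evaluate_on_validation` helper are assumptions, not the authors' exact procedure):

```python
def select_threshold(candidates, evaluate_on_validation):
    """Pick the MMD-OPT detection threshold that performs best on validation data.

    `evaluate_on_validation(t)` is assumed to return (clean_acc, robust_acc)
    of the full pipeline when the detection threshold is set to `t`.
    """
    best_t, best_score = None, float("-inf")
    for t in candidates:
        clean_acc, robust_acc = evaluate_on_validation(t)
        score = 0.5 * (clean_acc + robust_acc)  # weight both criteria equally
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Example grid matching Table 1 (CIFAR-10):
# best = select_threshold([0.05, 0.07, 0.1, 0.5, 0.7, 1.0], evaluate_on_validation)
```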

Q6.2 Would adaptive or learned thresholds enhance robustness?

Reply: Thanks for your insightful question! Yes, it is possible to further enhance robustness by using adaptive or learned thresholds. Intuitively, adaptive/learned thresholds can separate CEs and AEs more precisely (e.g., they can adapt to the strength or the type of the attack). We believe this is worth further investigation and we leave it as future work.

Comment

Q3. The method is evaluated only on image classification tasks (CIFAR-10, ImageNet-1K). It would be valuable to explore the effect of DDAD on more diverse datasets, such as medical imagery, autonomous driving, or natural language processing tasks, which may hinge on differently structured features.

Reply: Thanks for your insightful suggestion!

  • Our problem setting is based on image classification tasks. Specifically, we aim to improve the model performance on the robust image classification.
  • We agree with you that our work will be more valuable if DDAD can be extended to other tasks. However, to extend our method to different domains, we must change our problem settings to make our work self-contained. Therefore, in our paper, we only focus on improving the robustness of the model on image classification tasks. We leave extending our work to other domains as future work.
  • We also agree with you that it is worth testing our method on more diverse datasets. Therefore, following [1], we test our method on Street View House Numbers (SVHN), which is completely different from CIFAR-10 and ImageNet-1K. We aim to demonstrate that our method can work well on various image domains. Please kindly check the experiment results below and we will include them in the updated version of our manuscript:

Table 1: Clean and robust accuracy (%) against adaptive white-box $\ell_\infty$ attacks ($\epsilon = 8/255$) on SVHN. We show the most successful defense in bold.

| Method | Classifier | Clean | Robust |
|---|---|---|---|
| [1] | WRN-28-10 | 95.55 | 63.05 |
| [2] | ResNet-18 | 93.08 | 52.83 |
| [3] | WRN-28-10 | 92.87 | 56.83 |
| [4] | WRN-28-10 | 94.15 | 60.90 |
| **Ours** | WRN-28-10 | **96.57** | **69.45** |

[1] Robust Evaluation of Diffusion-Based Adversarial Purification, ICCV 2023.

[2] Reducing excessive margin to achieve a better accuracy vs. robustness trade-off. ICLR 2022.

[3] Uncovering the limits of adversarial training against norm-bounded adversarial examples. ArXiv, abs/2010.03593, 2020.

[4] Improving robustness using generated data. NeurIPS 2021.

Q4. While the method is holistic, it introduces a somewhat complex series of steps (MMD optimization, denoiser training, and dynamic thresholding). Both the denoiser and classifier need careful training and calibration, making reproducibility potentially challenging.

Reply: Thank you for your comment! We agree that our method involves a series of steps. However, these steps are carefully designed. Each component contributes to the holistic nature of the method, making it more effective overall.

To address concerns about reproducibility, we have made the following efforts to ensure our results can be reproduced:

  • We provided a detailed description of each step in the paper, including hyperparameter settings, model architectures, and training strategies.
  • To ensure transparency, we have released our code on the anonymous GitHub (https://anonymous.4open.science/r/DDAD-DB60), along with step-by-step instructions for reproducing the experiments.
  • We conducted our experiments multiple times and reported the averaged results.

Q5. You propose MMD as the core of your discrepancy measurement. How would alternative statistical measures (e.g., Wasserstein distance, energy distance) perform, and what drawbacks are there to switching from MMD?

Reply: Thank you for your insightful question!

  • Compared to the Wasserstein distance, MMD has two major advantages: (1) the MMD estimator is unbiased, whereas the Wasserstein estimator is biased, so the MMD estimator is more accurate, especially when the data dimension is large; (2) the Wasserstein distance requires solving an optimal transport problem, which is computationally more expensive than MMD and becomes slow for large datasets.
  • Compared to energy distance, MMD offers greater flexibility due to its use of kernel functions, which allow it to capture intricate differences between distributions and adapt to various tasks by selecting appropriate kernels. It is particularly well-suited for high-dimensional data, as its reliance on embeddings in a reproducing kernel Hilbert space (RKHS) mitigates issues like the "curse of dimensionality" that can affect energy distance. Moreover, MMD is sensitive to higher-order statistics of distributions, enabling it to capture subtle discrepancies beyond mean and variance, which energy distance primarily focuses on.

Overall, while Wasserstein and energy distances are valuable metrics, MMD was chosen due to its balance between computational efficiency and flexibility in handling complex distributions. The kernel choice in MMD also provides additional adaptability, allowing us to tune the method for specific tasks or datasets. Exploring alternative measures like Wasserstein or energy distance could indeed be an interesting direction for future work.

Comment

Dear Reviewer Xg2m,

We appreciate the time and effort that you have dedicated to reviewing our manuscript.

We have carefully addressed all your queries. Could you kindly spare a moment to review our responses?

Have our responses addressed your major concerns?

If there is anything unclear, we will address it further. We look forward to your feedback.

Best regards,

Authors of Submission 13588

Comment

Dear Reviewer Xg2m,

We appreciate the time and effort that you have dedicated to reviewing our manuscript.

We have carefully addressed all your queries. Could you kindly spare a moment to review our responses?

Since the current recommendation is borderline rather than "a good paper", is there anything we can do to make our paper even better? We would be glad to further strengthen our paper based on your new comments until it is good enough to be accepted. ^^

If there is anything unclear, we will address it further. We look forward to your feedback.

Best regards,

Authors of Submission 13588

Review
Rating: 5

Statistical adversarial data detection (SADD) methods generally detect whether an upcoming batch contains adversarial examples (AEs) by measuring the distributional discrepancies between clean examples (CEs) and AEs, and would discard such detected batches containing AEs. In this paper, the former action of the SADD methods is viewed as a potential strength, whereas the latter action of SADD is viewed as a potential limitation since it leads to the loss of clean CEs within those detected batches containing AEs and also potential under-utilization of AEs. Motivated to remove the limitation, the paper proposes a two-pronged adversarial defense method, named Distributional-Discrepancy-based Adversarial Defense (DDAD), which consists of two phases: a training phase and an inference phase. In the training phase, DDAD first establishes a suitable measure called MMD-OPT to compute the maximum mean discrepancy (MMD) between a clean batch and an AE batch, and then trains a denoiser by minimizing the MMD-OPT between CEs and AEs and the cross entropy loss of an underlying DNN classifier over the denoised AE batch. In the inference phase, DDAD first detects whether an incoming batch is an AE batch based on the MMD-OPT between the incoming batch and a reference clean batch, and then applies a two-pronged process: (1) directly feeding the incoming batch into the classifier if it is detected as a clean batch, and (2) adding Gaussian noise into the incoming batch and feeding the noise inserted incoming batch into the trained denoiser followed by the DNN classifier if the incoming batch is detected as an AE batch. Experiment results on CIFAR-10 and ImageNet-1K in terms of both clean and robust accuracy are reported and compared with some benchmark methods in the literature.

Along the way, a theorem is established to upper bound the error probability of a classifier under the distribution of AEs by the sum of the error probability of the classifier under the distribution of CEs and the variation distance between the distribution of CEs and the distribution of AEs. No independent performance results are reported for the MMD-OPT based detector and the trained denoiser.

Strengths

  1. If the problem is formulated in the right manner, the idea of using a detector plus a denoiser (or adversarial perturbation diffuser) is interesting and worth further investigation.

  2. Developing Theorem 1 and using it to provide some theoretical justification for the denoiser training is commended, although it is simple and there is a discrepancy between the variation distance and MMD.

  3. The first three sections are generally well written.

Weaknesses

  1. The major issue in this paper and the line of work following SADD lies in the problem formulation itself. The word "statistical" in SADD implies that one has to deal with a batch of samples, with each sample serving as an input to an underlying DNN. The discrepancy between the batch requirement and the normal operation of the underlying DNN, which accepts a single sample as an input and makes a prediction in response to it, makes the problem formulation impractical. The discussions in Section 7 can hardly justify the use of batches in the inference stage. The examples provided therein are not convincing and actually not applicable to the current problem. Depending on the underlying DNN in use, in those applications, the set of images fed into the DNN should be regarded as one input to help the DNN make a prediction.

  2. Even if we accept batch inference as a hypothetical working mode of DNNs, this will change the attack surface, and the robustness problem will likely take a different form. However, in the current setup, an AE is defined traditionally to be a perturbed input which causes the underlying DNN to make an erroneous prediction when the DNN is working in a normal inference mode, accepting a single input and making a prediction in response to it. By shifting the inference working mode to a different form, it does not really solve the original robustness problem.

  3. Theorem 1 can be easily proved for the multi-label classification setting as well. Although it can motivate minimizing the distributional discrepancy between CE data and AE data, there is a gap between Theorem 1 and Figure 1 in terms of the distribution distance used, which is the variation distance in Theorem 1 and MMD in Figure 1. Is there any mathematical or empirical relationship between these two distributional discrepancies?

  4. Figure 1 is not well explained to a certain degree. (1) Are the two classifiers therein the same? (2) How are AEs generated? Are they generated specifically for the same classifier? (3) How are test data on the right side attacked? (4) For attacked test data, what is the probability for MMD-OPT to classify them as clean?

Questions

  1. In the paper, DUNET (Liao et al., 2018) is used as the denoising model. What is the impact of the denoising DNN on the performance of the trained denoiser? If a different DNN architecture is used in the denoiser training, how would it interact with MMD-OPT? Will the performance be dramatically different?

  2. With reference to Figure 1, are results in Table 1 produced under the conditions that (1) the classifier used in the training phase (left of Figure 1) is the same as the classifier used in the inference phase (right of Figure 1), and (2) adversarial samples used in the training phase (left of Figure 1) are generated specifically for the same classifier?

  3. Results in Table 3 are not clear. What do you mean by "our method trained on WideResNet-28-10 against unseen transfer attacks on CIFAR-10. Notably, attackers cannot access the parameters of WideResNet-28-10, and thus it is in a gray-box setting."? Please see my questions on Figure 1 above and explain your experiment results with reference to components in Figure 1.

  4. In Equation (5), where is m? Do you assume that m=n in this case?

Comment

Section 3: practicality of batch-wise evaluation in the inference stage

Now, we move to the question if the batch-wise evaluation is practical in the inference stage. In our humble opinion, the practicality of a method should be evaluated in the context of specific scenarios and application requirements, which means there is no absolute 'practical' or 'impractical' method. The key message we want to deliver here is: batch-wise evaluation is not impractical, but it will have some costs:

  • Proposed solution: for user inference, single samples provided by the user can be dynamically stored in a queue. Once the queue accumulates enough samples to form a batch, our method can then process the batch collectively using the proposed approach.
  • Costs for this solution: a direct cost for this solution is the waiting time, as the system must accumulate enough samples (e.g., 50 samples) to form a batch before processing. However, in scenarios where data arrives quickly (e.g., Google's terminal), the waiting time is typically very short (e.g., less than 2 seconds), making this approach feasible for many real-time applications. For applications with stricter latency requirements, the batch size can be dynamically adjusted based on the incoming data rate to minimize waiting time. For instance, if the system detects a lower data arrival rate, it can process smaller batches to ensure timely responses.
  • Comparison with current SOTA AP methods: diffusion-based AP methods can support single samples as input, but the inference speed of diffusion-based AP is relatively slow (e.g., processing one image can take several seconds) [2] [3]. For example, DiffPure [2] takes 4 seconds to purify 1 single CIFAR-10 image using one A100 GPU. Assuming there are 1000 images, DiffPure would take 4000 seconds to complete the inference. However, our method only takes around 0.003 seconds to process 1 single CIFAR-10 image on average. Therefore, if the waiting time to form a batch is less than 3997 seconds, our method is more time-efficient than DiffPure. Thus, diffusion-based AP methods can hardly be applied to a system where data arrives quickly. Instead, our method can handle it, demonstrating that batch-wise evaluation is not impractical.

Overall, it is a trade-off problem: using our method for user inference can obtain high robustness, but the cost is to wait for batch processing. Based on the performance improvements our method obtains over the baseline methods and the fact that current SOTA AP methods are generally slow at inference, we believe the cost is feasible and acceptable.

[2] Diffusion Models for Adversarial Purification, ICML 2022.

[3] Robust Evaluation of Diffusion-Based Adversarial Purification, ICCV 2023.

Section 4: batch-wise evaluation is practical in the training stage

On the other hand, our method is not necessarily used for user inference. Instead, our method is suitable for cleaning the data before fine-tuning the underlying model. In many domains, obtaining large quantities of high-quality data is challenging due to factors such as cost, privacy concerns, or the rarity of specific data. As a result, all possible samples with clean information are critical in these data-scarce domains. Then, a practical scenario is that there exists a pre-trained model on a large-scale dataset (e.g., a DNN trained on ImageNet-1K) and clients want to fine-tune the model to perform well on downstream tasks. If the data for downstream tasks contain AEs, our method can be applied to batch-wisely clean the data before fine-tuning the underlying model.

Comment

Thank you so much for your comments! Your thorough review and comments are very important to the improvement of our work! Please find our replies below.

W1. The major issue in this paper and the line of work following SADD lies in the problem formulation itself. The word "statistical" in SADD implies that one has to deal with a batch of samples, with each sample serving as an input to an underlying DNN. The discrepancy between the batch requirement and the normal operation of the underlying DNN, which accepts a single sample as an input and makes a prediction in response to it, makes the problem formulation impractical. The discussions in Section 7 can hardly justify the use of batches in the inference stage. The examples provided therein are not convincing and actually not applicable to the current problem. Depending on the underlying DNN in use, in those applications, the set of images fed into the DNN should be regarded as one input to help the DNN make a prediction.

Reply: Thanks for your insightful comment! We recognize that this concern is crucial to the overall understanding and evaluation of our work. To address it comprehensively, we have divided our response into several sections, each tackling specific aspects of the issue you raised.

Section 1: clarification of the workflow of our method

First of all, we would like to address a potential misunderstanding regarding the workflow of our method: In the inference stage, it is true that the detection phase in our approach relies on batch processing to identify statistical discrepancies. However, the subsequent prediction phase is fully compatible with single-sample inference. Specifically, once a batch is detected as AEs or CEs, individual samples can be processed independently by the underlying DNN for predictions.
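To make this workflow concrete, here is a schematic PyTorch-style sketch (hypothetical names; `mmd_opt`, `denoiser`, and `classifier` are assumed callables, and the noise level `sigma` is an illustrative value rather than the paper's exact setting): detection is batch-wise, while the final prediction is still made independently for every sample.

```python
import torch

def ddad_inference(batch, ref_clean_batch, mmd_opt, denoiser, classifier,
                   threshold, sigma=0.25):
    """Batch-level detection followed by per-sample prediction."""
    # Step 1 (batch-wise): measure the discrepancy to a clean reference batch.
    discrepancy = mmd_opt(batch, ref_clean_batch)
    is_adversarial = discrepancy > threshold

    # Step 2 (per-sample): each sample is classified independently, so the
    # normal single-input inference mode of the DNN is unchanged.
    if is_adversarial:
        noisy = batch + sigma * torch.randn_like(batch)  # inject Gaussian noise
        batch = denoiser(noisy)                          # remove perturbations
    logits = classifier(batch)
    return logits.argmax(dim=1)
```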

Section 2: false alarm rates can be controlled by our method

Following Section 1, since each sample can be processed independently, a potential concern is that if a batch only contains a few AEs (e.g., a batch of 50 samples contains 48 CEs and 2 AEs), the entire batch will likely be detected as CEs (i.e., a false alarm case happens). Then the 2 AEs will directly harm the underlying DNN.

However, we would like to highlight that the false alarm rate can be controlled. One of the benefits of using a non-parametric two-sample test is that the false alarm rate can be controlled by users [1]. In this paper, we set the maximum false alarm rate to be 5% (i.e., we allow a maximum of 5% adversarial examples in a batch).

[1] Maximum Mean Discrepancy Test is Aware of Adversarial Attacks. ICML 2021.
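For intuition, here is a generic permutation-test sketch of a kernel two-sample test, in which the user-chosen significance level `alpha` bounds the probability of flagging a truly clean batch; a plain Gaussian-kernel MMD stands in for the paper's optimized MMD-OPT statistic, so this is a simplified illustration rather than the released detector.

```python
import numpy as np

def mmd2(X, Y, gamma=1.0):
    """Biased Gaussian-kernel estimate of MMD^2 between two sample sets."""
    def k(A, B):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def is_adversarial_batch(batch, ref_clean, alpha=0.05, n_perm=200, gamma=1.0,
                         seed=0):
    """Flag `batch` as adversarial when the permutation p-value is below alpha."""
    rng = np.random.default_rng(seed)
    observed = mmd2(batch, ref_clean, gamma)
    pooled = np.vstack([batch, ref_clean])
    n = len(batch)
    null_stats = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))  # reshuffle under the null hypothesis
        null_stats.append(mmd2(pooled[idx[:n]], pooled[idx[n:]], gamma))
    p_value = np.mean(np.array(null_stats) >= observed)
    # Under the null (a genuinely clean batch), the chance of a false flag is at most alpha.
    return p_value < alpha
```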

Comment

Thanks for your questions! We will address your questions one by one:

Q1.1: In the paper, DUNET (Liao et al., 2018) is used as the denoising model. What is the impact of the denoising DNN on the performance of the trained denoiser?

Reply: Firstly, the denoising DNN provides an architecture that is specifically designed to remove adversarial noise. To further explore the impact of the denoising DNN on the performance of the trained denoiser, we conduct an ablation study on the robustness of the denoising DNN with and without MMD-OPT against white-box PGD attacks. Please kindly check the experiment results below:

Table 1: Ablation study on the denoising DNN with and without MMD-OPT. We report clean and robust accuracy (%) against white-box PGD attacks ($\ell_\infty$, $\epsilon = 8/255$). The classifier used is RN-18.

| Method | Clean | Robust |
|---|---|---|
| Denoising DNN without MMD-OPT | 84.36 | 18.07 |
| Denoising DNN with MMD-OPT | 88.03 | 75.87 |

Originally, the denoising DNN can hardly defend against white-box attacks (i.e., attacks targeting the denoiser+classifier), but it provides a fundamental architecture to learn the mapping from AEs to CEs and can maintain good clean accuracy. After the integration of MMD-OPT, both clean accuracy and robust accuracy improve by a notable margin.

Q1.2: If a different DNN architecture is used in the denoiser training, how would it interact with MMD-OPT?

Reply: MMD-OPT itself does not rely on any specific properties of DUNET and can be seamlessly integrated with other denoisers:

  • The training process of MMD-OPT is completely independent of the denoiser.
  • In the training phase of the denoiser, MMD-OPT serves as a 'guider' that can help minimize the distributional discrepancies between AEs and CEs. This process does not rely on any specific properties of the denoiser.

Therefore, even if a different DNN architecture is used, it will not affect the integration of MMD-OPT.
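To illustrate this 'guider' role, here is a schematic PyTorch-style training step (hypothetical names; `mmd_opt` is any differentiable batch-level discrepancy, and the equal weighting of the two terms is an assumption, not the paper's exact objective). Nothing in it depends on the denoiser architecture:

```python
import torch.nn.functional as F

def denoiser_training_step(denoiser, classifier, mmd_opt, optimizer,
                           clean_batch, adv_batch, labels, lam=1.0):
    """One update: pull denoised AEs toward the clean distribution while keeping them classifiable."""
    denoised = denoiser(adv_batch)
    # Distributional term: the discrepancy measure guides denoised AEs toward CEs.
    loss_mmd = mmd_opt(denoised, clean_batch)
    # Task term: denoised AEs should remain correctly classified by the
    # pre-trained classifier.
    loss_ce = F.cross_entropy(classifier(denoised), labels)
    loss = loss_mmd + lam * loss_ce

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```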

Q1.3: Will the performance dramatically different?

Reply: If a different DNN architecture is used as the denoiser, the overall performance would depend on the denoiser's ability to handle the specific noise patterns in the data. As mentioned in Q1.2, MMD-OPT itself does not rely on any specific properties of the denoiser. In general, the integration of MMD-OPT can help the denoiser to deal with the white-box attacks as shown in Q1.1. Consequently, if the alternative denoiser can achieve comparable performance to DUNET in handling noise, we believe the overall performance will not vary significantly after incorporating MMD-OPT during training.

Comment

W2. Even if we accept batch inference as a hypothetical working mode of DNNs, this will change the attack surface, and the robustness problem will likely take a different form. However, in the current setup, an AE is defined traditionally to be a perturbed input which causes the underlying DNN to make an erroneous prediction when the DNN is working in a normal inference mode, accepting a single input and making a prediction in response to it. By shifting the inference working mode to a different form, it does not really solve the original robustness problem.

Reply: Thanks for your insightful comment and sorry for the confusion! There might be a potential misunderstanding here. Indeed, the robustness problem is the same as previous studies.

  • As clarified in PART 1 of Q1, the prediction phase of our workflow is fully compatible with single-sample inference. Specifically, after a batch is detected as adversarial examples (AEs) or clean examples (CEs), individual samples can be processed independently by the underlying DNN for predictions. This ensures that the original robustness problem can be preserved.
  • If we assume that the underlying DNN can only take a batch as input, then under the scenario of adversarial attacks, two possible cases can occur:
    • (1) every sample in the batch shares the same adversarial perturbation (i.e., AEs are optimized by increasing the averaged risk).
    • (2) every sample in the batch has a different adversarial perturbation (i.e., AEs are optimized by increasing the risk of each sample).
  • For case (2), the robustness problem is exactly the same as what you mentioned in the comment. We want to highlight that our method places no restriction on how the attacker optimizes AEs, meaning it can handle both cases effectively. If the attacker chooses to generate AEs in the traditional single-sample setting, the robustness problem does not change.

Overall, the key message is that our workflow does not change the attack surface because every sample in the batch can have a different adversarial perturbation. In this case, AEs are optimized by increasing the risk of each sample.

Comment

W3. Theorem 1 can be easily proved for the multi-label classification setting as well. Although it can motivate minimizing the distributional discrepancy between CE data and AE data, there is a gap between Theorem 1 and Figure 1 in terms of the distribution distance used, which is the variation distance in Theorem 1 and MMD in Figure 1. Is there any mathematical or empirical relationship between these two distributional discrepancies?

Reply: Thanks for your insightful question! Both the $L^1$ divergence (i.e., the variation distance) and MMD are integral probability metrics (IPMs), which are distances on the space of distributions over a set $\mathcal{X}$, defined by a class $\mathcal{F}$ of real-valued functions on $\mathcal{X}$ as:

$$D_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}}\left|\mathbb{E}_{X \sim P} f(X) - \mathbb{E}_{Y \sim Q} f(Y)\right|.$$

The difference is that the $L^1$ divergence and MMD use different function spaces. The $L^1$ divergence operates over the function space of all indicator functions, representing measurable subsets of the domain:

$$L^1(P, Q) = 2 \sup_{B \in \mathcal{B}}\left|\mathbb{E}_{X \sim P} \mathbf{1}_B(X) - \mathbb{E}_{Y \sim Q} \mathbf{1}_B(Y)\right|,$$

where $\mathcal{B}$ is the set of measurable subsets under $P$ and $Q$.

In contrast, MMD is computed within a function space defined by the unit ball of a reproducing kernel Hilbert space (RKHS), which is spanned by the kernel function used:

$$\mathrm{MMD}(P, Q) = \sup_{f \in \mathcal{H},\ \|f\|_{\mathcal{H}} \leq 1}\left|\mathbb{E}_{X \sim P} f(X) - \mathbb{E}_{Y \sim Q} f(Y)\right|,$$

where $\mathcal{H}$ is the RKHS associated with a kernel function $\kappa(x, y)$, and $\|f\|_{\mathcal{H}}$ is the RKHS norm.

Mathematically, why do we use the $L^1$ divergence in the theoretical justification?

The $L^1$ divergence, also known as the total variation distance, is one of the most discriminative metrics for measuring discrepancies between two probability distributions $P$ and $Q$. If $P$ and $Q$ differ on any subset $A$ with non-zero measure, the total variation distance will be strictly greater than zero, indicating that $P$ and $Q$ are unequal. This sensitivity makes total variation a very powerful tool for detecting even subtle differences between distributions. Therefore, researchers often use the $L^1$ divergence to explore the relationship between distributional discrepancies and terms of interest, because it is theoretically more precise and sensitive than many other metrics.

Empirically, why do we use the MMD estimator in practice?

The problem with using the $L^1$ divergence in practice is that it does not admit any unbiased estimator, because the supremum can hardly be approximated with finite samples. Hence, in practice, it is challenging to estimate the $L^1$ divergence accurately, especially in high-dimensional settings, where the bias and variance of the estimation can become significant. On the other hand, MMD has unbiased estimators: through the kernel trick, the supremum can be effectively removed (e.g., see Eq. 2 in our manuscript). In practice, the kernel trick allows MMD to be computed efficiently, and it is empirically sensitive to differences between distributions, making it well-suited for practical tasks such as detecting distribution shifts between CEs and AEs.

In general, we would like to highlight that the purpose of using the $L^1$ divergence in the theory is to motivate researchers in this field to view the adversarial classification problem through the lens of distributional discrepancies. In practice, measuring distributional discrepancies can be achieved by other metrics with unbiased estimators, and MMD showcases how this perspective can be implemented effectively.
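For completeness, the standard unbiased (U-statistic) estimator of the squared MMD for samples $\{x_i\}_{i=1}^{n} \sim P$ and $\{y_j\}_{j=1}^{m} \sim Q$ with kernel $\kappa$ (stated here in its generic textbook form following Gretton et al., 2012, which we assume is what Eq. 2 of the manuscript instantiates) is

$$\widehat{\mathrm{MMD}}_u^2(P, Q) = \frac{1}{n(n-1)}\sum_{i \neq j} \kappa(x_i, x_j) + \frac{1}{m(m-1)}\sum_{i \neq j} \kappa(y_i, y_j) - \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} \kappa(x_i, y_j),$$

which contains no supremum and can be computed directly from two finite batches.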

Comment

Thanks for your questions and sorry for the confusion! We will address your questions one by one:

W4.1: Figure 1 is not well explained to a certain degree. Are the two classifiers therein the same?

Reply: Yes, the classifier used for training the denoiser and inference is the same, which is a pre-trained classifier on clean data.

W4.2: How are AEs generated? Are they generated specifically for the same classifier?

Reply: Since the third question asks about AE generation for the test data, we assume this question is asking about AE generation for the training stage.

  • As we mentioned in lines 400-405 in our manuscript, to avoid the evaluation bias caused by seeing similar attacks beforehand during training, we train both MMD-OPT and the denoiser using the $\ell_\infty$-norm MMA attack [1], which differs significantly from PGD+EOT and AutoAttack. And yes, they are generated for the same classifier.
  • Also, we would like to highlight that our method does not require a specific type of attack to generate AEs. For example, our method can still perform very well on unseen attacks when the denoiser and MMD-OPT are trained with PGD-10 attack.

[1] Fast and reliable evaluation of adversarial robustness with minimum-margin attack, ICML 2022.

W4.3: How are test data on the right side attacked?

Reply: We do not assume a specific type of attack for the test data. For example, the test data can be attacked by PGD, AutoAttack or C&W, etc. In our manuscript, to avoid the evaluation bias caused by seeing similar attacks during the training, we use unseen attacks (e.g., PGD+EOT, BPDA+EOT, C&W) to evaluate the robustness of our method.

W4.4: For attacked test data, what is the probability for MMD-OPT to classify them as clean?

Reply: It depends on the batch size and the pre-determined false alarm rate (as explained earlier in W1).

  • In our experiments, when the batch size exceeds 100 and the proportion of AEs is less than the pre-determined false alarm rate (e.g., 5% in our paper), the probability for MMD-OPT to classify them as clean is 100%.
  • When the proportion of AEs exceeds the pre-determined false alarm rate, the probability for MMD-OPT to classify them as clean is 0% (i.e., it will classify them as adversarial).
Comment

Q3. Results in Table 3 are unclear. What do you mean by "our method trained on WideResNet-28-10 against unseen transfer attacks on CIFAR-10. Notably, attackers cannot access the parameters of WideResNet-28-10, and thus it is in a gray-box setting."? Please see my questions on Figure 1 above and explain your experiment results with reference to components in Figure 1.

Reply: Thanks for your question and sorry for the confusion! As we answered in W4.3, we do not assume a specific type of attack for the test data. Therefore, the test data can be attacked by any type of attacks (e.g., the transfer attacks used in Table 3). Based on this, let's clarify this sentence in more detail:

  • Motivation: since DDAD requires AEs to train the MMD-OPT and the denoiser, it is important for us to evaluate the transferability of our method (i.e., how our method can generalize to attacks generated for different threat models).
  • The question we aim to address in Table 3 is: 'Since our method has seen attacks generated for a specific classifier (e.g., WideResNet-28-10) during training, will our method perform poorly on attacks generated for an unseen classifier (e.g., RN-18, Swin-T, etc)?'
  • Therefore, for this kind of attack, the key is to use a different threat model, which means the attackers cannot access the parameters of the WideResNet-28-10 (i.e., the attacks are not generated for the classifier used for training and testing).
  • Experiment results in Table 3 show that our method can generalize well to these unseen transfer attacks.

Q4. In Equation (5), where is m? Do you assume that m=n in this case?

Reply: Thanks for your question and sorry for the confusion! Yes, we assume $m = n$ in this case. This is because a U-statistic is used here, and we usually assume $m = n$ to obtain the asymptotic distribution of the MMD estimator. We will clarify this in the updated version of our manuscript!
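For reference, a generic form of the estimator with $m = n$ (following Gretton et al., 2012; presumably the form Equation (5) takes, though the exact notation may differ) is the single-sum U-statistic

$$\widehat{\mathrm{MMD}}_u^2 = \frac{1}{n(n-1)}\sum_{i \neq j}\big[\kappa(x_i, x_j) + \kappa(y_i, y_j) - \kappa(x_i, y_j) - \kappa(x_j, y_i)\big],$$

in which the sample size $m$ no longer appears separately.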

Comment

In the end, we want to thank you for providing this valuable feedback to us, which is always important to hear opinions from other experts in this field. If you feel there are still unclear points regarding our paper, please discuss with us in the author-reviewer discussion phase. If you feel that your concerns are well-addressed, we do hope for an updated rating and new comments (if possible). Thanks again!

Comment

Thank you for providing detailed responses. Suppose that there is a queue in place to collect incoming samples (e.g., images) to form a batch for the subsequent batch inference. Suppose that each incoming sample is attacked with probability $p$. Can you report your "robustness" results for different values of $p$ and different sizes of the inference batch and compare them with the following values

$$(1-p) \times \mathrm{CA} + p \times \mathrm{RA},$$

where CA and RA are the clean accuracy and robust accuracy of a normal adversarial training (AT) method such as Vanilla AT, TRADES, MART, and TRADES-AWP? Each incoming sample, if attacked, will be attacked with respect to the underlying classifier in the same manner as in the evaluation stage of Vanilla AT, TRADES, MART, and TRADES-AWP.

Vanilla AT: Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.

MART: Yisen Wang, Difan Zou, Jinfeng Yi, James Bailey, Xingjun Ma, and Quanquan Gu. Improving adversarial robustness requires revisiting misclassified examples. In International Conference on Learning Representations, 2019.

AWP: Dongxian Wu, Shu-Tao Xia, and Yisen Wang. Adversarial weight perturbation helps robust generalization. Advances in Neural Information Processing Systems, 33:2958–2969, 2020.

TRADES: Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In International Conference on Machine Learning, pp. 7472–7482. PMLR, 2019.

Comment

Reply: Thanks for your insightful comments and sorry for the late reply! Although reviewers are instructed not to ask for significant experiments at this stage, we believe it is very interesting and has the potential to further increase the value of our work if our method can perform well against AEs that are generated by attacking AT-based methods.

Before moving to the experiment results, we would like to highlight that the computational resources available to us have been extremely limited recently due to the high demand on the shared servers. This has resulted in significantly longer waiting times for experiments to run. Furthermore, AT requires more computational resources than other defense methods, and thus we only conduct the experiment on CIFAR-10 using ResNet-18 as the target classifier.

Here are the main configurations we use to reproduce the results for AT-based methods: for a fair comparison, all AT-based methods you mentioned (i.e., Vanilla AT [1], TRADES [2], MART [3] and TRADES-AWP [4]) are trained for 200 epochs using SGD with momentum 0.9, weight decay $5 \times 10^{-4}$, and an initial learning rate of 0.1 that is divided by 10 at the 100-th and 150-th epochs. The training attack is PGD-10 with step size 2/255. For TRADES, we set $\beta = 6.0$. For MART, we set $\beta = 5.0$. For TRADES-AWP, we set $\gamma = 5 \times 10^{-3}$ and $\beta = 6.0$. Table 1 below demonstrates the reproduced results we obtained for these AT-based methods.

Table 1: Clean and robust accuracy (%) of AT-based methods against PGD+EOT attack on CIFAR-10. The target classifier is ResNet-18 and the batch size is 100.

| Method | Clean | Robust |
|---|---|---|
| Vanilla AT [1] | 74.97 | 44.09 |
| TRADES [2] | 72.88 | 46.91 |
| MART [3] | 71.02 | 47.20 |
| TRADES-AWP [4] | 71.82 | 48.81 |

Then, as you required, each incoming sample, if attacked, is attacked with respect to the well-trained AT-based classifiers. For fair comparisons, we integrate the well-trained AT-based classifiers into our method: if the incoming sample is detected as a clean sample, it will be directly fed into the AT-based classifier; otherwise, it will be denoised by our well-trained denoiser and then fed into the AT-based classifier. The probability of attacking each incoming sample ranges from 10% to 100% and the batch size ranges from 25 to 150. The mixed accuracy we use for evaluating AT-based methods is calculated by $(1-p) \times \mathrm{CA} + p \times \mathrm{RA}$.
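As a quick check of this formula with the numbers in Table 1: for Vanilla AT (CA = 74.97, RA = 44.09) at $p = 0.5$, the mixed accuracy is $0.5 \times 74.97 + 0.5 \times 44.09 = 59.53$, which matches the 50% entry of the Vanilla AT row for batch size 100 in Table 2 below.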

We demonstrate the experimental results in Table 2. We find that if our method is integrated with AT-based methods, the performance can be further boosted. We believe this can further increase the value of our work.

Comment

Table 2: Mixed accuracy (%) for defense methods with different probabilities of attacking each incoming sample and different batch sizes.

| Probability of attacking each incoming sample (%) | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
|---|---|---|---|---|---|---|---|---|---|---|
| **Batch Size: 25** | | | | | | | | | | |
| Vanilla AT | 71.89 | 68.80 | 65.72 | 62.63 | 59.55 | 56.46 | 53.38 | 50.29 | 47.21 | 44.12 |
| Ours+Vanilla AT | 72.81 | 69.08 | 66.32 | 62.67 | 60.69 | 56.98 | 54.60 | 50.82 | 48.87 | 45.75 |
| TRADES | 70.29 | 67.70 | 65.11 | 62.52 | 59.92 | 57.33 | 54.74 | 52.15 | 49.56 | 46.97 |
| Ours+TRADES | 70.99 | 67.73 | 65.67 | 63.46 | 60.88 | 57.87 | 55.80 | 52.92 | 50.98 | 48.10 |
| MART | 68.63 | 66.25 | 63.86 | 61.47 | 59.08 | 56.70 | 54.31 | 51.92 | 49.54 | 47.15 |
| Ours+MART | 69.34 | 66.67 | 64.34 | 61.78 | 60.05 | 57.27 | 55.16 | 52.48 | 51.10 | 48.46 |
| TRADES-AWP | 69.52 | 67.22 | 64.92 | 62.62 | 60.32 | 58.02 | 55.72 | 53.42 | 51.12 | 48.82 |
| Ours+TRADES-AWP | 70.22 | 66.98 | 65.56 | 62.46 | 61.12 | 58.18 | 57.03 | 54.65 | 53.40 | 49.99 |
| **Batch Size: 50** | | | | | | | | | | |
| Vanilla AT | 71.89 | 68.81 | 65.73 | 62.65 | 59.57 | 56.49 | 53.41 | 50.33 | 47.25 | 44.17 |
| Ours+Vanilla AT | 72.84 | 69.57 | 66.55 | 62.93 | 60.74 | 57.12 | 53.88 | 51.21 | 48.82 | 46.91 |
| TRADES | 70.28 | 67.69 | 65.09 | 62.50 | 59.90 | 57.30 | 54.71 | 52.11 | 49.52 | 46.92 |
| Ours+TRADES | 70.17 | 67.80 | 65.38 | 62.64 | 59.99 | 57.47 | 55.45 | 52.90 | 51.10 | 49.27 |
| MART | 68.63 | 66.24 | 63.86 | 61.47 | 59.08 | 56.69 | 54.30 | 51.92 | 49.53 | 47.14 |
| Ours+MART | 68.75 | 66.34 | 64.54 | 62.60 | 59.41 | 58.07 | 56.78 | 55.14 | 53.51 | 51.35 |
| TRADES-AWP | 69.52 | 67.23 | 64.93 | 62.64 | 60.34 | 58.04 | 55.75 | 53.45 | 51.16 | 48.86 |
| Ours+TRADES-AWP | 69.44 | 67.10 | 65.18 | 62.80 | 60.49 | 57.98 | 56.18 | 54.25 | 52.34 | 51.10 |
| **Batch Size: 100** | | | | | | | | | | |
| Vanilla AT | 71.88 | 68.79 | 65.71 | 62.62 | 59.53 | 56.44 | 53.35 | 50.27 | 47.18 | 44.09 |
| Ours+Vanilla AT | 72.23 | 69.14 | 65.82 | 63.74 | 59.87 | 58.01 | 56.08 | 53.36 | 51.50 | 49.20 |
| TRADES | 70.28 | 67.69 | 65.09 | 62.49 | 59.89 | 57.30 | 54.70 | 52.10 | 49.51 | 46.91 |
| Ours+TRADES | 70.36 | 67.79 | 65.22 | 63.58 | 60.62 | 58.31 | 57.04 | 55.51 | 53.38 | 51.58 |
| MART | 68.64 | 66.26 | 63.87 | 61.49 | 59.11 | 56.73 | 54.35 | 51.96 | 49.58 | 47.20 |
| Ours+MART | 68.75 | 66.35 | 64.50 | 62.61 | 59.50 | 58.02 | 56.77 | 54.97 | 53.51 | 51.40 |
| TRADES-AWP | 69.52 | 67.22 | 64.92 | 62.62 | 60.31 | 58.01 | 55.71 | 53.41 | 51.11 | 48.81 |
| Ours+TRADES-AWP | 69.64 | 67.49 | 64.97 | 63.61 | 61.10 | 59.30 | 57.87 | 56.40 | 54.24 | 52.06 |
| **Batch Size: 150** | | | | | | | | | | |
| Vanilla AT | 71.89 | 68.80 | 65.71 | 62.62 | 59.54 | 56.45 | 53.36 | 50.27 | 47.18 | 44.09 |
| Ours+Vanilla AT | 72.06 | 69.17 | 65.96 | 62.86 | 59.68 | 58.92 | 55.09 | 52.39 | 49.90 | 47.97 |
| TRADES | 70.27 | 67.67 | 65.08 | 62.49 | 59.89 | 57.30 | 54.71 | 52.12 | 49.52 | 46.93 |
| Ours+TRADES | 70.59 | 67.87 | 65.43 | 63.35 | 61.58 | 60.05 | 58.23 | 56.13 | 53.68 | 51.68 |
| MART | 68.64 | 66.26 | 63.87 | 61.48 | 59.09 | 56.71 | 54.32 | 51.93 | 49.55 | 47.16 |
| Ours+MART | 68.56 | 66.48 | 64.07 | 62.06 | 60.99 | 59.37 | 57.47 | 55.69 | 53.55 | 51.68 |
| TRADES-AWP | 69.53 | 67.23 | 64.93 | 62.63 | 60.34 | 58.04 | 55.74 | 53.44 | 51.14 | 48.84 |
| Ours+TRADES-AWP | 69.70 | 67.43 | 64.98 | 63.53 | 62.26 | 60.05 | 57.99 | 56.14 | 54.25 | 52.18 |

[1] Vanilla AT: Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.

[2] TRADES: Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In International Conference on Machine Learning, pp. 7472–7482. PMLR, 2019.

[3] MART: Yisen Wang, Difan Zou, Jinfeng Yi, James Bailey, Xingjun Ma, and Quanquan Gu. Improving adversarial robustness requires revisiting misclassified examples. In International Conference on Learning Representations, 2019.

[4] AWP: Dongxian Wu, Shu-Tao Xia, and Yisen Wang. Adversarial weight perturbation helps robust generalization. Advances in Neural Information Processing Systems, 33:2958–2969, 2020.

Comment

In the end, we want to thank you for providing this valuable feedback to us, which is always important to hear opinions from other experts in this field. If you feel there are still unclear points regarding our paper, please discuss with us in the author-reviewer discussion phase. If you feel that your concerns are well-addressed, we do hope for an updated rating and new comments (if possible). Thanks again!

Comment

Dear Reviewer 5nfo,

We appreciate the time and effort that you have dedicated to reviewing our manuscript.

We have carefully addressed all your follow-up queries. Could you kindly spare a moment to review our responses?

Have our responses addressed your follow-up concerns?

If there is anything unclear, we will address it further! We look forward to your feedback!

Best regards,

Authors of Submission 13588

Comment

Dear Reviewer 5nfo,

Thank you for your thoughtful feedback. Currently, among the five reviewers, your recommendation is the only one which is negative. We’ve valued and answered all your concerns in detail, and we are eager to answer any further questions.

If our revisions meet your expectations, we’d greatly appreciate an updated score or any additional suggestions before the discussion deadline.

Best regards,

Authors of Submission 13588

Comment

Dear Reviewer 5nfo,

We appreciate the time and effort that you have dedicated to reviewing our manuscript.

We have carefully addressed all your queries. Could you kindly spare a moment to review our responses?

Have our responses addressed your major concerns?

If there is anything unclear, we will address it further. We look forward to your feedback.

Best regards,

Authors of Submission 13588

Comment

Dear Reviewer 5nfo,

Many thanks for updating the score! We do hope that your concerns have been well-addressed now.

Best regards,

Authors of Submission 13588

Official Review
8

Current statistical adversarial data detection-based methods discard inputs that are detected as AEs, leading to the loss of clean information within those inputs. This paper first theoretically establishes a relationship between adversarial risk and distributional discrepancy. Motivated by this, the paper proposes a two-stage defense method in which a denoiser is trained to minimize the MMD-OPT between clean and adversarial examples. During inference, the method first distinguishes between CEs and AEs: clean examples are passed straight to the classifier, whereas adversarial examples are passed through the denoiser first.

Strengths

  • The paper is well-written and has clear details.
  • The authors conduct extensive experiments and provide theoretical motivations.
  • The proposed method, while relating to work in Certified Robustness, is novel to my understanding, and leads to significantly better results.

Weaknesses

During inference, the authors assume access to a batch of clean validation data to serve as a reference for measuring distributional discrepancies. How are the constituents of this batch selected? For example, does it represent samples from each class uniformly, or does it follow the empirical distribution? Additionally, what would happen to performance if an attacker, for instance, supplied only samples from a single class along with one or two adversarial samples? Since detection is performed at the batch level, would the entire batch be classified as “clean samples”?

Questions

  • Please see the weaknesses.
  • Potentially missing related work?
    • Nayak et al. [1] also investigated using an MMD-based loss to train a denoiser for adversarial defense.

[1] Nayak, G. K., Rawal, R., & Chakraborty, A. (2023). DE-CROP: Data-efficient certified robustness for pretrained classifiers. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 4622-4631).

Comment

Thank you so much for your positive comments! It is our pleasure that our theoretical motivation, the novelty of our method and our experimental results can be recognized. Your thorough review and comments are very important to the improvement of our work! Please find our replies below.

Q1. During inference, the authors assume access to a batch of clean validation data to serve as a reference for measuring distributional discrepancies. How are the constituents of this batch selected? For example, does it represent samples from each class uniformly, or does it follow the empirical distribution?

Reply: Thanks for your question! In our implementation, we uniformly sample a fixed number of samples per class to construct the batch. For example, for CIFAR-10, we randomly pick 50 samples per class to construct the validation data. For ImageNet-1K, we randomly pick 100 samples per class to construct the validation data.
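To make this concrete, below is a minimal sketch (not our released code) of how such a class-balanced reference batch could be assembled for CIFAR-10 using torchvision; the variable names and the fixed random seed are purely illustrative. For ImageNet-1K, the same procedure applies with 100 samples per class.

```python
# Minimal sketch: build a class-balanced clean reference batch for CIFAR-10
# (50 images per class, 500 images in total). Illustrative only; the released
# code may construct the batch differently.
import numpy as np
import torch
from torchvision import datasets, transforms

dataset = datasets.CIFAR10(root="./data", train=True, download=True,
                           transform=transforms.ToTensor())
targets = np.array(dataset.targets)

rng = np.random.default_rng(0)  # fixed seed, purely illustrative
indices = []
for c in range(10):
    class_idx = np.where(targets == c)[0]
    indices.extend(rng.choice(class_idx, size=50, replace=False))

# Stack the selected images into a single reference batch: (500, 3, 32, 32)
reference_batch = torch.stack([dataset[i][0] for i in indices])
```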

Q2. Additionally, what would happen to performance if an attacker, for instance, supplied only samples from a single class along with one or two adversarial samples? Since detection is performed at the batch level, would the entire batch be classified as “clean samples”?

Reply: Thanks for your question!

  • Yes, your understanding is correct: in our current implementation, if a batch only contains one or two adversarial examples, the entire batch will be classified as a 'clean batch'.
  • However, we would like to highlight that this can be controlled. One of the benefits of using a non-parametric two-sample test is that the false alarm rate can be controlled by users [1]. In this paper, we set the maximum false alarm rate to 5% (i.e., we allow a maximum of 5% adversarial examples in a batch); a minimal sketch of such a test is given after this list.
  • Also, we conducted an ablation study on mixed data batches (the proportion of AEs in a batch ranges from 10% to 100%) in our original manuscript (lines 462 - 471) and we found that our method can still outperform all baseline methods across all mixed proportions. Therefore, even if a batch contains several adversarial examples, our method can still perform well. Please kindly check Appendix D.2 in our original manuscript for more details.
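As referenced above, here is a simplified sketch of a kernel two-sample permutation test whose significance level `alpha` is exactly the user-controlled maximum false alarm rate. It uses a fixed Gaussian kernel purely for illustration, whereas MMD-OPT uses a learned, semantic-aware kernel; the function names below are ours and are not taken from the released code.

```python
# Simplified sketch of a kernel two-sample test with a user-controlled false
# alarm rate `alpha`. A fixed Gaussian kernel is used purely for illustration;
# MMD-OPT uses a learned, semantic-aware kernel instead.
import torch

def gaussian_mmd2(x, y, sigma=1.0):
    """Biased estimate of MMD^2 between two batches, on flattened inputs."""
    x, y = x.flatten(1), y.flatten(1)
    z = torch.cat([x, y], dim=0)
    k = torch.exp(-torch.cdist(z, z).pow(2) / (2 * sigma ** 2))
    n = x.shape[0]
    return k[:n, :n].mean() + k[n:, n:].mean() - 2 * k[:n, n:].mean()

def flag_as_adversarial(reference, batch, alpha=0.05, n_perm=200):
    """Permutation test: flag `batch` if its MMD to `reference` is unusually
    large. On clean batches, false alarms occur with probability <= alpha."""
    observed = gaussian_mmd2(reference, batch)
    z = torch.cat([reference, batch], dim=0)
    n = reference.shape[0]
    exceed = 0
    for _ in range(n_perm):
        perm = torch.randperm(z.shape[0])
        exceed += int(gaussian_mmd2(z[perm[:n]], z[perm[n:]]) >= observed)
    p_value = (exceed + 1) / (n_perm + 1)
    return p_value <= alpha
```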

[1] Maximum Mean Discrepancy Test is Aware of Adversarial Attacks. ICML 2021.

Q3. - Potentially missing related work? Nayak et al. [1] also investigated using an MMD-based loss to train a denoiser for adversarial defense. [1] Nayak, G. K., Rawal, R., & Chakraborty, A. (2023). DE-CROP: Data-efficient certified robustness for pretrained classifiers. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 4622-4631).

Reply: Thank you for bringing this interesting paper to our attention! We summarize some key differences between DE-CROP and DDAD:

  • The overall idea / motivation is significantly different.
  • Technically, DDAD contains a two-pronged process, where MMD-OPT serves not only as a 'guider' during training to help minimize the distributional discrepancy between AEs and CEs (the same role as the MMD in DE-CROP), but also as a 'detector' that helps distinguish AEs from CEs. Therefore, the pipeline of our work is very different from theirs.
  • Also, DE-CROP uses vanilla MMD [1], while DDAD uses semantic-aware MMD [2], which is more sensitive to adversarial attacks.

For more detailed discussions, we will add a short section in the updated version of our manuscript!

[1] A kernel two-sample test. The Journal of Machine Learning Research, 2012.

[2] Maximum Mean Discrepancy Test is Aware of Adversarial Attacks. ICML 2021.

Comment

In the end, we want to thank you for providing this valuable feedback to us; it is always important to hear the opinions of other experts in this field. If you feel there are still unclear points regarding our paper, please discuss them with us in the author-reviewer discussion phase. If you feel that your concerns have been well addressed, we do hope for an updated rating and new comments (if possible). Thanks again!

Comment

Dear Reviewer aeSE,

We appreciate the time and effort that you have dedicated to reviewing our manuscript.

We have carefully addressed all your queries. Could you kindly spare a moment to review our responses?

Have our responses addressed your major concerns?

If there is anything unclear, we will address it further. We look forward to your feedback.

Best regards,

Authors of Submission 13588

Comment

Thank you for your response. I have a few more questions upon re-reading the paper and reading the review from Reviewer 5nfo.

the false alarm rate can be controlled by users [1]. In this paper, we set the maximum false alarm rate to be 5%

I am sorry if I missed this, but could you point me to or provide accuracy results ablation with differing proportions of AE samples in a batch for different false alarm rate values? I would like to see empirically how lowering the false alarm rate holds up against a minor proportion of adversarial images in the batch, and if an attacker would still be able to cause harm.

we uniformly sample a fixed number of samples per class to construct the batch

I think that the method is over-reliant on the batches of samples selected for reference and evaluation. For instance, do you end up falsely detecting a lot of samples even if there's a small shift in the distribution (e.g., using CIFAR-10.2 [1], CIFAR-C, or CIFAR-P [2] for evaluation)? Could you provide me with results for how many samples are rejected when inputted with samples from these datasets with slightly OOD distributions, with the reference batch from the training distribution?

[1] https://www.gatsby.ucl.ac.uk/~balaji/udl2020/accepted-papers/UDL2020-paper-101.pdf [2] Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

Comment

Dear Reviewer aeSE,

Thank you for your timely response and for raising these insightful concerns!

We appreciate your valuable feedback and will conduct the experiments to address these points promptly.

We will update you with our findings as soon as possible.

Best regards,

Authors of Submission 13588

Comment

Q2. I think that the method is over-reliant on the batches of samples selected for reference and evaluation. For instance, do you end up falsely detecting a lot of samples even if there's a small shift in the distribution (e.g., using CIFAR-10.2 [1], CIFAR-C, or CIFAR-P [2] for evaluation)? Could you provide me with results for how many samples are rejected when inputted with samples from these datasets with slightly OOD distributions, with the reference batch from the training distribution? [1] https://www.gatsby.ucl.ac.uk/~balaji/udl2020/accepted-papers/UDL2020-paper-101.pdf [2] Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

Reply: Thanks for your interesting question! In our humble opinion, the datasets you mention can hardly be considered as having only a small distribution shift, because a classifier trained on CIFAR-10 often performs poorly on them. However, we feel it is quite interesting to see how our method performs.

Before moving to the experiment results, we would like to highlight that the computational resources available to us have been extremely limited recently due to the high demand on the shared servers. This has resulted in significantly longer waiting times for experiments to run. Given these constraints and the limited rebuttal period, we choose CIFAR-10-C as an OOD distribution of CIFAR-10 and conduct experiments on several typical corruption types, with the reference batch from the training distribution.

Unsurprisingly, all the CIFAR-10-C samples we test are rejected. However, this phenomenon can be explained and justified. The nature of MMD is to test whether two distributions are the same. When we train a kernel function on CEs and AEs, it learns to distinguish the distribution of CEs from the distribution of AEs. However, if the kernel function has strong generalization capabilities, it becomes highly sensitive to any distributional change. While this is beneficial for detecting AEs, it also means that OOD datasets like CIFAR-10-C are flagged, where the distributional shift is fundamentally different but still significant. In this case, if the user believes that OOD datasets should be treated as clean data, then there is a fundamental flaw in the MMD-based detector, since it is too sensitive to distribution shift.

Nevertheless, this might not be a flaw in our proposed method, because even if those samples are rejected, they are not discarded (i.e., they are still sent to the denoiser before being fed into the classifier). If sending them to the denoiser does not decrease the model's performance on OOD datasets, or even improves it, then it is entirely acceptable to reject such OOD data. This motivated us to conduct further experiments comparing Denoiser + Classifier (i.e., reject the null hypothesis and treat OOD data as adversarial) with Classifier Only (i.e., do not reject the null hypothesis and treat OOD data as clean). Surprisingly, we find that Denoiser + Classifier can even outperform Classifier Only by a notable margin on CIFAR-10-C. For example, the accuracy on 'Glass Blur' increases from 44.29% to 48.03%, and the accuracy on 'Impulse Noise' increases from 41.19% to 58.09%. This shows that our denoiser may generalize to OOD datasets such as CIFAR-10-C, which we believe adds further value to our work.
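For completeness, the comparison above can be reproduced along the following lines. This is only a sketch: the file names follow the public CIFAR-10-C release, and the `denoiser` and `classifier` objects below are placeholders standing in for our trained models, not our actual implementation.

```python
# Sketch: compare "Classifier Only" with "Denoiser + Classifier" on one
# CIFAR-10-C corruption. The models below are placeholders for the trained
# denoiser and classifier; file names follow the public CIFAR-10-C release.
import numpy as np
import torch
import torchvision

denoiser = torch.nn.Identity()                                   # placeholder
classifier = torchvision.models.resnet18(num_classes=10).eval()  # placeholder

images = np.load("CIFAR-10-C/impulse_noise.npy")[:1000]  # (N, 32, 32, 3) uint8
labels = np.load("CIFAR-10-C/labels.npy")[:1000]

x = torch.from_numpy(images).permute(0, 3, 1, 2).float() / 255.0
y = torch.from_numpy(labels).long()

with torch.no_grad():
    acc_plain = (classifier(x).argmax(1) == y).float().mean().item()
    acc_denoised = (classifier(denoiser(x)).argmax(1) == y).float().mean().item()

print(f"Classifier Only: {100 * acc_plain:.2f}% | "
      f"Denoiser + Classifier: {100 * acc_denoised:.2f}%")
```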

Comment

Q1. I am sorry if I missed this, but could you point me to or provide accuracy results ablation with differing proportions of AE samples in a batch for different false alarm rate values? I would like to see empirically how lowering the false alarm rate holds up against a minor proportion of adversarial images in the batch, and if an attacker would still be able to cause harm.

Reply: Thanks for your question! To address your concern, we tested an extreme case (i.e., a very small proportion of AEs with a very low false alarm rate). Please kindly check the experimental results below:

Table 1: Ablation study with different proportions of AE samples in a batch for different false alarm rate (FAR) values on CIFAR-10. The classifier used is WRN-28-10. We report the mixed accuracy (i.e., the accuracy on a mixture of AEs and CEs) and use italic and bold to indicate that the batch is considered 'adversarial' (i.e., the entire batch is processed by the denoiser).

| FAR \ Proportion of AEs in a batch | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.05 | 93.20 | 92.29 | 91.34 | 90.25 | 89.13 | ***87.42*** | ***87.35*** |
| 0.04 | 93.20 | 92.29 | 91.34 | 90.25 | ***87.57*** | ***87.42*** | ***87.35*** |
| 0.03 | 93.20 | 92.29 | 91.34 | ***87.77*** | ***87.57*** | ***87.42*** | ***87.35*** |
| 0.02 | 93.20 | 92.29 | ***87.94*** | ***87.77*** | ***87.57*** | ***87.42*** | ***87.35*** |
| 0.01 | 93.20 | 92.29 | ***87.94*** | ***87.77*** | ***87.57*** | ***87.42*** | ***87.35*** |

Note that when a batch is processed by the denoiser, the clean accuracy drops slightly because the denoiser is mainly trained to remove the adversarial perturbations of AEs. That is why the overall performance decreases (e.g., from 92.29% to 87.94%) when there are only a few AEs in a batch and the entire batch is considered 'adversarial'. Therefore, when there are only a few AEs in a batch (e.g., fewer than 5%), it is acceptable to feed the batch directly into the classifier, as this hardly affects the overall performance. However, if the proportion of AEs increases, then the batch should undoubtedly be handled by the denoiser.
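For clarity, the batch-level decision described above reduces to a simple branch at inference time. The sketch below is illustrative only: `mmd_opt_statistic`, `denoiser`, and `classifier` are placeholders for our trained components, and the threshold is dataset-dependent (see our reply to Reviewer R3Ms on threshold selection).

```python
# Sketch of the two-pronged inference: if the batch-level discrepancy to a
# clean reference batch exceeds the threshold, the whole batch is denoised
# before classification; otherwise it is sent directly to the classifier.
# `mmd_opt_statistic`, `denoiser`, and `classifier` are placeholders.
import torch

def ddad_predict(batch, reference_batch, mmd_opt_statistic,
                 denoiser, classifier, threshold):
    with torch.no_grad():
        stat = mmd_opt_statistic(reference_batch, batch)
        if stat > threshold:           # batch flagged as adversarial
            batch = denoiser(batch)    # remove the perturbations first
        return classifier(batch).argmax(dim=1)
```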

Official Review
6

The method proposed in this paper achieves significant improvements in both clean and robust accuracy compared to existing SOTA defense methods. It introduces a novel framework that enhances model robustness without the need to incorporate a large number of additional samples, which is particularly interesting.

Strengths

The DDAD proposed in this paper effectively avoids information loss. Without relying on a large amount of additional data, DDAD significantly improves robustness on datasets such as CIFAR-10 and ImageNet-1K. DDAD also has good scalability and can cope with unseen transfer attacks: it is able to defend against adversarial samples generated from different models in a gray-box setting, indicating that its defense capability generalizes across models.

Weaknesses

The paper has some limitations. First, it does not provide open-source code or detailed instructions to replicate the results, making it difficult for others to validate the findings. Second, the assumptions in Equation 4 are not fully extended to components like Batch Normalization layers, pooling layers, or transformer structures. Rather than simply showing added structures as in Figure 2, a more comprehensive explanation is needed to clarify how Equations 4,5, and 6 apply to these components.

Questions

  1. Could you provide more details or release code to help replicate the reported results?
  2. How do the assumptions in Equation 4 extend to complex components like Batch Normalization (BN) layers, pooling layers, and transformer structures?
  3. Given that DDAD performs well on CIFAR-10 and ImageNet-1K, how does it scale with even larger datasets or different image domains?

Comment

Thank you so much for your positive comments! Your thorough review and comments are very important to the improvement of our work! Please find our replies below.

Q1. First, it does not provide open-source code or detailed instructions to replicate the results, making it difficult for others to validate the findings. Could you provide more details or release code to help replicate the reported results?

Reply: Thank you for your suggestion on providing open-source code! We are always willing to release the code to help replicate the reported results! In fact, in our manuscript, we do provide our source code, which is mentioned in the abstract at line 028: "The code is available at: https://anonymous.4open.science/r/DDAD-DB60."

Q2. Second, the assumptions in Equation 4 are not fully extended to components like Batch Normalization layers, pooling layers, or transformer structures. Rather than simply showing added structures as in Figure 2, a more comprehensive explanation is needed to clarify how Equations 4,5, and 6 apply to these components. How do the assumptions in Equation 4 extend to complex components like Batch Normalization (BN) layers, pooling layers, and transformer structures?

Reply: Thanks for your question! However, in our humble opinion, there might be some misunderstanding here (e.g., this question might have been intended for a different assigned paper):

  • Firstly, there is no assumption in Equation 4.
  • Secondly, Figure 2 in our manuscript is not about showing added structures.
  • Lastly, Equations 4, 5, and 6 are not relevant to BN layers, pooling layers and transformer structures.

Therefore, we hope you can further clarify the question and we are always willing to address your concerns!

Q3. Given that DDAD performs well on CIFAR-10 and ImageNet-1K, how does it scale with even larger datasets or different image domains?

Reply: Thanks for your question!

  • In our humble opinion, ImageNet-1K is generally regarded as a very large-scale dataset, especially in this field, which often requires substantial computational resources.
  • On the other hand, we agree with you that it is worth testing our method on a different image domain.
  • Therefore, following [1], we test our method on Street View House Numbers (SVHN), which is completely different from CIFAR-10 and ImageNet-1K. We aim to demonstrate that our method can work well on various image domains. Please kindly check the experiment results below and we will include them in the updated version of our manuscript:

Table 1: Clean and robust accuracy (%) against adaptive white-box attacks ($\ell_\infty$, $\epsilon = 8/255$) on SVHN. We show the most successful defense in bold.

| Method | Classifier | Clean | Robust |
| --- | --- | --- | --- |
| [1] | WRN-28-10 | 95.55 | 63.05 |
| [2] | ResNet-18 | 93.08 | 52.83 |
| [3] | WRN-28-10 | 92.87 | 56.83 |
| [4] | WRN-28-10 | 94.15 | 60.90 |
| Ours | WRN-28-10 | **96.57** | **69.45** |

[1] Robust Evaluation of Diffusion-Based Adversarial Purification, ICCV 2023.

[2] Reducing excessive margin to achieve a better accuracy vs. robustness trade-off. ICLR 2022.

[3] Uncovering the limits of adversarial training against norm-bounded adversarial examples. ArXiv, abs/2010.03593, 2020.

[4] Improving robustness using generated data. NeurIPS 2021.

Comment

In the end, we want to thank you for providing this valuable feedback to us; it is always important to hear the opinions of other experts in this field. If you feel there are still unclear points regarding our paper, please discuss them with us in the author-reviewer discussion phase. If you feel that your concerns have been well addressed, we do hope for an updated rating and new comments (if possible). Thanks again!

Comment

Dear Reviewer AY2R,

We appreciate the time and effort that you have dedicated to reviewing our manuscript.

We have carefully addressed all your queries. Could you kindly spare a moment to review our responses?

Have our responses addressed your major concerns?

If there is anything unclear, we will address it further. We look forward to your feedback.

Best regards,

Authors of Submission 13588

Comment

Dear Reviewer AY2R,

We appreciate the time and effort that you have dedicated to reviewing our manuscript.

We have carefully addressed all your queries. Could you kindly spare a moment to review our responses?

Since the current recommendation is borderline rather than "a good paper", we are not sure whether there is anything we can do to make our paper even better. We are happy to further strengthen our paper based on your new comments until it is good enough to be accepted. ^^

If there is anything unclear, we will address it further. We look forward to your feedback.

Best regards,

Authors of Submission 13588

Official Review
6

This paper presents a two-pronged adversarial defense method, called distributional-discrepancy-based adversarial defense, which leverages the maximum mean discrepancy. After training, the model first distinguishes between clean inputs and adversarial examples and then denoises the adversarial examples before feeding them into the classifier. The paper conducts experiments on two datasets to demonstrate the effectiveness of the proposed method against attacks.

Strengths

1. This paper is well-motivated and the idea of this paper is clearly explained.

2. The paper provides theoretical proofs and algorithms for the proposed idea.

3. The experiments are conducted across multiple datasets using various backbone architectures. The performance compared with the baseline methods demonstrates the effectiveness of the proposed method.

Weaknesses

The following questions need more explanation:

  1. The authors use Assumptions 1 and 2, along with Corollary 1, to establish that the ground truth labeling functions are equivalent for both the source and target domains. While it is true that clean examples and adversarial examples are actually using the same model, Assumption 1 needs further explanation. If $f_\mathcal{A}$ represents a valid ground-truth labeling function specifically for the adversarial domain, why should it yield the same prediction for both adversarial examples $x + \epsilon'$ and clean examples $x$?

  2. In the experiments, the threshold for MMD-OPT is set to 0.05. How was this threshold selected?

  3. The Evaluation Settings section in the paper lacks clarity. Are the baseline methods evaluated using AutoAttack, while the proposed method is evaluated with the adaptive white-box attack? Could you report the results under the same attack? For example, consider the entire process as a classifier and calculate the accuracy for defending against adversarial examples from either AutoAttack or PGD+EOT.

Questions

Please see the weaknesses section. I will leave my rating for now and wait for the discussion.

Comment

Thank you so much for your comments! It is our pleasure that our theoretical motivation and our experimental results can be recognized. Your thorough review and comments are very important to the improvement of our work! Please find our replies below.

Comment

Q1. The authors use Assumptions 1 and 2, along with Corollary 1, to establish that the ground truth labeling functions are equivalent for both the source and target domains. While it is true that clean examples and adversarial examples are actually using the same model, Assumption 1 needs further explanation. If $f_\mathcal{A}$ represents a valid ground-truth labeling function specifically for the adversarial domain, why should it yield the same prediction for both adversarial examples $x + \epsilon'$ and clean examples $x$?

Reply: Thanks for your question! Here is the detailed explanation for Assumption 1:

  • The key idea here is: a valid adversarial example will not change the semantic meaning of the clean example (i.e., the adversarial perturbation should be visually imperceptible). Often, this is achieved by setting a maximum allowed perturbation budget $\epsilon$.
  • Then, based on the above idea, the ground-truth labelling function in the adversarial domain (i.e., $f_\mathcal{A}$) refers to an adversarially robust labelling function, i.e., the prediction of $f_\mathcal{A}$ will not be affected by an adversarial perturbation $\epsilon'$ s.t. $\|\epsilon'\|_p \leq \epsilon$.
  • That is to say, $f_\mathcal{A}$ should yield the same prediction for both $x + \epsilon'$ and $x$ since they share the same semantic meaning. This aligns with common assumptions in adversarial learning [1], where ground-truth labels are considered invariant to imperceptible adversarial perturbations.
  • Following the above, Assumption 1 assumes that the ground-truth labelling function in the adversarial domain $f_\mathcal{A}$ indeed exists, and if it exists, it satisfies the property $f_\mathcal{A}(x + \epsilon') = f_\mathcal{A}(x)$.

[1] Towards Deep Learning Models Resistant to Adversarial Attacks, ICLR 2018.

Q2. In the experiments, the threshold for MMD-OPT is set to 0.05. How was this threshold selected?

Reply: Thanks for your question and sorry for the confusion!

We realize that the statement in line 323, 'In practice, we set the threshold t = 0.05 by default', may cause unnecessary confusion. Here t = 0.05 serves as a general threshold for clean data, but might not be the optimal threshold to separate CEs and AEs. To avoid confusion, we have decided to remove this sentence from our original manuscript.

In our original manuscript, we select the threshold based on the experimental results on the validation data. To provide further clarity and transparency, we evaluated the impact of different threshold values on DDAD's performance. Please kindly check the experiment results below:

Table 1: Sensitivity of DDAD to the threshold values of MMD-OPT on CIFAR-10. We report clean and robust accuracy (%) against adaptive white-box attacks ($\epsilon = 8/255$). The classifier used is WRN-28-10.

| Threshold Value | Clean | PGD+EOT ($\ell_\infty$) | PGD+EOT ($\ell_2$) | AutoAttack ($\ell_\infty$) | AutoAttack ($\ell_2$) |
| --- | --- | --- | --- | --- | --- |
| 0.05 | 94.16 | 66.98 | 73.40 | 72.21 | 85.96 |
| 0.07 | 94.16 | 66.98 | 73.40 | 72.21 | 85.96 |
| 0.1 | 94.16 | 66.98 | 73.40 | 72.21 | 85.96 |
| 0.5 | 94.16 | 66.98 | 84.38 | 72.21 | 85.96 |
| 0.7 | 94.16 | 66.98 | 84.38 | 72.21 | 85.96 |
| 1.0 | 94.16 | 64.75 | 84.38 | 72.21 | 85.96 |

Table 2: Sensitivity of DDAD to the threshold values of MMD-OPT on ImageNet-1K. We report clean and robust accuracy (%) against adaptive white-box attacks ($\epsilon = 4/255$). The classifier used is RN-50.

| Threshold Value | Clean | PGD+EOT ($\ell_\infty$) |
| --- | --- | --- |
| 0.01 | 76.61 | 53.75 |
| 0.015 | 76.61 | 53.75 |
| 0.02 | 78.61 | 53.75 |
| 0.025 | 78.61 | 53.75 |
| 0.03 | 78.61 | 0.46 |
| 0.04 | 78.61 | 0.46 |
| 0.05 | 78.61 | 0.46 |

Please note that generating AEs using attacks with many iterations (e.g., PGD+EOT and AutoAttack) is too time-consuming for a large-scale dataset (i.e., ImageNet-1K), so we only report the results for PGD+EOT ($\ell_\infty$) given the limited time for the rebuttal.

In our work, a threshold value of 0.5 is selected for CIFAR-10 and 0.02 is selected for ImageNet-1K. It is reasonable to use a smaller threshold for ImageNet-1K because the distribution of AEs with $\epsilon = 4/255$ (i.e., AEs for ImageNet-1K) is closer to that of CEs than that of AEs with $\epsilon = 8/255$ (i.e., AEs for CIFAR-10). Intuitively, as $\epsilon$ decreases to 0, AEs become identical to CEs (i.e., the distributions of AEs and CEs coincide).
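To illustrate the validation-based selection described above, here is a minimal sketch; the candidate grid, the batch collections, and the helper `mmd_opt_statistic` are placeholders and do not reflect our exact implementation.

```python
# Sketch: choose the smallest candidate threshold that (i) does not flag
# held-out clean validation batches and (ii) still flags adversarial
# validation batches. `mmd_opt_statistic` and the batches are placeholders.
def select_threshold(reference, clean_batches, adv_batches, mmd_opt_statistic,
                     candidates=(0.01, 0.02, 0.05, 0.1, 0.5, 0.7, 1.0)):
    for t in sorted(candidates):
        no_false_alarm = all(mmd_opt_statistic(reference, b) <= t
                             for b in clean_batches)
        detects_aes = all(mmd_opt_statistic(reference, b) > t
                          for b in adv_batches)
        if no_false_alarm and detects_aes:
            return t
    return None  # no candidate separates clean from adversarial batches
```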

Lastly, we apologize that we did not specify the threshold values we used in the experiment settings. We would like to thank you for pointing this issue out to us. We will clarify the selection of the threshold values in the updated version of our manuscript to avoid confusion.

Comment

Q3.1 The Evaluation Settings section in the paper lacks clarity. Are the baseline methods evaluated using AutoAttack, while the proposed method is evaluated with the adaptive white-box attack?

Reply: Thanks for the question and sorry for the confusion!

  • Indeed, AT-based methods are evaluated using white-box AutoAttack.
  • AP-based methods are evaluated using white-box PGD+EOT attack.
  • Our method is evaluated using the proposed adaptive white-box PGD+EOT attack.

Here is the justification and motivation of why we use such settings:

  • AutoAttack is often used as the gold standard to evaluate AT-based methods (i.e., AT-based methods often demonstrate worst-case robustness on AutoAttack) [1].
  • However, [2] found that AutoAttack can lead to over-estimated robustness for diffusion-based AP methods and that PGD+EOT is the gold standard for diffusion-based AP methods.
  • Empirically, diffusion-based AP methods perform better on AutoAttack than AT-based methods, while AT-based methods perform much better on PGD+EOT than diffusion-based AP methods.

Now, considering our method, the situation becomes more complex: our method involves a detector module, which neither AT-based methods nor AP-based methods have.

Then, a natural question arises: how can we fairly compare our method with the AT and AP methods? The answer is that we need to design an adaptive attack that accounts for the detector module. If we only consider the denoiser and the classifier (i.e., the white-box setting for AP-based methods), DDAD achieves around 77% robustness on PGD+EOT and 81% on AutoAttack, which is clearly not fair to either the AT or the AP methods.

Therefore, we chose to use worst-case robustness to evaluate all baseline methods (i.e., report the robustness that represents the worst-case scenario of each system):

  • Since AT-based methods are empirically biased towards PGD+EOT, we use AutoAttack for AT-based methods.
  • Since diffusion-based AP methods are empirically biased towards AutoAttack, we use PGD+EOT for diffusion-based AP methods.
  • For our method, we find that our method achieves the worst-case robust accuracy on adaptive white-box PGD+EOT attack, rather than adaptive white-box AutoAttack. Therefore, we report the robustness of our method using adaptive white-box PGD+EOT.

[1] RobustBench: a standardized adversarial robustness benchmark, NeurIPS D&B 2021.

[2] Robust Evaluation of Diffusion-Based Adversarial Purification, ICCV 2023.

Q3.2 Could you report the results under the same attack, for example, consider the entire process as a classifier and calculate the accuracy for defending adversarial examples from either AutoAttack or PGD+EOT.

Reply: Yes, sure! Please kindly check the experimental results in Table 1 below. Notably, in both cases, our method can outperform baseline methods.

Table 1: Clean and robust accuracy (%) against adaptive white-box PGD+EOT ($\ell_\infty$, $\epsilon = 8/255$) and adaptive white-box AutoAttack ($\ell_\infty$, $\epsilon = 8/255$) on CIFAR-10. * means the method is trained with extra data. We show the most successful defense in bold.

| Type | Method | Clean | PGD+EOT | AutoAttack |
| --- | --- | --- | --- | --- |
| AT | Gowal et al. (2021) | 87.51 | 66.01 | 63.38 |
| AT | Gowal et al. (2020)* | 88.54 | 65.10 | 62.76 |
| AT | Pang et al. (2022a) | 88.62 | 64.95 | 61.04 |
| AP | Yoon et al. (2021) | 85.66 | 33.48 | 59.53 |
| AP | Nie et al. (2022) | 90.07 | 46.84 | 63.60 |
| AP | Lee & Kim (2023) | 90.16 | 55.82 | 70.47 |
| Ours | DDAD | **94.16** | **67.53** | **72.21** |

Comment

In the end, we want to thank you for providing this valuable feedback to us; it is always important to hear the opinions of other experts in this field. If you feel there are still unclear points regarding our paper, please discuss them with us in the author-reviewer discussion phase. If you feel that your concerns have been well addressed, we do hope for an updated rating and new comments (if possible). Thanks again!

Comment

Dear Reviewer R3Ms,

We appreciate the time and effort that you have dedicated to reviewing our manuscript.

We have carefully addressed all your queries. Could you kindly spare a moment to review our responses?

Have our responses addressed your major concerns?

If there is anything unclear, we will address it further. We look forward to your feedback.

Best regards,

Authors of Submission 13588

Comment

Thank you for your response. It has resolved my concerns.

Comment

Dear Reviewer R3Ms,

Thank you for your support and we truly appreciate your thoughtful comments and engagement with our work.

If you feel that your concerns have been well-addressed, we do hope for an updated rating, as it would be a strong encouragement for our efforts.

If there are any concerns that might hinder an updated rating, please do not hesitate to let us know. We are more than happy to address them in greater detail.

We look forward to your feedback.

Best regards,

Authors of Submission 13588

Comment

Dear Reviewer R3Ms,

We are glad to hear that your concerns have been addressed.

We noticed, however, that the current score remains "weak reject", even though all your concerns have been addressed.

Could you kindly clarify what prevents you from assigning a more positive score? Your additional insights would be invaluable for us to further refine our work.

There is still time to discuss, and we are delighted to answer any questions or address any remaining concerns.

We are looking forward to your reply!

Best regards,

Authors of Submission 13588

Comment

The authors' response has addressed my concerns. After reviewing the feedback from other reviewers, I have decided to raise my score.

Comment

Dear Reviewer R3Ms,

We want to thank you again for providing this valuable feedback to us. We are glad to hear that your concerns have been addressed! Your support would definitely play a crucial role in this rebuttal!

Best regards,

Authors of Submission 13588

AC Meta-Review

The paper proposes a two-pronged adversarial defense (DDAD) that combines detection and denoising using Maximum Mean Discrepancy (MMD), achieving improved clean and robust accuracy on datasets like CIFAR-10 and ImageNet-1K. Strengths include a novel approach, theoretical grounding, and comprehensive experiments. However, significant weaknesses include the impracticality of batch-based inference, inconsistent evaluation settings, limited robustness to out-of-distribution shifts, unclear links between theoretical and empirical components, and reproducibility concerns. Reviewer confidence disparity further undermines support, with the highest score (8) from a low-confidence reviewer and the most critical score (5) from a highly confident one. Overall, the method requires refinement and stronger evidence of practical viability and fairness in evaluation, leading to a rejection recommendation.

Additional Comments from the Reviewer Discussion

During the rebuttal, reviewers raised concerns about the practicality of batch-based inference, fairness in evaluation settings, robustness to out-of-distribution data, reproducibility, and gaps between the theoretical claims and empirical methods. The authors clarified that batch processing could be implemented using sample queues and defended their evaluation choices, providing additional results and sensitivity analyses. They also argued that their denoiser improves performance on rejected out-of-distribution samples. However, the fundamental issues of impracticality, inconsistent evaluation, reliance on strong assumptions, and theoretical gaps remained inadequately addressed.

Final Decision

Reject