PaperHub
Rating: 5.8/10 · Poster · 4 reviewers (scores 5, 5, 6, 7; min 5, max 7, std 0.8)
Confidence: 3.8 · Correctness: 3.0 · Contribution: 2.8 · Presentation: 3.0
NeurIPS 2024

Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor

OpenReview · PDF
Submitted: 2024-05-14 · Updated: 2024-11-06

Abstract

Keywords
Adversarial Machine Learning · Backdoor Attack · Backdoor Defense · AI Security

Reviews and Discussion

Official Review (Rating: 5)

This paper focuses on in-training backdoor defense by proactively injecting a defensive backdoor into the model during training. During inference, PDB embeds a defensive trigger in the inputs and reverses the model prediction. This operation effectively suppresses malicious backdoors while preserving the model's utility on the original task. Various experiments demonstrate the effectiveness.

Strengths

Various experiments demonstrate the proposed method is effective.

Weaknesses

This paper has several limitations.

  1. The claimed novelty has already been explored by existing methods. I acknowledge that the authors share an interesting insight on backdoor defense. However, [2] (USENIX Security 2024) already observes that "backdoor samples are OOD samples compared to benign samples from the target class" and designs an indicator task leveraging OOD samples to identify and rule out backdoor updates, yet I do not see this work discussed in your paper. I wonder whether the authors could explain whether directly utilizing [2] in your setting has any problem.

  2. Missing comparison and confusing results. As shown in your Table 1 (Results (%) on CIFAR-10 with PreAct-ResNet18 and poisoning ratio 5.0%), for the Trojan attack, ABL achieves 18.64/100.00, but the result reported in [1] is 70.70/0.02. Why do you not use the same setting as current related works? I wonder whether you could compare your work under your own setting or under the scenario of [1].

  3. Ablation for trigger pattern. Regarding the statement that "the trigger's position should be crafted to preserve the core visual patterns of the original image", I wonder whether the authors could conduct experiments on the trigger pattern's position to support this conclusion.

[1] Enhancing Fine-Tuning Based Backdoor Defense with Sharpness-Aware Minimization. ICCV 2023.

[2] BackdoorIndicator: Leveraging OOD Data for Proactive Backdoor Detection in Federated Learning. USENIX Security 2024.

Questions

Your work shares a highly similar motivation with BackdoorIndicator. I encourage the authors to provide detailed explanations.

Limitations

Yes, the authors have discussed the limitations.

Author Response

Q1. Novelty and comparison with BackdoorIndicator [2]

R1: Since [2] first became accessible on arXiv on May 31, one week after the NeurIPS 2024 submission deadline, we did not have the opportunity to review and compare our work with it. We would like to mention that, according to the latest NeurIPS 2024 policy (see the NeurIPS FAQ), our submission is not expected to compare against work that appeared only a month or two before the deadline.

However, we appreciate the opportunity to highlight the distinctions between our approach and BackdoorIndicator [2] to further improve our submission.

  • First, we would like to clarify the following differences between our work and [2]:

    • Threat model: [2] targets decentralized training (FL) setting, where multiple clients train models locally and contribute updates to a central server. Our work considers a centralized training setting where only a central server is used.
    • Task: [2] focuses on detecting malicious clients, whereas our method aims to train a secure model on a poisoned dataset without clients.
    • Motivation: [2] is built on the observation that planting subsequent backdoors with the same target label enhances previously planted backdoors, thereby providing a way to detect poisoned clients, while our method is based on the motivation that planting a concurrent reversible backdoor can help mitigate the malicious backdoor.
    • Methodology: [2] utilizes OOD samples for backdoor client detection while our method constructs a proactive defensive poison dataset, following well-designed principles.

    Based on the above analysis, we believe our work and [2] are significantly different from each other. If the reviewer can identify specific similarities in the conceptual ideas between our work and [2], we would be more than happy to engage in a deeper discussion of these points.

  • Second, we would like to discuss the challenges of directly utilizing [2] in our setting:

    • BackdoorIndicator [2] is designed to detect malicious clients within a federated learning (FL) context. This makes it challenging to apply [2] directly to our centralized environment since the task of identifying backdoored clients does not naturally fit into this setting (only a central server).
    • For a comparison between [2] and our method, we would need to emulate an FL scenario by assigning each image to a separate client (ensuring the existence of benign clients), thereby creating 50,000 local models from the CIFAR-10 dataset to defend against a single attack with PreAct-ResNet18. This would require an impractical amount of computational resources, estimated at over 30,000 hours (1,250 days) of training time and 30 TB of storage space on a server with a single RTX 3090 GPU. That is why we cannot compare our method with [2].

Q2. Confusing results in Table 1.

R2: Thank you. Firstly, it is important to note that:

  • All attack checkpoints were sourced directly from the official BackdoorBench website.
  • All experimental results for baselines align with those reported by BackdoorBench (refer to the official leaderboard).

This ensures that the comparisons presented in our paper are both fair and reliable.

It should be noted that, as we use different checkpoints from [1], the results presented in our paper may differ from those in [1] due to the inherent randomness and potential instability of some baselines. To investigate the failure of ABL in Table 1 of the main manuscript, we conducted a detailed analysis of its training process and found that ABL successfully detected 437 poisoned samples out of 2,500. During the unlearning phase, a sudden increase in the loss on clean samples occurs, leading to a notable degradation in model performance, which ultimately causes ABL to fail (see Figure 3 of the supplementary PDF for the loss curves).

Additionally, we anonymously requested the Trojan attack checkpoint for PreAct-ResNet18 with CIFAR-10 and a poisoning ratio of 5% from [1]. For this checkpoint, PDB can still mitigate the backdoor with ACC of 91.84% and ASR of 0.47%.

Q3. Missing comparison to FT-SAM [1]

R3: Thanks. To address the comparison with FT-SAM [1], we have adapted their method to our experimental setting. It's worth noting that in [1], the authors employ the Blended attack with a blending ratio of 0.1 (Blended-0.1), whereas we use a blending ratio of 0.2 (Blended-0.2). For consistency and completeness, we have now included experiments using both blending ratios, and the results are shown below:

Table 1: Results on PreActResNet-18

| Poisoning ratio | Defense | BadNet (ACC / ASR) | Blended-0.2 (ACC / ASR) | Blended-0.1 (ACC / ASR) | Sig (ACC / ASR) | SSBA (ACC / ASR) |
|---|---|---|---|---|---|---|
| 5% | FT-SAM | 92.66 / 1.22 | 92.87 / 31.54 | 92.76 / 2.87 | 92.82 / 1.80 | 92.83 / 3.27 |
| 5% | PDB | 91.08 / 0.38 | 91.36 / 0.70 | 91.85 / 0.22 | 91.79 / 0.06 | 91.58 / 0.46 |

From Table 1 above, we find that FT-SAM achieves a higher accuracy because it fine-tunes a backdoored model, whereas PDB trains a model from scratch. Consistent with [1], Table 1 shows that FT-SAM can mitigate backdoor attacks in most cases, except for Blended-0.2. We observe that FT-SAM struggles to defend against Blended attacks with higher blending ratios, such as 0.2. Notably, PDB achieves a significantly lower ASR across all cases, with an average ASR below 0.5%.

Q4. Ablation study for trigger position.

R4: Thanks. We would like to refer you to the Common Response for more comprehensive ablation studies for PDB. From Common Response, we can see that placing a trigger at the center of an image significantly degrades accuracy, as the trigger masks the core patterns of the image.

Reference:

[1] Enhancing Fine-Tuning Based Backdoor Defense with Sharpness-Aware Minimization. ICCV 2023.

[2] BackdoorIndicator: Leveraging OOD Data for Proactive Backdoor Detection in Federated Learning. USENIX Security 2024.

Comment

Dear Reviewer wZZe,

We sincerely appreciate your valuable insights and suggestions on our work. We have made our best efforts to address the concerns and queries you raised during the rebuttal process. We would greatly appreciate confirmation on whether our response has effectively resolved your doubts. Your feedback will be instrumental in improving the quality of our work. As the end of the discussion period is approaching, we eagerly await your reply before the end.

Sincerely,

The Authors

Comment

Thanks for the authors' response. I expect the authors to add the related discussion of [1] in the final version. Besides, I hope the authors can provide a more detailed description of the experimental setting in the main paper or the supplementary material.

[1] BackdoorIndicator: Leveraging OOD Data for Proactive Backdoor Detection in Federated Learning.

Comment

Dear Reviewer wZZe,

We sincerely appreciate your thoughtful response and the time you have dedicated to reviewing our paper. We are pleased to hear that our rebuttal addressed your concerns satisfactorily, and we are strongly encouraged by your recognition of our efforts. In the next revision, we will include a detailed discussion contrasting our approach with BackdoorIndicator, highlighting both the similarities and differences to further strengthen our paper. Additionally, we will enhance Appendix A: Experiment Details to provide a more thorough description of our experimental methodology, including all necessary details for reproducibility and clarity.

Thank you again for your valuable input and support.

Best regards,

The Authors

Official Review (Rating: 5)

This paper investigates in-training backdoor defense via a proactive defensive backdoor. The authors introduce a defensive poisoned dataset and train the model on it together with the full, potentially poisoned training data. By attaching the defensive trigger to the input sample at inference time, any malicious backdoor attack is neutralized by the proactive defensive backdoor. Extensive experiments evaluate the effectiveness of the method and compare it with five SOTA defense methods across seven challenging backdoor attacks.

Strengths

The authors offer a straightforward solution that eliminates the need for costly detection and relabeling processes, thus improving efficiency.

The experiments seem comprehensive; good attack success rate and defense effectiveness rate results are achieved.

Weaknesses

The authors should elaborate in the Introduction on why DBD, NAB, and V&B result in a substantial increase in training costs. Experimental analyses are highly desired.

Contribution (1) is overclaimed. This is not a novel paradigm, because this work builds on several proactive attack-based works such as NAB.

The sensitivity of the proposed method to the defensive poisoned dataset should be evaluated. Although the authors claim the proposed method follows Principle 4, more details and more thorough analyses are required.

Despite the better defense effectiveness rate of the proposed method, its accuracy on benign samples is clearly inferior to that of other methods. When the poisoning ratio decreases, the proposed method may unexpectedly become an attack on the original benign dataset, resulting in degraded accuracy. How can this situation be avoided?

Questions

Why can the proposed method meet Principle 4? The explanation in the Method section is not clear.

What is the difference between the malicious poisoned dataset and the defensive poisoned dataset when preparing the data?

Limitations

The generalization of the proposed method to diverse backdoor attacks remains unknown. When the attack is invisible or the poisoning rate is marginal, the proposed method may degenerate.

Author Response

Q1. Training cost for DBD, NAB and V&B

R1: Thanks. Here, we report both training complexities and the empirical runtime of these methods in Table 1. For simplicity, we first define the following notations:

  • $C_{sl}$: supervised training cost.
  • $C_{ssl}$: self-supervised training cost.
  • $C_{semi}$: semi-supervised training cost.
  • $C_{fc}$: training cost for the fully connected (FC) layers.
  • $N_{tr}$: size of the training dataset.
  • $N_{def}$: size of the defensive poisoned dataset.
  • $F$: sampling frequency of defensive poisoned samples.

Table 1: Training complexity and empirical runtime on CIFAR-10 with PreAct-ResNet18

| Method | Complexity | Empirical Runtime (s) |
|---|---|---|
| No Defense | $O(C_{sl})$ | 919 |
| DBD | $O(C_{ssl} + C_{fc} + C_{semi})$ | 7495 |
| NAB | $O(C_{ssl} + C_{sl})$ | 3081 |
| V&B | $O(2 \cdot (C_{ssl} + C_{sl}))$ | 6144 |
| PDB (Ours) | $O\left(\frac{N_{tr} + F \cdot N_{def}}{N_{tr}} \cdot C_{sl}\right)$ | 1853 |

Analysis: From Table 1, we find that since $\frac{F \cdot N_{def}}{N_{tr}}$ is set to a small value, the training complexity of PDB is not much larger than that of the baseline (i.e., No Defense). In contrast, as $C_{ssl}$ is often several times $C_{sl}$, the complexities of DBD, NAB, and V&B are often much higher than the baseline. This is reflected in the empirical runtimes, where PDB takes roughly twice as long as the baseline and is much faster than the other methods.
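
For intuition, here is a minimal PyTorch-style sketch (not the authors' exact implementation) of how the $\frac{F \cdot N_{def}}{N_{tr}}$ overhead factor arises: each epoch simply concatenates the training set with $F$ copies of the small defensive poisoned set before standard supervised training.

```python
# Minimal sketch, assuming PyTorch Dataset objects; names are illustrative only.
from torch.utils.data import ConcatDataset, DataLoader

def make_pdb_loader(train_set, defensive_set, sampling_frequency, batch_size=128):
    # Each defensive poisoned sample is visited `sampling_frequency` (F) times per
    # epoch, so one epoch costs roughly (N_tr + F * N_def) / N_tr of a standard
    # supervised epoch, matching the complexity term for PDB in Table 1.
    mixed = ConcatDataset([train_set] + [defensive_set] * sampling_frequency)
    return DataLoader(mixed, batch_size=batch_size, shuffle=True)
```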

Q2. Difference between PDB and NAB

R2: Thanks for this insightful comment. A comprehensive analysis can be found in Appendix C.5 of our submitted manuscript. Here, we would like to highlight that although both NAB and PDB share the idea of proactive backdoor, PDB has essential differences from NAB, as detailed below:

  • PDB does not rely on poisoned sample detection, while NAB still relies on accurate poisoned sample detection, falling into the "detection-then-mitigation" pipeline.

  • PDB does not depend on suspicious sample relabeling, while NAB relies on accurate relabeling of detected suspicious samples.

Q3: Sensitivity to the defensive poisoned dataset and why the proposed method can meet Principle 4.

R3: Thanks. We would like to refer you to the Common Response for more comprehensive experiments and analysis for PDB.

Q4: Difference between the malicious poisoned dataset and the defensive poisoned dataset

R4: Thanks. We highlight the differences as follows:

  • The malicious poisoned dataset is provided by the attacker and may contain unknown malicious poisoned samples.
  • The defensive poisoned dataset is crafted by the defender from known, reversible defensive poisoned samples (a minimal construction sketch is given below).
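
For illustration only, a minimal sketch of crafting one defensive poisoned sample is shown below. It assumes the square corner trigger and the reversible $(y+1) \bmod K$ label mapping described elsewhere in this thread (Common Response and meta-review); the tensor layout and function name are hypothetical.

```python
# Illustrative sketch, not the authors' code: assumes a CHW image tensor, a square
# corner trigger (Delta_1), and a reversible (y + 1) mod K defensive label mapping.
import torch

def make_defensive_sample(x, y, num_classes, trigger_size=7, pixel_value=2.0):
    x = x.clone()
    x[..., :trigger_size, :trigger_size] = pixel_value  # stamp Delta_1 in a corner
    y_def = (y + 1) % num_classes                        # h(y): reversible defensive label
    return x, y_def
```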

Q5: Generalization to invisible backdoor attack and low-poisoning ratio attack.

R5: Thanks. As discussed in our paper, the proposed method, PDB, does not rely on specific assumptions about the type of attack, making it effective for defending against both invisible backdoor attacks and attacks with low poisoning ratios. To demonstrate this effectiveness, we conducted experiments using low poisoning ratios (0.5% and 0.1%) for both Visible and Invisible attacks. The results are summarized in Table 2, from which we can find that PDB can consistently mitigate backdoor attacks.

Table 2: Results on PreAct-ResNet18 and CIFAR10

| Poisoning ratio | Defense | BadNet, visible (ACC / ASR) | Blended, invisible (ACC / ASR) | Sig, invisible (ACC / ASR) | SSBA, invisible (ACC / ASR) | WaNet, invisible (ACC / ASR) |
|---|---|---|---|---|---|---|
| 0.10% | No Defense | 93.61 / 1.23 | 93.80 / 56.11 | 93.73 / 41.27 | 93.89 / 1.62 | 92.18 / 0.78 |
| 0.10% | PDB | 91.55 / 0.87 | 91.91 / 0.36 | 91.59 / 0.39 | 91.72 / 0.42 | 91.87 / 0.89 |
| 0.50% | No Defense | 93.76 / 50.06 | 93.68 / 93.30 | 93.80 / 82.43 | 93.41 / 35.67 | 91.27 / 1.12 |
| 0.50% | PDB | 91.62 / 0.60 | 91.66 / 0.31 | 91.72 / 0.12 | 91.65 / 0.54 | 91.72 / 0.92 |

Q6: Degraded accuracy when the poisoning ratio is decreasing.

R6: Thanks. For such concern, we would like to first clarify that PDB can still achieve high accuracy when the poisoning ratio is decreasing, as shown in Table 2 above. Then, we would like to discuss the factors that influence the accuracy of PDB:

  • Model capacity and data complexity:

    • Model capacity: Since PDB introduces an additional task, i.e., injecting the defensive backdoor, increasing the model capacity helps increase the accuracy of PDB, as evidenced in Table 3.

      Table 3: Results on different models

      | Defense | ResNet-18 (ACC / ASR) | ResNet-34 (ACC / ASR) | ResNet-50 (ACC / ASR) |
      |---|---|---|---|
      | No Defense | 92.54 / 76.27 | 93.08 / 82.48 | 93.76 / 87.26 |
      | PDB | 91.81 / 0.29 | 92.63 / 0.28 | 93.67 / 0.18 |
    • Dataset complexity: By comparing defense results with different datasets (e.g., Table 1 and Table 6 in the main manuscript), we can find that by decreasing the dataset complexity, the accuracy of PDB increases significantly.

  • Strength of defensive backdoor:

    • Strength of augmentation: From Fig. 2 of Supplementary pdf, we can find that there exists a tradeoff between ACC and ASR. Therefore, the accuracy of PDB can be boosted by reducing the strength of augmentation.
    • Sampling frequency: From Table 4 in common response, we can find that by increasing the sampling frequency of defensive poisoned samples, the accuracy of PDB can be boosted.
    • Trigger size: Table 1 in Common Response shows that a proper choice of trigger size can also help to increase the accuracy. Therefore, if a validation set is accessible, a proper trigger size can be chosen to increase accuracy.

In summary, thanks to the "home field advantage" of PDB, there are several ways to maintain high accuracy even with a low malicious poisoning ratio, such as increasing the model capacity, simplifying the dataset, reducing the strength of the augmentation applied to defensive poisoned samples, increasing the sampling frequency, and choosing a proper defensive trigger size.

Comment

Thanks for the authors' response. Most of my concerns have been addressed. I strongly suggest the authors clarify the motivation and the differences from existing methods in the introduction. Table 1 on complexity should be included in the final version. Overall, I tend to maintain my positive score.

Comment

Dear Reviewer 8vxw,

We sincerely appreciate your valuable insights and suggestions on our work. We have made our best efforts to address the concerns and queries you raised during the rebuttal process. We would greatly appreciate confirmation on whether our response has effectively resolved your doubts. Your feedback will be instrumental in improving the quality of our work. As the end of the discussion period is approaching, we eagerly await your reply before the end.

Sincerely,

The Authors

Comment

Dear Reviewer 8vxw,

We truly appreciate the thoughtful feedback you provided on our work. We've taken your comments to heart and have worked diligently to address them. We're reaching out again because we're nearing the end of the discussion period, and we hope to hear your thoughts on our rebuttal. Your insights are incredibly valuable to us and will help us enhance the quality of our paper.

Thank you so much for your time and support!

Best regards,

The Authors

Official Review (Rating: 6)

The paper introduces a novel method called Proactive Defensive Backdoor (PDB) to counter backdoor attacks in deep neural networks. PDB differs from traditional methods, which focus on detecting and eliminating suspicious samples. Instead, PDB proactively injects a defensive backdoor into the model during the training phase. PDB operates by embedding a trigger in the model's inputs during prediction, which neutralizes the effects of any malicious trigger present. The authors designed the defensive backdoor in such a way that it can reverse the model's prediction to its true label, thus maintaining its utility on benign tasks. Through extensive experimentation, the authors have demonstrated PDB's ability to outperform existing defense methods by achieving a balance between suppressing malicious backdoors and preserving model performance across various datasets and model architectures.

Strengths

  • The paper is well-structured and clearly written.
  • Introduces a novel approach for mitigating backdoor attacks.
  • Detailed experiments and comparison with state-of-the-art.

Weaknesses

  • The authors evaluated adaptive attacks only by increasing the trigger size of the BadNets attack. A more appropriate adaptive attack would increase the poisoning ratio to strengthen the backdoor effect compared to the defensive backdoor.
  • It would be interesting to see if PDB can scale for clean label backdoor attacks, where the adversary does not change the trigger label during the backdoor attack.
  • It is not clear whether the running time in Table 4 refers to training time or inference time. Comparing inference time would be more appropriate, as PDB requires the model to make two predictions for a single image. How does the inference time of PDB compare to other defenses in the literature?
  • It is not clear why the specific blocks, namely Block 1 and Block 4, were chosen to illustrate the impact of PDB. Also, the plots related to TAC are difficult to understand from the text.

Questions

  • Will PDB be effective against adaptive attacks that increase the poisoning ratio?
  • How does the inference time of PDB compare with other defenses in the literature?

Limitations

The authors adequately addressed the limitations and potential negative societal impact.

Author Response

Q1. Defend adaptive attacks that increase the poisoning ratio.

R1: Thanks for this suggestion. To provide a more comprehensive evaluation of the proposed method PDB against adaptive attacks, we conduct experiments with poisoning ratios from 10% to 30%, and malicious trigger size from 4x4 to 10x10 using PreAct-ResNet18 on CIFAR-10. The results are summarized below:

Table 1: Results for adaptive attacks

| Malicious trigger size | No Defense, 10% (ACC / ASR) | PDB, 10% (ACC / ASR) | No Defense, 20% (ACC / ASR) | PDB, 20% (ACC / ASR) | No Defense, 30% (ACC / ASR) | PDB, 30% (ACC / ASR) |
|---|---|---|---|---|---|---|
| 4x4 | 92.39 / 96.83 | 90.66 / 0.18 | 91.14 / 97.67 | 90.01 / 0.21 | 90.38 / 98.13 | 89.65 / 0.49 |
| 5x5 | 93.11 / 97.69 | 91.28 / 0.29 | 92.79 / 97.98 | 90.97 / 0.28 | 92.20 / 98.30 | 90.02 / 0.56 |
| 6x6 | 93.26 / 98.16 | 91.62 / 0.27 | 92.48 / 98.68 | 90.83 / 0.33 | 92.01 / 98.83 | 90.03 / 0.69 |
| 7x7 | 93.65 / 98.66 | 91.46 / 0.31 | 93.07 / 99.03 | 91.03 / 0.56 | 92.56 / 99.23 | 90.48 / 0.67 |
| 8x8 | 93.51 / 99.24 | 91.16 / 0.37 | 92.82 / 99.38 | 91.14 / 0.58 | 92.53 / 99.50 | 90.27 / 0.74 |
| 9x9 | 93.45 / 99.53 | 91.12 / 0.51 | 92.76 / 99.67 | 90.84 / 0.56 | 92.15 / 99.72 | 90.39 / 0.67 |
| 10x10 | 93.20 / 99.66 | 91.37 / 0.54 | 93.17 / 99.74 | 90.76 / 0.78 | 92.58 / 99.81 | 90.45 / 0.82 |

From Table 1, we can see that PDB consistently mitigates the backdoor against adaptive attacks with various malicious trigger sizes and poisoning ratios. Note that to keep the malicious backdoor stealthy, its poisoning ratio and trigger size are expected to be constrained. However, the defensive backdoor can utilize a large trigger size and a high sampling frequency to meet Principle 4 (Resistance against other backdoors), thereby mitigating the malicious backdoor effectively.

Q2. Can PDB scale for clean label backdoor attacks?

R2: Thank you for your question. As discussed in our paper, the proposed method, PDB, does not rely on specific assumptions about the type of attack. The key of PDB is utilizing a proactive defensive backdoor to suppress the malicious backdoor, making it effective for defending against clean-label attacks, even with a large model and dataset.

Note that we have already shown that PDB can defend against SIG on the CIFAR-10 dataset (Table 1 in the main manuscript). To demonstrate PDB's effectiveness against clean-label attacks on a large-scale dataset and model, we conduct experiments on SIG and LC [1] with ViT-B-16 and Tiny-ImageNet. Note that for Tiny-ImageNet (200 classes), the poisoning ratio is at most 0.5%. The results are summarized below:

Table 2: Defending results against clean label attack, on ViT-B-16 and Tiny-ImageNet

| Poisoning ratio | Defense | LC (ACC / ASR) | SIG (ACC / ASR) |
|---|---|---|---|
| 0.10% | No Defense | 75.39 / 1.32 | 75.86 / 9.10 |
| 0.10% | PDB | 74.75 / 0.23 | 74.49 / 0.52 |
| 0.50% | No Defense | 76.15 / 32.65 | 75.29 / 69.21 |
| 0.50% | PDB | 75.07 / 0.37 | 74.25 / 0.03 |

From Table 2, we can see that the clean-label attacks fail to implant an effective backdoor at a poisoning ratio of 0.1%. At a poisoning ratio of 0.5%, PDB effectively defends against the clean-label attacks on this large-scale dataset and model.

Q3: Comparing training and inference runtime between PDB and other baselines

R3: Thanks for your constructive suggestion.

  • Training runtime: Firstly, we would like to clarify that the running time reported in Table 4 of the main manuscript refers to the training time. Furthermore, to better understand the training complexity of our method, please refer to our response R1 to Reviewer 8vxw.
  • Inference runtime: As shown in Algorithm 1 (see the bottom four rows), during inference, PDB requires only one forward pass, just like standard inference. The additional costs involve adding the defensive trigger to each input image (i.e., $x \oplus \Delta_1$) and applying the inverse mapping (i.e., $h^{-1}(\cdot)$); a minimal sketch is given below. Compared to the cost of the forward pass, these additional costs are negligible. Thus, our inference cost is almost the same as that of the other baselines.
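
A minimal sketch of this inference procedure is given below; it assumes the square corner trigger and the $(y+1) \bmod K$ label mapping mentioned in this thread, and is not the authors' exact implementation.

```python
# Sketch of PDB inference: one forward pass on the triggered input, then invert h.
import torch

def apply_defensive_trigger(x, trigger_size=7, pixel_value=2.0):
    x = x.clone()
    x[..., :trigger_size, :trigger_size] = pixel_value    # x ⊕ Δ1
    return x

@torch.no_grad()
def pdb_predict(model, x, num_classes):
    logits = model(apply_defensive_trigger(x))             # single forward pass
    shifted_pred = logits.argmax(dim=1)                     # prediction under the defensive backdoor
    return (shifted_pred - 1) % num_classes                 # h^{-1}: undo the (y + 1) mod K shift
```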

Q4: Detailed explanation of TAC plot

R4: Thanks. We noticed a typo in the legend of the TAC plots (Figure 4) in our main manuscript, which may have confused the reviewer, and we will fix it in our revision. Due to the page limit, we only show the visualization of TAC for the 1st and 4th blocks of PreAct-ResNet18 (4 blocks in total) in the main manuscript. To provide a more comprehensive explanation of the TAC plots, we visualize the TAC for all blocks of PreAct-ResNet18 in Fig. 1 of the supplementary PDF. Below, we provide a more detailed explanation of the plots:

  • Definition of TAC [2]: TAC measures the change in activation value of each neuron when comparing maliciously poisoned samples to their benign counterparts. Let $\phi$ be a feature extractor that maps an input image $x$ to its latent activations. For an input image $x$, we can construct the malicious poisoned sample $x \oplus \Delta_0$. In PDB, a defensive trigger is added to the malicious poisoned sample, crafting the sample $x \oplus \Delta_0 \oplus \Delta_1$, aiming to suppress the malicious backdoor. Therefore, for a dataset $D$, we define

$$\text{TAC w/o } \Delta_1 = \frac{\sum_{x\in D}\left(\phi(x\oplus\Delta_0)-\phi(x)\right)}{|D|},$$

$$\text{TAC w/ } \Delta_1 = \frac{\sum_{x\in D}\left(\phi(x\oplus\Delta_0\oplus\Delta_1)-\phi(x)\right)}{|D|}.$$

  • Analysis of the plots: From the TAC plots in Fig. 1 of the supplementary PDF, we can see that planting a defensive trigger on a maliciously poisoned sample substantially suppresses the activation changes caused by the malicious backdoor, indicating that the defensive backdoor can suppress the malicious backdoor and thereby reduce the ASR.
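
For concreteness, a rough sketch of how the two TAC quantities above could be computed is given below; `feature_extractor` and the (pattern, mask) trigger representation are placeholders rather than the authors' actual code.

```python
# Rough sketch, assuming `feature_extractor` maps a batch of images to per-neuron
# activations and triggers are given as (pattern, mask) tensor pairs.
import torch

def add_trigger(x, pattern, mask):
    """Paste a trigger pattern onto x wherever mask == 1."""
    return x * (1 - mask) + pattern * mask

@torch.no_grad()
def tac(feature_extractor, images, mal_pattern, mal_mask, def_pattern=None, def_mask=None):
    clean_feats = feature_extractor(images)                          # phi(x)
    poisoned = add_trigger(images, mal_pattern, mal_mask)            # x ⊕ Δ0
    if def_pattern is not None:
        poisoned = add_trigger(poisoned, def_pattern, def_mask)      # x ⊕ Δ0 ⊕ Δ1
    return (feature_extractor(poisoned) - clean_feats).mean(dim=0)   # average over D

# tac(...)                                  -> TAC w/o Δ1
# tac(..., def_pattern=..., def_mask=...)   -> TAC w/  Δ1
```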

Reference:

[1] Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks. NeurIPS 2018.

[2] Data-Free Backdoor Removal Based on Channel Lipschitzness. ECCV 2022.

Comment

Dear Reviewer 8hm3,

We sincerely appreciate your valuable insights and suggestions on our work. We have made our best efforts to address the concerns and queries you raised during the rebuttal process. We would greatly appreciate confirmation on whether our response has effectively resolved your doubts. Your feedback will be instrumental in improving the quality of our work. As the end of the discussion period is approaching, we eagerly await your reply before the end.

Sincerely,

The Authors

Comment

I appreciate the authors' thorough rebuttal, which addressed most of my concerns. I have raised my score. I recommend that the authors incorporate these clarifications into the final version of the paper.

Comment

Dear Reviewer 8hm3,

We sincerely appreciate your thoughtful response and the time you've dedicated to reviewing our paper. We are pleased to hear that our rebuttal addressed your concerns satisfactorily and we are strongly encouraged by your recognition of our efforts. We will ensure that the clarifications discussed in the rebuttal are incorporated into the next revision of the paper.

Thank you again for your valuable input and support.

Best regards,

The Authors

Official Review (Rating: 7)

This paper proposes a proactive defense approach called PDB, which aims to combat malicious backdoor attacks by injecting an active defensive backdoor introduced by the defender. The main goal of PDB is to suppress the impact of malicious backdoors while preserving the utility of the model for its original tasks. PDB first analyzes the objectives of effective backdoor defense and introduces four fundamental design principles: reversibility, inaccessibility to attackers, minimal impact on model performance, and resistance against other backdoors. Then, an additional defensive poisoned dataset is constructed, and the model is trained using both this dataset and the entire poisoned dataset. To evaluate its effectiveness, the paper compares PDB with five SOTA in-training defense methods against seven SOTA data-poisoning backdoor attack methods, involving different model architectures and datasets. Experimental results show that PDB performs comparably to or even better than existing baseline methods.

Strengths

  1. The perspective of the paper is novel.
  2. This paper is well-written and easy to understand.

Weaknesses

  1. The paper lacks an effective explanation for PDB.
  2. The paper lacks sufficient and reasonable explanations for the experimental results.

Questions

The perspective of the paper is innovative, and it is well-written and easy to understand. However, this paper still has the following issues.

  1. The paper lacks sufficient and reasonable explanations for the experimental results, particularly those in Table 2. In Table 2, the ASR results for PDB under different attacks are all 0, while the results for the other baselines are close to 100 (i.e., they all fail to defend successfully). The paper should provide a reasonable explanation for the significant difference in ASR between PDB and the other baselines.
  2. In the section on Resistance to Adaptive Attack, the paper evaluates PDB's resistance to malicious backdoor attacks with trigger sizes ranging from 3x3 to 6x6. In this section, the paper should set the backdoor attack trigger size to greater than 7x7 to match the trigger size used by PDB. Additionally, this raises the question of why the paper sets the trigger size for PDB to 7x7.
  3. In the section on the Influence of Augmentation, different strengths of augmentation do not show significant changes in defense-related ACC and ASR, even in cases without any augmentation. Therefore, what is the role of augmentation during the training process? Based on this, it is natural to question why PDB's defensive trigger can be stronger than the attacker's trigger without relying on augmentation.

Limitations

The authors have adequately addressed the limitations and potential negative societal impact of their work.

Author Response

Q1. Explanations for experimental results in Table 2

R1: Thanks. Firstly, it's important to note that

  • All attack checkpoints in our experiments were sourced directly from the official BackdoorBench website
  • All experimental results for baselines in our main manuscript align with those reported by BackdoorBench (refer to the official leaderboard)

The above operation guarantees that the comparisons presented in our paper are both fair and reliable.

Regarding the performance of the baselines in Table 2, we provide the following analysis:

  • AC and Spectral: Both methods rely on the latent representations of images to detect poisoned samples. AC identifies poisoned samples through clustering in the latent space, considering smaller clusters as likely to contain poisoned data. Spectral Signature detects outliers in the latent space to identify such samples. However, with a poisoning ratio of 5% on Tiny ImageNet (200 classes, each accounting for 0.5% of the data), the poisoned samples become the majority within the target class, breaking the underlying assumptions of these methods. Reducing the poisoning ratio to 0.1% therefore helps maintain AC's and Spectral's effectiveness, as shown in Table 1. Furthermore, Table 1 also shows that PDB is robust across various poisoning ratios and outperforms AC and Spectral even at lower poisoning rates.

    Table 1: Results on ViT-b-16 with Tiny ImageNet

    | Poisoning ratio | Defense | BadNet (ACC / ASR) | WaNet (ACC / ASR) | BPP (ACC / ASR) |
    |---|---|---|---|---|
    | 0.10% | No Defense | 76.67 / 54.03 | 61.70 / 51.33 | 62.89 / 69.14 |
    | 0.10% | AC | 76.50 / 24.15 | 75.65 / 2.77 | 76.79 / 3.55 |
    | 0.10% | Spectral | 75.55 / 25.47 | 75.48 / 2.47 | 76.09 / 3.14 |
    | 0.10% | PDB | 75.53 / 0.02 | 73.09 / 0.38 | 73.58 / 0.08 |
  • ABL: This method leverages the early-learning phenomenon of poisoned samples. Upon closer inspection, however, we find that ABL struggles to correctly identify poisoned samples when used with the ViT-B-16 model. Unlike CNN models, Vision Transformers have some intriguing properties, e.g., weaker biases towards backgrounds and textures [1] and robustness to severe image noise and corruption [2], which may prevent them from learning the triggers in the early stages. Moreover, as ABL takes additional stages to train the model (with a lower learning rate) and unlearn the isolated samples, it achieves higher accuracy.

  • NAB: It struggles to detect and relabel the poisoned samples correctly for large datasets and large models, resulting in a low ACC and a high ASR in Table 2. A more detailed analysis can be found in Appendix C.5.

Q2. Defend adaptive attacks with larger trigger and the choice of defensive trigger size

R2: Thank you for the insightful question. We would highlight the following points:

  • For adaptive attacks with larger trigger sizes, please refer to Q1 of Reviewer 8hm3, where we conducted experiments with poisoning ratios ranging from 10% to 30% and malicious trigger sizes varying from 4x4 to 10x10. Our experiments demonstrate that PDB effectively defends against adaptive attacks with various trigger sizes and poisoning ratios. Note that to keep the malicious backdoor stealthy, its poisoning ratio and trigger size are expected to be constrained. However, the defensive backdoor can utilize a large trigger size and a high sampling frequency to meet Principle 4 (Resistance against other backdoors), thereby mitigating the malicious backdoor effectively.

  • For the choice of defensive trigger size, we emphasize that the proposed method PDB is not tied to any specific choice of defensive trigger (see Appendix C.2 for PDB with other triggers). We refer you to the Common Response for a detailed analysis of the role of the defensive trigger size, from which we can see that a larger trigger size is preferred to ensure defense effectiveness, and a size between 5x5 and 8x8 is recommended for square triggers.

Q3. More explanation for PDB and the role of augmentation

R3. Thanks. We would like to refer you to the Common Response for more comprehensive explanation and experiments for PDB as well as the role of augmentation. In Common Response, we show that the augmentation plays an important role for further enhancing PDB and reducing the ASR.

Reference:

[1] Delving Deep into the Generalization of Vision Transformers under Distribution Shifts, CVPR 2022

[2] Intriguing Properties of Vision Transformers, NeurIPS 2021

Comment

Dear Reviewer 1B7j,

We sincerely appreciate your valuable insights and suggestions on our work. We have made our best efforts to address the concerns and queries you raised during the rebuttal process. We would greatly appreciate confirmation on whether our response has effectively resolved your doubts. Your feedback will be instrumental in improving the quality of our work. As the end of the discussion period is approaching, we eagerly await your reply before the end.

Sincerely,

The Authors

Comment

Thank you for the detailed responses in the rebuttal.

My concerns have been addressed. Please add the rebuttal content to the final version of the paper.

I agree to accept this paper.

Comment

Dear Reviewer 1B7j,

We sincerely appreciate your thoughtful response and the time you've dedicated to reviewing our paper.

We are strongly encouraged by your recognition of our efforts. We will incorporate your suggestions and insights into the revised manuscript. Thank you once again for your thorough review and your positive evaluation.

Sincerely,

Authors

Author Response

Common Response

Q1. Explanation of PDB, including the design of the defensive trigger and how it satisfies Principle 4

R1. Thank you for your insightful comments. We would like to clarify that the proposed method, PDB, is not tied to any specific choice of defensive trigger or backdoor enhancement strategy as long as these strategies adhere to the principles outlined in our work. While we primarily focus on a 7x7 square trigger in Section 4 of the main manuscript, the effectiveness of PDB with other configurations has been validated in Appendix C.2 and C.3. For experiments and discussions here, we adopt the implementation of PDB presented in Section 4 unless otherwise specified.

Now, we explain PDB and discuss how to meet Principle 4 (Resistance against other backdoors) from the following perspectives:

  • Design of defensive trigger:

    • Defensive trigger size: The trigger size directly contributes to the strength of the defensive backdoor. In Table 1, we evaluate PDB with square defensive triggers of sizes ranging from 1x1 to 9x9. From Table 1, we can see that a larger trigger leads to a stronger defensive backdoor, resulting in a higher ACC and a lower ASR. However, as the trigger size increases, it may interfere with the visual content of the image, leading to a slight decrease in ACC. Notably, as the square trigger has strong visibility, even a 1x1 trigger can alleviate the malicious backdoor to some extent.

    • Defensive trigger position: As discussed in Section 3.2, the position of the defensive trigger is essential for Principle 3, i.e., minimal impact on model performance. Table 2 shows that triggers placed in different positions (corner, random, and center) achieve a similar effect in defending against backdoor attacks. However, placing a trigger at the center of an image significantly degrades accuracy, as the trigger masks the core patterns of the image.

    • Pixel value: For a square trigger, pixel value is also an important parameter for PDB. In Table 3, we evaluate PDB with a square defensive trigger with different pixel values, from which we can find that PDB can achieve high effectiveness across different pixel values.

  • Backdoor enhancement strategy during training:

    • Increasing sampling frequency: Given a fixed number of defensive poisoned samples, the defensive backdoor can be further enhanced by increasing the sampling frequency of poisoned samples, forcing the model to pay more attention to defensive poisoned samples. Table 4 shows that a larger sampling frequency leads to a stronger defensive backdoor, resulting in a higher ACC and a lower ASR. Note that for the malicious attacker, the poisoning ratio is expected to be low to ensure the stealthiness of the attack.

    • Data augmentation: To demonstrate the role of augmentation in PDB, we first provide a more detailed visualization of results with different strengths of augmentation in Fig. 2 of the supplementary PDF. From Fig. 2, we can see that PDB without any sample augmentation ($\alpha=0$) already exhibits significant efficacy, with an ASR lower than 2%. As the augmentation strength increases, the ASR decreases, indicating that stronger augmentation can further enhance PDB's effectiveness. However, a tradeoff between augmentation intensity and model performance is also observed.

In summary, a visible trigger with a larger size, a higher sampling frequency, and data augmentation all contribute to meeting Principle 4.

Table 1: Results on PreAct-ResNet18 with Poisoning Ratio 5% and different defensive trigger size

| Defensive trigger size | BadNet (ACC / ASR) | Blended (ACC / ASR) | Sig (ACC / ASR) | SSBA (ACC / ASR) | WaNet (ACC / ASR) |
|---|---|---|---|---|---|
| 1x1 | 48.75 / 6.88 | 52.34 / 5.06 | 53.15 / 5.22 | 52.03 / 6.20 | 58.04 / 4.26 |
| 2x2 | 74.39 / 3.37 | 81.38 / 2.71 | 81.13 / 2.32 | 77.49 / 3.87 | 76.49 / 3.98 |
| 3x3 | 86.08 / 0.26 | 85.60 / 0.67 | 86.94 / 0.07 | 87.01 / 0.49 | 85.70 / 0.97 |
| 4x4 | 89.46 / 0.28 | 90.07 / 0.56 | 90.38 / 0.07 | 89.60 / 0.44 | 89.98 / 0.92 |
| 5x5 | 91.51 / 0.33 | 92.22 / 0.31 | 92.35 / 0.06 | 92.14 / 0.64 | 92.05 / 0.97 |
| 6x6 | 90.78 / 0.32 | 91.93 / 0.49 | 92.04 / 0.04 | 91.82 / 0.41 | 91.52 / 0.91 |
| 7x7 | 91.08 / 0.38 | 91.36 / 0.70 | 91.79 / 0.06 | 91.58 / 0.46 | 91.47 / 0.92 |
| 8x8 | 90.48 / 0.33 | 91.56 / 0.40 | 91.59 / 0.02 | 91.41 / 0.39 | 91.44 / 0.86 |
| 9x9 | 90.21 / 0.32 | 91.24 / 0.39 | 90.79 / 0.03 | 90.92 / 0.32 | 90.87 / 0.56 |

Table 2: Results on PreAct-ResNet18 with Poisoning Ratio 5% and different positions

| Defensive trigger position | BadNet (ACC / ASR) | Blended (ACC / ASR) | Sig (ACC / ASR) | SSBA (ACC / ASR) | WaNet (ACC / ASR) |
|---|---|---|---|---|---|
| Corner | 91.08 / 0.38 | 91.36 / 0.70 | 91.79 / 0.06 | 91.58 / 0.46 | 91.47 / 0.92 |
| Random | 88.79 / 0.69 | 90.10 / 0.81 | 90.12 / 0.16 | 89.39 / 0.66 | 89.49 / 0.97 |
| Center | 87.35 / 0.63 | 87.82 / 0.44 | 88.19 / 0.06 | 87.93 / 0.89 | 87.70 / 0.93 |

Table 3: Results on PreAct-ResNet18 with Poisoning Ratio 5% and different pixel values

| Pixel value | BadNet (ACC / ASR) | Blended (ACC / ASR) | Sig (ACC / ASR) | SSBA (ACC / ASR) | WaNet (ACC / ASR) |
|---|---|---|---|---|---|
| 1.50 | 90.69 / 0.57 | 91.40 / 0.56 | 91.74 / 0.09 | 91.54 / 0.60 | 91.44 / 0.83 |
| 2.00 | 91.08 / 0.38 | 91.36 / 0.70 | 91.79 / 0.06 | 91.58 / 0.46 | 91.47 / 0.92 |
| 2.50 | 90.99 / 0.48 | 91.39 / 0.50 | 91.77 / 0.04 | 91.48 / 0.80 | 91.54 / 0.54 |
| -0.50 | 90.94 / 0.48 | 91.78 / 0.91 | 91.56 / 0.01 | 91.64 / 0.61 | 91.69 / 0.60 |
| -1.00 | 90.84 / 0.23 | 92.31 / 0.07 | 91.81 / 0.00 | 91.85 / 1.07 | 91.83 / 0.90 |
| -1.50 | 90.86 / 0.29 | 91.62 / 1.00 | 91.88 / 0.00 | 91.43 / 0.66 | 91.86 / 0.78 |

Table 4: Results on PreAct-ResNet18 with poisoning ratio 5% and different sampling frequencies

| Sampling frequency | BadNet (ACC / ASR) | Blended (ACC / ASR) | Sig (ACC / ASR) | SSBA (ACC / ASR) | WaNet (ACC / ASR) |
|---|---|---|---|---|---|
| 1 | 91.01 / 0.69 | 91.19 / 1.48 | 91.38 / 4.62 | 91.19 / 1.24 | 91.13 / 1.07 |
| 3 | 91.06 / 0.57 | 91.27 / 1.39 | 91.73 / 0.10 | 91.28 / 0.72 | 91.44 / 0.97 |
| 5 | 91.08 / 0.38 | 91.36 / 0.70 | 91.79 / 0.06 | 91.58 / 0.46 | 91.47 / 0.92 |
| 7 | 91.34 / 0.27 | 91.56 / 0.59 | 91.98 / 0.04 | 91.89 / 0.43 | 91.84 / 0.27 |
| 9 | 92.15 / 0.20 | 92.19 / 0.50 | 92.27 / 0.02 | 92.30 / 0.31 | 92.48 / 0.16 |
Final Decision

This work proposes a proactive defense against backdoor poisoning. The defender adds a defensive backdoor to some training samples and trains the classifier to classify the samples as belonging to class c + 1 when the defensive backdoor trigger is added to the input, where c is the original class. At test time, the defensive backdoor is employed to nullify the effect of malicious backdoors. All the reviewers have a positive opinion about this work as it brings a novel perspective and is well-written and easy to understand. The authors did a good job in the rebuttal phase, clarifying different aspects and adding extra experiments that strengthened this work. A disadvantage highlighted by the reviewers is that the proposed defense slightly decreases the classifier accuracy. However, it strongly enhances the security against backdoors. Therefore, we recommend acceptance. The authors should add to the paper the clarifications and extra experiments presented in the rebuttal.