PaperHub
ICLR 2024
Rating: 6.0/10 (Poster; 3 reviewers, scores 6, 6, 6; min 6, max 6, std dev 0.0)
Confidence: 3.7

Adversarial Feature Map Pruning for Backdoor

OpenReview · PDF
Submitted: 2023-09-23 · Updated: 2024-04-21
TL;DR

We propose a backdoor defense method that effectively detects and prunes the feature maps that propagate backdoor information.

Abstract

Keywords
Backdoor Defense, Data Poisoning

Reviews and Discussion

Review
Rating: 6

In this manuscript, the authors propose a DNN pruning method called FMP to mitigate backdoors. The method applies adversarial attacks to feature maps and prunes the feature maps that are vulnerable to these attacks.

Strengths

I find this paper interesting. It's important to understand the relationship between pruning and backdoors, and the authors explored this in a systematic way.

Weaknesses

The authors' method looks similar to Adversarial Neuron Pruning (ANP) by Wu and Wang (2021). However, the authors do not describe it in the related work, even though they compare against it in the results section. Discussing the methodological differences between the two would help readers understand the contribution better.

Questions

When the authors report parametric variables in a table, it would be better to present these results as figures.

Comment

Thanks for the reviewer's comments on FMP and ANP.

Firstly, as illustrated in Table 1 of the paper, in the majority of experimental results FMP consistently outperforms ANP. For instance, on average, FMP reduces the ASR from 68.27% to 2.86% on the CIFAR10 dataset. The key reason for FMP's higher performance, and the main difference between FMP and ANP, is that FMP operates at the feature map level, matching the backdoor trigger's effect on the DNN's feature maps, rather than on specific neurons within the DNN (e.g., ANP). We have added this discussion to our revised paper.

Other differences can be summarized in the following aspects:

1. Feature Map Level Focus:

  • FMP: Concentrates on the feature map level, aligning with the backdoor trigger's focus on the DNN feature map, rather than specific neurons within the DNN.
  • ANP: During the pruning procedure, ANP may only prune a subset of backdoor-related neurons within the backdoor feature map, potentially leaving some undetected. This results in a higher ASR after the defense procedure.

2. Extraction of Backdoor Trigger Information:

  • FMP: Aims to identify the backdoor feature map extractor/learner that captures backdoor trigger information.
  • FMP: Uses gradient ascent on each feature map to reproduce its learned features. If the reproduced feature map is linked to the backdoor, the DNN's prediction accuracy drops significantly. Robust feature maps that are less influenced by the backdoor trigger are not pruned during the FMP pruning procedure.

On the other hand:

3. Direct Perturbation of Neuron Biases and Weights:

  • ANP: Directly perturbs the neuron's bias and weights, impacting both normal and backdoor-related neurons.
  • ANP: Unlike FMP, which perturbs the input samples, ANP's direct perturbation can cause neurons with a large impact on DNN predictions to be misidentified as backdoor neurons, which can mislead the backdoor neuron detector.
  • ANP: If ANP attempts to protect these neurons (i.e., does not remove them), some backdoor neurons may also remain, leading to a higher ASR.

In summary, while both FMP and ANP aim to defend against backdoor attacks, FMP's emphasis on the feature map level and the specific way it reproduces learned features contribute to its higher performance compared to ANP in mitigating backdoor attacks, as observed in experimental results.
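To make the feature-map-level probe concrete, below is a minimal PyTorch sketch of what such a step could look like. This is our illustration rather than the authors' released code: the name `frg_probe`, the single-step sign-gradient update, and the activation-energy objective are all assumptions.

```python
import torch

def frg_probe(model, layer, channel, x, y, eps=1/255):
    """Hypothetical sketch of a feature-map-level probe in the spirit of FRG.
    Perturb clean inputs so that one feature map (a single channel of
    `layer`'s output) is amplified, then measure how much accuracy survives.
    A large accuracy drop marks the channel as backdoor-suspect."""
    model.eval()
    acts = {}

    def hook_fn(module, inputs, output):
        acts["fm"] = output[:, channel]  # keep one channel of the layer output

    handle = layer.register_forward_hook(hook_fn)
    x_adv = x.detach().clone().requires_grad_(True)
    model(x_adv)
    # Gradient ascent on the channel's activation energy w.r.t. the input:
    # this tries to reproduce whatever feature the channel has learned.
    acts["fm"].pow(2).sum().backward()
    handle.remove()

    x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()
    with torch.no_grad():
        preds = model(x_adv).argmax(dim=1)
    return (preds == y).float().mean().item()  # clean accuracy under the probe
```

In this reading, channels whose probe accuracy collapses are the ones a feature-map-level defense would flag, whereas a neuron-level method such as ANP instead perturbs the weights and biases of individual neurons.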

Q1: When the authors have parametric variables in a table, it should be better to draw these results with figures.

Thanks for the reviewer's recommendation; we are now adding the figures to the appendix. Due to page limitations, we will reorganize them in our final version.

Comment

I appreciate the responses from the authors. My concerns were addressed in the authors' reply.

Comment

Dear Reviewer x3Nd,

Thank you for your comment dated 22 Nov 2023. We are pleased to hear that you are satisfied with our response and appreciate the valuable insights you have provided. We have strived to ensure that all your concerns were comprehensively addressed in our reply. Your feedback has been instrumental in enhancing the quality of our work.

If you find the revisions and responses satisfactory, we kindly ask if you might consider reflecting this in the overall rating for our submission. Your support and constructive feedback are greatly appreciated, and we look forward to any further suggestions or comments you may have.

Best regards,

Authors of FMP

Comment

Dear Reviewer x3Nd,

We hope this message finds you well. We are writing to follow up on our previous correspondence dated 22 Nov 2023 regarding the manuscript we submitted for your review.

We would like to reiterate our gratitude for your initial feedback, which was invaluable in guiding our revisions. We endeavored to address all the concerns you raised thoroughly, and it was gratifying to know that our responses were well-received. Additionally, if our revisions and responses have satisfactorily addressed your concerns, we would be grateful if this could be reflected in the overall rating for our submission. Your final review and comments are not only essential for the progress of our manuscript but also serve as a critical benchmark for us to gauge the quality of our work.

We appreciate the time and effort you have put into reviewing our manuscript, and we look forward to any further suggestions or comments you may have.

Thank you once again for your valuable contributions to our work.

Best regards,

Authors of FMP

Review
Rating: 6

This paper proposes Adversarial Feature Map Pruning for Backdoor (FMP), a new method for backdoor mitigation in neural networks. FMP does not require access to the trigger or poisoned data. Based on a clean data sample, the method reverse-engineers poison triggers from each feature map in the model (at multiple layers) via back-propagation. The weights determined to be connected to the triggers are reinitialized and fine-tuned on clean data. Experiments are performed on CIFAR-10, CIFAR-100, and GTSRB against a wide range of attacks and defenses.

Strengths

  • The paper provides an extensive evaluation using the standard BackdoorBench benchmark, against multiple attacks and compared to multiple defenses.
  • The proposed FMP seems to perform well on average.
  • The source code was provided and is pledged to be available open-source upon paper acceptance.

[Update based on authors' response] I would like to thank the authors for their answer and additional results. I think updating the paper based on the discussion would improve it. I have raised my score.

Weaknesses

Novelty and prior work

  • The novelty of the paper seems limited. The ideas of using adversarial examples to reverse-engineer triggers (e.g., ANP, AEVA) or pruning and retraining trigger weights (RNP) are not themselves novel. The paper does not cite most of these very close prior results and does not provide a conceptual comparison to them.
  • The prior-art section mainly addresses defenses from different categories than the present one, which are then easy to dismiss. Outside the specific ideas used in FMP, there are many methods that address the same setup as the current paper and operate without knowledge of the trigger (e.g., DeepInspect, TABOR, ABS, [Fu et al., 2020]). These could be included in the experimental comparison.

Performance

  • Tab. 1 shows that the proposed method is only the best on average, but does not do so well on individual benchmarks (i.e., for each attack and dataset). The commentary section also fails to quantify how many benchmarks are actually won by the proposed method.

Clarity

  • Certain points in the paper are not clear; see also the questions below. Moreover, the components of the method are explained in an algorithm rather than in the text.
  • Assumptions should be clear earlier in the paper, e.g. the fact that a clean input sample is required.
  • The typography of the paper could be improved, please see some suggestions below. The paper could also use additional proofreading.
  • Tab. 1 is very dense and lacking highlights of the best values for each attack. As such, it is currently difficult to interpret.

Minor remarks

  • It is unclear why the feature maps are summed in the Notations paragraph (Sec. 3).
  • Some numerical results are typeset with spaces before the decimals (e.g., 2. 86%). These spaces should be removed.
  • Consider using \citep when the cited references are not part of the sentence (see here).

Questions

  1. What is the distinction the paper makes between a neural network layer output and a feature map?
  2. What does it mean in Alg. 1 that Correct_list is sorted in ascending order? According to which criterion?
  3. Are all the weights of the model refined during the fine tuning step or just those detected as being part of backdoors (i.e., the weights that are reset to zero)?
  4. Why is the backdoor feature initialization done by setting weights to zero, instead of using the same initialization strategy as when training the model for the first time (i.e., various random sampling strategies)? Has this alternative been considered?
  5. What is the impact of varying the method parameters $\epsilon$ and $p$ under attacks other than BadNets?
Comment

Thanks for the reviewer's comments on FMP.

For Q1: in a DNN, each layer's output contains multiple channels; in FMP, we define a feature map as one channel of a layer's output.
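In tensor terms (a generic PyTorch illustration, not code from the paper), this distinction looks like:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
out = conv(torch.randn(8, 3, 32, 32))  # layer output: [batch, 64, 32, 32]
feature_map_i = out[:, 5]              # "feature map" = one channel, here i = 5
```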

For Q2: first, the Correct_list can be viewed as the list of the DNN's accuracies when each feature map is attacked by FRG. Once FRG is conducted, each feature map may yield a different accuracy. Our motivation is that in a backdoored model, once the backdoor trigger has been added to the input (i.e., the backdoor feature map reproduces the backdoor information under an adversarial attack), the DNN will have lower accuracy than when other feature maps are attacked. We then ArgSort the Correct_list and take the N/p feature maps with the lowest accuracy from the left (that is why we use ascending order).
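As a concrete reading of the ascending-order step (our sketch, not the authors' code; `N`, `p`, and the dummy accuracy values are placeholders for the per-feature-map probe results):

```python
import numpy as np

N, p = 512, 64  # assumed values: N feature maps in total, pruning divisor p

# correct_list[i] = clean accuracy when feature map i is attacked by FRG.
# Dummy values here; in practice each entry comes from the probe.
correct_list = np.random.uniform(0.5, 1.0, size=N)

# Ascending ArgSort puts the lowest-accuracy (most backdoor-suspect)
# feature maps on the left; those leftmost N/p indices are pruned.
order = np.argsort(correct_list)   # ascending order of accuracy
to_prune = order[: N // p]         # e.g. 512 // 64 = 8 feature maps
```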

For Q3 and Q4: all feature maps are fine-tuned (but only backdoor feature maps are initially set to zero). To address the reviewer's concern about the training strategies, we evaluated how the initialization and tuning methods affect FMP's effectiveness in the table below:

| Strategy | BadNets Acc | BadNets ASR | BadNets RA | Blended Acc | Blended ASR | Blended RA | Low Frequency Acc | Low Frequency ASR | Low Frequency RA | SSBA Acc | SSBA ASR | SSBA RA | WaNet Acc | WaNet ASR | WaNet RA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All_tuning | 91.67 | 1.67 | 91.71 | 91.85 | 6.44 | 74.43 | 91.77 | 1.90 | 90.52 | 91.92 | 2.89 | 88.59 | 93.42 | 1.38 | 88.98 |
| Vulnerable feature map tuning | 91.54 | 1.6 | 91.63 | 91.94 | 6.32 | 74.19 | 92.02 | 2.02 | 90.36 | 91.97 | 3.16 | 88.75 | 93.49 | 1.12 | 91.86 |
| xavier_uniform | 91.68 | 1.84 | 91.81 | 91.84 | 6.33 | 74.28 | 91.81 | 1.81 | 90.47 | 92.02 | 2.95 | 88.48 | 93.61 | 1.38 | 92.03 |
| kaiming_uniform | 91.62 | 1.56 | 91.71 | 91.7 | 6.3 | 74.54 | 91.75 | 1.95 | 90.55 | 91.78 | 2.88 | 88.44 | 93.41 | 1.34 | 92.13 |

Table 1: Performance comparison (%) of backdoor defense methods on CIFAR10, CIFAR100, and GTSRB datasets under PreActResNet18, under different attack strategies with a poison rate of 10% and retraining data ratio of 100%. We set $\epsilon$ to 1/255 and $p$ to 64.
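To map the table rows to code, here is a hedged PyTorch sketch of the initialization variants as we understand them; the function name and the channel-wise slicing are our assumptions, not the authors' implementation. "All_tuning" and "Vulnerable feature map tuning" differ only in which parameters are fine-tuned afterwards, while the last two rows swap the zero reset for a random initializer.

```python
import torch.nn as nn

def reinit_pruned_channels(conv: nn.Conv2d, channels, strategy="zero"):
    """Hypothetical sketch: re-initialize only the output channels (feature
    maps) of a conv layer flagged as backdoor-related, before the clean-data
    fine-tuning step."""
    w = conv.weight.data  # shape: [out_channels, in_channels, kH, kW]
    for c in channels:
        if strategy == "zero":               # the zero reset described above
            w[c].zero_()
        elif strategy == "xavier_uniform":   # third row of the table
            nn.init.xavier_uniform_(w[c].unsqueeze(0))
        elif strategy == "kaiming_uniform":  # fourth row of the table
            nn.init.kaiming_uniform_(w[c].unsqueeze(0))
        if conv.bias is not None:
            conv.bias.data[c] = 0.0
```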

Answer to Q5: Thanks for the reviewer's concern about $\epsilon$ and $p$ under other attacks. To address it, we conducted experiments for Low Frequency and WaNet with different $\epsilon$ and $p$, shown in the following tables:

| $\epsilon$ | 1/255 | 4/255 | 16/255 |
|---|---|---|---|
| Acc | 91.77 | 90.32 | 88.59 |
| ASR | 1.90 | 1.46 | 1.31 |
| RA | 90.52 | 90.17 | 89.10 |

Table 2: FMP's effectiveness under different $\epsilon$ on the CIFAR10 dataset under the Low Frequency attack.

| $p$ | 4 | 16 | 64 |
|---|---|---|---|
| Acc | 84.98 | 89.41 | 91.77 |
| ASR | 1.79 | 1.92 | 1.90 |
| RA | 83.27 | 88.64 | 90.52 |

Table 3: FMP's effectiveness under different $p$ on the CIFAR10 dataset under the Low Frequency attack.

| $\epsilon$ | 1/255 | 4/255 | 16/255 |
|---|---|---|---|
| Acc | 93.42 | 91.07 | 89.78 |
| ASR | 1.38 | 1.35 | 1.39 |
| RA | 92.13 | 91.14 | 89.53 |

Table 4: FMP's effectiveness under different $\epsilon$ on the CIFAR10 dataset under the WaNet attack.

| $p$ | 4 | 16 | 64 |
|---|---|---|---|
| Acc | 86.17 | 90.82 | 93.42 |
| ASR | 1.64 | 1.81 | 1.38 |
| RA | 85.31 | 89.87 | 92.13 |

Table 5: FMP's effectiveness under different $p$ on the CIFAR10 dataset under the WaNet attack.

We can see that under different $\epsilon$ and $p$ configurations, FMP behaves the same as it does under BadNets. We hope these experiments address the reviewer's concern about the effect of the hyper-parameters in our experiments.

Note: we will address all minor comments in our final version. We will also add the suggested related works in our final version.

Comment

First, to address Reviewer gATw's concern regarding FMP versus other strategies that use adversarial examples to reverse-engineer triggers (e.g., ANP, AEVA) or prune and retrain trigger weights (RNP), we evaluate FMP against these strategies on the CIFAR10 dataset under several backdoor attack scenarios in our evaluation setup.

The evaluation results are shown below:

| Defense | BadNets Acc | BadNets ASR | BadNets RA | Blended Acc | Blended ASR | Blended RA | Low Frequency Acc | Low Frequency ASR | Low Frequency RA | SSBA Acc | SSBA ASR | SSBA RA | WaNet Acc | WaNet ASR | WaNet RA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ANP | 91.22 | 73.36 | 26.16 | 93.25 | 99.44 | 0.56 | 93.19 | 98.03 | 1.88 | 92.92 | 68.59 | 29.13 | 90.81 | 1.93 | 88.98 |
| AEVA | 91.05 | 50.96 | 47.53 | 92.28 | 59.37 | 38.66 | 93.05 | 59.81 | 36.38 | 92.29 | 67.56 | 26.01 | 90.26 | 6.54 | 90.59 |
| RNP | 90.55 | 55.01 | 36.46 | 92.29 | 55.59 | 42.15 | 92.41 | 58.71 | 40.1 | 91.94 | 61.24 | 30.6 | 90.22 | 18.15 | 72.95 |
| FMP | 91.67 | 1.67 | 91.71 | 91.85 | 6.44 | 74.43 | 91.77 | 1.90 | 90.52 | 91.92 | 2.89 | 88.59 | 93.42 | 1.38 | 88.98 |

We can observe that FMP's Attack Success Rate (ASR) is lower than that of other strategies. The primary reason for this is FMP's focus on the feature map level, which aligns with the backdoor trigger's emphasis on the DNN feature map, rather than on specific neurons within the DNN (e.g., as in ANP).

Secondly, the pruning at the feature map level enables FMP to rapidly eliminate the backdoor trigger from the model. In contrast, neuron-level pruning typically requires significant overhead due to the need for repeated prune-finetune-evaluation cycles. FMP is particularly advantageous in scenarios with limited computational resources, where developers may not have the capacity to extensively evaluate and remove backdoors from the model. This limitation results in strategies like ANP, AEVA, and RNP exhibiting higher ASR compared to FMP in our setup.

Comment

First, to address Reviewer gATw's concern about FMP's performance compared with other SOTA methods reported at security conferences, we evaluate FMP against DeepInspect, TABOR, ABS, and [Fu et al., 2020] in the following table.

The evaluation results are shown below:

| Defense | BadNets Acc | BadNets ASR | BadNets RA | Blended Acc | Blended ASR | Blended RA | Low Frequency Acc | Low Frequency ASR | Low Frequency RA | SSBA Acc | SSBA ASR | SSBA RA | WaNet Acc | WaNet ASR | WaNet RA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepInspect | 90.51 | 15.87 | 64.44 | 90.89 | 3.5 | 77.16 | 90.83 | 4.7 | 77.51 | 90.05 | 10.57 | 73.83 | 90.31 | 5.97 | 77.94 |
| TABOR | 90.78 | 9.19 | 79.02 | 90.78 | 11.13 | 78.22 | 90.14 | 5.48 | 76.44 | 90.03 | 13.09 | 76.71 | 90.95 | 7.36 | 78.29 |
| ABS | 90.78 | 5.61 | 77.34 | 90.95 | 14.44 | 80.33 | 90.93 | 10.43 | 88.32 | 90.95 | 17.12 | 77.45 | 90.61 | 3.04 | 77.6 |
| [Fu et al., 2020] | 90.71 | 1.22 | 76.72 | 90.28 | 9.37 | 75.63 | 90.46 | 9.62 | 71.87 | 90.06 | 6.74 | 77.43 | 90.66 | 14.98 | 69.13 |
| FMP | 91.67 | 1.67 | 91.71 | 91.85 | 6.44 | 74.43 | 91.77 | 1.90 | 90.52 | 91.92 | 2.89 | 88.59 | 93.42 | 1.38 | 88.98 |

We can see that FMP obtains SOTA performance in most of our experimental results. As discussed above, the primary reason is that pruning at the feature map level enables FMP to efficiently and rapidly eliminate the backdoor trigger from the model, while the other defense strategies do not reach SOTA performance under limited computational resources.

Comment

Performance: We noticed Reviewer gATw's concern that in Table 1, FMP is only the best on average. In reality, we used bold text only for the average results because applying it to all experimental results would make the table difficult to read. However, FMP is the state-of-the-art (SOTA) in most experiments: it achieves SOTA performance on 25 out of 30 evaluation metrics for ASR and RA. We have now bolded all SOTA results in Table 1 to prevent any misunderstanding.

Clarity and Minor Issues: Thank you for Reviewer gATw's recommendations regarding revisions to the paper. We will add more details about our algorithm, rather than only presenting it. We will also use a proofreading tool to address any typos and grammar issues.

Comment
| Strategy | BadNets Acc | BadNets ASR | BadNets RA | Blended Acc | Blended ASR | Blended RA | Low Frequency Acc | Low Frequency ASR | Low Frequency RA | SSBA Acc | SSBA ASR | SSBA RA | WaNet Acc | WaNet ASR | WaNet RA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All_tuning | 91.67 | 1.67 | 91.71 | 91.85 | 6.44 | 74.43 | 91.77 | 1.90 | 90.52 | 91.92 | 2.89 | 88.59 | 93.42 | 1.38 | 88.98 |
| Vulnerable feature map tuning | 91.54 | 1.6 | 91.63 | 91.94 | 6.32 | 74.19 | 92.02 | 2.02 | 90.36 | 91.97 | 3.16 | 88.75 | 93.49 | 1.12 | 91.86 |
| xavier_uniform | 91.68 | 1.84 | 91.81 | 91.84 | 6.33 | 74.28 | 91.81 | 1.81 | 90.47 | 92.02 | 2.95 | 88.48 | 93.61 | 1.38 | 92.03 |
| kaiming_uniform | 91.62 | 1.56 | 91.71 | 91.7 | 6.3 | 74.54 | 91.75 | 1.95 | 90.55 | 91.78 | 2.88 | 88.44 | 93.41 | 1.34 | 92.13 |

Table 1: Performance comparison (%) of backdoor defense methods on CIFAR10, CIFAR100, and GTSRB datasets under PreActResNet18, under different attack strategies with a poison rate of 10% and retraining data ratio of 100%. We set $\epsilon$ to 1/255 and $p$ to 64.

the "All_tuning" strategy, where all feature maps in the model are fine-tuned, there's a consistently high accuracy across different backdoor attacks, with BadNets Acc reaching 91.67%, Blended Acc at 91.85%, Frequency Acc at 91.92%, and WaNet Acc at 88.98%. The Attack Success Rate (ASR) and Robustness Accuracy (RA) also indicate effective mitigation, particularly notable in the WaNet scenario with a low ASR of 1.38% and high RA of 88.98%.

On the other hand, the "Vulnerable feature map tuning" strategy, which focuses on fine-tuning only vulnerable feature maps, shows a slightly varied performance. The accuracy is slightly lower in some cases, like BadNets Acc at 91.54% and Blended Acc at 91.94%, compared to "All_tuning". However, this strategy shows a better robustness in the WaNet scenario with an improved ASR of 1.12% and RA of 91.86%.

Looking at the initialization strategies, "xavier_uniform" and "kaiming_uniform", we see that these methods also maintain high accuracy and robustness. "xavier_uniform" shows a slightly higher ASR in some cases, such as 1.84% in BadNets and 6.33% in Blended, but maintains good RA, particularly in WaNet with 92.03%. "kaiming_uniform" demonstrates consistent performance with low ASR, such as 1.56% in BadNets and 1.34% in WaNet, and high RA, peaking at 92.13% in WaNet.

Overall, the differences in performance metrics across these strategies are relatively minor, suggesting that each of these fine-tuning and initialization strategies is effective in mitigating backdoor attacks in the context of this experiment. The slight variations highlight the importance of choosing the right strategy depending on the specific requirements of the defense scenario, such as prioritizing either accuracy or robustness. The overall effectiveness of these strategies in the face of different attack methods, as shown in Table 1, provides a comprehensive view of their applicability and efficiency in enhancing the security of machine learning models against backdoor attacks.

Comment

Dear Reviewer gATw,

We have added all the experiments you requested, e.g., the comparison with other adversarial-related baselines and the comparison with other SOTA defense strategies, to our paper's appendix.

In summary, the primary reason for this is FMP's focus on the feature map level, which aligns with the backdoor trigger's emphasis on the DNN feature map, rather than on specific neurons within the DNN (e.g., as in ANP).

Secondly, the pruning at the feature map level enables FMP to rapidly eliminate the backdoor trigger from the model. In contrast, neuron-level pruning typically requires significant overhead due to the need for repeated prune-finetune-evaluation cycles. FMP is particularly advantageous in scenarios with limited computational resources, where developers may not have the capacity to extensively evaluate and remove backdoors from the model. This limitation results in strategies like ANP, AEVA, and RNP exhibiting higher ASR compared to FMP in our setup.

We sincerely hope Reviewer gATw will consider our experimental results. If Reviewer gATw has any questions about our paper, please feel free to point them out and we will try to address them quickly.

Review
Rating: 6

This paper attempts to mitigate backdoored models by generating all possible adversarial feature maps. Each generated adversarial feature map is fed to the model to test whether it causes data samples to be misclassified, aiming to identify malicious feature maps that may be caused by a trigger. The innovation of the proposed algorithm lies in the fact that it does not require prior knowledge of the trigger pattern through reverse engineering, and it does not impose constraints on the trigger pattern size, as seen in other defense algorithms such as Neural Cleanse. The proposed algorithm was evaluated on three datasets: CIFAR-10, CIFAR-100, and GTSRB.

Strengths

  1. The proposed algorithm is effective for backdoored models with large triggers.
  2. The proposed algorithm mitigates the backdoored model without the need for reverse engineering the trigger.
  3. It has been evaluated on three datasets.

Weaknesses

  1. The presentation needs improvement, as there are many confusing descriptions; see the Questions section.
  2. The three datasets appear to contain a relatively small number of classes. It would be more convincing if the algorithm could be evaluated on more complex datasets, such as ImageNet.

Questions

  1. There are several confusing descriptions. For instance, 'f' and 'F' represent the model and feature map, as described in the Notations section. However, in Section 3.2 and Algorithm 1, 'f' has a mixed meaning.
  2. Should the second 'for' loop in Algorithm 1 return $\hat{x}'$? Is that correct?
  3. The logic of 'inference()' in Algorithm 1 appears to be incorrect. If a feature map does not change the classification of 'x,' it is a normal feature map and should be retained. However, in Algorithm 1, it is pruned. Why is this the case?
Comment

We deeply appreciate the feedback from Reviewer W4kG on our paper.

We have revised the paper for the comments provided by Reviewer W4kG.

Specifically:

For Q1: Thank you for pointing out the confusing descriptions in our paper. Both $F_{\theta}^{i}$ and $f_{\theta}^{i}$ refer to the i-th feature map in the DNN. We will replace $F_{\theta}^{i}$ with $f_{\theta}^{i}$ and clarify this in our final version.

For Q2: Regarding the returned FRG-generated adversarial sample $x'$, Reviewer W4kG can also consider it as $\hat{x}'$, consistent with $\hat{y}$ in Algorithm 1, line 8. We acknowledge the confusion caused by using $x$ in Algorithm 1, line 7. To avoid misunderstanding, in our revised version we now use $x'$ and $y'$ in Algorithm 1, lines 7 and 8.

For Q3: Normal feature maps will not be pruned. To clarify, the statement "if a feature map does not change the classification of x, it is a normal feature map and should be retained" is accurate. In other words, robust/normal feature maps will not be pruned; we only prune the feature maps with lower accuracy in the Correct_List. This is why we use ascending order and then prune the leftmost N/p feature maps.

We also acknowledge Reviewer W4kG's suggestion to conduct experiments on a more complex dataset, such as ImageNet. Unfortunately, due to the extended training time required for ImageNet, we can only provide experimental results on Tiny-ImageNet now. However, we commit to presenting results on ImageNet in our final version.

The experiment results are presented below:

| Attack | Benign ACC | Benign ASR | FP ACC | FP ASR | ANP ACC | ANP ASR | FMP ACC | FMP ASR |
|---|---|---|---|---|---|---|---|---|
| BadNet | 55.13 | 99.92 | 51.28 | 99.37 | 51.38 | 1.39 | 55.24 | 0.08 |
| Blended | 55.03 | 99.85 | 51.84 | 93.28 | 52.07 | 19.35 | 53.81 | 0.01 |
| WaNet | 54.73 | 99.32 | 52.17 | 65.32 | 51.09 | 8.92 | 53.54 | 1.37 |

It is noteworthy that FMP performs better than the baseline approaches. For instance, under BadNet, FMP achieves an ASR of 0.08%, compared with ANP's 1.39%.

If Reviewer W4kG has any questions, feel free to provide them, and we would be more than happy to address and clarify any queries or concerns.

Comment

Thank you for your reply. Since Tiny-ImageNet is still a relatively small dataset, my concern remains.

Reviewer W4kG

Comment

Dear reviewers,

Thank you again for all the informative and constructive feedback! We truly appreciate all the suggestions from the reviewers to improve this work. We have revised the paper and addressed all concerns from the reviewers. As the discussion period is ending soon, please do let us know if you have any more concerns or would like to discuss the paper further.

Regards,

Comment

Dear Area Chairs, Senior Area Chairs, and Program Chairs,

I hope this message finds you well. I am following up on our previous correspondence regarding the rebuttal for our paper titled "Adversarial Feature Map Pruning for Backdoor."

As of today, we have not observed any engagement from the reviewers with our rebuttal, submitted on November 11, 2023. This lack of interaction is particularly concerning as it directly impacts the fairness and thoroughness of the review process, especially with the review deadline of November 23, 2023, looming.

We understand and respect the immense workload and pressures faced during the review period. However, the absence of reviewer engagement, coupled with the lack of revised scores or additional queries about our paper, puts us at a significant disadvantage. It denies us the opportunity to clarify misunderstandings or provide additional information that could be crucial in evaluating our work fairly.

Therefore, we respectfully urge a prompt intervention. Could you please confirm whether the reviewers have been reminded of their responsibility to consider our rebuttal? We are deeply invested in this process and rely on its integrity for an equitable assessment.

We remain available to provide any further information or clarification that may assist the reviewers in their task. Your prompt action in this matter would be greatly appreciated and could be instrumental in ensuring a fair and effective review process.

Thank you for your understanding and support. We look forward to a resolution that allows for a comprehensive and fair evaluation of our work.

Best regards,

Comment

Dear reviewers,

We would like to extend our gratitude once more for your invaluable comments and guidance. Your detailed and constructive feedback has been instrumental in enhancing the quality of our work. We have carefully revised the paper, ensuring that all the concerns you raised have been thoroughly addressed. As the discussion period is drawing to a close, we invite any further comments or points of discussion you might have regarding our paper. Your insights are highly valued and crucial in refining our work to its best form.

Best regards,

AC Meta-Review

The authors propose a new method that uses sensitive feature maps during fine-tuning of neural networks to remove the backdoor. They first generate all possible adversarial feature maps via the classification results, and then use the generated feature maps to remove the potential malicious triggers inside the original networks. The paper has the following strengths:

  1. The proposed algorithm is an easy and effective way to generate trigger features, compared with other reverse engineering methods.
  2. The paper's empirical results are substantial, clearly demonstrating their effectiveness.

Weaknesses:

  1. Lack of analysis or understanding of the reason for their methods. Readers may not get many insights from this method.

Decision:

Since the method is effective and all reviewers agree with accepting this paper, I'd like to accept it as an ICLR poster.

Why not a higher score

Lack of analysis or understanding of the reason for their methods. Readers may not get many insights from this method.

Why not a lower score

All reviewers agree to accept, and the method is effective.

Final Decision

Accept (poster)