Certified Copy: A Resistant Backdoor Attack

ICLR 2024 · Decision: Rejected · 7 reviewers
Average rating: 3.0/10 (min 1, max 5, std. dev. 1.1) · Ratings: 1, 3, 3, 3, 3, 5, 3 · Average confidence: 4.0
Submitted: 2023-09-24 · Updated: 2024-02-11

TL;DR

A resistant backdoor attack designed to escape detection methods and show the potential malicious uses of deep neural networks.

Keywords

Backdoor attack · Deep Neural Network · Detection methods

Reviews and Discussion

Review (Rating: 1)

This paper proposes a backdoor attack that aims to mix the poisoned samples and the normal samples in the feature space in order to bypass detection.

Strengths

  • The experiments at least cover ResNet-50 on CIFAR-10.

Weaknesses

  • The presentation and writing are poor. For example, none of the figures are well prepared, and there are typos and grammatical errors.
  • More importantly, the motivation, strengths, and novelty of this attack are not clearly addressed. For example, in the related-work section the authors write, "However, the problem is that most attacks do not follow the assumptions of defense mechanisms, allowing them to bypass detection mechanisms". This is usually not a good reason not to compare with existing attacks. Other than the point above, I cannot find any strength of this work compared with Latent Backdoor and other more advanced attacks.

Questions

Please discuss and compare with existing backdoor attacks.

Review (Rating: 3)

The paper proposes a backdoor attack called "Certified Copy" to evade most existing backdoor detection methods. By using a cost function, Certified Copy can train backdoored models that generate similar neuron activations for clean samples and backdoored samples. Specifically, the method can be divided into two stages: (1) train a model by adding the poisoned samples as an extra class; (2) remove the extra class and fine-tune the model using augmented datasets. Extensive experiments on BadNet demonstrate the effectiveness of the proposed method against seven backdoor defenses.
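
For readers unfamiliar with the setup, a minimal PyTorch-style sketch of the two-stage procedure this summary describes, assuming the ResNet-50/CIFAR-10 setup mentioned elsewhere in the reviews; the optimizer settings, learning rates, and the `alignment_loss` placeholder are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

num_classes = 10                                   # e.g. CIFAR-10
model = resnet50(num_classes=num_classes + 1)      # stage 1: one extra class for poisoned samples
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def stage1_step(x_clean, y_clean, x_poison):
    # Poisoned inputs are labelled with the auxiliary class index `num_classes`.
    y_poison = torch.full((x_poison.size(0),), num_classes, dtype=torch.long)
    x = torch.cat([x_clean, x_poison])
    y = torch.cat([y_clean, y_poison])
    loss = F.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Stage 2: drop the extra class and fine-tune on the augmented dataset, adding an
# activation-alignment term computed against a separately trained clean model.
model.fc = nn.Linear(model.fc.in_features, num_classes)
opt = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)  # rebuild optimizer for new head
clean_model = resnet50(num_classes=num_classes)     # assumed reference model
clean_model.eval()
for p in clean_model.parameters():
    p.requires_grad_(False)                         # keep the reference model frozen

def stage2_step(x_aug, y_aug, alignment_loss, lam=1.0):
    # `alignment_loss` is a placeholder for the paper's cost function.
    loss = F.cross_entropy(model(x_aug), y_aug) + lam * alignment_loss(model, clean_model, x_aug)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```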

Strengths

The motivation of the proposed backdoor attack is to ensure that the model's neuron activations are similar for clean samples and backdoored samples. This motivation is reasonable and interesting. Many ablation studies, including activations of different models and detection results of different methods, are conducted to analyze the proposed method.

Weaknesses

The novelty is limited. In the first stage, the method just trains a backdoored model with an extra class. In the second stage, the method needs to train a clean model and align the outputs of the clean model and the backdoored model via a cost function. The cost function used is simply a combination of a KL loss, an MAE loss, and a cosine loss.
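
As a point of reference, here is a hedged sketch of what such a weighted combination could look like; the pairing of the four α weights with a task term plus the three alignment terms is my assumption, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def combined_cost(logits_bd, logits_clean, feats_bd, feats_clean, targets,
                  a1=1.0, a2=1.0, a3=1.0, a4=1.0):
    """Task loss plus KL, MAE, and cosine alignment terms (assumed weighting)."""
    ce  = F.cross_entropy(logits_bd, targets)                     # classification objective
    kl  = F.kl_div(F.log_softmax(logits_bd, dim=1),
                   F.softmax(logits_clean, dim=1), reduction="batchmean")
    mae = F.l1_loss(feats_bd, feats_clean)                        # mean absolute error on features
    cos = 1.0 - F.cosine_similarity(feats_bd, feats_clean, dim=1).mean()
    return a1 * ce + a2 * kl + a3 * mae + a4 * cos
```

Under such a formulation, the α_1 through α_4 hyperparameters asked about in the Questions below would simply correspond to the scalar weights a1 through a4.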

The experiments are not sufficient. The paper only uses one backdoor attack to demonstrate the effectiveness of the proposed method. It would be better to show results for more backdoor attacks such as Blended [1], WaNet [2], and label-consistent attacks [3].

[1] Chen, Xinyun, et al. "Targeted backdoor attacks on deep learning systems using data poisoning." arXiv preprint arXiv:1712.05526 (2017).

[2] Nguyen, Tuan Anh, and Anh Tuan Tran. "WaNet-Imperceptible Warping-based Backdoor Attack." International Conference on Learning Representations. 2020.

[3] Turner, Alexander, Dimitris Tsipras, and Aleksander Madry. "Label-consistent backdoor attacks." arXiv preprint arXiv:1912.02771 (2019).

Questions

The method needs to train a clean model in the second stage. Is the proposed method efficient enough for a large dataset, e.g., ImageNet?

The proposed method aims to ensure the similarity of the model's neuron activations for clean and backdoored samples. It makes sense that the proposed method can evade neuron-activation-based backdoor detection methods. How about other kinds of backdoor detection methods?

How are the hyperparameters α_1, α_2, α_3, and α_4 in the cost function set? Are these hyperparameters data-dependent?

Review (Rating: 3)

The paper presents a code-poisoning, targeted backdoor attack against deep image classifiers that is claimed to be less detectable than existing works. The main idea is to add a loss term that regularizes activations on poisoned samples (containing a secret trigger pattern) to match those on clean samples (without the trigger). The authors evaluate the detectability of their backdoor against five defenses.
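
For concreteness, a minimal sketch of the kind of activation-matching regularizer this summary describes: pulling the model's activations on trigger-stamped inputs toward its activations on the corresponding clean inputs. The layer name (`layer4`, the last block of a torchvision ResNet) and the MSE distance are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def activation_match(model, x_clean, x_triggered, layer="layer4"):
    """Penalize the gap between activations on triggered inputs and on the
    corresponding clean inputs, at a chosen intermediate layer."""
    acts = {}

    def hook(_module, _inp, out):
        acts["out"] = out

    handle = dict(model.named_modules())[layer].register_forward_hook(hook)
    try:
        model(x_triggered)
        a_trig = acts["out"]              # activations to be pushed toward clean behavior
        with torch.no_grad():
            model(x_clean)                # clean-side activations act as the target
        a_clean = acts["out"]
    finally:
        handle.remove()
    return F.mse_loss(a_trig, a_clean.detach())
```

The same term could equally be computed against a separate frozen clean model, which is how other reviews describe the training setup.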

Strengths

  • The paper is well-written and easy to follow

  • The experiments are sound, and the authors evaluate their method's detectability against many defenses.

Weaknesses

Motivation. The threat outlined by the authors on p.3 is that one downloads a model from a model zoo and deploys it into a security-critical domain. However, how much of a threat is this in practice? Is there any example where models from untrusted providers are used in security-critical domains?

Certification. I was initially a little confused about the word "certified" as it has a distinct meaning in the domain of trustworthy ML. Certified usually means that something is guaranteed true with a probabilistic bound, but this does not seem true here as there is no formal guarantee. Do the authors agree that this may be confusing to some?

Missing Robustness Evaluation. The paper, unfortunately, does not evaluate robustness. There is likely a trade-off between robustness and detectability, and I would love to see that reflected and analyzed in the paper. We know that code poisoning attacks exist that are difficult to detect [A], but so far, no one has come up with defenses against these attacks (likely because the attacker has too many capabilities).

Limited Novelty. What are your improvements over the attack proposed in [A]? Also, could you please elaborate on the difference between your loss and the loss used in [B] to bring poison and clean activations closer together? Why is one better than the other?

Minor Comments:

  • Typo: Page 2 has a bold question mark in what should be an icon of a white square.

[A] Hong, S., Carlini, N., & Kurakin, A. (2022). Handcrafted backdoors in deep neural networks. Advances in Neural Information Processing Systems, 35, 8068-8080.

[B] Jia, Hengrui, et al. "Entangled watermarks as a defense against model extraction." 30th USENIX Security Symposium (USENIX Security 21). 2021.

Questions

  • Please elaborate on the novelty of your work compared to [A] and [B] (see above).

  • How robust is your attack? Do you observe a robustness-detectability trade-off?

  • What chances do you see for a defender in defending against your attack? Is it a lost cause?

Review (Rating: 3)

This paper proposes a backdoor attack method called "Certified Copy" to evade detection by defenders.

Strengths

This paper proposes a backdoor attack method called "Certified Copy" to evade detection by defenders.

Weaknesses

  1. The novelty is limited. Leveraging a representation-similarity loss to evade detection has been proposed before.
  2. The backdoor detection methods and backdoor attack methods considered are too old; most of them are from 2018-2019.
  3. Figure 4 is vague.
  4. The presentation of this paper is poor.

Questions

1. What happens if the proposed attack is tested against backdoor-removal defense methods?

Review (Rating: 3)

This paper presents a new backdoor attack that is designed to evade existing backdoor detection methods. Specifically, the attacker trains the backdoored model side by side with a clean model and ensures that their representations are similar on clean data; meanwhile, the attacker superimposes the backdoor function onto the backdoored model. It is shown that the backdoored model is evasive with respect to a number of defenses, including Neural Cleanse, TAO, ABS, TABOR, NNoculation, IBAU, and STRIP.

Strengths

  • Backdoor attacks represent a major threat to machine learning security. The paper contributes to the rich literature on backdoor attacks.
  • The evaluation shows the attack's evasiveness with respect to a number of defenses.
  • The paper is well structured and easy to follow.

Weaknesses

  • Given the plethora of backdoor attacks, it is unclear to me what new insights this work provides beyond the existing literature. There are many backdoor attacks that also aim to evade existing defenses (e.g., "Revisiting the Assumption of Latent Separability for Backdoor Defenses", Qi et al., ICLR '23).
  • The attack makes the strong assumption that the victim will use the backdoored model as is. In practice, it is more common for the user to adapt the pre-trained model to downstream datasets. It is suggested to consider such scenarios.
  • The proposed attack technique seems similar to "Latent Backdoor Attacks on Deep Neural Networks" (Yao et al, CCS '19). It is suggested to make a detailed comparison.

Questions

  • Please articulate the new insights of this work.
  • Please compare the proposed attack with "Latent Backdoor Attacks on Deep Neural Networks".

Review (Rating: 5)

The authors introduce "Certified Copy," a deceptively simple yet effective backdoored model that evades detection by training with a novel cost function. This cost function is key to the model's stealthiness, as it ensures neuron activation remains consistent between clean and poisoned inputs, only diverging from the "clean" model behavior when triggered by poisoned data.

The research provides an extensive evaluation of this approach against seven advanced defense mechanisms, demonstrating the Certified Copy model's ability to bypass these detection systems effectively.

Strengths

  • I think the proposed attack is interesting and inspiring. It manipulates backdoor behavior in the latent representation space, making backdoor detection more challenging.
  • The presentation is motivating and easy to follow.

Weaknesses

  • The rationale behind the design details is not fully convincing. For example, why does the MAE term have control over trigger location?

Questions

  1. For the MAE term in Section 3.4, why can the MAE have control over the trigger locations?

  2. Could you discuss or compare with the latest feature-level backdoor detection methods?

  • SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning, M. Zheng et al., 2023
  • Detecting Backdoors in Pre-trained Encoders, S. Feng et al., CVPR 2023

  3. Will the augmented dataset be biased, given that the true pattern at the true location is less frequent? Does this affect the attack performance?

  4. Why is the method called Certified Copy? What is the relation with certification?

  5. Is the proposed attack limited to all-to-one attacks? It seems so, because in the first phase there is only one extra class, which is later fine-tuned to the target class.

Review (Rating: 3)

The paper proposes a new method called Certified Copy, which enhances a simple backdoor attack's robustness against existing backdoor detection mechanisms. The method involves training the backdoored model using a novel cost function that controls neuron activations to ensure that the model behaves similarly on both clean and poisoned input data. This allows the model to evade detection by most state-of-the-art defense mechanisms. Experiments are conducted with seven state-of-the-art defense mechanisms, including Neural Cleanse, TAO, ABS, TABOR, NNoculation, IBAU, and STRIP, to show the attack's robustness.

Strengths

  1. The idea of training a backdoored model to have the same hidden representations as a clean model, while still retaining the ability to perform the malicious behavior is interesting.
  2. The method appears to have noticeable improvements compared to the baseline.

Weaknesses

  1. The motivation of this work is quite questionable. While the paper claims, "This study aims to answer the question of whether it is possible to maintain those assumptions and still bypass detection", I am still quite confused about why those assumptions of defense mechanisms necessarily need to be maintained. As the paper also states, "However, the problem is that most attacks do not follow the assumptions", which indicates that there are existing attacks that can evade defenses by breaking typical backdoor assumptions. Indeed, for example, [1] already proposed adding a constraint to hide the backdoor in the latent space. So why do we need this study? It seems to me that this work mostly introduces a technique to improve a rather outdated attack (BadNets).
  2. Since the data augmentation step of this method encourages the backdoor behavior to be sensitive to the trigger's appearance and location, it might be ineffective in the physical world, where the trigger in the digitized image may differ from the one used for training, as discussed in [2] (a minimal sketch of such trigger-placement augmentation follows the references below). Therefore, I think the authors should consider conducting experiments with the transformation-based defense proposed in [2] to better evaluate the attack's robustness.
  3. I quite disagree with the statement in the conclusion section that "fine-tuning the attacked model, even with validation data, may not be practical in real-world applications." Fine-tuning appears in many practical AI systems as a post-processing step that adapts pretrained models to better fit the user's needs. Therefore, the fact that this method's attack performance can be successfully reduced by fine-tuning-based defenses limits its applicability.

[1] Doan, Khoa, Yingjie Lao, and Ping Li. "Backdoor attack with imperceptible input and latent modification." (NeurIPS 2021)
[2] Li, Yiming, et al. "Backdoor attack in the physical world." arXiv preprint arXiv:2104.02361 (2021).
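
To make the concern about location and appearance sensitivity concrete, here is a hypothetical sketch of the kind of trigger-placement augmentation the reviews describe; the patch size handling, random placement, noise perturbation, probability `p_true`, and labeling rule are all my assumptions rather than the paper's actual procedure.

```python
import random
import torch

def augment_with_trigger(x, y, trigger, true_loc, target_class, p_true=0.2):
    """Stamp a trigger patch onto a batch of images; only the true pattern at the
    true location is relabelled to the target class, while perturbed patterns or
    wrong locations keep their clean labels."""
    x_aug, y_aug = x.clone(), y.clone()
    _, _, img_h, img_w = x.shape
    th, tw = trigger.shape[-2:]
    for i in range(x.size(0)):
        if random.random() < p_true:
            r, c = true_loc                                   # true pattern + true location
            x_aug[i, :, r:r + th, c:c + tw] = trigger
            y_aug[i] = target_class                           # backdoor target label
        else:
            r = random.randint(0, img_h - th)                 # random (likely wrong) location
            c = random.randint(0, img_w - tw)
            noisy = (trigger + 0.1 * torch.randn_like(trigger)).clamp(0, 1)
            x_aug[i, :, r:r + th, c:c + tw] = noisy           # perturbed pattern, clean label kept
    return x_aug, y_aug
```

Under such a scheme, the share of true-pattern, true-location samples is controlled by p_true, and the random placements and perturbed patterns are precisely what could make the backdoor brittle when a physically captured trigger deviates from the one used in training.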

Questions

  1. Regarding my concerns above, I hope the authors can elaborate on this work's motivation as well as further demonstrate the proposed method's robustness.
  2. (minor) It seems like there is a typo in the backdoor attacks passage of the related works section ("...such as a white square ?")

AC Meta-Review

This paper creates an adaptive backdoor attack specifically to evade detectors by optimizing an objective so that the model behaves similarly on clean and backdoored samples. Reviewers took issue with the motivation, quality of writing, and the significance of the work’s contributions. The authors did not rebut any of the reviews, so I am inclined to reject this paper.

Why not a higher score

The authors did not respond to any of the multiple issues brought up by reviewers, and without improving the manuscript, this paper is not ready for publication in its current form.

Why not a lower score

N/A

Final Decision

Reject