$DPOT_{L_0}$: Concealing Backdoored model updates in Federated Learning by Data Poisoning with $L_0$-norm-bounded Optimized Triggers
Abstract
Reviews and Discussion
The authors propose a new backdoor attack method named DPOT. DPOT generates triggers by: 1) using the sensitivity of the model's output with respect to each input pixel to determine which positions should be selected, and 2) optimizing the pixel values of the trigger to maximize its effectiveness. The authors conducted numerous experiments to validate the performance of DPOT.
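For intuition, here is a minimal sketch of such a two-step procedure (gradient-magnitude position selection followed by trigger-value optimization). The function names, the fixed pixel budget `k`, and the optimizer settings are illustrative assumptions, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def select_trigger_positions(model, images, target_label, k):
    """Pick the k input entries to which the model's output is most sensitive,
    measured by the magnitude of the loss gradient w.r.t. the input."""
    images = images.clone().requires_grad_(True)
    targets = torch.full((len(images),), target_label, dtype=torch.long)
    F.cross_entropy(model(images), targets).backward()
    sensitivity = images.grad.abs().mean(dim=0)    # shape (C, H, W)
    top = sensitivity.flatten().topk(k).indices    # treat each entry as a position
    mask = torch.zeros(sensitivity.numel())
    mask[top] = 1.0
    return mask.view_as(sensitivity)               # 1 where a trigger pixel lives

def optimize_trigger_values(model, images, target_label, mask, steps=100, lr=0.1):
    """Optimize the values of the selected pixels so that poisoned inputs are
    classified as the target label."""
    delta = torch.zeros_like(images[0], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    targets = torch.full((len(images),), target_label, dtype=torch.long)
    for _ in range(steps):
        poisoned = (images * (1 - mask) + delta * mask).clamp(0, 1)
        loss = F.cross_entropy(model(poisoned), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return delta.detach() * mask                   # L0-bounded trigger: k nonzero entries
```

Poisoned samples would then be built as `x * (1 - mask) + trigger` with their labels flipped to the target class.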
Questions for Authors
See above.
Claims and Evidence
I am confused about the authors' threat model. I do not understand why a malicious client cannot manipulate their local training process (Lines 128-133).
Methods and Evaluation Criteria
The method seems reasonable.
Theoretical Claims
Since the triggers are added to the input data rather than the features, Proposition 5.1 cannot fully demonstrate the effectiveness of the authors' method.
Experimental Design and Analysis
Overall, the experiments are comprehensive. However, there are some issues. First, the authors do not specify the exact size of the triggers. Second, in federated learning, not every client is selected to participate in each round (Line 269).
Supplementary Material
I reviewed the theoretical proofs and experimental results in the appendix.
Relationship to Existing Literature
Although the authors present a new backdoor attack method, I currently do not see its practical value. In my view, it merely utilizes gradient information to determine the location of the triggers.
Missing Important References
Some federated backdoor attacks [1,2,3] also employ dynamic triggers.
[1] IBA: Towards Irreversible Backdoor Attacks in Federated Learning
[2] Bad-PFL: Exploring Backdoor Attacks against Personalized Federated Learning
[3] Lurking in the Shadows: Unveiling Stealthy Backdoor Attacks against Personalized Federated Learning
Other Strengths and Weaknesses
The authors' method achieves state-of-the-art performance, and the experiments are very comprehensive.
The writing needs improvement. At present, the motivation behind the authors' method is limited.
Other Comments or Suggestions
Since the authors used a Non-IID setting, they could test the proposed method's attack performance for personalized federated learning.
- I am confused about the authors' threat model. I do not understand why a malicious client cannot manipulate their local training process (Lines 128-133).
We found this Wikipedia page very helpful for understanding our threat model: https://en.wikipedia.org/wiki/Trusted_execution_environment. It introduces the concept of TEEs and lists many commercial TEEs. Our threat model assumes that the local training process is executed within TEEs, which the FL server can verify by having each TEE attest to its software and inputs. TEEs greatly increase the difficulty of malicious manipulation.
- Since the triggers are added to the input data rather than the features, Proposition 5.1 cannot fully demonstrate the effectiveness of the authors' method.
Following the problem setup of the feature learning theory paper (line 589), we consider a datum to be composed of various patterns, some of which are critical for classification; we model these patterns as linear vectors and call them features. The features of backdoor data are trigger patterns. In the paper, we use one dataset to represent data containing only benign features and another to represent data containing both benign and backdoor features. Proposition 5.1 therefore explains the theoretical insight behind our method at the dataset level, rather than at the level of individual data points.
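To make this dataset-level view concrete, the setup can be sketched as follows (the notation here is ours, chosen for illustration; the paper's symbols may differ):

```latex
% Each datum is a superposition of patterns; the classification-critical ones
% are modeled as linear feature vectors v_j.
x \;=\; \sum_{j} \alpha_j v_j + \xi, \qquad \xi \ \text{noise}
% A purely benign dataset contains only benign features:
\mathcal{D}_{\mathrm{benign}} \;=\; \{(x_i, y_i)\ :\ x_i \ \text{built from benign features only}\}
% A poisoned dataset additionally contains the trigger pattern \delta as a backdoor feature:
\mathcal{D}_{\mathrm{mix}} \;=\; \mathcal{D}_{\mathrm{benign}} \cup \{(x_i + \delta,\ y_{\mathrm{target}})\}
```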
- In federated learning, not every client is selected to participate in each round (Line 269).
We experimented with scenarios where not every client is selected to participate in each round by randomly sampling a portion of the clients for each training round, with a Selection Ratio (SR) determining this portion (a minimal sketch of this per-round sampling is given after the table). We present results for SR = 0.5 and SR = 1, comparing $DPOT_{L_0}$ with FT and DFT against three different defense strategies, with CIFAR-10 as the main training task. As shown in the following table, $DPOT_{L_0}$ demonstrates better attack effectiveness in both SR settings and reaches a sufficient ASR faster than the other attacks.
| SR | Defense | Final ASR | | | Avg ASR | | | MA | | | Rounds to ASR > 50% | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | Ours | FT | DFT | Ours | FT | DFT | Ours | FT | DFT | Ours | FT | DFT |
| SR = 0.5 | FedAvg | 100 | 100.0 | 97 | 98.6 | 90 | 65 | 70.5 | 70.3 | 69.8 | 2 | 16 | 47 |
| | Trimmed Mean | 100 | 100.0 | 70.4 | 95 | 75.1 | 37.5 | 70.4 | 69.6 | 70.0 | 3 | 34 | 104 |
| | FoolsGold | 100 | 100.0 | 97.5 | 98.8 | 93.9 | 73.6 | 70.2 | 70.3 | 69.6 | 2 | 7 | 33 |
| SR = 1 | FedAvg | 100 | 100.0 | 92.5 | 98.5 | 88.1 | 50.2 | 70.7 | 70.4 | 71.4 | 3 | 21 | 80 |
| | Trimmed Mean | 100 | 94.8 | 38.3 | 88.6 | 59.4 | 22.6 | 70.4 | 70.2 | 70.8 | 7 | 65 | - |
| | FoolsGold | 100 | 100.0 | 94.1 | 98.5 | 88.4 | 53 | 71 | 71.2 | 71.7 | 3 | 18 | 74 |
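For clarity, a minimal sketch of per-round client sampling with a selection ratio; the helper name and the use of NumPy are our own, illustrative choices.

```python
import numpy as np

def sample_clients(num_clients, selection_ratio, rng=None):
    """Randomly select a fraction (SR) of clients to participate in one FL round."""
    rng = rng or np.random.default_rng()
    num_selected = max(1, int(round(selection_ratio * num_clients)))
    return rng.choice(num_clients, size=num_selected, replace=False)

# Example: with 100 clients and SR = 0.5, each round trains on 50 randomly chosen clients.
participants = sample_clients(num_clients=100, selection_ratio=0.5)
```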
- Although the authors present a new backdoor attack method, I currently do not see its practical value. In my view, it merely utilizes gradient information to determine the location of the triggers.
$DPOT_{L_0}$ considers an FL setting where all clients run in a TEE environment, which places strict limitations on the adversary's capability in comparison to previous works. This highlights the enhanced practical value of $DPOT_{L_0}$.
- Some federated backdoor attacks [1,2,3] also employ dynamic triggers.
IBA [1] is one of the baselines that we compared with thoroughly (lines 281, 311); please see Section 6.6 for our comparison results along 4 different dimensions. We will mention references [2] and [3] in the related work for their contribution of employing dynamic triggers against Personalized Federated Learning, and we thank you for sharing these references with us.
- The writing needs improvement. At present, the motivation behind the authors' method is limited.
We'd love to adopt suggestions to improve our writing. The motivation behind $DPOT_{L_0}$ is to expose the vulnerabilities in current FL defenses and to emphasize the need for more robust countermeasures by demonstrating the effectiveness of the introduced data-poisoning-only backdoor attack.
- Since the authors used a Non-IID setting, they could test the proposed method's attack performance for personalized federated learning.
Our work focuses on studying the security of the fundamental FL structure introduced by McMahan et al. (2017), and the selected attack and defense baselines are all designed for this structure. We agree that exploring the attack performance of $DPOT_{L_0}$ on advanced FL structures, such as Personalized Federated Learning, Vertical Federated Learning, and Federated Transfer Learning, would be a highly promising direction for future work.
This paper presents a method to optimize backdoor attack triggers in federated learning systems. The proposed scheme is validated by experiments. The main contributions of this work include: 1. Proposing a simple and effective method for generating triggers, simultaneously optimizing the pixel values and positions of the triggers. 2. Proposing a defense method based on statistical information to resist the proposed backdoor attack. This work conducts research from both attack and defense perspectives, making it a comprehensive study.
Questions for Authors
Simple defenses in which poisoned data is filtered out and deleted need to be discussed to reveal the actual value of the proposed method.
Claims and Evidence
The claims made in the paper are reasonable and verifiable
Methods and Evaluation Criteria
The evaluation method used is reasonable
Theoretical Claims
The theorem provided in this paper is reasonable
Experimental Design and Analysis
I checked all the results in the experimental section of the paper.
Supplementary Material
Read all the supporting materials
Relationship to Existing Literature
The content of this paper is related to the security and vulnerability of distributed training, and the proposed method is related to the robustness of distributed systems.
Missing Important References
N/A
Other Strengths and Weaknesses
This paper presents a method to optimize triggers in backdoor attacks. My main concerns are as follows:
- As can be seen from Figure 2, the generated trigger is distinguishable by the human eye. Does this lead to smart clients using simple data filtering/cleaning methods to suppress backdoor attacks?
- What insights does the theoretical analysis provided in section 5 specifically provide for the design of backdoor attacks?
Other Comments or Suggestions
N/A
- As can be seen from Figure 2, the generated trigger is distinguishable by the human eye. Does this lead to smart clients using simple data filtering/cleaning methods to suppress backdoor attacks? (Simple defenses in which poisoned data is filtered out and deleted need to be discussed to reveal the actual value of the proposed method.)
We agree that our generated triggers are distinguishable by the human eye.
During FL training, malicious clients, as the attack executors, would not want to filter the triggers out, and benign clients, who do not have the trigger information, are unable to filter them out. Even if we let the TEE apply cleaning methods to the data, backdoor data still cannot be transformed into benign data, since their labels have been changed. An adaptive change to the attack is to constrain both the $L_0$ and $L_\infty$ bounds of the trigger during optimization so that the magnitude of the trigger pixels can be better controlled.
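A hedged sketch of that adaptive variant, projecting the trigger onto a joint budget; the tensor layout and the budget names `k` and `eps` are our assumptions.

```python
import torch

def project_trigger(delta, k, eps):
    """Project a trigger tensor onto an L0 budget (at most k nonzero entries)
    and an L-infinity budget (each entry's magnitude at most eps)."""
    flat = delta.flatten()
    keep = flat.abs().topk(min(k, flat.numel())).indices   # L0: keep the k largest entries
    mask = torch.zeros_like(flat)
    mask[keep] = 1.0
    return (flat * mask).clamp(-eps, eps).view_as(delta)   # L-infinity: clip magnitudes
```

In practice, such a projection would be applied after every gradient update of the trigger values.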
During the inference stage, smart users can clean the images before inputting them into the victim FL model to bypass backdoor attacks, but this cleaning might also alter features in benign images, which would degrade main-task accuracy. Nonetheless, we acknowledge that improving data-cleansing methods to more accurately filter backdoor triggers is a valuable defense strategy worth exploring.
- What insights does the theoretical analysis provided in section 5 specifically provide for the design of backdoor attacks?
Thank you for this valuable feedback. Section 5 provides the theoretical insight that the difference in update directions between the benign and malicious objectives is bounded by the error of the malicious data on the model. Section 4 introduces how $DPOT_{L_0}$ decreases the backdoor data's error (loss) on the global model, and Section 5 provides the justification that this error reduction helps conceal malicious model updates among benign ones. We will add the following formula to the paper to better connect Section 4 and Section 5.
By optimizing an $L_0$-norm-bounded trigger to minimize this error, $DPOT_{L_0}$ not only reduces the backdoor loss but also conceals the malicious model updates (Proposition 5.1).
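Written out (in our own notation; the paper's symbols may differ), the per-round trigger objective referred to here is roughly:

```latex
\delta_t^{*} \;=\; \operatorname*{arg\,min}_{\|\delta\|_{0} \,\le\, k}\;
\mathbb{E}_{(x,\,y)\sim \mathcal{D}_{\mathrm{trig}}}
\Big[\, \mathcal{L}\big(f_{\theta_t}(x + \delta),\ y_{\mathrm{target}}\big) \Big]
```

where $f_{\theta_t}$ is the current global model, $\mathcal{D}_{\mathrm{trig}}$ the trigger training dataset, and $k$ the pixel budget; a smaller backdoor error both raises ASR and, via Proposition 5.1, tightens the bound on the deviation between malicious and benign update directions.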
This paper introduces $DPOT_{L_0}$, a new backdoor attack method in FL that dynamically optimizes an $L_0$-norm-bounded trigger to conceal malicious model updates among benign ones. By focusing on data poisoning alone, the attack avoids reliance on model poisoning, which is increasingly impractical under Trusted Execution Environments (TEEs). The authors theoretically justify the concealment property of $DPOT_{L_0}$ in linear models and empirically demonstrate its effectiveness across four datasets and 12 defense strategies, outperforming existing methods in attack success rate (ASR) while maintaining main-task accuracy. The work highlights vulnerabilities in current FL defenses and underscores the need for more robust countermeasures.
Questions for Authors
N/A
Claims and Evidence
Yes
Methods and Evaluation Criteria
Yes
Theoretical Claims
N/A
Experimental Design and Analysis
Yes.
Supplementary Material
No
Relationship to Existing Literature
This work proposes a new backdoor attack with an L0 regularization. The idea is a direct extension of previous optimized-trigger backdoor attacks.
Missing Important References
N/A
Other Strengths and Weaknesses
Strengths:
- This paper considers an FL setting where all clients run in a TEE environment, which places strict limitations on the adversary's capability in comparison to previous works.
- The evaluation is comprehensive, including a range of baseline attacks and benchmarks.
- The intuition and algorithms are clearly explained and easy to follow.
Weaknesses:
- What is the direct relationship between Section 5 and Section 4? There is no clear explanation of how the design of DPOT benefits from the theoretical analysis. I would appreciate a brief explanation here to emphasize how Section 5 can help the reader better understand DPOT.
- Section 4.2 is confusing. If the trigger size in the following part is simply pre-defined, it is unnecessary to have this section here. Besides, how is the trigger size chosen for the evaluation?
Other Comments or Suggestions
N/A
- What is the direct relationship between Section 5 and Section 4? There is no clear explanation of how the design of DPOT benefits from the theoretical analysis. I would appreciate a brief explanation here to emphasize how Section 5 can help the reader better understand DPOT.
Thank you for this valuable feedback and advice. Section 5 provides the theoretical insight that the difference in update directions between the benign and malicious objectives is bounded by the error of the malicious data on the model. Section 4 introduces how $DPOT_{L_0}$ decreases the backdoor data's error (loss) on the global model, and Section 5 provides the justification that this error reduction helps conceal malicious model updates among benign ones. We will add the following formula to the paper to better connect Section 4 and Section 5.
By optimizing an $L_0$-norm-bounded trigger to minimize this error, $DPOT_{L_0}$ not only reduces the backdoor loss but also conceals the malicious model updates (Proposition 5.1).
- Section 4.2 is confusing. If the trigger size in the following part is simply pre-defined, it is unnecessary to have this section here. Besides, how is the trigger size chosen for the evaluation?
Thank you for this feedback. Section 4.2 introduces the strategy we use to pre-define the trigger size. We determine the trigger size by ensuring that the accuracy drop of an un-attacked model on poisoned data (i.e., how much less often the poisoned data is still predicted as benign) does not exceed 30% (the subtlety goal). Concrete examples of trigger-size selection used in the evaluation are provided in Appendix J due to space limitations, as referenced in line 418. Based on your valuable feedback, we will improve the clarity of this statement and mention those examples earlier in the method section for better readability.
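A minimal sketch of that selection rule; `evaluate` (returning accuracy in [0, 1]), `apply_trigger`, and the candidate-size loop are our own, hypothetical helpers used only for illustration.

```python
def choose_trigger_size(clean_model, eval_loader, candidate_sizes, max_drop=0.30):
    """Return the largest candidate trigger size whose poisoned data keeps the
    un-attacked model's accuracy drop within the subtlety budget (<= 30%)."""
    base_acc = evaluate(clean_model, eval_loader)  # accuracy on benign data (hypothetical helper)
    chosen = None
    for size in sorted(candidate_sizes):
        acc = evaluate(clean_model, eval_loader,
                       transform=lambda x: apply_trigger(x, size))  # accuracy on poisoned data
        if base_acc - acc <= max_drop:
            chosen = size          # still subtle enough at this size
        else:
            break                  # larger sizes would only increase the drop
    return chosen
```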
We found an experiment suggested by Reviewer Vb6j very interesting, and would like to share it here.
- Duration of attack effectiveness
We compared $DPOT_{L_0}$ and a fixed-trigger baseline on the duration of their effectiveness after attack termination. Using FedAvg on the CIFAR-10 dataset, we first applied 50 rounds of data poisoning, by which point both attacks had reached 100% ASR. We then stopped the poisoning and recorded the number of training rounds needed for the ASR to drop to 50%.
Testing with different learning rates (lr), we found that a higher lr shortened the attack's duration. As shown in the table, $DPOT_{L_0}$ consistently had a longer duration than the baseline across all lr values. A possible explanation is that $DPOT_{L_0}$ disperses trigger pixels across the image, activating more neurons and reinforcing retention of the trigger feature. In contrast, the baseline clusters trigger pixels in a corner, affecting fewer neurons and leading to faster forgetting.
| Rounds until ASR drops to 50% | lr = 0.01 | lr = 0.015 | lr = 0.02 |
|---|---|---|---|
| Fixed-trigger baseline | 275 | 155 | 65 |
| $DPOT_{L_0}$ | 330 | 328 | 101 |
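For reference, the duration metric in this table can be computed from the per-round ASR history recorded after the poisoning stops; a small illustrative helper (naming is ours):

```python
def rounds_until_asr_below(asr_history, threshold=0.5):
    """Given the ASR measured each round after the attack stops, return how many
    rounds it takes for the ASR to fall below the threshold (e.g., 50%)."""
    for round_idx, asr in enumerate(asr_history, start=1):
        if asr < threshold:
            return round_idx
    return None  # ASR never fell below the threshold within the recorded rounds
```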
The paper proposes $DPOT_{L_0}$, a backdoor attack strategy for Federated Learning that focuses on concealing malicious model updates. Unlike traditional backdoor attacks that use fixed triggers or obvious model poisoning, $DPOT_{L_0}$ dynamically optimizes an $L_0$-norm-bounded trigger for each round. This trigger is designed to be subtle (minimally impacting the data) and to align the malicious updates with benign updates, making detection difficult. The core idea is to create a per-round backdoor objective by optimizing a small number of pixels (the $L_0$ constraint) that still achieves the backdoor goal (misclassification to a target label) while minimizing the deviation from the expected model update direction. The paper includes theoretical justification for linear models and an experimental evaluation on several datasets (FashionMNIST, FEMNIST, CIFAR10, Tiny ImageNet) and against multiple defenses.
Questions for Authors
- While Proposition 5.1 provides a theoretical foundation for linear models, how well do you expect this result to generalize to non-linear neural networks? Have you considered any empirical ways to validate the bound in the non-linear case?
- The paper uses a trigger training dataset (D) to optimize the trigger. How sensitive is the attack's performance to the choice of this dataset? Does the trigger generalize well to unseen data from the same distribution? What if the distribution of the trigger training data is different from the distribution of the data seen during FL training?
- How are the attackers selected in each communication round, how many rounds are necessary to achieve the attack goal, and is it required for them to carry out the attack in consecutive rounds?
- How durable is $DPOT_{L_0}$ in comparison to other backdoor attacks (after how many training rounds does the ASR significantly decrease, i.e., by 50% or more)?
Claims and Evidence
- Claim 1: $DPOT_{L_0}$ effectively conceals malicious model updates. The evidence includes:
- Theoretical Justification (Proposition 5.1): Shows that in a linear model, the difference in update directions between benign and malicious objectives is bounded by the error of the malicious data on the model.
- Experimental Results (Tables 1, 8, 9, and Figure 4): Demonstrate high attack success rates (ASR) across various datasets and defenses, often outperforming other attack methods.
- Comparison to other optimized triggers (Section 6.6): Shows superior performance compared to L₂-norm-bounded triggers and partially optimized L₀-norm triggers, highlighting the benefits of optimizing both trigger value and placement.
- Claim 2: $DPOT_{L_0}$ undermines state-of-the-art defenses. The evidence includes:
- Extensive Defense Evaluation (Section 6.5 and Appendices D, G, H, I): The paper tests $DPOT_{L_0}$ against a wide array of defenses, including robust aggregation methods, outlier detection, and adversarial training.
- Claim 3: $DPOT_{L_0}$ preserves the global model's main-task performance. The experiments show that the MA (main-task accuracy) stays within a reasonable range.
Methods and Evaluation Criteria
- Methods:
- Trigger Optimization (Algorithms 1 & 2): Algorithm 1 finds the trigger locations (pixels with the largest gradient magnitude), and Algorithm 2 optimizes the trigger values using gradient descent. The use of a "trigger training dataset" to enhance generalization is a good design choice.
- Data Poisoning: The optimized trigger is added to a subset of the malicious clients' data.
- Evaluation Criteria:
- Attack Success Rate (ASR): Standard metric for evaluating backdoor attacks.
- Main-task Accuracy (MA): Crucial to show that the attack doesn't completely degrade the model's performance on the main task.
- Subtlety: Measured by the accuracy drop of a clean model on poisoned data.
- Comparison to Baselines: The paper compares $DPOT_{L_0}$ to several relevant baselines, including fixed-pattern triggers, distributed triggers, and other optimized triggers.
Theoretical Claims
Proposition 5.1 (Concealment Property): This is the main theoretical result. It provides a bound on the difference between the update directions for the benign and malicious objectives in a linear model (the proof is in Appendix B.1).
Experimental Design and Analysis
- Datasets: Four standard image datasets are used (FashionMNIST, FEMNIST, CIFAR10, Tiny ImageNet), providing a good range of complexity.
- Models: ResNet and VGGNet architectures are used, which are common choices for image classification.
- Defenses: A comprehensive set of defenses is considered, including robust aggregation, and outlier detection.
- Baselines: Relevant baselines are included for comparison.
- Ablation Studies: The paper contains limited ablation studies of MCR and the data poison rate.
- Parameter Settings: The paper provides details on key parameters (MCR, DPR, trigger size, etc.).
Supplementary Material
The supplementary material is extensive and provides valuable details on the experimental setup, related work, and additional results.
Relationship to Existing Literature
- The paper does a good job of citing relevant prior work in backdoor attacks, defenses, and federated learning. Key papers like Bagdasaryan et al. (2020), Sun et al. (2019), and Nguyen et al. (2024) are cited and discussed.
- The paper clearly differentiates itself from existing work by focusing on $L_0$-norm-bounded triggers that optimize both trigger value and placement, and by demonstrating effectiveness through data poisoning alone.
Missing Important References
While the paper mentions adversarial examples (Szegedy, 2014; Carlini & Wagner, 2017), a more in-depth discussion of the connection between $L_0$ adversarial attacks and backdoor attacks could be beneficial.
Other Strengths and Weaknesses
Strengths:
- The extensive supplementary material and the promise to release code enhance reproducibility.
- The paper is generally well-written and easy to follow. The algorithms are clearly presented, and the experimental setup is well-described.
Weaknesses:
- The theoretical justification relies on a linearity assumption, which limits its direct applicability to neural networks.
Other Comments or Suggestions
Consider adding a discussion of potential countermeasures against $DPOT_{L_0}$. Even if the paper focuses on the attack, briefly mentioning possible defense strategies would strengthen the work.
Thank you for the effort you put into providing feedback and advice!
- Limited studies of MCR and Data poison rate
Due to space limitations, we discuss the ablation studies of MCR and the data poison rate in Appendices L and N. We also discuss ablation studies of trigger size and non-IID degree in Appendices M and O.
- Potential countermeasures against $DPOT_{L_0}$
The real issue exposed by crafty data-poisoning attacks like $DPOT_{L_0}$ is a private-data governance problem: how to supervise clients' private data to ensure alignment with the common goal of the entire FL system. Addressing this would require a security framework that protects the entire FL lifecycle, from clients deciding to form an FL group to the completion of their FL training. Ensuring both privacy and security may demand system-architecture or infrastructure-level interventions beyond ML solutions. We hope to see such frameworks developed collaboratively by the ML and systems communities to make FL more practical in real-world applications.
- Connection between $L_0$ adversarial attacks and backdoor attacks
Can we understand these attacks as $L_0$-norm-bounded adversarial examples applied at test time? We cited Papernot et al. (2016) (line 83) for reference. The similarity between adversarial examples and backdoor examples is that both constrain the number of altered pixels. The difference is that adversarial examples vary their pixel changes per image, while backdoor examples use a consistent pattern (shape, value, and placement) to embed a learnable backdoor feature. Unlike adversarial examples, backdoor examples should also avoid introducing excessive new features, to prevent hindering the main task's convergence.
- Empirically validate the bound in the non-linear case
We generated 20 random data points (each with 50 dimensions) and 20 random target values for them to simulate the benign dataset. The malicious dataset contains 5 additional random points with specified (malicious) targets, evaluated on a model with randomly initialized parameters. We defined a benign objective on the benign data and a malicious objective on the combined data, and studied the relationship between the malicious data's error on the model and the distance between the two objectives' update directions.
We tested three models: Linear, Two-layer (ReLU activation between linear layers), and LeNet5 (2 convolutional and 3 linear layers). More details are provided here.
Varying the malicious data's error by changing its targets, we recorded how the corresponding gradient distance changed along with it. The results for different model architectures are presented in this figure, and all of them align with Proposition 5.1. For LeNet5, we also show the gradient distances separately for its convolutional layers and linear layers in this figure, finding both to be bounded by the malicious error, with different coefficients. These results indicate that Proposition 5.1 can be applied to non-linear neural networks.
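A minimal sketch of this kind of check, assuming PyTorch, squared-error objectives, and our own way of scaling the malicious error; none of this is the paper's exact experimental code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X_benign, y_benign = torch.randn(20, 50), torch.randn(20)
X_mal = torch.randn(5, 50)

def grad_direction(model, X, y):
    """Flattened, normalized gradient of a squared-error objective over (X, y)."""
    loss = ((model(X).squeeze(-1) - y) ** 2).mean()
    grads = torch.autograd.grad(loss, list(model.parameters()))
    g = torch.cat([p.flatten() for p in grads])
    return g / (g.norm() + 1e-12)

model = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 1))  # small non-linear model

for scale in (0.1, 0.5, 1.0, 2.0):
    # Push the malicious targets further from the model's predictions to grow the error.
    with torch.no_grad():
        y_mal = model(X_mal).squeeze(-1) + scale * torch.randn(5)
        err = ((model(X_mal).squeeze(-1) - y_mal) ** 2).mean().item()
    g_benign = grad_direction(model, X_benign, y_benign)
    g_mix = grad_direction(model, torch.cat([X_benign, X_mal]),
                           torch.cat([y_benign, y_mal]))
    dist = (g_benign - g_mix).norm().item()
    print(f"malicious error {err:.3f} -> update-direction distance {dist:.3f}")
```

Consistent with Proposition 5.1, the expectation is that the printed update-direction distance shrinks as the malicious error shrinks.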
- Trigger training data from different distributions
To study this insightful proposal, we constructed Trigger Training Datasets (TTDs) using Out-Of-Distribution (OOD) data and evaluated the impact on $DPOT_{L_0}$'s performance. For the FashionMNIST task, we used MNIST data to train triggers, and for CIFAR10, we selected data from CIFAR100 categories that do not overlap with CIFAR10. Triggers generated from OOD data can be found here, and the results comparing In-Distribution (ID) and OOD TTDs are shown here.
Triggers generated from OOD data might not be optimal for the ID victim data, so OOD TTDs produce somewhat less effective results than ID TTDs. Nonetheless, they still show strong attack performance, making this a promising direction for further exploration.
- Attack in non-consecutive rounds
In our main evaluation, a few pre-selected attackers participate in every FL round. We also tested non-consecutive attacks, discussed in Appendix P, and found that non-consecutive attacks can enhance $DPOT_{L_0}$'s effectiveness against certain defenses.
- The number of rounds needed to achieve the attack goal
We set ASR = 50% as the attack goal and recorded the number of rounds required for different attacks to achieve it under various training tasks and defenses. The results can be found via this Link. $DPOT_{L_0}$ reaches the attack goal fastest among all attacks.
- Duration of attack effectiveness
Please see our response to Reviewer KkLM, section 3.
This paper proposed a new backdoor attack method, DPOT, against FL systems that constructs a per-round dynamic, L0-norm objective to bound the generated trigger. Three reviewers voted weak accepts and one voted accept. The authors have responded well to most concerns from the reviewers. The major concerns, which prevent some reviewers from championing the paper, are:
- The evaluation does not reflect a real-world setting: specifically, most experiments rely on the attacker being able to participate in every round, which is usually not true in practice. While the authors pointed to the experiments in Appendix P, the reviewers also believed that this is not enough. This lack of practical evaluation in several main experiments is a concern.
- The trigger is highly visible, which is a concern. In my opinion, this also raises the question of effectiveness against defenses such as FLIP, which clearly shows that the method is not effective w.r.t. ASR. While accuracy drops under FLIP, the drop is less than 4%. This raises questions about whether DPOT's risks are well understood against these types of defenses. The experiments on one dataset, FashionMNIST, do not sufficiently warrant a conclusion.
For these reasons, I believe that the paper is not yet ready for publication, as there are concerns about the real practical risks of the proposed method, even though the average rating of this paper is slightly above the borderline.