PaperHub
Overall: 6.8 / 10
Poster · 4 reviewers
Ratings: 5, 4, 4, 4 (average 4.0; min 4, max 5, std. dev. 0.4)
Confidence: – · Novelty: 2.8 · Quality: 2.8 · Clarity: 2.8 · Significance: 2.5
NeurIPS 2025

MARS: A Malignity-Aware Backdoor Defense in Federated Learning

OpenReview · PDF
Submitted: 2025-05-12 · Updated: 2025-10-29

Abstract

Keywords
federated learning, backdoor defense, backdoor attack

Reviews and Discussion

Review
Rating: 5

Current defenses often fail against attacks like 3DFed, which adaptively optimizes backdoor models, because they rely on empirical statistical measures not strongly linked to backdoor activity. This paper introduces MARS, a novel approach that uses "backdoor energy" (BE) to quantify the maliciousness of individual neurons. MARS further refines this by extracting the most significant BE values into a "concentrated backdoor energy" (CBE) to enhance the malicious signal. It then employs a new Wasserstein distance-based clustering technique to identify and isolate backdoor models. Experiments show MARS effectively defends against state-of-the-art backdoor attacks and surpasses existing defense mechanisms.

Strengths and Weaknesses

Strengths

  • The paper addresses a meaningful topic in the field.
  • The paper highlights that other defenses are often loosely coupled with backdoor attacks, and this paper specifically addresses this issue through its method.
  • A series of experiments is conducted to prove the effectiveness of the method.

Weaknesses

  • In the experiments, BackdoorIndicator is specifically selected to make comparisons and discussions. What is the reason? What is the major difference between other defenses, BackdoorIndicator, and MARS? Please provide more discussion on this.
  • In the experiments, hyperparameters κ and ϵ are set to 5 and 0.03. In the appendix, the results indicate that these values fall within the optimal range for CIFAR-10. How can you make sure these values are still optimal for other datasets? Is there any theoretical discussion for setting these hyperparameters?
  • Why are the TPR and FPR results missing for FedCLP in Table 2?

Questions

See comments above.

Limitations

Yes.

Justification for Final Rating

The rebuttal has addressed my concerns. I will keep the positive rating.

Formatting Issues

None.

Author Response

We sincerely thank Reviewer 1dAU for the thoughtful review and the positive rating. We hope our responses are helpful in addressing your concerns and would be happy to discuss if anything remains unclear.

In the experiments, BackdoorIndicator is specifically selected to make comparisons and discussions. What is the reason? What is the major difference between other defenses, BackdoorIndicator, and MARS? Please provide more discussion on this.

Thank you for raising this point. We selected BackdoorIndicator as our main point of comparison for three reasons. First, it was very recently published at USENIX Security ’24 and has already garnered strong community recognition. Second, like MARS, it departs from conventional FL defenses—such as norm-constraint, out-of-distribution detection, and consistency checks—and offers a fresh insight and mechanism. Third, it achieves robust empirical performance against standard backdoor attacks, making it a natural baseline for any new defense.

However, MARS and BackdoorIndicator differ in a critical way:

Theoretical underpinnings vs. empirical observation: BackdoorIndicator is motivated by the empirical finding that deploying repeated backdoors with the same target label amplifies the trigger effect. Importantly, this observation has only been validated under unconstrained attack settings; for constrained attacks like 3DFed, the assumption may no longer hold, as demonstrated by the sharp performance drops in Table 4 of our paper. In contrast, MARS derives a formal upper bound on backdoor energy via Lipschitz and Wasserstein metrics, providing a provable foundation for its clustering defense that remains valid regardless of attack constraints.

No reliance on external data: BackdoorIndicator injects “indicator tasks” using an auxiliary OOD dataset and judges updates by the model’s accuracy on those tasks, making its effectiveness dependent on the availability and suitability of that data. MARS, by contrast, analyzes only the model parameters—specifically, neurons’ activation “energy”—and requires no external samples.

These distinctions highlight MARS’s practical advantages—strong theoretical guarantees and no need for trusted external data—which we will underscore in the revised manuscript.

In the experiments, hyperparameters κ and ϵ are set to 5 and 0.03. In the appendix, the results indicate that these values fall within the optimal range for CIFAR-10. How can you make sure these values are still optimal for other datasets? Is there any theoretical discussion for setting these hyperparameters?

Thank you for this question. Frankly, our choices of both κ and ϵ are guided primarily by practical intuition and empirical analysis rather than by a strict theoretical derivation:

Setting κ = 5%: We drew inspiration from pruning-based defenses—originally developed to mitigate gradient-inversion attacks—which commonly remove a small fraction of parameters (e.g., 5%) as a default that balances defense strength and model utility. For example, “Revisiting Gradient Pruning: A Dual Realization for Defending against Gradient Attacks” (AAAI ’24) uses 5% pruning with strong results. In our context, selecting the top 5% of neurons by backdoor energy similarly captures the most suspicious activity while preserving computational efficiency.
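As a concrete illustration of this top-κ selection, the sketch below keeps the highest 5% of hypothetical per-neuron energy scores (the scores and the helper name are illustrative assumptions, not the paper's actual BE computation):

```python
import numpy as np

def concentrated_energy(be_scores, kappa=0.05):
    """Keep the top-kappa fraction of per-neuron energy scores,
    largest first (illustrative sketch, not the paper's BE formula)."""
    be_scores = np.asarray(be_scores, dtype=float)
    k = max(1, int(np.ceil(kappa * be_scores.size)))
    return np.sort(be_scores)[-k:][::-1]

# 100 hypothetical neuron energies: with kappa = 5%, five values survive.
cbe = concentrated_energy(np.arange(100), kappa=0.05)
```

The surviving values form the concentrated backdoor energy (CBE) vector that the clustering step consumes.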

Setting ϵ = 0.03: We examined pairwise Wasserstein-distance matrices of Concentrated Backdoor Energy (CBE) vectors across multiple datasets (CIFAR-10, CIFAR-100, MNIST) and attack scenarios (including 3DFed). In every case, we observed a pronounced “gap” between intra-group distances (malicious–malicious or benign–benign) and inter-group distances, which offers a wide margin for threshold selection.

For example, Table 4.1 reports the distances for 10 clients on CIFAR-10 under a 3DFed attack (clients 1–2 malicious, 3–10 benign). All pairwise distances among malicious clients or among benign clients are at most 0.03, while every malicious–benign distance lies around 5.95–5.98. This clear separation means that the threshold ϵ = 0.03 falls well below the wide gap between the two clusters. We observed the same pattern on CIFAR-100 and MNIST, making 0.03 a reliable, off-the-shelf default that requires no per-dataset tuning.

Table 4.1: Pairwise Wasserstein Distances (CIFAR-10, 3DFed)

Client    1     2     3     4     5     6     7     8     9     10
1       0.00  0.00  5.97  5.95  5.97  5.97  5.96  5.96  5.96  5.98
2       0.00  0.00  5.97  5.95  5.97  5.97  5.96  5.96  5.96  5.98
3       5.97  5.97  0.00  0.03  0.03  0.02  0.02  0.03  0.02  0.02
4       5.95  5.95  0.03  0.00  0.03  0.03  0.03  0.03  0.02  0.03
5       5.97  5.97  0.03  0.03  0.00  0.02  0.03  0.02  0.03  0.03
6       5.97  5.97  0.02  0.03  0.02  0.00  0.02  0.03  0.03  0.02
7       5.96  5.96  0.02  0.03  0.03  0.02  0.00  0.02  0.03  0.03
8       5.96  5.96  0.03  0.03  0.02  0.03  0.02  0.00  0.03  0.03
9       5.96  5.96  0.02  0.02  0.03  0.02  0.02  0.03  0.00  0.02
10      5.98  5.98  0.02  0.03  0.03  0.02  0.03  0.03  0.02  0.00

While these settings are ultimately empirical, they have proven stable across diverse tasks and architectures. We view κ = 5% and ϵ = 0.03 as sensible defaults for practitioners, with the understanding that one could perform a quick pre-deployment calibration—by inspecting the distance matrix on a small held-out validation set—to adjust them if a dramatically different data distribution or attack style is anticipated.
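The gap-based check described above can be sketched as follows, with synthetic CBE vectors whose scale mirrors Table 4.1 (the cluster means and spreads are assumptions for illustration, and scipy's empirical 1-D Wasserstein distance stands in for the paper's Eq. (5)):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
# Synthetic CBE vectors: clients 1-2 "malicious" (high energies),
# clients 3-10 "benign" (low energies); purely illustrative.
cbes = [rng.normal(6.0, 0.01, 50) for _ in range(2)] + \
       [rng.normal(0.05, 0.01, 50) for _ in range(8)]

n = len(cbes)
dist = np.array([[wasserstein_distance(a, b) for b in cbes] for a in cbes])

eps = 0.03  # the paper's default threshold
# Clients whose distance to client 1 stays below eps share its cluster.
flagged = [i for i in range(n) if dist[i, 0] < eps]
others = [i for i in range(n) if i not in flagged]
```

Intra-group distances land far below eps while cross-group distances are orders of magnitude above it, which is exactly the wide margin the answer describes.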

Why are the TPR and FPR results missing for FedCLP in Table 2?

Thank you for pointing this out. The reason TPR and FPR are not reported for FedCLP in Table 2 is that FedCLP treats every local update identically—pruning all client models before aggregation—rather than attempting to distinguish benign from malicious submissions. As a result, it does not produce per‐client labels and we cannot compute true‐positive or false‐positive rates. In our Appendix F (Lines 706–708) we clarify this and denote TPR/FPR as “–” for FedCLP. To avoid any confusion, we will move that explanation into the main text in the revised manuscript.

Comment

Dear Reviewer,

I hope this message finds you well. As the discussion period is nearing its end with less than three days remaining, we want to ensure we have addressed all your concerns satisfactorily. If there are any additional points or feedback you'd like us to consider, please let us know. Your insights are invaluable to us, and we're eager to address any remaining issues to improve our work.

Thank you for your time and effort in reviewing our paper.

Comment

Thanks for the authors' detailed response, which has addressed my concerns. I will maintain the positive score as it is.

Comment

We’re very glad to hear that our response has addressed your concerns. Thank you for your thoughtful feedback and for maintaining the positive score—we truly appreciate your time and support.

Review
Rating: 4

This paper introduces MARS, a novel defense mechanism against backdoor attacks in FL. The core problem addressed is the failure of existing defenses against adaptive attacks that mimic benign model updates. The authors propose a new metric, "backdoor energy" (BE), to quantify the maliciousness of individual neurons within client models. By concentrating the highest BE values into a "concentrated backdoor energy" (CBE) vector and employing Wasserstein distance-based clustering, MARS aims to effectively distinguish and exclude malicious updates from the global aggregation process. The experimental results, conducted on several standard datasets, show that MARS outperforms a wide range of state-of-the-art defenses, particularly against strong adaptive attacks.

Strengths and Weaknesses

Strengths

  • The paper correctly identifies a critical flaw in existing defenses: their reliance on statistical metrics (like model norm or cosine similarity) that advanced adaptive attacks can easily evade. The motivation for a new, "malignity-aware" approach is well-argued and empirically supported.
  • The introduction of "backdoor energy" (BE) as a metric sensitive to trigger-related neuron activity is innovative. The subsequent steps of forming a concentrated BE (CBE) vector to amplify the malicious signal and using Wasserstein distance for clustering are logical and effective. The choice of Wasserstein distance is particularly well-justified, as it provides a more robust measure for comparing the distributions found in CBE vectors compared to standard Euclidean or cosine metrics.
  • The paper presents a robust evaluation against eight SOTA defenses across multiple datasets and a diverse set of attacks, including challenging adaptive ones. The results consistently demonstrate MARS's superiority in maintaining high main task accuracy (ACC) while achieving a low attack success rate (ASR).

Weaknesses

  • The paper argues for the use of Wasserstein distance and the K-WMeans algorithm but primarily supports this with final performance metrics (ACC/ASR). To make the core mechanism more transparent and convincing, the paper would benefit from showing direct evidence of separability. A toy example or a conceptual illustration is insufficient to prove its effectiveness in a real, complex scenario. The absence of a real Wasserstein distance matrix from the experiments leaves a gap between the method's description and its empirical validation.
  • The paper states that Backdoor Energy (BE) values are calculated only for convolutional and batch normalization layers, while ignoring the fully connected (FC) layers. This introduces a significant potential vulnerability: an attacker aware of this design could craft a backdoor that concentrates its malicious impact within the ignored FC layers, thereby evading detection entirely. While this choice might be motivated by reducing computational overhead (as FC layers can be time-consuming to analyze), the paper lacks a thorough justification or an ablation study to prove that this exclusion has a negligible impact on the defense's overall effectiveness. This selective strategy weakens the method's claim to general and robust protection.

Questions

  1. To substantiate the claim that K-WMeans effectively separates clients, could the authors provide a visualization (e.g., a heatmap) of a real, pairwise Wasserstein distance matrix computed on the CBE vectors from one of the main experimental settings (e.g., CIFAR-10)? This would provide more direct and compelling evidence of client separability than a toy example and greatly strengthen the justification for the core defense mechanism.
  2. Could the authors elaborate on the rationale for excluding fully connected layers from the BE calculation? To validate this design choice, could you provide an ablation study comparing the defense performance (both ACC and ASR) and computational cost when MARS is applied to (a) only convolutional/BN layers versus (b) all layers, including FC layers? Does this selective approach create an exploitable loophole for attackers to plant backdoors in the final layers of a network?

Limitations

Yes.

Justification for Final Rating

Most of the concerns raised during the review process have been adequately addressed. However, after careful consideration, I believe the significance and novelty of this work do not justify changing my original rating. Although important improvements have been made, the overall contribution remains modest within the context of the field.

Formatting Issues

No

Author Response

We sincerely thank Reviewer hznG for the thoughtful review and the positive rating. We hope our responses are helpful in addressing your concerns and would be happy to discuss if anything remains unclear.

To substantiate the claim that K-WMeans effectively separates clients, could the authors provide a visualization (e.g., a heatmap) of a real, pairwise Wasserstein distance matrix computed on the CBE vectors from one of the main experimental settings (e.g., CIFAR-10)? This would provide more direct and compelling evidence of client separability than a toy example and greatly strengthen the justification for the core defense mechanism.

Thank you for this valuable suggestion. Below, we provide the pairwise Wasserstein distance matrix computed over the CBE vectors of 10 clients under the CIFAR-10 + 3DFed attack (clients 1–2 are malicious; clients 3–10 are benign). As shown in Table 3.1, intra-malicious and intra-benign distances remain very small (0.02–0.03), while malicious–benign distances are very large (≈5.95–5.98), which clearly justifies K-WMeans’ ability to separate the two groups. We will include this table in the revised manuscript to provide direct and compelling evidence for the separability that underpins our core defense mechanism.

Table 3.1: Pairwise Wasserstein Distances (CIFAR-10, 3DFed)

Client    1     2     3     4     5     6     7     8     9     10
1       0.00  0.00  5.97  5.95  5.97  5.97  5.96  5.96  5.96  5.98
2       0.00  0.00  5.97  5.95  5.97  5.97  5.96  5.96  5.96  5.98
3       5.97  5.97  0.00  0.03  0.03  0.02  0.02  0.03  0.02  0.02
4       5.95  5.95  0.03  0.00  0.03  0.03  0.03  0.03  0.02  0.03
5       5.97  5.97  0.03  0.03  0.00  0.02  0.03  0.02  0.03  0.03
6       5.97  5.97  0.02  0.03  0.02  0.00  0.02  0.03  0.03  0.02
7       5.96  5.96  0.02  0.03  0.03  0.02  0.00  0.02  0.03  0.03
8       5.96  5.96  0.03  0.03  0.02  0.03  0.02  0.00  0.03  0.03
9       5.96  5.96  0.02  0.02  0.03  0.02  0.02  0.03  0.00  0.02
10      5.98  5.98  0.02  0.03  0.03  0.02  0.03  0.03  0.02  0.00
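The matrix above could also be rendered as the heatmap the reviewer requested; a minimal matplotlib sketch is given below (values transcribed from Table 3.1; the output filename is hypothetical):

```python
import os

import matplotlib
matplotlib.use("Agg")  # headless backend for server-side rendering
import matplotlib.pyplot as plt
import numpy as np

# Pairwise Wasserstein distances transcribed from Table 3.1
# (clients 1-2 malicious, clients 3-10 benign).
D = np.array([
    [0.00, 0.00, 5.97, 5.95, 5.97, 5.97, 5.96, 5.96, 5.96, 5.98],
    [0.00, 0.00, 5.97, 5.95, 5.97, 5.97, 5.96, 5.96, 5.96, 5.98],
    [5.97, 5.97, 0.00, 0.03, 0.03, 0.02, 0.02, 0.03, 0.02, 0.02],
    [5.95, 5.95, 0.03, 0.00, 0.03, 0.03, 0.03, 0.03, 0.02, 0.03],
    [5.97, 5.97, 0.03, 0.03, 0.00, 0.02, 0.03, 0.02, 0.03, 0.03],
    [5.97, 5.97, 0.02, 0.03, 0.02, 0.00, 0.02, 0.03, 0.03, 0.02],
    [5.96, 5.96, 0.02, 0.03, 0.03, 0.02, 0.00, 0.02, 0.03, 0.03],
    [5.96, 5.96, 0.03, 0.03, 0.02, 0.03, 0.02, 0.00, 0.03, 0.03],
    [5.96, 5.96, 0.02, 0.02, 0.03, 0.02, 0.02, 0.03, 0.00, 0.02],
    [5.98, 5.98, 0.02, 0.03, 0.03, 0.02, 0.03, 0.03, 0.02, 0.00],
])

fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(D, cmap="viridis")
ax.set_xticks(range(10))
ax.set_xticklabels(range(1, 11))
ax.set_yticks(range(10))
ax.set_yticklabels(range(1, 11))
ax.set_xlabel("Client")
ax.set_ylabel("Client")
fig.colorbar(im, ax=ax, label="Wasserstein distance")
fig.savefig("cbe_wasserstein_heatmap.png", dpi=150)  # hypothetical filename
heatmap_written = os.path.exists("cbe_wasserstein_heatmap.png")
```

The block structure (a bright malicious-vs-benign band against two dark diagonal blocks) makes the separability visible at a glance.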

Could the authors elaborate on the rationale for excluding fully connected layers from the BE calculation? To validate this design choice, could you provide an ablation study comparing the defense performance (both ACC and ASR) and computational cost when MARS is applied to (a) only convolutional/BN layers versus (b) all layers, including FC layers? Does this selective approach create an exploitable loophole for attackers to plant backdoors in the final layers of a network?

Thank you for the insightful comment. We acknowledge the importance of justifying the design choice of excluding fully connected (FC) layers from the Backdoor Energy (BE) computation. To this end, we conducted a thorough ablation study comparing two variants of MARS: one that computes BE using only convolutional and batch normalization (BN) layers ("partial-layers"), and another that includes all layers, including FC layers ("all-layers"). The results in Table 3.2 demonstrate that both versions achieve nearly identical defense effectiveness under 3DFed attacks on CIFAR-10 with ResNet-18: the clean accuracy (ACC) and attack success rate (ASR) are virtually unchanged (e.g., ACC ≈ 85%, ASR ≈ 9%). However, including FC layers increases computational cost by approximately 13% per communication round. This validates our decision to focus on convolutional/BN layers as a practical tradeoff between performance and efficiency.

Table 3.2 Partial-Layers vs. All-Layers Ablation

Defense                  ACC (%)   ASR (%)   Time per round (s)
MARS (partial-layers)    85.07     9.86      6.57
MARS (all-layers)        85.11     9.79      7.43

More importantly, we also evaluated whether this selective layer choice introduces a potential vulnerability—for instance, if an attacker attempts to inject a backdoor exclusively through FC layers. To simulate this, we conducted an additional experiment where malicious clients freeze all parameters except the FC layers during local training in Table 3.3. In this case, MARS (partial-layers) fails to detect the malicious updates (TPR = 0%, FPR = 100%), since it ignores the manipulated layers. However, the attack itself is unsuccessful: the ASR drops to just 9.86%, while clean accuracy also suffers. This suggests that injecting a backdoor using only the FC layers fails to achieve both stealth and effectiveness. We hypothesize that this is due to the limited expressive capacity of isolated FC-layer tuning: the convolutional layers generate nearly identical features for a clean sample and the corresponding triggered sample, making it difficult for the final layer alone to simultaneously satisfy both objectives (i.e., clean accuracy and attack success).

Finally, we note that MARS (all-layers) can still detect such attacks, achieving 100% TPR even when malicious updates are limited to FC layers. Therefore, although our default implementation prioritizes efficiency by ignoring FC layers, MARS can be easily extended to handle such edge cases when needed, without compromising robustness.

Table 3.3 FC Layers Tuning Attack

Defense                  ACC (%)   ASR (%)   TPR (%)   FPR (%)
MARS (partial-layers)    82.74     9.86      0.00      100.00
MARS (all-layers)        85.29     9.25      100.00    25.00
FedAvg (attack-free)     85.26     9.34      –         –

We will include these ablation results and analysis in the revised manuscript to clarify both the rationale and empirical support for this design choice.

Comment

Thank you for your detailed and constructive response to my comments. I appreciate the effort you put into addressing my concerns with additional experiments and analyses.

However, I noticed that MARS (all-layers) still shows a 25% False Positive Rate (FPR) in this scenario, which suggests potential issues with misclassifying benign clients. A brief discussion on why this occurs or possible strategies to reduce the FPR would be helpful.

Additionally, while your current analysis indicates that FC-layer-only attacks are ineffective, I encourage you to consider whether more sophisticated attack strategies—such as combining minor Conv layer adjustments with FC layer tuning—could pose a greater threat.

Comment

Thank you for your continued constructive feedback! We appreciate your insightful comments and suggestions.

Regarding the observed 25% False Positive Rate (FPR) in the FC-only attack scenario with MARS (all-layers), we would like to clarify that MARS is fundamentally designed as a malignity-aware backdoor defense. When the attacker only modifies the fully connected (FC) layers, the malicious impact on the global model is limited. As a result, a small number of benign models may be classified as malicious—thus leading to a higher FPR. Importantly, this misclassification does not degrade the overall global model performance, as the attack is inherently weak (e.g., ASR = 9.91% under FedAvg).

To further investigate whether more sophisticated attacks can bypass MARS, we extended our experiments to include incremental modifications to convolutional (Conv) layers, starting from the last Conv block and moving backward. As shown in Table 3.4, we evaluated various attack strategies, including:

  • FC-only
  • 1Conv+FC
  • 2Convs+FC
  • 3Convs+FC
  • 4Convs+FC
  • All layers (i.e., full-parameter attacks)

We observed that even when the attacker modifies one or two Conv blocks in addition to the FC layer, the resulting attack is still weak (e.g., ASR = 12.36% and 56.60%, respectively, under FedAvg). Nonetheless, both MARS (partial-layers) and MARS (all-layers) consistently achieve 100% TPR and 0% FPR, demonstrating strong resilience against these more involved but still low-intensity attacks. When more Conv blocks are compromised, and especially when all parameters are manipulated, the attack becomes significantly more effective (e.g., ASR = 99.68%). However, MARS still maintains perfect detection performance, with 100% TPR and 0% FPR in all such cases. This suggests that stronger malicious behavior actually makes detection easier for MARS, further validating its robustness.

Table 3.4 Robustness of MARS to Varying Depths of Parameter Compromise

Attack Strategy   Defense                  ACC     ASR     TPR      FPR
FC only           MARS (partial-layers)    82.74   9.86    0.00     100.00
                  MARS (all-layers)        85.29   9.25    100.00   25.00
                  FedAvg                   85.33   9.91    0.00     0.00
1Conv+FC          MARS (partial-layers)    85.48   9.34    100.00   0.00
                  MARS (all-layers)        85.46   9.49    100.00   0.00
                  FedAvg                   84.51   12.36   0.00     0.00
2Convs+FC         MARS (partial-layers)    85.51   9.55    100.00   0.00
                  MARS (all-layers)        85.57   9.32    100.00   0.00
                  FedAvg                   84.32   56.60   0.00     0.00
3Convs+FC         MARS (partial-layers)    85.37   9.49    100.00   0.00
                  MARS (all-layers)        85.55   9.41    100.00   0.00
                  FedAvg                   84.94   91.45   0.00     0.00
4Convs+FC         MARS (partial-layers)    85.54   9.23    100.00   0.00
                  MARS (all-layers)        85.56   9.14    100.00   0.00
                  FedAvg                   85.32   96.17   0.00     0.00
All layers        MARS (partial-layers)    85.16   9.40    100.00   0.00
                  MARS (all-layers)        85.28   9.67    100.00   0.00
                  FedAvg                   78.32   99.68   0.00     0.00

In conclusion, while weak attacks such as FC-only modifications may result in occasional false positives due to their benign nature, they do not impact overall performance. More advanced attacks involving broader parameter manipulation do not successfully evade MARS. We agree that designing adversaries that can simultaneously maintain stealth and effectiveness remains a challenging open problem and will consider exploring this direction in future work.

Comment

Dear Reviewer,

I hope this message finds you well. As the discussion period is nearing its end with less than three days remaining, we want to ensure we have addressed all your concerns satisfactorily. If there are any additional points or feedback you'd like us to consider, please let us know. Your insights are invaluable to us, and we're eager to address any remaining issues to improve our work.

Thank you for your time and effort in reviewing our paper.

Comment

Thank you to the authors for their detailed response, which has addressed my concerns. Given the significance and novelty of the work, I will maintain my positive score.

Comment

Thank you very much for your thoughtful feedback and for taking the time to review our work. We’re glad that our rebuttal addressed your concerns, and we truly appreciate your recognition of the significance and novelty of our contribution. Your continued support and positive evaluation mean a great deal to us.

Review
Rating: 4

The paper proposes MARS, a defense against backdoor attacks in federated learning. Unlike prior methods relying on weak statistical heuristics, MARS uses backdoor energy (BE) to quantify neuron-level malignancy and clusters models via Wasserstein distance on concentrated BE (CBE). It reliably detects backdoor models without clean data or attacker ratio assumptions. Experiments show MARS outperforms 8 SOTA defenses, including against adaptive and high-ratio attacks.

Strengths and Weaknesses

Strengths: It proposes a novel and practical backdoor energy (BE) metric that avoids reliance on clean data. The clustering method is well-suited to unordered representations via Wasserstein distance. The method is validated through thorough experiments across datasets and challenging attack settings.

Weaknesses:

  1. MARS relies on Lipschitz-based backdoor energy, which becomes unreliable in architectures like Vision Transformers due to the complex, input-dependent nature of self-attention and normalization layers.
  2. MARS assumes that all clients use structurally similar models, but in practice, heterogeneous client architectures may yield incomparable backdoor energy profiles.
  3. The paper states that if the Wasserstein distance between the two clusters does not exceed a threshold ϵ, "all local models are considered benign". This decision operates under the assumption that "an FL system with only attackers is meaningless". However, an FL system comprised solely of attackers is not necessarily "meaningless" in an adversarial context; it represents an extreme, albeit plausible, attack scenario where all participating clients are compromised. In such a situation, even if all clients in an FL system are malicious, it is plausible for their Concentrated Backdoor Energy (CBE) distributions to be sufficiently similar such that the inter-cluster distance falls below ϵ. In this scenario, where all local models are, in fact, malicious, MARS would mistakenly classify them all as benign, allowing a complete compromise. This directly contradicts the paper's own threat model, which explicitly states that "Attackers can constitute a majority, with their proportion not restricted to below 50% as typically assumed in existing research", as a 100% attacker scenario clearly falls within this majority condition.
  4. In the "Resilience to adaptive attack" section, MARS fails significantly (TPR 0%, FPR ~100%) when the adaptive attack's regularization coefficient λ is increased to 0.05 or higher. This indicates a critical failure of MARS when attackers effectively minimize their backdoor energy. While MARS* is introduced as a modification employing a majority-based cluster selection, this reintroduces the very assumption (implicit reliance on a benign majority) that the initial MARS design aimed to avoid. Given the threat model explicitly allows attackers to "constitute a majority, with their proportion not restricted to below 50%” and the defender has "minimal knowledge" about the proportion of attackers, the defense faces a dilemma when λ is increased to 0.05 or higher. The defender cannot reliably determine when to switch strategies, nor can they consistently rely on a benign majority under the defined threat model. This compromises MARS's robustness across all adversarial scenarios outlined.

Questions

  1. The paper assumes an "FL system with only attackers is meaningless". However, a 100% attacker scenario is plausible (though rare) and falls within your threat model where attackers can constitute a majority. How does MARS ensure effectiveness in such a scenario, where all malicious models might be misclassified as benign?
  2. MARS fails against adaptive attacks with λ≥0.05. While MARS* uses a majority-based selection, this reintroduces the benign-majority assumption that MARS aims to avoid. Given the defender's minimal knowledge of attacker proportion and λ, how can the correct strategy (MARS vs. MARS*) be reliably chosen in practice?

Limitations

The paper's threat model assumes a practical and challenging scenario where "Attackers can constitute a majority, with their proportion not restricted to below 50% as typically assumed in existing research". However, the proposed MARS defense, as detailed in the "Weaknesses" and "Questions" raised in the review, appears to have significant limitations in fully addressing this threat model:

  1. MARS's cluster selection strategy assumes that if the Wasserstein distance between clusters is low, "all local models are considered benign". This implicitly contradicts the threat model by assuming an "meaningless" system in an all-attacker scenario, potentially leading to a complete compromise if 100% of clients are malicious yet their CBEs are similar. This is a crucial gap given the majority attacker assumption.
  2. While MARS* is introduced to counter adaptive attacks with high λ values, its reliance on a majority-based selection reintroduces the very assumption the initial MARS design aimed to avoid. The defender's "minimal knowledge" about attacker proportion and λ means there's no clear mechanism for reliably switching between MARS and MARS* strategies. This leaves the defense's robustness compromised across the full range of adversarial scenarios outlined in the threat model.

Justification for Final Rating

The rebuttal addresses most of my concerns and I have elevated my rating accordingly. Thank you for your efforts!

Formatting Issues

None

Author Response

We sincerely thank Reviewer oRiw for the thoughtful review. We hope our responses are helpful in addressing your concerns and would be happy to discuss if anything remains unclear.

MARS relies on Lipschitz-based backdoor energy, which becomes unreliable in architectures like Vision Transformers due to the complex, input-dependent nature of self-attention and normalization layers.

Thank you for raising this important point. To demonstrate that MARS scales to Vision Transformer architectures, we evaluated it on a pretrained ViT model using the Hugging Face Transformers library. Specifically, we loaded

from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained(
    'google/vit-base-patch16-224-in21k',
    num_labels=10,
    ignore_mismatched_sizes=True,
)

and fine-tuned it on CIFAR-10. This ViT contains a very high proportion of linear layers (99.09% of its parameters), so MARS remains fully applicable. As shown in Table 2.1, MARS on ViT achieves detection performance comparable to the Baseline (i.e., FedAvg in the attack-free scenario), confirming its effectiveness on large-scale models. We will include this experiment (and its detailed results) in the revised manuscript.

Table 2.1 Performance on ViT

Round   Defense    ACC     ASR     TPR      FPR
1       FedAvg     24.79   8.24    0.00     0.00
        MARS       96.98   9.93    100.00   0.00
        Baseline   97.36   10.01   –        –
5       FedAvg     94.31   99.59   0.00     0.00
        MARS       97.69   10.00   100.00   0.00
        Baseline   97.91   10.02   –        –
10      FedAvg     96.03   99.89   0.00     0.00
        MARS       97.88   9.98    100.00   0.00
        Baseline   98.08   9.98    –        –
15      FedAvg     96.62   99.89   0.00     0.00
        MARS       97.97   10.00   100.00   0.00
        Baseline   98.11   9.99    –        –
20      FedAvg     96.82   99.85   0.00     0.00
        MARS       97.93   9.99    100.00   0.00
        Baseline   98.08   10.00   –        –

MARS assumes that all clients use structurally similar models, but in practice, heterogeneous client architectures may yield incomparable backdoor energy profiles.

Thank you for raising this important point. We would like to clarify that MARS does not require all clients to use the same network architecture. While many existing defenses—such as Multi-Krum (which computes pairwise Euclidean distances) or FoolsGold (which relies on cosine similarity)—indeed break down when models differ structurally, MARS operates on the distribution of backdoor energies rather than on raw parameter vectors. Specifically, in Eq. (5) we use the Wasserstein distance to compare each client’s CBE profile. Since the Wasserstein distance measures the distance between two distributions, it naturally accommodates vectors of differing lengths or supports. For instance, consider the toy example in the paper: assume that L1 = [1, 3, 4, 5] (originally [1, 2, 3, 4, 5] with one layer missing) and L2 = [5, 5, 2] (originally [5, 5, 4, 3, 2] with two layers missing) are the CBEs of backdoor models, while L3 = [1, 1, 1, 1, 1] (the complete model) is the CBE of a benign model. According to Eq. (5), we obtain Wass(L1, L2) = 1.75, Wass(L1, L3) = 2.25, and Wass(L2, L3) = 3.00, allowing us to cluster the more similar L1 and L2 together. This demonstrates that MARS can effectively distinguish backdoored from benign clients even when their architectures—and thus the dimensionality of their BE vectors—differ. We will clarify this in the revised manuscript.
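This toy computation can be checked with scipy's empirical 1-D Wasserstein distance, used here as a stand-in for Eq. (5); since Eq. (5)'s exact weighting is not reproduced, the absolute values may differ from those quoted above, but the relative ordering that drives the clustering is preserved:

```python
from scipy.stats import wasserstein_distance

L1 = [1, 3, 4, 5]      # backdoor CBE, one layer missing
L2 = [5, 5, 2]         # backdoor CBE, two layers missing
L3 = [1, 1, 1, 1, 1]   # benign CBE from the complete model

# scipy treats each vector as an empirical distribution, so vectors of
# different lengths are compared directly -- no padding required.
w12 = wasserstein_distance(L1, L2)
w13 = wasserstein_distance(L1, L3)
w23 = wasserstein_distance(L2, L3)

# The two backdoor profiles are closer to each other than either is to
# the benign profile, so clustering groups L1 with L2.
assert w12 < w13 < w23
```

This is the property that lets K-WMeans operate over heterogeneous client architectures.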

The paper assumes an "FL system with only attackers is meaningless". However, a 100% attacker scenario is plausible (though rare) and falls within your threat model where attackers can constitute a majority. How does MARS ensure effectiveness in such a scenario, where all malicious models might be misclassified as benign?

Thank you for highlighting this edge case. We acknowledge that MARS, like nearly all existing defenses, cannot recover a clean global model if all clients are malicious—an FL setting in which no honest update exists to guide aggregation. We consider a 100%–attacker scenario to be fundamentally impractical, because without any benign contributions an FL system cannot produce a useful model, irrespective of defense strategy. For context, FLTrust (NDSS ’21) evaluates up to 95% attackers and BackdoorIndicator (USENIX Security ’24) up to 60%, and both require external “trusted” or out-of-distribution data to function. In contrast, MARS can tolerate over 50% malicious clients without any auxiliary dataset—an advancement we believe represents a significant step forward. We will clarify in the revision that handling a 100%–attacker case lies outside the scope of our threat model and is shared by prior work.

MARS fails against adaptive attacks with λ ≥ 0.05. While MARS* uses a majority-based selection, this reintroduces the benign-majority assumption that MARS aims to avoid. Given the defender's minimal knowledge of attacker proportion and λ, how can the correct strategy (MARS vs. MARS*) be reliably chosen in practice?

Thank you for this perceptive observation. As reported in Table 3 of our paper, when the adaptive attacker increases the regularization coefficient to λ ≥ 0.05, MARS indeed swaps the labels of benign and malicious updates—but in that regime, the aggregated global model's ACC plunges to just 10%. Such a collapse violates the key stealth requirement of backdoor attacks (i.e., remaining accurate on clean data while misbehaving only on triggered inputs), meaning the attack itself is no longer successful or realistic.

Moreover, empirical studies in production-scale federated learning—most notably "Back to the drawing board: A critical evaluation of poisoning attacks on production federated learning" (IEEE S&P '22)—show that real-world adversaries rarely control more than 10% of clients. Under these practical conditions, MARS*'s majority-based cluster selection is both justified and effective. We will clarify these points in the revised manuscript to emphasize that (1) overly aggressive minimization of backdoor energy undermines the attack's stealth and (2) in realistic threat settings where attackers form a small minority, MARS* provides robust defense.

Comment

Dear Reviewer,

I hope this message finds you well. As the discussion period is nearing its end with less than three days remaining, I want to ensure we have addressed all your concerns satisfactorily. If there are any additional points or feedback you'd like us to consider, please let us know. Your insights are invaluable to us, and we're eager to address any remaining issues to improve our work.

Thank you for your time and effort in reviewing our paper.

Comment

Dear Reviewer oRiw,

We hope you’re doing well. As the rebuttal discussion period will close in under two days, we wanted to kindly follow up on our previous message. If you have any additional concerns or comments on our paper, we would be grateful to address them before the deadline. Your feedback is invaluable in helping us improve our work.

Thank you again for your time and effort.

Review
4

This paper presents MARS, a novel defense mechanism against backdoor attacks in Federated Learning (FL). The core idea is to use a new metric called Backdoor Energy (BE), which captures the malignancy level of individual neurons, and to extract a Concentrated Backdoor Energy (CBE) vector for each model. Models are then clustered via Wasserstein distance-based K-means (K-WMeans), allowing the server to detect and exclude backdoored updates. Experimental results show strong defense performance against state-of-the-art attacks like 3DFed and CerP, with extensive evaluation and ablation experiments.

Strengths and Weaknesses

Strengths:

- The paper departs from conventional defenses relying on empirical statistics (e.g., norm, cosine similarity) and instead introduces a theoretically motivated malignity-aware metric (BE), leading to a more fundamental defense approach.
- The method does not require access to clean data or the backdoor trigger, solving a key practical limitation in existing defenses.
- Provides a formal upper bound for BE under Lipschitz smoothness, offering a clean justification for using layer-wise Lipschitz norms.
- The use of Wasserstein distance in clustering CBEs overcomes limitations of Euclidean/cosine distances, particularly in handling unordered high-magnitude components across neurons.

Weaknesses:

- Although briefly referenced in the appendix, the computational cost of computing Lipschitz constants and Wasserstein distances for all neurons/models needs more discussion in the main paper. Can it scale to large models like ViT or LLMs?
- While the paper justifies approximating BE with Lipschitz norms, the assumption that relative ordering is preserved may not always hold, especially in deep networks with normalization layers. Some discussion or empirical validation would be helpful.
- The definition of "Backdoor Energy" is somewhat abstract. A diagram or intuition about what a "malignant neuron" looks like (activation-wise) might help readers understand the intuition.

Questions

How does MARS perform if benign updates contain high-magnitude neurons due to data heterogeneity?

Could this lead to false positives?

Could this method generalize to detecting other stealthy behaviors (e.g., model inversion or privacy leakage)?

How sensitive is the model performance to the percentage of top BE values (e.g., κ = 5%)? Are results robust to this choice?

Consider including runtime/efficiency information.

Limitations

Yes

Final Justification

After reading the rebuttal, I appreciate the authors' effort to address my concerns. I will keep my original score.

Formatting Issues

None

Author Response

We sincerely thank Reviewer xx46 for the thoughtful review and the positive rating. We hope our responses are helpful in addressing your concerns and would be happy to discuss if anything remains unclear.

Although briefly referenced in the appendix, the computational cost of computing Lipschitz constants and Wasserstein distances for all neurons/models needs more discussion in the main paper.

Thank you for this valuable suggestion. We have added a discussion of the computational cost of computing Lipschitz constants and Wasserstein distances to the main text:

Cost of computing the Lipschitz constant: for an m×n weight matrix, we approximate its Lipschitz constant via the power-iteration method. Each iteration involves one matrix-vector multiplication at cost O(mn); after k iterations, the total complexity is O(kmn).

Cost of computing the Wasserstein distance: given two n-dimensional vectors, we first sort each in ascending order at cost O(n log n), then compute and sum the absolute element-wise differences at cost O(n). The overall complexity is therefore O(n log n) + O(n) = O(n log n).
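The power-iteration step described above can be sketched as follows. This is a minimal pure-Python illustration on a tiny matrix (the rebuttal does not show MARS's actual implementation, so treat the function name and defaults as assumptions about the standard method):

```python
import math

def spectral_norm(W, iters=50):
    """Approximate the largest singular value of a dense matrix W
    (the Lipschitz constant of x -> Wx) via power iteration on W^T W."""
    m, n = len(W), len(W[0])
    v = [1.0] * n  # deterministic start vector for reproducibility
    for _ in range(iters):
        # u = W v  (cost O(mn) per iteration, as stated above)
        u = [sum(W[i][j] * v[j] for j in range(n)) for i in range(m)]
        # v = W^T u, then normalize
        v = [sum(W[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # for a unit top singular vector v, sigma = ||W v||
    u = [sum(W[i][j] * v[j] for j in range(n)) for i in range(m)]
    return math.sqrt(sum(x * x for x in u))

# largest singular value of diag(3, 2) is 3
print(spectral_norm([[3.0, 0.0], [0.0, 2.0]]))  # ≈ 3.0
```

In practice one would run this per layer on the layer's weight matrix (convolutions unrolled to matrices), which is where the O(kmn) total cost comes from.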

In the final manuscript, we will also report actual timing results from our experiments and include implementation details in the appendix, so that readers can clearly assess the efficiency of our approach. Thank you again for your insightful feedback.

Can it scale to large models like ViT or LLMs?

Thank you for raising this important point. To demonstrate that MARS scales to Vision Transformer architectures, we evaluated it on a pretrained ViT model using the Hugging Face Transformers library. Specifically, we loaded

from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained(
    'google/vit-base-patch16-224-in21k',
    num_labels=10,
    ignore_mismatched_sizes=True,
)

and fine-tuned it on CIFAR-10. This ViT contains a very high proportion of linear layers (99.09% of its parameters), so MARS remains fully applicable. As shown in Table 1.1, MARS on ViT achieves detection performance comparable to Baseline (i.e., FedAvg in attack-free scenario), confirming its effectiveness on large-scale models. We will include this experiment (and its detailed results) in the revised manuscript.

Table 1.1 Performance on ViT

| Round | Defense | ACC | ASR | TPR | FPR |
|---|---|---|---|---|---|
| 1 | FedAvg | 24.79 | 8.24 | 0.00 | 0.00 |
| 1 | MARS | 96.98 | 9.93 | 100.00 | 0.00 |
| 1 | Baseline | 97.36 | 10.01 | - | - |
| 5 | FedAvg | 94.31 | 99.59 | 0.00 | 0.00 |
| 5 | MARS | 97.69 | 10.00 | 100.00 | 0.00 |
| 5 | Baseline | 97.91 | 10.02 | - | - |
| 10 | FedAvg | 96.03 | 99.89 | 0.00 | 0.00 |
| 10 | MARS | 97.88 | 9.98 | 100.00 | 0.00 |
| 10 | Baseline | 98.08 | 9.98 | - | - |
| 15 | FedAvg | 96.62 | 99.89 | 0.00 | 0.00 |
| 15 | MARS | 97.97 | 10.00 | 100.00 | 0.00 |
| 15 | Baseline | 98.11 | 9.99 | - | - |
| 20 | FedAvg | 96.82 | 99.85 | 0.00 | 0.00 |
| 20 | MARS | 97.93 | 9.99 | 100.00 | 0.00 |
| 20 | Baseline | 98.08 | 10.00 | - | - |

While the paper justifies approximating BE with Lipschitz norms, the assumption that relative ordering is preserved may not always hold, especially in deep networks with normalization layers. Some discussion or empirical validation would be helpful.

Thank you for pointing this out. We believe there may be a misunderstanding: our method does not rely on preserving the relative ordering of backdoor energies. Instead, by using Wasserstein-distance-based clustering, MARS is inherently order-insensitive. For example, in our toy example (line 263) we show two CBE vectors, L₁ = [1, 2, 3, 4, 5] and L₂ = [5, 5, 3, 2, 2]: they differ greatly element-wise, yet their Wasserstein distance is small and they are correctly assigned to the same cluster.

Moreover, all of our main experiments on CIFAR-10 and CIFAR-100 use ResNet-18—an architecture with batch-normalization layers—and still demonstrate strong defense performance, confirming that MARS remains effective even when standard normalization may disrupt simple ordering assumptions. We will clarify this point in the revised manuscript.

The definition of “Backdoor Energy” is somewhat abstract. A diagram or intuition about what a “malignant neuron” looks like (activation-wise) might help reader understand the intuition.

Thank you for this helpful suggestion. We agree that a visual intuition can greatly aid understanding. In the revised manuscript, we will include a schematic illustration to show typical activation patterns of “malignant” versus benign neurons.

How does MARS perform if benign updates contain high-magnitude neurons due to data heterogeneity? Could this lead to false positives?

Thank you for raising this important concern. We have evaluated the impact of data heterogeneity on MARS in Appendix G (page 17). As shown in Fig. 3, when using a Dirichlet sampling parameter of α = 0.2—which induces extreme heterogeneity across client data distributions—existing defenses suffer dramatic performance degradation. In contrast, MARS maintains a true-positive rate (TPR) of 100% and a false-positive rate (FPR) of 0% under these highly heterogeneous conditions. This result demonstrates that MARS's Wasserstein-distance-based clustering remains robust even when benign updates contain high-magnitude neurons due to non-IID data.
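The Dirichlet-based heterogeneity setup mentioned above (common in FL benchmarks) can be sketched as follows. This is our own illustrative helper, not the paper's code; the function name and defaults are assumptions:

```python
import random

def dirichlet_label_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients, drawing each class's client
    proportions from a symmetric Dirichlet(alpha); smaller alpha means
    more skewed (more heterogeneous) per-client class distributions."""
    rng = random.Random(seed)
    clients = [[] for _ in range(n_clients)]
    for c in sorted(set(labels)):
        idxs = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idxs)
        # Dirichlet sample via normalized Gamma draws
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(n_clients)]
        total = sum(gammas)
        cuts = [int(sum(gammas[:k + 1]) / total * len(idxs))
                for k in range(n_clients)]
        cuts[-1] = len(idxs)  # guard against float rounding
        prev = 0
        for k, cut in enumerate(cuts):
            clients[k].extend(idxs[prev:cut])
            prev = cut
    return clients

# e.g. alpha = 0.2 reproduces the "extreme heterogeneity" regime above
labels = [i % 10 for i in range(1000)]
parts = dirichlet_label_partition(labels, n_clients=5, alpha=0.2)
assert sorted(i for p in parts for i in p) == list(range(1000))
```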

Could this method generalize to detecting other stealthy behaviors (e.g., model inversion or privacy leakage)?

Thank you for this insightful question. MARS is primarily designed to detect backdoor attacks, but in our experiments we also observed strong defense against Byzantine attacks (see Appendix M, page 22), successfully mitigating both data-poisoning and model-poisoning threats. However, the current version of MARS cannot directly detect other stealthy behaviors such as model inversion or privacy leakage. These threats are typically addressed on the client side—for example, by adding Gaussian noise, applying Top-k sparsification, or clipping sensitive parameters before uploads—whereas MARS operates on the server side and does not modify or inspect client-side parameter updates in that manner. Extending MARS to cover privacy-related attacks is an interesting direction for future work, and we plan to explore how MARS could be adapted or combined with client-side defenses to detect or mitigate such stealthy behaviors.
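As a side illustration of the client-side Top-k sparsification mentioned above (a standard technique, not part of MARS; the helper name is ours):

```python
def top_k_sparsify(update, k):
    """Keep the k largest-magnitude coordinates of an update vector,
    zeroing out the rest before upload."""
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(update)]

print(top_k_sparsify([0.1, -2.0, 0.5, 3.0], k=2))  # → [0.0, -2.0, 0.0, 3.0]
```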

How sensitive is the model performance to the percentage of top BE values (e.g., κ = 5%)? Are results robust to this choice?

Thank you for this valuable question. As shown in Table 9 (page 19), when κ varies between 1% and 10%, MARS consistently achieves a 100% true-positive rate (TPR) and 0% false-positive rate (FPR). However, when κ is increased to 20%, the TPR slightly decreases to 94.44% and the FPR rises to 0.35%. For even larger κ, we observe further fluctuations in both TPR and FPR, never quite matching the optimal performance seen at lower values.

This behavior arises because including too many “irrelevant” backdoor energy values dilutes the Wasserstein-distance calculation and thus impairs the K-WMeans clustering. Crucially, since κ\kappa values from 1% to 10% all yield the best possible defense metrics, we conclude that MARS’s performance is indeed robust to the choice of κ\kappa within this range. We will clarify these findings in the revised manuscript.

Consider including runtime/efficiency information.

Thank you for the suggestion. We have empirically evaluated MARS’s runtime in Appendix J (page 20), demonstrating that it runs faster than existing defenses under the same experimental settings. Furthermore, as noted above, we will also add a dedicated discussion of the computational cost of computing Lipschitz constants and Wasserstein distances to the main paper to give readers a clear picture of our method’s efficiency.

Comment

Thank you for your detailed rebuttal and addressed most of my concerns. I will keep my original score.

Comment

We sincerely appreciate your valuable time dedicated to reviewing our manuscript and your constructive feedback. Your insightful comments have been pivotal in polishing our work!

Final Decision

The paper proposes a malicious-aware defense against backdoor attacks in Federated Learning (FL). The proposed defense, MARS, relies on quantifying the backdoor energy (BE) of malignant neurons (without clean data access) and clusters the models via Wasserstein distance. Extensive experiments show that MARS is effective against several FL backdoor attacks and exhibits better performance than several state-of-the-art backdoor defenses.

There were initially concerns, including evaluation with more complex models (e.g., ViT), and questions about the Lipschitz/Wassertein distance usage. There are also questions about the clarity of the methodology. However, the authors have provided additional experiments and also clarified all the questions raised by the reviewers. The reviewers positively support the paper, and I encourage the authors to revise the paper accordingly in the final version.