LayerGuard: Poisoning-Resilient Federated Learning via Layer-Wise Similarity Analysis
A new FL defense achieves state-of-the-art robustness against advanced model poisoning attacks and effectively counters the emerging threat of Targeted Layer Poisoning (TLP) attacks.
Abstract
Reviews and Discussion
This paper proposes LayerGuard, a defense framework against Targeted Layer Poisoning (TLP) and other model poisoning attacks in Federated Learning (FL). The method introduces a novel layer-wise similarity analysis and a dual-weighting aggregation mechanism to better suppress malicious updates while preserving benign ones.
Strengths and Weaknesses
Strengths:
1. The paper is well-structured, making it easy to read.
2. Extensive experiments across various datasets demonstrate the effectiveness of LayerGuard.
Weaknesses:
1. The paper fails to clearly articulate how LayerGuard uniquely addresses the problem beyond previous layer-wise strategies. The motivation remains vague and feels incremental, especially considering that previous work has already hinted at per-layer granularity as a promising direction.
2. While the layer-wise cosine similarity metric (LCSS) and the joint weighting mechanism are well-formulated, the key ideas seem like relatively straightforward extensions of prior defenses (e.g., MultiKrum, FLDetector). The paper lacks a strong theoretical insight or technical innovation that would mark a significant advancement in the field.
3. The experimental evaluation only includes baselines published between 2017 and 2022 (Krum, Trimmed Mean, FLAME, FLTrust, etc.). No baseline from 2023-2025 is considered.
Questions
1. How do the results of this paper compare to the latest methods?
Limitations
The paper fails to clearly articulate how LayerGuard uniquely addresses the problem beyond previous layer-wise strategies. The motivation remains vague and feels incremental, especially considering that previous work has already hinted at per-layer granularity as a promising direction.
Final Justification
While the authors have provided responses, my initial concern regarding the theoretical robustness of the proposed method remains unresolved. Therefore, I will maintain my original score.
Formatting Issues
No formatting issues.
Thank you very much for the valuable comments. Please find our responses below.
Q1&Q2: The paper fails to clearly articulate how LayerGuard uniquely addresses the problem beyond previous layer-wise strategies. The motivation remains vague and feels incremental, especially considering that previous work has already hinted at per-layer granularity as a promising direction.
&
While the layer-wise cosine similarity metric (LCSS) and the joint weighting mechanism are well-formulated, the key ideas seem like relatively straightforward extensions of prior defenses (e.g., MultiKrum, FLDetector). The paper lacks a strong theoretical insight or technical innovation that would mark a significant advancement in the field.
R1&R2: Thank you for the insightful comment. TLP attacks are not applied to the entire model but rather to specific layers. In contrast, most existing defenses perform analysis at the whole-model level. Therefore, to defend against TLP attacks, one naturally considers fine-graining the defense by shifting the perspective from the entire model to individual layers. This idea is intuitive and easy to come up with, as it follows a natural progression. It represents the first and most straightforward step toward addressing TLP.
However, simply recognizing this step is far from sufficient to build a defense that is stable, robust, and effective against TLP attacks. Every step beyond this point introduces substantial challenges and presents meaningful research problems:
1. Layers differ in function, parameterization, and update scale, so malicious updates may behave differently across layers. Even if a method like Layer-wise MultiKrum already knew which layers are poisoned, it would still struggle to identify malicious updates in some of them, because in certain layers the Krum distances of benign and malicious updates are almost indistinguishable (this issue is detailed in Appendix-14 of [11]). Designing a robust metric that amplifies the difference between benign and malicious updates is therefore a highly challenging problem. To address this, LayerGuard proposes LCSS, which is not a plain cosine similarity but a local-similarity-based averaging cluster metric designed to expose the collusive behavior of attacks; it detects malicious updates in a way that is independent of the specific properties of each layer (a minimal sketch of this idea appears at the end of this response). Experiments also validate its superior performance.
2. Analyzing only intra-layer information is insufficient to defend against TLP attacks, as confirmed by our ablation study in Section 4.4. Layer-wise MultiKrum only analyzes information within each layer. In contrast, our method not only analyzes intra-layer information but also, innovatively, combines it with inter-layer information. In the aggregation stage, LayerGuard introduces a joint weighting mechanism that assesses the credibility of each client in each layer from two complementary perspectives: user-level weights capture inter-layer behavior, while layer-specific weights reflect intra-layer behavior.
3. The most insidious aspect of TLP attacks is that they affect only selected layers while leaving others largely benign. Suppressing malicious updates in targeted layers without disrupting benign ones in non-targeted layers is highly challenging. Unlike Layer-wise MultiKrum, which applies a fixed treatment strategy across different layers, LayerGuard performs adaptive handling: in the Layer Update Boundary Definition phase, each layer is assigned its own benign boundary, allowing reasonable weights to be assigned even to the benign updates of malicious users (i.e., in non-targeted layers).
The solutions we propose to address the above research challenges are the true innovations of our approach. The key idea of LayerGuard is not a straightforward extension of prior methods like MultiKrum or FLDetector, but a defense framework designed to tackle TLP-specific difficulties in a principled and robust way.
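For concreteness, the minimal Python sketch below illustrates the layer-wise, local-similarity idea behind LCSS mentioned in point 1 above. It is not the paper's exact LCSS formulation: the top-k local averaging, the choice of k, and all variable names are illustrative assumptions.

```python
import numpy as np

def layerwise_scores(layer_updates, k=5):
    """Illustrative layer-wise similarity scoring (not the paper's exact LCSS).

    layer_updates: dict mapping layer_name -> array of shape (num_clients, layer_dim)
    Returns: dict mapping layer_name -> per-client score, where a high score
    means a client's update is unusually similar to its nearest neighbours in
    that layer (a possible sign of collusion).
    """
    scores = {}
    for layer, U in layer_updates.items():
        # cosine similarity between every pair of client updates in this layer
        Un = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
        S = Un @ Un.T
        np.fill_diagonal(S, -np.inf)          # ignore self-similarity
        # average similarity to the k most similar other clients: colluding
        # clients form a tight cluster, so this local average is higher for
        # them than for heterogeneous (non-IID) benign clients
        k_eff = min(k, U.shape[0] - 1)
        topk = np.sort(S, axis=1)[:, -k_eff:]
        scores[layer] = topk.mean(axis=1)
    return scores
```

Because colluding malicious clients pursue a shared objective, their updates form a tight cluster in the poisoned layers and receive high scores there; the user-level and layer-specific weights described in points 2 and 3 then act on these scores.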
Q3: The experimental evaluation only includes baselines published between 2017 and 2022 (Krum, Trimmed Mean, FLAME, FLTrust, etc.). No baseline from 2023 - 2025 is considered.
R3: Thank you for this valuable suggestion. In the main experiments, we have added two stronger and more recent baselines for comparison:
Xu et al. (2024) – Dual Defense: Enhancing Privacy and Mitigating Poisoning Attacks in Federated Learning, NeurIPS 2024.
Yan et al. (2023) – RECESS Vaccine for Federated Learning: Proactive Defense Against Model Poisoning Attacks, NeurIPS 2023.
Following the suggestion in the original RECESS paper, we adopt the variant RECESS with Median when evaluating against MPAF and backdoor attacks.
Under untargeted attacks (additions to Table 1):
| Defense | Dataset | No Attack | Lie | Fang | Min-Max | Min-Sum | MPAF |
|---|---|---|---|---|---|---|---|
| RECESS | CINIC | 53.41 | 49.57 | 51.13 | 50.29 | 51.38 | 50.22 |
| RECESS | FashionMNIST | 87.74 | 86.58 | 86.71 | 86.14 | 86.60 | 84.92 |
| RECESS | CIFAR-10 | 63.88 | 59.10 | 60.99 | 61.32 | 62.16 | 60.75 |
Under the LP attack (additions to Table 2):
| Defense | Model (Dataset) | Acc | BSR |
|---|---|---|---|
| RECESS | VGG19 (CIFAR-10) | 74.65 | 78.44 |
| RECESS | CNN (FashionMNIST) | 88.43 | 76.59 |
| RECESS | ResNet18 (CIFAR-10) | 70.74 | 87.10 |
| DDFed | VGG19 (CIFAR-10) | 73.26 | 83.11 |
| DDFed | CNN (FashionMNIST) | 87.95 | 94.63 |
| DDFed | ResNet18 (CIFAR-10) | 68.07 | 94.44 |
RECESS demonstrates strong defense performance against untargeted attacks but still falls short of LayerGuard, whose advantage stems from its fine-grained evaluation of client credibility across layers. Under the LP attack, both RECESS and DDFed fail to defend effectively, because defenses based on whole-network analysis cannot counter the layer-wise, fine-grained nature of TLP attacks.
Thank the authors for their rebuttal. Most of my previous concerns have been adequately addressed, and I will revise my score accordingly (before the official deadline). However, I still have some concerns regarding the theoretical robustness of the proposed method. Specifically, while the addition of a user-level filtering mechanism is intuitively appealing, it remains unclear whether this is sufficient to fundamentally overcome the limitations of layer-wise detection. For example, in scenarios where only a few layers are poisoned and these poisoned updates are crafted to closely resemble benign ones—as the authors themselves note (R1&R2-1)—such attacks may remain indistinguishable under the current framework. I would encourage the authors to provide a more rigorous theoretical analysis to better support the robustness claims of their method in such cases.
Thank you for your feedback and recognition of our previous response. We attempt to further address your concern.
Q1: However, I still have some concerns regarding the theoretical robustness of the proposed method. Specifically, while the addition of a user-level filtering mechanism is intuitively appealing, it remains unclear whether this is sufficient to fundamentally overcome the limitations of layer-wise detection. For example, in scenarios where only a few layers are poisoned and these poisoned updates are crafted to closely resemble benign ones—as the authors themselves note (R1&R2-1)—such attacks may remain indistinguishable under the current framework. I would encourage the authors to provide a more rigorous theoretical analysis to better support the robustness claims of their method in such cases.
R1: In TLP attacks, malicious updates in most poisoned layers tend to resemble benign updates, as illustrated in Appendix-14 of [11]. Moreover, it is indeed possible that only a small number of layers are poisoned, especially when the model is approaching convergence.
Nonetheless, our defense remains effective for two main reasons:
-
The similarity metric we propose—LCSS—is not easily misled by malicious updates that imitate benign ones. It can still effectively identify malicious updates, regardless of which layer they appear in.
-
The User-Level Weight Assignment is unaffected by the number of poisoned layers. This is because the purpose of the user-level weight calculation is simply to determine whether a user is likely to be malicious, regardless of the specific layer(s) in which they are detected. Even if only a few layers are poisoned, anomalous users in those layers can still be flagged and assigned lower user-level weights.
We appreciate your suggestion to strengthen the theoretical analysis, and we plan to include additional discussion in the revised version of the paper.
This paper introduces LayerGuard, a robust defense framework specifically designed to counter Targeted Layer Poisoning (TLP) attacks in federated learning. LayerGuard effectively identifies anomalous clients and adaptively detects compromised layers through fine-grained, layer-wise similarity analysis and the clustering behavior of malicious updates. A key innovation is its proposed joint weighting mechanism, which evaluates client credibility at the layer level from both inter-layer and intra-layer perspectives, enabling dynamic detection and adaptive aggregation. Extensive experiments demonstrate LayerGuard's superior performance compared to existing defense methods.
Strengths and Weaknesses
Strengths
-
This paper identifies the limitations of existing defense mechanisms against Target Layer Poisoning (TLP) attacks and non-independent and identically distributed (non-IID) data, thereby establishing a strong rationale and necessity for the proposed LayerGuard.
-
LayerGuard introduces several original contributions that effectively mitigate TLP attacks. These include the Layer-wise Cosine Similarity Score (LCSS), mechanisms for detecting anomalous users and high-risk layers, and a dual-level (user and layer-wise) weighted aggregation mechanism.
-
The experimental evaluation is comprehensive, spanning diverse datasets, models, attack types, and federated learning settings. The results convincingly demonstrate LayerGuard's superior performance in defense effectiveness, detection accuracy, and robustness.
Weaknesses
-
It is recommended that the authors include theoretical analyses of the LCSS or weighted mechanism, providing theoretical bounds on aspects such as convergence or robustness.
-
The paper mentions a potential slight decrease in LayerGuard's accuracy when no attacks are present. This observation requires a clearer explanation and quantification. Furthermore, an analysis of the trade-off between defensive robustness and benign performance for the system would be beneficial.
-
The authors did not clarify the number of classes for the three datasets in Table 1. It is recommended to use datasets with different class numbers to enhance the credibility of the experimental results.
-
The paper currently lacks a detailed quantitative analysis of LayerGuard's computational overhead. We request the authors to provide a comprehensive breakdown of their computational complexity and practical runtime performance to thoroughly assess their scalability in large-scale federated learning deployments.
-
There are a few minor formatting issues throughout the paper. For instance, on page 4, "i, j ∈ 1, . . . , N" is missing parentheses and should be corrected to "i, j ∈ {1, . . . , N}".
Questions
-
The paper in section 4.3 states that LayerGuard requires more than one malicious client to be effective. Could the authors please elaborate on the fundamental reasons for this limitation?
-
The benign boundary calculation in LayerGuard relies on the assumption that malicious clients constitute less than 50% of the total. Please clarify the specific impact on LayerGuard's effectiveness if the actual poisoning rate exceeds this threshold.
-
LayerGuard assigns user-level weights based on the number of high-risk layers identified for a client. However, is it possible for a malicious client to target only a small number of layers? In this case, the user-level weight assigned by the scheme would be relatively large, even though it should be smaller in practice.
Limitations
yes.
Formatting Issues
None.
We thank the reviewer for the thoughtful comments. We provide our responses below.
Q1: It is recommended that the authors include theoretical analyses of the LCSS or weighted mechanism, providing theoretical bounds on aspects such as convergence or robustness.
R1: We appreciate the reviewer’s suggestion regarding the theoretical analysis of LCSS and the proposed weighting mechanism. While LayerGuard is currently motivated and validated from a practical perspective—demonstrating robust empirical performance under a wide range of attacks and settings—we acknowledge that formal theoretical guarantees (e.g., convergence or robustness bounds) would further strengthen the framework's rigor and completeness.
However, providing such analysis is non-trivial due to the layer-wise structure and adaptive weighting scheme involved, which introduce strong interdependencies across clients and layers. We believe that a comprehensive theoretical analysis requires dedicated future work, potentially combining tools from robust statistics and federated optimization theory.
We have clarified this limitation and our plan to explore it further in future versions of the paper.
Q2: The paper mentions a potential slight decrease in LayerGuard's accuracy when no attacks are present. This observation requires a clearer explanation and quantification. Furthermore, an analysis of the trade-off between defensive robustness and benign performance for the system would be beneficial.
R2: We appreciate the reviewer’s suggestion. As briefly explained in Sec. 3.5 and Sec. 4.2, we now provide a more detailed clarification.
In the absence of malicious clients, the Anomalous User Identification module still filters users based on LCSS in each layer. As a result, some benign users with relatively high LCSS values may be mistakenly included in the anomalous user set. These users will be assigned lower weights in both the user-level and layer-specific weighting processes, which leads to a slight drop in accuracy even without attacks.
The High-Risk Layer Detection mechanism helps alleviate this issue. Since all users are benign, it is rare for any client to consistently exhibit high LCSS across multiple layers. Therefore, even if a benign user appears anomalous in a certain layer, it will not be assigned a significantly low user-level weight, which helps maintain overall performance in the benign setting.
Q3: The authors did not clarify the number of classes for the three datasets in Table 1. It is recommended to use datasets with different class numbers to enhance the credibility of the experimental results.
R3: Thank you for pointing this out. We clarified the number of classes for FashionMNIST and CIFAR-10 in Appendix B.1, but we did not explicitly state that CINIC also contains 10 classes. We will make this clearer in the revised version.
As for your suggestion to include datasets with varying numbers of classes to enhance the credibility of the results, we agree that this is a valuable direction. We plan to explore this aspect in future work to further validate the generality of LayerGuard under diverse classification settings.
Q4: The paper currently lacks a detailed quantitative analysis of LayerGuard's computational overhead. We request the authors to provide a comprehensive breakdown of their computational complexity and practical runtime performance to thoroughly assess their scalability in large-scale federated learning deployments.
R4: Thank you for raising this important concern. We now provide a detailed analysis of LayerGuard's computational complexity and its practical scalability in federated learning. The dominant cost of LayerGuard arises from computing pairwise cosine similarities per layer. Let N be the number of clients, L the number of parameter layers (e.g., weights and biases), and d the average parameter dimensionality per layer. The overall time complexity of LayerGuard is

O(L · N² · d + L · N · d).

The first term corresponds to computing cosine similarities between every pair of client updates at each layer, and the second term stems from the final weighted aggregation step. The rest of the pipeline (user-level weight computation, layer-specific weight adjustment, etc.) contributes only lower-order terms, which are negligible compared to the similarity computations.
We evaluated the average per-round runtime of LayerGuard on different models under untargeted attack settings.
| Model (Dataset) | FedAvg (s) | LayerGuard (s) |
|---|---|---|
| CNN (CIFAR-10) | 13.85 | 29.72 |
| VGG19 (CIFAR-10) | 23.21 | 53.20 |
We believe that LayerGuard is scalable in large-scale federated learning (FL) systems. Although it theoretically incurs a cost that is quadratic in the number of clients N, i.e., O(L · N² · d), the design of LayerGuard is naturally parallelizable. Since the similarity computations and subsequent analysis are independent across layers, both the cosine similarity computations and the layer-wise aggregation can be distributed across multiple computational threads or devices.
This parallel structure enables efficient execution even in high-dimensional and large-client FL scenarios, making the overhead practically manageable on modern hardware.
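As a rough, self-contained illustration of both points above (the per-layer O(N² · d) pairwise-similarity cost and the layer-level parallelism), here is a minimal sketch; the thread pool, helper names, and worker count are our own illustrative choices rather than the paper's implementation:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def pairwise_cosine(U):
    """Pairwise cosine similarities for one layer: U has shape (N, d_l),
    so this step costs O(N^2 * d_l) time and O(N^2) memory."""
    Un = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    return Un @ Un.T

def all_layer_similarities(layer_updates, workers=8):
    """layer_updates: dict mapping layer_name -> (N, d_l) array of client updates.
    Layers are independent, so the per-layer computations can be dispatched in
    parallel; the total serial cost is O(L * N^2 * d)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sims = pool.map(pairwise_cosine, layer_updates.values())
    return dict(zip(layer_updates.keys(), sims))
```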
Q5: There are a few minor formatting issues throughout the paper. For instance, on page 4, "i, j ∈ 1, . . . , N" is missing parentheses and should be corrected to "i, j ∈ {1, . . . , N}".
R5: Thank you for pointing out the formatting issue. We will revise the notation on page 4 from "i, j ∈ 1, . . . , N" to "i, j ∈ {1, . . . , N}" and carefully check the rest of the paper to fix any remaining minor formatting inconsistencies in the final version.
Q6: The paper in section 4.3 states that LayerGuard requires more than one malicious client to be effective. Could the authors please elaborate on the fundamental reasons for this limitation?
R6: Thank you for the question. This limitation arises from the core idea of LCSS, which is to exploit the observation that the similarity among malicious updates tends to be higher than that among benign updates. When there is only one malicious client, this intra-group similarity pattern no longer exists, making it difficult for our method to distinguish the malicious update from benign ones. Therefore, LayerGuard is only effective when there are at least two malicious clients.
Q7: The benign boundary calculation in LayerGuard relies on the assumption that malicious clients constitute less than 50% of the total. Please clarify the specific impact on LayerGuard's effectiveness if the actual poisoning rate exceeds this threshold.
R7: Thank you for your question. If the proportion of malicious clients exceeds 50%, the computation of the benign boundary in the Layer Update Boundary Definition step becomes unreliable, as it may include malicious updates. This contamination can cause the estimated benign boundary to become abnormally high. As a result, some malicious updates may appear close to or even below the boundary, and thus be assigned relatively high layer-wise weights—potentially even reaching 1. Consequently, the influence of these poisoned updates may not be properly suppressed, leading to a rise in the Backdoor Success Rate (BSR).
Q8: LayerGuard assigns user-level weights based on the number of high-risk layers identified for a client. However, is it possible for a malicious client to target only a small number of layers? In this case, the user-level weight assigned by the scheme would be relatively large, even though it should be smaller in practice.
R8: Thank you for pointing this out. This situation—where a malicious client only poisons a small number of layers—can indeed occur, especially when the model is approaching convergence. However, LayerGuard’s High-Risk Layer Detection mechanism is specifically designed to mitigate such cases. When attacks are concentrated on only a few layers, those layers still tend to stand out in the similarity analysis and are likely to be identified as high-risk.
Even if some detection error occurs, as discussed in Sections 3.5 and 3.6, the number of high-risk layers for such clients typically remains low. This still results in a noticeable penalty in user-level weight, ensuring that their poisoned updates receive lower aggregation weights. Thus, the impact of malicious updates remains effectively suppressed.
Thanks for the detailed response. I have no further questions.
This paper proposes a new defense mechanism against poisoning attacks in Federated Learning (FL) by detecting anomalies on a layer-wise basis, making it particularly effective against targeted layer poisoning attacks. The defense then uses a joint weighting mechanism during aggregation to reduce the influence of malicious updates. Experimental results suggest that this method outperforms prior defenses like FLMAE and FLTrust under targeted poisoning attacks.
Strengths and Weaknesses
Strengths:
- The idea of leveraging layer-wise anomaly detection to identify malicious clients is interesting and addresses a relatively unexplored attack vector.
- The method claims effectiveness against both targeted and untargeted poisoning attacks.
Weaknesses:
- The method heavily relies on heuristic thresholding. For example, the malicious boundary is defined as the minimum similarity score among clients flagged as anomalous in a given layer. However, there is no justification provided for this choice or explanation of why it is optimal. It is strongly recommended that an ablation study be conducted to evaluate the impact of these thresholds, and that guidance be provided for setting them.
- The lack of theoretical guarantees. The defense's success seems to depend on assumptions that are not clearly formalized. The threat model should be made more explicit—what exact conditions must be met for this method to remain effective?
- The motivation to defend against targeted layer poisoning (TLP) attacks is only weakly justified. While Table 2 shows that existing defenses like FLAME and FLTrust underperform in this setting, the broader relevance of such attacks remains unclear. And I have another concern about whether the method is robust under standard (non-layer-targeted) backdoor attacks. Demonstrating performance in such settings would significantly improve the paper’s generalizability and impact.
- Equation (3) appears to implicitly assume client collaboration. The threshold τ₀ and the iterative process for its update seem ad hoc. There's no clear stopping criterion or explanation for the choice of step size (0.05). This raises questions about robustness, e.g., what happens if the step size or initial value is changed?
- The surprisingly low Mean Accuracy (MA), even in the no-attack scenario, raises concerns about the experimental setup or model convergence. Additional diagnostics or explanations are needed.
- The comparison baselines are outdated—most defenses cited predate 2023. Stronger, more recent defenses should be included for a fair evaluation.
Questions
-
Conduct thorough ablation studies for all threshold-related parameters and justify design choices.
-
Clearly define the threat model and discuss the assumptions under which this defense is expected to work.
-
Evaluate the method against stronger, more recent baselines, including:
Xu et al. (2024). Dual Defense: Enhancing Privacy and Mitigating Poisoning Attacks in Federated Learning. NeurIPS 2024.
Xie et al. (2024). FedRedefense: Defending Against Model Poisoning Attacks Using Model Update Reconstruction Error. ICML 2024.
Yazdinejad et al. (2024). A Robust Privacy-Preserving Federated Learning Model Against Model Poisoning Attacks. IEEE TIFS.
-
Test the defense under general backdoor attacks, beyond just layer-specific ones, to demonstrate broader applicability.
-
Revise and validate the experimental setup, particularly in the benign case, to ensure baseline accuracy is reasonable.
Limitations
Not provided
Formatting Issues
N/A
We thank the reviewer for the insightful suggestions, which help strengthen the completeness of this work.
Q1: Conduct thorough ablation studies for all threshold-related parameters and justify design choices.
R1.1-τ₀ and step size: In our method, the parameters τ₀ = 0.95 and step size = 0.05 are used to detect anomalous users that exhibit high Layer-wise Cosine Similarity Scores (LCSS) in each layer. Since different layers may have different parameter distributions, the LCSS of malicious updates can vary significantly across layers. For example, in layer A, malicious updates may lie in the range [0.90, 0.94], while in layer B they may lie in [0.80, 0.83].
Therefore, starting from τ₀ and gradually decreasing it with a fixed step size allows us to adaptively identify anomalous users in each layer while avoiding misclassification of benign users. In this work, we set τ₀ = 0.95 and step size = 0.05 based on the trends observed in Figure 2, as explained in Section 3.4.
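A minimal sketch of this descending-threshold search is shown below, assuming the stopping rule described in the paper (stop at the largest threshold whose flagged set is non-empty); the floor value tau_min and the rounding are our own safeguards, not parameters from the paper:

```python
def anomalous_users(lcss, tau0=0.95, step=0.05, tau_min=0.5):
    """lcss: per-client LCSS values for a single layer.
    Returns (indices of flagged clients, threshold at which they were flagged)."""
    tau = tau0
    while tau >= tau_min:
        flagged = [i for i, s in enumerate(lcss) if s >= tau]
        if flagged:                      # largest threshold yielding a non-empty set
            return flagged, tau
        tau = round(tau - step, 6)       # fixed decrement; rounding avoids float drift
    return [], tau_min                   # nothing flagged above the floor
```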
Below, we conduct an ablation study to demonstrate that selecting a reasonable τ₀ and step size is a flexible process, and it becomes even easier with the help of distributions such as those shown in Figure 2. The focus is on step size, since τ₀ = 0.95 is simply the result of the first decrement from 1 (i.e., 1 − 0.05 = 0.95). Therefore, we experimentally evaluate how the step size affects the defense performance.
| step size | Untargeted Attack (Acc) | LP Attack (Acc/BSR) |
|---|---|---|
| 0.01 | 62.85 | 86.76/4.30 |
| 0.03 | 64.12 | 89.11/1.05 |
| 0.05 | 63.91 | 89.88/0.41 |
| 0.07 | 63.99 | 90.01/0.45 |
| 0.1 | 63.84 | 88.68/0.38 |
| 0.15 | 62.77 | 85.32/0.43 |
| 0.2 | 61.68 | 81.91/0.39 |
The experimental settings are the same as those in Section 4.3. From the results, we observe that when the step size is set between 0.05 and 0.1, the defense performance shows no significant difference. A smaller step size leads to a higher BSR, because it may fail to cover most of the malicious users' LCSS, resulting in some malicious users being assigned larger user-level weights, which in turn increases the BSR. On the other hand, a larger step size reduces accuracy, because although it can cover malicious updates, it may also mistakenly include some benign updates. This causes certain benign users to be assigned smaller user-level weights, ultimately degrading the accuracy of the main task.
R1.2 - malicious boundary: The Layer-Specific Weight Assignment essentially assigns linearly decreasing penalty weights based on the LCSS values: any user with an LCSS value higher than the malicious boundary is assigned a layer-specific weight of 0. In Anomalous User Identification, we determine the set of anomalous users in each layer, that is, a group of users with relatively high LCSS values. Therefore, the most reasonable and intuitive choice is to set the malicious boundary to the minimum LCSS value within the set of anomalous users.
In this way, all users in the anomalous set, whose LCSS values are greater than or equal to this minimum, are assigned the lowest layer-specific weight, i.e., 0. A sketch of this weighting rule is given below.
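The sketch shows one way to realize the linearly decreasing penalty just described; the lower reference point (the layer's minimum LCSS maps to weight 1) is our own assumption for illustration and may differ from the paper's exact formula:

```python
import numpy as np

def layer_specific_weights(lcss, boundary):
    """lcss: per-client LCSS values in one layer (higher = more suspicious).
    boundary: the malicious boundary, i.e., the minimum LCSS over the anomalous set.
    Clients at or above the boundary receive weight 0; below it, weights decrease
    linearly toward 0 as the LCSS approaches the boundary."""
    lcss = np.asarray(lcss, dtype=float)
    span = max(boundary - lcss.min(), 1e-12)   # guard against a degenerate layer
    return np.clip((boundary - lcss) / span, 0.0, 1.0)
```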
We now conduct an ablation study to evaluate how different settings of the malicious boundary affect the defense performance. Specifically, we test four strategies: setting the boundary to the maximum, median, average, or minimum LCSS value within the anomalous user set.
| Malicious boundary | Untargeted Attack (Acc) | LP Attack (Acc/BSR) |
|---|---|---|
| maximum | 63.87 | 72.62/16.11 |
| median | 64.06 | 74.13/11.48 |
| average | 63.81 | 73.99/10.87 |
| minimum | 63.91 | 75.41/5.22 |
The experimental settings are the same as those in Section 4.4. As the malicious boundary increases, malicious updates in the anomalous user set whose LCSS values fall below the boundary are assigned larger layer-specific weights, which ultimately leads to a higher BSR.
Q2: Clearly define the threat model and discuss the assumptions under which this defense is expected to work.
R2: We sincerely apologize for not clearly stating our threat model earlier. We will add a dedicated Threat Model section after Related Works in the revised version to clarify this. Below, we provide a concise description of the threat model.
For the attacker:
- Attacker's Goal. The attacker's goal is to perform both untargeted model poisoning attacks and targeted layer poisoning (TLP) attacks. It is important to note that we do not define TLP as a backdoor attack in general. In this paper, the introduced LP attack (Section 2.2) is a backdoor-based TLP attack, and we do not discuss other forms of TLP attacks because this fine-grained poisoning strategy has only recently been proposed. The LP attack is the first and, to our knowledge, the only representative of such attacks so far. Nevertheless, we believe that our fine-grained detection method is general and can be extended to future variants of TLP attacks.
- Attacker's Capabilities. The attacker controls a subset of clients and can fully manipulate their gradient updates in each round. The attacker has access only to the local training information of these clients and cannot compromise or interfere with the server.
- Attacker's Knowledge. The attacker has no knowledge of the gradient updates of benign clients.
For the defender:
- Defender's Knowledge. The defender (i.e., the server) does not know the distribution of client-side data, nor the number or identities of malicious clients. The only available information is the uploaded gradient updates from all participating clients.
- Defender's Assumptions. There are two key conditions for the defense to be effective. First, the number of attackers must be greater than one, because the core idea of LCSS is to capture that the similarity among malicious updates is higher than that among benign updates. Second, the number of attackers must be smaller than the number of benign users, which is a necessary condition for the High-Risk Layer Detection and Layer Update Boundary Definition to function correctly.
Q3: Evaluate the method against stronger, more recent baselines.
R3: Thank you for this valuable suggestion. In the main experiments, we have added two stronger and more recent baselines for comparison:
Xu et al. (2024) – Dual Defense: Enhancing Privacy and Mitigating Poisoning Attacks in Federated Learning, NeurIPS 2024.
Yan et al. (2023) – RECESS Vaccine for Federated Learning: Proactive Defense Against Model Poisoning Attacks, NeurIPS 2023.
Following the suggestion in the original RECESS paper, we adopt the variant RECESS with Median when evaluating against MPAF and backdoor attacks.
Under untargeted attacks (additions to Table 1):
| Defense | Dataset | No Attack | Lie | Fang | Min-Max | Min-Sum | MPAF |
|---|---|---|---|---|---|---|---|
| RECESS | CINIC | 53.41 | 49.57 | 51.13 | 50.29 | 51.38 | 50.22 |
| RECESS | FashionMNIST | 87.74 | 86.58 | 86.71 | 86.14 | 86.60 | 84.92 |
| RECESS | CIFAR-10 | 63.88 | 59.10 | 60.99 | 61.32 | 62.16 | 60.75 |
Under the LP attack (additions to Table 2):
| Defense | Model (Dataset) | Acc | BSR |
|---|---|---|---|
| RECESS | VGG19 (CIFAR-10) | 74.65 | 78.44 |
| RECESS | CNN (FashionMNIST) | 88.43 | 76.59 |
| RECESS | ResNet18 (CIFAR-10) | 70.74 | 87.10 |
| DDFed | VGG19 (CIFAR-10) | 73.26 | 83.11 |
| DDFed | CNN (FashionMNIST) | 87.95 | 94.63 |
| DDFed | ResNet18 (CIFAR-10) | 68.07 | 94.44 |
RECESS demonstrates strong defense performance against untargeted attacks but still falls short of LayerGuard, whose advantage stems from its fine-grained evaluation of client credibility across layers. Under the LP attack, both RECESS and DDFed fail to defend effectively, because defenses based on whole-network analysis cannot counter the layer-wise, fine-grained nature of TLP attacks.
Q4: Test the defense under general backdoor attacks, beyond just layer-specific ones, to demonstrate broader applicability.
R4: Our work primarily focuses on Targeted Layer Poisoning (TLP) attacks, a newly proposed and highly stealthy poisoning strategy, as well as untargeted attacks, which typically cause more severe degradation in model performance.
Nevertheless, to demonstrate LayerGuard’s versatility, we have additionally conducted experiments against constrain-and-scale, a representative general backdoor attack.
Under the constrain-and-scale attack:

| Dataset | Metric | Krum | LayerGuard |
|---|---|---|---|
| CINIC | Acc | 25.95 | 53.41 |
| CINIC | BSR | 98.61 | 0.26 |
| FashionMNIST | Acc | 64.16 | 86.98 |
| FashionMNIST | BSR | 99.14 | 0.36 |
| CIFAR-10 | Acc | 32.33 | 63.96 |
| CIFAR-10 | BSR | 99.54 | 0.17 |
Q5: Revise and validate the experimental setup, particularly in the benign case, to ensure baseline accuracy is reasonable.
R5: We appreciate the reviewer’s concern. The experimental setup in our main experiments mostly follows the configurations referenced in [26] and [11]. The reproduced baseline results are consistent with those reported in their original papers, which supports that the baseline accuracy is reasonable.
We believe the relatively lower Mean Accuracy under the benign setting is primarily due to the high degree of non-IID data distribution in this setup. This reflects a more realistic and challenging federated learning scenario.
Thanks to the authors for their efforts in addressing my concerns. While I appreciate the idea of incorporating layer-wise and client-wise information, my main concern about the method's heavy reliance on heuristic thresholding still remains. There are multiple heuristic thresholds involved, and the process often follows a "try-until-it-works" approach.
For example,
"We then determine the final anomalous set by selecting the largest threshold in the descending sequence such that is non-empty."
-
It’s unclear why the first threshold that yields a non-empty set is sufficient to capture all malicious clients, especially in the absence of theoretical justification.
-
Similarly, for Eqn. (5)—is it possible that the resulting set contains only one layer? If so, how reliable is the method when only one layer is flagged as highly risky?
-
Another point: based on Figure 2 and the overall methodology, malicious clients tend to exhibit the highest LCSS values on the attacked layers. Doesn’t this imply that their model updates are also quite similar? If so, is there an implicit assumption that malicious clients are aligned in their updates? Would the method still work if the attackers are not coordinated, or use adaptive strategies to intentionally reduce similarity?
-
Regarding DDFed, while I recognize it is not strong under LP attacks, the results for untargeted attacks are missing. Including them would give a clearer picture of its general robustness.
-
Finally, although the added results with the constrain-and-scale attack are appreciated, they alone aren’t enough to demonstrate the method’s generalizability to broader backdoor attacks. Many modern backdoor attacks avoid obvious scaling and keep poisoned models close to benign ones. Can the method still detect such subtle attacks?
-
Overall, I believe the paper still needs to better clarify the applicability of the method, and explain why the current set of heuristic thresholds is sufficient to consistently detect malicious clients across various attack types.
I would appreciate if the authors could further discuss these points.
Q3: Another point: based on Figure 2 and the overall methodology, malicious clients tend to exhibit the highest LCSS values on the attacked layers. Doesn’t this imply that their model updates are also quite similar? If so, is there an implicit assumption that malicious clients are aligned in their updates? Would the method still work if the attackers are not coordinated, or use adaptive strategies to intentionally reduce similarity?
R3: Thank you for the question. In TLP attacks, the model updates of malicious clients on the attacked layers often exhibit high similarity. As noted in Sec. 3.1, our core idea is inspired by prior work [13, 14] highlighting that a certain degree of similarity among malicious updates is typically necessary to significantly compromise FL. In our setting, such high similarity is not an explicit assumption but rather an observed behavioral outcome: malicious clients share a common attack objective, which naturally drives their updates toward similar directions. For example, in targeted poisoning attacks, malicious clients may use inputs containing the same backdoor trigger or share the same predefined label-mapping objective. In untargeted poisoning attacks, their common goal is to find a malicious update direction that maximizes the degradation of overall model accuracy while remaining stealthy .
Moreover, even if attackers deliberately attempt to reduce their mutual similarity—such as in adaptive strategies—the method remains effective, as shown in our evaluation of the Adaptive Min-Sum attack (Appendix D). In this case, LayerGuard’s accuracy shows a slight decrease but still remains within an acceptable range. Although the Adaptive Min-Sum attack is more stealthy than traditional data poisoning attacks, it does not cause substantial damage to the system. This outcome is the result of a trade-off: in attempting to evade detection by our method, the attackers reduce their mutual similarity and thereby inadvertently weaken the effectiveness of their own attack.
Q4: Regarding DDFed, while I recognize its not strong under LP attacks, the results for untargeted attacks are missing. Including them would give a clearer picture of its general robustness.
R4: We apologize that, due to the limited time available during the rebuttal phase, we were not able to include the untargeted attack results for DDFed in our earlier response. We have since conducted these experiments, and the results are as follows:
| Defense | Dataset | No Attack | Lie | Fang | Min-Max | Min-Sum | MPAF |
|---|---|---|---|---|---|---|---|
| DDFed | CINIC | 52.52 | 46.80 | 48.19 | 47.73 | 47.95 | 46.48 |
| DDFed | FashionMNIST | 86.56 | 82.01 | 85.77 | 83.28 | 82.77 | 82.22 |
| DDFed | CIFAR-10 | 62.38 | 55.33 | 56.79 | 57.24 | 57.61 | 55.15 |
Q5: Finally, although the added results with the constrain-and-scale attack are appreciated, they alone aren’t enough to demonstrate the method’s generalizability to broader backdoor attacks. Many modern backdoor attacks avoid obvious scaling and keep poisoned models close to benign ones. Can the method still detect such subtle attacks?
R5: We thank you for raising this point and would like to respectfully clarify a potential misunderstanding.
The constrain-and-scale attack is specifically designed to produce poisoned models that remain close to benign models in terms of weight distance or update direction. Therefore, it represents a subtle type of attack.
Although this attack applies a scaling operation, our defense method, LayerGuard, does not rely on any clipping-like operation. Instead, it achieves significant defensive effectiveness by analyzing the malicious updates purely in the dimension of update direction. This demonstrates that LayerGuard possesses the capability to withstand such subtle attacks.
Thank you for all the input. I don’t have any further questions at this point. I appreciate all the effort from the authors so far. My last argument is that although the constrain-and-scale attack contains the "constrain" component, it also includes the "scale" part — which causes the updates of the malicious clients to be proportionally larger than those of the benign ones. This leads to their high similarity in LCSS, making the generalizability of the method to other backdoor attacks still not strong.
I’ll take our discussion into account and update my score accordingly before August 13.
We sincerely thank you for your continued engagement and thoughtful feedback. We would like to take this opportunity to briefly clarify our perspective regarding your latest comment.
Q1: My last argument is that although the constrain-and-scale attack contains the "constrain" component, it also includes the "scale" part — which causes the updates of the malicious clients to be proportionally larger than those of the benign ones. This leads to their high similarity in LCSS, making the generalizability of the method to other backdoor attacks still not strong.
R1: We will address your concern from three aspects.
1. The constrain-and-scale attack first performs the constrain step and then applies the scaling operation. The scaling operation, in simple terms, multiplies the final update by a constant. This does not change the update's direction or make it closer to that of other updates; it only changes the magnitude, thereby affecting distance-based metrics such as Euclidean distance but not angle-based metrics (see the one-line derivation after the table below).
2. The TLP attack is itself a subtle type of attack. In a TLP attack, malicious updates in most poisoned layers tend to resemble benign updates, as illustrated in Appendix-14 of [11]. The authors of [11] found that even their proposed adaptive defense could not effectively mitigate such attacks, precisely because in the majority of layers the malicious updates are extremely similar to benign ones.
3. We have newly introduced an advanced and subtle backdoor attack, Attack-Krum-Backdoor (Bhagoji et al., 2019 – Analyzing Federated Learning Through an Adversarial Lens, ICML 2019), as an additional comparison to evaluate the defensive performance of LayerGuard. This attack is highly stealthy because the malicious parameters are close to the benign parameters and thus appear harmless. Its core objective is to make predictions on backdoor samples as accurate as possible while keeping the accuracy on clean samples unaffected. To remain stealthy, the attacker also keeps its updates as close as possible to the average of the benign updates. The FL setup follows that of our main experiments.
Under the Attack-Krum-Backdoor attack:

| Dataset | Metric | Krum | LayerGuard |
|---|---|---|---|
| CINIC | Acc | 22.58 | 53.20 |
| CINIC | BSR | 99.11 | 0.87 |
| FashionMNIST | Acc | 63.45 | 87.19 |
| FashionMNIST | BSR | 96.44 | 0.11 |
| CIFAR-10 | Acc | 28.21 | 63.77 |
| CIFAR-10 | BSR | 98.93 | 0.65 |
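The scale-invariance argument in point 1 can be stated in one line: for any scaling factor $\gamma > 0$ applied to a malicious update $g_m$ and any reference update $g$,

$$
\cos(\gamma g_m,\, g) \;=\; \frac{\langle \gamma g_m,\, g\rangle}{\lVert \gamma g_m\rVert\,\lVert g\rVert} \;=\; \frac{\gamma\,\langle g_m,\, g\rangle}{\gamma\,\lVert g_m\rVert\,\lVert g\rVert} \;=\; \cos(g_m,\, g),
$$

so the scaling step alters Euclidean distances but leaves every cosine-based quantity, including LCSS, unchanged.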
We greatly appreciate your follow-up feedback, which allows us to provide further clarifications and address the remaining concerns in more detail.
Q1: It’s unclear why the first threshold that yields a non-empty set is sufficient to capture all malicious clients, especially in the absence of theoretical justification.
R1: We would like to clarify that our method does not assume the first threshold yielding a non-empty set will necessarily capture all malicious clients. As observed in our experiments, malicious updates typically exhibit relatively high LCSS concentrated within a certain range; however, the threshold at which Anomalous User Identification produces a non-empty set may not fully cover this range. We refer to this as the “near-threshold distribution phenomenon” (see Sec. 3.6). The Layer-Specific Weight Assignment is explicitly designed to address this situation: while the User-Level Weight Assignment handles the set of anomalous users captured at the identified threshold, the Layer-Specific Weight Assignment further assigns lower layer-specific weights to malicious updates that fall just outside this set but still exhibit high LCSS.
Q2: Similarly, for Eqn. (5)—is it possible that the resulting set contains only one layer? If so, how reliable is the method when only one layer is flagged as highly risky?
R2: Thank you for raising this important concern. There are two situations in which the set obtained from Eqn. (5) may indeed contain only a single layer.
The first is when a TLP attack poisons multiple layers, but due to the near-threshold distribution phenomenon (see Sec. 3.6), the sizes of the anomalous user sets vary across layers. In this case, only the layer(s) with the largest anomalous set size will be identified as high-risk. If there is only one such layer, then the high-risk layer set contains just that single layer. Although this results in the loss of information from other layers, it does not significantly affect the calculation of user-level weights. This is because the purpose of the user-level weight calculation is simply to determine whether a user is likely to be malicious, regardless of the specific layer(s) in which they are detected. When the denominator in Eqn. (7) becomes 1, all anomalous users contained in the set for that single layer will receive a user-level weight of 0. Even if not all malicious users are captured in that layer due to the near-threshold distribution phenomenon, the Layer-Specific Weight Assignment will further assign them lower layer-specific weights to mitigate their impact. The primary effect in this case is on benign users: if a benign user is mistakenly included in that single high-risk layer (see Sec. 3.5), their user-level weight will also be set to 0.
The second situation is when all TLP attackers poison only the same single layer. This can occur in practice, especially when the model is near convergence. In such a case, the reasoning above applies directly, and the reliability of the method remains unaffected.
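To make the user-level weighting discussed above concrete, the small sketch below shows one formulation that is consistent with the behaviour we describe (a client flagged in every high-risk layer receives a user-level weight of 0); the paper's exact Eq. (7) may differ, so please treat this as an illustrative assumption rather than the actual definition:

```python
def user_level_weight(client, high_risk_layers, anomalous_sets):
    """high_risk_layers: layers flagged as high-risk in the current round.
    anomalous_sets: dict mapping layer -> set of clients flagged in that layer.
    The weight decreases with the fraction of high-risk layers in which the
    client is flagged; being flagged in all of them gives weight 0 (cf. the
    single-high-risk-layer case discussed above)."""
    if not high_risk_layers:
        return 1.0
    hits = sum(client in anomalous_sets[layer] for layer in high_risk_layers)
    return 1.0 - hits / len(high_risk_layers)
```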
The paper addresses the following vulnerability in FL: existing defenses struggle to detect targeted layer poisoning attacks, which stealthily manipulate only selected layers of the global model. Moreover, since existing defense methods evaluate the model as a whole, they are not able to detect such stealthy attacks. The authors propose LayerGuard, a defense framework that detects malicious clients by performing layer-wise similarity analysis rather than treating the model as a whole. LayerGuard computes per-layer cosine similarity scores to identify suspiciously similar clients and layers likely to be under attack. They compute user-level and layer-specific weights to down-weight updates likely to be poisoned. Their adaptive weighting system helps maintain benign information even from the untargeted layers of malicious users.
Strengths and Weaknesses
Strengths:
- The paper identifies a clear weakness in existing FL defenses: they evaluate only the whole network, missing stealthy attacks that focus on a single layer (or a few layers) of the model.
- The proposed method is technically sound. Layer-wise similarity analysis appears experimentally effective even in scenarios where there is no attack.
- Evaluations are comprehensive, containing various datasets, models, attack types and defense methods.
- An adaptive attack scenario is investigated.
Weakness:
- LayerGuard computes pair-wise cosine similarities for every layer, performs clustering, and maintains per-layer weighting. The paper gives no timing, memory-footprint, or communication-round analysis, leaving open whether it scales to large models or hundreds of clients.
- The authors state that their method works when the number of malicious clients is larger than 1, which we accept as a limitation. However, what if there are multiple malicious clients and they each target different layers of the model in each round? I doubt that the proposed method would be able to detect this. This might be considered an adaptive attack as well.
Questions
see weaknesses
Limitations
yes
Final Justification
The authors were engaging during the rebuttal phase. Going over the discussions with the other reviewers too, I will keep my score as 5. The authors mostly addressed my questions and provided results for more baselines. However, the theoretical background for the proposed method is still missing.
Formatting Issues
none
We thank the reviewer for the thoughtful comments. We provide our responses below.
Q1: LayerGuard computes pair-wise cosine similarities for every layer, performs clustering, and maintains per-layer weighting. The paper gives no timing, memory-footprint, or communication-round analysis, leaving open whether it scales to large models or hundreds of clients.
R1: Thank you for raising this important concern. We now provide a detailed analysis of LayerGuard's computational complexity and its practical scalability in federated learning. The dominant cost of LayerGuard arises from computing pairwise cosine similarities per layer. Let N be the number of clients, L the number of parameter layers (e.g., weights and biases), and d the average parameter dimensionality per layer. The overall time complexity of LayerGuard is

O(L · N² · d + L · N · d).

The first term corresponds to computing cosine similarities between every pair of client updates at each layer, and the second term stems from the final weighted aggregation step. The rest of the pipeline (user-level weight computation, layer-specific weight adjustment, etc.) contributes only lower-order terms, which are negligible compared to the similarity computations. LayerGuard thus exhibits a time complexity comparable to that of other advanced defenses (e.g., Krum), and the overhead remains acceptable even with many clients.
We evaluated the average per-round runtime of LayerGuard on different models under untargeted attack settings.
| Model (Dataset) | FedAvg (s) | LayerGuard (s) |
|---|---|---|
| CNN (CIFAR-10) | 13.85 | 29.72 |
| VGG19 (CIFAR-10) | 23.21 | 53.20 |
We believe that LayerGuard is scalable in large-scale federated learning (FL) systems. Although it theoretically incurs a cost that is quadratic in the number of clients N, i.e., O(L · N² · d), the design of LayerGuard is naturally parallelizable. Since the similarity computations and subsequent analysis are independent across layers, both the cosine similarity computations and the layer-wise aggregation can be distributed across multiple computational threads or devices.
This parallel structure enables efficient execution even in high-dimensional and large-client FL scenarios, making the overhead practically manageable on modern hardware.
Q2: The authors state that their method works when the number of malicious clients is larger than 1, which we accept as a limitation. However, what if there are multiple malicious clients and they each target different layers of the model in each round? I doubt that the proposed method would be able to detect this. This might be considered an adaptive attack as well.
R2: Thank you for raising this insightful point.
In TLP attacks, the choice of poisoned layers is typically driven by specific objectives. For instance, in Layer Poisoning (LP) attacks, layers are selected based on their influence on the Backdoor Success Rate (BSR)—layers with higher impact on BSR are more likely to be targeted. This leads to a natural convergence in target layer selection across attackers, as discussed in Section 5.4 of [11]. That is, to mount an effective attack, adversaries tend to poison the same or similar subset of sensitive layers.
Under the scenario you proposed—where multiple malicious clients each poison different layers in each round—our method's detection accuracy may decrease due to reduced anomaly aggregation in individual layers. However, this strategy also dilutes the strength of the attack itself, making it harder for the malicious gradients to shift the global model meaningfully. As a result, there exists a trade-off for the attacker: while attempting to evade detection by distributing poisoned layers, the attack’s effectiveness is weakened and more likely to be neutralized during aggregation.
This phenomenon highlights an important strength of LayerGuard: by enforcing fine-grained layer-wise analysis, it encourages adversaries to spread their efforts thinly across layers, which in turn reduces attack efficacy.
Thank you for the response. I don't have further questions.
Dear Reviewer,
We sincerely appreciate your time and effort in the review process. Could you kindly confirm whether you have read the authors’ rebuttal and reassessed the paper, if you have not already done so?
As a reminder, Reviewers must participate in discussions with authors before submitting “Mandatory Acknowledgement”.
Best regards, AC
This paper received the following mixed ratings: Accept, Accept, Reject, Reject. Authors present a defense framework called LayerGuard to counter Targeted Layer Poisoning (TLP) attacks in federated learning by analyzing model updates at the layer level. It introduces a new layer-wise similarity analysis and a dual-weighting aggregation mechanism to better suppress malicious updates while preserving benign ones. Experiments show good performance across datasets and attack types. However, reviewers also highlighted various concerns, including the lack of theoretical robustness and comparisons to more recent baselines, multiple heuristic thresholds involved, etc. After the author-reviewer discussion, several concerns remained on the method’s generalizability and robustness. AC encourages the authors to revise the paper to address those concerns raised by the reviewers, and make a future submission. After careful consideration, AC recommends not accepting the paper in its current form.