5.8

/10

Poster4 位审稿人

最低5最高7标准差0.8

4.0

置信度

正确性3.3

贡献度2.3

表达2.8

NeurIPS 2024

RFLPA: A Robust Federated Learning Framework against Poisoning Attacks with Secure Aggregation

Peihua Mai,Ran Yan,Yan Pang

OpenReview PDF

提交: 2024-05-15更新: 2024-11-06

摘要

Federated learning (FL) allows multiple devices to train a model collaboratively without sharing their data. Despite its benefits, FL is vulnerable to privacy leakage and poisoning attacks. To address the privacy concern, secure aggregation (SecAgg) is often used to obtain the aggregation of gradients on sever without inspecting individual user updates. Unfortunately, existing defense strategies against poisoning attacks rely on the analysis of local updates in plaintext, making them incompatible with SecAgg. To reconcile the conflicts, we propose a robust federated learning framework against poisoning attacks (RFLPA) based on SecAgg protocol. Our framework computes the cosine similarity between local updates and server updates to conduct robust aggregation. Furthermore, we leverage verifiable packed Shamir secret sharing to achieve reduced communication cost of $O(M+N)$ per user, and design a novel dot product aggregation algorithm to resolve the issue of increased information leakage. Our experimental results show that RFLPA significantly reduces communication and computation overhead by over $75%$ compared to the state-of-the-art secret sharing method, BREA, while maintaining competitive accuracy.

关键词

federated learningpoisoning attackprivacy protection

评审与讨论

审稿意见

评分: 5置信度: 42024-07-03

The paper proposes RFLPA, which deploys a robustness algorithm based on cosine similarity detection on SecAgg, defending against privacy attacks from the server and poisoning attacks from clients. Furthermore, the paper reduces communication and computation costs through packed Shamir secret sharing and dot-product aggregation.

优点

1.The structure of the paper is clear, providing a comprehensive and detailed explanation of how RFLPA deploys FLTrust's cosine similarity detection scheme on SecAgg and reduces SecAgg's overhead through packed Shamir secret sharing and dot-product aggregation.

2.Experiments show that, compared to the previous work BREA, RFLPA reduces communication and computation costs by 75% while maintaining the same accuracy.

3.The paper also provides a systematic theoretical analysis and proof of RFLPA's convergence and the cryptographic protocols used.

缺点

The paper seems to achieve practical improvements over BREA mainly by reducing communication costs, but the phase of computing cosine similarity among clients introduces additional communication costs. The advantages over other state-of-the-art MPC-based solution ELSA[1], HE & ZKP-based solution RoFL[2], and Blockchain-based solution PBFL[3] are not explained in detail in the paper.
Regarding Byzantine robustness, the paper adopts the FLTrust approach, assuming the server holds clean root dataset. However, this requires a substantial number of clean public samples of all labels, making this assumption impractical in real-world scenarios, which is conflicted to the goal of practical PPFL solution.
The RFLPA framework seems like questionable.As descripted in section 3.2, the security goal is to defend passive inference and active inference attacks both. But the packed secret shares of $g_i$ generted by client i are sent to the server and the server transmit them to the other clients using a broadcast channel. Which means that the server have all shares of the secret $g_i$ and can recover it. So, what kind of role is played by the server?
In the experimental section, a) the robustness testing of BREA only includes two types of Byzantine attacks. The paper lacks strong experiments to demonstrate how BREA performs under more subtle attacks like Krum-attack and Badnet. b) The communication overhead of RFLPA could not be small, because all clients are involved all computation of consin similarity $s^j_k$ of any user pair (i,j), the author should give out the detail results.
The notations and formulations should be more clear.

a) Line95, $D$ is not predefined.

b) Line210, $\tilde{\mathbf{g}}_i$ should be $\bar{\mathbf{g}_i}$ .

[1] Rathee, Mayank, et al. "Elsa: Secure aggregation for federated learning with malicious actors." 2023 IEEE Symposium on Security and Privacy (SP). IEEE, 2023. [2] Lycklama, Hidde, et al. "Rofl: Robustness of secure federated learning." 2023 IEEE Symposium on Security and Privacy (SP). IEEE, 2023. [3] Miao, Yinbin, et al. "Privacy-preserving Byzantine-robust federated learning via blockchain systems." IEEE Transactions on Information Forensics and Security 17 (2022): 2848-2861. [4] Hao, Meng, et al. "Efficient, private and robust federated learning." Proceedings of the 37th Annual Computer Security Applications Conference. 2021.

问题

see the above

局限性

作者回复

2024-08-06

Dear Reviewer VWie,

Thanks for your time and valuable comments. Hope our response below could address your concern.

W1-a&W4-b: Additional communication cost of cosine similarity computation

The additional communication cost introduced by cosine similarity computation is small compared to the communication overhead reduced by our approach. We utilize secret sharing (SS) instead of homomorphic encryption (HE) to compute the cosine similarity since: (1) Existing HE-based frameworks rely on the unrealistic assumption of 2 non-colluding parties. (2) According to Appendix L.7.1, the HE-based methods take massive computation time, which far outweights the communication time introduced by SS-based frameworks.

We leverage packed secret sharing to reduce the communication overhead. Appendix H presents the analysis of the overhead, and here we explain in more details: (1) For N users and M-dimensional gradients, we packed O(N) elements within a single polynomial. Then each user transmits O(max(M/N,1)*N)=O(M+N) (instead of O(MN)) elements. (2) By computing cosine similarity of other users, each user obtains O(N) partial dot products in total. (3) During dot product aggregation, each user secret shares the partial dot products. To reduce overhead, they pack O(N) elements within a single polynomial. Then each user communicates O(N*N/N)=O(N) elements. (4) The user locally computes final secret shares of the consine similarity and uploads them to the server. Since the final shares are secret shares of dot products by packing O(N) secret shares, each user only requires to send O(N/N)=O(1) elements to the server. Benefiting from the packed secret share algorithm, the total per-user communication cost is reduced to O(M+N), a siginificant improvement over the O(MN + N) cost in BREA.

[Table R1. Break down of per-user communication cost. Cosine similarity computation incurred O(N) additional communication cost, lower than the original scale O(M+N).]

Stage	Communication cost	Incurred by cosine similarity computation
Download parameters	O(M)	No
Secret share local gradient	O(M+N)	No
Upload aggregated update	O(M/N+1)	No
Sum (original)	O(M+N)
Secret share partial dot product	O(N)	Yes
Upload final share of dot product	O(1)	Yes
Receive trust score	O(N)	Yes
Sum (cosine similarity)	O(N)

W1-b: Advantages over other solutions

Thanks for your suggestion. We summarize the advantages of our method with in Table R2, with explanation as followed:

ELSA[1]: Firstly, ELSA relies on the vulnerable assumption of 2 non-colluding servers. Secondly, ELSA utilizes a naive and limited method to filter poisonous gradients, i.e., bounding the $l_2$ norm of each gradient. It's completely impractical to generalize their framework to more advance defense method such as Krum. In contrast, our framework is compatible with defense method such as FLTrust and Krum.
RoFL[2]: Similar to ELSA, RoFL is designed specifically for a naive robust aggragation method, norm bounding. Furthermore, RoFL relies on expensive zero-knowledge proofs. It's estimated to take 3.6 hours per round to train a 1.6M MNIST classifier, while for RFLPA it's within 10 minutes in the same setting.
PBFL[3]: We have explained the limitations of PBFL in appendix L.4 and L.7.1, in terms of the assumption of 2 non-colluding parties, and expensive computation cost.
SecureFL[4]: SecureFL also relies on the unrealistic assumption of two non-colluding parties. Additionally, it utilizes expensive MPC and HE methods. The computation time for training a 1.6M MNIST classifier is estimated to be 2.5 hours per round, much higher than that for RFLPA.

[Table R2. Comparison of various solutions. More * implies higher level of computation overhead.]

	Collusion threshold	Robust Aggregation Rule	Computation Overhead
ELSA	1	Norm bound	**
RoFL	O(N)	Norm bound	***
PBFL	1	FLTrust	****
SecureFL	1	FLTrust	***
RFLPA	O(N)	FLTrust	**

W2: Assumption on clean public samples

Thanks for your comment. Firstly, the required sample data is of small size, e.g., 200 samples. The gobal model is robust even when the root dataset diverges slightly from the overall training data distribution. Secondly, we have proposed some remedies in the absence of such dataset (see response to Q1 in global rebuttal), including comparison with global weights and KRUM.

W3: The server have all shares of the secret gi and can recover it

Thanks for your comments. We have explained in Section 4.6 that the secret shares are encrypted by the clients' secret key before sending to the server, and thus the server is ignorance of the values for secret shares given the IND-CPA of the encryption system. Specifically, the clients establish the secret keys with each other through Diffie-Hellman key exchange protocol. During secret sharing, each client u uses the common key $k_{uv}$ to encrypt the message sent to client v, and client v could decrypt the cyphertext with the same key.

W4: Experiments on subtle attacks

Thanks for your suggestion. In our response to Q2 in global rebuttal, we added experiments on the subtle attacks. The empirical results validate RFLPA's robustness against KRUM attack, BadNets, and scaling attacks.

W5: Thanks for pointing out. We will address these issues.

[1]Rathee, Mayank, et al. "Elsa: Secure aggregation for federated learning with malicious actors." 2023 IEEE Symposium on Security and Privacy (SP). IEEE, 2023.

[2]Lycklama, Hidde, et al. "Rofl: Robustness of secure federated learning." 2023 IEEE Symposium on Security and Privacy (SP). IEEE, 2023.

[3]Miao, Yinbin, et al. "Privacy-preserving Byzantine-robust federated learning via blockchain systems." IEEE Transactions on Information Forensics and Security 17 (2022): 2848-2861.

[4]Hao, Meng, et al. "Efficient, private and robust federated learning." Proceedings of the 37th Annual Computer Security Applications Conference. 2021.

2024-08-12

Dear Reviewer,

We are grateful for your constructive feedback. We are happy for your recongnition in the comprehensive and detailed explanation, effectiveness, and systematic theoretical analysis of our paper. Below we summarize the main concerns we addressed in the rebuttal:

Additional communication cost of cosine similarity computation: We provide detailed analysis for the communication cost in each stage. We show that the additional communication cost introduced by cosine similarity computation is small compared to the communication overhead reduced by our approach.

Advantages over other solutions: We summarize the advantage of our solution over the provided baseline in terms of three dimensions: collusion threshold, robust aggregation rule, and computation overhead.

Assumption on clean public samples: We clarify that the required clean root dataset could be of small size and slightly diversed from the from the overall training data distribution. We also propose several remedies suppose we cannot get any clean root dataset even if the required size is small.

The server have all shares of the secret and can recover it: We clarify that the secret shares are encrypted sending to the server and explain the encryption process.

Experiments on subtle attacks: We conduct additional experiments against KRUM attack, BadNets, and scaling attacks and will include the results in our paper.

Once more, we thank the Reviewer for the constructive feedback. We sincerely hope that the provided answers address the Reviewer's concerns and that the score can be reconsidered.

Best Regards,

Authors of Submission 11258

2024-08-13

Considering the higher scores given by other reviewers, I reread the paper and the rebuttal, and still have serval big questions.

The algorithm part is hard to follow because the symbolic representation is confusing. Maybe you can clarify some cases of them here.

$\langle \mathbf{g_i} \rangle$ denotes the secret sharing of $\mathbf{g_i}$ . So $\mathbf{V}^i=\{v_{jk}^i\}$ is also the secret sharing of $\mathbf{g_i}$ ? In the Algorithm 3 RobustSecAgg, $\{s_{ij} \}$ is the packed secrets of $\mathbf{g_i}$ , and $\{v_{ij} \}$ is the witness.

And again, $cs_k^j$ denotes the $j^{th}$ share of the cosine similarity between $\mathbf{g_k}$ with $\mathbf{g_0}$ ? Here $k$ denotes the index of clients, and then in the next part “Step 1: Secret resharing of partial dot product”, $s_{jk}^i$ denotes the share sent to user $j$ for the $k^{th}$ group of elements. Here , $k$ denote the index of group of elements. Maybe fixing the notations can help the readers.

And another, $\ell$ is the number of elements packed into one polynomial, also in the “Step 1: Secret resharing of partial dot product”, $p$ is used for the number of secrets packed.

It's very hard to follow your algorithm for me.

Efficiency of RFLPA .

The design goals include efficiency that mainly is achieved by packed secret sharing. But As we can see, the clients are burdened with a large number of computational tasks. That includes secret sharing, encryption and decryption of shares, addition, scale multiply and multiply on packed secret sharing. IMPORTANT: The multiply on packed secret sharing will generate a degree- $(d_1+d_2) or (2d)$ results and more operations need to be done to keep a degree- $d$ style. That is the main source of communication cost for MPC operation and this paper didn't discuss it.

You mentioned “ELSA and PBFL and SecureFL rely on the unrealistic assumption of two non-colluding parties”, RFLPA just put those computation tasks on the clients, maybe that is more unfriendly for the clients (mobile or other edge devices). Existed works can be extended to multi-parties using shamir secret sharing.

And this paper didn’t conduct experimental comparison under WAN/LAN setting. 4 round communications per iteration (that did not include the communication incurred by multiply on packed secret sharing also), that would a huge burden for WAN.

评论- Reply to Reviewer's Comment

2024-08-14

Dear Reviewer VWie,

Thank you for your reply and time. Hope our response below could address your additional concern.

Q1: The algorithm part is hard to follow because the symbolic representation is confusing.

Thanks for your comment. We apologize for some notation inconsistencies in Algorithm 3. The secret shares of $g_i$ should be $\mathbf{V}_i=v_{jk}^i=\mathbf{v}_{ij}$ , and the witness is given by $w_{jk}^i=\mathbf{w}_{ij}$ . We will fix the corresponding sentences in Algorithm 3 as followed:

Generate packed secrets $\{\mathbf{v}_{ij}\}_{j\in \mathcal{U}_0}$ , commitments $\mathcal{C}$ and witness $\{\mathbf{w}_{ij}\}_{j\in \mathcal{U}_0}$ for $\mathbf{g}_i$
Recover $(\{\mathbf{v}_{ji}\}_{j\in \mathcal{U}_1\backslash i}, \{\mathbf{w}_{ji}\}_{j\in \mathcal{U}_1\backslash i})=\mathbf{Dec}(\mathbf{c}_{ji},k_{ji})$ , and verify the secret shares $\{\mathbf{v}_{ji}\}_{j\in \mathcal{U}_1\backslash i}$

The reviewer's understanding of $cs_k^j$ is correct. To avoid the confusion in index $k$ , we will fix the notation in Table 3 and Section 4.5 (line 210 to 214) as followed: $cs_k^j$ -> $cs_j^i$ , and $nr_k^j$ -> $nr_j^i$ . In that way, $cs_j^i$ would denote the $i^{th}$ share of the cosine similarity between $\mathbf{g}_j$ and $\mathbf{g}_0$ .

For the interpretation of $l$ and $p$ , we have their description in Table 3. In particular, $l$ denotes the number of secrets packed at a polynomial for gradient, and $p$ denotes the number of secrets packed at a polynomial for shares of partial dot product.

We sincerely hope that the above clarification and fixing the notation issue could help undertand further details in the algorithm.

Q2: Efficiency of RFLPA

Q2-a: The clients are burdened with a large number of computational tasks

Analysis of client's computation cost: Referring to Appendix H, we have provided stage-wise computation analysis in detail. We also summarize the computation cost for those operations the reviewer mentioned in Table R1.

[Table R1. Break down of per-user communication cost. Cosine similarity computation incurred O(N) additional communication cost, lower than the original scale of O(M+N).]

Stage	Communication cost
Create packed secret shares	$O((M+N)\log^2 N)$
Encryption and decryption of shares	$O(M+N)$
Compute shares of partial dot product (addition and multiply on packed secret sharing)	$O(M+N)$
Create packed secret shares of partial dot product	$O(N \log^2 N)$
Derive final secret shares of dot product (addition, scale multiply and decoding)	$O(N^2\log^2 N)$

Therefore, the clients' computation cost is $O((M+N^2)\log^2 N)$ .

Improvement in computation time: It's important to note that our framework focus on improving the efficiency over existing solutions that combines SecAgg with defense strategies against poisonous attacks. RFLPA significantly reduces the total computation cost compared with: (1) HE-based solutions, and (2) BREA that leverages secret sharing method, as demonstrated by our empirical analysis. This is a substantial improvement in the research field that integrate privacy and robustness.

Friendly adaption to device with low computation power: As we will discuss later in Q2-c, our framework can be easily adapted to reduce the computation cost for clients with low resources, such as mobile devices. In that case, the low resource clients only need to generate secret shares, and the subsequent operations will be done by the high resource clients.

Q2-b: The multiply on packed secret sharing will generate a degree-2d results and more operations need to be done to keep a degree-d style.

In RFLPA, there's no need to conduct further operations on the degree-2d results since Dot Product Aggregation inherently converts the degree-2d partial dot product shares into degree-d final product shares.

After computing the partial shares of dot products, each $cs_j^i$ (or $nr_j^i$ ) is a degree-2d share vector. During the Dot Product Aggregation, we conduct secret resharing and local computation on the shares. Referring to Appendix G Explanation of Secret Re-sharing, $Chop_d$ matrix is leveraged to obtain a degree-d share vector $\mathbf{h}_k^i$ of the final dot product. We will emphasize this point in our paper.

评论- Reply to Reviewer's Comment (Continued)

2024-08-14

Q2-c: You mentioned "ELSA and PBFL and SecureFL rely on the unrealistic assumption of two non-colluding parties", RFLPA just put those computation tasks on the clients, maybe that is more unfriendly for the clients (mobile or other edge devices). Existed works can be extended to multi-parties using shamir secret sharing.

Firstly, extending existing works (ELSA, PBFL, and SecureFL) to multi-parties requires significant changes on the algorithm, and that should be another solution/direction to be explored.

Furthermore, our framework can be easily adapted to the case considering the heterogeneous computation power of each client. For example, the server can select the clients with more computation power to participate in the round 2 & 3 & 4. In that way, clients with weaker computing power could merely generate the secret shares of their local gradients and upload them to the server. This can be achieved by simply limiting the participating client in round 2 & 3 & 4 to the computation powerful clients, thereby improving the user experience for edge devices with low computation power.

Q2-d: Experimental comparison under WAN/LAN setting

Firstly, performing experiment under WAN/LAN setting is unnecessary since the communication time is mainly determined by the message size. In WAN, the extra communication time for multiple rounds of communication mainly comes from latency, i.e., the time it takes for a message to travel from the sender to the receiver. The latency is negligible compared to the time it takes to transfer the message. Specifically, the latency in WAN setting ranges from 20ms to 700ms (within 1 seconds) depending on the geographical distance. On the other hand, the time to transfer the message (ignoring latency in WAN setting) is around 6.4 seconds for RFLPA, and 52 seconds for BREA, assuming 100 clients using a 1.6MB classifier and 100 Mbps per client. Therefore, the latency from multiple communication rounds is much smaller than the time RFLPA reduced for smaller message size.

Furthermore, many existing SecAgg algorithms (focusing only on privacy protection) also rely on multiple communication rounds. For instance, [1] and [2] leverages 4 communication rounds per iteration. Our framework is able to integrate defense against poisoning attack into SecAgg using the same communication rounds as [1] and [2].

Overall, we sincerely hope that our explanations could address the reviewer's further concerns.

[1] Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H. B., Patel, S., ... & Seth, K. (2017, October). Practical secure aggregation for privacy-preserving machine learning. In proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (pp. 1175-1191).

[2] Bell, J. H., Bonawitz, K. A., Gascón, A., Lepoint, T., & Raykova, M. (2020, October). Secure single-server aggregation with (poly) logarithmic overhead. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (pp. 1253-1269).

2024-08-14

About Q2-c: Adaption to heterogeneous computation power of each client is a good idea. But I must point out that selecting clients with more computation power by the server will increase the risk of collusion attack among the server and the clients. If we consider that the server could collude with some clients, the privacy goal of RFLPA will be failed.

About Q2-d: Communication round is a very important factor under WAN setting that you shouldn't ignore. For example, if you have 1000 rounds communication, and the latency will increase 1000*100ms(assume the latency will be 100ms for WAN and ignored for LAN) = 100s, that is also the main cost for the secret sharing-based solutions compare with those based on HE(heavy computation) or Garbled Circuit (heavy communication bytes).

You mentioned that "many existing SecAgg algorithms (focusing only on privacy protection) also rely on multiple communication rounds"[1][2]. Those works use Diffie-Hellman for key negotiation, and their multiple rounds of communication occured in offline phase.

评论- Reply to Reviewer's Comment

2024-08-14

Dear Reviewer VWie,

Thank you for your reply and time. Hope our response below could address your additional concern.

Reply to Q2-a:

Firstly, the encryption and decryption is based on symmetric encryption, much more efficient than the assymetric encryption in HE-based protocol. During our experiment, we found that the encryption & decryption time on the secret is much smaller than the time for other operations on the secret shares, including secret sharing and reconstruction.

Furthermore, even if in the BREA's p2p channel, the symmetric encryption on the secret shares is neccessary to avoid the ear dropper's inference attack. However, their solution omit this encryption & decryption process, posing the user's private value in great risk. The encryption is indispensable for secure communication.

Reply Q2-b:

The dot product aggregation protocol shares some relation to the degree reduction protocol in terms of secret resharing. The degree reduction is included in (3) sending and receiving secret shares of partial gradient norm square and cosine similarity (O(N) messages).

We are pleased that the reviewer recognize the value and contribution of our packed secret sharing with dot product aggregation protocol. The major purpose of dot product aggregation is to address the issue of increased privacy leakage, and we believe that this benefit (in terms of privacy) has been clearly understood and recognized by the readers and reviewers. The degree reduction can be treated as the side benefit (though not the major purpose), and we will highlight this point in our paper.

Reply to Q2-c:

We understand that the inherent tradeoff between the number of participating clients and collusion threshold. At the same time, we would like to make it clear that:

Even if we select a proportion of clients, the privacy risk is much lower than the case in HE's 2 non-colluding parties.
Even if we use the whole set of clients, each client's computation cost is significantly lower than the protocol proposed by BREA with $O(n)$ threshold.
In cross-silo setting, the clients typically host stronger computing power (such as server), and thus there's few need to consider the client selection. In cross-device setting, the heterogeneous issue arises, but typically this setting consists of a large number of clients, and thus select a proportion of them could maintain a certain level of protection.

Reply to Q2-d

We would like to highlight that:

The reduced message size from RFLPA compared with BREA is much larger than the extra latency in WAN setting, as is deomstrated in our previous reply.
The extra latency in more communication rounds is negligible compared with the reduced computation time from RFLPA compared with HE-based methods. As is shown by Appendix L.7.1 and our initial rebuttal, the computation time for RFLPA is hours fewer than the HE-based methods for even a single iteration. Therefore, the total communication latency (100s) is much smaller.

Furthermore, the communication rounds in SecAgg [1][2] is 4 for each iteration. The reviewer can refer to their algorithm for further description.

2024-08-14

About Q2-a: Secret sharing-based operations cost the network resources, the encryption and decryption of shares consume the computation resource, that cannot be treated as same. So RFLPA replaces the p2p channel between clients assumed in BERA by using the sever as a repeater(or forwarder) with encryption of the shares. So the encryption/decryption cost will be introduced to the clients.

About Q2-b: I guess that you use a method like the degree-reduction technology in shamir secret sharing? Every $p$ elements in $cs^i$ is reshared as a degree-d polynomial. So the communication cost is genereated by the resharing operation. Maybe that has been included in " (3) sending and receiving secret shares of partial gradient norm square and cosine similarity (O(N) messages)"?

So using a packed secret sharing with the partial dot product is the core contribution of this paper, and I encourage the authors to improve the representation of the key part. It's not very clear for me, and it's important to understand your work for the researchers in MPC/PPML.

审稿意见

评分: 5置信度: 42024-07-08

The paper proposes a defense mechanism against poisoning attacks on federated learning while maintaining privacy guarantee. The solution is based on secure aggregation and evaluates the trustworthiness of client updates with cosine similarity. Information leakage during the computation of cosine similarity is mitigated by a novel aggregation method for dot products. Theoretical and empirical analysis show that the defense successfully reduces the communication and computation overhead and keeps a competitive accuracy compared to previous work.

优点

The use of packed Shamir secret sharing with the proposed dot-product aggregation sounds novel and reliable.
The security and efficiency of the framework is well supported by theoretical analysis and experiments.
The framework largely reduces the overhead of previous works and maintains good accuracy even when no poisoning attack is considered, making deployment more practical.

缺点

The paper focuses on a specific type of federated learning setup. Its applicability to other federated learning models, such as those involving more dynamic and heterogeneous client populations, is not explored.
The experiments conducted use standard datasets like MNIST, F-MNIST, and CIFAR-10. These datasets, while commonly used, do not fully represent the complexity and diversity of real-world data. The effectiveness of RFLPA in more diverse and complex datasets remains to be tested.
The performance comparison against other state-of-the-art methods might not be exhaustive. There might be other emerging methods or variations of existing ones that were not considered in the evaluation.

问题

In section 4.2 normalization, should the gradients with norm smaller than $||g_0||$ also be normalized? If so, how to verify that they are normalized?

局限性

The paper has adequately addressed its limitations.

作者回复

2024-08-06

Dear Reviewer WU56,

Thank you for your recognition and valuable comments. Hope our response below could address your concern.

W1: Applicability to other federated learning models, such as those involving more dynamic and heterogeneous client populations

Thanks for your comment. Our approach is also applicable in the federated learning setup involving dynamic and heterogeneous client populations. We would address the concern from two perspectives: the heterogeneous clients and dynamic change.

Heterogenous clients. The experiment result is already presented in Appendix L.8 Non-IID Setting, where each client holds a dataset biased towards a certain class. Compared with BREA and FedAvg, RFLPA demonstrates resilient performance against poisoning attacks under the setting where clients hold heterogeneous dataset. For 30% adversaries, the accuracy of RFLPA is over 0.89 and 0.79 on MNIST and F-MNIST dataset, respectively.
Dynamic change. For dynamic settings, we consider the case where the data of the clients change during the federated training with the arrival of new data. We added experiments for this case and will include the analysis in the appendix. To simulate such setting, we leverage Dirichlet Distribution Allocation (DDA)[1] to sample non-iid dataset, and change the distribution for each client every 20 epochs. Table R1 presents the accuracy under the gradient manipulation attack, demonstrating RFLPA's robustness under the dynamic setting.

[Table R1. Accuracy under dynamic client data distribution against gradient manipulation attack.]

Proportions	of attackers	No	10%	20%	30%
FedAvg	MNIST	0.98	0.26	0.29	0.30
	F-MNIST	0.86	0.52	0.21	0.18
RFLPA	MNIST	0.96	0.94	0.92	0.90
	F-MNIST	0.83	0.79	0.79	0.77

W2: The effectiveness of RFLPA in more diverse and complex datasets remains to be tested.

Thanks for your comment. In Appendix L.6 Performance on Natural Language Processing (NLP) Dataset, we present the result under two NLP datasets, Recognizing Textual Entailment (RTE)[2] and Winograd NLI (WNLI)[3]. Table R2 highlights a portion of the results, and the full results can be found in Appendix L.6.

[Table R2. Accuracies on NLP dataset against gradient manipulation attack.]

	RTE				WNLI
Proportions of attackers	No	10%	20%	30%	No	10%	20%	30%
FedAvg	0.599	0.509	0.487	0.462	0.619	0.563	0.437	0.437
RFLPA	0.596	0.582	0.582	0.577	0.619	0.592	0.592	0.563

To test a more complex CV dataset, we further added the experiments on Cifar-100 dataset using ResNet-9. The analysis of these experiments will be included in the appendix. Table R3 presents the accuracy under varying proportions of attackers performing gradient manipulation attacks.

[Table R3. Accuracy on Cifar-100 dataset under gradient manipulation attack.]

Proportions of attackers	No	10%	20%	30%
FedAvg	0.55	0.11	0.10	0.10
RFLPA	0.55	0.54	0.50	0.49

W3: The performance comparison against other state-of-the-art methods might not be exhaustive. There might be other emerging methods or variations of existing ones that were not considered in the evaluation.

Thanks for your comment. We attempt to review the SoTA methods that achieve the goals of privacy protection as well as robustness against poisonous attacks as much as possible. We also discuss the advantages of RFLPA against additional frameworks in the response to reviewer VWie's W1-b. We would be pleased to make further comparisons if you provide additional baselines for us.

Q1: Should the gradients with norm smaller than $\|g_0\|$ also be normalized? If so, how to verify that they are normalized?

That's an interesting question. In principle, the gradients with norm smaller than $\|g_0\|$ should also be normalized. However, the server could simply verify that $\|\bar{g}_i\|\leq \|g_0\|$ for the following reasons:

A malicious user has no motivation to skip the normalization if $\|g_i\|<\|g_0\|$ . For benign users, they would honestly execute the protocol. For malicious user, skipping the normalization would reduce their trust score, and thus the weight on the aggregated gradients. Specifically, the trust score is a ReLU function of the dot product between the normalized local gradient and server model update. Therefore, a smaller scale of $\|\bar{g}_i\|$ leads to lower level of trust score, thus reducing the malicious user's impact on the global model.
The convergence bound in Theorem 5.2 is derived in the case where all gradients are smaller than $\|g_0\|$ . Even if it's un-normalized, we can still obtain the convergence bound in Theorem 5.2 because the derivation only requires that $\|\bar{g}_i\|\leq \|g_0\|$ for each $i$ , rather than $\|\bar{g}_i\|$ being close to $\|g_0\|$ .

[1] Luo, M., Chen, F., Hu, D., Zhang, Y., Liang, J., & Feng, J. (2021). No fear of heterogeneity: Classifier calibration for federated learning with non-iid data. Advances in Neural Information Processing Systems, 34, 5972-5984.

[2] Bentivogli, L., Clark, P., Dagan, I., & Giampiccolo, D. (2009). The Fifth PASCAL Recognizing Textual Entailment Challenge. TAC, 7(8), 1.

[3] Levesque, H., Davis, E., & Morgenstern, L. (2012, May). The winograd schema challenge. In Thirteenth international conference on the principles of knowledge representation and reasoning.

2024-08-10

Thanks for the response. The author has addressed my concern and I will keep my score.

2024-08-10

Dear Reviewer aBFW,

Thanks for your reply. We appreciate your recognition in the novelty, soundness, and effectiveness of our paper. We are pleased to know that we have addressed your concern. If you have any remaining concern, feel free to let us know.

Best Regards,

Authors of Submission 11258

审稿意见

评分: 7置信度: 42024-07-12

This paper aims to enhance the vulnerability of FL when dealing with poisoning attacks. A common strategy used in FL to avoid directly sharing local updates is called secure aggregation(SecAgg), which works with ciphertexts and is incompatible with defense techniques against poisoning attacks. To address this issue, the paper proposes using verifiable packed Shamir's secret sharing to compute the cosine similarity between local and global models for aggregation. Additionally, a novel dot product aggregation method is introduced to reduce the communication overhead caused by Shamir's secret sharing.

优点

The problem of addressing poisoning attacks in FL is timely and challenging, especially with provable security guarantee
The attempt to reduce the communication overhead caused by secret sharing is valuable
Extensive experimental validations and very detailed comparison with existing frameworks, especially for various aggregation methods

缺点

The assumption that the server has trusted root of clean datasets seems a fundamental limitation
The presentation could be improved; there are too many concepts introduced, making it hard to distinguish between what is new and what is from existing literature.

问题

It seems that only the experiments of malicious users are examined. I could not find that case when server conducts an active inference attack, is it included?
If the clean dataset at the server is not available, is it still possible to adjust the proposed protocol to deal with this situation?

局限性

Limitations are adequately addressed

作者回复

2024-08-06

Dear Reviewer WU56,

Thank you for your recognition and valuable comments. Hope our response below could address your concern.

W1 & Q2: Assumption on clean public samples

Thanks for your comment. As discussed in our response to Q1 in global rebuttal to program chair, we proposed some remedies in the absence of such dataset, including comparison with global weights and KRUM. Furthermore, the public samples can be of small size (e.g., 200 samples), and the gobal model is robust even when the root dataset diverges slightly from the overall training data distribution.

W2: Distinguish between new and existing concept

Thanks for your suggestion. We clarify the distinctions as followed.

The cryptographic techniques introduced in Section 3.4 Cryptographic Primitives are from existing literature, including packed Shamir secret sharing, Diffie-Hellman (DH) key exchange protocol,symmetric encryption and UF-CMA secure signature scheme.

The Verifiable Packed Secret Sharing (V-PSS) is our proposed concept, as previous verifiable secret sharing scheme are mainly associated with the vanilla Shamir secret sharing. We leverage the commitment technique in [1] to design the V-PSS algorithm. Furthermore, the Dot Product Aggregation and the related concept partial dot product is proposed by our paper. The secure communication scheme for secret sharing is proposed by our work, building on the aforementioned DH key exchange protocol, symmetric encryption and signature scheme. We will emphasize the new concepts in our paper.

Q1: Experiment when server conducts an active inference attack

Thanks for your question. In activate inference attack, the server would manipulate certain users' messages to obtain the private values of targeted users. In particular, the server may alter the secret shares user sents to others to infer their private value. We didn't include the experiment in our paper since the integrity of the message is protected by the signiture scheme, and thus the server is prevented from forging the message of any user. If the server changes the value of the encrypted message, then the signiture verification would indicate the other user that it's an invalid message. In Appendix C.4, we provide a security analysis for the signature scheme. For a UF-CMA secure signature scheme, no probabilistic polynomial time (PPT) adversary is able to produce a valid signature on an arbitrary message with more than negligible probability.

We added experiments for passive inference attack using the Deep Leakage from Gradients (DLG)[2]. DLG attempts to reconstruct the original image from the aggregated gradients. We conducted an attack on the CIFAR-10 dataset, using the specifications in Appendix L.2. The average PSNR of generated image with respect to original image is 11.27, much lower than the value of 36.5 when no secure aggregation is involved. We also upload an PDF in the global rebuttal section, presenting the original and inferred image under RFLPA. It can be observed that the inferred images are far from the raw images under DLG attack.

We will include the explicit analysis for both inference attacks in our paper.

[1] Kate, A., Zaverucha, G. M., & Goldberg, I. (2010). Constant-size commitments to polynomials and their applications. In Advances in Cryptology-ASIACRYPT 2010: 16th International Conference on the Theory and Application of Cryptology and Information Security, Singapore, December 5-9, 2010. Proceedings 16 (pp. 177-194). Springer Berlin Heidelberg.

[2] Zhu, L., Liu, Z., & Han, S. (2019). Deep leakage from gradients. Advances in neural information processing systems, 32.

2024-08-11

Thank you for the response. My concerns have been addressed, and good to see that the experimental results using DLG align with our expectations. My score remains unchanged.

2024-08-11

Dear Reviewer 4mMR,

Thanks for your reply. We are grateful for your recognition in our paper and additional experiments. We are pleased to know that your concerns have been addressed. If you have any remaining concern, feel free to let us know.

Best Regards,

Authors of Submission 11258

审稿意见

评分: 6置信度: 42024-07-17

This paper presents a framework to address the dual challenges of privacy leakage and poisoning attacks in federated learning. The authors propose RFLPA, which integrates cosine similarity for robust aggregation with verifiable packed Shamir secret sharing to ensure secure aggregation without compromising on robustness. The framework also introduces a new dot-product aggregation algorithm to prevent information leakage. The proposed method is evaluated on 3 benchmark datasets.

优点

The paper is well-written and well-structured. The paper addresses a very important problem in the security of federated learning systems. Thorough experimentation and analysis are undertaken to evaluate the proposed method.

缺点

The paper’s reliance on the assumption that the server has a clean root dataset and that secure communication can be maintained without significant overheads may not always be realistic. Additionally, the code was not revealed and could not be evaluated at the time of this review. Finally, one of the main threats that challenge robust aggregators is backdoor attacks, which are tested on the proposed method.

问题

Could the authors provide more insights into the practical implementation challenges of integrating verifiable packed Shamir secret sharing into existing FL systems?
How does the framework scale with the number of clients and the dimensionality of models, and are there potential bottlenecks for large-scale deployments?
Could the authors elaborate on the potential vulnerabilities in the cryptographic mechanisms used and how they are mitigated?
How realistic is the assumption of the server having a clean root dataset in different FL application domains, and can the framework be adapted for scenarios without such a dataset?

局限性

Yes

作者回复

2024-08-06

Dear Reviewer WU56,

Thank you for your recognition and valuable comments. Hope our response below could address your concern.

W1 & Q4: Assumption that the server has a clean root dataset and that secure communication can be maintained without significant overheads

Thanks for your comment.

For the first assumption, referring to our answer to Q1 in global rebuttal to program chair, the public samples can be of small size (e.g., 200 samples), and the gobal model is robust even when the root dataset diverges slightly from the overall training data distribution. Moreover, we have proposed two remedies in the absence of such dataset, including comparison with global weights and KRUM.

For the second point, we may clarify that we haven't explicitly assumed that secure communication can be maintained without significant overheads. Instead, our paper aims to minimize the communication cost during secure communication by reducing the message size. It should be noted that secure communication is neccessary to protect the secrecy and integrity of the messages transmitted during federated learning. We are pleased to provide further clarification if there's any misunderstanding in your concern.

W2: The code was not revealed

Thanks for your comment. We have sent the code in anonymized link to AC.

W3: Evaluation of backdoor attacks

Thanks for your suggestion. As discussed in our response to Q2 in the global rebuttal, we tested RFLPA's robustness on two backdoor attacks, BadNets, and scaling attacks. We will include the empirical analysis in appendix.

Q1: Insights into the practical implementation challenges of integrating verifiable packed Shamir secret sharing into existing FL systems?

Thanks for your question. The verifiable packed Shamir secret sharing could be vulnerable to the following implementation errors: (a) the client conducts incorrect computation, such as aggregation and dot product, on the secret shares, and (b) the client might collaborate to recover the secret. In our paper, we propose the following mitigations to such challenges: (a) The Reed-Solomon decoding algorithm allows the server to recover the correct computation results in case of certain computation errors. For a degree-d packed Shamir secret sharing with n shares, the Reed-Solomon decoding algorithm could recover the correct result with E errors and S erasures as long as $S + 2E + d + 1 \leq n$ . (b) Formal security guarantee on the packed secret shares suggests that any O(N) shares reveal no information of the secret values.

Q2: How does the framework scale with the number of clients and the dimensionality of models, and are there potential bottlenecks for large-scale deployments?

Thanks for your question. We explain the scalability in terms of communication and computation cost. The theoretical overhead is listed in Section 5.1 Complexity Analysis, and we summarize the cost in Table R1.

[Table R1. Complexity summary of RFLPA for N clients and M dimensional model.]

	Computation	Communication
Server	$O((M+N)log^2NloglogN)$	$O((M + N)N)$
User	$O((M + N^2)log^2N)$	$O(M + N)$

The empirical overhead can be found in Section 6.2.2 and Appendix L.7. We also highlight the per client communication and computation cost under varies client size in Table R2.

[Table R2. Per client overhead of RFLPA using a MNIST classifier (1.6M).]

client size	100	200	300	400
Communication cost (MB)	82.50	82.50	82.51	82.52
Computation cost (minute)	3.41	11.44	24.51	42.60

In our study, we observed that communication costs remain relatively stable as N increases significantly, with message size staying nearly constant with N but linearly scaling with M (when $M\gg N$ ). For computation cost, training time shows an approximate second-order polynomial increase with N and a linear increase with M.

For large-scale deployments, the bottlenecks are as followed: (1) Server communication: the server could suffer heavier communication overhead, which mainly comes from the stage of distributing the user's secret shares. To mitigate this issue, we can allocate the secret distribution task to multiple nodes to relief the server's workload. The signature and encryption schemes prevent the nodes from conducting man-in-the-middle attacks on the secret shares. (2) Synchronization: each training iteration consists of four rounds, raising dropout problem. The server should wait for the result of participating clients to continue. Noted that the secret sharing scheme could inherently mitigate the problem. For a degree-d packed Shamir sharing, the reconstruction can be done on receiving responses from d+1 users.

Q3: Could the authors elaborate on the potential vulnerabilities in the cryptographic mechanisms used and how they are mitigated?

Diffie-Hellman (DH) key exchange protocol: The major vulnerability of DH key exchange protocol is the Man-in-the-Middle attack. The attacker could alter the key exchange messages between the communicating parties. Such vulnerability could be addressed by the signature scheme. In particular, each client could receive their own signing key, as well as the verification keys for all other users from a trusted third party. The third party could be a government agency or certificate authorities that provide digital certificates for secure communication. The signiture scheme prevents the attacker from modifying the messages.

Symmetric Encryption & Signature Scheme: (1) Side-channel attacks. The attacker could infer the sensitive information exploiting side-channel signals, such as timing information. It's important to utilize mature encryption and signature packages robust to side-channel attacks. (2) Key management. Weak key management practices could lead to key leakage or theft. It's crucial to store private keys in secure environments, such as Hardware Security Modules.

2024-08-11

Thank you for the provided answers. My questions are answered, and my decision remains the same.

2024-08-11

Dear Reviewer WU56,

Thanks for your reply. We are happy that the Reviewer found our work addressing a very important problem and the empirical evaluation thorough. We are pleased to know that we have addressed your questions and concerns. If you have any remaining concern, feel free to let us know.

Best Regards,

Authors of Submission 11258

作者回复

2024-08-06

Dear Reviewers,

We want to express our profound gratitude for your insightful comments and valuable suggestions. We address some common questions in this global rebuttal section.

Q1: The paper relies on the assumption that the server has trusted root of clean datasets.

Please note that the required clean root dataset in our approach is of small size, e.g., 200 samples. Suppose we cannot get any clean root dataset even if the required size is small, we propose several remedies:

Cosine similarity with global weights. Existing studies show that global weights could be useful in detecting malicious updates[1]. We can compute the cosine similarity between each local update and the global weights as follows: $cos(W_i^t, W_G^{t-1})=\frac{<W_i^t, W_G^{t-1}>}{\|W_i^t\|_2\|W_G^{t-1}\|_2}$ , and filter out the clients with similarity smaller than a pre-specified threshold. To validate the effectiveness of such approach, we conduct experiments against the Byzantine attack under various proportions of attacks. It can be observed in Table R1 that compared with FedAvg, RFLPA-GW effectively improves the accuracy in the presence of attackers. The communication and computation cost of such an approach is at the same scale of RFLPA’s original level as both compute the cosine similarity with a single baseline.

[Table R1. Accuracy for defense based on global weight under different proportions of attackers. RFLPA-GW replaces the robust aggregation rule in RFLPA with the method based on cosine similarity with global weight.]

Proportions	of attackers	No	10%	20%	30%
FedAvg	MNIST	0.98	0.46	0.40	0.32
	F-MNIST	0.88	0.55	0.51	0.45
RFLPA-GW	MNIST	0.98	0.95	0.92	0.91
	F-MNIST	0.90	0.80	0.77	0.75

KRUM-based method. We can substitute the aggregation module with KRUM when a trusted baseline is missing. Though KRUM incurs greater cost than the original method, we show in Appendix L.7.2 Ablation Study that there’s a notable reduction in communication and computation cost compared with BREA, benefiting from the design of our secret sharing algorithm (see Table 2 for a summary). The accuracy of RFLPA (KRUM) is expected to be the same as BREA, as both utilize the same aggregation rule.

[Table R2. Communication (in MB) and computation cost (in Minutes) with MNIST classifier (1.6 parameters).]

	RFLPA		BREA		RFLPA-KRUM
client size	300	400	300	400	300	400
Communication cost	82.51	82.52	1909.02	2544.45	79.58	82.25
Computation: per-user cost	24.51	42.46	182.27	294.27	46.48	75.78
Computation: server cost	15.00	26.47	216.96	287.22	39.76	62.81

Collection of slightly biased root data. [2] has conducted experiments where the root data is biased towards a certain class. Empirical evidence presented in suggests that the performance of the global model is robust even when the root dataset diverges slightly from the overall training data distribution. Specifically, the accuracy of MNIST-0.5 is at least 0.92 when the bias probability is within 0.4. Note that for MNIST dataset, the bias probability ranges from 0.1 to 1, and a larger bias probability suggests a more imbalanced distribution of the root dataset.

We summarize the pros and cons of three robust aggregation modules in Table R3. We will add a discussion in the appendix.

[Table R3. Pros and cons of different aggregation modules.]

Approach	Pros	Cons
RFLPA-FLTrust	Lower overhead	Collection of validation data
	No need for prior knowledge about # of poisoners
	Robust under slightly biased validation data
RFLPA-GW	Lower overhead	Lack of theoretical convergence guarantee
	No need for prior knowledge about # of poisoners
	No need for validation data
RFLPA-KRUM	No need for validation data	Higher overhead
		Need prior knowledge about # of poisoners

Q2: Experiments on more stealthy attacks.

We added experiments for more stealthy attacks: (1) KRUM attack (untargeted attack)[3], (2) BadNets (backdoor attack)[4], and (3) Scaling attack (backdoor attack)[5]. We will include the experiment results in the appendix.

Table R4 shows the results for the additional attacks on CIFAR-10. For backdoor attacks, the classification accuracy on the triggered dataset is shown in parentheses. Compared with FedAvg, RFLPA not only improves the accuracy on the general dataset, but also the accuracy on the triggered dataset.

[Table R4. Accuracies on CIFAR-10 under varying proportions of attackers. For backdoor attacks, the values are presented as overall accuracy (backdoor accuracy).]

Proportions	of attackers	10%	20%	30%
FedAvg	KRUM attack	0.27	0.12	0.11
	BadNets	0.68 (0.54)	0.67 (0.54)	0.55 (0.28)
	Scaling	0.70 (0.22)	0.68 (0.21)	0.54 (0.19)
RFLPA	KRUM attack	0.71	0.70	0.70
	BadNets	0.71 (0.68)	0.70 (0.68)	0.69 (0.66)
	Scaling	0.70 (0.69)	0.70 (0.69)	0.69 (0.69)

[1] Yaldiz, D. N., Zhang, T., & Avestimehr, S. Secure Federated Learning against Model Poisoning Attacks via Client Filtering. In ICLR 2023 Workshop on Backdoor Attacks and Defenses in Machine Learning.

[2] Cao, X., Fang, M., Liu, J., & Gong, N. Z. (2021, January). FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping. In ISOC Network and Distributed System Security Symposium (NDSS).

[3] Fang, M., Cao, X., Jia, J., & Gong, N. (2020). Local model poisoning attacks to {Byzantine-Robust} federated learning. In 29th USENIX security symposium (USENIX Security 20) (pp. 1605-1622).

[4] Gu, T., Liu, K., Dolan-Gavitt, B., & Garg, S. (2019). Badnets: Evaluating backdooring attacks on deep neural networks. IEEE Access, 7, 47230-47244.

[5] Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D., & Shmatikov, V. (2020, June). How to backdoor federated learning. In International conference on artificial intelligence and statistics (pp. 2938-2948). PMLR.

最终决定Accept (poster)

2024-09-25

Existing defenses against poisoning attacks rely on analysis of local updates in plaintext, and thus are incompatible with the well-established security protocol SecAgg, which allows no knowledge of user updates by the server. To reconcile, FLTrust's cosine-similarity-based aggregation is employed on SecAgg. Verifiable packed Shamir secret sharing is used to reduce SecAgg's communication overhead, and the increased risk of privacy leakage with VPSSS is managed with a new dot-product aggregation algorithm. The idea of dot product aggregation is interesting and novel, but the primary advantage comes from the use of existing packed Shamir Secret Sharing. In this sense, the level of contribution is somewhat limited. More comprehensive analysis (provable security, computation overhead), realistic evaluation setups (non-IID settings using Dirichlet distribution, larger datasets like CIFAR-100, larger models, other common attacks like LIE, Min-Max and Min-Sum attacks) and comparison with existing work (like geometric median-based RFA [Pillutla 2022, IEEE Transactions on Signal Processing] which also does not require access to individual updates for defending against adversarial attacks while preserving privacy) would be desirable.