PaperHub

ICLR 2025 · Decision: Rejected

Overall rating: 6.8/10 (4 reviewers; individual ratings 5, 6, 8, 8; lowest 5, highest 8, standard deviation 1.3)
Confidence: 3.3 · Correctness: 3.0 · Contribution: 2.5 · Presentation: 3.0

Efficient Privacy-Preserving Federated Learning With Selective Parameter Encryption

OpenReview · PDF
Submitted: 2024-09-26 · Updated: 2025-02-05
TL;DR

Efficient privacy-preserving federated learning based on selective homomorphic encryption with quantifiable privacy guarantee

Abstract

Keywords
Federated learning, privacy, homomorphic encryption, inversion attack

Reviews and Discussion

Official Review
Rating: 5

The paper addresses the challenge of high overheads associated with using homomorphic encryption (HE) in federated learning (FL) to protect against privacy attacks such as gradient inversion. Traditional HE methods, while providing strong privacy guarantees, are computationally intensive and not scalable for large models like GPT-2.

To tackle this issue, the authors propose a privacy-preserving FL framework called Selective Parameter Encryption. The key idea is to encrypt only the most privacy-sensitive parameters of local models rather than encrypting the entire model.
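For intuition, here is a minimal sketch of the selective-encryption idea, under illustrative assumptions (the function names, the top-k selection rule, and the `he_encrypt` backend are placeholders, not the paper's actual implementation):

```python
import numpy as np

def select_sensitive_indices(sensitivity: np.ndarray, ratio: float = 0.1) -> np.ndarray:
    """Pick the indices of the top-`ratio` most privacy-sensitive parameters."""
    k = max(1, int(ratio * sensitivity.size))
    return np.argsort(sensitivity)[-k:]

def split_and_encrypt_update(update: np.ndarray, mask_idx: np.ndarray, he_encrypt):
    """Encrypt only the masked parameters; the remaining ones stay in plaintext."""
    ciphertexts = he_encrypt(update[mask_idx])   # HE ciphertexts (e.g., a CKKS/BFV backend)
    plaintext = np.delete(update, mask_idx)      # unencrypted remainder of the update
    return ciphertexts, plaintext, mask_idx
```

The server can then aggregate the ciphertexts homomorphically and the plaintext parameters directly, which is where the overhead savings come from.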

Strengths

The originality of this work lies in its combination of selective encryption with homomorphic encryption in the context of federated learning.

Weaknesses

Although the paper proposes a method to reduce overhead in privacy-preserving federated learning through Selective Parameter Encryption, it needs to deepen its theoretical analysis and experimental validation. Specifically, it should provide a detailed explanation of how the assumption that model parameters follow a log-normal mixture distribution is utilized in the privacy quantification framework. Additionally, offering mathematical comparisons with differential privacy would substantiate claims of superiority. The paper should include simulations of privacy attacks, such as gradient inversion attacks, to empirically demonstrate the method's effectiveness. Moreover, an in-depth discussion of potential vulnerabilities introduced by partial encryption is necessary, along with a formal security analysis to ensure the method's robustness against sophisticated attacks.

Questions

  1. The theoretical framework used for privacy quantification has not been fully developed. The authors mention that most existing models' parameters follow a log-normal mixture distribution, but do not explain in detail how this assumption is integrated into their privacy analysis.

  2. The paper asserts that the method is largely better than differential privacy, yet no mathematical or empirical comparison is provided to verify this claim. A comprehensive analysis of the privacy-utility trade-off of the two methods is necessary.

  3. To demonstrate the method's effectiveness in defending against privacy attacks, the authors should include experiments simulating an adversary's attempt to reconstruct training data from model updates. The lack of such experiments weakens the claim of enhanced privacy.

  4. Encrypting fewer parameters may expose the unencrypted parameters to new types of attacks. The authors should discuss the potential vulnerabilities that selective encryption may introduce and propose mitigation strategies.

  5. The paper should provide a formal security analysis to ensure that encrypting only a subset of parameters does not weaken the overall security of the model aggregation process. Without this analysis, it is difficult to evaluate the robustness of the proposed method against sophisticated attacks.

  6. In Section 3.2, the description of the threat model is relatively brief. It should explain in more detail the attacker's capabilities, the possible attack vectors, and why the specific security assumptions were chosen.

  7. Although the use of the Palisade and TenSEAL libraries is mentioned, there is no explanation of the specific versions and configurations used, nor of why Palisade was selected as the primary evaluation target.

  8. Although selective encryption may effectively defend against inversion attacks, the paper lacks a detailed description of the attack experiments, e.g., the assumed attacker capabilities, the details of the attack method, and statistical validation of the defense effect.

  9. Evaluating privacy protection solely by reduced attack scores may not be comprehensive. The possibility and risks of privacy leakage should also be discussed within the theoretical framework of differential privacy.

  10. It is necessary to clarify the attacker's assumed knowledge and capabilities, and whether the defense provided by selective encryption remains effective under a stronger attack model.

  11. In "A.11 Proof of Base Full Encryption Protocol", the authors claim the privacy of the base protocol, but the proof provided is too brief. Without detailed steps and logical derivation, it is hard to convince readers.

  12. In "A.12 Quantifying Negligible Privacy Value in Full Encryption", the authors attempt to quantify the privacy budget under full encryption, but the derivation process is not clear enough. More detailed calculation steps and the rationale for each parameter choice are needed.

Comment

We appreciate Reviewer pFrT's valuable suggestions! We sincerely hope that below we have addressed all your questions about our paper and that you can reconsider our paper as a strong candidate.

Weakness 1, Question 1, Question 2, Question 3, Question 4, Question 5, Question 6, Question 8, Question 10, and Question 11: Although the paper proposes a method to reduce overhead in privacy-preserving federated learning through Selective Parameter Encryption, it needs to deepen its theoretical analysis and experimental validation. Specifically, it should provide a detailed explanation of how the assumption that model parameters follow a log-normal mixture distribution is utilized in the privacy quantification framework. Additionally, offering mathematical comparisons with differential privacy would substantiate claims of superiority. The paper should include simulations of privacy attacks, such as gradient inversion attacks, to empirically demonstrate the method's effectiveness. Moreover, an in-depth discussion of potential vulnerabilities introduced by partial encryption is necessary, along with a formal security analysis to ensure the method's robustness against sophisticated attacks.

Good point! We would like to emphasize that we have provided a detailed privacy quantification in Section 4, where DP is also compared, along with a UC-security-based security proof in A.11, A.12, and A.13 that covers a formal threat model, adversary composability, and the security properties of our method. In Section 5, empirical evaluations demonstrate the effectiveness of our method in both efficiency and privacy guarantee. We believe these are adequate to support our claim.

Also please note that our method is general and does not rely on the log-normal mixture distribution, which is an empirical finding that helps us understand the privacy behavior of model parameters in practice.

We will improve the organization of our theoretical proofs for better clarity in our final version.

Question 7: Although the use of the Palisade and TenSEAL libraries is mentioned, there is no explanation of the specific versions and configurations used, nor of why Palisade was selected as the primary evaluation target.

Good suggestion! We will include the specific versions of the HE libraries used in our evaluation and the reason for using Palisade primarily, namely that Palisade is an optimized library with faster computing performance (comparison results in Table 10).

Question 9: Evaluating privacy protection solely by reduced attack scores may not be comprehensive. The possibility and risks of privacy leakage should also be discussed within the theoretical framework of differential privacy.

Good point! Our framework based on DP gives a theoretical bound on the privacy guarantee, while the empirical attack experiments with attack scores help us validate those theoretical bounds. This practice is analogous to how general differential privacy can be empirically evaluated by conducting database recovery attacks. We have included both in our experimental results in Table 2 and Figures 13-19.

Question 12: In "A.12 Quantifying Negligible Privacy Value in Full Encryption", the authors attempt to quantify the privacy budget under full encryption, but the derivation process is not clear enough. More detailed calculation steps and the rationale for each parameter choice are needed.

Good suggestion! Here we provided a quantification process for the full encryption mode. We agree that including more discussion of how each derivation step is conducted, along with the detailed theorems from the DP papers referenced, would help readers.

Official Review
Rating: 6

This paper introduces an efficient privacy-preserving federated learning framework that protects machine learning models trained on distributed devices with selective parameter encryption, preventing the disclosure of sensitive personal information through inversion attacks when updates are aggregated at the server.

Strengths

  1. A selective parameter encryption method is proposed that encrypts only the most privacy-sensitive parameters, which can reduce the costs of HE-based federated learning.
  2. A theoretical framework and sufficient experiments are provided to support the conclusions.

Weaknesses

  1. The article does not fully address the risks posed by malicious clients during the negotiation of homomorphic encryption (HE) keys. While the authors propose using a threshold version to mitigate collusion risks, there remains a significant motivation for colluding clients to target a specific client, particularly in scenarios with a small number of clients.
  2. The method for generating mask maps may lead to inconsistencies in a heterogeneous environment. If clients use the same mask map, it could compromise privacy, especially when sensitive parameters differ across clients.
  3. The use of the Hessian matrix to define privacy sensitivity lacks clarity. The correlation between gradient changes and privacy leakage is not sufficiently supported, particularly regarding the behavior of gradients at different training stages.
  4. There is a notable lack of empirical evidence supporting the fitting of the log-normal mixture distribution model to various parameter models.

Questions

  1. How to mitigate the risk of colluded clients stealing a targeted client’s parameters, especially since this is a common issue faced by existing HE-based federated learning solutions?
  2. Are mask maps unified across clients after aggregation? If they are, how do you handle differences in sensitive parameters? If not, what happens to the other parameters post-aggregation?
  3. How do gradient changes relate to privacy risk? Could you provide evidence that gradient changes decrease as the model converges, affecting privacy risk?
  4. What additional evidence can you provide to support the fitting of the log-normal mixture distribution model to different parameter models?
Comment

We appreciate the valuable feedback from Reviewer 3uEj and we hope we have addressed your concerns below:

Weakness 1 & Question 1: The article does not fully address the risks posed by malicious clients during the negotiation of homomorphic encryption (HE) keys. While the authors propose using a threshold version to mitigate collusion risks, there remains a significant motivation for colluding clients to target a specific client, particularly in scenarios with a small number of clients. How to mitigate the risk of colluded clients stealing a targeted client’s parameters, especially since this is a common issue faced by existing HE-based federated learning solutions?

The key collusion risk is captured by our threat model and it is a standard key problem in homomorphic encryption [1]. With a reasonably sized group of clients and a correctly configured threshold in practice, this risk is bounded by the multi-party threshold. Please note that any future advancements in multi-party HE can be easily integrated into our framework.

In the presence of a malicious adversary, the scenario of colluding clients stealing a targeted client's parameter is considered an "output privacy" problem which our defined threat model here does not consider. However, output privacy is solvable by adding primitives like differential privacy [2], which is already supported by our framework.

[1] Aloufi, Asma, et al. "Computing blindfolded on data homomorphically encrypted under multiple keys: A survey." ACM Computing Surveys (CSUR) 54.9 (2021): 1-37.

[2] Truex, Stacey, et al. "LDP-Fed: Federated learning with local differential privacy." Proceedings of the third ACM international workshop on edge systems, analytics and networking. 2020.

Weakness 2 & Question 2: The method for generating mask maps may lead to inconsistencies in a heterogeneous environment. If clients use the same mask map, it could compromise privacy, especially when sensitive parameters differ across clients. Are mask maps unified across clients after aggregation? If they are, how do you handle differences in sensitive parameters? If not, what happens to the other parameters post-aggregation?

Good point! The mask maps are aggregated (also in encrypted fashion) in a way that reflects the collective sensitivity of all clients, while still allowing for local protection of parameters deemed uniquely sensitive by individual clients. This ensures consistency without compromising client-specific privacy.

Weakness 3 & Question 3: The use of the Hessian matrix to define privacy sensitivity lacks clarity. The correlation between gradient changes and privacy leakage is not sufficiently supported, particularly regarding the behavior of gradients at different training stages. How do gradient changes relate to privacy risk? Could you provide evidence that gradient changes decrease as the model converges, affecting privacy risk?

We use the Hessian matrix to define privacy sensitivity because our focus is on the threat posed by gradient inversion attacks. In such attacks, the goal is to recover users' training data by iteratively updating a dummy input to minimize the difference between the generated gradient (computed with the dummy input) and the true gradient. Therefore, the sensitivity of the generated gradient to changes in the input is crucial, motivating us to measure the sensitivity of the gradient with respect to the input (as represented by part of the Hessian matrix) as an indicator of privacy leakage risk.
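To make this concrete, here is a minimal PyTorch-style sketch of a gradient inversion attack via dummy-input optimization; the function name, hyperparameters, and loss choice are illustrative assumptions rather than the exact attack evaluated in the paper:

```python
import torch

def gradient_inversion(model, true_grads, label, input_shape, steps=200, lr=0.1):
    """Try to recover a training example by matching the observed gradients."""
    dummy_x = torch.randn(1, *input_shape, requires_grad=True)
    opt = torch.optim.Adam([dummy_x], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        dummy_loss = loss_fn(model(dummy_x), label)  # label: LongTensor of shape (1,)
        dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(), create_graph=True)
        # Attack objective: distance between dummy-input gradients and the observed gradients
        match = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
        match.backward()  # gradient of the matching loss flows back into dummy_x
        opt.step()
    return dummy_x.detach()
```

Parameters whose gradients react strongly to input changes give this optimization the most signal, which is why the gradient-with-respect-to-input block of the Hessian serves as the sensitivity indicator.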

Regarding why gradient changes decrease as training converges, this is based on the intuitive notion that gradients generally shrink as training approaches completion. We have demonstrated the performance of the attack at different stages of training (taking transformer3 as an example) in Appendix A.28.

Weakness 4 & Question 4: There is a notable lack of empirical evidence supporting the fitting of the log-normal mixture distribution model to various parameter models. What additional evidence can you provide to support the fitting of the log-normal mixture distribution model to different parameter models?

We apologize if our phrasing caused any confusion. Our intention is not to advocate for using a log-normal mixture distribution specifically for fitting. Instead, we use this distribution as an illustrative example to support our theoretical framework. We chose the log-normal mixture distribution here because it performed well across all models included in our experiments on defense effectiveness (Section 5.3). Our conclusions are based on a comparison of fitting results across common distributions. In Appendix A.17, you can see from the figures on the right that the privacy budget ratio curve for the log-normal mixture distribution fits the true distribution most closely. We omitted the fitting curves for other distributions on the left just to keep the plots clear.
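For illustration only, one common way such a fit could be carried out is to fit a Gaussian mixture to log-transformed parameter magnitudes; this is an assumption about the fitting procedure, not necessarily the exact method used in Appendix A.17:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_lognormal_mixture(values: np.ndarray, n_components: int = 2) -> GaussianMixture:
    """Fit a log-normal mixture by fitting a Gaussian mixture to log-magnitudes.
    The fitted gm.weights_, gm.means_, gm.covariances_ describe the mixture components."""
    logs = np.log(np.abs(values[values != 0])).reshape(-1, 1)
    return GaussianMixture(n_components=n_components, random_state=0).fit(logs)
```

Goodness of fit against other candidate distributions can then be compared, e.g., via likelihood or information criteria, as in the distribution comparison described above.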

Official Review
Rating: 8

This paper introduces a privacy-preserving federated learning framework that leverages selective parameter encryption to address the computational and communication overhead associated with large-scale models. By encrypting only a subset of parameters, the proposed method aims to achieve a balance between privacy preservation and efficiency. The authors provide evidence that their approach significantly reduces computational and communication costs while maintaining privacy guarantees. The empirical results further support the efficacy of the method across models of varying scales, with especially pronounced benefits observed for large models.

Strengths

The paper effectively integrates theoretical analysis with empirical validation, which strengthens the credibility of the proposed approach and enhances the robustness of the findings.

The proposed method demonstrates substantial reductions in encryption and communication overhead, a particularly advantageous outcome for the scalability of large models.

Weaknesses

Limited Comparative Analysis: The manuscript lacks sufficient depth in referencing and comparing existing selective parameter encryption techniques. While some methods are briefly addressed in Table 9, a more thorough and nuanced comparative analysis is necessary to clearly establish the novelty and advantages of the proposed approach.

Lack of Analysis on Parameter Selection Overhead: The introduction of a parameter selection step in the encryption process is not accompanied by a detailed evaluation of the associated overhead. This omission leaves the cost-effectiveness assessment incomplete, particularly regarding practical implementation.

Questions

Comparison with Existing Methods: The manuscript would benefit from a more exhaustive comparative analysis of existing selective parameter encryption approaches. This analysis should elucidate the specific advantages of the proposed method, particularly with respect to scalability and efficiency in practical deployment scenarios. For example, some methods simultaneously employ homomorphic encryption for certain parameters while using differential privacy for others, thereby enhancing privacy protection. A detailed comparison of these hybrid approaches, including their respective advantages and trade-offs, would strengthen the overall contribution.

Parameter Selection Overhead: It is crucial to quantify the overhead introduced by the parameter selection mechanism. A detailed examination of this overhead would provide a more comprehensive understanding of the practical trade-offs and facilitate a balanced evaluation of the approach's feasibility.

Comment

We appreciate Reviewer 1Hx6's valuable suggestions! We sincerely hope that we have addressed all your concerns below and that you could reconsider our paper as a strong candidate.

Weakness 1 & Question 1: Limited Comparative Analysis: The manuscript lacks sufficient depth in referencing and comparing existing selective parameter encryption techniques. While some methods are briefly addressed in Table 9, a more thorough and nuanced comparative analysis is necessary to clearly establish the novelty and advantages of the proposed approach.

Thanks for the suggestion! We have provided a description of the selective encryption used by Nvidia FLARE in both Figure 1 and Figure 9, stating that FLARE provides layer-level-only selection without a provable and quantifiable privacy guarantee. Due to its layer-level nature, the sensitive parameter selection is coarse, which results in encrypting a larger amount of unnecessary model parameters. To the best of our knowledge, our work and FLARE are the first two concurrent papers to propose the idea of selective encryption in FL, where our work provides a more fine-grained parameter-level selective encryption with quantifiable and provable privacy. We plan to highlight this comparison in the final version of our paper.

Additionally, as explained in Algorithm 1, our framework supports using selective parameter encryption with HE and differential privacy at the same time per users' desired privacy level.

Weakness 2 & Question 2: Lack of Analysis on Parameter Selection Overhead: The introduction of a parameter selection step in the encryption process is not accompanied by a detailed evaluation of the associated overhead. This omission leaves the cost-effectiveness assessment incomplete, particularly regarding practical implementation.

Good point! The parameter selection step mainly consists of parameter sensitivity calculation and encrypted global mask aggregation. Parameter sensitivity is calculated using the formula in Section 3.4 (Step 1), and encrypted global mask aggregation can be regarded as a single encrypted weighted-sum function over local masks (Step 2 in Section 3.4), performed in the initialization phase.
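A rough sketch of these two steps under simplifying assumptions (the sensitivity measure, the weighting, and the HE primitives `he_scale`/`he_add` are placeholders rather than the exact Section 3.4 definitions):

```python
import numpy as np

def local_sensitivity(grad_wrt_input_block: np.ndarray) -> np.ndarray:
    """Step 1 (illustrative): per-parameter sensitivity, e.g. a norm of the block of
    the Hessian that couples each parameter's gradient to the input."""
    return np.linalg.norm(grad_wrt_input_block, axis=1)

def aggregate_masks(encrypted_local_masks, client_weights, he_add, he_scale):
    """Step 2 (illustrative): one encrypted weighted sum over the clients' local
    sensitivity masks, computed once during initialization."""
    global_mask = None
    for enc_mask, w in zip(encrypted_local_masks, client_weights):
        term = he_scale(enc_mask, w)  # ciphertext-times-scalar
        global_mask = term if global_mask is None else he_add(global_mask, term)
    return global_mask  # remains encrypted
```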

Comment

Regarding Point 1: Limited Comparative Analysis

Your response provides a comparison with Nvidia FLARE. This effort is commendable. However, I believe that focusing solely on selective parameter encryption schemes within the federated learning context might not fully showcase the advantages of your approach. 

To strengthen the discussion, I suggest broadening the scope to include other existing parameter selection schemes (e.g., methods like Sphinx: Enabling Privacy-Preserving Online Learning over Mobile Devices). Additionally, there are various works in the context of selective parameter encryption for neural networks that may also be relevant. While these schemes are not specifically designed for federated learning, analyzing their feasibility or limitations in this context could provide valuable insights. Explicitly clarifying why these methods cannot be directly adapted or how your approach compares in terms of strengths and weaknesses would significantly enhance the persuasiveness and academic depth of the paper. 

Regarding Point 2: Overhead Analysis of Parameter Selection Steps

Your explanation of the parameter selection steps is clear. However, I believe that a description alone is insufficient. Direct evaluations of the overhead, such as the time required or resource consumption, are crucial for a complete assessment of the method's practicality and overall cost-effectiveness. 

Specifically, I would like to understand how the overhead introduced by the parameter selection step compares to the cost of encrypting a large number of unnecessary parameters, as mentioned in your response. Having a clear, quantitative comparison between these two types of overhead would provide a more intuitive understanding of the trade-offs involved. This comparison is important to justify the practicality of the proposed selective encryption scheme and to better highlight the efficiency gains it offers. 

I appreciate the time and effort you have put into addressing these points. Should you address these concerns adequately, I would be happy to reconsider my evaluation.

Comment

Point 1: Limited Comparative Analysis

Thank you for the suggestion! We have included the following content in our updated manuscript (please refer to the newest version of the paper pdf) to have a more in-depth discussion on selective encryption in the context of machine learning:

Selective encryption of model components has been explored in prior work, particularly in single-client-server machine learning setups for training and inference. For instance, Sphinx [1] employs a hybrid approach, utilizing homomorphic encryption for bias parameters while applying differential privacy to the remaining parameters. However, unlike our privacy-sensitivity-based method, Sphinx does not easily address the challenges of local client data heterogeneity and model variety in the context of federated learning. Similarly, other approaches [2] face limitations in federated learning due to their reliance on specific model architectures, overly coarse layer-wise selection strategies, and the absence of robust privacy quantification mechanisms.

[1] Tian, Han, et al. "Sphinx: Enabling privacy-preserving online learning over the cloud." 2022 IEEE Symposium on Security and Privacy (SP). IEEE, 2022.

[2] Tian, Jinyu, Jiantao Zhou, and Jia Duan. "Probabilistic selective encryption of convolutional neural networks for hierarchical services." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.

Point 2: Overhead Analysis of Parameter Selection Steps

We appreciate your valuable suggestions on providing a clear overhead analysis of parameter selection. We have conducted additional experiments to better reflect the efficiency improvement of our methods for privacy-preserving FL, and have included our results in Appendix A.30 in the updated manuscript.

- Privacy Sensitivity Calculation: 113.8s
- Encrypted Global Mask Agreement: 273.6s
- Full Parameter Encryption Training: 85355.3s
- Selective Parameter Encryption Training: 6287.1s
- Overhead Reduction: 79068.2s

In the same setup as Figure 6 on ResNet-50, the two key steps of parameter selection, i.e. privacy sensitivity calculation and encrypted global mask agreement, cost 113.8s and 273.6s, respectively, while selective parameter encryption reduces overhead by 79068.2s over the entire training task (please refer to the updated Figure 27 in Appendix A.30 for more details) compared with full parameter encryption. This result demonstrates that despite the additional overhead introduced by the parameter selection steps, our method still reduces the encrypted-FL overhead by a substantial margin. Additionally, the global mask can easily be reused in different training tasks for the same model architecture with similar data distribution, so the overhead of parameter selection can be further amortized in practice.
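As a quick sanity check on these numbers (treating the two selection steps as one-off costs for the training task, as described above):

\[
(85355.3 - 6287.1)\ \mathrm{s} \; - \; (113.8 + 273.6)\ \mathrm{s} \; = \; 79068.2\ \mathrm{s} - 387.4\ \mathrm{s} \; = \; 78680.8\ \mathrm{s}\ \text{net reduction}.
\]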

We appreciate your valuable feedback and we are more than happy to address any further concerns!

Comment

Thank you for the additional efforts and clarifications. You have clearly demonstrated the significant optimization of training overhead with selective parameter encryption. I noticed that the presentation of Figure 27.a is slightly unclear and did not entirely help me quickly grasp the specifics. However, considering the overall completeness of your response, this is not a major issue. I believe you have adequately addressed this concern, and I appreciate your attention to detail and improvements. 

Regarding the question on data heterogeneity and model diversity, you mentioned that your approach is capable of handling these scenarios. However, it seems that the paper does not provide explicit experiments or analyses to fully support this claim.

Comment

We appreciate the positive feedback from Reviewer 1Hx6 and would like to further clarify the specific advantage of our work in the context of general federated learning. We provided experimental results in Figure 4 in addition to Appendix A.17 (Figures 13-20), where we evaluated different models including LeNet, CNN, ResNets, and other transformer-based LMs. Although we like the simplicity of Sphinx's idea of encrypting only the bias terms to achieve an adequate defense, we notice from our experiments that the most sensitive parameters (exactly pinpointing which helps determine the minimum amount of encrypted parameters needed) in different models, especially language models, are not necessarily biases. The results below show the composition of the top 10% most sensitive parameters:

- transformer3: bias encrypted 3.93%, others encrypted 96.07%
- transformer3f: bias encrypted 3.15%, others encrypted 96.85%
- transformer3t: bias encrypted 1.24%, others encrypted 98.76%
- transformerS: bias encrypted 0.47%, others encrypted 99.53%
- gpt2: bias encrypted 0.31%, others encrypted 99.69%
- CNN: bias encrypted 0.16%, others encrypted 99.84%
- resnet18: bias encrypted 0.09%, others encrypted 99.91%
- resnet50: bias encrypted 0.15%, others encrypted 99.85%

Our sensitivity-calculation-based solution is able to pinpoint fine-grained important parameters in a minimal amount across various models and data distributions, which provides optimal efficiency via selective parameter encryption while preserving the desired privacy guarantee.

Comment

Thank you for your detailed response and for providing additional insights and experimental results. I appreciate the effort to clarify the applicability of your method to diverse models and data distributions. However, I have a few questions:

Clarification on Data Heterogeneity:

My interpretation of "data heterogeneity" primarily pertains to scenarios such as label skew, feature skew, or other variations in data distributions across clients. However, it seems your experiments focus on comparing different datasets (e.g., MNIST, CIFAR-100, WIKITEXT). Could you clarify whether your definition of heterogeneity relates to using the same model across distinct tasks/datasets and observing differences in the encrypted parameters?

Comparison with Explainable Learning Techniques:

Your approach to selectively encrypt the most sensitive parameters is intriguing and well-supported by experimental results. However, could you elaborate on how this differs from explainable learning methods that identify and prioritize important parameters?

Thank you again for your thoughtful response and contributions to this important field.

Comment

Clarification on Data Heterogeneity

We apologize for any confusion. Our experiments involve different models and datasets, demonstrating that our sensitivity-based approach effectively adapts to diverse scenarios to identify the most sensitive parts of the models. With reasonable inference from such results, we believe this method can also address heterogeneous client data distributions.

To explore this further, we have added a preliminary experiment (see Figure 28 in Appendix A.31) to analyze how sensitivity can assist in such cases. Our findings indicate that the calculated sensitivity distributions can capture the heterogeneity in client datasets. However, it is important to note that local privacy sensitivity maps across clients may contain specific outlier privacy-sensitive parameters. This observation suggests that alternative global mask aggregation functions, such as maximum-based aggregation, might outperform our current weighted-averaging method in terms of privacy protection. Given the interdisciplinary nature of our work, spanning federated learning, homomorphic encryption, and differential privacy, as well as the extensive validation we have already conducted, we feel it is best to address this specific topic in follow-up research to avoid overcomplicating the current paper. We have updated our manuscript to reflect this in the future work section. We sincerely appreciate Reviewer 1Hx6's insightful comments on efficient privacy-preserving federated learning!
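For concreteness, a small plaintext sketch of the two aggregation choices discussed (weighted averaging versus element-wise maximum); the names and the plaintext setting are illustrative simplifications:

```python
import numpy as np

def weighted_average_mask(local_sensitivities, weights):
    """Current approach per the response: weighted average of client sensitivity maps."""
    s = np.stack(local_sensitivities)                      # shape: (num_clients, num_params)
    w = np.asarray(weights, dtype=float)[:, None] / np.sum(weights)
    return (w * s).sum(axis=0)

def max_mask(local_sensitivities):
    """Alternative mentioned for heterogeneous data: a parameter is kept sensitive if ANY
    client finds it highly sensitive, protecting client-specific outlier parameters."""
    return np.max(np.stack(local_sensitivities), axis=0)
```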

Comparison with Explainable Learning Techniques

Explainable learning techniques aim to improve interpretability by identifying critical parameters or model components that significantly impact predictions. In contrast, our work targets privacy sensitivity, identifying parameters based on their vulnerability to privacy risks. While there will likely be overlap between the parameters selected by both approaches—since privacy sensitivity also involves analyzing how sensitive the model parts react to data, similar to sensitivity analysis in explainable learning—our focus is on preserving privacy across distributed clients. We believe it can be an interesting future work to precisely bridge domains of privacy-preserving ML, efficient ML, and explainable ML.

Comment

Synchronization of New Experimental Code
The new experimental setup in the paper is not entirely clear, and providing the corresponding code would greatly help in understanding and verifying the experiments. Has the code for the newly added experiments been synchronized as part of the supplementary materials?

Anonymity vs. Reproducibility
I noticed the authors mentioned:
"To maintain anonymity of the submission, here we only provide the core code related to HE in our FL system (the rest of the system will give out too many identifiers which is hard to anonymize)."
Will the authors consider releasing the complete code of the system after the anonymity period ends to facilitate reproducibility?

Appreciate the time and effort you have put into addressing these points.

Comment

To demonstrate the difference in sensitivity distribution, we use two client data distributions constructed from the ImageNet dataset with 100 images from distinct classes sampled at equal intervals. Distribution 1 contains data with labels of 0 to 4 while Distribution 2 contains data whose labels span across the first 400. The above preliminary experiment adopts this setup as a proof of concept. To further investigate this aspect, experimental setups in the previous work [1, 2] for the FL data heterogeneity can be considered in future work on this topic regarding privacy sensitivity calculation. We have updated our manuscript to reflect this discussion.

We plan to open-source the entire codebase (including newly added experiments during this rebuttal) of this paper and integrate it into our existing open-source platform.

[1] Guleria, Arpit, et al. "On Homomorphic Encryption Based Strategies for Class Imbalance in Federated Learning." arXiv preprint arXiv:2410.21192 (2024).

[2] Mendieta, Matias, et al. "Local learning matters: Rethinking data heterogeneity in federated learning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

Comment

We thank Reviewer 1Hx6 for the insightful feedback and suggestions during this rebuttal to improve our paper! We hope that we have addressed all the concerns of Reviewer 1Hx6 and that our paper can be reconsidered as a solid candidate.

Comment

I appreciate the authors' comprehensive responses during the rebuttal phase. The additional clarifications have effectively addressed my concerns regarding the contributions and experimental details of the paper. After careful reconsideration, I find it reasonable to adjust my score to 8.

Comment

I noticed that the final version of this article seems to have altered the page margins. Could this be an oversight by the authors, or am I mistaken?

Comment

Dear Reviewer 1Hx6,

Thank you for your meticulous review of our manuscript and for pointing out that the page margins appear to have been altered. We sincerely apologize for this oversight. Specifically, remnants of a geometry package configuration we used during the drafting phase for internal review during rebuttal were mistakenly left in the final submission. It was not our intention to change the page margins when updating our paper. (Please note that fixing page margins does not affect our compliance with the page limit of 10.)

Due to the constraints of the submission system, we are unable to modify the manuscript at this stage of the review process. However, we assure you that we will fix the margin settings in the final version of the paper after the review process is complete. We are committed to adhering to all formatting requirements to ensure our paper meets the publication standards.

We appreciate your understanding and thank you again for bringing this to our attention.

Official Review
Rating: 8

The paper tackles the significant computational and communication overhead associated with using Homomorphic Encryption (HE) to protect personal privacy in federated learning, an issue that has become even more critical with the rise of foundational models. To address this, the authors propose a method that selectively encrypts sensitive parameters, thereby markedly reducing computation and communication costs during training while providing quantifiable privacy guarantees. The paper is presented clearly and concisely, with comprehensive experiments that thoroughly evaluate the proposed method, making it a strong and valuable contribution to the field.

Strengths

  1. The paper is written with notable clarity and conciseness, with figures and tables placed effectively to enhance readability and comprehension.
  2. The theoretical proofs and experiments are comprehensive, addressing privacy through quantifiable discussions on selective parameter encryption, demonstrating gains in computational and communication efficiency, and validating model effectiveness.
  3. The content is well-structured, effectively conveying the authors’ intended contributions and addressing most anticipated questions or concerns a reader might have.

Weaknesses

The chosen base model, GPT-2, is relatively small, which may limit the generalizability of the results to larger foundational models. Since the paper aims to demonstrate the feasibility of implementing privacy-preserving techniques in foundational models, GPT-2 may not be sufficiently representative of these larger-scale models. I suggest testing the proposed method on a more current, widely used model such as the latest LLaMA 3.2, to provide stronger evidence of its scalability and applicability to modern foundational models.

Questions

Have the authors considered testing the proposed method on larger foundational models? Given that GPT-2 is relatively small, it would be helpful to understand whether the method’s efficiency and computational and communication cost reductions hold at scale with more recent models, such as LLaMA 3.2 or similar. Expanding the experiments to include such models could strengthen the paper’s claims about feasibility for broader foundational model applications.

Comment

We thank Reviewer 9HzQ for considering our paper as a strong candidate! We hope we have addressed your point about extending our work to more recent LLMs below:

Weakness 1 and Question 1: The chosen base model, GPT-2, is relatively small, which may limit the generalizability of the results to larger foundational models. Since the paper aims to demonstrate the feasibility of implementing privacy-preserving techniques in foundational models, GPT-2 may not be sufficiently representative of these larger-scale models. I suggest testing the proposed method on a more current, widely used model such as the latest LLaMA 3.2, to provide stronger evidence of its scalability and applicability to modern foundational models.

We acknowledge the point about the generalizability of our results to more recent LLMs. GPT-2 was selected due to its wide recognition and a balanced trade-off between computational requirements and experimental reproducibility. We agree that testing on more recent and larger models, such as LLaMA 3.2, would strengthen the paper's claims. From our existing experiments, we observe a general privacy/overhead trend (Table 1) that larger models are likely to follow as well. We are considering a follow-up work to analyze the effectiveness of our method on newer and larger models.

Comment

Thank you for acknowledging the importance of newer and larger models, such as LLaMA 3.2, and your plans for future work. However, I firmly believe that incorporating additional experiments in this paper is essential. Assumptions or trend-based extrapolations alone are insufficient to convincingly demonstrate the generality and scalability of the proposed method for modern large-scale models.

I hope the authors will further consider this point to enhance the completeness and rigor of the paper.

Comment

Thanks for the valuable suggestion to improve our paper! We have added additional experiments (please refer to the updated manuscript) using newer LLMs from the Llama-3.2 collection. Figures 29-30 in Appendix A.32 demonstrate how our method performs on modern large-scale models. The experimental results indicate that newer LLMs align closely with the findings observed in our experiments on earlier models.

We hope that we have addressed all the concerns from Reviewer 9HzQ and we are more than happy to further incorporate any suggestions for improvement.

Comment

Thank you for the response. In light of this, I am willing to uphold my original review opinion and recommend the acceptance of this paper.

AC Meta-Review

The submission proposes to selectively encrypt sensitive parameters in a homomorphic-encryption-based secure aggregation model of federated learning. The paper then argues that this mechanism could be an alternative to differential privacy, as it can empirically mitigate gradient inversion attacks well.

The main concern about this submission is that it lacks a compelling theoretical explanation of why the proposed selective encryption provides a satisfactory privacy guarantee. In fact, the theoretical definitions given in Section 4 are quite confusing as they mix properties of the well-known differential privacy notion with the definition of the proposed privacy guarantee. This section is written in a quite sloppy fashion. E.g., while Definition 4.1 is for information-theoretic differential privacy, Theorem 4.3 seems to be invoking another computational version of differential privacy instead. Moreover, Theorem 4.7 doesn't hold with respect to the information-theoretic definition of differential privacy given in Definition 4.1 due to the use of encryption (instead, it should require a computational version of differential privacy that isn't defined in the paper). See Chapter 10 of the book of Vadhan (https://privacytools.seas.harvard.edu/files/complexityprivacy_1.pdf) for a distinction between the two notions.

The paper would need to substantially improve the mathematical rigor of its presentation before being ready for publication.

Additional Comments from the Reviewer Discussion

The reviewers raised multiple concerns including;

  1. The theoretical rigor of the privacy framework
  2. The choice of the foundation model used in the empirical evaluation
  3. The baselines
  4. The empirical evaluation
  5. The threat model
  6. The assumption of the log-normal mixture distribution on the model parameters.

While the author(s) addressed many of these concerns during the discussion, the concern around 1) remains.

Final Decision

Reject