Overall rating: 5.7 / 10 (Poster; 3 reviewers; min 5, max 6, std 0.5)
Individual scores: 5, 6, 6
Confidence: 4.0 · Correctness: 2.7 · Contribution: 2.7 · Presentation: 3.0
NeurIPS 2024

FedLPA: One-shot Federated Learning with Layer-Wise Posterior Aggregation

Submitted: 2024-05-14 · Updated: 2024-11-06
TL;DR

We propose FedLPA to significantly improve the performance via Layer-Wise Posterior Aggregation in one-shot federated learning.


Keywords
One-shot Federated Learning

Reviews and Discussion

Official Review by Reviewer 4qUr (Rating: 5)

This paper proposes FedLPA, a novel one-shot federated learning method that uses layer-wise posterior aggregation. It aggregates local models to obtain a more accurate global model without requiring extra datasets or exposing private label information. The key innovation is using layer-wise Laplace approximation to efficiently infer the posteriors of each layer in local models, then aggregating these layer-wise posteriors to train the global model parameters. Extensive experiments show FedLPA significantly improves performance over state-of-the-art methods across several metrics, especially for non-IID data distributions.
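For context, posterior-aggregation methods of this family build on the standard precision-weighted combination of Gaussian (Laplace) posteriors sketched below; this is a generic identity, not necessarily the paper's exact objective.

```latex
% Generic sketch (not the paper's exact formulation): each client k holds a
% layer-wise Laplace posterior over parameters \theta,
%   p_k(\theta) \approx \mathcal{N}(\mu_k, F_k^{-1}),
% with F_k a (block-diagonal) empirical Fisher at the local optimum \mu_k.
% Multiplying these Gaussian factors yields a precision-weighted global estimate:
\[
  F_{\mathrm{global}} = \sum_k F_k, \qquad
  \mu_{\mathrm{global}} = F_{\mathrm{global}}^{-1} \sum_k F_k\, \mu_k .
\]
```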

Strengths

  1. It achieves good performance with only a single round of communication between clients and the server, reducing communication overhead and privacy risks. It performs well on heterogeneous data distributions across clients.
  2. FedLPA doesn't need additional public datasets.
  3. The paper provides a convergence analysis showing a linear convergence rate for the global model optimization.

Weaknesses

  1. The method relies on multiple layers of approximation - empirical Fisher to approximate the Hessian, block-diagonal Fisher instead of full, and approximating global model parameters through optimization. Each approximation introduces some error. These compounding approximations could potentially lead to suboptimal global models. However, the paper's empirical results suggest that in many practical scenarios, these approximations still lead to good performance. Nonetheless, a more thorough theoretical analysis of these approximation errors and their impact would strengthen the paper.
  2. Although more efficient than some baselines, FedLPA still requires more computation than simpler methods like FedAvg.
  3. Storing and transmitting the block-diagonal Fisher matrices for each layer increases memory usage and communication costs compared to methods that only share model weights. For very large models or with many clients, the increased communication overhead from sharing Fisher matrices could become significant.

Questions

Please see the weaknesses.

Limitations

N/A

Author Response

Dear Reviewer 4qUr, thanks for your comments, which helped us improve our paper. The answers to all your questions are as follows:

Q1: The method relies on multiple layers of approximation - empirical Fisher to approximate the Hessian, block-diagonal Fisher instead of full, and approximating global model parameters through optimization. Each approximation introduces some error. These compounding approximations could potentially lead to suboptimal global models. However, the paper's empirical results suggest that in many practical scenarios, these approximations still lead to good performance. Nonetheless, a more thorough theoretical analysis of these approximation errors and their impact would strengthen the paper.

Answer: Note that in our paper we have three approximations, listed as follows:

(1) empirical Fisher to approximate the Hessian

In our Appendix E, we show the theoretical analysis of this approximation error.
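For reference, the two quantities involved here are the standard ones; the definitions below are generic rather than quoted from Appendix E.

```latex
% Standard definitions (generic, not quoted from Appendix E).
% Hessian of the average negative log-likelihood at the local optimum \theta^*:
\[
  H = \frac{1}{N}\sum_{n=1}^{N}
      \nabla_{\theta}^{2}\bigl[-\log p(y_n \mid x_n, \theta)\bigr]\Big|_{\theta=\theta^*},
\]
% and the empirical Fisher used to approximate it:
\[
  \tilde{F} = \frac{1}{N}\sum_{n=1}^{N}
      \nabla_{\theta}\log p(y_n \mid x_n, \theta^*)\,
      \nabla_{\theta}\log p(y_n \mid x_n, \theta^*)^{\top}.
\]
```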

(2) block-diagonal Fisher instead of full Fisher

Paper [1], which we cited in our paper, provides a detailed evaluation and testing of using the block-diagonal Fisher to approximate the full one. Firstly, Chapter 6.3.1, "Interpretations of this approximation", in paper [2] indicates that using a block-wise Kronecker-factored Fisher closely approximates the full Fisher. Although there is a bias term (due to the approximation in our Appendix Equation 30), this term approaches zero when there are sufficient samples. Furthermore, the paper examines the approximation quality of the block-diagonal Fisher compared with the true Fisher and suggests that the block-diagonal Fisher captures the main correlations, while the remaining correlations have a minimal impact on the experimental results.
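For readers unfamiliar with [1, 2], the block-diagonal approximation discussed here is usually written in the Kronecker-factored (K-FAC) form below; this is the standard formulation from those works, not a derivation specific to FedLPA.

```latex
% Standard K-FAC block-diagonal approximation (following [1, 2]).
% One block per layer l, each block factored as a Kronecker product:
\[
  F \approx \mathrm{diag}(F_1, \dots, F_L), \qquad
  F_l \approx A_{l-1} \otimes G_l,
\]
% where A_{l-1} = E[a_{l-1} a_{l-1}^{\top}] is the second moment of the layer's
% input activations and G_l = E[g_l g_l^{\top}] that of its pre-activation gradients.
```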

(3) approximating global model parameters through optimization

In our Appendix J, we show the convergence analysis of our method.

In summary, the approximations have a negligible effect on the final test accuracy. Some experimental results are shown in Table 22 of Appendix M.12.

We will summarize these points in the appendix once the paper is accepted.

[1] Ritter, Hippolyt, Aleksandar Botev, and David Barber. "A Scalable Laplace Approximation for Neural Networks." 6th International Conference on Learning Representations (ICLR 2018), Conference Track Proceedings.

[2] Martens, James. Second-Order Optimization for Neural Networks. PhD thesis, University of Toronto, 2016.

Q2: Although more efficient than some baselines, FedLPA still requires more computation than simpler methods like FedAvg.

Answer: Our method FedLPA introduces only 30% more computation time than the simple FedAvg while increasing test accuracy by up to 35% in some settings, as shown in Table 1 and Table 4. As shown in Table 17, the state-of-the-art Dense uses 6.15x the computation time of our FedLPA, while our method performs better in most cases in terms of test accuracy. Co-Boosting, another distillation method, uses 10.77x the computation time of our FedLPA. Our method is also faster than FedProx and FedOV. Our method is compatible with FedOV and Co-Boosting and performs much better than FedProx.
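To put the quoted ratios on a single scale, the following back-of-the-envelope calculation normalizes everything to FedAvg = 1.0; the individual figures are taken from the rebuttal text above, not additional measurements.

```python
# Back-of-the-envelope normalization of the reported compute ratios (FedAvg = 1.0).
fedavg = 1.0
fedlpa = 1.3 * fedavg         # "30% more computation time" than FedAvg
dense = 6.15 * fedlpa         # Dense uses 6.15x the computation time of FedLPA
co_boosting = 10.77 * fedlpa  # Co-Boosting uses 10.77x the computation time of FedLPA

print(f"FedLPA      ~{fedlpa:.1f}x FedAvg")       # ~1.3x
print(f"Dense       ~{dense:.1f}x FedAvg")        # ~8.0x
print(f"Co-Boosting ~{co_boosting:.1f}x FedAvg")  # ~14.0x
```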

Our FedLPA strikes a balance between computation overhead and performance.

Q3: Storing and transmitting the block-diagonal Fisher matrices for each layer increases memory usage and communication costs compared to methods that only share model weights. For very large models or with many clients, the increased communication overhead from sharing Fisher matrices could become significant.

Answer: Our FedLPA strikes a balance between communication overhead and performance. As shown in Table 4, FedLPA has a communication overhead of only 2x that of FedAvg. The communication overhead of our FedLPA is similar to that of the popular SCAFFOLD. In some settings, FedLPA improves performance by up to 35% compared to FedAvg and SCAFFOLD.

In our paper, we also give detailed examples of the communication overhead of our FedLPA in Appendix M.8.

For all datasets, the communication overhead increases linearly as the number of clients increases. Even for very large models or many clients, our communication overhead remains only about twice that of FedAvg. Compared with the computation overhead in the one-shot setting, this is acceptable.

Comment

Dear Reviewer 4qUr,

We sincerely appreciate your time and efforts in reviewing our manuscript and offering valuable suggestions. Note that there will be no second stage of author-reviewer discussions. As the author-reviewer discussion phase is drawing to a close, we would like to confirm whether our responses have effectively addressed your concerns. We provided detailed responses to your concerns a few days ago, and we hope they have adequately addressed your issues.

If you require further clarification or have any additional concerns, please do not hesitate to contact us. We are more than willing to continue our communication with you.

Best regards,

Authors of Submission #8916

Comment

Dear Reviewer 4qUr,

We sincerely appreciate your time and efforts in reviewing our manuscript and offering valuable suggestions.

We understand you are busy reviewing multiple papers. As the rebuttal deadline is approaching, we are slightly nervous and look forward to your reply or suggestions. We would be more than grateful if you could take some time to confirm whether our responses have effectively addressed your concerns and increased your evaluation of our paper.

Please do not hesitate to contact us if you require further clarification or have any additional concerns. We are more than willing to continue our communication with you.

Thanks so much!

Best regards,

Authors of Submission #8916

Comment

Thank you for the detailed explanation regarding my concerns. Some of my concerns have been well addressed, but I still believe that the proposed method may expose more information to attackers, which could lead to privacy leakage. Thus, I will update my score accordingly.

Comment

Dear Reviewer 4qUr,

We sincerely appreciate your constructive comments and prompt responses, which helped us improve our paper. It is our pleasure to address your concerns during the discussion. Again, thanks for your time and reviews, for recognizing the value of our work, and for improving our score!

Note that in Appendix L, we discuss privacy concerns related to our FedLPA method and demonstrate that it offers the same level of privacy as baseline methods, effectively countering the iDLG attack. We also highlight that FedLPA is compatible with privacy-preserving technologies such as DP. Additionally, a concrete example illustrates how FedLPA maintains data privacy.

Thanks again for improving the rating of our paper; we are more than grateful for this positive score. Your reviews really helped us polish our paper and make our manuscript more solid. Hope you have a wonderful day!

Warm regards,

Authors of Submission #8916

Official Review by Reviewer XECp (Rating: 6)

The paper "FedLPA: One-shot Federated Learning with Layer-Wise Posterior Aggregation" introduces FedLPA, a novel one-shot federated learning method that addresses challenges associated with high statistical heterogeneity in non-identical data distributions. The framework uses layer-wise posterior aggregation based on the empirical Fisher information matrix, allowing for the accurate capture and aggregation of local model statistics into a global model. The paper claims that FedLPA improves learning performance significantly over state-of-the-art methods across several datasets without requiring auxiliary datasets or exposing private label information.

Strengths

  1. Originality: The introduction of layer-wise posterior aggregation using the empirical Fisher information matrix is a novel approach in the context of one-shot federated learning. This method effectively addresses the challenge of non-IID data distributions.
  2. Quality: The paper provides a rigorous theoretical foundation for the proposed method, including convergence proofs and detailed mathematical formulations. The extensive experimental results on various datasets further support the claimed improvements in learning performance.
  3. Clarity: The paper is well-structured and clearly written, with thorough explanations of the methodologies and theoretical concepts. The inclusion of figures and tables helps in understanding the experimental results.
  4. Significance: By improving the performance of one-shot federated learning under non-IID conditions, the proposed method has significant implications for practical applications where data privacy and communication efficiency are critical concerns.

Weaknesses

  1. Implementation Complexity: The use of layer-wise posterior aggregation and the empirical Fisher information matrix introduces significant complexity. The practicality of implementing FedLPA in real-world settings could be challenging without detailed guidelines or simplifications.
  2. Scalability: The paper does not adequately address the scalability of the proposed method to large datasets or a high number of clients. Evaluating the computational and communication overheads in such scenarios is necessary to understand the practical feasibility of FedLPA.
  3. Privacy Considerations: While the method claims to preserve data privacy, the paper lacks a detailed discussion on potential privacy risks and mitigation strategies, which is crucial for federated learning applications.
  4. Experimental Scope: The experiments, although comprehensive, are limited to a few datasets and simple neural network structures. Additional validation on larger and more complex datasets, as well as a variety of neural network architectures, would provide stronger empirical support for the method's generalizability.
  5. Parameter Sensitivity: The performance of FedLPA may be sensitive to the choice of parameters, such as the noise levels for the empirical Fisher information matrix. A detailed analysis of parameter sensitivity and guidelines for selecting appropriate parameter values would enhance the robustness of the method.

Questions

I have no questions.

Limitations

The authors have acknowledged some limitations of their work, but further discussion would enhance the robustness of the paper:

  1. Complexity and Practicality: The complexity of the FedLPA framework may hinder practical implementation. Providing more detailed guidelines or potential simplifications could make the approach more accessible for real-world applications. Additionally, discussing potential trade-offs between complexity and performance would be beneficial.
  2. Scalability: The paper does not sufficiently address the scalability of the proposed methods in environments with many clients or large graphs. Detailed evaluations of the computational and communication overheads involved in scaling the methods would help understand their practical feasibility.
  3. Privacy Risks: A deeper analysis of potential privacy risks and mitigation strategies is necessary, particularly in federated learning settings where data privacy is a major concern. Discussing how privacy-preserving techniques can be integrated into the proposed framework would strengthen the paper.
  4. Experimental Validation: While the experiments are comprehensive, further validation on larger-scale datasets and more diverse GNN architectures would provide stronger empirical support for the proposed methods. Expanding the experimental scope would help demonstrate the applicability of the methods in various real-world scenarios.

Overall, the paper makes valuable contributions to federated learning for graph data, but addressing the mentioned weaknesses would further enhance its robustness and applicability.

Author Response

Dear Reviewer XECp, thanks for your comments, which helped us improve our paper. The answers to all your questions are as follows:

Q1: Implementation Complexity: The use of layer-wise posterior aggregation and the empirical Fisher information matrix introduces significant complexity. The practicality of implementing FedLPA in real-world settings could be challenging without detailed guidelines or simplifications.

And

Complexity and Practicality: The complexity of the FedLPA framework may hinder practical implementation. Providing more detailed guidelines or potential simplifications could make the approach more accessible for real-world applications. Additionally, discussing potential trade-offs between complexity and performance would be beneficial.

Answer: The theoretical analysis of layer-wise posterior aggregation may be complex; however, the practical implementation is not complicated. Besides, we have provided several functions with APIs so that non-experts can adopt our FedLPA framework within a few lines of code. In detail, in our submitted source code zip file, we provide the APIs for users to use our method at line 541 of experients_our.py. We also include the artifact details in our paper and provide all the script files to reproduce our results.
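As a rough illustration of why the workflow itself is simple, the toy sketch below performs one-shot, layer-wise, precision-weighted aggregation with a diagonal Fisher stand-in. It is not the paper's implementation (FedLPA uses Kronecker-factored, block-diagonal Fishers and an optimization step), and all names in it are illustrative.

```python
# Toy sketch of one-shot layer-wise posterior aggregation with a *diagonal*
# Fisher stand-in. Not the paper's implementation; it only illustrates that
# the overall aggregation workflow fits in a few lines.
import numpy as np

def diag_fisher(per_sample_grads):
    """Diagonal empirical Fisher: mean of squared per-sample gradients."""
    return np.mean(np.square(per_sample_grads), axis=0)

def aggregate_layer(mus, fishers, damping=1e-4):
    """Precision-weighted average of one layer's parameters across clients."""
    precisions = [f + damping for f in fishers]            # keep strictly positive
    total_precision = np.sum(precisions, axis=0)
    weighted = np.sum([p * mu for p, mu in zip(precisions, mus)], axis=0)
    return weighted / total_precision

# Two clients, one 4-parameter "layer" (random numbers stand in for training).
rng = np.random.default_rng(0)
mus = [rng.normal(size=4), rng.normal(size=4)]                   # local optima
fishers = [diag_fisher(rng.normal(size=(32, 4))) for _ in mus]   # local Fishers
print(aggregate_layer(mus, fishers))
```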

In our paper, we also show that FedLPA strikes a balance between computation overhead and performance: it incurs only 30% more computation time than the simple FedAvg method while increasing test accuracy by up to 35% in some settings, as shown in Table 1 and Table 4.

Q2: Scalability: The paper does not adequately address the scalability of the proposed method to large datasets or a high number of clients. Evaluating the computational and communication overheads in such scenarios is necessary to understand the practical feasibility of FedLPA.

Answer: In our paper, FedLPA shows good performance on the MNIST, FMNIST, CIFAR-10, CIFAR-100, SVHN, and EMNIST datasets. We further add experiments under the same experimental setting as in the paper, using ResNet-18 on Tiny-ImageNet. The results show that our method has the potential to deal with large datasets.

| Partitions | FedLPA | Dense | FedAvg |
| --- | --- | --- | --- |
| 0.1 | 17.02 ± 1.40 | 15.88 ± 1.96 | 3.72 ± 1.44 |
| 0.3 | 27.80 ± 2.10 | 24.91 ± 1.65 | 8.41 ± 0.87 |
| 0.5 | 30.14 ± 1.25 | 29.43 ± 0.72 | 12.07 ± 1.92 |

We also add experiments with more clients under the same experimental setting as in the paper, using the FMNIST dataset. The results show that our method can support up to 200 clients.

| Partitions \ Client number | 10 | 20 | 50 | 100 | 200 |
| --- | --- | --- | --- | --- | --- |
| 0.1 | 55.33 ± 0.06 | 57.37 ± 0.05 | 57.03 ± 0.00 | 54.80 ± 0.13 | 54.17 ± 0.26 |
| 0.3 | 68.20 ± 0.04 | 71.30 ± 0.03 | 66.70 ± 0.23 | 66.28 ± 0.45 | 64.52 ± 0.08 |
| 0.5 | 73.33 ± 0.06 | 74.07 ± 0.00 | 71.13 ± 0.00 | 70.72 ± 0.09 | 70.05 ± 0.27 |

The computational and communication overheads are shown in Table 4 in our paper. As the number of clients increases, the computational and communication overheads increase linearly.

Q3: Privacy Considerations: While the method claims to preserve data privacy, the paper lacks a detailed discussion on potential privacy risks and mitigation strategies, which is crucial for federated learning applications.

Answer: In Appendix L, we give a detailed discussion of privacy concerns for FedLPA. We show that our method has the same privacy level as the baselines, counteracting the iDLG attack. Our method is also compatible with existing privacy-preserving technologies (e.g., DP). At the end of Appendix L, we also provide a concrete example of a privacy attack to show how FedLPA preserves data privacy.

Q4: Experimental Scope: The experiments, although comprehensive, are limited to a few datasets and simple neural network structures. Additional validation on larger and more complex datasets, as well as a variety of neural network architectures, would provide stronger empirical support for the method's generalizability.

And

Experimental Validation: While the experiments are comprehensive, further validation on larger-scale datasets and more diverse GNN architectures would provide stronger empirical support for the proposed methods. Expanding the experimental scope would help demonstrate the applicability of the methods in various real-world scenarios.

Answer: In our paper, FedLPA shows good performance on the MNIST, FMNIST, CIFAR-10, CIFAR-100, SVHN, and EMNIST datasets. We further conducted experiments on Tiny-ImageNet, as shown above.

In Appendix M.10, we also show the results using more complex network structures such as ResNet-18.

In this paper, we conduct experiments on CV tasks and do not focus on graph neural networks.

Q5: Parameter Sensitivity: The performance of FedLPA may be sensitive to the choice of parameters, such as the noise levels for the empirical Fisher information matrix. A detailed analysis of parameter sensitivity and guidelines for selecting appropriate parameter values would enhance the robustness of the method.

Answer: In our paper, there is no noise-level setting for the empirical Fisher information matrix. We only have one hyper-parameter, λ from Eq. 33, which controls the variance of the prior normal distribution and guarantees that A_k and B_k are positive semi-definite.

Other Laplace approximation methods are sensitive to the hyper-parameter λ according to their experimental results, but Table 3 shows that our approach is relatively robust.
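For readers unfamiliar with this hyper-parameter, in Kronecker-factored Laplace approximations (e.g., Ritter et al., 2018) the prior precision typically enters as a damping term added to both Kronecker factors; the generic form is sketched below, and the exact placement in Eq. 33 may differ.

```latex
% Typical damping in Kronecker-factored Laplace approximations (generic form;
% the exact expression in Eq. 33 of the paper may differ):
\[
  F_k + \lambda I \;\approx\;
  \bigl(A_k + \sqrt{\lambda}\, I\bigr) \otimes \bigl(B_k + \sqrt{\lambda}\, I\bigr),
\]
% which keeps both factors positive definite and corresponds to a Gaussian
% prior with precision \lambda on the parameters.
```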

Comment

Thanks for the authors' detailed rebuttal. After careful consideration, the original assessment and rating will remain the same.

Comment

Dear Reviewer XECp,

We sincerely appreciate your time and efforts in reviewing our manuscript and offering valuable suggestions. Note that there will be no second stage of author-reviewer discussions. As the author-reviewer discussion phase is drawing to a close, we would like to confirm whether our responses have effectively addressed your concerns. We provided detailed responses to your concerns a few days ago, and we hope they have adequately addressed your issues.

If you require further clarification or have any additional concerns, please do not hesitate to contact us. We are more than willing to continue our communication with you.

Best regards,

Authors of Submission #8916

Comment

Dear Reviewer XECp,

Thanks for replying to us while you are busy reviewing multiple papers. Although you maintained your rating after careful consideration, we still appreciate your time and effort in reviewing our manuscript and offering valuable suggestions, even though we feel a little disappointed. We would be grateful if you could consider raising the rating of our paper during the upcoming discussions among the reviewers and the AC.

Again, thanks for your time and reviews. If you have any further concerns, we will be more than happy to continue our communication with you.

Best regards,

Authors of Submission #8916

Official Review by Reviewer F77H (Rating: 6)

This paper proposes a one-shot Federated Learning (FL) method, denoted as FedLPA, to address heterogeneous data distribution among clients. FedLPA does not demand auxiliary datasets or private label information during aggregation on the server side. To achieve this, FedLPA infers the posteriors by leveraging the Fisher information matrix of each layer in local models using layer-wise Laplace approximation and aggregates these to train the global model. Extensive experimental results demonstrate the efficacy of FedLPA compared to conventional FL methods under the one-shot setting.

Strengths

  1. The idea of the paper is easy to follow.
  2. Instead of measuring correlations between different layers, the proposed method only approximates the layer-wise Fisher to obtain a good trade-off.
  3. Extensive experimental results show the superiority of FedLPA under the one-shot FL setting.

Weaknesses

To be honest, I do not buy the one-shot Federated Learning (FL) setting. This setting goes against the idea of FL. But this does not affect my rating.

  1. There are some typo problems.
  2. Authors should compare the proposed method with differential privacy (DP) FL or prototype-based methods rather than conventional FL methods. All these approaches address communication security but from different perspectives.
  3. Would the proposed method also show superior performance on more challenging datasets, like Tiny-ImageNet and Office-Home?
  4. Any proof to show that "computing the co-relations between different layers brings slight improvement"?

Questions

As I mentioned in the weakness, the authors should compare the proposed method with DP FL and prototype-based FL rather than conventional FL methods. I believe the experiments in the paper do not provide a fair comparison.

According to the authors in Supplementary Sec. L, the security level of the proposed method is similar to that of FedAVG, meaning it is also vulnerable to attacks targeting FedAVG. Is the only advantage of the proposed method that attackers have fewer opportunities to strike, since clients only communicate with the server once?

Limitations

The topic discussed in the paper is highly related to privacy protection, and the authors present a novel method to deal with it.

Author Response

Dear Reviewer F77H, thanks for your comments, which helped us improve our paper. The answers to all your questions are as follows:

Q1: There are some typo problems.

Answer: Thanks for pointing that out and thanks for your efforts in reviewing our paper. We found the following typos and will revise them, along with any others we find.

- Sec. 3.7: "We" -> "we"
- Sec. 4.1: "benchmark" -> "benchmarks"
- Appendix F: "in one-shot" -> "in the one-shot"
- Appendix M.13: "proportion" -> "proportions"

Q2: Authors should compare the proposed method with differential privacy (DP) FL or prototype-based methods rather than conventional FL methods. All these approaches address communication security but from different perspectives.

And

As I mentioned in the weakness, the authors should compare the proposed method with DP FL and prototype-based FL rather than conventional FL methods. I believe the experiments in the paper do not provide a fair comparison.

Answer: Beyond privacy protection and the reduction of the attack surface, the one-shot FL framework also addresses the substantial communication overhead and the higher demand for fault tolerance across rounds. In Appendix L.1, we show experiments with DP-FedLPA and DP-FedAvg in the one-shot setting. Specifically, since the sensitivity of the data sample distribution after normalization is 1, we add Laplacian noise with scale λ = 1/ε. We set ε = {3, 5, 8}, which provides modest privacy guarantees, since ε ∈ (1, 10) is normally viewed as a suitable choice.
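For concreteness, the Laplace mechanism described above (sensitivity 1, noise scale 1/ε) can be sketched as follows; this is a generic implementation of the mechanism, not the exact DP-FedLPA code.

```python
# Generic Laplace mechanism with sensitivity 1 and scale 1/epsilon
# (a sketch of the mechanism described above, not the exact DP-FedLPA code).
import numpy as np

def laplace_mechanism(values, epsilon, sensitivity=1.0, rng=None):
    """Add Laplacian noise with scale = sensitivity / epsilon."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return values + rng.laplace(loc=0.0, scale=scale, size=np.shape(values))

# e.g. privatizing a normalized data-sample distribution with epsilon = 5
distribution = np.array([0.1, 0.2, 0.3, 0.4])
print(laplace_mechanism(distribution, epsilon=5))
```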

Besides, we have added experiments under the same experimental setting as in the paper on the FMNIST dataset, comparing DP-FedAvg (multi-round, as in Appendix L.1) with our FedLPA (one-shot). The table below shows how many rounds DP-FedAvg needs to achieve the same test performance.

| ε \ Partitions | 0.1 | 0.3 | 0.5 |
| --- | --- | --- | --- |
| 8 | 11 | 10 | 8 |
| 5 | 11 | 9 | 8 |
| 3 | 12 | 9 | 7 |

The results show that DP-FedAvg needs about 10 rounds of communication to achieve the same test performance, compared to our one-round FedLPA. Combined with our previous results in Table 4 and Table 7, FedLPA saves communication and computation overhead and can be combined with DP to mitigate potential privacy leakage. Under the above settings, DP-FedAvg needs at least 3x the communication overhead and 5x the computation overhead. Moreover, since DP-FedAvg needs multiple rounds to reach similar accuracy, it may be vulnerable to more privacy attacks that exploit the multiple queries, such as curvature-based privacy attacks.

Q3: Would the proposed method also show superior performance on more challenging datasets, like Tiny-ImageNet and Office-Home?

Answer: In our paper, FedLPA shows good performance on the MNIST, FMNIST, CIFAR-10, CIFAR-100, SVHN, and EMNIST datasets. We further add experiments under the same experimental setting as in the paper, using ResNet-18 on Tiny-ImageNet. The results show that our method has the potential to deal with large datasets.

| Partitions | FedLPA | Dense | FedAvg |
| --- | --- | --- | --- |
| 0.1 | 17.02 ± 1.40 | 15.88 ± 1.96 | 3.72 ± 1.44 |
| 0.3 | 27.80 ± 2.10 | 24.91 ± 1.65 | 8.41 ± 0.87 |
| 0.5 | 30.14 ± 1.25 | 29.43 ± 0.72 | 12.07 ± 1.92 |

Due to the rebuttal time limit, we leave the evaluation on the Office-Home dataset as future work. However, we believe that FedLPA would also perform well on Office-Home, since it shows the potential to deal with challenging data on Tiny-ImageNet.

Q4: Any proof to show that "computing the co-relations between different layers brings slight improvement"?

Answer: Paper [1], which we cited in our paper, provides a detailed evaluation and testing of using the block-diagonal Fisher to approximate the full one. Firstly, Chapter 6.3.1, "Interpretations of this approximation", in paper [2] indicates that using a block-wise Kronecker-factored Fisher closely approximates the full Fisher. Although there is a bias term (due to the approximation in our Appendix Equation 30), this term approaches zero when there are sufficient samples. Furthermore, the paper examines the approximation quality of the block-diagonal Fisher compared with the true Fisher and suggests that the block-diagonal Fisher captures the main correlations, while the remaining correlations have a minimal impact on the experimental results.

We will add the above analysis to the appendix in the camera-ready version.

[1] Ritter, Hippolyt, Aleksandar Botev, and David Barber. "A Scalable Laplace Approximation for Neural Networks." 6th International Conference on Learning Representations (ICLR 2018), Conference Track Proceedings.

[2] Martens, James. Second-Order Optimization for Neural Networks. PhD thesis, University of Toronto, 2016.

Q5: According to the authors in Supplementary Sec. L, the security level of the proposed method is similar to that of FedAVG, meaning it is also vulnerable to attacks targeting FedAVG. Is the only advantage of the proposed method that attackers have fewer opportunities to strike, since clients only communicate with the server once?

Answer: Indeed, our approach reduces the attack surface, as attackers have fewer opportunities since clients only communicate with the server once. Beyond that, in the one-shot setting, FedLPA also addresses the substantial communication overhead and the higher demand for fault tolerance across rounds.

FedAvg is vulnerable to privacy attacks and incurs huge communication overhead. FedLPA, however, is compatible with existing privacy-preserving approaches (e.g., DP) to achieve an even higher privacy level while striking a balance between computation overhead, communication overhead, and performance. Note that our paper mainly focuses on the efficiency perspective to improve the performance of one-shot FL, with less emphasis on the security perspective.

Comment

Thank you for your detailed response. It addressed most of my concerns. Though I do not buy this setting, I believe every study has its value. Thus, I would like to raise my score.

Comment

Dear Reviewer F77H,

We sincerely appreciate your constructive comments and prompt responses, which helped us improve our paper. It is our pleasure to address your concerns during the discussion. Again, thanks for your time and reviews, and thanks for recognizing the value of our work!

Warm regards,

Authors of Submission #8916

Final Decision

The paper proposes a new one-shot federated learning method. The key contribution is a novel one-shot aggregation scheme with layer-wise posterior aggregation, called FedLPA, which addresses the problem of heterogeneous data distribution among clients. FedLPA infers the posteriors by leveraging the Fisher information matrix of each layer in local models using layer-wise Laplace approximation and aggregates these to train the global model.

Reviewers recognize the importance of the problem, the contribution/novelty of the work, and the empirical study. Most of the concerns raised by the reviewers were addressed by the authors, and two reviewers raised their scores. The final scores are all favorable, albeit with a varying degree of support (5, 6, 6).

4qUr (rating 5) has some reservations regarding the risk of privacy leakage. The same concern is shared by F77H. In the final discussion period, XECp states that:

Privacy issue is an inherent drawback of federated learning methods. The main contribution is sufficient for this paper. To address the privacy problem, there are numerous techniques focused on privacy protection.

The AC believes that the contribution and novelty of the work are interesting enough to be shared with the community. For the final version, please include the points clarified in the rebuttal and add a discussion regarding possible privacy leakage.