PaperHub
Average rating: 6.3/10 · Poster · 3 reviewers
Individual ratings: 6, 5, 8 (min 5, max 8, std. dev. 1.2)
Confidence: 2.7 · Correctness: 3.0 · Contribution: 2.7 · Presentation: 3.0
ICLR 2025

Enhancing Clustered Federated Learning: Integration of Strategies and Improved Methodologies

OpenReview · PDF
Submitted: 2024-09-26 · Updated: 2025-02-24
TL;DR

We propose a unified framework for clustered FL algorithms and improve the techniques within the framework.

Abstract

Federated Learning (FL) is an evolving distributed machine learning approach that safeguards client privacy by keeping data on edge devices. However, the variation in data among clients poses challenges in training models that excel across all local distributions. Recent studies suggest clustering as a solution to address client heterogeneity in FL by grouping clients with distribution shifts into distinct clusters. Nonetheless, the diverse learning frameworks used in current clustered FL methods create difficulties in integrating these methods, leveraging their advantages, and making further enhancements. To this end, this paper conducts a thorough examination of existing clustered FL methods and introduces a four-tier framework, named HCFL, to encompass and extend the existing approaches. Utilizing the HCFL, we identify persistent challenges associated with current clustering methods in each tier and propose an enhanced clustering method called HCFL$^{+}$ to overcome these challenges. Through extensive numerical evaluations, we demonstrate the effectiveness of our clustering framework and the enhanced components. Our code is available at https://github.com/LINs-lab/HCFL.
Keywords
Federated Learning · Clustering

Reviews and Discussion

Review (Rating: 6)

This paper introduces HCFL, a holistic framework for Clustered Federated Learning (CFL) that integrates existing methods. HCFL+ builds on this by addressing key challenges in HCFL, improving its effectiveness. Extensive experiments demonstrate the effectiveness of the proposed method.

Strengths

The problem is well motivated and interesting. The algorithm proposed is novel, with solid theoretical analysis and extensive experimental studies. The paper is well structured and written.

Weaknesses

For cluster-based FL algorithms, a recent work [1] conducts clustering based on inferred label distributions. The authors are encouraged to discuss this clustering strategy.

Could the authors provide more experiments over a wider range of β (e.g., from 0.1 to 1.0) to show the effectiveness under different levels of data heterogeneity?

Also, [1] uses setups such as C=2 and C=3; studying such settings is also suggested. If the proposed algorithm can perform well across various data-heterogeneity partitions, the paper will be stronger.

[1] Diao, Yiqun, Qinbin Li, and Bingsheng He. "Exploiting Label Skews in Federated Learning with Model Concatenation." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38. No. 10. 2024.

Questions

Please see the weaknesses. Generally, I think this is solid work. I will adjust the score based on the author response.

Comment

Dear reviewer KWQ4,

Thank you for your thorough review! Below are our responses to the questions you raised.

For cluster-based FL algorithms, a recent work [1] conducts clustering based on inferred label distributions. The authors are encouraged to discuss this clustering strategy.

Thank you for bringing this to our attention. After reviewing the paper, we have included a discussion on this topic in the revised version (lines 178-179). We believe that inferring label distributions to calculate client distances could be a valuable enhancement to Tier 2 of the HCFL framework.
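As a rough illustration of how such a strategy could plug into Tier 2, one could estimate each client's label distribution and group clients by pairwise distances between those distributions. A minimal sketch (the agglomerative setup and the threshold are illustrative assumptions of ours, not the method of [1]):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_by_label_distribution(label_dists, threshold=0.3):
    """label_dists: (num_clients, num_classes) array, rows summing to 1,
    e.g. inferred from local label counts or from model predictions."""
    # Total-variation distance between every pair of clients' label distributions.
    dists = pdist(np.asarray(label_dists), metric="cityblock") / 2.0
    # Agglomerative clustering with a distance cutoff in [0, 1].
    return fcluster(linkage(dists, method="average"),
                    t=threshold, criterion="distance")
```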

More Experiments

Thank you for the suggestion! We have conducted new experiments on the CIFAR10 dataset as per your recommendation. For the baseline algorithms, we choose the settings of ICFL and stoCFL that perform best in Table 1 of our submission.

  • We add experiments with β = 0.8; the β = 0.2 and β = 0.4 settings are already included in the original submission. Moreover, we added C = 2 and C = 4 settings. Following [1], only label shifts are considered here.
  • The results demonstrate that HCFL+ consistently outperforms the baseline algorithms across all settings, highlighting its reliable performance gain.
| Algorithms | β=0.8, Val | β=0.8, Test | C=2, Val | C=2, Test | C=4, Val | C=4, Test |
|---|---|---|---|---|---|---|
| ICFL | 72.94 | 24.98 | 83.96 | 46.00 | 72.34 | 72.30 |
| stoCFL | 60.01 | 1.97 | 86.92 | 20.00 | 70.38 | 32.60 |
| HCFL+ (FedEM) | 73.68 | 28.63 | 95.14 | 49.00 | 89.80 | 62.30 |
Comment

Thanks for the authors' response. My concerns are addressed. I keep my rating toward acceptance.

Comment

Dear Reviewer KWQ4,

Thank you for the positive rating! We truly appreciate the time and effort you dedicated to reviewing our paper, as your suggestions greatly help in improving its quality.

Review (Rating: 5)

The contribution formalizes a general pipeline/framework for clustered federated learning, on the one hand via a cost function, on the other hand via algorithmic choices (as it is not easy to fold the objective of an optimal number of clusters into the costs in a meaningful way). This leads to an improved variant, where the data of one client can belong to different clusters, and to extensions of the currently rare soft clustering schemes. The benefit is evaluated in both comparative and ablation studies. A comparably long appendix addresses how to compute an EM scheme based on the costs, how to design the data, what happens in the linear case, and some more insights.

Strengths

The article addresses the important problem of efficient federated learning in the presence of data shift. It provides a very detailed experimental analysis. It also takes some effort to substantiate the observations with theoretical insight. Moreover, the authors promise to release the code open source.

Weaknesses

The proposal left me a bit puzzled, as the specific contribution is somewhat unclear. On the one hand, the contribution promises a general framework/principle for how to model FL with clustering. Here, the costs are rather obvious, as is the four-tier modeling, given the existing work on how to model clustering; hence I am not sure what exactly the contribution is: the specific way of implementation, or specific guarantees that can be given? In how far is this modeling surprising/challenging, and what exactly is the contribution (e.g., is it the better implementation)? It would help if you could either point to specific benefits arising from this abstraction that would not have been possible without it, or provide examples where it is not obvious that a method falls under this common framework.

The improved version allows the individual assignment of the data of one client. An according EM scheme is derived (a bit lengthy but straightforward), and soft clustering is considered (which also seems straightforward given the existing work on soft clustering and its algorithms). Personally, I find the definition of a new way to measure distances w.r.t. drift the most interesting part, albeit very briefly presented. Here, references to existing techniques for dealing with drift are missing (such as decompositions of drifting data sets into parts where the drift is homogeneous, e.g., moment trees and Kolmogorov trees). I suggest taking a closer look at the (exhaustive) literature on learning with drift in the incremental setup.

I find the presentation suboptimal, as the main part of the work reads as almost trivial in wide parts, whereas some important insight seems to be hidden in the appendix. It would help if the main takeaways of the appendix were highlighted. Moreover (as already said before), please highlight more clearly why the holistic framework is non-trivial and beneficial, with specific non-trivial results/examples.

Questions

What is the overall objective of FL in a learning-theoretical sense, i.e., how exactly would the targeted generalization error be expressed? What would be naturally occurring drift/shift in such scenarios, and how does this match the shift modeled in the experiments? What are the results if the clustering itself is evaluated (e.g., having ground truth on the data distributions)? How personalized are the models? And how does this scale with the required amount of data as regards valid generalization?

Comment

Dear reviewer NXiS,

Thank you for your thorough review. Below, we provide our responses to the questions you raised.

Clarification on contribution

Thank you for the suggestion! We would like to clarify that

  • To the best of our knowledge, the HCFL framework provides the first unified framework for supervised clustered FL, incorporating (1) a unified clustering objective function that handles both soft and hard clustering and (2) a unified, four-tier clustering procedure paradigm. We believe that the HCFL framework holds significant importance in illustrating the recent progress in this field.
  • The HCFL framework allows for free combination of existing techniques within each layer, enabling new benefits beyond simply recovering traditional methods. For instance,
    • FedRC exhibits good generalization performance but cannot automatically determine the number of clusters, while CFL excels in determining the number of clusters but lacks generalization. The HCFL framework enables the combination of the benefits of FedRC and CFL, achieving both good generalization and personalization performance (Table 1).
    • By allowing both cluster removal and cluster addition, HCFL+ achieves comparable or even better performance than baseline algorithms with significantly fewer clusters.
  • Based on the HCFL framework, we identified several potential improvements to current CFL methods and proposed HCFL+ as a solution. Results demonstrate that HCFL+ achieves superior performance compared to baselines.

A closer look at the literature on learning with drift in the incremental setup.

We would like to clarify that our work focuses on the supervised clustered FL scenario, where clients possess data with varying distributions. To address the issue of drift, existing studies employ various techniques to group clients into different clusters. These techniques include:

  1. Grouping clients based on their local loss values [1]. By examining the loss functions of clients, we can identify those with similar performance characteristics and group them accordingly.
  2. Grouping clients based on the distance between their model parameters [2,3]. This approach leverages the similarity of model updates to group clients with similar data distributions (see the sketch after this list).
  3. Grouping clients based on feature norm [4]. This method involves identifying representative samples for each client and grouping clients with similar feature norms.
  4. Grouping clients by solving bi-level optimization problems [5,6]. This approach involves formulating a nested optimization problem where the outer problem optimizes the clustering of clients, while the inner problem optimizes the model parameters for each cluster.

In this paper, we use feature prototypes, feature means, and local loss values (see Section 4.4 and Appendix D) in conjunction with a bi-level optimization method (Equations 2-3) for adaptive client grouping. The results presented in Table 2 demonstrate that this approach can lead to enhanced performance.

Main takeaways of the appendix

Thank you for your suggestions! Below are the key takeaways from the appendix, which we have now included at the beginning of the revised paper.

  • Appendix A: We provide a proof showing how Eq. 4–7 are derived, demonstrating how they solve the EM-like objective function (Eq. 2–3) in practice.
  • Appendix B: We analyze the HCFL framework in the context of linear representation learning, without assuming an equal number of samples across clusters. Our findings indicate that an imbalance in sample sizes and a higher level of drift can slow down convergence.
  • Appendix C: A more detailed discussion of related works.
  • Appendix D: We explain the rationale behind the design and practical implementation of the distance metric. For each pair of clients, we compute two types of distances: first, the distance between their local class prototypes for each class, and second, the distance between their feature means. The final distance for the client pair is the maximum of these two distances, multiplied by the local loss values (a rough sketch follows after this list).
  • Appendix D: A detailed version of the HCFL+ algorithm is provided.
  • Appendix E: We report the experimental settings used in our study in detail.
  • Appendix E: Ablation studies on the impact of hyperparameters and algorithmic components show that HCFL+ is robust to variations in hyperparameters, and that all proposed components provide individual performance gains.
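For concreteness, a minimal sketch of the pairwise distance described in the Appendix D takeaway above (the array shapes and the exact form of the loss scaling are our reading of the description; Appendix D of the paper gives the precise definition):

```python
import numpy as np

def client_distance(protos_i, protos_j, mean_i, mean_j, loss_i, loss_j):
    """protos_*: (num_classes, feat_dim) per-class feature prototypes;
    mean_*: (feat_dim,) feature means; loss_*: scalar local loss values."""
    # Largest per-class gap between the two clients' prototypes.
    proto_dist = np.linalg.norm(protos_i - protos_j, axis=1).max()
    # Gap between the overall feature means.
    mean_dist = np.linalg.norm(mean_i - mean_j)
    # Maximum of the two distances, scaled by the local loss values.
    return max(proto_dist, mean_dist) * loss_i * loss_j
```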

the overall objective of FL

In this paper, we consider two objectives:

  • the mean performance/generalization error on clients' local distributions, evaluating the personalization performance of the models.
  • the performance/generalization error on an unseen global distribution, evaluating the generalization performance of the models.

These two objectives are evaluated as “Val” and “Test”, respectively.
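To make this concrete, a minimal formalization in our notation (N clients with local distributions $\mathcal{D}_i$, cluster assignment $c(i)$, cluster models $f_{\theta_k}$, and loss $\ell$; the prediction rule $\hat{f}$ used on unseen data, e.g., an ensemble or selection over cluster models, is an assumption of this sketch):

$$
\text{Val: } \frac{1}{N}\sum_{i=1}^{N}\mathbb{E}_{(x,y)\sim\mathcal{D}_i}\left[\ell\left(f_{\theta_{c(i)}}(x),\, y\right)\right]
\qquad
\text{Test: } \mathbb{E}_{(x,y)\sim\mathcal{D}_{\mathrm{global}}}\left[\ell\left(\hat{f}(x),\, y\right)\right]
$$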

Comment

What would be naturally occurring drift/shift in such scenarios, and how does this match the shift modeled in the experiments?

In FL scenarios, three types of naturally occurring shifts can arise [7]:

  • Label Distribution Shifts: The label distributions P(y) differ among clients.
  • Feature Distribution Shifts: The feature distributions P(x) differ among clients.
  • Concept Shifts: The conditional distributions P(y|x) differ among clients.

In our experiments, clients may experience all three types of shifts relative to each other. Further details are provided in Appendix E.1.

  • Label Distribution Shifts: To model this, we use Latent Dirichlet Allocation (LDA) with α = 1.0 to partition the entire dataset into 100 clients, ensuring each client has a distinct label distribution P(y).
  • Feature Distribution Shifts: We introduce random augmentations to client samples, with each client applying the same augmentation type consistently. This results in differing feature distributions P(x) across clients.
  • Concept Shifts: These shifts are introduced by swapping the labels of certain classes, causing the conditional distributions P(y|x) to differ among clients.
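For reference, a minimal sketch of such a Dirichlet (LDA-style) label partition with α = 1.0; this is the common implementation pattern, not necessarily our exact experiment script:

```python
import numpy as np

def dirichlet_label_partition(labels, num_clients=100, alpha=1.0, seed=0):
    """Split sample indices across clients so that each client's label
    distribution P(y) is skewed according to Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in range(labels.max() + 1):
        idx = rng.permutation(np.where(labels == c)[0])
        # Fraction of class-c samples assigned to each client.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client_id, shard in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(shard.tolist())
    return client_indices
```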

What are the results if the clustering itself is evaluated (e.g., having ground truth on the data distributions)?

Thank you for posing such a valuable question. The ground truth for clustering depends heavily on the definition and principles of clustering being used.

  • In some studies [1,2,3,4,5], clustering methods are typically designed to group clients with similar data distributions into the same clusters. However, it is important to recognize that in FL settings, all clients' data distributions can vary, and assigning each client to a separate cluster may not be a practical or effective approach. To address this challenge, clustered FL methods incorporate additional thresholds to determine the similarity of clients within the same cluster, as illustrated in Table 1.
    • To assess the effectiveness of the clustering in this case, we evaluate the model's performance on both local and global data distributions, as shown in Table 1. Generally speaking, better local (val) and global (test) accuracy serves as an indicator of better clustering.
  • In certain scenarios, as outlined in [6], the authors take into account diverse distribution shifts and propose grouping clients without concept shifts into the same clusters. This approach aims to enhance generalization performance by maintaining clients with similar concepts within the same cluster. In such cases, the optimal number of clusters corresponds to the number of unique concepts (in our experiments, this number is 3). Here, the global (test) accuracy becomes particularly crucial in evaluating the clustering effectiveness.

How personalized are the models?

We would like to provide further clarification regarding our submission:

  • We have reported the personalization performance using the Val metric, which is calculated by averaging the local accuracy over all clients' local test datasets. The results presented in Tables 1 and 3 demonstrate that HCFL+ can achieve comparable or superior performance to the baseline methods, even when using a smaller number of clusters. This underscores the efficiency and efficacy of our proposed method in personalizing the model.
  • The personalization performance of HCFL+ is influenced by several hyper-parameters, namely tol for CFL, α*(0) for ICFL, τ for stoCFL, and ρ for HCFL+. It is important to note that there is often a trade-off between personalization and generalization, and adjusting these hyper-parameters allows us to fine-tune the balance between the two, according to the specific requirements of the application.

We hope our responses have sufficiently addressed your concerns. Please do not hesitate to reach out if you have any further questions. Thank you again for your time and effort in reviewing our paper.

[1] An efficient framework for clustered federated learning. NeurIPS 2020.

[2] Multi-center federated learning: clients clustering for better personalization. WWW 2023.

[3] On the byzantine robustness of clustered federated learning. ICASSP 2020.

[4] Edge devices clustering for federated visual classification: A feature norm based framework. TIP 2023.

[5] Federated multi-task learning under a mixture of distributions. NeurIPS 2021.

[6] FedRC: Tackling Diverse Distribution Shifts Challenge in Federated Learning by Robust Clustering. ICML 2024.

[7] Advances and open problems in federated learning. Foundations and Trends in Machine Learning.

Comment

Dear reviewer NXiS,

Thank you once again for your time and effort in reviewing our paper. Your feedback has been invaluable in improving the quality of our manuscript.

As the rebuttal period is coming to a close, we believe we have fully addressed all of your concerns. We kindly request that you reconsider your score.

Please feel free to reach out if you have any additional questions or comments.

Comment

Thanks a lot for the reply. Some of my issues remain:

  • The term 'general framework' is in this case pretty vague; it seems more like a 'generic pipeline' to me, where I still do not see a specific added value that is surprising.
  • On the evaluation of clustering: there are external evaluation measures, so I do not see why you could not do the same, regardless of the fact that clustering is in general ill-posed, of course. That said, I will not fight for the paper, as I think the benefits of the framework could have been made clearer/crisper; but I would also not object if the other reviewers fight for acceptance.
Comment

Dear reviewer NXiS,

Thank you for your feedback. For the evaluation of clustering, we present the clustering results of HCFL+ in the C=2 setting (where each client is assigned data from two classes). Specifically, HCFL+ generates 5 clusters in this setting, and we report the number of samples from each class assigned to clusters 1–5. The results show that HCFL+ successfully finds a relatively ideal clustering by:

  • Assigning all samples with the same label to the same cluster.
  • Avoiding the generation of a large number of clusters, as seen in stoCFL and ICFL.
  • Achieving higher validation and test accuracy compared to the baselines.
| HCFL+ (FedEM, ρ = 0.1) | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 |
|---|---|---|---|---|---|
| Class 1 | 4483 | 0 | 0 | 0 | 0 |
| Class 2 | 0 | 4502 | 0 | 0 | 0 |
| Class 3 | 0 | 0 | 4464 | 0 | 0 |
| Class 4 | 0 | 0 | 4536 | 0 | 0 |
| Class 5 | 0 | 4498 | 0 | 0 | 0 |
| Class 6 | 0 | 0 | 0 | 0 | 4483 |
| Class 7 | 0 | 0 | 0 | 4491 | 0 |
| Class 8 | 4517 | 0 | 0 | 0 | 0 |
| Class 9 | 0 | 0 | 0 | 4509 | 0 |
| Class 10 | 0 | 0 | 0 | 0 | 4517 |
| Algorithms | Memory (M) | Simulation Time (s/it) | Final Cluster Number | Val | Test |
|---|---|---|---|---|---|
| ICFL (α(0) = 0.85) | 4190 | 124.71 | 100 | 83.96 | 46.00 |
| StoCFL (τ = 0.15) | 2834 | 178.53 | 41 | 86.92 | 20.00 |
| HCFL+ (FedEM, ρ = 0.1) | 2564 | 176.48 | 5 | 95.14 | 49.00 |
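Regarding external evaluation measures: a score such as normalized mutual information (NMI) could be computed directly from a contingency table like the one above. A minimal illustrative sketch (this computation is a sketch for reference, not part of our reported evaluation):

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def nmi_from_contingency(table):
    """table: (num_classes, num_clusters) counts, rows = true classes,
    columns = discovered clusters. Expands the counts into per-sample
    label vectors and scores the clustering against the class labels."""
    table = np.asarray(table)
    true_labels, cluster_labels = [], []
    for cls, row in enumerate(table):
        for cluster, count in enumerate(row):
            true_labels.extend([cls] * int(count))
            cluster_labels.extend([cluster] * int(count))
    return normalized_mutual_info_score(true_labels, cluster_labels)
```

Note that in the table above each discovered cluster collects exactly two classes, which is consistent with the C=2 design (two classes per client), so class-level external scores should be read with that design in mind.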
Review (Rating: 8)

This manuscript proposes a holistic federated learning framework to enhance classification performance by grouping clients into clusters. The framework comprehensively integrates hard and soft partitional clustering, and clustering with and without automatic cluster-number determination. Comprehensive theoretical and empirical evidence has been provided to illustrate the effectiveness of the proposed method. Moreover, the paper is generally well written and easy to follow.

Strengths

  1. A holistic FL framework incorporating different clustered FL methods for more comprehensive FL. The research problem is important, and this work contributes to fixing the shortcomings of the existing related works.
  2. Comprehensive experimental evaluation has been conducted to illustrate the effectiveness of the proposed method.
  3. The paper is well written with a clear demonstration of the motivations and problems for solving.

Weaknesses

  1. This work focuses on using clustering techniques to enhance the classification accuracy of FL. The difference between this type of research and the fully unsupervised federated clustering should be discussed to avoid potential misunderstandings.
  2. The efficiency issue is listed as one of the challenges in Section 4.1. But only the final number of clusters is reported accordingly. More discussions about the time and space complexity of this work, or even corresponding evaluation results are preferable.
  3. The source code is not released in the current version.

Questions

See the weaknesses.

Details of Ethics Concerns

N.A.

Comment

Dear reviewer hh5r,

Thank you for your detailed review! Below are our responses to the questions you raised.

This work focuses on using clustering techniques to enhance the classification accuracy of FL. The difference between this type of research and the fully unsupervised federated clustering should be discussed to avoid potential misunderstandings.

Thank you for your suggestion! This work follows the line of supervised clustered FL, which is clearly distinct from unsupervised clustered FL. We have revised our paper (footnote of page 1) by adding a clarification and related works [1,2]:

In this study, we address the issue of supervised clustered FL, which differs from the unsupervised clustered FL examined in [1,2].

The efficiency issue is listed as one of the challenges in Section 4.1. But only the final number of clusters is reported accordingly. More discussions about the time and space complexity of this work, or even corresponding evaluation results are preferable.

Thank you for your suggestion! We have conducted additional experiments on the CIFAR10 dataset, with each client containing data from 2 classes (see results below). The hyper-parameters were chosen based on the optimal configurations outlined in Table 1 of the submission.

  • In addition to the final cluster number, we also report the memory usage and total simulation time.
  • Notably, HCFL+ (FedEM) achieved superior performance with a significantly reduced number of clusters, which results in reduced simulation time and memory usage.
| Algorithms | Memory (M) | Simulation Time (s/it) | Final Cluster Number | Val | Test |
|---|---|---|---|---|---|
| ICFL (α(0) = 0.85) | 4190 | 124.71 | 100 | 83.96 | 46.00 |
| StoCFL (τ = 0.15) | 2834 | 178.53 | 41 | 86.92 | 20.00 |
| HCFL+ (FedEM, ρ = 0.1) | 2564 | 176.48 | 5 | 95.14 | 49.00 |

The source code is not released in the current version.

We have provided the code in the Supplementary Material.

[1] Ding, Shifei, et al. "Horizontal Federated Density Peaks Clustering." IEEE Transactions on Neural Networks and Learning Systems (2023).

[2] Qiao, Dong, Chris Ding, and Jicong Fan. "Federated spectral clustering via secure similarity reconstruction." Advances in Neural Information Processing Systems 36 (2024).

Comment

Thank you for your response, which addressed my concerns well. I have raised my score.

Comment

Dear reviewer hh5r,

Thank you for raising your score! Your detailed review has been invaluable in helping us improve the quality of the paper.

AC Meta-Review

This paper has been borderline in the evaluations through most of the review process. The reviewer with the lowest score (NXiS), who is not opposed to acceptance, essentially argues against the notion that the proposed framework is sufficient for acceptance. Another reviewer (hh5r) views this part in a rather positive light and increased their score.

I am inclined towards thinking that for such a topic, a framework can be a sound basis for a contribution, because FL entails several problems that would be non-trivial considered separately (privacy, learning, distribution, etc.). I recognize that such a task is highly non-trivial because, as reviewer NXiS points out, the main caveat is to have the reader realize that the contribution passes the "sounds obvious" bar. I believe the paper does a reasonable job of explaining why its contribution is worthy of interest, and the authors have done a good job explaining it in their answer to NXiS. I can only encourage them to polish their paper further to make their contribution even clearer, in particular by using the discussion with NXiS to perhaps reformat part of their introduction.

Additional Comments on Reviewer Discussion

The review of hh5r and the exchanges with the authors were instrumental in the decision.

Final Decision

Accept (Poster)