PaperHub

Rating: 6.8/10 (Poster) · 4 reviewers · min 6, max 7, std 0.4
Individual ratings: 6, 7, 7, 7
Confidence: 4.3 · Correctness: 3.0 · Contribution: 3.0 · Presentation: 3.3
NeurIPS 2024

Bridging Gaps: Federated Multi-View Clustering in Heterogeneous Hybrid Views

Submitted: 2024-05-12 · Updated: 2024-12-27

Keywords: Multi-view learning · Clustering · Federated learning

Reviews and Discussion

Official Review (Rating: 6)

A variant of horizontal FedMVC is proposed to address more realistic scenarios involving heterogeneous hybrid views. It develops specific strategies and conducts theoretical analyses from the perspective of bridging client and view gaps. The proposed method demonstrates promising experimental results on several datasets.

Strengths

  1. The scenario of heterogeneous hybrid views assumed in the paper is interesting and merits further exploration.
  2. The appendices are thorough and well-organized, including theoretical proofs and additional experiments.

Weaknesses

  1. Contrastive learning strategies have been widely used in multi-view clustering methods; thus, the synergistic contrast strategy proposed in this paper may not offer significant novelty.
  2. The paper dedicates considerable effort to describing how common semantics H is extracted from the raw data of each client, but it lacks an explanation of why this approach is suitable for clustering tasks.

Questions

  1. The experimental results reported in the paper show a large discrepancy for DSIMVC on the MNIST-USPS dataset compared to the original results, while the differences are much smaller for the BDGP and Multi-Fashion datasets. Why does this phenomenon occur?
  2. In the reported experiments, the number of clients varies across different datasets. How were these numbers chosen?

Limitations

See Weaknesses.

Author Response

Response to Reviewer uC6c:

Thank you for your valuable feedback.

Q1: Contrastive learning strategies have been widely used in multi-view clustering methods; thus, the synergistic contrast strategy proposed in this paper may not offer significant novelty.

We would like to emphasize that the local-synergistic contrast strategy comprises both feature contrastive learning and model contrastive learning, aimed at addressing the client gap and mitigating the heterogeneity between single-view clients and multi-view clients.

We acknowledge that the feature contrastive learning used in multi-view clients has already been applied in some multi-view clustering methods. However, our goal is for multi-view clients to help single-view clients bridge client gaps. Thus, the innovation of this module lies not in multi-view clients using feature contrastive learning to train local models, but in single-view clients using model contrastive learning to bridge client gaps and discard view-private information detrimental to clustering. By establishing a unified goal of extracting the common semantics H in both single-view and multi-view clients, a communication bridge is built. Meanwhile, the extraction of common semantics helps in discovering complementary clustering structures across clients. Furthermore, model contrastive learning allows the local single-view clients to converge toward the global model while amplifying the differences between the reconstruction and consistency objectives in the local models.
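To make this concrete, below is a minimal numpy sketch in the spirit of MOON-style model-contrastive learning (the similarity measure, temperature, and all names are illustrative assumptions, not the paper's exact objective): the single-view client's representation is pulled toward the global model's output and pushed away from the previous local model's output.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two representation vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def model_contrastive_loss(z_local, z_global, z_prev, tau=0.5):
    """Hypothetical MOON-style model-contrastive loss: treat the global
    model's representation as the positive and the previous local model's
    representation as the negative, so the local model converges toward
    the global model."""
    pos = np.exp(cosine(z_local, z_global) / tau)
    neg = np.exp(cosine(z_local, z_prev) / tau)
    return -np.log(pos / (pos + neg))

z_local = np.array([1.0, 0.0])
z_global = np.array([1.0, 0.1])   # close to the local output -> small loss
z_prev = np.array([-1.0, 0.0])    # far from the local output
loss = model_contrastive_loss(z_local, z_global, z_prev)
```

When the local representation agrees with the global one and differs from the stale local one, the loss is near zero; swapping the roles of the two anchors makes it large, which is what drives local models toward the global model.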

Q2: The paper dedicates considerable effort to describing how common semantics H is extracted from the raw data of each client, but it lacks an explanation of why this approach is suitable for clustering tasks.

Thank you for your feedback. As we mentioned in our response to Q1, the extraction of common semantics H aims to unify the training objectives of single-view clients and multi-view clients. This allows single-view clients to use model contrastive learning to bridge client gaps and discard view-private information that is detrimental to clustering. We believe that focusing on common semantics, which eliminates the adverse effects of view-private information, is more conducive to discovering subsequent clustering structures. We will revise our manuscript to make this point clearer.

Q3: The experimental results reported in the paper show a large discrepancy for DSIMVC on the MNIST-USPS dataset compared to the original results, while the differences are much smaller for the BDGP and Multi-Fashion datasets. Why does this phenomenon occur?

Thank you for your detailed observation. We have carefully reviewed the replication code for DSIMVC and confirmed that the experimental results reported in Table 1 are accurate. In DSIMVC, incomplete samples are generated by randomly removing views, under the condition that at least one view remains per sample. In contrast, our comparison strategy treats data from multi-view clients as complete and data from single-view clients as incomplete. Unlike the random removal in DSIMVC, our approach yields incomplete data in which consecutive samples tend to miss the same views. This inconsistency in how incomplete scenarios are constructed explains the discrepancy between the originally reported results and our replicated results.
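The two incompleteness patterns can be illustrated with a small masking sketch (all sizes and drop rates here are hypothetical, chosen only to contrast per-sample random removal with client-level masking):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_views = 8, 3

# DSIMVC-style: drop views per sample at random, keeping at least one view.
random_mask = np.ones((n_samples, n_views), dtype=bool)
for i in range(n_samples):
    drop = rng.random(n_views) < 0.5
    if drop.all():                        # ensure at least one view survives
        drop[rng.integers(n_views)] = False
    random_mask[i, drop] = False

# Client-level: every sample on a single-view client misses the same views,
# so the missing pattern repeats across consecutive samples.
client_mask = np.ones((n_samples, n_views), dtype=bool)
client_mask[:4, 1:] = False               # a single-view client holding view 0
# rows 4..7 stay fully observed (a multi-view client)
```

In the random scheme each row has its own missing pattern; in the client-level scheme the first four rows share one identical pattern, which is the structural difference behind the result discrepancy.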

Q4: In the reported experiments, the number of clients varies across different datasets. How were these numbers chosen?

Figure 3 (b) and Figure 8 show that as the number of clients increases, the performance of FMCSC experiences a slight decline but remains generally stable. However, when the number of clients reaches a certain threshold, the clustering performance of FMCSC drops significantly. We believe this occurs because samples within each client become insufficient, which hinders local model training and negatively impacts clustering performance. Therefore, for different datasets, we aim to ensure that each client has more than 200 samples to maintain the stability of FMCSC during training. In the future, we will further explore high-quality training with fewer samples per client, encouraging more devices to participate in training and collaboration.

Comment

The authors have addressed my concerns, and I would like to maintain my previous rating.

Comment

Dear Reviewer uC6c,

We sincerely appreciate your quick feedback. Your constructive comments have been instrumental in enhancing our work, and we are grateful for the attention and time you have dedicated to it.

If there are any further aspects you believe could benefit from refinement, please feel free to share your thoughts.

Best wishes,

Authors

Comment

Dear Reviewer uC6c,

We sincerely appreciate your time and effort in reviewing our work. We would be grateful for further feedback or confirmation that our rebuttal has adequately addressed your comments.

Thank you again for your time and consideration.

Best regards,

Authors

Official Review (Rating: 7)

The authors introduce a novel method called Federated Multi-view Clustering via Synergistic Contrast (FMCSC), to simultaneously leverage the single-view and multi-view data across heterogeneous clients to discover clustering structures from hybrid views. This method bridges client and view gaps through a combination of theoretical and experimental analysis to discover the cluster structures in multi-view data distributed across different clients.

Strengths

Federated multi-view clustering (FedMVC) is a recently proposed and increasingly popular research direction within the multi-view learning community. This paper addresses a novel issue in FedMVC, termed ‘heterogeneous hybrid views,’ where a mixture of both single-view and multi-view clients exhibit varying degrees of heterogeneity.

Through theoretical and experimental analysis, the paper clearly shows how the proposed method bridges client and view gaps. The proposed method performs well across different federated settings, with reproducible code.

The paper is also well-written, with clear explanations, extensive experiments, and detailed theoretical proofs.

Weaknesses

The paper does not mention the data partitioning strategy of the proposed method. The authors need to provide details on how data are distributed among different clients.

Questions

I am curious whether the proposed method can be applied to both vertical FedMVC and horizontal FedMVC scenarios.

Limitations

Limitations and societal impact have been discussed.

Author Response

Response to Reviewer jVcj:

We thank the reviewer for valuable comments and suggestions that have greatly improved our paper.

Q1: The paper does not mention the data partitioning strategy of the proposed method. The authors need to provide details on how data are distributed among different clients.

Thank you for pointing out this issue. The proposed method adopts a common IID partition, in which multi-view data are randomly and uniformly distributed across all clients, ensuring that each client's data distribution is similar to the overall distribution. In implementation, we achieve this by setting a large Dirichlet concentration parameter, so that the proportions of the different classes allocated to each client are nearly equal, thereby achieving independent and identically distributed data.
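A minimal sketch of Dirichlet-based partitioning (the helper name, sizes, and parameter values are illustrative assumptions): a large concentration parameter yields near-uniform class proportions per client, i.e. a near-IID split, while a small one yields skewed, non-IID splits.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Partition sample indices across clients with a Dirichlet prior.
    Large alpha -> near-IID splits; small alpha -> skewed splits.
    Hypothetical helper, not the paper's exact implementation."""
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        # class-c proportions per client drawn from Dir(alpha, ..., alpha)
        props = rng.dirichlet([alpha] * n_clients)
        cuts = (np.cumsum(props) * len(idx)).astype(int)[:-1]
        for client, part in enumerate(np.split(idx, cuts)):
            client_idx[client].extend(part.tolist())
    return client_idx

labels = np.repeat(np.arange(4), 250)            # 1000 samples, 4 classes
parts = dirichlet_partition(labels, n_clients=5, alpha=100.0)
```

With `alpha=100.0` every client ends up with roughly 200 samples and near-equal class proportions, matching the "large Dirichlet parameter" setting described above.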

Q2: I am curious whether the proposed method can be applied to both vertical FedMVC and horizontal FedMVC scenarios?

Thank you for your insightful comments. We believe the proposed method can be applied to both vertical FedMVC and horizontal FedMVC scenarios. In the paper, we mention that existing FedMVC methods usually assume that clients are isomorphic and belong to either single-view clients or multi-view clients. In the heterogeneous hybrid views scenario applicable to FMCSC, vertical FedMVC scenarios can be viewed as having only single-view clients, with the number of clients equal to the number of views; horizontal FedMVC scenarios can be seen as having only multi-view clients. When facing horizontal FedMVC scenarios, FMCSC can be directly applied by setting the number of single-view clients to zero. Additionally, for vertical FedMVC scenarios, where only single-view clients exist, the method cannot bridge gaps with the help of multi-view clients. Therefore, the local-synergistic contrast module in Section 3.3 needs to be frozen before applying FMCSC.

The above analysis demonstrates that FMCSC, suitable for heterogeneous hybrid views scenarios, can still be applied to both vertical FedMVC and horizontal FedMVC scenarios. The heterogeneous hybrid views scenario complements and serves as an alternative to the current FedMVC assumption, making it better aligned with real-world situations.

Comment

Thanks. The authors have addressed my concerns, and I will maintain my previous rating.

Comment

Dear Reviewer jVcj,

We are truly thankful for your prompt reply. Your valuable feedback has significantly improved our manuscript, and we are grateful for your continued support in this process.

If you have any further suggestions for improvement, please let us know.

Best wishes,

Authors

Comment

Dear Reviewer jVcj,

We sincerely appreciate your time and effort in reviewing our work. We would be grateful for further feedback or confirmation that our rebuttal has adequately addressed your comments.

Thank you again for your time and consideration.

Best regards,

Authors

Official Review (Rating: 7)

This paper proposes a novel method, Federated Multi-view Clustering via Synergistic Contrast (FMCSC), which introduces a locally collaborative contrastive learning algorithm to achieve consistency between single-view and multi-view clients, thereby mitigating heterogeneity among clients. Furthermore, a global-specific aggregation algorithm is used to address the gaps between different views. Benchmark experiments validate the effectiveness of this approach.

Strengths

  1. The paper introduces a novel global weighted aggregation method, encouraging the global model to learn complementary features from mixed views, demonstrating a certain level of innovation.
  2. The paper introduces a novel federated multi-view learning framework that considers the scenario of a mix of single-view and multi-view clients, followed by experimental analysis.
  3. The paper conducts a comprehensive theoretical analysis and validation of the proposed method.
  4. The code is provided for reproduction.

Weaknesses

  1. The dataset size of 10,000 is insufficient; validating the proposed methods requires a significantly larger dataset to ensure robustness and broader applicability.

  2. The computational complexity of global-specific weighting aggregation is estimated to be high. Therefore, it may not be suitable for large-scale data tasks.

Questions

  1. How were $\alpha_m$ and $\alpha_p$ derived in Eq. (10) of this paper?

  2. In the experimental results on the MNIST-USPS dataset in Figure 8, why does the performance with 24 clients outperform that of both 16 clients and 50 clients?

Limitations

Given.

Author Response

Response to Reviewer s4wh:

Q1: The dataset size of 10,000 is insufficient; validating the proposed methods requires a significantly larger dataset to ensure robustness and broader applicability.

Thanks for the suggestion. We conduct further experiments on the large-scale YoutubeVideo dataset [1], which contains 101,499 samples across 31 classes, where each sample has three views: cuboids histogram, HOG, and vision misc. Below are the clustering results of FMCSC and several comparison methods when the numbers of multi-view clients and single-view clients are equal ($M/S = 1:1$):

| Method | ACC | NMI | ARI |
| --- | --- | --- | --- |
| IMVC-CBG (2022) | 18.32 | 11.83 | 2.04 |
| DSIMVC (2022) | 15.01 | 8.11 | 1.20 |
| ProImp (2023) | 22.45 | 17.48 | 3.43 |
| FedDMVC (2023) | 21.52 | 16.96 | 3.42 |
| FCUIF (2024) | 23.04 | 18.46 | 3.72 |
| FMCSC (Ours) | 26.42 | 20.74 | 5.82 |

The YoutubeVideo dataset is 10 times larger than the Multi-Fashion dataset (10,000 samples). The results demonstrate that FMCSC adapts well to large-scale datasets and outperforms the other methods, supporting the proposed method's robustness and broader applicability.

[1] Omid Madani, Manfred Georg, and David A. Ross. On using nearly-independent feature families for high precision and confidence. Machine Learning, 92:457–477, 2013.

Q2: The computational complexity of global-specific weighting aggregation is estimated to be high. Therefore, it may not be suitable for large-scale data tasks.

Thank you for raising your concerns. We specifically analyze the computational complexity of global-specific weighting aggregation as follows.

Eq. (10) shows the detailed process of global-specific weighting aggregation performed by the server. During this process, the server receives $M$ local model parameters capable of handling multi-view data and $(VM + S)$ local model parameters capable of handling a single specific view. We define the number of these local model parameters as $N_m$ and $N_p$, respectively. When aggregating to obtain $f_{g}(\cdot; \mathbf{w})$, the server performs a weighted sum of the $M$ multi-view client parameters, resulting in a computational complexity of $O(N_m M)$. For the $V$ models $f_{g}^v(\cdot; \mathbf{w}^{v})$, the server performs a weighted sum of the single-specific-view models, with a complexity of $O(V N_m M + N_p S)$. Therefore, the total computational complexity is $O(N_m M + V N_m M + N_p S)$.

Through this calculation, we observe that the computational complexity depends on the number of local model parameters and the number of participating clients. Table 5 presents the number of parameters per client for different datasets, e.g., 3.4M-10.1M for the Multi-Fashion dataset, corresponding to our definitions of $N_m$ and $N_p$, while the number of participating clients is typically below 100. The complexity derived from the above analysis is therefore acceptable, confirming that our proposed method applies to large-scale data tasks, as also addressed in the response to Q1. Additionally, Table 5 reports the running time of the proposed method on all datasets, e.g., 763.8s for the MNIST-USPS dataset, indicating a modest overall running time.
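The aggregation step itself is a plain weighted sum over client parameter vectors; a toy sketch (all sizes and weights here are hypothetical) showing the $O(N \cdot M)$ pattern of one pass over $N$ parameters for each of $M$ clients:

```python
import numpy as np

def weighted_aggregate(param_list, weights):
    """Server-side weighted sum of client parameter vectors.
    One pass over N parameters for each of M clients -> O(N * M)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalize aggregation weights
    return sum(w * p for w, p in zip(weights, param_list))

# three multi-view clients, each uploading a 4-parameter model (toy sizes)
params = [np.array([1.0, 2.0, 3.0, 4.0]),
          np.array([2.0, 2.0, 2.0, 2.0]),
          np.array([0.0, 4.0, 1.0, 6.0])]
global_params = weighted_aggregate(params, weights=[1.0, 1.0, 2.0])
```

Because the cost is linear in both the parameter count and the number of clients, the dominant factor in practice is the model size per client, as discussed above.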

Q3: How were $\alpha_m$ and $\alpha_p$ derived in Eq. (10) of this paper?

Theorem 1 shows that the optimization objectives in multi-view and single-view clients can be measured by different mutual information metrics, with proximity to these objectives reflecting model quality. Based on this, we use mutual information to evaluate the quality of models from multi-view and single-view clients. These metrics are calculated locally and sent to the server along with the model parameters. The server assigns aggregation weights based on these values. Specifically, in Eq. (10), $\alpha_m$ are derived by normalizing $\sum_{v=1}^{V} I(\mathbf{H}, \mathbf{H}^{v})$, which is calculated by $f_{m}(\cdot; \mathbf{w}_m)$ on the different multi-view clients.

$\alpha_p$ are derived by normalizing $I(\mathbf{H}, \mathbf{H}^{g}) - I(\mathbf{H}, \mathbf{Z}^{v})$, which is calculated by $f_{p}(\cdot; \mathbf{w}_p^{v})$ on the different single-view clients. Higher mutual information values indicate better model quality, leading to higher weights during aggregation and achieving high-quality global aggregation.
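The normalization step can be sketched as follows (hypothetical helper with made-up score values; the mutual-information scores themselves are assumed to be computed locally by each client, and a shift guards against negative differences of the single-view criterion):

```python
import numpy as np

def normalize_weights(scores):
    """Turn per-client quality scores (e.g. mutual-information values
    reported by each client) into aggregation weights summing to 1.
    The shift keeps weights valid if some scores are negative, as can
    happen for a difference of two mutual-information terms."""
    s = np.asarray(scores, dtype=float)
    s = s - s.min() + 1e-8           # shift so all scores are positive
    return s / s.sum()

alpha_m = normalize_weights([2.1, 1.7, 2.4])     # multi-view client scores
alpha_p = normalize_weights([0.8, -0.1, 0.5])    # single-view client scores
```

Clients with higher scores receive larger weights, which is exactly the "higher mutual information, higher weight" behavior described above.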

Q4: In the experimental results on the MNIST-USPS dataset in Figure 8, why does the performance with 24 clients outperform that of both 16 clients and 50 clients?

Thank you for your detailed observation. We believe that clustering performance is closely related to the number of clients, but it does not exhibit a simple linear relationship. For example, results on the BDGP dataset in Figure 8 show that the performance with 12 clients outperforms that of both 8 clients and 16 clients.

This phenomenon has two main reasons. First, a moderate increase in the number of clients promotes diversity in local models, benefiting the high-quality aggregation of the global models. Second, with a fixed number of samples, significantly increasing clients makes the data for each client sparse, negatively affecting local model training and clustering performance. The experimental results in Figure 8 highlight this issue: when the number of clients reaches a critical point, the severe insufficiency of samples within each client leads to a significant decline in FMCSC's clustering performance. This indicates that global clustering performance is highly correlated with the training quality of local models. This finding also motivates us to further explore high-quality training with few samples per client, encouraging more devices to participate in training and collaboration.

Comment

The authors have addressed the proposed weaknesses and questions; therefore, I will raise my score.

Comment

Dear Reviewer s4wh,

Thank you for your prompt response and for reconsidering your evaluation. We genuinely appreciate your insightful feedback and the effort you have put into helping us refine our work. Your contributions have been invaluable in improving the quality of our paper.

If there is anything further you believe could be refined or enhanced, please let us know.

Best wishes,

Authors

Comment

Dear Reviewer s4wh,

We sincerely appreciate your time and effort in reviewing our work. We would be grateful for further feedback or confirmation that our rebuttal has adequately addressed your comments.

Thank you again for your time and consideration.

Best regards,

Authors

Official Review (Rating: 7)

This paper proposes a novel Federated Multi-View Clustering method capable of handling heterogeneous hybrid views. By designing local-synergistic contrastive learning and global-specific weighting aggregation, the proposed method explores clustering structures across different clients. The effectiveness of the proposed method is demonstrated both theoretically and empirically.

Strengths

  1. The motivation behind the paper is clear, and the proposed heterogeneous hybrid view scenario is more applicable to real-world situations compared to other FedMVC methods.
  2. The paper conducts extensive experiments, demonstrating the effectiveness of the proposed method.

Weaknesses

  1. Several key observations are presented in Section 3.2 on cross-client consensus pre-training; however, the purpose of these observations is not very clear.
  2. The paper describes the transfer of multiple model parameters between the client and the server, but does not analyze the communication overhead.
  3. There are some inaccuracies in the descriptions, such as "select 10 state-of-the-art methods" in Lines 239 and 578, which should be 9?

Questions

Please see ‘Weaknesses’.

Limitations

The authors addressed the limitations and potential negative societal impact of their work.

Author Response

Response to Reviewer 14Lq:

We sincerely appreciate your constructive comments and suggestions.

Q1: Several key observations are presented in Section 3.2 on cross-client consensus pre-training; however, the purpose of these observations is not very clear.

Thank you for your feedback. In Section 3.2, we have two key observations: (a) the presence of single-view clients exacerbates the issue of model drift; (b) the absence of uniformly labeled data across all clients leads to the reconstruction objective of autoencoders optimizing from multiple different directions, resulting in model misalignment.

For observation (a), we use the local-synergistic contrastive learning designed in Section 3.3, which helps single-view clients bridge client gaps and mitigate model drift. Theorem 2 demonstrates the effectiveness of this strategy by analyzing the generalization bounds of the proposed method. For observation (b), we propose cross-client consensus pre-training to align the local models on all clients and avoid their misalignment. Figure 2 visualizes the model outputs, further quantifying the impact of model misalignment and the effectiveness of our proposed strategy. Additionally, the ablation study results in Table 2 show that the proposed strategy plays a crucial role in our training process, facilitating consensus among clients during pre-training, effectively alleviating model misalignment, and accelerating convergence. We will revise our manuscript to make this point clearer.

Q2: The paper describes the transfer of multiple model parameters between the client and the server, but does not analyze the communication overhead.

Below, we use the MNIST-USPS dataset as an example to calculate the total communication overhead required by FMCSC. Table 5 reports the number of parameters per client for FMCSC. Suppose the data are distributed among 24 clients, with an equal number of multi-view and single-view clients. In this case, in each communication round, the total data volume that all clients need to transmit to the server is 485.6MB, and the data volume that the server needs to transmit to all clients is 54MB. Because the proposed cross-client consensus pre-training strategy accelerates convergence, the number of communication rounds is set to 5. Therefore, the total communication overhead required to reach convergence on this dataset is 2.6GB.

Similarly, we further calculate the total communication overhead required for FMCSC to reach convergence on the other datasets. The results indicate that these overheads are acceptable, both for clients with powerful computing capabilities (such as large institutions) and for lightweight clients (such as mobile devices).

| Dataset | MNIST-USPS | BDGP | Multi-Fashion | NUSWIDE |
| --- | --- | --- | --- | --- |
| Communication overhead | 2.6 GB | 1.5 GB | 6.4 GB | 4.4 GB |
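The MNIST-USPS total follows from simple arithmetic over the per-round volumes stated in the response (485.6 MB uplink, 54 MB downlink, 5 rounds); a quick check, assuming 1 GB = 1024 MB:

```python
# Total communication overhead = rounds * (uplink + downlink),
# using the MNIST-USPS figures stated above (values in MB).
uplink_per_round = 485.6      # all clients -> server, per round
downlink_per_round = 54.0     # server -> all clients, per round
rounds = 5                    # convergence rounds after consensus pre-training

total_mb = rounds * (uplink_per_round + downlink_per_round)
total_gb = total_mb / 1024    # matches the ~2.6 GB figure reported
```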

Q3: There are some inaccuracies in the descriptions, such as "select 10 state-of-the-art methods" in Lines 239 and 578, which should be 9?

Thank you. We will correct the mistakes and further polish our manuscript.

Comment

Dear Reviewer 14Lq,

We sincerely appreciate your time and effort in reviewing our work. We would be grateful for further feedback or confirmation that our rebuttal has adequately addressed your comments.

Thank you again for your time and consideration.

Best regards,

Authors

Comment

Dear authors,

Thanks for your work! Besides the reviews, I have the following concern. There are extensive studies on heterogeneous hybrid views in multi-view clustering. The authors should demonstrate superiority over the existing methods.

Comment

Dear AC,

We sincerely appreciate your time and effort in reviewing our work.

FedMVC is an emerging field that has developed over the past two years, driven by the need for privacy preservation and the challenges of working with unlabeled multi-view data. Compared to existing MVC methods, FedMVC enables collaborative training of consistent clustering models across clients without exposing private data. For instance, our proposed method ensures privacy through two key mechanisms: 1) model parameters are shared among participating clients, with private data remaining locally stored; 2) differential privacy techniques are employed to further enhance privacy protection, as shown in Figure 9. It is important to note that FedMVC methods generally assume that multi-view data are distributed across multiple clients, allowing them to collaborate in learning. In contrast, traditional MVC methods require data to be collected and stored in a single entity, which incurs significant costs and privacy risks.

We consider heterogeneous hybrid views as a scenario present in FedMVC. The "heterogeneity" refers to the differences among clients, where single-view and multi-view clients coexist, corresponding to the client gaps mentioned in the paper. The "hybrid views" indicate the uncertainty in the number and quality of views involved in training, corresponding to the view gaps. As discussed in the related work, existing FedMVC methods typically assume that clients are isomorphic and belong to either single-view clients or multi-view clients, which makes them incapable of handling heterogeneous clients. Additionally, most centralized multi-view clustering methods assume that multi-view data are stored within a single entity, thus lacking the concept of heterogeneous clients. In summary, current research on multi-view clustering only addresses the "hybrid views" scenario we mentioned.

In our experimental setting, we define the ratio of multi-view clients to single-view clients to reflect this heterogeneity among clients. However, since existing solutions are not designed to accommodate the heterogeneous hybrid views scenario, we adopt the comparative strategy mentioned in Appendix D.3. In this approach, we simplify the heterogeneous hybrid views scenario in our paper into a hybrid views scenario. Specifically, we concatenate the data distributed among the clients and use them as input for centralized methods. The data from multi-view clients can be considered complete, while the data from single-view clients can be regarded as incomplete. We compare our method with seven state-of-the-art incomplete multi-view clustering methods: HCP-IMSC (2022), IMVC-CBG (2022), DSIMVC (2022), LSIMVC (2022), ProImp (2023), JPLTD (2023), and CPSPAN (2023). The results, shown in Table 1, demonstrate that our proposed method outperforms these approaches across multiple datasets. It's worth noting that in the reported results, our method operates under the heterogeneous hybrid views scenario, whereas the other comparative methods operate under the hybrid views scenario. Although existing solutions can bypass the challenge of heterogeneous clients by simply concatenating data, the exposure of raw data, due to privacy concerns, may cause more data owners to refuse to participate in collaborative training. In contrast, our method can extract complementary clustering structures across clients without exposing their raw data, offering better privacy protection and performance improvement than current state-of-the-art methods.

Thank you again for your time and review. We hope the above response addresses your concerns.

Best regards,

Authors

Final Decision

This paper proposes a federated multi-view clustering method for handling heterogeneous hybrid views. It introduces local-synergistic contrastive learning to achieve consistency between single-view and multi-view clients, and eliminates the gaps between views via a global-specific weighting aggregation strategy. Experimental results and theoretical analysis demonstrate its effectiveness and strengths. After the rebuttal, all reviewers gave positive ratings and recognized the paper's novelty, clear structure, extensive experiments, and detailed proofs. Based on these, I recommend accepting this paper.