Unsupervised Federated Graph Learning
Abstract
Reviews and Discussion
In this paper, the authors propose a novel method for unsupervised federated graph learning. Specifically, the proposed FedPAM uses a set of anchors to explore the global representation space, then uses them to align the local representation spaces. To capture the high-order correlation among multiple clients, low-rank tensor decomposition is adopted to enhance the consistency across clients. Finally, extensive experiments are conducted to verify the effectiveness of the proposed FedPAM.
Strengths and Weaknesses
Strengths:
- The idea of using a set of learnable anchors to locate the global space and then adopting them to align the local representation spaces is interesting and appealing.
- The motivation is well demonstrated by Fig. 1, which verifies that the local representation spaces are not aligned and that conventional federated aggregation is ineffective.
- The paper is well written and easy to follow.
Weaknesses:
- The motivation for using low-rank tensor decomposition is not clarified.
- Additionally transferring the anchors can increase the communication burden.
- The analysis of the experimental results is insufficient.
Questions
- The authors should explain why low-rank tensor decomposition is used to obtain the global model weights.
- Fig. 4 shows that the proposed FedPAM degrades when the number of anchors increases. However, according to my understanding, more anchors should locate the global space more accurately. Hence, the reason should be elaborated.
- For an algorithm, a complexity analysis is essential and should be provided.
Limitations
Yes
Final Justification
This paper proposes a novel method for unsupervised FGL; the effectiveness of FedPAM with the proposed RSA and AGPL modules is supported by a large number of experiments. The added generalization and complexity analyses provide a comprehensive understanding of the proposed FedPAM. Likewise, the added experiments further verify the superiority and flexibility of FedPAM. Hence, it is suggested to be accepted.
Formatting Issues
None
R1 to Q1: Thank you for this valuable comment. There are two main reasons for using low-rank tensor decomposition. First, most federated aggregation methods perform parameter aggregation at the matrix level and cannot capture high-order correlations between different clients. The so-called high-order correlation refers to the consistency and complementary information across clients, and low-rank tensor decomposition is used to enhance information fusion between clients. Second, the client weight calculation method based on the number of samples often cannot truly reflect the actual degree of client contribution. For example, although some clients have a large number of nodes, they might be extremely heterogeneous and cannot play a positive role in the optimization direction of the global model. Therefore, an adaptive weight learning method is needed.
R2 to Q2: Thanks for this insightful comment. We expect to learn a small set of representative anchors and keep the running cost as low as possible, so the number of anchors should be small. Moreover, we do not impose any constraints among the anchors (for example, encouraging different anchors to be as dissimilar as possible). Therefore, too many anchors may cause the anchor space to collapse, which is not conducive to aligning the representation spaces across clients.
R3 to Q3: Thanks for this meaningful comment. The computation complexity mainly originates from two aspects: the FGW-OT for aligning representation spaces and the low-rank tensor optimization for adaptively learning global model parameters. For the FGW-OT, the computation of (an intermediate variable) takes , where denotes the number of edges, denotes the number of anchors, and denotes the number of nodes. Notably, the adjacency matrices are often sparse, so the computational complexity can be significantly decreased. Likewise, the OT based on Sinkhorn algorithm takes . Then, the FGW-OT takes . For the low-rank tensor optimization, the update of with occupies the dominant computational complexity. First, the FFT and the inverse FFT require , where and are the dimensions for the -th layer's parameters, is the number of clients, and is the higher dimension. Second, the T-SVD costs . Then, the low-rank tensor optimization takes . Overall, the computational complexity for the proposed FedPAM is .
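As an illustrative sketch of the Sinkhorn-based OT step mentioned above (not the exact implementation used in the paper), the entropic OT iterations can be written in a few lines of NumPy; the regularization strength and iteration count below are illustrative choices.

```python
import numpy as np

def sinkhorn(a, b, M, eps=0.05, n_iters=200):
    """Entropic OT: a, b are marginal distributions, M is the cost matrix.
    Returns the transport plan diag(u) K diag(v)."""
    K = np.exp(-M / eps)          # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)         # alternate scaling to match marginal b
        u = a / (K @ v)           # alternate scaling to match marginal a
    return u[:, None] * K * v[None, :]
```

Each iteration only costs a couple of matrix-vector products, which is where the per-iteration cost of the Sinkhorn step comes from.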
Thanks to the authors' reply, my question has been answered. Furthermore, the other reviewers provided many insightful and interesting comments, which have been basically addressed by the authors, in my opinion. Therefore, I decide to raise the score.
Thanks for your valuable comments and checking our responses. We will further improve the quality in the future!
This paper addresses unsupervised settings in graph federated learning and defines two main challenges: 1) the misalignment of local representations due to the lack of a common signal (i.e., a supervised label), and 2) the difficulty of addressing heterogeneity across clients when aggregating local parameters on the server. The authors propose a global anchor set to align the representation space across clients, and a low-rank-based weighting mechanism for parameter aggregation. The authors provide a theoretically solid method supported by well-designed experiments.
Strengths and Weaknesses
Strengths.
S1. The paper is well-written and easy to follow, with the authors presenting their ideas clearly.
S2. The paper correctly identifies representation space misalignment as a critical challenge in unsupervised graph federated learning. The proposed solution for this, which leverages a global anchor set, is intuitive and theoretically sound.
S3. The experimental design is well-organized.
Weaknesses.
W1. The computational cost of the proposed method is a potential concern, given its reliance on both optimal transport and tensor decomposition, which are known to be computationally intensive operations.
W2. Since can be considered as a non-common semantics across other clients' parameter tensor set, it would be more informative if the authors address the potential privacy leak issues from this point.
W3. It would be more helpful to address the potential privacy risks that arise when a client's weight becomes extremely low. Since a low weight signifies that the client is a distinct outlier, this information could potentially be used to infer specific properties about that client.
Questions.
Q1. The parameter sensitivity study in Figure 5 shows a strong positive correlation between performance and the hyperparameter , which weights the alignment loss . This steep trend raises an interesting question about the trade-off between the representation learning from and the representation alignment from . While is fundamentally necessary to prevent representation collapse in an unsupervised setting, the results suggest its relative contribution can be small as long as a strong alignment signal exists. It would be insightful if the authors could provide further analysis on this trade-off. For example, is there a performance plateau or even a decrease at much higher values? More importantly, what is the minimum relative weight of the SSL loss required to maintain stable and non-trivial representations?
Q2. The AGPL module is presented as a novel method for adaptive parameter aggregation that tackles the shortcomings of standard federated averaging. To better isolate and demonstrate its effectiveness as a generalizable component, it would be highly compelling to see an experiment where AGPL is applied as an "add-on" to other baseline methods, such as FedAvg. Such an experiment would provide clearer evidence of AGPL's standalone contribution, independent of the RSA module, and strengthen the paper's claims about its utility.
Questions
See Weaknesses and Questions above.
Limitations
yes
Final Justification
The rebuttal has addressed my concerns well. My primary concern was the need to demonstrate the isolated effectiveness of the AGPL module, and the authors have successfully shown this in their response, particularly in relation to Table 3. This clarification was crucial. As my concerns are now resolved, I lean towards accepting this paper.
Formatting Issues
N/A
R1 to W1: Thank you for this good comment. The computation complexity mainly originates from two aspects: the FGW-OT for aligning representation spaces and the low-rank tensor optimization for adaptively learning global model parameters. For the FGW-OT, the computation of (an intermediate variable) takes , where denotes the number of edges, denotes the number of anchors, and denotes the number of nodes. Notably, the adjacency matrices are often sparse, so the computational complexity can be significantly decreased. Likewise, the OT based on Sinkhorn algorithm takes . Then, the FGW-OT takes . For the low-rank tensor optimization, the update of with occupies the dominant computational complexity. First, the FFT and the inverse FFT require , where and are the dimensions for the -th layer's parameters, is the number of clients, and is the higher dimension. Second, the T-SVD costs . Then, the low-rank tensor optimization takes . Overall, the computational complexity for the proposed FedPAM is .
In addition, we also report the running time on Cora, CiteSeer, and PubMed in Table 1. We can see that the proposed FedPAM does indeed have lower runtime efficiency than most comparison algorithms, but in order to achieve satisfactory results, a longer runtime is necessary. At the same time, it is worth noting that compared to FedU, FedPAM is still acceptable in terms of runtime.
Table 1: Comparison of running time (seconds) on three datasets.
| SSL | Method | Cora | CiteSeer | PubMed |
|---|---|---|---|---|
| Simsiam | FedAvg | 47.72 | 49.23 | 48.32 |
| | FedProx | 58.98 | 59.72 | 60.50 |
| | MOON | 67.28 | 67.39 | 66.84 |
| | FedU | 2040.83 | 2041.00 | 2086.69 |
| | FedPAM | 710.12 | 711.36 | 712.73 |
| SimCLR | FedAvg | 41.77 | 42.00 | 41.43 |
| | FedProx | 50.16 | 50.18 | 50.26 |
| | MOON | 66.03 | 66.41 | 67.23 |
| | FedX | 103.54 | 92.16 | 105.75 |
| | FedU | 2092.78 | 2095.67 | 2102.21 |
| | FedPAM | 582.62 | 602.36 | 605.89 |
| BYOL | FedAvg | 62.14 | 62.02 | 62.37 |
| | FedU | 62.10 | 63.78 | 62.70 |
| | FedEMA | 63.33 | 62.45 | 63.00 |
| | Orchestra | 90.61 | 86.62 | 155.13 |
| | FedU | 2137.52 | 2136.59 | 2207.17 |
| | FedPAM | 828.67 | 826.81 | 830.01 |
R2 to W2: Thank you for this interesting suggestion. Privacy protection is an important issue in federated learning, and the transmitted tensor contains client-specific information. Once the centralized server is attacked, this may cause client privacy leakage. The current tensor decomposition method has an explicit bias tensor. Therefore, in order to protect the clients' private information from being exposed, we can use the Tucker decomposition to factorize the tensor into a core (low-rank) tensor and a set of factor (bias) matrices. The mathematical form is written as
$\mathcal{W} = \mathcal{C} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \times_3 \mathbf{U}^{(3)}$,
where $\mathcal{W}$, $\mathcal{C}$, and $\{\mathbf{U}^{(n)}\}_{n=1}^{3}$ denote the original tensor, the low-rank core tensor, and the set of factor matrices, respectively. Since the factor matrices integrate the information of multiple clients, they will not expose client-specific information. Limited by the rebuttal time, we will further investigate a novel method for unsupervised FGL with strict privacy protection in the future.
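To make this concrete, here is a minimal NumPy sketch of a Tucker-style (HOSVD) decomposition of a third-order tensor into a core tensor and per-mode factor matrices; the shapes, ranks, and function names are illustrative assumptions and not taken from the paper's implementation.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding of a 3rd-order tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker_hosvd(T, ranks):
    """Truncated HOSVD: returns a core tensor and one factor matrix per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])                      # truncated factor matrix for this mode
    core = T
    for mode, U in enumerate(factors):                # core = T x_1 U1^T x_2 U2^T x_3 U3^T
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

# Example: a stack of K = 4 client parameter matrices of size 16 x 8
T = np.random.randn(16, 8, 4)
core, factors = tucker_hosvd(T, ranks=(8, 4, 2))
```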
R3 to W3: Thanks for this insightful comment. In federated learning, low client weights can lead to privacy leaks (for example, by inferring client data distribution or sample information through reverse engineering). To avoid this problem, low-weight clients can be clustered with similar clients based on their data distribution or model parameter similarity to form virtual "super clients." Parameters are first aggregated within the group, and then the group participates in global aggregation, masking individual contributions. Clients within the group share the aggregated weights, preventing individual clients from having too low a weight. This is a very interesting research topic that we will further explore in future work.
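A rough sketch of the "super client" grouping described above could look as follows; clustering flattened parameter vectors with k-means is only one possible similarity criterion, and all names here are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def hierarchical_aggregate(client_params, n_groups=3):
    """client_params: list of flattened parameter vectors, one per client."""
    X = np.stack(client_params)
    labels = KMeans(n_clusters=n_groups, n_init=10).fit_predict(X)
    group_models = []
    for g in range(n_groups):
        members = X[labels == g]
        if len(members) > 0:
            group_models.append(members.mean(axis=0))  # intra-group averaging masks individual clients
    return np.mean(group_models, axis=0)               # global aggregation over the "super clients"
```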
R1 to Q1: Thank you for this interesting question. We continue to increase the value to test whether FedPAM encounters a performance bottleneck. The experimental results are reported in Table 2. We can see that when the value rises to 500 or 600, FedPAM encounters a performance bottleneck. However, it is worth noting that a larger value is conducive to achieving satisfactory performance, thanks to the global anchor-guided representation space alignment. From the experimental results, to achieve satisfactory results, the weight of the alignment loss relative to the SSL loss should be at least 10.
Table 2: Performance comparison with different values of .
| Dataset | |||||||
|---|---|---|---|---|---|---|---|
| Cora | 62.83 | 63.29 | 65.92 | 64.96 | 67.65 | 67.56 | 37.10 |
| CiteSeer | 43.90 | 46.59 | 50.64 | 52.65 | 53.24 | 36.30 | 36.52 |
R2 to Q2: Thank you for this good advice. In essence, the proposed AGPL module is a plug-and-play component. To verify its effectiveness, we combine the AGPL module with existing federated learning algorithms. Notably, we focus on testing federated learning algorithms without special aggregation methods on the server side. The experimental results are reported in Table 3. We can see that when an existing federated learning algorithm is combined with the proposed AGPL module, its performance is improved, which illustrates the effectiveness of the AGPL module.
Table 3: Performance comparison when the proposed AGPL is combined with different FL methods.
| SSL | Method | Cora ACC | Cora Fscore | CiteSeer ACC | CiteSeer Fscore | Ogbn-Arxiv ACC | Ogbn-Arxiv Fscore |
|---|---|---|---|---|---|---|---|
| Simsiam | FedAvg+AGPL | 57.38 | 47.21 | 42.27 | 30.42 | 35.58 | 21.32 |
| | FedProx+AGPL | 58.25 | 48.02 | 42.75 | 30.68 | 35.22 | 21.01 |
| | MOON+AGPL | 56.70 | 46.56 | 41.02 | 28.83 | 35.58 | 21.33 |
| | FedPAM | 61.40 | 52.85 | 49.53 | 38.73 | 35.97 | 23.07 |
| SimCLR | FedAvg+AGPL | 56.65 | 44.18 | 41.35 | 27.80 | 47.65 | 35.45 |
| | FedProx+AGPL | 56.86 | 44.43 | 41.58 | 27.93 | 46.97 | 35.86 |
| | MOON+AGPL | 55.81 | 43.83 | 42.10 | 28.12 | 47.80 | 36.67 |
| | FedX+AGPL | 57.53 | 44.96 | 42.86 | 30.23 | 48.01 | 36.95 |
| | FedPAM | 58.62 | 45.21 | 43.90 | 31.61 | 48.19 | 37.25 |
Thank you for the rebuttal, which effectively addresses my concerns, particularly regarding Table 3. I hope this clarification will be incorporated into the final version of the paper. I have raised my score.
Thanks for providing so many insightful suggestions and checking our responses. Thank you for your recognition of our work, which is the greatest motivation for us to move forward! We will add Table 3 in the revised version.
This paper focuses on unsupervised federated graph learning. The authors propose a tailored framework named FedPAM consisting of two components: Representation Space Alignment (RSA) and Adaptive Global Parameter Learning (AGPL). Experiments on eight graph datasets demonstrate the superiority of the proposed model compared with other baselines.
Strengths and Weaknesses
Strengths
S1: The studied problem is important and promising.
S2: The authors conduct experiments on eight graph datasets to evaluate the performance of the proposed method.
Weaknesses
W1: GNN formulation. The computation of a GNN in Eq. (1) looks questionable. The authors may check how previous studies formulate the forward pass of GNNs.
W2: Using the product to measure the similarity between the anchors and nodes seems problematic. The product depends on the norms of the two vectors as well. A better metric is cosine similarity.
W3: Notations. A letter should not denote different variables in the paper. For example, the same symbol is used in Eq. (8) and Eq. (16), and also to denote the adjacency matrix, the relationships between anchors, and the optimal global GNN parameters.
W4: Self-supervised learning strategies. The authors adopt Simsiam, SimCLR, and BYOL. It is quite weird not to use graph self-supervised learning methods.
Questions
Q1: What is the meaning of in Eq. (17)?
Q2: Could the authors explain the intuition of Eq. (17)?
Q3: How do we compute ?
Q4: What is the value of in the experiments? How does it affect the performance of the proposed method?
Q5: In Table 1 and Table 2, only some baselines are included. Could the authors explain why?
Limitations
Yes
Final Justification
After the rebuttal, my concerns are addressed by the authors' response. The authors are expected to modify the submission accordingly in the revised version of their paper.
Formatting Issues
N/A
R1 to W1: Thank you for this rigorous advice. We found that the current form of writing is somewhat unclear. After checking previous work, we changed the forward propagation formula to the following form:
$\mathbf{h}_i^{(l)} = \sigma\big(\mathrm{AGG}\big(\{\mathbf{h}_j^{(l-1)} : j \in \mathcal{N}(i)\}\big)\big)$,
where $\sigma$ denotes the activation function, $\mathrm{AGG}$ denotes the aggregation method, and $\mathcal{N}(i)$ denotes the neighborhood node set for the $i$-th node.
R2 to W2: Thanks for this good question. Using cosine similarity is indeed a more common choice. In Eq. (11), we also use cosine similarity to calculate the similarity between $\mathbf{P}_{k}$ and the anchors, and then compare the experimental results of the different calculation methods in Table 1. The difference in results between using the product and using cosine similarity is not significant. This is because the entropy regularization smooths the solution space in the OT calculation, which has a certain normalization effect. Therefore, using the product calculation can still guarantee satisfactory results.
Table 1: Performance comparison with different similarity calculation methods.
| SSL | Method | Cora (ACC) | Cora (Fscore) | CiteSeer (ACC) | CiteSeer (Fscore) |
|---|---|---|---|---|---|
| Simsiam | FedPAM-Product | 61.40 | 52.85 | 49.53 | 38.73 |
| | FedPAM-Cosine | 61.85 | 53.12 | 49.89 | 39.31 |
| SimCLR | FedPAM-Product | 58.62 | 45.21 | 43.90 | 31.61 |
| | FedPAM-Cosine | 58.93 | 45.83 | 43.90 | 31.61 |
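For illustration only, the two similarity choices discussed above differ as in the following minimal NumPy sketch, where Z stands for node representations and A for anchors (both shapes are hypothetical).

```python
import numpy as np

Z = np.random.randn(100, 64)   # node representations (illustrative shape)
A = np.random.randn(10, 64)    # learnable anchors (illustrative shape)

dot_sim = Z @ A.T                                     # inner product: depends on the norms of Z and A
Z_n = Z / np.linalg.norm(Z, axis=1, keepdims=True)
A_n = A / np.linalg.norm(A, axis=1, keepdims=True)
cos_sim = Z_n @ A_n.T                                 # cosine similarity: scale-invariant, in [-1, 1]
```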
R3 to W3: Thanks for this rigorous comment. We will further improve the notations in the following revisions.
R4 to W4: Thank you for this good comment. It is worth noting that Simsiam, SimCLR, and BYOL are the three most widely used self-supervised learning methods, applicable in almost all fields. In the graph domain, the augmented views are obtained by perturbing edges and masking some feature elements, while the contrastive loss is essentially the same, as in works [1] and [2]. Of course, there are more complex graph self-supervised learning methods; however, our focus is on how to achieve unsupervised joint modeling in distributed graph scenarios, and the local contrastive loss is not the focus of this research.
[1] Deep Graph Contrastive Representation Learning
[2] Deep Graph Infomax
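As a minimal sketch of the standard graph augmentations mentioned above (edge perturbation and feature masking, in the spirit of [1]); the drop probabilities and the dense-matrix representation are illustrative assumptions.

```python
import numpy as np

def augment(adj, feat, p_edge=0.2, p_feat=0.2, seed=0):
    """Return an augmented view of a graph given dense adjacency and feature matrices."""
    rng = np.random.default_rng(seed)
    adj_aug = adj.copy()
    drop_edges = (rng.random(adj.shape) < p_edge) & (adj_aug > 0)
    adj_aug[drop_edges] = 0                      # randomly remove a fraction of existing edges
    feat_aug = feat.copy()
    drop_dims = rng.random(feat.shape[1]) < p_feat
    feat_aug[:, drop_dims] = 0.0                 # mask a fraction of feature dimensions
    return adj_aug, feat_aug
```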
R1 to Q1: Thanks for this rigorous comment. It is the k-th slice of the low-rank tensor, corresponding to the parameters of the k-th client.
R2 to Q2: Thanks for this good comment. The optimization objective in Eq. (17) stems from two aspects. First, matrix-based aggregation methods cannot capture higher-order correlations between different clients, so the complementary information between clients cannot be fully utilized. Therefore, we stack the parameters of different clients into a third-order tensor and use low-rank tensor decomposition in the high-dimensional space to explore the complementarity between clients. Second, traditional federated aggregation performs weighted aggregation based on sample size and cannot adaptively learn the optimal parameters. Therefore, we aim to learn a set of adaptive client weights while adaptively aggregating the model parameters of different clients to obtain the optimal global model parameters. Based on these two aspects, we establish a joint optimization loss, as shown in Eq. (17).
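Purely as an illustration of the idea (not the paper's actual ADMM-based optimization), the stacking-and-weighted-fusion step could be sketched as below; the weighting rule, variable names, and the smoothing constant r are assumptions for illustration, and the low-rank constraint is omitted.

```python
import numpy as np

def aggregate(client_mats, errors, r=2.0):
    """client_mats: list of K per-client parameter matrices (d1 x d2);
    errors: per-client fit errors used to derive adaptive weights."""
    T = np.stack(client_mats, axis=2)            # d1 x d2 x K third-order parameter tensor
    w = np.exp(-np.asarray(errors) / r)          # larger r -> smoother client weights
    w = w / w.sum()                              # normalize to a distribution over clients
    return np.tensordot(T, w, axes=([2], [0]))   # weighted fusion of the client slices
```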
R3 to Q3: Thanks for the good question. When the variable is updated, the other variables are viewed as constants, and the subproblem is formulated as
For the norm-minimization subproblem, we adopt the solution proposed in [3] and then obtain the update rule; the details can be found in [3].
[3] The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices
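A minimal sketch, assuming the subproblem above reduces to a nuclear-norm proximal step as in the ALM method of [3]; in that case the closed-form solution is singular value thresholding. The threshold tau and the function name are illustrative.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)     # soft-threshold the singular values
    return (U * s_shrunk) @ Vt              # reconstruct the low-rank matrix
```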
R4 to Q4: Thanks for this valuable comment. The constant r smooths the distribution of the client weights: the larger the value of r, the smoother the client weights; otherwise, the sharper they are. It is set to 2 on all datasets. We test FedPAM's performance with different values of r in Table 2. It can be seen that most of the experimental results are best when r = 2, which shows that a slightly smoothed client weight distribution is conducive to achieving considerable results. A smooth client weight distribution helps balance the information of multiple clients, while a sharp client weight distribution amplifies the influence of some clients, resulting in biased fusion.
Table 2: Performance comparison (ACC %) with different values of r.
| SSL | Dataset | r=0.5 | r=1 | r=2 | r=5 | r=10 |
|---|---|---|---|---|---|---|
| Simsiam | Cora | 58.82 | 61.56 | 61.40 | 60.21 | 57.89 |
| | CiteSeer | 46.91 | 48.52 | 49.53 | 47.21 | 47.10 |
| SimCLR | Cora | 57.92 | 58.35 | 58.62 | 58.02 | 58.02 |
| | CiteSeer | 41.92 | 44.05 | 43.90 | 42.12 | 41.56 |
| BYOL | Cora | 64.92 | 65.56 | 66.01 | 65.21 | 64.23 |
| | CiteSeer | 47.10 | 48.32 | 48.81 | 47.95 | 46.92 |
R5 to Q5: Thanks for this valuable comment. First, unsupervised federated graph learning is still an under-explored field, with limited works and few available comparison methods. Second, we follow the baseline setting of the unsupervised federated learning work [5] and conduct three types of comparisons based on model architecture. When using different SSL losses, the applicable comparison methods also differ, which is determined by the model design. For example, FedU, FedEMA, and Orchestra are based on the architecture with an online network and a target network. Therefore, when the SSL loss is Simsiam or SimCLR, these methods are not applicable. MOON uses the output of the global model to calibrate the output of the local model and is also not applicable to the online/target network architecture. In summary, when using different SSL losses, only some baselines are available.
[5] Rethinking the representation in federated unsupervised learning with non-iid data
Thanks for your response. The new experimental results are helpful. I am willing to raise my score to borderline accept. However, I still have the following unsolved concerns.
-
W1: It seems that there should be a learnable layer before the activation function in Equation (1); otherwise, the representations at each layer of a GNN will have to be of the same dimension.
-
W2: Thanks for the new results. Double-check if the values in Table 1 are not mistaken.
-
W4: I still think graph self-supervised learning methods should be used in the experiments since they can be seamlessly adopted on graph data. I understand that running experiments is onerous during the intensive rebuttal period, so I encourage the authors to run them in the future.
BTW, the authors mentioned several times that my comments are rigorous. I am not sure if I have too high an expectation for a NeurIPS submission. Every time I start to read a paper, I assume that the authors are trying to make it readable. I think correct preliminaries, proper notations, and clear definitions are basic requirements for a research paper. They encourage readers to read and follow your methodology.
Best,
Reviewer pPE4
We are very grateful for your valuable comments and review of our responses. These comments are crucial to improving the quality of the manuscript; we will continue to improve the formulation, notation, and presentation of the paper to provide readers with a better reading experience. The following are the responses to your new questions.
Response to Q1: Thank you very much for the inspiring comment. In the code implementation, the learnable layer before the activation function is indeed computed, but it was not reflected in the general forward calculation formula. With your reminder, we found the formula was not sufficiently accurate and have rewritten it as
$\mathbf{h}_i^{(l)} = \sigma\big(f_{\theta^{(l)}}\big(\mathrm{AGG}\big(\{\mathbf{h}_j^{(l-1)} : j \in \mathcal{N}(i)\}\big)\big)\big)$,
where $\sigma$ denotes the activation function, $f_{\theta^{(l)}}$ denotes the $l$-th network layer parameterized by $\theta^{(l)}$, $\mathbf{h}_i^{(l)}$ is the $i$-th node's representation at the $l$-th layer, $\mathcal{N}(i)$ is the neighborhood node set for node $i$, and $\mathrm{AGG}$ denotes the embedding fusion manner.
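A hedged PyTorch sketch of the corrected forward pass above, with a learnable linear layer before the activation; the mean aggregator, dense adjacency input, and layer sizes are illustrative choices rather than the paper's exact implementation.

```python
import torch
import torch.nn as nn

class GNNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)   # learnable layer before the activation

    def forward(self, h, adj):
        """h: (N, in_dim) node representations; adj: (N, N) dense adjacency matrix."""
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        agg = (adj @ h) / deg                    # mean aggregation over each node's neighborhood
        return torch.relu(self.lin(agg))         # sigma(f_theta(AGG(...)))
```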
Response to Q2: Thank you very much for this comment. We have checked the experimental results. The improvement brought by cosine similarity is limited because the entropy regularization introduced in the OT calculation already regularizes the solution space of the learnable anchors. However, we also acknowledge that cosine similarity is indeed better than the product. In subsequent revisions, we will give priority to using cosine similarity.
Response to Q3: Many thanks for this constructive suggestion. We recognize that it is necessary to use specialized graph self-supervised learning methods as local losses. Therefore, we use the classic graph self-supervised learning method, DGI (Deep Graph Infomax [1]), for testing. DGI encodes node features using a GNN to generate local node embeddings, aggregates node embeddings through a readout function to generate a global graph representation, and learns node embeddings by contrasting the consistency of local node representations with the global graph representation. Table 1 reports the experimental results using DGI on two datasets. It can be seen that using DGI is beneficial to improving the results of most federated learning algorithms. However, the proposed FedPAM is still in the leading position, which further illustrates the effectiveness of the proposed method.
Table 1: Performance comparison using DGI as SSL method.
| SSL | Method | Cora (ACC) | Cora (Fscore) | CiteSeer (ACC) | CiteSeer (Fscore) |
|---|---|---|---|---|---|
| DGI | FedAvg | 56.74 | 43.47 | 38.34 | 24.03 |
| | FedProx | 58.49 | 45.26 | 39.50 | 25.89 |
| | MOON | 57.79 | 44.53 | 37.97 | 23.69 |
| | FedU² | 59.88 | 49.84 | 41.85 | 28.58 |
| | FedPAM | 62.06 | 50.86 | 44.40 | 33.16 |
[1] Deep Graph Infomax
This paper proposes a new framework, FedPAM, to tackle unsupervised federated graph learning (FGL). The authors identify two major challenges in this setting: (1) the difficulty in aligning the representation spaces across clients due to lack of shared semantics, and (2) the limitations of conventional averaging in aggregating heterogeneous local models. To address these, FedPAM integrates two core components: (1) Representation Space Alignment (RSA): Uses learnable global anchor points and fused Gromov-Wasserstein Optimal Transport to align local client embeddings into a shared global space. (2) Adaptive Global Parameter Learning (AGPL): Stacks local model parameters as tensors and applies low-rank tensor optimization to adaptively fuse them, capturing higher-order correlations among clients. The framework is evaluated on eight graph datasets with three types of self-supervised losses. The results show consistent performance improvements over existing federated and unsupervised baselines.
Strengths and Weaknesses
Strengths:
-
The paper focuses on a novel unsupervised federated graph learning setting, which is a real need for privacy-preserving, scalable graph learning.
-
Experiments are conducted on eight diverse graph datasets, using three SSL losses, and compared against a wide range of baselines. Ablation studies, parameter sensitivity, and visualizations are included to support the method’s effectiveness and robustness.
-
The paper is overall easy to follow and well-organized.
Weaknesses:
-
Both FGW-OT and tensor decomposition with ADMM are computationally expensive. This limits the applicability of FedPAM to large-scale FGL systems.
-
The proposed challenges of “Inability to align the representation spaces of various clients” and “Unable to adaptively learn the optimal parameters of global model” fail to capture the unique characteristics of the unsupervised FGL problem. The experimental settings also follow the normal graph partitioning strategies in previous supervised FGL problems. It would be helpful if the authors could validate the challenges in the experiment section.
-
It seems that both the proposed alignment and aggregation techniques are independent of the unsupervised learning nature of the problem. Can we use them in normal supervised FGL systems?
-
It would be helpful to add experimental results of the complexity or time cost of FedPAM.
-
The results are reported as point estimates without error bars or statistical testing, which limits confidence in the observed gains.
Questions
Please see Weaknesses
Limitations
Yes
Final Justification
The additional experiments and clarifications helped address some of my concerns, particularly regarding scalability and the broader applicability of the proposed components, while I still find the motivation somewhat generic to the unsupervised FGL setting.
Formatting Issues
NA
R1 to W1: Thank you for this valuable comment. We test the proposed FedPAM with more clients. The experimental results are reported in Tables 1 and 2. We can see that FedPAM can still maintain its leading position in large-scale client scenarios. Although FGW-OT and low-rank tensor decomposition introduce considerable computation, the cost is worthwhile.
Table 1: Performance comparison with 50 clients, where the SimCLR is employed in local training.
| Method | PubMed (ACC) | PubMed (Fscore) | OA (ACC) | OA (Fscore) |
|---|---|---|---|---|
| FedAvg | 73.57 | 65.15 | 43.59 | 31.57 |
| FedProx | 70.42 | 59.40 | 40.83 | 28.98 |
| MOON | 70.44 | 59.45 | 42.78 | 30.44 |
| FedX | 70.43 | 59.43 | 47.81 | 34.83 |
| FedU | 72.65 | 63.60 | 44.07 | 30.62 |
| FedPAM | 74.29 | 67.31 | 49.85 | 38.34 |
Table 2: Performance comparison with 100 clients, where the SimCLR is employed in local training.
| Method | PubMed (ACC) | PubMed (Fscore) | OA (ACC) | OA (Fscore) |
|---|---|---|---|---|
| FedAvg | 68.21 | 56.76 | 37.90 | 26.01 |
| FedProx | 68.23 | 56.78 | 36.18 | 24.13 |
| MOON | 68.08 | 56.62 | 38.84 | 26.95 |
| FedX | 69.27 | 58.25 | 47.04 | 33.18 |
| FedU | 71.82 | 62.66 | 38.08 | 26.23 |
| FedPAM | 72.36 | 65.22 | 48.58 | 37.32 |
R2 to W2: Thank you for this insightful comment. First, due to data heterogeneity, the problem of representation space misalignment exists in both supervised and unsupervised scenarios; in the unsupervised scenario, this problem is more prominent. We centrally train a GNN model on the Cora dataset to output a representation space close to the ideal. Then we train a global model through FedAvg in the supervised and unsupervised scenarios, respectively. Through Wasserstein distance (WD) calculations, we found that the WD between the global representation output by supervised FedAvg and the centrally trained representation is 3.67, while the WD between the global representation output by unsupervised FedAvg and the centrally trained representation is 9.05. Therefore, in the unsupervised scenario, the representation space misalignment is much more severe. We also illustrate the problem of representation space misalignment in the unsupervised scenario in Figure 1(a). Second, the inability to adaptively learn the optimal global model parameters is a problem common to both supervised and unsupervised federated graph learning. The proposed AGPL is a plug-and-play component that can also be used in supervised federated graph scenarios. In the response to weakness 3, we conduct experiments demonstrating its effectiveness in supervised scenarios. In summary, the proposed FedPAM addresses both the specific challenges of unsupervised federated graph learning and the general challenges of federated graph learning, making it a significant work.
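For reference, a Wasserstein-distance comparison of this kind between two sets of node representations could be computed with the POT library as sketched below; variable names are illustrative, and this is not necessarily the exact procedure used for the numbers above.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def representation_wd(Z_fed, Z_central):
    """Exact OT cost between two empirical representation distributions."""
    a = ot.unif(Z_fed.shape[0])          # uniform weights over federated embeddings
    b = ot.unif(Z_central.shape[0])      # uniform weights over centrally trained embeddings
    M = ot.dist(Z_fed, Z_central)        # pairwise squared Euclidean cost matrix
    return ot.emd2(a, b, M)              # exact (non-regularized) OT cost
```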
R3 to W3: Thanks for this interesting question. In fact, both the proposed Representation Space Alignment (RSA) and Adaptive Global Parameter Learning (AGPL) can be used in supervised FGL systems. We compare supervised federated graph learning algorithms with the proposed FedPAM, and the experimental results are reported in Table 3. In the local training, the supervised CE loss is added to the training loss. Notably, RSA is used to align the representation spaces of different clients, which is especially important in unsupervised scenarios. In supervised scenarios, since the label semantics of different clients are shared, this in itself is a strong alignment signal. Therefore, when only the RSA module is used (FedPAM-RSA), the performance is only slightly improved. AGPL aims to explore high-order correlations between clients and adaptively learn the optimal global parameters, which is applicable in both unsupervised and supervised scenarios. Therefore, its performance is superior to FedPAM-RSA. When using the complete FedPAM, it achieves remarkable results, even surpassing the SOTA supervised FGL method on the CiteSeer dataset.
Table 3: Performance comparison with various supervised FGL methods.
| Method | Cora (ACC) | Cora (Fscore) | CiteSeer (ACC) | CiteSeer (Fscore) | PubMed (ACC) | PubMed (Fscore) |
|---|---|---|---|---|---|---|
| FedAvg | 73.59 | 72.04 | 68.85 | 68.09 | 82.42 | 80.20 |
| FedProx | 74.29 | 72.89 | 69.65 | 68.75 | 82.43 | 80.21 |
| MOON | 74.22 | 71.88 | 68.78 | 68.18 | 82.49 | 80.41 |
| FGSSL | 74.47 | 72.94 | 70.02 | 69.19 | 82.38 | 79.88 |
| FedPUB | 75.35 | 73.08 | 65.36 | 62.66 | 82.67 | 80.43 |
| FedTAD | 74.29 | 71.57 | 82.72 | 80.26 | ||
| FedATH | 77.90 | 76.93 | 70.32 | 69.25 | 84.06 | 82.71 |
| FedPAM-RSA | 74.10 | 72.69 | 69.22 | 68.37 | 82.56 | 80.32 |
| FedPAM-AGPL | 71.22 | 70.10 | 83.03 | 81.21 | ||
| FedPAM | 75.52 | 73.40 | 71.63 | 70.52 |
R4 to W4: Thank you for this valuable comment. The computation complexity mainly originates from two aspects: the FGW-OT for aligning representation spaces and the low-rank tensor optimization for adaptively learning global model parameters. For the FGW-OT, the computation of (an intermediate variable) takes , where denotes the number of edges, denotes the number of anchors, and denotes the number of nodes. Notably, the adjacency matrices are often sparse, so the computational complexity can be significantly decreased. Likewise, the OT based on Sinkhorn algorithm takes . Then, the FGW-OT takes . For the low-rank tensor optimization, the update of with occupies the dominant computational complexity. First, the FFT and the inverse FFT require , where and are the dimensions for the -th layer's parameters, is the number of clients, and is the higher dimension. Second, the T-SVD costs . Then, the low-rank tensor optimization takes . Overall, the computational complexity for the proposed FedPAM is .
Furthermore, we report the running time of compared methods on three datasets in Table 4. We can see that the proposed FedPAM does indeed have lower runtime efficiency than most comparison algorithms, but in order to achieve satisfactory results, a longer runtime is necessary. At the same time, it is worth noting that compared to FedU, FedPAM is still acceptable in terms of runtime.
Table 4: Comparison of running time (seconds) on three datasets.
| SSL | Method | Cora | CiteSeer | PubMed |
|---|---|---|---|---|
| Simsiam | FedAvg | 47.72 | 49.23 | 48.32 |
| | FedProx | 58.98 | 59.72 | 60.50 |
| | MOON | 67.28 | 67.39 | 66.84 |
| | FedU | 2040.83 | 2041.00 | 2086.69 |
| | FedPAM | 710.12 | 711.36 | 712.73 |
| SimCLR | FedAvg | 41.77 | 42.00 | 41.43 |
| | FedProx | 50.16 | 50.18 | 50.26 |
| | MOON | 66.03 | 66.41 | 67.23 |
| | FedX | 103.54 | 92.16 | 105.75 |
| | FedU | 2092.78 | 2095.67 | 2102.21 |
| | FedPAM | 582.62 | 602.36 | 605.89 |
| BYOL | FedAvg | 62.14 | 62.02 | 62.37 |
| | FedU | 62.10 | 63.78 | 62.70 |
| | FedEMA | 63.33 | 62.45 | 63.00 |
| | Orchestra | 90.61 | 86.62 | 155.13 |
| | FedU | 2137.52 | 2136.59 | 2207.17 |
| | FedPAM | 828.67 | 826.81 | 830.01 |
R5 to W5: Thanks for this good comment. In fact, we repeated each experiment five times and reported the mean values. To save space, we did not report the standard deviations. Here, we supplement the standard deviations on the Cora dataset in Table 5. It can be seen that the standard deviations of the FGL algorithms are relatively small, and these algorithms are generally stable.
Table 5: Performance comparison with standard deviations
| SSL | Method | Cora ACC | Cora Fscore |
|---|---|---|---|
| Simsiam | FedAvg | 54.38±0.45 | 39.27±0.37 |
| | FedProx | 55.35±0.42 | 40.83±0.47 |
| | MOON | 54.03±0.68 | 39.81±0.81 |
| | Fed | | |
| | FedPAM | 61.40±0.67 | 52.85±0.61 |
| SimCLR | FedAvg | 54.65±0.51 | 39.95±0.45 |
| | FedProx | 55.26±0.67 | 40.68±0.66 |
| | MOON | 54.56±0.44 | 39.81±0.47 |
| | FedX | 56.73±0.26 | 43.11±0.28 |
| | Fed | | |
| | FedPAM | 58.62±0.66 | 45.21±0.57 |
| BYOL | FedAvg | 63.01±0.77 | 50.51±0.69 |
| | FedU | 64.06±0.81 | 51.92±0.77 |
| | FedEMA | 63.53±0.68 | 51.23±0.65 |
| | Orchestra | 54.47±1.11 | 39.63±0.98 |
| | Fed | | |
| | FedPAM | 66.01±0.87 | 55.23±0.85 |
Dear Reviewer,
Thank you for reviewing the paper. Please respond to the authors' rebuttal and indicate whether your concerns have been addressed or not, rather than simply pressing the acknowledgement button. Your explicit feedback is important for improving the quality of the paper.
Best regards,
AC
Thank you for the detailed and thoughtful rebuttal. The additional experiments and clarifications helped address some of my concerns, particularly regarding scalability and the broader applicability of the proposed components.
Many thanks again for your valuable comments and recognition of our rebuttal, which greatly encourages us to move forward.
This paper addresses the unsupervised Federated Graph Learning (FGL) problem and proposes a novel framework named FedPAM, which consists of two key modules: (1) Representation Space Alignment (RSA), which aligns the representation spaces of local clients using learnable anchors and Fused Gromov-Wasserstein Optimal Transport (FGW-OT); and (2) Adaptive Global Parameter Learning (AGPL), which fuses local model parameters in a low-rank tensor space via tensor decomposition. The authors validate the method through extensive experiments on eight graph datasets using various self-supervised learning (SSL) strategies.
Strengths and Weaknesses
Strengths:
- Novel and well-motivated problem setting: The paper focuses on unsupervised FGL, a relatively under-explored yet practically important setting where label scarcity and data heterogeneity present serious challenges.
- Well-organized methodology: The proposed method is intuitive and reasonable, including both RSA and AGPL from two perspectives.
- Clear writing and presentation: Despite the technical depth, the paper is generally well-written with clear motivation, formulation, and visual illustrations (e.g., Figure 1–3).
Weaknesses:
- Limited theoretical support: While the paper proposes technically rich modules, it lacks formal analysis (e.g., convergence, generalization bounds, or even complexity analysis in the main text), which would have strengthened the methodological contribution.
- Computational overhead and scalability unclear: The main paper lacks discussion on training cost, communication overhead, or scalability in practical federated environments.
- Outdated baselines: Only one baseline, FedU, was proposed after 2023. More advanced baselines should be included to demonstrate the advantage of the proposed method. Moreover, some remarkable related works, such as [1-3], are missing and need to be added and discussed.
[1] Federated Contrastive Learning of Graph-Level Representations.
[2] Federated Self-Explaining GNNs with Anti-shortcut Augmentations.
[3] Subgraph Federated Unlearning.
Questions
See weaknesses.
Limitations
The authors should include a limitations section to discuss the limitations and potential negative societal impacts of this work.
Final Justification
In the rebuttal period, the authors provided good clarifications and improved the paper, for example with the added theoretical analysis. After consideration, I give a borderline accept to this work.
Formatting Issues
None
R1 to W1: Thank you for this constructive advice. For a comprehensive understanding of the proposed FedPAM, we provide the analysis for the generalization bounds and complexity analysis. Given a FL system, its generalization bound for the global model is defined as Lemma 1.
Lemma 1. Given a FL system with global distribution and local distribution , the generalization error for any hypothesis with the probability at least () is
For the -divergence , we further have
Assume , , it can obtain
Then, we have following Theorem
Theorem 1. Given a FGL system with global distribution and local distribution , the generalization error for any hypothesis with the probability at least () is
According to the GNN's generalization analysis in [1], we can obtain the upper bound of :
where , , denote relevant constants, and measure the divergences between different topologies and features, respectively. Then, we can see that when the topological and feature differences of different client subgraphs are reduced, the generalization ability of the global model will be enhanced. The proposed FedPAM uses FGW-OT to achieve the optimal transport between the global anchor graph and each subgraph, which objectively reduces the differences between different subgraphs. Therefore, the generalization ability of the proposed FedPAM will be improved.
[1] Towards understanding generalization of graph neural networks.
Analysis for Computational Complexity: The computation complexity mainly originates from two aspects: the FGW-OT for aligning representation spaces and the low-rank tensor optimization for adaptively learning global model parameters. For the FGW-OT, the computation of (an intermediate variable) takes , where denotes the number of edges, denotes the number of anchors, and denotes the number of nodes. Notably, the adjacency matrices are often sparse, so the computational complexity can be significantly decreased. Likewise, the OT based on Sinkhorn algorithm takes . Then, the FGW-OT takes . For the low-rank tensor optimization, the update of with occupies the dominant computational complexity. First, the FFT and the inverse FFT require , where and are the dimensions for the -th layer's parameters, is the number of clients, and is the higher dimension. Second, the T-SVD costs . Then, the low-rank tensor optimization takes . Overall, the computational complexity for the proposed FedPAM is .
R2 to W2: Thanks for this good comment. We have explained the computational complexity in R1. For the proposed FedPAM, its communication overhead is , where denotes the model parameter volume and denotes the dimension of the anchors. Compared to other FGL methods, FedPAM additionally transmits a set of anchors, which are essentially a few feature vectors and do not take up much communication traffic. For scalability in practical federated environments, we test FedPAM with larger numbers of clients, i.e., 50 and 100 clients. The experimental results are reported in Tables 1 and 2. We can see that FedPAM still achieves superior performance.
Table 1: Performance comparison with 50 clients, where the SimCLR is employed in local training.
| Method | PubMed (ACC) | PubMed (Fscore) | OA (ACC) | OA (Fscore) |
|---|---|---|---|---|
| FedAvg | 73.57 | 65.15 | 43.59 | 31.57 |
| FedProx | 70.42 | 59.40 | 40.83 | 28.98 |
| MOON | 70.44 | 59.45 | 42.78 | 30.44 |
| FedX | 70.43 | 59.43 | 47.81 | 34.83 |
| FedU | 72.65 | 63.60 | 44.07 | 30.62 |
| FedPAM | 74.29 | 67.31 | 49.85 | 38.34 |
Table 2: Performance comparison with 100 clients, where the SimCLR is employed in local training.
| Method | PubMed (ACC) | PubMed (Fscore) | OA (ACC) | OA (Fscore) |
|---|---|---|---|---|
| FedAvg | 68.21 | 56.76 | 37.90 | 26.01 |
| FedProx | 68.23 | 56.78 | 36.18 | 24.13 |
| MOON | 68.08 | 56.62 | 38.84 | 26.95 |
| FedX | 69.27 | 58.25 | 47.04 | 33.18 |
| FedU | 71.82 | 62.66 | 38.08 | 26.23 |
| FedPAM | 72.36 | 65.22 | 48.58 | 37.32 |
R3 to W3: Thank you for this valuable suggestion. We add a new unsupervised FL algorithm, namely FLPD [1]. The experimental results are shown in Table 3. It can be seen that the proposed FedPAM achieves superior performance. The differences between FedPAM and the three works [2][3][4] lie in two aspects. First, we propose a new framework for unsupervised federated subgraph learning; to the best of our knowledge, we are the first to study this task. Second, the two proposed modules, RSA and AGPL, are novel and can well address the challenges of unsupervised federated graph learning.
Table 3: Performance comparison with the new unsupervised FL method FLPD.
| SSL | Method | Cora ACC | Cora Fscore | CiteSeer ACC | CiteSeer Fscore |
|---|---|---|---|---|---|
| Simsiam | FLPD | 57.27 | 45.43 | 40.63 | 26.41 |
| | FedPAM | 61.40 | 52.85 | 49.53 | 38.73 |
| SimCLR | FLPD | 56.95 | 43.22 | 38.91 | 22.87 |
| | FedPAM | 58.62 | 45.21 | 43.90 | 31.61 |
| BYOL | FLPD | 65.37 | 54.01 | 44.82 | 32.34 |
| | FedPAM | 66.01 | 55.23 | 48.81 | 38.69 |
[1] Prototype similarity distillation for communication-efficient federated unsupervised representation learning
[2] Federated Contrastive Learning of Graph-Level Representations
[3] Federated Self-Explaining GNNs with Anti-shortcut Augmentations
[4] Subgraph Federated Unlearning
Thank you for your response. After careful consideration, I decide to maintain my score, i.e., borderline accept.
Thank you again for providing many valuable comments and reviewing our responses!
Dear Reviewers,
Please review the authors' rebuttal and indicate whether your concerns have been adequately addressed or not. Thank you for your efforts.
Best regards,
AC
The work proposed a federated graph learning method by incorporating FGW and tensor decomposition. The experiments showed its effectiveness in comparison to a few baselines. I summarize the strengths and weaknesses below.
Strengths
- The idea of using FGW and tensor decomposition for graph federated learning is novel.
- The paper is easy to follow and well-organized.
- The baselines and datasets in the experiments are sufficient. The results demonstrated the superiority of the proposed method over a few baselines.
Weakness
- The theoretical analysis is limited, though the authors tried to analyze the generalization bounds during the rebuttal.
During the rebuttal, the authors addressed most concerns of the reviewers, such as the computational complexity, running time, and more baselines. During the discussion, most reviewers supported accepting the paper.
In addition to the reviews given by the reviewers, I have the following comment. During the rebuttal, the authors tried to use the theoretical results of domain adaptation to derive a generalization bound for the proposed model. However, the theory usually requires an independence assumption, which does not hold for the nodes on a graph, because they are connected. In addition, using to measure the topological divergence makes no sense, since they may have different numbers of nodes and are permutation invariant. So, if the paper is accepted, the authors should be careful about the added theoretical results from the rebuttal.
Given the novel idea, the impressive numerical performance, the recognition of the reviewers, and the fact that the theoretical result is not compulsory for the current work, I recommend accepting the paper.