PaperHub
6.6 / 10
Poster · 4 reviewers
Scores: 4, 4, 4, 2 (min 2, max 4, std dev 0.9)
ICML 2025

EAGLES: Towards Effective, Efficient, and Economical Federated Graph Learning via Unified Sparsification

OpenReview · PDF
Submitted: 2025-01-23 · Updated: 2025-07-24

Abstract

Federated Graph Learning (FGL) has gained significant attention as a privacy-preserving approach to collaborative learning, but its computational demands increase substantially as datasets grow and Graph Neural Network (GNN) layers deepen. To address these challenges, we propose $EAGLES$, a unified sparsification framework. EAGLES applies client-consensus parameter sparsification to generate multiple unbiased subnetworks at varying sparsity levels, reducing the need for iterative adjustments and mitigating performance degradation. In the graph structure domain, we introduce a dual-expert approach: a $graph sparsification expert$ uses multi-criteria node-level sparsification, and a $graph synergy expert$ integrates contextual node information to produce optimal sparse subgraphs. Furthermore, the framework introduces a novel distance metric that leverages node contextual information to measure structural similarity among clients, fostering effective knowledge sharing. We also introduce the $Harmony Sparsification Principle$, under which EAGLES balances model performance with lightweight graph and model structures. Extensive experiments demonstrate its superiority, achieving competitive performance on various datasets, such as reducing training FLOPS by 82% $\downarrow$ and communication costs by 80% $\downarrow$ on the ogbn-proteins dataset, while maintaining high performance.
Keywords
Federated Learning · Graph Learning · Sparsification

Reviews and Discussion

Official Review
Rating: 4

This paper introduces a unified framework that jointly considers graph-level and parameter-level sparsification. It incorporates dual experts and consensus-based sparsification to ensure a stable sparsification process. Extensive experiments demonstrate that the proposed method is effective, efficient, and economical.

Questions for Authors

(1) Could the authors clarify how the Optimal Transport (OT) method adapts to the federated setting when client graphs have significant structural variations? Would this method still function efficiently if the number of clients increased substantially? (2) How does $W_{gate}$ impact parameter sparsification when the number of GSEs increases? Would it lead to an excessive amount of additional parameters?

Claims and Evidence

The claims are supported by extensive experiments across datasets (Cora, Ogbn-Proteins) and metrics (FLOPS, ROC-AUC). Reductions in computational costs (82%↓ FLOPS) and communication (80%↓ bytes) are validated against baselines like FedAvg and ACE-GLT. However, claims about mitigating structural heterogeneity rely on qualitative arguments (e.g., "similar clients share knowledge via OT distance") without quantitative analysis of heterogeneity reduction.

Methods and Evaluation Criteria

The methods are well-suited for FGL challenges. Parameter sparsification avoids iterative pruning via dynamic masking, and graph sparsification addresses structural overfitting through multi-criteria experts. Evaluation on diverse datasets (small to large-scale) and metrics (FLOPS, ROC-AUC) is comprehensive.

Theoretical Claims

The manuscript’s mathematical formulation is generally free from notable errors; however, it lacks an analysis of computational complexity, which would provide a clearer understanding of the scalability and practical applicability of the proposed methods.

Experimental Design and Analysis

Experiments are thorough, covering multiple datasets, sparsity levels, and baselines. Ablation studies (Table 2) validate parameter-graph sparsity interplay. However, the impact of expert count (Figure 7b) is under-discussed.

Supplementary Material

No supplementary material.

Relation to Broader Scientific Literature

EAGLES makes a significant contribution by introducing a unified sparsification framework. The dual-expert approach, which builds upon MoE methods, adapts them for federated graph learning, an area that has seen limited exploration.

Essential References Not Discussed

The authors discuss and compare a wide range of related methods.

Other Strengths and Weaknesses

Strengths: (1) The paper effectively identifies a critical challenge in federated graph learning (FGL): the high computational cost and communication overhead when training GNNs on large-scale federated datasets. By introducing EAGLES, a unified sparsification approach, the authors provide a clear solution that addresses both graph and parameter sparsification, ensuring efficiency without sacrificing model performance. (2) The extensive set of experiments conducted across various benchmark datasets, including ogbn-proteins and Pubmed, demonstrates the practical effectiveness of the proposed method. The substantial reductions in training FLOPS and communication costs, achieved while maintaining or even improving model accuracy, provide strong empirical evidence of the method’s efficiency and scalability.

Weaknesses: (1) While the method demonstrates significant improvements in computational efficiency, a clear computational complexity analysis would help contextualize the performance gains.

Other Comments or Suggestions

The computational complexity of EAGLES could be better articulated, particularly regarding how its sparsification techniques scale with increasing data size or client count. This would provide a clearer view of the system’s scalability in large federated environments.

Author Response

Dear Reviewer h7Za,

We sincerely thank you for your insightful feedback and have provided detailed responses to your questions.

W1: No quantitative analysis of heterogeneity reduction.

We provide a quantitative analysis of heterogeneity reduction at this link.

W2: Lack of an analysis of computational complexity & S1: How the sparsification techniques scale with increasing data size or client count.

We analyzed the computational complexity from three aspects:

  1. Parameter Sparsification Module:

    • Forward/Backward FLOPs are $O(s_p \cdot d)$, where $s_p$ is the parameter sparsity rate and $d$ is the parameter dimension.
    • Communication costs are reduced to $O(s_p \cdot d)$ via bit-wise mask compression (Section 4.2).
    • Mask alignment requires $O(K \cdot L)$ operations per round, but with small constants for $K$ (clients) and $L$ (layers), its impact is negligible.
  2. Graph Sparsification Module:

    • With $T$ GSEs, the local computation is $O(T \cdot |E|)$, where $|E|$ is the number of edges.
    • The message passing process has a complexity of $O(s_g \cdot |E| \cdot d)$, with $s_g$ as the graph sparsification rate.
    • The gating mechanism adds $O(N \cdot D)$ operations, but since $D$ is typically small, its overhead is minimal.
  3. OT-based Similarity Computation:

    • Standard OT complexity is $O(n^3)$ for $n$-node graphs, but we reduce this to $O(n \log n)$ using the sliced Wasserstein distance (a minimal sketch of this computation is given after the summary below).

Overall, the computational complexity of EAGLES is given by:

$O\big(s_p \cdot d + T \cdot |E| + s_g \cdot |E| \cdot d + n \log n\big)$

Ignoring smaller constants, this simplifies to:

$O(d + |E| + n \log n)$

In summary, EAGLES scales linearly with the data size, and the number of clients has minimal impact on the computational complexity.
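For intuition, below is a minimal sketch of the sliced Wasserstein computation referenced above, assuming each client is summarized by a set of node embeddings; the function name and setup are illustrative and not the paper's implementation.

```python
import numpy as np

def sliced_wasserstein(x, y, num_projections=50, seed=0):
    """Approximate the Wasserstein distance between two point clouds
    (e.g., node embeddings of two clients) by averaging 1-D Wasserstein
    distances over random projection directions."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    total = 0.0
    for _ in range(num_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)          # random unit direction
        px = np.sort(x @ theta)                 # project and sort: O(n log n)
        py = np.sort(y @ theta)
        qs = np.linspace(0, 1, max(len(px), len(py)))
        # match the two sorted 1-D distributions via quantiles
        total += np.mean(np.abs(np.quantile(px, qs) - np.quantile(py, qs)))
    return total / num_projections

# toy example: structural distance between two clients' node embeddings
client_a = np.random.randn(500, 16)
client_b = np.random.randn(400, 16) + 0.5
print(sliced_wasserstein(client_a, client_b))
```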

W3: The impact of expert count (Figure 7b) is under-discussed.

As shown in Figure 7b (Appendix D.3), performance improves consistently as the number of experts increases, reaching its peak around 4 experts due to the benefit of richer structural perspectives. Beyond this point, the marginal gains diminish. While Section 5.4 briefly touches on this point, we agree that a more in-depth analysis would further strengthen the discussion.

Q 1.1: Could the authors clarify how the OT method adapts to the federated setting when client graphs have significant structural variations?

In FL settings with significant structural variations, our OT adaptation relies on two key mechanisms. First, the graph synergy expert encodes node contextual features in the $W_{\text{gate}}$ matrix, forming a structure-aware semantic space via hard concrete distribution sampling. This enables OT to assess similarity based on learned semantics instead of raw topology. Second, by treating each client's structural distribution as a probability measure over this space, we derive client-specific transport plans and similarity weights (Eqs. 20 and 22), which automatically assign lower weights to structurally dissimilar clients. Importantly, only the compact $W_{\text{gate}}$ parameters are transmitted, preserving privacy while allowing the server to compute OT plans with $O(n \log n)$ complexity through entropic regularization.
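Purely as an illustration of the weighting idea (not a reproduction of Eqs. 20 and 22, which are not shown in this thread), aggregation weights could be derived from pairwise structural distances so that more similar clients receive larger weights:

```python
import numpy as np

def similarity_weights(dist_row, temperature=1.0):
    """Turn one client's distances to the other clients into normalized
    aggregation weights: smaller distance -> larger weight.
    Hypothetical stand-in, not the paper's Eqs. (20) and (22)."""
    scores = np.exp(-np.asarray(dist_row) / temperature)
    return scores / scores.sum()

# distances from one client to three peers (e.g., sliced Wasserstein values)
print(similarity_weights([0.2, 1.5, 0.4]))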

Q 1.2: Would this method still function efficiently if the number of clients increased substantially?

EAGLES remains efficient even as the number of clients grows significantly. Our framework reduces communication and computation through parameter sparsification (dynamic mask consensus) and OT-based similarity aggregation, minimizing redundant interactions. Experiments (Appendix Fig. 6a & 6b) show that scaling to 100 clients on Ogbn-Proteins and Cora results in only a minor performance drop, while achieving an 18% reduction in Training FLOPS and a 20% reduction in Communication Bytes compared to baselines.

Q 2.1: How does $W_{gate}$ impact parameter sparsification when the number of GSEs increases?

$W_{\text{gate}}$ is a learnable gating parameter. Additional GSEs introduce richer structural diversity, and by integrating sparsified subgraphs obtained from multiple criteria through $W_{\text{gate}}$, the robustness of graph sparsification is enhanced. This reduction in structural redundancy allows for the allocation of different gradient update weights to model parameters during backpropagation, thereby influencing parameter sparsification.

Q 2.2: Would $W_{gate}$ lead to an excessive amount of additional parameters?

The gating parameter matrix $W_{\text{gate}}$ (Eq. (12)) is designed as a lightweight mapping layer with low dimensionality. Specifically, its parameter size is $D \times T$, where $D$ is the input feature dimension and $T$ denotes the number of experts. Since the number of experts is typically a small constant, $W_{\text{gate}}$ scales linearly with $D$, thereby not introducing an excessive number of parameters.
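For concreteness, a minimal PyTorch-style sketch (variable names are assumptions, not the paper's code) showing that such a gate contributes only $D \times T$ parameters:

```python
import torch.nn as nn

D, T = 128, 4                         # input feature dim, number of experts
w_gate = nn.Linear(D, T, bias=False)  # gating projection with D x T weights

num_params = sum(p.numel() for p in w_gate.parameters())
print(num_params)                     # 512 = D * T, linear in D for small T
```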

Reviewer Comment

I have carefully reviewed the rebuttal and also checked the feedback from other reviewers. The authors' further explanation of computational complexity is convincing, and my questions have been well addressed. The work may have a potential impact, and I will accordingly increase my score.

Author Comment

Dear Reviewer h7Za,

Thank you for your thoughtful feedback and for reconsidering our work. Your comments helped us refine the presentation and strengthen the manuscript. We truly appreciate the opportunity to clarify our approach and the time you spent reviewing our submission.

Best regards,

Authors

Official Review
Rating: 4

The paper introduces EAGLES, a unified sparsification framework designed to enhance FGL by addressing computational and communication challenges. EAGLES optimizes both graph structures and model parameters through client-consensus parameter sparsification, which generates multiple unbiased subnetworks at various sparsity levels. The method also employs a dual-expert approach with graph sparsification and synergy experts, which improve the efficiency of message passing and reduce data overfitting. The comprehensive experimental results validate the effectiveness of the proposed method.

Questions for Authors

1. How does the proposed method perform in scenarios where clients have vastly different computational capabilities (e.g., edge devices versus more powerful systems)?

2. In the code, the authors only perform data partitioning using the Louvain method. Can the proposed approach still be effective under other non-IID partitioning methods, such as Metis?

Claims and Evidence

The paper provides a relatively clear explanation of its claims. FGL faces significant computational challenges when handling large-scale graph data. Figure 1 effectively illustrates this phenomenon. However, additional empirical studies could further corroborate this analysis and strengthen the claims presented in the paper.

Methods and Evaluation Criteria

The proposed methodology and evaluation criteria align well with the problem of optimizing federated graph learning. The dual-expert sparsification approach appears to be a reasonable solution, and the chosen evaluation metrics (FLOPS and communication costs) are directly applicable to the problem at hand.

Theoretical Claims

The theoretical section of the manuscript is relatively detailed. In particular, the Harmony Sparsification Principle and its impact on federated graph learning are interesting and well-reasoned, providing concrete theoretical guidance and practical reference for the design of sparsification frameworks.

Experimental Design and Analysis

Extensive experiments across six datasets and multiple backbones (GCN, GraphSAGE, DeeperGCN) strengthen validity. Ablation studies on sparsity rates and client numbers (Figures 4–7) convincingly demonstrate resilience.

Supplementary Material

No supplementary material.

Relation to Broader Scientific Literature

EAGLES builds on federated learning (FedAvg, FedProx) and graph sparsification (DSpar [1]). The integration of MoE for graph pruning is novel, advancing prior work on MoE [2]. 

[1] Liu Z, Zhou K, Jiang Z, et al. DSpar: An Embarrassingly Simple Strategy for Efficient GNN Training and Inference via Degree-based Sparsification. arXiv preprint arXiv:2307.02947, 2023.

[2] Shazeer N, Mirhoseini A, Maziarz K, et al. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017.

Essential References Not Discussed

The key contribution of the paper is the unified sparsification approach for FGL, but it only references a graph sparsification technique, DSpar, that sparsifies graph structures based on node degree. However, there is also a relevant method, DropEdge, introduced by [3], which applies random edge dropout to improve deep graph convolutional networks for node classification. This technique is particularly important for reducing computational costs while preserving graph structure and can be considered an essential reference for addressing graph sparsification challenges in the context of FGL, especially in comparison to the single-criterion sparsification discussed in the paper.

[3] Rong Y, Huang W, Xu T, et al. DropEdge: Towards deep graph convolutional networks on node classification. arXiv preprint arXiv:1907.10903, 2020.

Other Strengths and Weaknesses

Strengths:

  • This paper introduces the first unified framework for both graph and parameter sparsification in FGL.
  • The motivation behind this paper is explained with great clarity.
  • This paper presents a novel use of Optimal Transport (OT) to measure client similarity, which is an interesting approach.

Weaknesses:

  • There is a typo on page seven in Section 5 where "comprehensively" is misspelled as "omprehensively."
  • Experiments focus primarily on academic citation and biological networks. There is no validation on social network graphs. Including relevant experiments would strengthen the generalizability and applicability of the proposed method.

Other Comments or Suggestions

The manuscript specifies the split ratios for each dataset but does not describe the splitting strategy. The authors should include details on the splitting approach in the manuscript.

Author Response

Dear Reviewer Fguv,

We sincerely thank you for taking the time to evaluate our work and have addressed your concerns as follows:

W1: Additional empirical studies addressing the significant computational challenges faced by FGL will further strengthen this analysis.

In the theoretical model, the message passing mechanism in GNNs causes the neighborhood size to expand exponentially with the number of layers. For a graph with an average degree of $d$, a 1-hop neighborhood covers $d$ neighbors, a 2-hop neighborhood covers $d^2$ neighbors, and an $L$-hop neighborhood covers $d^L$ neighbors [1].

We measured the k-hop receptive fields (k=1,2,3,4) for the amz-photo and Ogbn-arxiv datasets. The results are as follows:

Dataset | 1-hop | 2-hop | 3-hop | 4-hop
amz-photo | 32.13 | 802.86 | 2519.35 | 4681.62

The results show that the receptive fields exhibit a clearly super-linear growth trend, confirming that GNNs indeed face significant computational challenges.

[1]: Xu, K.; Hu, W.; Leskovec, J.; and Jegelka, S. (2019). How Powerful are Graph Neural Networks? arXiv preprint arXiv:1810.00826.
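For reference, a minimal sketch of how average k-hop receptive-field sizes can be estimated, using a NetworkX toy graph as a stand-in for the real datasets (this is not the authors' measurement script):

```python
import networkx as nx

def avg_k_hop_size(G, k, sample=None):
    """Average number of nodes reachable within k hops (excluding the node itself)."""
    nodes = list(G.nodes())[:sample] if sample else list(G.nodes())
    sizes = [len(nx.single_source_shortest_path_length(G, v, cutoff=k)) - 1
             for v in nodes]
    return sum(sizes) / len(sizes)

G = nx.barabasi_albert_graph(2000, 8)   # toy graph, not amz-photo itself
for k in (1, 2, 3, 4):
    print(k, round(avg_k_hop_size(G, k, sample=200), 2))
```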

W2: Supplementary experiments addressing the omitted DropEdge.

We conducted experiments with DropEdge on the Pubmed dataset, with some of the results presented below (we adopted the original 0.8 edge-retention rate for GCN).

Pubmed

Methods | Top-1 Accuracy | Max Training FLOPS | Communication BYTES
FedAvg | 85.65 | 1x (2.49E9) | 1x (6.19E9)
DropEdge | 85.78 | 0.90x (↓0.10x) | 1.00x (↓0.00x)
EAGLES | 86.97 | 0.48x (↓0.52x) | 0.37x (↓0.63x)
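For context, DropEdge amounts to randomly retaining a fraction of edges at each training epoch; a minimal sketch with the 0.8 retention rate used above (illustrative only, not the original implementation):

```python
import numpy as np

def drop_edge(edge_index, retain=0.8, seed=0):
    """DropEdge-style sparsification: randomly keep `retain` fraction of edges."""
    rng = np.random.default_rng(seed)
    keep = rng.random(edge_index.shape[1]) < retain
    return edge_index[:, keep]

edges = np.array([[0, 1, 2, 3, 4],
                  [1, 2, 3, 4, 0]])   # toy 2 x E edge list
print(drop_edge(edges))
```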

W3: A typo on page seven in Section 5 where "comprehensively" is misspelled as "omprehensively."

Thank you for your careful reading. We have corrected the typo and carefully proofread the manuscript to fix similar issues.

W4: Validate the proposed method on social network graph datasets.

We conducted experiments on the Flickr dataset to validate the effectiveness of the proposed method on social network graph datasets:

Flickr

Methods | Top-1 Accuracy | Max Training FLOPS | Communication BYTES
FedAvg | 50.15 | 1x (8.49E9) | 1x (4.67E9)
FGGP | 49.78 (↓0.37) | 1.23x (↑0.23x) | 1.33x (↑0.33x)
PruneFL | 47.45 (↓2.70) | 0.77x (↓0.23x) | 1.00x (↓0.00x)
FedDIP | 50.33 (↑0.17) | 0.67x (↓0.33x) | 0.83x (↓0.17x)
EAGLES | 50.89 (↑0.74) | 0.48x (↓0.52x) | 0.37x (↓0.63x)

S1: The manuscript specifies the split ratios for each dataset but does not describe the splitting strategy.

Thank you for pointing that out. We will include additional details on the splitting strategy in the revised manuscript.

Q1: How does the proposed method perform in scenarios where clients have vastly different computational capabilities (e.g., edge devices versus more powerful systems)?

Clients with limited resources can opt for higher sparsity to reduce memory and computation, while more capable machines may choose lower sparsity for better accuracy. Additionally, consensus-based parameter masks and a multi-expert graph sparsification framework ensure that all clients benefit from an efficient, robust model.

Q2: Can the proposed approach still be effective under other non-iid partitioning methods, such as Metis?

We conducted experiments on Cora and ogbn-arxiv, and the results are shown below:

Cora

Methods | Top-1 Accuracy | Max Training FLOPS | Communication BYTES
FedAvg | 70.62 | 1x (6.72E8) | 1x (6.02E9)
FGGP | 69.58 (↓1.04) | 1.42x (↑0.42x) | 1.18x (↑0.18x)
PruneFL | 67.94 (↓2.68) | 0.57x (↓0.43x) | 1.00x (↓0.00x)
FedDIP | 70.38 (↓0.24) | 0.61x (↓0.39x) | 0.59x (↓0.41x)
EAGLES | 71.27 (↑0.65) | 0.48x (↓0.52x) | 0.37x (↓0.63x)

Ogbn-arxiv

Methods | Top-1 Accuracy | Max Training FLOPS | Communication BYTES
FedAvg | 55.30 | 1x (1.58E10) | 1x (6.14E9)
FGGP | 55.09 (↓0.21) | 5.66x (↑4.66x) | 1.45x (↑0.45x)
PruneFL | 52.34 (↓2.96) | 0.69x (↓0.31x) | 1.00x (↓0.00x)
FedDIP | 55.32 (↑0.02) | 0.48x (↓0.52x) | 0.59x (↓0.41x)
EAGLES | 56.89 (↑1.59) | 0.35x (↓0.65x) | 0.47x (↓0.53x)

The results show that under Metis partitioning, EAGLES still exhibits superior performance.

Official Review
Rating: 4

EAGLES introduces a framework designed to reduce computational and communication costs in federated graph learning by jointly sparsifying both model parameters and graph structures. It employs client-consensus pruning to generate subnetworks at different sparsity levels and utilizes a mixture of experts for graph sparsification. This approach achieves substantial reductions in FLOPs and communication overhead across various datasets, all while preserving accuracy. Results show improved performance over baselines in node classification tasks.

Questions for Authors

Can the authors provide additional ablation experiments regarding $\lambda_2$ and $\lambda_3$ in Eq. (19)?

Claims and Evidence

The claims are largely supported by experiments across six datasets (Cora, Pubmed, OGB benchmarks) and comparisons with 14 baselines. Evidence includes:

  1. Table 1 shows EAGLES outperforms FedAvg/FedProx in accuracy (e.g., +1.32% on Pubmed) while reducing FLOPs (52%) and communication (63%).

  2. Ablation studies (Table 2, Figure 5) validate the impact of sparsity levels.

  3. Theoretical grounding via the Harmony Sparsification Principle (Eq. 3) aligns with empirical results.

Methods and Evaluation Criteria

  1. Parameter Sparsification: Dynamic threshold optimization with STE and consensus masking (Eq. 4-7) is novel and suitable for federated settings.

  2. Graph Sparsification: Dual experts (GSE/GSyE) with hard concrete distribution (Eq. 13-17) effectively address structural heterogeneity.

  3. Evaluation: Metrics (Top-1 Accuracy, ROC-AUC, FLOPs, communication bytes) are standard and comprehensive.

Theoretical Claims

The theoretical claims are basically correct. However, what $W_{gate}$ refers to in Eq. (12) lacks the necessary explanation in the text.

Experimental Design and Analysis

The framework is validated across 6 datasets with diverse backbones (GCN, GraphSAGE) and compared against 14 baselines, including the state-of-the-art federated graph learning method (FedTAD) and pruning approaches like ACE-GLT.

Supplementary Material

There is no supplementary material.

Relation to Broader Scientific Literature

The contributions of the paper relate to the broader scientific literature in the following areas.

  1. FGL: Improves FedAvg/FedProx by addressing graph/parameter redundancy.

  2. MoE: Adapts mixture-of-experts to graph sparsification (novel).

  3. Pruning: Unifies model/graph pruning, unlike DSpar/ACE-GLT.

Essential References Not Discussed

This manuscript compares and discusses quite a few baseline methods.

Other Strengths and Weaknesses

Strengths:

  1. The paper creatively bridges federated learning and graph sparsification, addressing both computational and structural challenges in FGL. This dual focus (parameter + graph sparsification) is novel and addresses a critical gap in federated graph learning literature.

  2. The framework’s ability to handle large-scale graphs (e.g., ogbn-proteins with 132,534 nodes) demonstrates real-world applicability.

Weaknesses:

  1. Experiments focus on homophilic graphs (e.g., Cora, OGB). Performance on heterophilic graphs (e.g., arXiv) remains unvalidated.

  2. Whether the proposed method can speed up training was not explored.

Other Comments or Suggestions

It is suggested to supplement experiments on training time to further verify the efficiency of the method.

Author Response

Dear Reviewer jagF,

We sincerely appreciate your detailed review and invaluable feedback. In the response below, we provide a thorough reply to address your concerns and offer a clearer explanation of our method.

W1: What $W_{gate}$ refers to in Eq. (12) lacks the necessary explanation in the text.

$W_{\text{gate}}$ is a learnable weight matrix in the Graph Synergy Expert (GSyE) module that performs gating. It projects the node feature matrix $X$ into a latent space for Hard Concrete sampling, determining which graph sparsification experts to activate. In essence, $W_{\text{gate}}$ generates gating vectors $z$ that, after thresholding, indicate the significance of each edge in the final sparsified subgraph. We will introduce $W_{\text{gate}}$ in the revised manuscript.

W2: Performance on heterophilic graphs (e.g., arXiv) remains unvalidated.

We conducted experiments on heterophilic graphs using ogbn-arxiv-TA [1] from the HeTGB (Heterophilic Text-attributed Graph Benchmark):

Methods | Top-1 Accuracy | Max Training FLOPS | Communication
FedAvg | 64.24 | 1x (1.78E10) | 1x (6.16E9)
PruneFL | 61.37 | 0.58x | 1.00x
FedTAD | 64.92 | 32.42x | 1.11x
EAGLES | 65.89 | 0.33x (↓0.68x) | 0.48x (↓0.52x)

The results show that on heterophilic graphs using ogbn-arxiv-TA, EAGLES also demonstrates superiority.

[1]: Li, S.; Wu, Y.; Shi, C.; and Fang, Y. (2025). HeTGB: A Comprehensive Benchmark for Heterophilic Text-Attributed Graphs. arXiv preprint arXiv:2503.04822.

W3 & S1: Whether the proposed method can speed up training was not explored; supplementary experiments on training time would further verify the efficiency of the method.

We measured the time required for clients to reach the target accuracy on the ogbn-arxiv dataset across different methods, and the accuracy achieved by different methods at the same number of epochs on the ogbn-proteins dataset. The results are presented below:

TIME TO REACH TARGET ACCURACY

Methods | Time to reach 70% accuracy | Time to reach 80% accuracy | Time to reach 90% accuracy
FedAvg | 54.23 s | 223.42 s | 376.23 s
FedTiny | 35.28 s | 149.08 s | 281.23 s
FedDIP | 36.44 s | 155.68 s | 278.62 s
PruneFL | 40.52 s | 177.21 s | 319.53 s
EAGLES | 14.04 s | 85.94 s | 162.79 s

ROC-AUC UNDER THE SAME EPOCH

Methods | EPOCH: 50 | EPOCH: 150 | EPOCH: 300
FedAvg | 70.34 | 80.38 | 81.49
FedTiny | 71.36 | 77.68 | 79.90
FedDIP | 70.98 | 78.59 | 81.33
PruneFL | 69.22 | 76.23 | 78.69
EAGLES | 73.97 | 81.85 | 82.32

The results show that EAGLES can train the model to a target accuracy in a shorter amount of time, and also achieve higher performance at the same number of epochs. This further verifies that our proposed method can accelerate model training.

Q1: Can the authors provide additional ablation experiments regarding $\lambda_2$ and $\lambda_3$ in Eq. (19)?

We conducted ablation experiments on $\lambda_2$ and $\lambda_3$, and the results are presented below:

Ablation on $\lambda_2$ (with $\lambda_3$ fixed at 1e-6)

Dataset | $\lambda_2$ = 0.1 | $\lambda_2$ = 0.05 | $\lambda_2$ = 0.2
Pubmed | 86.97 | 86.23 (↓0.74) | 86.96 (↓0.01)
photo | 92.31 | 92.75 (↑0.44) | 91.85 (↓0.46)
Ogbn-arxiv | 65.37 | 64.92 (↓0.45) | 65.33 (↓0.04)

Ablation on $\lambda_3$ (with $\lambda_2$ fixed at 0.1)

Dataset | $\lambda_3$ = 1e-6 | $\lambda_3$ = 1e-5 | $\lambda_3$ = 1e-7
Pubmed | 86.97 | 84.22 (↓2.75) | 86.89 (↓0.08)
photo | 92.31 | 90.56 (↓1.75) | 92.15 (↓0.16)
Ogbn-arxiv | 65.37 | 64.38 (↓0.99) | 65.42 (↑0.05)

Increasing $\lambda_2$ enforces stricter GSyE constraints, balancing expert contributions and enhancing subgraph homogeneity. In heterogeneous scenarios, raising $\lambda_2$ from 0.05 to 0.2 leads to a clear accuracy boost. Conversely, while a larger $\lambda_3$ speeds up sparse parameter identification, it may cause significant accuracy loss; a smaller $\lambda_3$ permits finer-grained sparsification.

Reviewer Comment

I have read the rebuttal and appreciate the authors' response. The additional experiments further validate the effectiveness of the proposed method. A minor suggestion is to include the explanation for weakness 2 in the paper if it has not already been added. I maintain my score and support the acceptance of the paper.

Author Comment

Dear Reviewer jagF,

We sincerely appreciate your invaluable support for our research. Your insightful suggestions regarding the scalability and flexibility of EAGLES have significantly contributed to improving the depth and precision of our manuscript. It has been an honor to incorporate your comments and strengthen our work accordingly. Thank you once again for your time, expertise, and constructive review.

Best regards,

Authors

Official Review
Rating: 2

This work introduces EAGLES, a framework for Federated Graph Learning (FGL) that reduces computational demands while maintaining high performance. By unifying graph and model sparsification, it simplifies graph structures and prunes model parameters efficiently. EAGLES uses multi-criteria experts to sparsify graphs and integrates results using a synergy expert, ensuring better knowledge sharing across clients with diverse data.

Questions for Authors

The paper introduces the Graph Synergy Expert (GSyE) to integrate sparsified subgraphs from multiple experts, but a more detailed, step-by-step explanation of its process would enhance clarity. Specifically, describing how the GSyE optimizes the hard concrete distribution and how the gating mechanism selects and integrates key structural information for each node would provide valuable insights. Including examples, pseudocode, or visualizations could further illustrate the functionality and importance of this component.

The consensus-based parameter sparsification strategy is briefly mentioned, but its implementation could benefit from greater depth. A detailed explanation of how dynamic masking thresholds are computed, how client-specific masks are aligned, and how the rollback pruning strategy ensures pruning stability would make the approach more comprehensible.

Additionally, elaborating on how the framework balances trade-offs between structural similarity, computational efficiency, and model performance in the optimization process (Equation 3) would offer a stronger understanding of its effectiveness.

Claims and Evidence

Yes

Methods and Evaluation Criteria

Yes

Theoretical Claims

Yes

Experimental Design and Analysis

Yes

Supplementary Material

Yes

Relation to Broader Scientific Literature

Yes

Essential References Not Discussed

Yes

Other Strengths and Weaknesses

Strengths:

  • EAGLES introduces a unified sparsification framework that simultaneously sparsifies graph structures and model parameters. It uses multi-criteria graph sparsification experts and a synergy expert to reduce graph size while preserving critical structural information.

  • By addressing key challenges like data heterogeneity, computational inefficiency, and communication overhead, EAGLES provides a scalable and economical solution for Federated Graph Learning. Consensus-based parameter sparsification further ensures efficient pruning without iterative adjustments, addressing computational and communication overhead.

  • The evaluation shows resilience under varying sparsification rates and client distributions, making it effective for large-scale federated graph learning.

Weaknesses:

  • While the paper introduces the Graph Synergy Expert to integrate sparsified subgraphs from multiple experts, it would benefit from a more detailed step-by-step explanation of how the GSyE processes and combines the outputs of various sparsification experts.

For instance, describing how the hard concrete distribution is optimized and how the gating mechanism selects and integrates key structural information for each node would enhance clarity.

  • The consensus-based parameter sparsification strategy is described briefly, but its implementation could use more depth. A detailed breakdown of how the dynamic masking thresholds are computed, how client-specific masks are aligned, and how the rollback pruning strategy ensures pruning stability would be valuable.

  • Elaborating on how the trade-offs between structural similarity, computational efficiency, and model performance are balanced in the optimization process (Equation 3) would strengthen the framework.

Other Comments or Suggestions

N/A

Author Response

We sincerely appreciate your taking the time to review our manuscript and hope our response will address your concerns and contribute to an improved score.

W1: How the GSyE processes and combines the outputs of various sparsification experts.

In our method, the GSyE (Graph Synergy Expert) is used to fuse and refine the different subgraphs generated by multiple GSEs (Graph Sparsification Experts), aiming to alleviate structural information overfitting during the graph sparsification process. The specific steps are as follows (a minimal illustrative sketch is given after the steps):

  1. Multiple GSEs generate various versions of subgraphs based on different criteria (Eq. (11)).
  2. For the target node $v$, its node features $\mathbf{X}$ are projected using a learnable matrix $\mathbf{W}_{gate}$ to obtain gating scores $\boldsymbol{z}$, which are then used to generate continuous approximations $\psi(\boldsymbol{z})$ of binary gates, thereby enabling gradient-based optimization (Eq. (13)).
  3. The HardStep function (Eq. (14)) is applied to hard-threshold $\psi(\boldsymbol{z})$ to either 0 or 1, determining whether to activate the corresponding expert's recommendation for the edge $e_{ij}$.
  4. For each edge $e_{ij}$, if at least one expert recommends retaining it, the edge is marked as a candidate edge; otherwise, it is pruned directly.
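A minimal illustrative sketch of these steps is given below. It uses the common hard concrete parameterization of Louizos et al. and straight-through binarization; the class and variable names are assumptions and may differ from the paper's exact Eqs. (12)-(14).

```python
import torch
import torch.nn as nn

class GatedSynergy(nn.Module):
    """Toy gate: project node features, sample hard-concrete gates per expert,
    binarize with a straight-through HardStep, and union the active experts' edges."""
    def __init__(self, feat_dim, num_experts, beta=0.5, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.w_gate = nn.Linear(feat_dim, num_experts, bias=False)
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def hard_concrete(self, logits):
        # stretched, clamped concrete sample (common L0-gating parameterization)
        u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
        s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + logits) / self.beta)
        return torch.clamp(s * (self.zeta - self.gamma) + self.gamma, 0.0, 1.0)

    def forward(self, x, expert_edge_masks):
        # x: [N, feat_dim]; expert_edge_masks: [T, E] boolean masks from each GSE
        psi = self.hard_concrete(self.w_gate(x))           # continuous gates, [N, T]
        hard = (psi > 0.5).float()                          # HardStep to {0, 1}
        gates = hard + psi - psi.detach()                   # straight-through estimator
        active = gates.mean(dim=0) > 0                      # experts activated by any node
        # keep an edge if at least one active expert recommends retaining it
        return expert_edge_masks[active].any(dim=0)

gate = GatedSynergy(feat_dim=16, num_experts=3)
x = torch.randn(10, 16)
masks = torch.rand(3, 40) > 0.5     # toy per-expert edge masks over 40 edges
print(gate(x, masks).sum().item())  # number of retained candidate edges
```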

Q1: How is the hard concrete distribution optimized?

The hard concrete distribution is applied to the raw gating scores $\boldsymbol{z}$ (Eq. (12)) to produce continuous probabilities $\psi(\boldsymbol{z})$. The HardStep function further binarizes $\psi(\boldsymbol{z})$ into discrete gates (0 or 1). During backpropagation, we employ the straight-through estimator [1] to approximate gradients and address the optimization problem.

[1]: Bengio, Y.; Léonard, N.; and Courville, A. (2013). Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432.

Q2: How are dynamic masking thresholds computed? + Q3: How are client-specific masks aligned?

In consensus-informed parameter sparsification, dynamic masking thresholds are computed through a layer-wise adaptive process. For the $l$-th layer's parameter matrix $W^{(l)}$, a threshold vector $\kappa_{0}^{(l)}$ is dynamically optimized using straight-through estimators (STE) to bypass non-differentiability during backpropagation. Specifically, parameters in $W^{(l)}$ are pruned if their absolute values fall below $\kappa_{0}^{(l)}$, where thresholds are updated via a loss function (Eq. (6)) to maximize sparsity while maintaining performance. Client-specific masks are aligned through a consensus mechanism in which overlapping “1”s in the binary masks across clients form a unified sparse subnetwork, enabling parameter sharing and communication efficiency.
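A minimal sketch of magnitude thresholding plus mask consensus across clients (illustrative; the thresholds here are fixed scalars rather than the learned $\kappa_{0}^{(l)}$):

```python
import torch

def layer_mask(weight, threshold):
    """Binary mask: keep parameters whose magnitude exceeds the threshold."""
    return (weight.abs() > threshold).float()

# per-client masks for the same layer (toy weights, fixed thresholds)
client_weights = [torch.randn(64, 64) for _ in range(3)]
client_masks = [layer_mask(w, 0.8) for w in client_weights]

# consensus: a parameter survives only where all clients' masks overlap ("1"s agree)
consensus = torch.stack(client_masks).prod(dim=0)
print(f"consensus sparsity: {1 - consensus.mean().item():.2f}")
```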

Q4: How does the rollback pruning strategy ensure pruning stability?

The rollback pruning strategy ensures pruning stability by designating, at each predefined pruning checkpoint (for example, every 10% increment in pruning rate), the highest-performing subnetwork within an acceptable accuracy range (±3%). Before moving on to a deeper level of pruning, the method reverts (rolls back) to this optimal subnetwork. This rollback step prevents the accumulation of errors from continuous pruning and mitigates sudden drops in performance, thereby maintaining overall model stability and ensuring effective deep pruning.
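An illustrative sketch of the rollback loop, with placeholder `train_and_prune` and `evaluate` helpers that stand in for the actual training procedure (they are assumptions, not the paper's code):

```python
import copy
import random

def prune_with_rollback(model, train_and_prune, evaluate,
                        steps=(0.1, 0.2, 0.3, 0.4, 0.5), tolerance=0.03):
    """Keep the best subnetwork seen so far; accept a more heavily pruned
    candidate only if its accuracy stays within `tolerance`, else roll back."""
    best_model, best_acc = copy.deepcopy(model), evaluate(model)
    for rate in steps:
        candidate = train_and_prune(copy.deepcopy(best_model), rate)
        acc = evaluate(candidate)
        if acc >= best_acc - tolerance:             # within the acceptable range
            best_model, best_acc = candidate, max(best_acc, acc)
        # otherwise the candidate is discarded: pruning resumes from best_model
    return best_model

# toy demo with dummy helpers
model = {"sparsity": 0.0}
train_and_prune = lambda m, r: {**m, "sparsity": r}
evaluate = lambda m: 0.90 - 0.05 * m["sparsity"] + random.uniform(-0.005, 0.005)
print(prune_with_rollback(model, train_and_prune, evaluate))
```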

Q5: How does the framework balance trade-offs between structural similarity, computational efficiency, and model performance in the optimization process (Equation 3)?

Our weighted loss function combines a structural similarity term (enforcing alignment of sparse subnetworks via mask consensus), a computational cost penalty (promoting parameter sparsity to reduce FLOPs), and a task-specific performance loss (e.g., cross-entropy for accuracy). Hyperparameters $\lambda_1$ and $\lambda_2$ dynamically adjust the trade-offs: a higher $\lambda_1$ prioritizes mask consistency across clients, while $\lambda_2$ controls sparsity intensity.
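Schematically (a hedged paraphrase; Eq. (3) itself is not reproduced in this thread), the weighted objective has the form

$\mathcal{L} = \mathcal{L}_{\text{task}} + \lambda_1\,\mathcal{L}_{\text{consensus}} + \lambda_2\,\mathcal{L}_{\text{sparsity}}$,

where a higher $\lambda_1$ pulls client masks toward agreement and a higher $\lambda_2$ pushes toward sparser, cheaper subnetworks.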

Final Decision

Three reviewers gave it a score of 4. The paper makes a clear and compelling case for the need to jointly consider parameter-level and data-level sparsification in FGL systems, a perspective that is both original and timely given the increasing deployment of GNNs in resource-constrained federated environments. The paper is backed by strong motivation and a novel problem setting, and the authors provided additional analysis on computational complexity and clarified certain architectural details. Therefore, this paper makes a valid contribution to ICML, and the decision is to recommend acceptance.