PaperHub
Rating: 5.0 / 10 (withdrawn; 4 reviewers)
Min 5 · Max 5 · Std 0.0
Individual ratings: 5, 5, 5, 5
Confidence: 3.3
Correctness: 2.8 · Contribution: 2.3 · Presentation: 2.5
ICLR 2025

GC4NC: A Benchmark Framework for Graph Condensation on Node Classification with New Insights

OpenReview · PDF
Submitted: 2024-09-28 · Updated: 2024-12-25
TL;DR

We establish a benchmark, exploring diverse aspects including scalability, NAS, data initialization, privacy preservation, graph property preservation, and robustness of recent graph condensation methods.

Abstract

Keywords
Graph condensation, Dataset distillation, Dataset condensation, Graph neural network

Reviews & Discussion

Review
Rating: 5

This paper presents GC4NC, a benchmark for evaluating graph condensation methods on node classification. GC4NC assesses performance, efficiency, privacy, denoising, NAS effectiveness, and transferability, revealing key insights into design choices like trajectory matching and structural synthesis. It provides a robust benchmark to support fair comparisons and future advancements in GC methods.

Strengths

  1. The paper addresses a significant gap by providing a unified and multi-dimensional evaluation protocol. It allows systematic comparisons and insights into GC methods, including under-explored aspects such as privacy preservation and denoising ability.
  2. The findings offer novel insights, such as the trade-offs between GC performance and efficiency, the privacy benefits of GC methods, and how certain GC methods enable better transferability across GNN architectures.
  3. The paper promises an open-source codebase for GC4NC, promoting reproducibility and further experimentation.

Weaknesses

  1. From Table 6, in comparison with the existing works GCondenser [1] and GC-Bench [2], this paper's GC4NC is less comprehensive than GC-Bench. The advantage of GC4NC over GC-Bench lies in its inclusion of more Coreset & Sparsification methods; however, this is not a primary research focus for graph condensation.
  2. Insufficient discussion of graph-level methods. Numerous studies focus on graph-level datasets, and it is essential to incorporate these works into the comparison framework.
  3. Insufficient inclusion of representative baselines. The paper lacks comparisons with several key methods, such as Mirage [3]. As GC4NC categorizes methods from a structural perspective, Mirage tackles the problem using computation trees. Omitting Mirage may limit the comprehensiveness of this work.
  4. The paper categorizes the evaluated methods as structure-based and structure-free, which contrasts with the more detailed classifications typically seen in recent surveys. While this choice may emphasize the importance of structural elements, structure is just one component of an effective condensed graph. This simplified categorization overlooks a deeper examination of key technical strategies in graph condensation, such as gradient matching, trajectory matching, and distribution matching, that are essential for a comprehensive understanding and meaningful comparison of these methods.

[1] GCondenser: Benchmarking Graph Condensation, Liu et al.

[2] GC-Bench: An Open and Unified Benchmark for Graph Condensation, Sun et al.

[3] Mirage: Model-Agnostic Graph Distillation for Graph Classification, Gupta et al.

Questions

Please see the weaknesses.

Comment

Thank you for your comments. We believe there may have been some misunderstandings, as several of the issues raised might not be applicable. We would like to clarify these points as outlined below.

Q1: Explain comprehensiveness compared to GC-Bench, and why GC4NC includes more Coreset & Sparsification methods

  1. Coreset selection and sparsification provide essential baseline performance for assessing the effectiveness of GC methods while being more efficient. In some datasets, these methods outperform GC approaches; for example, Averaging achieves higher performance on the Yelp dataset. Similar instances are observed in Table 3 of GC-Bench. Therefore, including coreset and coarsening methods as baselines can provide a comprehensive scope for identifying methods that achieve an optimal balance between efficiency and effectiveness.

  2. As we highlighted in Section 2.2, recent developments in coreset selection [1] and coarsening methods [2] have shown high potential in preserving GNN performance while being more efficient compared to GC methods. These methods are indispensable baselines for comparing GC techniques. Moreover, they can serve as data initialization strategies for GC, as explored in Section 4.7. Studying GC in isolation without considering other graph reduction methods would provide a limited perspective.

  3. Since coreset and coarsening methods often preserve specific graph properties (e.g., centrality and spectral features), if GC methods cannot outperform them, it suggests a need for more explicit designs to retain crucial information.

  4. Uniqueness Compared to GC-Bench:

    Expanding Applications of GC: Privacy and Robustness

    Existing graph condensation (GC) methods and benchmarks primarily focus on continual learning [4, 5]. Our benchmark broadens this scope by being the first to empirically prove GC’s potential in privacy preservation and denoising, thereby highlighting its significance to a wider audience. The results showcase the promise of GC methods in enhancing privacy, encouraging researchers to explore these new applications.

    Comprehensive Evaluation of NAS

    Unlike GC-Bench, which does not evaluate NAS—a crucial application of GC—our benchmark conducts a more comprehensive assessment across all current GC methods. While previous work [3] provided preliminary evaluations, our extensive analysis offers deeper insights into the effectiveness of GC in accelerating NAS, underscoring its practical value.

    Graph Properties Preservation

    We evaluate whether GC methods consistently preserve specific graph properties, an assessment vital for designing future methods that explicitly maintain these properties. This preservation is also crucial for ensuring property privacy for data releasers who wish to keep the statistical properties of their graph datasets confidential. Our benchmark provides detailed analyses of how well GC methods maintain these essential graph characteristics.

By highlighting these unique aspects, our benchmark distinguishes itself by focusing on a mature area and providing both broader and deeper insights.
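To make the Averaging baseline discussed above concrete, here is a minimal structure-free sketch (hypothetical code, not the benchmark's actual implementation): each class is condensed to a single synthetic node whose features are the class mean.

```python
from collections import defaultdict

def averaging_condense(features, labels):
    """Structure-free 'Averaging' baseline sketch: one synthetic node per
    class, whose feature vector is the mean of that class's original nodes."""
    sums = defaultdict(lambda: None)
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if sums[y] is None:
            sums[y] = list(x)
        else:
            sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    # label -> averaged feature vector
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

# Toy example: 4 nodes, 2 classes, 2-dimensional features
X = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
y = [0, 0, 1, 1]
print(averaging_condense(X, y))  # {0: [2.0, 0.0], 1: [0.0, 3.0]}
```

Despite its simplicity, this kind of per-class averaging can be surprisingly competitive (e.g., on Yelp), which is exactly why such methods are useful lower-bound baselines for GC.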

Q2: More discussion of graph-level methods.

  1. We appreciate your comment and would like to clarify the scope of our benchmark. As highlighted by our benchmark name (GC4NC) and paper title (Graph Condensation on Node Classification), we focus exclusively on the Node Classification task. We have several reasons for prioritizing this area over graph classification:
    • First, graph-level tasks differ significantly from node-level tasks. The condensation process for graph-level tasks involves unique steps in initialization, mini-batch versus full-batch training, and other technical aspects that are not directly applicable to node classification.
    • Second, graph-level condensation generates multiple condensed graphs, complicating the application of consistent attack models and graph property evaluations. In contrast, node-level condensation synthesizes a single condensed graph, allowing for more straightforward and unified evaluation methods.
    • Lastly, although we attempted to incorporate graph classification methods, they necessitated substantial modifications to work effectively within our framework. These adaptations were beyond the scope of our current benchmark.
  2. Moreover, we chose to focus on node classification because the majority (approximately 90%) of condensation papers, including [7-13], have concentrated on this task. Please see a recent survey [6] for more details.
Comment

Thank you to the authors for the detailed response, which has resolved most of my doubts. Your work is excellent, but unfortunately, the concurrent work GC-Bench has provided a more comprehensive evaluation of methods in the graph condensation field. While this paper focuses on node classification and includes a more thorough node-level evaluation (e.g., privacy, etc.), I believe it is more of an incremental contribution compared with GC-Bench. My ratings remain unchanged.

Comment

Q3: More methods like MIRAGE

We appreciate your suggestion to include additional methods like MIRAGE. However, we must clarify the following points (taking MIRAGE as an example; the same applies to other GC-for-graph-classification methods):

  • MIRAGE is designed specifically for graph classification tasks, whereas our benchmark is centered exclusively on Node Classification. The fundamental differences between these tasks necessitate distinct approaches in graph condensation methods.
  • MIRAGE does not support the direct generation of graphs with the specific ratios required for our node classification benchmark. This limitation makes it unsuitable for seamless integration into our current evaluation framework.
  • Although MIRAGE authors have suggested that transitioning to node-level tasks is straightforward, they have not provided the necessary code or guidelines to facilitate this adaptation. Implementing MIRAGE for node classification would require significant modifications and non-trivial efforts to ensure it performs effectively within our benchmark.

Given these considerations, incorporating graph classification methods like MIRAGE into our benchmark is currently out of scope.

Q4: Better to categorize by gradient matching, trajectory matching, and distribution matching

We believe there may be some misunderstandings, as we had already categorized the existing methods based on their matching schemes in our initial submission:

  • In Section 2.1 of our paper, we have explicitly categorized graph condensation (GC) methods based on gradient matching, trajectory matching, and distribution matching. This aligns directly with your recommendation, ensuring that our categorization is both clear and comprehensive.
  • Our observations, such as Observations 1, 6, and 8 in Section 4.2, are derived from these primary matching strategies. For instance, Observation 1 highlights the critical role of trajectory matching (TM) methods in enhancing Neural Architecture Search (NAS) effectiveness and transferability.
  • We kindly note that structure-free and structure-based approaches represent a sub-categorization rather than the main categorization, as discussed in Section 4.2. This sub-categorization is based on clear trends related to graph structure preservation, providing additional insights without deviating from the primary matching-strategy framework.

We sincerely thank the reviewer for their valuable advice. We will incorporate the above discussions into our paper to enhance its clarity.

References:

  1. Ding, Mucong, et al. "Spectral Greedy Coresets for Graph Neural Networks." arXiv preprint arXiv:2405.17404 (2024).
  2. Cao, Linfeng, et al. "Graph-Skeleton: 1% Nodes are Sufficient to Represent Billion-Scale Graph." In WWW 2024.
  3. Jin, Wei, et al. "Graph condensation for graph neural networks." In ICLR 2021.
  4. Liu, Yilun, Ruihong Qiu, and Zi Huang. "GCondenser: Benchmarking Graph Condensation." arXiv preprint arXiv:2405.14246 (2024).
  5. Liu, Yilun, Ruihong Qiu, and Zi Huang. "Cat: Balanced continual graph learning with graph condensation." In ICDM 2023.
  6. Hashemi, Mohammad, et al. "A comprehensive survey on graph reduction: Sparsification, coarsening, and condensation." In IJCAI 2024.
  7. Shichang Zhang, Atefeh Sohrabizadeh, Cheng Wan, Zijie Huang, Ziniu Hu, Yewen Wang, Jason Cong, Yizhou Sun, et al. A survey on graph neural network acceleration: Algorithms, systems, and customized hardware. arXiv preprint arXiv:2306.14052, 2023.
  8. Jian Gao and Jianshe Wu. Multiple sparse graphs condensation. Knowledge-Based Systems, 278:110904, 2023.
  9. Beining Yang, Kai Wang, Qingyun Sun, Cheng Ji, Xingcheng Fu, Hao Tang, Yang You, and Jianxin Li. Does graph distillation see like vision dataset counterpart? In NeurIPS, 2024.
  10. Xin Zheng, Miao Zhang, Chunyang Chen, Quoc Viet Hung Nguyen, Xingquan Zhu, and Shirui Pan. Structure-free graph condensation: From large-scale graphs to condensed graph-free data. In NeurIPS, 2024.
  11. Lin Wang, Wenqi Fan, Jiatong Li, Yao Ma, and Qing Li. Fast graph condensation with structure-based neural tangent kernel. arXiv preprint arXiv:2310.11046, 2023.
  12. Sariel Har-Peled and Akash Kushal. Smaller coresets for k-median and k-means clustering. In Proceedings of the twenty-first annual symposium on Computational geometry, pages 126–134, 2005.
  13. Linfeng Cao, Haoran Deng, Chunping Wang, Lei Chen, and Yang Yang. Graph-Skeleton: ~1% nodes are sufficient to represent billion-scale graph. arXiv preprint arXiv:2402.09565, 2024.
Comment

Thank you for acknowledging that our work is "excellent" and our rebuttal has resolved "most of your doubts". We appreciate your feedback and would like to further highlight the importance of our study. As you noted, our work includes "a more thorough node-level evaluation (e.g., privacy, etc)", we would still argue that our work presents very different contributions to the field of graph condensation. While GC-Bench is undoubtedly a valuable contribution, we believe our work complements it, and both are critical to advancing this field, particularly as they are concurrent efforts.

  • GC4NC: Our focus is on a comprehensive evaluation of node-level methods to examine how node-level information is preserved in graph condensation and how methods can benefit different scenarios (e.g., privacy, robustness, and NAS). These aspects are not covered in GC-Bench.
    • Graph condensation is often met with skepticism, with some questioning its practicality: "If one of the main motivations for graph condensation is to accelerate GNN training, what is the point if the condensation process itself is computationally expensive, often requiring the training of a GNN on the full dataset?" We believe our work plays a crucial role in answering these questions and demonstrating the value and potential of graph condensation beyond speeding up GNNs.
  • GC-Bench: It focuses on a comprehensive evaluation of GC methods on both graph classification and node classification tasks. It provides valuable insights into which methods deliver the best condensation performance, while not exploring practical applications of GC methods, such as NAS and other use cases.

We sincerely believe that both works are highly interesting and valuable, offering complementary perspectives for evaluating GC methods. We would greatly appreciate your further feedback and insights.

Review
Rating: 5

This paper establishes a comprehensive benchmark framework for evaluating GC methods across multiple dimensions including performance, privacy preservation, denoising ability, scalability, and NAS effectiveness. The authors present a novel evaluation scheme that integrates a variety of metrics to assess the efficacy and robustness of GC methods under different operational scenarios.

Strengths

  1. The framework, GC4NC, provides a standardized protocol that allows for the fair comparison of various GC methods on node classification across multiple critical dimensions, such as performance, scalability, and transferability. This systematic approach to evaluation is comprehensive.
  2. By being the first to systematically benchmark privacy preservation and denoising capabilities across various GC methods, the paper provides valuable insights that could lead to the development of more robust and privacy-aware graph processing techniques.
  3. The manuscript is well-written with a logical flow that systematically presents the research questions, methodology, experimental setup, and findings.

Weaknesses

  1. The authors have adopted a fair evaluation protocol that includes training a 2-layer GCN with 256 hidden units on the reduced graph. While this setup might seem standardized, it could fail to represent the true predictive performance of the GCN model, as different hyperparameters such as the number of layers, dropout, and hidden units can significantly impact the node classification results on different datasets.

    In light of this, it is recommended that the authors consider adjusting the hyperparameters based on more generalized settings, as detailed in Table A3 from [1], to ensure that the evaluation of the GC methods can be reliably generalized across different graph types and setups.

    [1] GC-Bench: An Open and Unified Benchmark for Graph Condensation.

  2. The paper addresses privacy preservation as one of the evaluation metrics for GC methods. However, the methods utilized to evaluate privacy do not seem to cover a broad spectrum of attack scenarios. This limitation might restrict the applicability of the results to real-world conditions, where adversarial attacks can be far more sophisticated and complex.

Questions

Can the authors provide additional insights or results on how varying these parameters (such as the number of layers, dropout rates, and hidden units) might affect the outcomes of GCN models on both the whole and the condensed graphs?

Comment

Q3: The paper addresses privacy preservation as one of the evaluation metrics for GC methods. However, the methods utilized to evaluate privacy do not seem to cover a broad spectrum of attack scenarios.

A3: Thank you for mentioning the absence of diverse privacy attack scenarios in our submission. We focused on a fundamental privacy attack, confidence-based membership inference attack (MIA), for the following reasons:

  1. We are not merely benchmarking the privacy-preserving properties of existing GC methods but are also broadening the scope of GC research to encompass critical areas such as privacy and robustness. This expansion aims to demonstrate the potential of GC methods, inspiring more researchers to recognize their promise and contribute to this emerging field. As existing applications of GC predominantly target Neural Architecture Search (NAS) [2,7] and continual learning [8], we aim to shift the conversation by highlighting their broader applicability.

  2. To the best of our knowledge, no prior work has empirically validated the privacy-preserving claims associated with GC. By targeting one of the most fundamental and well-studied privacy attacks, MIA, our work provides essential, empirical evidence for assessing and understanding the privacy capabilities of GC. This serves as a preliminary yet foundational step toward establishing a systematic and rigorous framework for evaluating the privacy guarantees of GC methods.

  3. We appreciate your suggestions regarding additional privacy attacks. However, we have chosen to omit some others for the following reasons:

    a. Model Inversion Attack (MIvA) [9]:

    MIvA aims to reconstruct the original graph and assess attack performance via link prediction tasks. In the context of GC, the condensation process significantly reduces the number of nodes and reindexes all synthetic nodes. This reduction diminishes the granularity necessary for accurate link reconstruction, making it difficult for an attacker to determine specific node connections. Additionally, reindexing disrupts any direct correspondence between condensed and original nodes, further obfuscating the true link structure.

    Instead, we evaluate graph properties in Section 4.8, demonstrating that condensation alters most graph properties. This suggests that the privacy of graph properties is maintained through the condensed graph.

    b. Attribute Inversion Attack (AIA) [10]: AIA typically requires datasets with sensitive attributes, which diverges from the standard datasets in mainstream GNN studies [10,11]. As a benchmark requiring unifying all baseline methods and datasets, incorporating AIA would thus fall outside the scope of our current work.

We have newly added the discussions about privacy attacks in Appendix A.5. We believe that our focused approach provides an essential first step toward understanding the privacy implications of GC methods. We plan to explore additional attack scenarios in future work to further validate and extend our findings. If the reviewer would like to see more experimental results on these scenarios at this stage, we would be happy to provide them.
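For readers unfamiliar with confidence-based MIA, the attack we benchmark can be sketched roughly as follows (a simplified thresholding variant; the function name and threshold are illustrative, not our exact implementation). The attacker queries the target model and flags a node as a training member when the top-class confidence exceeds a threshold; attack accuracy near 50% indicates the model leaks little membership information.

```python
def confidence_mia(member_confs, nonmember_confs, threshold=0.9):
    """Confidence-thresholding membership inference sketch: predict 'member'
    when the model's top-class confidence exceeds `threshold`. Returns attack
    accuracy; ~0.5 means the attacker does no better than random guessing."""
    correct = sum(c > threshold for c in member_confs)        # true positives
    correct += sum(c <= threshold for c in nonmember_confs)   # true negatives
    return correct / (len(member_confs) + len(nonmember_confs))

# Overfitted model: members receive high confidence, so the attack succeeds
print(confidence_mia([0.99, 0.97, 0.95], [0.6, 0.5, 0.7]))  # 1.0
# Model trained on a condensed graph: confidences mix, attack is near chance
print(confidence_mia([0.8, 0.6, 0.7], [0.75, 0.65, 0.7]))   # 0.5
```

The intuition for GC's privacy benefit is visible here: a model trained only on synthetic condensed nodes never memorizes individual original nodes, flattening the member/non-member confidence gap the attack relies on.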

References:

  1. Wu, Felix, et al. "Simplifying graph convolutional networks." In ICML 2019.
  2. Jin, Wei, et al. "Graph condensation for graph neural networks." In ICLR 2021.
  3. Jin, Wei, et al. "Condensing graphs via one-step gradient matching." In SIGKDD 2022.
  4. Yang, Beining, et al. "Does graph distillation see like vision dataset counterpart?." In NeurIPS 2024.
  5. Liu, Yilun, Ruihong Qiu, and Zi Huang. "GCondenser: Benchmarking Graph Condensation." arXiv preprint arXiv:2405.14246 (2024).
  6. Sun, Qingyun, et al. "GC-Bench: An Open and Unified Benchmark for Graph Condensation." arXiv preprint arXiv:2407.00615 (2024).
  7. Ding, Mucong, et al. "Faster hyperparameter search on graphs via calibrated dataset condensation." NeurIPS 2022 Workshop: New Frontiers in Graph Learning. 2022.
  8. Liu, Yilun, Ruihong Qiu, and Zi Huang. "Cat: Balanced continual graph learning with graph condensation." In ICDM 2023
  9. Zhang, Yi, et al. "A survey on privacy in graph neural networks: Attacks, preservation, and applications." TKDE 2024.
  10. Zhang, Zaixi, et al. "Model inversion attacks against graph neural networks." TKDE 2022.
  11. Gong, Neil Zhenqiang, and Bin Liu. "Attribute inference attacks in online social networks." ACM Transactions on Privacy and Security (TOPS) 21.1 (2018): 1-30.
Comment

Thank you for your response. It has addressed some of my concerns regarding hyperparameters. However, after carefully reviewing the other reviewers' comments, I believe the main issue lies in distinguishing this work from GC-Bench. As the authors mentioned, GC4NC explores how GC methods can benefit different scenarios (e.g., privacy, robustness, and NAS). The problem is that most of these specific applications, such as privacy and NAS, have only been validated on a limited number of datasets, such as Cora and Citeseer. This raises the question of whether these applications are representative of real-world scenarios. If not, I would argue that this work is more of an extension of node classification tasks rather than a thorough investigation into these aspects.

Comment

We first thank this insightful review. Next, we provide details to clarify your concerns.

Q1 : While this setup (training a 2-layer GCN with 256 hidden units on the reduced graph) might seem standardized, different hyperparameters can significantly impact the node classification results on different datasets. The authors should consider adjusting the hyperparameters based on more generalized settings, as detailed in Table A3 from GC-Bench.

A1: Thank you for your kind suggestions. We acknowledge that hyperparameter search is crucial for a comprehensive evaluation. Before we perform additional experiments on more hyperparameters (details in A2), we would like to first clarify a few things in our current setup:

  1. In Table 1, we adopted a fixed architecture (2 layers, 256 hidden units) but tuned hyperparameters in Figure 4, Section 4.6. We chose not to vary the number of GCN layers in Table 1 because a 2-layer GCN has been empirically validated as effective across widely used datasets by existing GC works [2,3,4]. This setup of a fixed architecture is also adopted by concurrent benchmark studies [5,6].
  2. We dedicated Section 4.6 (Transferability) to present the results after conducting extensive hyperparameter searches for GCN and other models (including the number of layers and hidden units). This section includes comparisons of the GCN model's performance across different condensation methods, as illustrated in Figure 4.
  3. In Section 4.5 (NAS), we explored a diverse set of architectures by varying the number of layers and hidden units. This section evaluates the correlation of hyperparameter sensitivities between condensed graphs and the full graphs. The high correlation scores demonstrate that the hyperparameter sensitivities on the condensed graphs closely mirror those on the full graphs, indicating that the condensed graphs maintain similar responsiveness to hyperparameter changes as the original graphs.
  4. We would also like to kindly point out that although Table A3 in GC-Bench includes the setup for both node classification and graph classification, they fix the GCN architecture to hidden units of 256 and adopt 2 layers for node classification. This is evidenced by their provided GitHub code for configurations and their statement “We adhere to the setting in GCond and use 2 graph convolutional layers for node classification..."
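The correlation analysis mentioned in point 3 can be sketched as a rank correlation between architecture performances on the condensed and full graphs. The following is a simplified stand-in (plain-Python Spearman, assuming no tied scores), not the benchmark's exact procedure:

```python
def rank(xs):
    """Ranks (1 = smallest); assumes no ties for simplicity."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos + 1
    return r

def spearman(a, b):
    """Spearman rank correlation: 1.0 means the condensed graph ranks
    architectures exactly as the full graph does."""
    ra, rb = rank(a), rank(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical validation accuracies of 5 architectures, full vs. condensed
full = [0.71, 0.68, 0.74, 0.66, 0.72]
cond = [0.63, 0.60, 0.66, 0.58, 0.64]  # same ordering -> correlation 1.0
print(spearman(full, cond))  # 1.0
```

A high score means that although absolute accuracies differ, the condensed graph preserves the ordering of architectures and hyperparameter settings, which is exactly the property NAS on condensed graphs relies on.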

Q2: Additional experiments on hyperparameter search and insights on how varying these parameters (such as the number of layers, dropout rates, and hidden units) might affect the outcomes of GCN models on both the whole and the condensed graphs?

A2: In response to your suggestions in Q1, we have added additional experimental results to include more hyperparameter combinations for Table 1. The revised sections are highlighted in the updated version of our manuscript.

  • The new hyperparameter search space is detailed in Table 11. This includes varying the number of layers (2, 3, 4) across more datasets.

  • We have added a new table (Table 10 in appendix) to illustrate the performance of GCN after hyperparameter tuning. We compare how each method's number of wins changes before and after tuning.

  • We have included a sensitivity analysis on both the whole and the condensed graphs in Appendix A.6.2, supported by Figures 9 and 13, addressing your first question. We provide additional insights below:

    These figures show that condensed and whole graphs exhibit similar sensitivity patterns across the Cora, Ogbn-arxiv, and Reddit datasets, suggesting a consistent response to hyperparameter tuning.

    • Both condensed and whole graphs show low sensitivity to dropout and weight decay, with minimal variations in accuracy, indicating these hyperparameters have a limited impact on performance.
    • The hidden layer size positively influences accuracy in both condensed and whole graphs, with larger sizes generally improving performance, highlighting the importance of hidden layer capacity in model effectiveness.
    • Learning rate sensitivity is also comparable between condensed and whole graphs; a higher learning rate (0.01) tends to perform better in both cases, though with slight dataset-specific variation (e.g., the whole graph of Ogbn-arxiv).
    • Notably, the number of layers impacts both graph types similarly, as accuracy consistently declines with an increase in layers, suggesting that deeper architectures do not benefit either condensed or whole graphs in three datasets.

    Thus, condensed and whole graphs have parallel sensitivity trends, where optimizing hidden layer size and learning rate while managing network depth is likely to enhance performance across both representations.

    We have added the above insights in Appendix A.6.2 Hyperparameters Sensitivity Analysis.

Review
Rating: 5

This paper has introduced a framework, GC4NC, for testing graph condensation (GC) methods on node classification from various aspects. The insights they reported include that 1) trajectory matching methods generally obtain better results, 2) some methods can preserve privacy while keeping condensation ratio, 3) GC methods demonstrate their denoising ability against structural noise, and so on. The authors have shared their anonymous code in which the results in the paper are reproducible.

Strengths

  1. The categorizations of existing GC methods are well summarized. Also, an evaluation protocol and design choices are well-structured, e.g., structure-free vs. structure-based and graph property preservation.
  2. This paper provides a framework in which researchers and developers can test various GC methods in a unified setting. On the framework, the authors conducted comprehensive experiments with 15+ methods on 7 datasets.
  3. The authors discuss 11 observations in total through the experiments from various aspects. While each observation is relatively shallow, the observations are helpful for the community.

Weaknesses

  1. While the paper provides experimental results from various aspects, some descriptions are insufficiently explained and insights are shallow. For example,
    1. In lines 288-291, what factors of the datasets affect the performance? Why does Averaging achieve the best performance on Yelp? Why do Arxiv and Reddit have significant room for improvement?
    2. In lines 350 and 351, the paper describes “GC methods can still perform well when the Instance Per Class (IPC) is as low as 1” but I found the significant drops of most methods at the leftmost plots in Figure 3.
  2. It is hard to see the legends and characters in all figures.
    1. Figure 1 requires an effort to understand the correspondence between plots and methods.
    2. In Figure 2, the legend overlaps the bar plots.
    3. In Figures 3, 4, and 5, characters are very small.

Questions

Can you clarify unclear points I raised in the Weakness section?

Comment

Q1: More analysis on experimental results

Thank you for your detailed questions. We agree that a deeper analysis of the datasets is essential. Here are our responses:

  1. Factors Affecting Performance in Arxiv and Reddit:

    • Imbalanced label distributions negatively impact performance.
    • Arxiv and Reddit datasets have a larger number of classes and exhibit significant class imbalance compared to others. Consistent with most GC works, our implementation ensures at least one instance per class, guaranteeing representation for each class. However, this approach can cause distribution shifts. In contrast, datasets like Cora, Citeseer, and Pubmed have balanced training sets, leading to more stable performance. This observation highlights the need for improved initialization methods in the GC field to effectively handle datasets with numerous and imbalanced classes.
  2. Why Averaging Achieves the Best Performance on Yelp:

    Yelp is an anomaly detection dataset evaluated using F1-macro. Averaging methods only use the average representations of the normal instances and anomalies. This will always create a simple decision boundary. However, for GC methods, the unbalanced class initialization may create an overfitted boundary for the anomalies.

We have newly added the above analysis in Appendix A.4.2.
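The initialization issue described above, guaranteeing at least one instance per class within a small budget, can be illustrated with a hypothetical allocator (the function name and rounding scheme are ours, for illustration only):

```python
def allocate_budget(class_counts, budget):
    """Allocate a synthetic-node budget across classes proportionally to
    class size, but guarantee at least 1 node per class (as most GC methods
    do). With many rare classes, the floor inflates rare-class shares, and
    the total allocation can even exceed the nominal budget."""
    total = sum(class_counts.values())
    return {c: max(1, round(budget * n / total))
            for c, n in class_counts.items()}

# Imbalanced dataset: rare classes become over-represented at a tiny budget
counts = {"a": 900, "b": 90, "c": 9, "d": 1}
print(allocate_budget(counts, budget=10))
# Classes 'c' and 'd' each get 1 of ~12 nodes (~8%) despite being
# only 0.9% and 0.1% of the original data: a clear distribution shift.
```

This is the distribution shift we observe on Arxiv and Reddit: with dozens of imbalanced classes, the per-class floor pulls the condensed label distribution far from the original one, whereas balanced datasets like Cora are unaffected.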

Q2: Figures clarification.

We appreciate the reviewer’s feedback regarding the clarity of the legends and characters in the figures. We will enlarge the legends and characters to make them easier to read. Additionally, we will address specific issues with individual figures as follows:

  1. Figure 1: Correspondence between plots and methods

    To improve the correspondence between plots and methods, we will reposition the method names closer to the marker on each dashed curve. Below is a detailed explanation of Figure 1, which we plan to include in Appendix A.4.1 to assist readers in understanding the figure:

    This figure compares test accuracy (y-axis) and total time (x-axis) for various graph condensation methods applied to the Arxiv dataset. The methods are distinguished by different marker shapes and colors: blue stars represent structure-free methods, red circles represent structure-based methods, and green triangles represent distribution-based methods. The size of each marker indicates the reduction rate, with smaller markers representing a reduction rate of 0.05%, medium markers 0.25%, and larger markers 0.50%. Dashed lines connect markers corresponding to the same method across different reduction rates, illustrating the method's behavior under varying levels of graph condensation. To enhance clarity, the name of each method will be positioned near the marker for its respective curve, ensuring easy identification of methods and their corresponding performance trends.

  2. Figure 2: Overlapping legend and bar plots

    For Figure 2, we will reposition the legend to the blank area near the middle of the plot to prevent it from overlapping with the bar plots.

  3. Figures 3, 4, and 5: Small characters

    We will enlarge the characters in Figures 3, 4, and 5 to improve readability.

These adjustments will ensure that all figures are clear and accessible to readers.

Thank you again for your valuable suggestions! We have uploaded a revised version and highlighted the differences per your suggestions!

Comment

I appreciate the detailed response, which clarifies some of the unclear points in the paper (W1.2 is not resolved in the author response). However, I did not find a significant misunderstanding in my initial review, so my rating remains.

Comment

We sincerely apologize for overlooking this important comment and appreciate the opportunity to clarify our statement that "GC methods can still perform well when the Instance Per Class (IPC) is as low as 1."

First, please note that not all lines in Figure 3 represent GC methods; the orange, green, and blue lines correspond to three baseline coreset selection methods. This distinction is crucial for accurately interpreting the observed performance drops.

Second, this claim appears under Observation 3 and is based on a comparison with dataset condensation methods in the image domain, where "perform well" means relatively better performance:

  • Arxiv (Figure 3 left): For GC methods, the accuracy drop when IPC=1 is approximately 20%, except for GCond.

  • Reddit (Figure 3 right): For GC methods, the accuracy drop is around 10%, except for MSGC.

  • In contrast, recent findings from DC-BENCH [1] (Table 1) indicate that dataset condensation methods in the image domain experience significant performance drops of 30% to 60% when IPC is reduced to 1. This comparison demonstrates that GC methods outperform dataset condensation methods in scenarios with extremely low IPC.

We will revise our statement to: "GC methods outperform dataset condensation methods in the image domain when the Instance Per Class (IPC) is as low as 1," and include a more detailed analysis to support this comparison. We apologize for any confusion caused and thank the reviewer for highlighting this important aspect.

References:

Cui, Justin, et al. "DC-BENCH: Dataset Condensation Benchmark." In NeurIPS 2022.

Official Review
5

In this paper, the authors propose a benchmark for graph condensation, i.e., compressing a large graph into a much smaller graph while preserving valuable information. Specifically, the proposed benchmark covers many aspects, such as unified evaluation protocols, efficiency, privacy preservation, denoising abilities, transferability, application to neural architecture search (NAS), etc. Many graph condensation methods are compared, and some insights for graph condensation are also observed.

Strengths

  1. The proposed benchmark compares graph condensation methods from many aspects besides predictive accuracy.
  2. The paper is well-written and easy to follow in general.
  3. Source codes are available to evaluate the effectiveness of the benchmark.

Weaknesses

  1. My major concern is how significantly the proposed benchmark contributes to the graph machine learning community. Specifically, as explained in the paper, there are some earlier or concurrent works on graph condensation benchmarking. As shown in Table 6 in the Appendix, these benchmarks each have their own advantages, e.g., this paper claims it compares more methods, GC-Bench has considerably more datasets and broader tasks, and GCondenser has features such as continual learning and the impact of validators. The paper claims its unique advantages include comparing robustness and privacy preservation. However, the paper does not sufficiently explain why these two factors are the most important or challenging for graph condensation.
  2. As a benchmark, I would also encourage the authors to provide more usage information, e.g., if users would like to add a new dataset, implement their own method and compare with baselines, propose a new setting, etc., how they could use the proposed method, and what are some potential difficulties, etc.
  3. As I am not particularly familiar with graph condensation, it would also be valuable if the authors could discuss how this benchmark and the discovered insights could benefit general graph machine learning researchers.

Questions

See Weaknesses above

Comment

References:

  1. Liu, Yilun, Ruihong Qiu, and Zi Huang. "GCondenser: Benchmarking Graph Condensation." arXiv preprint arXiv:2405.14246 (2024).
  2. Liu, Yilun, Ruihong Qiu, and Zi Huang. "CAT: Balanced continual graph learning with graph condensation." In ICDM 2023.
  3. Gao, Xinyi, et al. "Graph condensation: A survey." arXiv preprint arXiv:2401.11720 (2024).
  4. Hashemi, Mohammad, et al. "A comprehensive survey on graph reduction: Sparsification, coarsening, and condensation." In IJCAI 2024.
  5. Ding, Mucong, et al. "Spectral Greedy Coresets for Graph Neural Networks." arXiv preprint arXiv:2405.17404 (2024).
  6. Cao, Linfeng, et al. "Graph-Skeleton: ~1% Nodes are Sufficient to Represent Billion-Scale Graph." In WWW 2024.
  7. Ding, Mucong, et al. "Faster hyperparameter search on graphs via calibrated dataset condensation." NeurIPS 2022 Workshop: New Frontiers in Graph Learning. 2022.
  8. Jin, Wei, et al. "Graph condensation for graph neural networks." In ICLR 2021.
  9. Ma, Yao, et al. "Is Homophily a Necessity for Graph Neural Networks?." In ICLR 2021.
  10. Fang, Junfeng, et al. "ExGC: Bridging Efficiency and Explainability in Graph Condensation." Proceedings of the ACM Web Conference 2024.
  11. Duddu, Vasisht, Antoine Boutet, and Virat Shejwalkar. "Quantifying privacy leakage in graph embedding." MobiQuitous 2020-17th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services. 2020.
Comment

Q3: More discussion on impacts. Discuss how this benchmark and the discovered insights could benefit general graph machine learning researchers.

We appreciate the importance of discussing the impact of our benchmark. Our benchmark and its insights offer significant benefits to the broader graph machine learning community in the following areas:

  1. Current Position of GC in Graph Machine Learning:
    • GC originated in the computer vision domain but has been adapted to address the unique challenges of graph data. It incorporates techniques from graph sampling and coarsening to manage the complexities inherent to the graph modality while extracting essential information.
    • From the view of representation learning, GC aims to create a compact representation of the original graph, preserving essential features for training well-generalized GNNs.
    • GC is gaining traction due to its advantages in accelerating training, enhancing scalability, and improving visualization, making it a valuable tool for various graph-based applications such as NAS [7], continual learning [2], and explainability [10].
  2. Contributions of This Benchmark Paper:
    • Addressing Key Questions:
      • When and Why Specific GC Methods Work: Our benchmark systematically evaluates different GC methods, elucidating the conditions under which each method excels. This helps researchers and users understand the strengths and limitations of various condensation techniques.
      • Broader Applications of GC: We demonstrate the versatility of GC beyond traditional applications like NAS and continual learning. Our benchmark highlights its potential in areas such as privacy preservation and efficient data management.
      • Key Observations and Novel Insights: Based on our well-established benchmark, we have made several new observations and provided fresh insights in the field of GC. For instance, GC methods exhibit significant denoising capabilities against structural noise but are less effective at mitigating node feature noise. Additionally, trajectory matching and gradient-based inner optimization are crucial for achieving reliable performance in NAS and enhancing transferability. These findings highlight both the strengths and limitations of current GC techniques.
  3. Facilitating General Graph Machine Learning Research:
    1. Our benchmark provides a pioneering investigation into the practical effectiveness of GC methods in privacy preservation and their denoising effects (robustness). This highlights the potential of GC methods to serve as a novel set of baselines for comparison with existing privacy defense and robustness techniques. Furthermore, as graph condensation inherently involves modifying datasets, i.e., a data-centric approach, it can be seamlessly integrated with model-centric efforts to deliver complementary benefits in robustness and privacy preservation.
    2. Observation 4: Certain GC methods can achieve both privacy preservation and high condensation performance. This dual capability suggests the potential to break the traditional trade-off between privacy and utility in the trustworthy graph learning area by effectively synthesizing data.
    3. Observation 7: We observe that different GC methods exhibit varying degrees of transferability across datasets, indicating inherent differences among GNN architectures, including Graph Transformers. This inspires a rethinking of the similarities between current GNN models, particularly the perspectives and priors they tend to extract.
    4. Observation 11: We observed that homophilous graphs often become heterophilous after condensation while still maintaining high performance. This unexpected outcome challenges the conventional understanding of the relationship between GNN performance and homophily [9]. Our findings suggest that the dependency of GNNs on homophily may need to be reevaluated, opening new avenues for research into how graph condensation affects structural properties and model performance.

Overall, our benchmark serves as a valuable resource for graph machine learning researchers by providing comprehensive evaluations, uncovering new applications of GC, and inspiring innovative methodologies. This facilitates advancements in the field, enabling the creation of more effective and adaptable graph learning models. We have added above discussion on benefits for general graph machine learning researchers in Appendix A.11.

In light of these responses, we hope we have addressed your concerns, and hope you will consider increasing your score. If we have left any notable points of concern unaddressed, please do share and we will attend to these points.

Comment

I appreciate the authors' efforts in the detailed rebuttal, which enhances the clarity of the paper. However, after briefly going through the other reviewers' comments, I share the concern that the significance of the paper is somewhat borderline. All things considered, I'd like to maintain my score.

Comment

Q2: More technical details, e.g., how users can add a new dataset, implement their own methods, and propose a new setting; also, how they can use the proposed package and what potential difficulties they may face.

We appreciate the reviewer's suggestion to provide more usage information for our benchmark. In response, we have developed an easy-to-use code package, which is included in the supplementary material and has been open-sourced as a PyTorch library. The package accepts graphs in the PyG (PyTorch Geometric) format as input and outputs a reduced graph that preserves the properties or performance of the original graph. Below, we provide technical details on how users can integrate new datasets, implement their own methods, propose new settings, and address potential difficulties. We are also preparing a tutorial to assist users in utilizing our code effectively.

Usage:

from graphslim.dataset import *
from graphslim.evaluation import *
from graphslim.condensation import GCond
from graphslim.config import cli

args = cli(standalone_mode=False)
# Customize arguments here
args.reduction_rate = 0.5
args.device = 'cuda:0'
# Add more args.<main_args/dataset_args> as needed

graph = get_dataset('cora', args=args)
# To reproduce the benchmark, use our args and graph class
# To use your own args and graph format, ensure the args and graph class have the required attributes

# Create an agent for the reduction algorithm
# Add more args.<agent_args> as needed
agent = GCond(setting='trans', data=graph, args=args)

# Reduce the graph
reduced_graph = agent.reduce(graph, verbose=True)

# Create an evaluator
# Add more args.<evaluator_args> as needed
evaluator = Evaluator(args)

# Evaluate the reduced graph on a GNN model
res_mean, res_std = evaluator.evaluate(reduced_graph, model_type='GCN')

Parameters are categorized as follows:

  • <main_args>: dataset, method, setting, reduction_rate, seed, aggpreprocess, eval_whole, run_reduction
  • <attack_args>: attack, ptb_r
  • <dataset_args>: pre_norm, save_path, split, threshold
  • <agent_args>: init, eval_interval, eval_epochs, eval_model, condense_model, epochs, lr, weight_decay, outer_loop, inner_loop, nlayers, method, activation, dropout, ntrans, with_bn, no_buff, batch_adj, alpha, mx_size, dis_metric, lr_adj, lr_feat
  • <evaluator_args>: final_eval_model, eval_epochs, lr, weight_decay

Customization:

  1. Adding a New Dataset: To implement a new dataset, create a new class in dataset/loader.py and inherit from the TransAndInd class.
  2. Implementing a New Reduction Algorithm: To add a new reduction algorithm, create a new class in sparsification, coarsening, or condensation, and inherit from the Base class.
  3. Adding a New Evaluation Metric: To implement a new evaluation metric, create a new function in evaluation/eval_agent.py.
  4. Implementing a New GNN Model: To add a new GNN model, create a new class in models and inherit from the Base class.
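The extension pattern can be illustrated with a minimal, self-contained sketch. Note that the `Base` class, the `reduce` signature, and `RandomSparsifier` below are simplified stand-ins invented for illustration, not the actual graphslim interfaces:

```python
import numpy as np

class Base:
    """Simplified stand-in for a reduction-algorithm base class."""
    def __init__(self, reduction_rate):
        self.reduction_rate = reduction_rate

    def reduce(self, features, adj):
        raise NotImplementedError

class RandomSparsifier(Base):
    """Toy reduction algorithm: keep a random subset of nodes and
    the subgraph they induce."""
    def __init__(self, reduction_rate, seed=0):
        super().__init__(reduction_rate)
        self.rng = np.random.default_rng(seed)

    def reduce(self, features, adj):
        n = features.shape[0]
        k = max(1, int(n * self.reduction_rate))
        keep = self.rng.choice(n, size=k, replace=False)
        # Index both rows and columns of the adjacency matrix.
        return features[keep], adj[np.ix_(keep, keep)]

# Usage on a toy graph with 100 nodes and 16-dimensional features.
features = np.random.rand(100, 16)
adj = (np.random.rand(100, 100) > 0.9).astype(float)
agent = RandomSparsifier(reduction_rate=0.1)
small_feat, small_adj = agent.reduce(features, adj)
print(small_feat.shape, small_adj.shape)  # (10, 16) (10, 10)
```

A real implementation would inherit from the library's actual base class and operate on its graph objects, but the subclass-and-override structure is the same.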

Potential Difficulties:

Users may encounter the following challenges:

  1. Disk Space Limitations:
    • Some methods store training trajectories of multiple experts, which can exceed 100 GB.
    • Solution: Reduce the number of experts using the <method_name>.reduce() module to manage disk space.
  2. Memory and GPU Constraints:
    • Larger datasets might cause memory or GPU limitations during the condensation process.
    • Solution: Load data and adjust the reduction process to run in a mini-batch manner to reduce memory usage.
  3. Hyperparameter Adjustment:
    • Tuning hyperparameters may be necessary for optimal performance.
    • Solution: Modify the JSON configuration files in the configs folder, which contain all hyperparameters for each method.

We believe this information will help users effectively utilize, customize, and integrate our benchmark package with new datasets or algorithms. We provide comprehensive documentation and support for easy adoption and extension. Per your suggestion, key documentation highlights will be included in the paper (Appendix A.10).

Comment

Q1: Significance of this work and more explanations for why robustness and privacy preservation factors are most important or challenging.

Thank you for your valuable suggestions. We have carefully considered your feedback and would like to provide further clarification on the design of our benchmark:

  1. Validating New Applications:
    1. Existing studies of GC applications, including the concurrent benchmark GCondenser [1], primarily focus on continual learning [1, 2] and neural architecture search. In contrast, we are not merely benchmarking existing GC methods but also broadening the scope of graph condensation research to critical areas such as privacy and robustness. This expansion aims to demonstrate the potential of graph condensation, inspiring more researchers to recognize its promise and contribute to this emerging field. Our goal is to encourage the graph research community not only to explore the diverse possibilities of GC but also to pay attention to privacy and robustness considerations when developing new GC methods.
    2. While many studies mention that denoising ability (robustness) and privacy preservation are inherent properties of GC [3, 4], to the best of our knowledge, no prior work has empirically or theoretically validated these claims. We therefore aim to bridge this gap by providing at least an empirical investigation into these aspects.
      • We investigated robustness by injecting feature noise, structural noise, and adversarial structural noise into the original graphs before applying GC techniques. Our findings reveal that GC effectively mitigates structural noise, particularly with structure-based approaches, but is less effective against node feature noise, as highlighted in Observation 5.
      • We evaluated privacy preservation capability using membership inference attacks (MIA) [11], which measure an adversary's ability to determine whether a node belongs to the training or test set. Our results demonstrate that condensed graphs significantly reduce privacy risks by mitigating the exposure of membership information during training.
    3. These investigations on the privacy preservation and denoising ability of GC methods can also benefit the general graph machine learning field. For further details, please refer to our responses in Q3.
  2. Other Uniqueness: In addition to privacy and denoising capabilities, our benchmark includes unique discussions on several other critical aspects:
    • Comparison to Coreset and Coarsening Baselines: Recent developments in coreset selection [5] and coarsening methods [6] have shown high potential in preserving GNN performance while being more efficient compared to GC methods. These methods are indispensable baselines for comparing GC techniques. Moreover, they can serve as data initialization strategies for GC, as explored in Section 4.7. Studying GC in isolation without considering other graph reduction methods would provide a limited perspective.
    • Disk Space Consumption: Evaluating disk space consumption provides a comprehensive view of how trajectory-based methods rely on substantial disk space. This is particularly critical because methods like GEOM and SFGC, which demonstrate the best performance, also tend to consume more disk space. Understanding this trade-off is essential for practical applications where storage resources may be limited.
    • NAS: NAS is one of the most promising applications of graph condensation [7,8]. NAS involves identifying the most effective architecture from a vast pool of potential models, a process characterized by intensive computational demands. Graph condensation methods are utilized to accelerate NAS by reducing the size of the graph, thereby decreasing the computational load [7]. In practical scenarios, preserving the ranking of validation results between models trained on the condensed graph and the full graph is crucial for selecting the best architectures. We argue that all graph condensation methods should be evaluated on the NAS task to effectively assess their practical value. We expect to observe a reliable correlation in validation performance between training on the condensed graph and the full graph.
    • Graph Properties Preservation: Assessing whether GC methods preserve specific graph properties is essential. This evaluation allows future work to design methods that explicitly preserve these properties. If GC methods do not preserve certain properties, it indicates that they can maintain property privacy, which is crucial for data releasers who do not wish to disclose statistical properties of their graph datasets.
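The membership inference evaluation mentioned under point 1 can be sketched, in a simplified form, as a confidence-threshold attack. The Beta distributions below are synthetic stand-ins for per-node prediction confidences, not results from the paper, and this threshold variant is simpler than the attack in [11]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-node prediction confidences: models tend to be more
# confident on training (member) nodes than on unseen (non-member) nodes.
member_conf = rng.beta(8, 2, size=500)     # skewed toward 1.0
nonmember_conf = rng.beta(4, 4, size=500)  # centered near 0.5

conf = np.concatenate([member_conf, nonmember_conf])
is_member = np.concatenate([np.ones(500), np.zeros(500)])

# Threshold attack: predict "member" when confidence exceeds a threshold,
# and report the best achievable attack accuracy over all thresholds.
thresholds = np.linspace(0, 1, 101)
accs = [((conf > t) == is_member).mean() for t in thresholds]
best_acc = max(accs)
print(f"best attack accuracy: {best_acc:.3f}")  # 0.5 would mean no leakage
```

A GC method that preserves privacy should push the attacker's best accuracy toward 0.5, i.e., make the member and non-member confidence distributions indistinguishable.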

We believe that these comprehensive evaluations provide a robust foundation for understanding the multifaceted applications and implications of graph condensation methods. By addressing these various aspects, our benchmark offers a thorough assessment that will benefit the graph research community.

Comment

Thank you so much for your prompt reply. We would like to highlight an important point regarding graph condensation, which is often questioned in terms of its practicality. Some critics ask: "If the main motivation for graph condensation is to accelerate GNN training, what is the point if the condensation process itself is computationally expensive, often requiring the training of a GNN on the full dataset?" We believe that our work provides effective answers to these questions, demonstrating the broader potential of graph condensation and offering additional motivations for researchers to continue developing this field.

We understand that the primary concerns arise from comparisons between our work and GC-Bench. It is important to note that GC-Bench focuses on evaluating GC methods across different graph-related tasks (node-level and graph-level) to analyze the condensation performance. However, it does not consider critical applications of graph condensation, such as NAS, robustness, and privacy. Reviewer LcwZ20 also acknowledges that our work is “excellent” with improvements like “a more thorough node-level evaluation (e.g., privacy, etc.)”. This distinction highlights that our works are complementary, each making unique and valuable contributions to the field.

We appreciate your thoughtful consideration of these points and look forward to your feedback.

Comment

Dear Reviewers,

The authors have posted a rebuttal to the concerns raised. I would request you to kindly go through their responses and discuss how/if this changes your opinion of the work. I thank those who have already initiated the discussion.

best,

Area Chair

Comment

Dear Reviewers and Area Chair,

Thank you for your thoughtful reviews and for considering our rebuttal. We see that the main concern revolves around the comparisons between our work and GC-Bench [1]. However, we would like to respectfully request a reconsideration of this concern in light of the ICLR policy.

GC-Bench, while an important effort, was formally published in a peer-reviewed venue on September 26, 2024. According to the conference guidelines, authors are not required to compare their work to contemporaneous papers published on or after July 1, 2024. This policy is designed to account for the challenges of addressing very recent work during the submission process, and we kindly ask that this be taken into account when evaluating our contributions.

We also wish to emphasize that our work and GC-Bench were developed independently and concurrently. We believe our study provides unique and valuable insights that stand on their own merits, and we hope this clarification ensures a fair assessment of our work.

Thank you again for your time, understanding, and reconsideration of this matter. We greatly appreciate your efforts and dedication in reviewing our submission.

Best,

GC4NC Authors

[1] GC-Bench: An Open and Unified Benchmark for Graph Condensation. OpenReview Link.

Withdrawal Notice

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.