PaperHub

Rating: 5.8 / 10 (Poster, 4 reviewers; min 5, max 6, std 0.4)

Individual scores: 6, 5, 6, 6

Confidence: 3.5, Correctness: 2.8, Contribution: 3.0, Presentation: 2.8

NeurIPS 2024

RAGraph: A General Retrieval-Augmented Graph Learning Framework

Submitted: 2024-05-14, Updated: 2024-12-20

Abstract

Keywords

Graph Neural Networks, Graph Prompt Tuning, Retrieval-Augmented Generation

Reviews and Discussion

Official Review

Rating: 6

The paper proposes a retrieval-augmented method to further assist graph in-context learning. With the advancement of graph in-context learning and graph prompting, RAG is a natural technique to build on top of them. The main contribution of the paper is to develop such a RAG pipeline for the graph learning scenario by defining the graph database, the graph retrieval pipeline, and the training and inference of the model.

Strengths

The idea is novel in the field, and the implementation is clearly stated. Experiments show the effectiveness of the proposed method.

Weaknesses

While the idea is novel, it is unclear why it works and how it is on par with RAG in NLP/CV.

Specifically, it is not clear how the “generation” in RAG works in the proposed framework. The authors also did not explain why the proposed method would work: what patterns has the model learned from the retrieved graphs?

Based on my limited knowledge, the key contributions of the paper would be how to construct the query database and how to retrieve from it. The methodology part does not explain thoroughly why it is designed this way; the motivation for the adopted construction is missing.

Questions

  1. Why do we need the Toy Graphs Augmentation Strategy in Section 4.2?
  2. Why is noise-based graph prompting required?
  3. If the model is trained to be able to utilize retrieved data/embeddings, is it still able to perform zero-shot inference?

Limitations

None

Author Response

Thanks for your insightful comments. We address your concerns and answer your questions below.

W1: Why RAGraph works on par with RAG in NLP/CV, and how “generation” works. What patterns are learned from the retrieved graphs?

A1: In NLP, RAG enhances the generation of LLMs by retrieving relevant information via prompts. Similarly, in RAGraph, we enhance downstream graph learning by integrating information from retrieved toy graphs; using toy graphs with shared patterns assists model inference. In our framework, the "generation" corresponds to the retrieval-enhanced graph prompting, i.e., Toy Graph Intra Propagate & Query-Toy-Graph Inter Propagate, which propagate the retrieved knowledge (X and Y) into the query graph. To illustrate, we analyze this from both experiment and theory (a minimal propagation sketch is also given after the numbered points below).

  1. Experiment 1: We perform a case study to illustrate how "generation" works by displaying specific instances of node vectors. Due to word restriction, please refer to the global response A2 and Rebuttal PDF Figure 1.

  2. Experiment 2: In traditional GNN tasks, GCN, GAT, and GIN typically expand their receptive fields through stacked message-passing layers or neighborhood subgraph sampling for inference. Patterns learned in these contexts are often localized within the constrained receptive field. In contrast, in RAGraph, we observe that subgraphs sharing similar patterns often exhibit properties more aligned with downstream tasks. These subgraphs provide richer information for inference compared to simply enlarging receptive fields. As shown in Main Text Tables 1 and 2, Figure 3, RAGraph's strategy of incorporating toy graphs significantly outperforms baselines.

  3. Theory 1: Furthermore, we provide a theoretical justification of retrieval augmentation in GNNs (see Appendix B.4). From an information-theoretic perspective, introducing RAG knowledge into GNNs enhances the mutual information between input features X and output labels Y, such that $I(X, \mathrm{RAG}; Y) \geq I(X; Y)$, thereby improving the performance on downstream tasks. This is aligned with the information theory of RAG in NLP [1].

  4. Theory 2: Recent studies [2] [3] also suggest that the generalization error diminishes as the number of nodes in the graph increases (see, e.g., Theorem 1.1 therein): the gap between the expected loss $R_{exp}(\Theta)=\mathbb{E}_{(x,y)\sim \mu_G}[\mathcal{L}(\Theta(x), y)]$ and the empirical loss $R_{emp}(\Theta)=\frac{1}{m}\sum_{i=1}^m \mathcal{L}(\Theta(x^i), y^i)$ is upper bounded as $|R_{exp}(\Theta)-R_{emp}(\Theta)| \leq \sqrt{\frac{C}{m}q(N)}$, where $C$ represents the model complexity (e.g., number of parameters), $m$ denotes the training set size, and $q(N) = \mathbb{E}_{N\sim \nu} [N^{-\frac{1}{D+1}}]$ depends on the average graph size $N$ (node count), with $\nu$ the graph size distribution and $D$ the metric-measure space dimension. In RAGraph, retrieving similar toy graphs significantly increases the number of graph nodes (via Query-Toy-Graph Inter Propagate, which links toy graph nodes to the query graph), augmenting $N$ and thereby reducing $q(N)$. Consequently, the upper bound of the generalization error decreases, promoting smoother convergence of graph learning and enhancing pattern learning.

[1] An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation. In ACL 2024.

[2] Generalization Analysis of Message Passing Neural Networks on Large Random Graphs. In NeurIPS 2022.

[3] Wasserstein Barycenter Matching for Graph Size Generalization of Message Passing Neural Networks. In ICML 2023.


W2: Lacking motivation for constructing the toy graph database and for the retrieval design.

A2: The motivation behind constructing the toy graph database is to identify similar knowledge patterns in graphs (the toy graph serves as a repository of such patterns, including X and Y). Establishing such a database is therefore necessary to store the candidate sets for downstream tasks. However, during the construction of toy graphs, if the toy graph size is too large, it may introduce excessive noise and adversely affect the model; conversely, if it is too small, the introduced knowledge may be insufficient. Thus, we chunk the resource graph into $k$-hop subgraphs. Moreover, to better store long-tail knowledge and simulate real-world scenarios, we also introduce inverse importance sampling and leverage augmentation to expand the toy graph database. A minimal construction sketch is given below.
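A minimal construction sketch using NetworkX is shown below; the inverse-degree sampling probability and all function and variable names are illustrative assumptions rather than the paper's exact procedure.

```python
import random
import networkx as nx

def build_toy_graph_base(resource_graph: nx.Graph, k: int = 2, num_toys: int = 100, seed: int = 0):
    """Chunk a resource graph into k-hop toy graphs, sampling master nodes with
    probability inversely proportional to degree (inverse importance sampling)
    so that long-tail, low-degree knowledge is better represented."""
    rng = random.Random(seed)
    nodes = list(resource_graph.nodes())

    # Inverse importance sampling: low-degree (long-tail) nodes get higher weight.
    inv_deg = [1.0 / (resource_graph.degree(n) + 1) for n in nodes]
    masters = rng.choices(nodes, weights=inv_deg, k=num_toys)

    toy_base = []
    for master in masters:
        # k-hop ego network around the master node = one toy graph "chunk".
        toy = nx.ego_graph(resource_graph, master, radius=k)
        toy_base.append({"master": master, "toy_graph": toy})
    return toy_base

# Example: chunk a random graph into 2-hop toy graphs.
toy_base = build_toy_graph_base(nx.erdos_renyi_graph(200, 0.03), k=2, num_toys=20)
```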

In the retrieval process, the motivation is to comprehensively retrieve knowledge (X and Y) that can enhance downstream tasks, and we evaluate similarity across four dimensions: time, structure, environment, and semantic relevance (Appendix B.3); a retrieval sketch follows.
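As a sketch of how the four similarity channels might be combined into a single retrieval score (the channel features, the cosine similarity, and the weights are placeholders rather than the paper's exact Eq. 1):

```python
import numpy as np

def retrieve_top_k(query_feats, toy_feats, weights=(0.1, 0.1, 0.1, 0.6), top_k=5):
    """query_feats / toy_feats: dicts keyed by channel name ('time', 'structure',
    'environment', 'semantic'); toy_feats values have shape (num_toys, d_channel),
    query_feats values have shape (d_channel,). Returns indices of the top-K toys."""
    channels = ("time", "structure", "environment", "semantic")
    score = np.zeros(len(next(iter(toy_feats.values()))))
    for w, ch in zip(weights, channels):
        q = query_feats[ch]
        T = toy_feats[ch]
        # Cosine similarity between the query's channel vector and every toy graph.
        sim = (T @ q) / (np.linalg.norm(T, axis=1) * np.linalg.norm(q) + 1e-8)
        score += w * sim
    return np.argsort(-score)[:top_k]

# Example with 50 toy graphs and 16-dim features per channel.
rng = np.random.default_rng(0)
toys = {ch: rng.normal(size=(50, 16)) for ch in ("time", "structure", "environment", "semantic")}
query = {ch: rng.normal(size=16) for ch in ("time", "structure", "environment", "semantic")}
print(retrieve_top_k(query, toys, top_k=5))
```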

Thank you for your suggestions, and we will incorporate these clarifications into the final version.


Q1: Why do we need the Toy Graph Augmentation Strategy?

A3: We have provided an analysis and ablation study to illustrate the importance of augmentations. Due to word restrictions, please refer to global response A1 and Rebuttal PDF Table 1.


Q2: Why is noise-based graph prompting required?

A4: The necessity arises because ensuring the high quality of the graph vector base is challenging, which heavily depends on the quality of external knowledge. In RAGraph, we introduce Noise-based Graph Prompting Tuning (outlined in Section 4.3.3) to address this challenge. Due to word restriction, please refer to global response A3.


Q3: Does RAGraph still have zero-shot inference ability?

A5: Your suggestion is very insightful. To assess whether the fine-tuned model maintains zero-shot inference capability when tested without retrieved knowledge, we conducted experiments comparing PRODIGY and RAGraph:

Results in Rebuttal PDF Table 5 indicate that performance without retrieved knowledge decreases compared to the knowledge-injected setting. However, compared to PRODIGY, RAGraph is more robust, and we argue that the trained model still retains its zero-shot inference capability.

Comment

Dear Reviewer v5NH,

We would like to express our gratitude for taking the time to review our paper and provide us with valuable comments. We understand that you have busy schedules and apologize for any inconvenience caused by our urging letter.

With only seven hours left in the discussion period, we hope to receive feedback from the reviewers: did our response address your concerns, and what can we do to further improve our score?

Thank you for your consideration, and we look forward to receiving your feedback soon.

Best regards,

Authors of the paper 8566

Official Review

Rating: 5

The paper proposes RAGraph, a framework that enhances GNNs with RAG. RAG allows GNNs to utilize unseen data by retrieving relevant information. Extensive experimental results show the effectiveness of RAGraph.

Strengths

S1. A general and flexible framework.

S2. Extensive experiments on various tasks (node, edge, and graph level).

S3. Structured and comprehensive discussion (e.g. details of experiments and comparison with PRODIGY).

Weaknesses

W1. The diversity of datasets needs to be improved. For link prediction, the paper is evaluated mostly on e-commerce data; knowledge graph tasks should also be considered. For node classification, both homophilic datasets (e.g., Cora, Arxiv) and heterophilic datasets should be evaluated.

W2. Lack of large scale experiments, e.g. those OGB node/link/graph datasets.

W3. The model seems complicated with a big hyperparameter search space, e.g. the weights of time, structure, environment, and semantic similarities in Eq. 1.

Questions

NA

Limitations

NA

Author Response

Thank you for your insightful comments. We address your concerns and answer your questions below.

W1: The diversity of datasets needs to be improved. For link prediction, the paper is evaluated mostly on e-commerce data; knowledge graph tasks should also be considered. For node classification, both homophilic datasets (e.g., Cora, Arxiv) and heterophilic datasets should be evaluated.

A1: Thank you very much for your suggestions.

  • For the link prediction dataset: we have incorporated the link prediction results on the temporal knowledge graph ICEWS [1] with the backbone SPA [5].

  • For the homophilic node classification datasets: we have introduced the OGBN-Arxiv [2] and Cora [3] datasets for the homophilic setting with backbone GCN.

  • For the heterophilic node classification dataset: we have included the OGBN-MAG [4] graph, which contains 244,160,499 nodes and 1,728,364,232 edges with backbone R-GCN.

The experimental results of the RAGraph and baselines in Table 6 of the Rebuttal PDF demonstrate the superiority of the RAGraph, aligning with our analysis and conclusions presented in Section 5 Experiment.

[1] https://www.lockheedmartin.com/en-us/capabilities/research-labs/advanced-technology-labs/icews.html

[2] https://ogb.stanford.edu/docs/nodeprop/#ogbn-arxiv.

[3] https://paperswithcode.com/dataset/cora.

[4] Microsoft Academic Graph: when experts are not enough. Quantitative Science Studies. (2020). https://ogb.stanford.edu/docs/lsc/mag240m/.

[5] Search to Pass Messages for Temporal Knowledge Graph Completion. In ACL 2022.


W2: Lack of large-scale experiments.

A2: Thanks for your suggestions! We have conducted node classification (OGBN-MAG) tasks on large-scale graphs as detailed in A1 to W1 and shown in Table 6 of the Rebuttal PDF.

We will add these experimental results to the final version. In addition, regarding the reproducibility of the experiments, we have also updated the evaluation code for this sub-test in the anonymous GitHub repository.


W3: The model seems complicated with a big hyperparameter search space.

A3: Thank you for raising this question. Indeed, the challenge of dealing with a large search space is inherent and formidable in most deep-learning models. In our RAGraph, for the sensitive hyperparameters $k$ and $topK$, we have presented sensitivity analyses in Figure 3. For less sensitive hyperparameters, we observed near-optimal performance across a broad spectrum of values and employed Bayesian optimization using Optuna (https://github.com/optuna/optuna) to address hyperparameter tuning.

To mitigate the difficulty of manually fine-tuning hyperparameters for each dataset, we adopted a Bayesian optimization technique that concurrently optimizes hyperparameters across multiple datasets on the validation set [6] [7]. This approach automates and streamlines the hyperparameter tuning process across various datasets in RAGraph, eliminating the need for individual dataset-specific fine-tuning, and enhancing both generalization capability and resource utilization.
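For concreteness, a minimal Optuna sketch of this kind of multi-hyperparameter tuning is shown below; the objective, the search ranges, and the placeholder validation function are illustrative assumptions, not the actual RAGraph training code.

```python
import optuna

def validation_score(params):
    # Placeholder: train/evaluate the model on the validation split(s) with
    # `params` and return the averaged validation metric. A toy quadratic here.
    return -sum((v - 0.3) ** 2 for v in params.values())

def objective(trial):
    params = {
        "alpha": trial.suggest_float("alpha", 0.0, 1.0),
        "lambda": trial.suggest_float("lambda", 0.0, 1.0),
        "gamma": trial.suggest_float("gamma", 0.0, 1.0),
        "w1": trial.suggest_float("w1", 0.0, 1.0),
        "w2": trial.suggest_float("w2", 0.0, 1.0),
        "w3": trial.suggest_float("w3", 0.0, 1.0),
        "w4": trial.suggest_float("w4", 0.0, 1.0),
    }
    return validation_score(params)

study = optuna.create_study(direction="maximize")  # TPE sampler by default
study.optimize(objective, n_trials=100)
print(study.best_params)
```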

[6] Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In NeurIPS.

[7] Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., & de Freitas, N. (2016). Taking the human out of the loop: A review of Bayesian optimization. In Proceedings of the IEEE.

Comment

I've carefully read the rebuttal and appreciate the authors' efforts in addressing the weaknesses I identified.

Regarding W1, while I acknowledge the additional experiments, since academic graphs like MAG are known to be homophilic, the experiments still predominantly reflect a homophilic setting. For W2, I appreciate the inclusion of additional experiments, and the results are indeed convincing. However, for W3, the authors did not directly address my concern.

Given that the major concerns are still valid and my score is already positive, I'm keeping my original score.

Comment

Dear Reviewer sivC,

We would like to express our gratitude for taking the time to review our paper and provide us with valuable comments. We understand that you have busy schedules and apologize for any inconvenience caused by our urging letter.

With only 7 hours left in the discussion period, we hope to receive feedback from the reviewers: did our response address your concerns, and what can we do to further improve our score?

Thank you for your consideration, and we look forward to receiving your feedback soon.

Best regards,

Authors of the paper 8566

Comment

Dear Reviewer sivC,

Thank you for your careful reading of our rebuttal and for providing constructive feedback. We greatly appreciate the time and effort you have invested in evaluating our manuscript.

For Weakness 3, we apologize for not providing a direct response in our initial rebuttal. We have now prepared a more detailed analysis.

Firstly, the extensive hyperparameter search space in RAGraph is not merely a complexity for its own sake; it also facilitates the integration of more dimensions of information, such as $w_1\sim w_4$ balancing the importance of different similarities and $\gamma$ weighing the significance of task-specific output vectors versus hidden embeddings. This, in turn, enhances RAGraph's performance across a range of tasks, including node-level, edge-level, and graph-level scenarios. By accommodating a comprehensive search space, our model can effectively capture the intricate relationships between nodes and edges, leading to superior performance across diverse tasks.

Secondly, aside from the sensitive parameters $k$ and $topK$, for which we have already supplemented sensitivity experiments in Figure 3, and apart from the augmentation number $K=50$, we also attempt to remove $\alpha, \lambda, \gamma, w_1\sim w_4$ and conduct additional experiments to assess the impact of these hyperparameters, denoted as "w/o hyperparameters":

| Methods | PROTEINS (5-shots) Node Level | PROTEINS (5-shots) Graph Level | ENZYMES (5-shots) Node Level | ENZYMES (5-shots) Graph Level | TAOBAO Recall | TAOBAO nDCG | AMAZON Recall | AMAZON nDCG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RAGraph/NF | 40.27 | 55.16 | 48.41 | 25.83 | 22.13 | 21.40 | 18.11 | 05.94 |
| RAGraph/NF + w/o hyperparameters | 39.86 | 54.82 | 47.39 | 25.50 | 22.08 | 21.32 | 17.95 | 05.88 |

The results demonstrate that even without these hyperparameters, RAGraph still exhibits strong performance, with only a slight decrease compared to the tuned settings. This robustness underscores the strength of our model’s design, highlighting its ability to maintain high performance across varying conditions. Consequently, these hyperparameters perform well under a broad range of search spaces. This also further validates that when employing optimization algorithms like Bayesian optimization, RAGraph does not require as extensive a search space compared to traditional methods like grid search.

Thirdly, the challenge of dealing with a large search space is inherent and formidable in most deep-learning models. In order to expedite the process of finding the most suitable hyperparameters while reducing search space, we utilized Optuna [2], an advanced hyperparameter optimization framework. Optuna employs an efficient sampling algorithm that significantly reduces the size of the search space by intelligently selecting candidate hyperparameters based on their potential impact on model performance. Unlike grid search or random search, Optuna dynamically prunes unpromising trials, focusing on the most promising regions of the search space, thus accelerating the overall training and improving the efficiency of our experiments.

Lastly, to empirically validate the effectiveness of Optuna, we conducted experiments using the ACM dataset [1] and set the hyperparameters $\alpha, \lambda, \gamma, w_1\sim w_4$ in the range [0, 1]. We compared the results of Grid Search and Optuna under the same settings. For Grid Search, with each parameter taking 2 candidate values, we explored a total of 2^7 combinations. With the HAN backbone taking nearly 1.2 seconds per epoch and running for 10 epochs, the total time for Grid Search was 27 minutes and 39 seconds. In contrast, using Optuna with 100 trials under the same settings, the total time was reduced to 4 minutes and 28 seconds. This empirical analysis demonstrates that Optuna significantly reduces the search space of RAGraph and accelerates the hyperparameter optimization process.

We hope these additional points will further clarify your questions. If you have any further questions, please do not hesitate to contact us. We are committed to improving the manuscript and ensuring that all concerns are adequately addressed. Thank you again for your valuable suggestions and for maintaining a positive assessment of our work.

[1] Wang X, Ji H, Shi C, Wang B, Ye Y, Cui P, Yu PS. Heterogeneous graph attention network. In WWW 2019.

[2] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019. Optuna: A Next-generation Hyperparameter Optimization Framework. In KDD.

Best regards,

Authors of the paper 8566

Comment

Dear Reviewer sivC,

Thank you for your careful reading of our rebuttal and for providing constructive feedback. We greatly appreciate the time and effort you have invested in evaluating our manuscript.

Regarding Weakness 1, it appears there may have been a misunderstanding.

We adhere to the definition of a heterophilic graph as outlined in the HAN paper [1] (Introduction, "Heterogeneity of graph"), which describes it as "a heterogeneous graph is a special kind of information network containing either multiple types of entities or multiple types of links". In our rebuttal experiments, the Microsoft Academic Graph (MAG) [2] contains four types of entities: papers, authors, institutions, and fields of study, as well as four types of relationships that connect these entities: an author is "affiliated with" an institution, an author "writes" a paper, a paper "cites" another paper, and a paper "has a topic of" a field of study. Thus, the MAG dataset fulfills the criteria for heterophilic settings.

Additionally, in our experiments, we utilized dynamic bipartite graphs such as TAOBAO, KOUBEI, and AMAZON, which also contain two types of entities and are classified as heterogeneous graphs. Furthermore, we also conducted experiments on a more heterogeneous knowledge graph, ICEWS, as part of our rebuttal.

Moreover, we included additional experiments using more widely recognized heterogeneous graph datasets, as employed in HAN [1], specifically IMDB (three types of entities and two types of relationships) and ACM (three types of entities and two types of relationships). We strictly follow the experimental configuration of HAN and use HAN as the backbone model:

| Methods | ACM Micro F1 | ACM Macro F1 | IMDB Micro F1 | IMDB Macro F1 |
| --- | --- | --- | --- | --- |
| Backbone (HAN) | 87.48 | 87.50 | 57.93 | 53.82 |
| PRODIGY/NF | 86.73 | 86.69 | 57.24 | 52.36 |
| PRODIGY/FT | 87.61 | 87.63 | 58.20 | 53.97 |
| RAGraph/NF | 87.58 | 87.54 | 58.16 | 53.95 |
| RAGraph/FT | 87.77 | 87.80 | 58.49 | 54.30 |
| RAGraph/NFT | 88.14 | 88.16 | 58.55 | 54.42 |

The results from these datasets also demonstrate the effectiveness of our approach. We will add these experiment results to the final version.

[1] Wang X, Ji H, Shi C, Wang B, Ye Y, Cui P, Yu PS. Heterogeneous graph attention network. In WWW 2019.

[2] Kuansan Wang, Zhihong Shen, Chiyuan Huang, Chieh-Han Wu, Yuxiao Dong, and Anshul Kanakia. Microsoft academic graph: When experts are not enough. In Quantitative Science Studies 2020.

Best regards,

Authors of the paper 8566

Official Review

Rating: 6

This paper aims to leverage the retrieval-augmented generation method to improve the generalisation capability of pretrained graph neural networks (GNNs) on unseen data. To this end, a framework named RAGRAPH is proposed. RAGRAPH first constructs a toy graph vector library (key-value pairs) by chunking from resource graphs, where the keys store some features of the master nodes of the toy graphs and the values are the corresponding node representations and task-specific outputs. When making predictions for unseen nodes/graphs, RAGRAPH retrieves the top-$K$ most similar toy graphs from the vector library and augments the nodes/graphs with the retrieved toy graphs. Moreover, to mitigate the challenge of retrieving related but irrelevant graphs, the paper also proposes a prompt tuning method to finetune the pretrained GNNs. The main idea of this prompt tuning method is to explicitly inject noise into the retrieved toy graphs during the finetuning stage.

Strengths

  1. The paper proposes a novel framework, RAGRAPH, which leverages RAG methods to enhance the generalisation capability of pretrained GNNs.
  2. The paper is well-structured and easy to follow.
  3. Experimental results on three graph learning tasks, i.e., node classification, graph classification and link prediction, demonstrate the effectiveness of the proposed RAGRAPH framework.
  4. The code is provided for reproducibility.

Weaknesses

  1. The RAGRAPH framework is quite complicated, consisting of several steps in the pipeline. However, some design choices within the pipeline are not properly justified. For example:

(1) When constructing the toy graph vector library (section 4.1), RAGRAPH uses an inverse importance sampling strategy and some augmentation strategies to sample toy graphs to be stored in the library. It is currently unclear if we actually need all these steps to achieve satisfactory performance and how different choices would affect the final performance.

(2) In the toy graph retrieval process (section 4.2), RAGRAPH uses four different sources of similarity, i.e., time, structure, environment and semantic similarities, to compute the similarity between the centre node in the query graph and the master nodes in the toy graphs. It is also unclear whether all four similarities contribute to the performance improvement. Moreover, it is unclear how to set the different weights for these four similarities.

(3) In the knowledge fusion layer (section 4.3.2), RAGRAPH fuses the Decoder output with the aggregated task-specific output vector, which is obtained from the retrieved toy graphs, to make final prediction. However, it is unclear if this fusion method can outperform using the Decoder output only or the aggregated task-specific output vector only.

Therefore, I would recommend conducting some ablation studies to justify the specific choices in the RAGRAPH framework.

  2. There are lots of hyperparameters in the RAGRAPH framework, such as the balance weight $\alpha$, the scaling constant $K$ in toy graph augmentation, the hyperparameters in different augmentation methods, the reweighting hyperparameter $\gamma$, etc. However, it is unclear how to set these hyperparameters to obtain the best performance. Additionally, the specific settings of these hyperparameters used in the experiments are also absent from the paper.

  3. In line 332-333, the paper states that “RAGRAPH outperforms all the baselines across the three graph tasks”. However, this is inaccurate given that RAGRAPH does not always obtain the best performance in the link prediction task (see Table 2).

  4. I am confused about the analyses in line 348-353. The paper states that “PRODIGY/NF and RAGRAPH/NF are inferior to Vanilla/NF, indicating that …”. However, the experimental results in Table 1 and Table 2 actually indicate that both PRODIGY/NF and RAGRAPH/NF outperform Vanilla/NF in almost all the cases.

Questions

  1. What are the justifications for the specific design choices in the proposed RAGRAPH framework? Can they all contribute to the performance improvement?
  2. How to set the hyperparameters in RAGRAPH? It is unclear whether there is a general setting that can achieve decent performance across different datasets, or if the hyperparameters need to be adapted for each specific dataset.
  3. Can you provide some qualitative analyses to provide some insights regarding why retrieving toy graphs can help improve the performance?

Limitations

The authors have discussed limitations and broader impacts in their paper.

Author Response

Thanks for your insightful comments. We address your concerns and answer the questions below.

W1 & Q1: Unclear contribution of (a) inverse importance sampling, (b) augmentation, (c) the four similarities, and (d) the fusion method to the performance improvement.

A1: We conducted four ablation experiments on both node and graph classification tasks under the settings in Appendix C.4, with the following variants obtained by removing: (a) the inverse importance sampling strategy (wo IIS); (b) augmentation (wo AUG); (c) any one of the four similarities (wo Time, wo Structure, wo Environment, wo Semantic); and (d) only using X (masking the task-specific output vector, w X) or only using Y (masking the decoder, w Y):

(a) The adoption of the Inverse Importance Sampling strategy is crucial. In RAGraph, subgraphs are sampled as toy graphs, where nodes with higher degrees (non-long-tail knowledge, extensively learned and embedded into GNN parameters) are more frequently included in subgraphs due to their extensive connections with neighbors, resulting in higher frequency in the toy graph base [1]. Conversely, nodes with low degrees (long-tail knowledge) are more important but tend to be ignored. To mitigate this issue, we prioritize nodes with lower degrees when sampling so as to capture long-tail knowledge.

The ablation results in Rebuttal PDF Table 2 show that w (with) IIS significantly outperforms wo (without) IIS.

(b) Furthermore, regarding the rationale for conducting augmentation, due to word restriction, please refer to global response A1; the ablation results are in Rebuttal PDF Table 1.

(c) In practical applications, the four similarities all contribute to performance improvement and we state the significance as follows:

  • Time information is crucial to predict future states or trends [2] via node history; e.g., in social networks, analyzing historical user interactions aids in predicting future behaviors.

  • Structure pertains to how nodes are interconnected and overall graph topology, vital for capturing similar graph structure patterns [3]. In transportation networks, factories are always located on the outer ring of the city, sharing similar structural connectivity, aiding in the discovery of spatio-temporal patterns [4].

  • Sharing similar neighborhoods is essential for evaluating node similarity and correlation. In recommendations, shared purchase histories between users and products indicate potential interests, akin to collaborative filtering [5].

  • Semantic information measures similarity based on features [6]. In knowledge graphs, identifying relevant subgraphs to query nodes enhances retrieval accuracy based on semantic similarity.

The ablation results in Rebuttal PDF Table 3 indicate that each type of similarity has a positive impact.

(d) Fusion and decoder here represent one of the core contributions of RAGraph:

  • Overall Task Perspective: For the same task, the decoder can be directly employed to obtain outputs. For different tasks, the decoder can be masked so that pre-computed embeddings are used without training, or it can be tuned to better adapt. This underscores our primary contribution, where the decoder functions as a versatile "plug-and-play" and "tune-free" component.

  • Integral Fusion Strategy: The fusion strategy facilitates concurrent information propagation from the toy graphs' X (hidden embeddings) and Y (task-specific output vectors) to the query graph, aligning with our secondary contribution.

Experimental results in Rebuttal PDF Table 4 show that the effect brought by fusion is substantial, and masking either X or Y degrades model performance.

[1] Walking in Facebook: A case study of unbiased sampling of OSNs. In SIGCOMM 2010.

[2] GraphPro: Graph Pre-training and Prompt Learning for Recommendation. In WWW 2024.

[3] Position-aware Graph Neural Networks. In ICML 2019.

[4] Spatiotemporal Multi-Graph Convolution Network for Ride-Hailing Demand Forecasting. In AAAI 2019.

[5] A Survey of Collaborative Filtering Techniques. In AAAI 2009.

[6] PRODIGY: Enabling In-context Learning Over Graphs. In NeurIPS 2023.


W2 & Q2: Unclear how to set hyperparameters.

A2: For the sensitive hyperparameters $k$ and $topK$, we have presented and analysed them in Figure 3. For less sensitive hyperparameters, we observed near-optimal performance across a broad spectrum of values, and employed Bayesian optimization using Optuna (https://github.com/optuna/optuna) for hyperparameter tuning.

  • Bayesian Optimization: To mitigate the difficulty of manually fine-tuning hyperparameters for each dataset, we adopted a Bayesian optimization technique that concurrently optimizes hyperparameters across multiple datasets on the validation set [7] [8]. This approach automates and streamlines the hyperparameter tuning process across various datasets in RAGraph, eliminating the need for individual dataset-specific fine-tuning, and enhancing both generalization capability and resource utilization. We apologize for omitting these details from the paper and will supplement them in the main text.

  • Hyperparameter Settings: The hyperparameter configuration in our study is as follows: $k$ is set to 2, $topK$ is set to 5, and $\alpha=\lambda=\gamma=0.5$, $K=50$, $w_1=w_2=w_3=0.1$, $w_4=0.6$, which can be found in Appendix C.4.
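For convenience, these reported defaults can be collected into a single configuration object (a restating sketch only; the key names are illustrative and not the repository's actual config schema):

```python
RAGRAPH_DEFAULTS = {
    "k": 2,            # hop count for toy graph chunking
    "topK": 5,         # number of retrieved toy graphs
    "alpha": 0.5,
    "lambda": 0.5,
    "gamma": 0.5,      # fusion weight in Eq. (6)
    "K": 50,           # augmentation scaling constant
    "w1": 0.1, "w2": 0.1, "w3": 0.1, "w4": 0.6,  # similarity weights in Eq. (1)
}
```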

[7] Practical Bayesian optimization of machine learning algorithms. In NeurIPS 2012.

[8] Taking the human out of the loop: A review of Bayesian optimization. In Proceedings of the IEEE, 2016.


W3: Two inaccurate writings.

A3: Thanks for pointing this out. We will correct the two inaccurate statements in the final version by replacing (1) "all" with "almost all" and (2) "inferior" with "better". Additionally, we have thoroughly reviewed the manuscript for any other potential typographical errors to ensure accuracy.


Q3: Qualitative analyses of toy graph retrieval.

A4: Due to word restriction, please refer to the global response A2 and the Rebuttal PDF Figure 1.

Comment

Dear Reviewer 3qQ8,

We would like to express our gratitude for taking the time to review our paper and provide us with valuable comments. We understand that you have busy schedules and apologize for any inconvenience caused by our urging letter.

With only 1 day left in the discussion period, we hope to receive feedback from the reviewers: did our response address your concerns, and what can we do to further improve our score?

Thank you for your consideration, and we look forward to receiving your feedback soon.

Best regards,

Authors of the paper 8566

Comment

Thanks to the authors for providing the responses, which address my major concerns regarding the effectiveness of each component. I have updated my score accordingly.

Comment

Dear Reviewer 3qQ8,

We are delighted to hear that your concerns have been satisfactorily answered! Thank you once again for recognizing the contributions of our work.

Best regards,

Authors of the paper 8566

Official Review

Rating: 6

Summary:

The paper presents RAGRAPH, a pioneering framework that integrates Retrieval-Augmented Generation (RAG) with pretrained Graph Neural Networks (GNN) to bolster their generalizability on unseen graph data. The authors construct a toy graph vector library capturing key attributes, which aids in the retrieval of analogous graphs to enrich the learning context during inference. RAGRAPH demonstrates superior performance over existing methods in tasks such as node classification, link prediction, and graph classification, showcasing its adaptability and robustness across diverse datasets without the need for fine-tuning.

Contributions:

  1. The introduction of RAGRAPH, the first of its kind to merge RAG techniques with pre-trained GNNs, offering a significant leap in model generalization.

  2. The work creates a novel library that stores key graph attributes, facilitating the retrieval of similar graphs to enhance learning.

  3. The proposed RAGRAPH outperforms state-of-the-art methods across multiple graph learning tasks, highlighting its effectiveness. Besides, the framework maintains high performance across different tasks and datasets, emphasizing its robustness.

Strengths

  1. RAGRAPH's strength lies in its innovative use of retrieval mechanisms to enhance graph learning tasks. By retrieving and integrating external graph data, it effectively broadens the context for learning, leading to improved performance and generalization.

  2. The whole process is reasonable and solid, with clear presentation.

  3. The ability to maintain high performance across various datasets without task-specific fine-tuning is a significant strength. This adaptability makes RAGRAPH a robust choice for diverse real-world applications. Moreover, RAGRAPH's design as a plug-and-play module allows for seamless integration with pre-trained GNNs.

Weaknesses

I have only one concern about the cons of this work:

The major weakness is the difficulty of constructing and maintaining a high-quality graph vector base for different tasks, since, in my experience, the performance is highly dependent on the quality of the external knowledge. This may limit the application of the proposed method in more diverse real-world cases.

Questions

See the concern above.

Limitations

NA

Author Response

Thank you for your insightful comments. We address your concerns and answer your questions below.

W1: The difficulty to construct and maintain high-quality and diverse graph vector base for different tasks.

A1: We acknowledge the challenge you mentioned of constructing and maintaining high-quality graph vector bases tailored to diverse tasks. In RAGraph, our toy graph base largely leverages datasets from significant prior research on pre-trained GNNs [1] [2] [3] [4], which are meticulously curated and cover diverse domains, such as biology, chemistry, medicine, and recommendation tasks. For example, the PROTEINS dataset [5], derived from cryo-electron microscopy and X-ray crystallography, and the ENZYMES dataset [6], based on EC enzyme classification, are meticulously annotated by medical experts.

Moreover, to address inherent challenges in data quality, we introduce Noise-based Graph Prompting Tuning (Section 4.3.3). This method involves fine-tuning the model with artificially introduced noisy toy graphs (Inner-Toy-Graph Noise & Toy-Graph Noise), inspired by noise-tuning techniques in NLP [7] [8] [9]. Our approach enhances the model's robustness against real-world retrieval noise, as evidenced by superior performance compared to traditional tuning methods (in Main Text Tables 1 and 2). This approach reduces the stringent requirement for an exceptionally high-quality graph vector base, thereby ensuring robust performance across various tasks within our RAGraph, and significantly mitigating data quality impacts.

Lastly, to verify the diversity of applications for RAGraph, we also conducted experiments on the time-series encyclopedia TGB Wiki, the paper citation datasets Arxiv and Cora, and the large-scale graph MAG in Rebuttal PDF Table 6.

The experimental results demonstrate that RAGraph can be applied in diverse real-world cases. In addition, the effect of noise-based fine-tuning is better than that of direct fine-tuning, which also proves the effectiveness of our NFT approach, effectively addressing the inherent challenge in data quality.

[1] Liu, Z., Yu, X., Fang, Y., & Zhang, X. 2023. GraphPrompt: Unifying Pre-Training and Downstream Tasks for Graph Neural Networks. In WWW.

[2] Xia, L., Kao, B., & Huang, C. (2024). OpenGraph: Towards Open Graph Foundation Models. arXiv preprint.

[3] Yu, X., Zhou, C., Fang, Y., & Zhang, X. (2023). MultiGPrompt for Multi-Task Pre-Training and Prompting on Graphs. In WWW.

[4] Huang, Q., Ren, H., Chen, P., Kržmanc, G., Zeng, D., Liang, P., & Leskovec, J. (2023). PRODIGY: Enabling In-context Learning Over Graphs. In NeurIPS.

[5] Borgwardt, K. M., Ong, C. S., Schönauer, S., Vishwanathan, S. V. N., Smola, A. J., & Kriegel, H.-P. (2005). Predicting protein function through graph kernels. In Bioinformatics.

[6] Wang, S., Dong, Y., Huang, X., Chen, C., & Li, J. (2022). FAITH: Few-shot graph classification with hierarchical task graphs. In IJCAI.

[7] Yoran, O., Wolfson, T., Ram, O., & Berant, J. (2024). Making Retrieval-Augmented Language Models Robust to Irrelevant Context. In ICLR.

[8] Fang, F., Bai, Y., Ni, S., Yang, M., Chen, X., & Xu, R. (2024). Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training. In ACL.

[9] Cuconasu, F., Trappolini, G., Siciliano, F., Filice, S., Campagnano, C., Maarek, Y., Tonellotto, N., & Silvestri, F. (2024). The Power of Noise: Redefining Retrieval for RAG Systems. In arxiv.

Comment

Dear Reviewer va3W,

As the discussion phase is ending tomorrow, we extend our gratitude once more for your valuable and insightful comments!

We have provided careful and detailed responses to all your questions. It would be greatly appreciated if you could kindly let us know whether we have answered all your questions. Please also kindly let us know if you have any further questions, and we would like to try our best to resolve them before the deadline.

Best regards,

Authors of the paper 8566

Comment

Dear authors,

Thanks for your rebuttal. I think your replies show the effort needed to address the challenge I raised in my review. It does make sense to me. I believe more effort will be needed when applying the method to problems beyond academic graphs. Therefore, I would like to keep my rating unchanged.

Best, Reviewer

Comment

Dear Reviewer va3W,

Thank you for your careful reading of our rebuttal and for providing constructive feedback. We are delighted to know that your main concerns have been resolved! We greatly appreciate the time and effort you have invested in evaluating our manuscript.

Regarding your remaining concern, we believe there may have been a misunderstanding. In our experiments, we utilized a diverse set of datasets, including academic datasets such as PROTEINS, BZR, Cora, IMDB, and ACM, and large-scale datasets like MAG, as supplemented in the rebuttal. In addition to these, we employed several industrial graph datasets, including TAOBAO (from the largest online shopping platform in China, https://ali-home.alibaba.com/en-US/about-alibaba), KOUBEI (from the large consumption platform, https://www.koubei.com/), and AMAZON (from the largest online shopping platform in the USA, https://www.amazon.com/).

On large-scale dynamic recommendation graph data—specifically TAOBAO, KOUBEI, and AMAZON—our performance significantly outperforms that of GraphPro [1]. GraphPro has already been successfully deployed on a large-scale online platform for dynamic streaming data (as referenced in Section 4.5 of GraphPro), where it has achieved notable improvements in CTR prediction performance. In contrast to GraphPro, our algorithm not only delivers superior performance but also offers faster inference speed.

To prove this point, we conducted training and inference on 204,168 nodes in the dynamic TAOBAO dataset, and calculated their time consumed under the same settings:

| Aspect | GraphPro | RAGraph | Efficiency Improvement |
| --- | --- | --- | --- |
| Training | 21.87s | 19.14s | 1.143× |
| Inference | 12.11s | 10.08s | 1.201× |

As illustrated in the table, this advantage stems largely from RAGraph's use of k-hop subgraphs (query graphs) for inference, whereas GraphPro requires inference over the entire graph. Furthermore, RAGraph's capability to directly retrieve historical evaluation data without the need for model fine-tuning enables it to achieve superior results, as demonstrated by the comparison between GraphPro/NF and Vanilla/NF in Table 2 of the main paper.

Therefore, in terms of both performance and efficiency, our model outperforms GraphPro, which has been deployed online, on these three large-scale dynamic industrial graph datasets. In addition, our toy graph base can be cached to accelerate deployment in industrial settings.

Given the proven success of GraphPro in online deployment, we are optimistic about the future application of RAGraph in the industrial GNN field, i.e., in the recommendation system. We also anticipate that RAGraph could potentially achieve notable results in RAG within NLP, leveraging the potential success of large graph models.

Lastly, please kindly let us know if you have any further questions or what we can do to further improve our score; we will try our best to resolve them before the deadline.

[1] GraphPro: Graph Pre-training and Prompt Learning for Recommendation. In WWW 2024.

Best regards,

Authors of the paper 8566

Comment

Dear authors,

Sorry for the misunderstanding about the provided real-world datasets beyond academic graphs. Then the additional effort needed may not be more than what you proposed in your rebuttal. I would like to keep my rating unchanged.

Best, Reviewer

Comment

Dear Reviewer va3W,

Thank you for your thoughtful feedback and for taking the time to review our rebuttal. We are delighted to know that your main concerns and the misunderstanding have been resolved! We greatly appreciate the time and effort you have invested in evaluating our manuscript.

We understand that you have decided to keep your rating unchanged, and we respect your decision. However, we would like to kindly ask you to reconsider our contributions and the efforts we have put into addressing your concerns and improving our work.

In our revised manuscript, we have not only clarified the use of a diverse set of datasets, including large-scale, real-world datasets from industrial applications, but we have also demonstrated significant performance improvements over existing methods. Specifically, our approach shows clear advantages in terms of efficiency and scalability, particularly in handling large industrial graphs—a crucial aspect in real-world scenarios that we believe align closely with the goals of our field as suggested by you.

Furthermore, we have incorporated additional experiments and analyses to provide a more comprehensive evaluation of our method’s effectiveness. These efforts were made to ensure that our work contributes meaningfully to both the academic community and industry applications.

We understand the importance of maintaining rigorous standards, and we are committed to further refining our work based on your guidance. We would be grateful if you could reconsider your rating in light of these additional efforts and the potential impact of our contributions.

Thank you once again for your time and consideration.

Best regards, Authors of the paper 8566

Comment

Dear authors,

I appreciate your efforts in this work; however, based on my experience in this domain, RAGraph is not that big or novel an idea in terms of research contributions. The major contributions of this work, in my opinion, come from the technical and experimental parts, for which I gave a weak accept. I would like to keep my rating unchanged.

Best, Reviewer

Comment

Dear Reviewer va3W,

Thank you for your thoughtful feedback and for taking the time to review our rebuttal. We greatly appreciate the time and effort you have invested in evaluating our manuscript.

Regarding novelty, we believe that the introduction of RAGraph represents a meaningful step forward in the GNN field by innovatively incorporating external graph data to enhance generalization in unseen scenarios. While the concept might not seem groundbreaking in isolation, the framework's ability to dynamically retrieve and integrate relevant data during inference, without the need for fine-tuning, highlights its adaptability and potential for wide-ranging applications. We are particularly optimistic about its future impact, especially in areas like recommendation systems and potentially in RAG within NLP as large graph models continue to develop.

We sincerely thank you again for recognizing the strengths of our technical and experimental work, and we are grateful for your evaluation of our manuscript.

Best regards,

Authors of the paper 8566

Author Response

We would like to express our gratitude to all reviewers for their insightful comments and acknowledging the strengths of RAGraph. We have addressed all the concerns raised and provided comprehensive answers in this rebuttal.

In the attached PDF, we present:

(1) More ablation experiments mentioned by Reviewer 3qQ8 & Reviewer v5NH;

(2) Added datasets experiments pointed out by Reviewer sivC;

(3) Qualitative analyses of the retrieved toy graphs suggested by Reviewer 3qQ8 & Reviewer v5NH.


Regarding commonly asked questions: (1) the effect of augmentation; (2) qualitative analyses of toy graph retrieval; and (3) the effect of Noise-based Graph Prompt Tuning, we give detailed explanations in the parts below and will add these explanations in the final version:

1. For Reviewer 3qQ8 & Reviewer v5NH: Answers to the effect of augmentation.

A1: The reasons for toy graph augmentation:

  • Expanding the toy graph base enriches the scale of the knowledge repository [1].

  • Simulating Real-World Scenarios: Real-world graphs often encounter challenges such as missing nodes [2], noisy attributes [3], and unexplored connections [4]. We introduce node dropout, noise injection, and edge removal to simulate these scenarios accurately.

  • Addressing Graph Domain Shift: To mitigate domain shift between the graph knowledge base and testing graphs, our augmentations employ Mixup techniques such as Node Interpolation and Edge Rewiring. These techniques interpolate between training samples to generate synthetic samples, effectively smoothing decision boundaries in embedding and reducing the model's sensitivity to minor variations in input data, thereby stabilizing predictions on domain shift testing samples [5].

To validate this strategy, we conducted additional experiments on node and graph classification tasks described in Appendix C.4. For simplicity, we abbreviate “Augmentation strategy” as “AUG”. The ablation results presented in PDF Table 1 indicate that w (with) AUG significantly outperforms wo (without) AUG on both node and graph classification tasks. A minimal augmentation sketch is shown below.
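A minimal sketch of the node dropout, edge removal, noise injection, and edge rewiring augmentations on a NetworkX toy graph is shown below (drop rates, noise scale, and the rewiring rule are illustrative assumptions; the Mixup-style node interpolation is omitted for brevity):

```python
import random
import numpy as np
import networkx as nx

def augment_toy_graph(toy: nx.Graph, drop_node_p=0.1, drop_edge_p=0.1,
                      feat_noise=0.05, rewire_p=0.1, seed=0):
    """Simulate real-world imperfections: missing nodes, unexplored connections,
    noisy attributes, and rewired edges."""
    rng = random.Random(seed)
    g = toy.copy()

    # Node dropout: simulate missing nodes.
    g.remove_nodes_from([n for n in list(g.nodes()) if rng.random() < drop_node_p])

    # Edge removal: simulate unexplored connections.
    g.remove_edges_from([e for e in list(g.edges()) if rng.random() < drop_edge_p])

    # Edge rewiring: reconnect a fraction of edges to random endpoints.
    nodes = list(g.nodes())
    for u, v in list(g.edges()):
        if nodes and rng.random() < rewire_p:
            g.remove_edge(u, v)
            g.add_edge(u, rng.choice(nodes))

    # Feature noise injection on the node attribute "x" (if present).
    for _, data in g.nodes(data=True):
        if "x" in data:
            data["x"] = data["x"] + np.random.normal(0, feat_noise, size=np.shape(data["x"]))
    return g
```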

[1] Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy. In TIP 2020.

[2] Incomplete Graph Learning via Attribute-Structure Decoupled Variational Auto-Encoder. In WSDM 2023.

[3] Boosting the adversarial robustness of graph neural networks: An OOD perspective. In ICLR 2024.

[4] Graph embedding techniques for predicting missing links in biological networks: An empirical evaluation. In TETP 2024.

[5] ProtoMix: Augmenting health status representation learning via prototype-based mixup. In SIGKDD 2024.


2. For Reviewer 3qQ8 & Reviewer v5NH: Qualitative analyses of toy graph retrieval: how “generation” works.

A2: We conduct qualitative analyses of how "generation" works while learning graphs through a case study in Rebuttal PDF Figure 1.

On the ENZYMES dataset, for a 3-class node classification task, consider node "13984", which belongs to class 3. If we only use the GraphPrompt backbone, the resulting prediction vector is [0.28, 0.34, 0.38].

However, since the node is of class 3, we expect the prediction to be as close as possible to the one-hot label [0, 0, 1]. In RAGraph retrieval, taking the top-3 retrieved toy graphs as examples, the connection weights of these 3 toy graphs to the query graph are 0.5, 0.7, and 0.1, respectively, and their corresponding one-hot label encodings are [0, 0, 1], [0, 0, 1], and [0, 1, 0]. Therefore, the result obtained by propagating the task-specific output vectors through the toy graphs is [0, 0.1, 1.2], and after normalization, [0, 0.08, 0.92].

Meanwhile, the vector obtained by propagating the toy graphs' hidden embeddings and passing them through the decoder is [0.37, 0.32, 0.66].
The retrieval of toy graphs notably enhances performance at both the task-specific output vector and hidden embedding levels. The final vector, obtained through a weighted sum with $\gamma=0.5$ in Eq. (6), is [0.185, 0.20, 0.79]; after normalization, the result is [0.157, 0.170, 0.673], which greatly enhances the model's discriminative ability compared to GraphPrompt's [0.28, 0.34, 0.38].
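The arithmetic above can be reproduced in a few lines of NumPy (a verification sketch of the reported numbers, not the paper's code):

```python
import numpy as np

weights = np.array([0.5, 0.7, 0.1])                       # toy-to-query connection weights
toy_labels = np.array([[0, 0, 1], [0, 0, 1], [0, 1, 0]])  # retrieved toy graph labels (Y)

propagated_y = weights @ toy_labels                        # -> [0.0, 0.1, 1.2]
propagated_y_norm = propagated_y / propagated_y.sum()      # -> approx. [0.0, 0.08, 0.92]

decoder_out = np.array([0.37, 0.32, 0.66])                 # from propagated hidden embeddings (X)
gamma = 0.5
fused = gamma * decoder_out + (1 - gamma) * propagated_y_norm   # -> approx. [0.185, 0.20, 0.79]
print(fused / fused.sum())                                 # -> approx. [0.157, 0.170, 0.673]
```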


3. For Reviewer va3W & Reviewer v5NH: The effect of Noise-based Graph Prompt Tuning.

A3: To address inherent challenges in toy graph quality, we introduce Noise-based Graph Prompting Tuning (Section 4.3.3). This method involves fine-tuning the model with artificially introduced noisy toy graphs (Inner-Toy-Graph Noise & Toy-Graph Noise), inspired by noise-tuning techniques in NLP [6] [7] [8]. Our approach enhances the model's robustness against real-world retrieval noise, as evidenced by superior performance compared to traditional tuning methods (in Main Text Tables 1 and 2). This approach reduces the stringent requirement for an exceptionally high-quality graph vector base, thereby ensuring robust performance across various tasks within our RAGraph, and significantly mitigating data quality impacts.
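A schematic sketch of this tuning loop is given below; the function names, the specific noise injections, and the loss are illustrative assumptions rather than the paper's implementation.

```python
import torch

def noise_based_prompt_tuning(model, prompt_params, train_batches, retrieve_fn,
                              noise_std=0.05, epochs=5, lr=1e-3):
    """Fine-tune only the graph prompt parameters while deliberately corrupting
    the retrieved toy graphs, so the model learns to tolerate noisy retrieval."""
    optimizer = torch.optim.Adam(prompt_params, lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()

    for _ in range(epochs):
        for query_graph, labels in train_batches:
            toy_embs, toy_outputs = retrieve_fn(query_graph)   # top-K retrieved knowledge (X, Y)

            # Inner-Toy-Graph Noise: perturb the retrieved hidden embeddings.
            toy_embs = toy_embs + noise_std * torch.randn_like(toy_embs)

            # Toy-Graph Noise: with some probability, replace one retrieved toy
            # graph's knowledge with a random distractor to mimic retrieving a
            # related-but-irrelevant toy graph.
            if torch.rand(1).item() < 0.3:
                idx = torch.randint(0, toy_embs.size(0), (1,)).item()
                distractor = torch.randn_like(toy_embs[idx]).unsqueeze(0)
                toy_embs = torch.cat([toy_embs[:idx], distractor, toy_embs[idx + 1:]], dim=0)

            logits = model(query_graph, toy_embs, toy_outputs)
            loss = loss_fn(logits, labels)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```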

Lastly, to verify the diversity of applications for RAGraph, we also conducted experiments on the time-series knowledge graph ICEWS (crisis warning), the paper citation datasets Arxiv and Cora, and the large-scale graph MAG to test Noise-based Graph Prompt Tuning in Rebuttal PDF Table 6. Experiments show RAGraph's applicability in diverse real-world cases. In addition, the effect of noise-based fine-tuning is better than that of direct fine-tuning, which further demonstrates the effectiveness of the NFT approach in tackling the inherent challenges related to data quality.

[6] Making Retrieval-Augmented Language Models Robust to Irrelevant Context. In ICLR 2024.

[7] Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training. In ACL 2024.

[8] The Power of Noise: Redefining Retrieval for RAG Systems. In arxiv 2024.


The attached Rebuttal PDF contains the supplementary experiments and figures.

Comment

Dear AC, Senior AC, NeurIPS PCs, and Reviewers,

I hope this message finds you well. As the review process for our paper, [RAGraph: A General Retrieval-Augmented Graph Learning Framework] comes to a close, I wanted to take a moment to express my sincere gratitude to you and the reviewers for your time, effort, and valuable suggestions.

First and foremost, we are very grateful to the AC for your timely reminders and for ensuring the smooth and effective progression of the entire review process.

I would also like to extend my deepest thanks to each of the reviewers. We sincerely appreciate your recognition of our work, particularly your acknowledgment of RAGraph's novelty, as well as potential and promising applications. We are also delighted to know that all of the reviewers' concerns have been satisfactorily addressed! Your feedback has played a crucial role in refining our research, and we are grateful for your thoughtful and constructive suggestions.

In the future, we will incorporate the dataset experiments, ablation studies, and other improvements discussed during the rebuttal process into the main text.

Finally, we deeply appreciate the time and effort each of you has invested in this process. Your contributions have been invaluable, and we are grateful for the opportunity to engage with such a dedicated group of professionals.

Best regards,

Authors of the paper 8566

Final Decision

This work proposes a framework named General Retrieval-Augmented Graph Learning (RAGraph) to utilize external graph data for the general graph foundation model to improve generalization on unseen scenarios. Reviewers acknowledged the merits of this work for integrating RAG techniques with pre-trained GNNs.

The authors should include the content they added during the rebuttal when they prepare the final version of the paper. Those include:

  1. The responses to Reviewer va3W's point: the difficulty of constructing and maintaining high-quality and diverse graph vector bases for different tasks
  2. Justifications for the specific design choices in the proposed RAGRAPH framework, details about setting hyperparameters, and qualitative analyses of toy graph retrieval (Reviewer 3qQ8)
  3. Adding the results on heterogeneous graphs and large-scale graphs, as well as the experiments without hyperparameters (Reviewer sivC)
  4. Adding motivation to construct a toy database and retrieve & adding the zero-shot results (Reviewer v5NH)