PaperHub
Overall: 7.3/10 · Poster · 4 reviewers
Ratings: 5, 4, 4, 5 (min 4, max 5, std 0.5) · Average confidence: 3.5
Novelty 2.5 · Quality 2.8 · Clarity 2.5 · Significance 2.8
NeurIPS 2025

Towards Graph Foundation Models: Training on Knowledge Graphs Enables Transferability to General Graphs

OpenReview · PDF
Submitted: 2025-05-02 · Updated: 2025-10-29

Abstract

Keywords
Knowledge Graph · Graph Foundation Model · Graph Message Passing

Reviews and Discussion

Official Review (Rating: 5)

This paper proposes a novel graph foundation model (GFM) framework called SCR, which aims to enhance the transferability of knowledge graph (KG) reasoning to general graph learning tasks. The authors introduce a semantic-conditioned message passing mechanism to address the semantic isolation issue in traditional KG reasoning. Extensive experiments across diverse graph datasets demonstrate the effectiveness and adaptability of the proposed approach.

Strengths and Weaknesses

Paper Strength

  1. The paper introduces a novel graph reasoning framework, SCR, which adapts the principles of knowledge graph reasoning to general graph learning tasks such as node classification, link prediction, and graph classification.
  2. The proposed semantic-conditioned message passing mechanism effectively considers the node semantics, addressing the semantic isolation issue in previous KG foundation models.
  3. The paper conducts extensive experiments on diverse graph datasets, covering node-level, link-level, and graph-level tasks across multiple domains, showcasing the effectiveness and generalizability of the proposed approach.

Paper Weakness

  1. The paper argues that existing attempts to inject node semantic features into the $Init()$ function would violate target node distinguishability, following previous research [1]. Motivated by this, SCR proposes a novel semantic-injected INIT function that incorporates semantic features in a late-fusion manner (Eq. 7). However, the paper does not provide sufficient comparison with other attempts that incorporate semantic features while preserving target node distinguishability. Meanwhile, the semantic isolation issue should be analyzed further to strengthen the motivation of the proposed method.

For example, in Section 4.1 and Appendix C.3 of [1], the authors discuss that $Init^3() = \mathbb{1}_{u=v} \cdot (z_q + \epsilon_u)$ can be used to incorporate the node semantic feature $\epsilon_u$ while maintaining target node distinguishability by adding a query-dependent vector $z_q$. This can be considered an early-fusion method and is easier to implement than the proposed SCR. The paper should provide more details on why the proposed late-fusion method is necessary and how it outperforms other methods that also incorporate semantic features [2,3].
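
For concreteness, a minimal sketch of the two initialization schemes (illustrative PyTorch with hypothetical names, not the paper's actual code):

```python
import torch

def init_standard(num_nodes, query_idx, z_q):
    # Standard CMP initialization: only the query node u carries the
    # query-dependent vector z_q; all other nodes start at zero.
    h = torch.zeros(num_nodes, z_q.size(-1))
    h[query_idx] = z_q
    return h

def init3_early_fusion(num_nodes, query_idx, z_q, semantic_feats):
    # Init^3 of [1], i.e. 1_{u=v} * (z_q + eps_u): the query node additionally
    # receives its own semantic feature, preserving target node
    # distinguishability, while the neighbors' semantics stay unused.
    h = torch.zeros(num_nodes, z_q.size(-1))
    h[query_idx] = z_q + semantic_feats[query_idx]
    return h
```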

Besides, the original embedding dimension of the semantic features is 768, much larger than the node feature dimension (64). Such a large dimension gap may cause a significant loss of semantic information during feature fusion and impair performance, as shown in Figure 2. Can the authors further analyze the impact of the dimension gap between semantic features and node features on model performance to justify the necessity of the proposed method?

  2. The proposed SCMP only considers the semantics of nodes. How does the proposed method handle the semantics of edges [4]? This can be quite important for tasks like link prediction, where edge semantics can provide additional information for reasoning. Moreover, for node classification on homogeneous graphs there is only one edge type, which makes the relation graph fully connected and provides no meaningful information for learning relation representations.

  3. The proposed method shows great zero-shot performance on node classification and graph classification tasks. However, the performance of the model decreases after few-shot fine-tuning. The paper does not sufficiently analyze why the performance drops after few-shot fine-tuning. Is the model trained from scratch, or is it fine-tuned from a pre-trained model? If it is fine-tuned from a pre-trained model, how do the pre-training and fine-tuning affect the performance of the model?

[1] Huang, Xingyue, et al. "A theory of link prediction via relational weisfeiler-leman on knowledge graphs." Advances in Neural Information Processing Systems 36 (2023): 19714-19748.
[2] Combining Structure and Text: Learning Representations for Reasoning on Graphs, https://openreview.net/forum?id=hJ8OQAiTrl
[3] Hua, Yin, et al. "Beyond Completion: A Foundation Model for General Knowledge Graph Reasoning." arXiv preprint arXiv:2505.21926 (2025).
[4] Arun, Arvindh, et al. "SEMMA: A Semantic Aware Knowledge Graph Foundation Model." arXiv preprint arXiv:2505.20422 (2025).

Questions

  1. Can you compare the performance of the proposed SCR with other methods that also incorporate semantic features into the conditioned message passing framework, to show the necessity of the proposed late-fusion method? In particular, when setting the node feature dimension closer to the semantic feature dimension and using $INIT^3$ fusion, would the trend in Figure 2 still hold?

  2. In homogeneous-graph node classification, are there only two types of edges (i.e., original edges and label edges)? How does the proposed method generate meaningful relation representations from the relation graph for reasoning?

  3. Why does the performance of the model drop after a few-shot fine-tuning? Is the model trained from scratch, or is it fine-tuned from a pre-trained model? Would fine-tuning from a pre-trained model improve the performance of the model?

Limitations

Yes.

Final Justification

This paper explores an important and promising direction. While many areas remain unexplored, its contribution still sets a strong precedent for future work.

Formatting Issues

NA

Author Response

We thank the reviewer Z3ck for the insightful and valuable comments. We sincerely hope that our rebuttal adequately addresses your concerns. If so, we would deeply appreciate it if you could raise your score. If not, please let us know if you have further concerns, and we will continue actively responding to your comments.  

1. W1&Q1. Can you compare the performance of the proposed SCR with other methods that also incorporate semantic features into the conditioned message passing framework to show the necessity of the proposed late-fusion method?  

We sincerely thank you for this comment! We first discuss the three methods [1,2,3] mentioned by the reviewer:  

  1. The initializer $INIT^3()$ in Huang et al. [1] incorporates the semantic feature of the query node $u$, but ignores the semantics of other nodes in the neighborhood. As a result, using $INIT^3$ would not significantly drop the performance of ULTRA in Figure 2, because it still cannot exploit the full range of node semantics. This is also one of the motivations for us to identify the semantic isolation issue.

  2. The second work [2] is not related to graph foundation models and does not utilize CMP-based encoding. It employs PLM-based textual embeddings as input node features for a GNN and alternates training of the GNN and PLM within a single dataset. This method cannot handle our zero-shot GFM setting, especially for non-textual semantic features in general graphs. Additionally, the performance drop reported in their Table 2 when directly using pre-trained PLM embeddings indicates the necessity of re-training a PLM for enhanced text representation, which is consistent with our findings in Table 2.

  3. The third work [3] is a contemporaneous work (published in May 2025) for KG foundational reasoning. While [3] achieves semantic injection, it focuses solely on text embeddings derived from a single semantic space (i.e., an LLM). Consequently, their pretraining approach cannot process zero-shot features originating from different semantic spaces. We note that their late fusion between query‑conditioned structural encoding and global structural semantic encoding is similar to our proposed strategy, indirectly confirming the feasibility and rationale of our design choice.

In summary, the first method cannot utilize complete node semantics, while the latter two methods, which rely on either an LLM's semantic space or graph-specific training, cannot effectively integrate multi-source zero-shot features. Consequently, none of these approaches provides a viable alternative to our method in the Graph Foundation Model setting. We will incorporate the above discussion and analysis into the revised manuscript to strengthen our method.

2. W1&Q1. When setting the dimension of the node features closer to the dimension of the semantic features and using $INIT^3$ fusion, would the trend in Figure 2 still hold?

Regarding the impact of the dimension gap between semantic features and node features, we conduct additional experiments with 50d Glove features (with zero-padding), clipped 64d Bert features, random features, and all-ones features. The results below indicate that using node features with closer dimensions still induces the performance drop. Additionally, we test the variant that uses the semantic features of the query node in $INIT^3()$; there still exists a performance drop compared with the original ULTRA on most datasets.

| Method | FB_v1 | WN_v1 | NE_v1 |
| --- | --- | --- | --- |
| ULTRA | 0.486 | 0.593 | 0.716 |
| +Bert | 0.163 | 0.014 | 0.580 |
| +All One (64d) | 0.227 | 0.024 | 0.684 |
| +Random (64d) | 0.218 | 0.015 | 0.658 |
| +Bert (Clip64) | 0.200 | 0.013 | 0.593 |
| +Glove (50d) | 0.160 | 0.007 | 0.609 |
| +Bert+INIT^3 | 0.483 | 0.549 | 0.648 |
| +Glove+INIT^3 | 0.483 | 0.524 | 0.720 |

Because Bert-based embeddings have 768 dimensions, much larger than the 64-dimensional node features, we are unfortunately unable to pretrain a 768-d ULTRA due to limited computational resources during the rebuttal phase.
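
For clarity, the feature variants above could be constructed roughly as follows (a sketch under assumed preprocessing, not our exact pipeline):

```python
import torch
import torch.nn.functional as F

def build_variants(bert_feats, glove_feats, num_nodes, dim=64):
    # bert_feats: [N, 768] Bert embeddings; glove_feats: [N, 50] Glove embeddings.
    return {
        "+Bert": bert_feats,                              # raw 768-d features
        "+Bert(Clip64)": bert_feats[:, :dim],             # clip to the first 64 dims
        "+Glove(50d)": F.pad(glove_feats, (0, dim - 50)), # zero-pad 50-d -> 64-d
        "+Random(64d)": torch.randn(num_nodes, dim),
        "+All One(64d)": torch.ones(num_nodes, dim),
    }
```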

3. W2&Q2. How does the proposed method handle the semantics of the edges? In the homogeneous graphs node classification, are there only two types of edges (i.e., original edges and label edges)? How does the proposed method generate meaningful relation representations from the relation graph for reasoning?  

We sincerely appreciate these comments and questions.  

We focus on node semantic features in this work and acknowledge the value of handling explicit edge features. It is straightforward to extend SCR to incorporate edge semantics by replacing the first CMP module with our proposed SCMP. Similar to our work, the contemporaneous work, SEMMA [4], employs an additional CMP-based semantic relation encoding process and the late-fusion strategy to handle edge features.  

We clarify that, in homogeneous node classification, besides original edges and label edges, there are also semantically-similar edges involved in the relation graph and CMP-based encoding. Although the relation graph contains only a few relation types, it still follows strict topological rules. For example, label edges only link from nodes to their labels, while original edges connect nodes to each other. This creates meaningful asymmetric constraints, i.e., there are no “t-h” or “t-t” interactions from the label edge type to the original edge type. By learning how different relation types interact topologically across the graph, our model derives transferable reasoning capabilities that generalize beyond explicit relation semantics. A toy sketch of this construction is given below.
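
For illustration, a simplified, assumed sketch of an ULTRA-style relation-graph construction (names are illustrative, not our actual code):

```python
from collections import defaultdict

def relation_graph_edges(triples):
    # Two relation types interact if they share an endpoint; the interaction
    # kind (h-h, h-t, t-h, t-t) records which endpoints coincide.
    heads, tails = defaultdict(set), defaultdict(set)
    for h, r, t in triples:
        heads[r].add(h)
        tails[r].add(t)
    rels = set(heads) | set(tails)
    edges = set()
    for r1 in rels:
        for r2 in rels:
            if heads[r1] & heads[r2]: edges.add((r1, "h-h", r2))
            if heads[r1] & tails[r2]: edges.add((r1, "h-t", r2))
            if tails[r1] & heads[r2]: edges.add((r1, "t-h", r2))
            if tails[r1] & tails[r2]: edges.add((r1, "t-t", r2))
    return edges

# Label edges only point from nodes to labels, so no "t-h" or "t-t"
# interaction arises from the label edge type toward the original edge type.
triples = [("u1", "orig", "u2"), ("u2", "orig", "u3"),
           ("u1", "label", "c1"), ("u3", "label", "c2")]
print(sorted(relation_graph_edges(triples)))
```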

In Section 5.5 (3), we conducted ablation studies to verify the necessity of the relation representations from the relation graph. As shown in the right panel of Figure 3, despite the presence of only a few relation types in node/graph-level graph structures, the observed performance decline highlights the essential role of relational information.    

4. W3&Q3. Why does the performance of the model drop after a few-shot fine-tuning? Is the model trained from scratch, or is it fine-tuned from a pre-trained model? Would fine-tuning from a pre-trained model improve the performance of the model?  

Thank you for your comment. We believe the mentioned performance drop refers to SCR-5 and SCR-20%. We clarify that these two variants involve not few-shot fine-tuning but few-shot labeling. As explained in the caption of Table 2, SCR-20% uses 20% of the “node-label” edges from the training set, while SCR-5 includes five edges per class. Neither variant involves a fine-tuning process; they utilize different scopes of “label information” during inference on the pre-trained model.

Additionally, we observed that fine-tuning does enhance model performance on target tasks. For example, we conducted additional experiments that fine-tune on heterophilic graph-derived KGs, and found that SCR’s performance improves on heterophilic node classification benchmarks, as shown in the table below.

| Dataset | ULTRA | SCR | SCR-Tune |
| --- | --- | --- | --- |
| Wisconsin | 0.4902 | 0.5491 | 0.5882 |
| Texas | 0.5676 | 0.6764 | 0.7027 |
| Actor | 0.2261 | 0.2326 | 0.2414 |

[1] Huang, Xingyue, et al. "A theory of link prediction via relational weisfeiler-leman on knowledge graphs." Advances in Neural Information Processing Systems 36 (2023): 19714-19748.

[2] Combining Structure and Text: Learning Representations for Reasoning on Graphs, https://openreview.net/forum?id=hJ8OQAiTrl

[3] Hua, Yin, et al. "Beyond Completion: A Foundation Model for General Knowledge Graph Reasoning." arXiv preprint arXiv:2505.21926 (2025).

[4] Arun, Arvindh, et al. "SEMMA: A Semantic Aware Knowledge Graph Foundation Model." arXiv preprint arXiv:2505.20422 (2025).

Comment

Thanks to the authors for the response. I think there are some unclear expressions in the response that make it difficult for me to understand.

> As a result, using $INIT^3$ would not significantly drop the performance of ULTRA in Figure 2, because it still cannot exploit the full range of node semantics.

Do the authors mean that $INIT^3$ would not drop the performance of ULTRA because it preserves node distinguishability, but it would also fail to consider the semantics of other nodes in the graph?

Comment

Thank you for your thoughtful feedback. We appreciate the opportunity to clarify our position. Your interpretation aligns closely with our intended meaning: in the $INIT^3()$ method of Huang et al., only the source node (i.e., the query node) receives injected features at the initialization stage, while the semantic features of neighboring nodes remain unutilized.

As shown in the table in our second response, there still exists a performance drop when using $INIT^3()$ compared with the original ULTRA on most datasets. Nevertheless, because $INIT^3()$ with semantic features can distinguish between nodes in most cases, the performance degradation is less severe than for approaches that directly apply semantic features to all nodes.

Comment

Thank you for the comment, which addresses my concerns. This paper explores an important and promising direction. While many areas remain unexplored, its contribution still sets a strong precedent for future work. I have changed my score accordingly.

Official Review (Rating: 4)

This study broadens the scope of knowledge graph foundation models by enabling their application to both node-level and graph-level tasks. To support this expansion, it introduces a unified topological framework that accommodates diverse task formats and develops a semantic-conditional message passing mechanism to address the challenge of semantic isolation. Experimental results across link-level, node-level, and graph-level datasets confirm the model’s strong generalization capabilities.

Strengths and Weaknesses

Strengths:

  • The primary contribution of this work lies in extending the capabilities of Knowledge Graph Foundation Models (KGFMs) beyond traditional link prediction, enabling their application to both node-level and graph-level tasks.

  • The introduced modules effectively mitigate the identified semantic isolation challenge.

  • Extensive experimental results validate the model’s effectiveness across diverse datasets and task types.

Weaknesses:

  • The link prediction performance on transductive datasets does not demonstrate a significant advantage compared to ULTRA.

  • The reliance on Semantic Conditional Message Passing (SCMP) may constrain the model’s reasoning ability on knowledge graphs that lack rich textual features. Besides, in the introduction, a brief explanation of "semantic features" should precede the discussion of the "semantic isolation issue," rather than deferring this explanation to the appendix. This would significantly enhance clarity and readability. Notably, in knowledge graph reasoning, semantic features are not strictly essential—as indicated by the relatively modest performance difference between SCR and ULTRA in Table 1. However, these features are often indispensable in certain node-level and graph-level tasks, which likely underpins the emergence of the semantic isolation issue in this work. I encourage the authors to revise this section to reflect this context more clearly.

Questions

  • Can SCR be applied to knowledge graphs (KGs) that lack textual features?

  • Does SCR face limitations when scaling to large KGs? Specifically, is it capable of handling large-scale transductive datasets?

Limitations

Yes.

Final Justification

I keep my positive score.

Formatting Issues

None.

Author Response

We thank the reviewer zZR9 for the insightful and valuable comments. We sincerely hope that our rebuttal adequately addresses your concerns. If so, we would deeply appreciate it if you could raise your score. If not, please let us know if you have further concerns, and we will continue actively responding to your comments.  

1. W1. The link prediction performance on transductive datasets does not demonstrate a significant advantage compared to ULTRA.  

We sincerely thank you for this comment. We agree that SCR does not surpass ULTRA on transductive link prediction, but this is not the primary objective of SCR.

Our key contribution is to validate the semantic and structural transferability of our pre-trained KG reasoning model to general graph tasks. Toward graph foundation models, achieving consistent performance across diverse tasks with zero-shot topological graphs and semantic features is challenging and valuable.

We believe incorporating more diverse training KGs could enhance reasoning capabilities, which we plan to rigorously explore to strengthen SCR's applicability in future work.

2. W2. Besides, in the introduction, a brief explanation of "semantic features" should precede the discussion of the "semantic isolation issue", ... I encourage the authors to revise this section to reflect this context more clearly.  

Thank you for this essential feedback. We will restructure the related paragraphs in the introduction section to:  

  1. First define semantic features and their role in KG reasoning/general tasks;  
  2. Then introduce the semantic isolation issue as a fundamental barrier to cross-task generalization;  
  3. Explicitly connect both concepts to motivate SCR’s design.

This logical progression will clarify why resolving semantic isolation is critical for graph foundation models.

3. W2&Q1. The reliance on Semantic Conditional Message Passing (SCMP) may constrain the model’s reasoning ability on knowledge graphs that lack rich textual features. Can SCR be applied to knowledge graphs (KGs) that lack textual features?

Thank you for your insightful comment. We clarify that SCR is not restricted to graphs with textual features, as shown in the ablation study in Section 5.3. In Table 1, for SCR pretrained on textual features, replacing semantics with all-one features (SCR-One) only marginally degrades performance while still outperforming existing methods, indicating that semantic features do not significantly affect generalizability.

Besides, when evaluating on transductive KG datasets, SCR did not employ textual features (which are unavailable for some KGs) but instead used ontology features derived from the KG’s inherent structure, and still achieved strong performance.

The key lies in SCR’s pre-training design: it systematically trains the model to handle diverse scenarios—from KGs with rich semantics (text/ontology) to those with no input features. SCR learns to reason over KGs with diverse semantics rather than depending on auxiliary textual data.  

4. Q2. Does SCR face limitations when scaling to large KGs? Specifically, is it capable of handling large-scale transductive datasets?  

We sincerely appreciate this comment. Regarding scalability, we analyzed the computational complexity of SCR in Appendix H, which exhibits comparable scalability and running time to ULTRA. We acknowledge that both methods face practical limitations when applied to KGs with millions or billions of triples. Specifically, the time cost of subgraph extraction and message passing becomes non-trivial compared to traditional embedding-based models. This limitation is inherent to subgraph-based inductive frameworks but does not preclude SCR’s applicability to typical large-scale KGs.  

We will prioritize this in future work. For large-scale KGs, recent acceleration techniques like TIGER [1] (enabling efficient subgraph extraction for inductive reasoning on Freebase) are critical. While SCR’s current implementation does not yet integrate these optimizations, its framework is compatible with such methods. We have discussed this point in Appendix J: Limitations.  

To further ease the concern, we recall the results on several large-scale transductive datasets in Table 7. The results demonstrate that SCR outperforms ULTRA, confirming its scalability and effective generalization.  

| Datasets | Entities | Rels | ULTRA (3g) MRR | ULTRA (3g) H@10 | SCR (3g) MRR | SCR (3g) H@10 |
| --- | --- | --- | --- | --- | --- | --- |
| AristoV4 | 44,949 | 1,605 | 0.183 | 0.262 | 0.227 | 0.349 |
| CoDEx-Large | 77,951 | 69 | 0.333 | 0.461 | 0.329 | 0.458 |
| ConceptNet100k | 78,334 | 34 | 0.061 | 0.117 | 0.115 | 0.218 |
| DBpedia100k | 99,604 | 470 | 0.397 | 0.565 | 0.401 | 0.573 |
| YAGO3-10 | 123,182 | 37 | 0.480 | 0.658 | 0.488 | 0.666 |

[1] TIGER: Training Inductive Graph Neural Network for Large-Scale Knowledge Graph Reasoning, Proceedings of the VLDB Endowment, 2024.

Comment

Thanks for your response. I have no further concerns.

Official Review (Rating: 4)

The paper introduces SCR (Semantic Conditional Reasoner), a framework that leverages inductive knowledge graph (KG) reasoning as a foundation for general-purpose graph learning. Instead of pretraining on general graphs directly, the authors propose transforming diverse graph tasks (like node or graph classification) into KG reasoning problems using task-specific KG structures. A key challenge they address is the semantic isolation problem, where existing KG reasoning models struggle to integrate node features effectively. To overcome this, SCR introduces Semantic Conditional Message Passing (SCMP), which combines structural and semantic information via a new initialization scheme and non-parametric semantic encoding. The framework supports semantic generalization by using feature-agnostic unification and modular pretraining with various semantic sources. By training exclusively on KGs, SCR can perform zero-shot inference across multiple graph domains and tasks without fine-tuning.

Strengths and Weaknesses

Strengths

  1. The paper proposes a unified framework that reformulates multiple graph tasks as inductive link prediction problems on knowledge graphs (KGs), offering a different angle on generalizing Graph Foundation Models (GFMs).
  2. The method demonstrates competitive results across standard benchmarks.

Weaknesses

  1. Graph-level and node-level tasks often require modeling local neighborhoods, community structures, or global graph semantics—factors not typically present in KG link prediction tasks. Without pretraining on such structures, it is unclear how the learned representations transfer effectively.
  2. The comparison with ULTRA is potentially unfair. ULTRA is trained solely on KGs and not designed for general graph tasks. In contrast, the proposed method is designed for a broader range of tasks, giving it a task-diversity advantage.
  3. The model performs better on homophilic graphs than heterophilic ones, suggesting it relies heavily on feature similarity or local neighborhood consistency. This raises concerns about its robustness across structural regimes, which are common in real-world graphs.
  4. Over-smoothing and over-squashing are known issues in message-passing-based models, especially in deep architectures or long-range dependency tasks. The paper does not provide any discussion or evaluation on these issues. Benchmarks like LRGB or synthetic long-range tasks would help assess whether the model is scalable in such settings.

Questions

  1. Section "CMP-based Backbone Model" is crucial in understanding the paper, but it is not carefully written. Could the author elaborate more, such as what is "conditioned on the query" or the initialization process of hvh_v ?

Limitations

Yes

Final Justification

The paper proposes a new approach toward graph foundation model by transforming diverse graph tasks (like node or graph classification) into KG reasoning problems using task-specific KG structures. Throughout the rebuttal phase, the authors have provided further explanation and experiments that mostly resolve my concerns.

Formatting Issues

N/A

Author Response

We thank the reviewer wV6S for the insightful and valuable comments. We sincerely hope that our rebuttal adequately addresses your concerns. If so, we would deeply appreciate it if you could raise your score. If not, please let us know if you have further concerns, and we will continue actively responding to your comments.  

1. W1: Graph-level and node-level tasks often require modeling local neighborhoods, community structures, or global graph semantics—factors not typically present in KG link prediction tasks. Without pretraining on such structures, it is unclear how the learned representations transfer effectively.  

Thank you for highlighting this concern. Here we transform these two tasks into KG formats by introducing two extra entities, “label□” and “super graph△”, and a new relation, “is attributed with”, to predict the labels for a node or a graph. We believe that many KG relations (e.g., "person → hasGender → gender") are functionally similar to classification tasks ("is_attributed_with"), as they map numerous nodes/graphs to shared attributes.

Although the pre-training objective is link prediction, the model learns neighborhood connectivity and therefore captures local neighborhoods and community structures. Global semantics are retained via our non-parametric semantic representation, which aggregates all original node features in the CMP process. Together, these components enable representations learned during KG pre-training to transfer to downstream node- and graph-level tasks that rely on local structures and semantic cues.
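
As a rough illustration of this transformation (hypothetical helper and relation names; the actual construction follows the task-specific KG structures in the paper):

```python
def node_task_to_kg(edges, train_labels):
    # Node classification -> KG link prediction: labels become entities,
    # and predicting a node's class becomes predicting the tail of an
    # "is_attributed_with" edge.
    triples = [(u, "original_edge", v) for u, v in edges]
    triples += [(v, "is_attributed_with", f"label_{y}")
                for v, y in train_labels.items()]
    return triples

def graph_task_to_kg(graph_edges, graph_label, graph_id):
    # Graph classification: a "super graph" entity is linked to the graph's
    # nodes (linkage relation assumed) and carries the graph-level label.
    super_node = f"supergraph_{graph_id}"
    triples = [(u, "original_edge", v) for u, v in graph_edges]
    nodes = {u for e in graph_edges for u in e}
    triples += [(u, "in_graph", super_node) for u in nodes]
    triples.append((super_node, "is_attributed_with", f"label_{graph_label}"))
    return triples
```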

2. W2: The comparison with ULTRA is potentially unfair. ULTRA is trained solely on KGs and not designed for general graph tasks. In contrast, the proposed method is designed a broader range, giving it a task diversity advantage.  

We appreciate the reviewer’s valid concern regarding comparison fairness. We clarify that SCR and ULTRA use the same pretraining knowledge graphs (FB15k-237, WN18RR, CoDExMedium). ULTRA can handle general graphs by utilizing the proposed Unified Reasoning Format, which is how we obtained its performance on general graph tasks.

The core difference lies in SCR’s complementary use of semantic features (e.g., text descriptions, node features) beyond raw graph structure. The advantages of SCR stem from the semantic-augmented graph encoding in SCMP and pretraining with multi-source features.  

3. W3: The model performs better on homophilic graphs than heterophilic ones, suggesting it relies heavily on feature similarity or local neighborhood consistency. This raises concerns about its robustness across structural regimes, which are common in real-world graphs.

We sincerely appreciate the reviewer’s insightful observation. We agree that SCR currently exhibits a degree of homophily bias, which we believe primarily stems from the CMP-based backbone architecture that emphasizes local relational connectivity.

Our main objective in this work is to validate the semantic and structural transferability of our pre-trained KG reasoning model to general graph tasks. To this end, we adopt the standard CMP-based backbone, whose generalization ability has been well established in the KG reasoning domain.

To improve robustness on heterophilic graphs, several promising strategies can be explored, such as pre-training or fine-tuning on heterophilic structures, or replacing standard aggregation functions with operators more suitable for heterophilic settings. In our preliminary experiments, we introduced a fine-tuning stage on heterophilic benchmarks, and the results demonstrate clear improvements in SCR’s performance:

| Dataset | ULTRA | SCR | SCR-Tune |
| --- | --- | --- | --- |
| Wisconsin | 0.4902 | 0.5491 | 0.5882 |
| Texas | 0.5676 | 0.6764 | 0.7027 |
| Actor | 0.2261 | 0.2326 | 0.2414 |

These findings suggest that SCR can indeed be enhanced through additional training on heterophilic graphs. We believe that incorporating a broader set of training KGs—especially those exhibiting diverse structural properties—will further improve generalization and reasoning capabilities across varying graph regimes.

In summary, our work aims to improve transferability across graph tasks, and the observed performance gains of SCR over ULTRA validate this contribution. We will include a more in-depth discussion on the challenges of heterophily and potential mitigation strategies in the revised manuscript, and we plan to explore this direction further in future work.

4. W4: Over-smoothing and over-squashing are known issues in message-passing-based models, especially in deep architectures or long-range dependency tasks. The paper does not provide any discussion or evaluation on these issues. Benchmarks like LRGB or synthetic long-range tasks would help assess whether the model is scalable in such settings.  

We thank the reviewer for highlighting the critical issues of over-smoothing and over-squashing in message-passing models. While these are well-known challenges, our current focus is to investigate the semantic and structural transferability of KG pre-training to diverse graph tasks—a core goal in the emerging Graph Foundation Model (GFM) paradigm.

We agree that a discussion of these challenges is valuable. Over-smoothing typically occurs in deep architectures, where repeated message passing leads to uniform node embeddings. In our model, however, message propagation occurs over relatively shallow subgraphs (3-6 layers), which mitigates this risk. Furthermore, our message passing is source-conditioned: only the source node $e_q$ is initialized with a non-zero embedding, and information flows outward. Thus, each node’s embedding is a view from $e_q$, not a globally shared representation, preserving diversity.

Over-squashing, caused by bottlenecks in aggregating long-range messages into fixed-size vectors, may arise due to dense multi-hop paths. To address this, we employ the expressive PNA aggregator and set the hidden dimension to 64, balancing representational capacity with computational efficiency, while also accommodating both BERT-based textual features and common graph-domain features.

It is worth noting that unlike conventional GNNs, Graph Foundation Models are still in their early stages, and many associated challenges remain open. Our work takes a step forward by focusing on cross-task and cross-graph transferability, which we believe is largely orthogonal to the over-smoothing, over-squashing, and long-range issues. Nonetheless, we agree that evaluating performance on deep or extreme-range benchmarks would further validate scalability. We will include a dedicated discussion on these challenges and possible mitigation strategies in the revised manuscript, and we consider this an important direction for future work.

5. Q1: Section "CMP-based Backbone Model" is crucial in understanding the paper, but it is not carefully written. Could the author elaborate more, such as what is "conditioned on the query" or the initialization process of $h_v$?

Thank you for pointing this out. We will revise the manuscript accordingly, which first gives the high‑level intuition and then provides the formal update rules.

In CMP, the learned node embeddings are conditioned on a query $(e_q, r_q)$, where $e_q$ is the source node and $r_q$ is the specific relation embedding. At initialization, only the source node carries information: its hidden state $h_q$ is set to a non-zero vector determined by $r_q$, while all other nodes are zeroed out. During message passing, this signal propagates outward, and each target node $e_v$ ultimately learns an embedding $h_v$ that reflects how it is viewed from the perspective of $(e_q, r_q)$.
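
A minimal sketch of this conditioning (illustrative PyTorch over a dense adjacency; the actual model uses relational message passing with a PNA aggregator):

```python
import torch

def conditional_message_passing(adj, query_idx, r_q, num_layers=3):
    # adj: [N, N] float adjacency matrix; r_q: [d] query relation embedding.
    n, d = adj.size(0), r_q.size(-1)
    h = torch.zeros(n, d)
    h[query_idx] = r_q                 # INIT: only the source node is non-zero
    for _ in range(num_layers):
        msg = adj @ (h * r_q)          # relation-modulated messages (simplified)
        h = torch.relu(h + msg)        # UPDATE with a residual connection
    return h                           # h[v]: how v is "viewed" from (e_q, r_q)
```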

Comment

Thanks to the authors for the detailed response. However, I would still like to see the proposed framework's performance on long-range graph benchmarks (LRGB), and I recommend the authors include a subsection on this task in the paper. With that said, I am willing to consider raising my score if the experiment on LRGB is provided.

Comment

| Method | PascalVOC-SP Subset (Macro F1) |
| --- | --- |
| ULTRA (full labeling) | 0.039 |
| SCR (full labeling) | 0.053 |
| GCN (full training) | 0.101 |
| GraphTransformer (full training) | 0.121 |
| **SCR (20% labeling)** | 0.051 |
| GCN (20% training) | 0.046 |
| GraphTransformer (20% training) | 0.052 |

Thank you for the constructive suggestion! We fully agree that evaluating on long-range graph datasets will further clarify our method’s capability.

Due to time and resource limits, we selected a subset of PascalVOC-SP (685K nodes, 5M edges) to evaluate the node classification task. While the current LRGB benchmark assumes fully-supervised training on very large graphs, our proposed SCR was designed for zero-shot generalization: SCR only sees the label information during inference, without any fine-tuning on it. SCR delivers a substantial gain over ULTRA, but under full supervision it still trails GCN and GraphTransformer. We believe that large-scale training enhances the performance of GCN and GraphTransformer.

To mitigate the effect of training, we use a few-shot setting: GCN and GraphTransformer are trained on 20% of the samples, while SCR remains strictly zero-shot, merely seeing those labels at inference. Under this setting, SCR matches GraphTransformer and outperforms GCN, demonstrating that it retains long-range ability even without task-specific training.

We apologize again for not running the entire LRGB benchmark (hundreds of millions of labels) during the rebuttal phase. Nevertheless, our method is compatible with the optimization strategies outlined in Response 4 to Reviewer zZR9 to address the scalability challenge. We will include these new results, along with a dedicated subsection discussing long-range ability and scalability challenges, in the revised manuscript.

Official Review (Rating: 5)

This paper introduces SCR, a novel graph foundation model designed to address the challenge of transferring knowledge from knowledge graphs (KGs) to general graph domains and tasks without requiring extra fine-tuning. Inspired by recent advances in zero-shot KG reasoning, SCR unifies diverse graph tasks (node, link, and graph classification) into an inductive KG reasoning framework. The authors propose a task-specific KG transformation that reframes general graph problems as KG reasoning, and introduce Semantic Conditional Message Passing (SCMP), which integrates both structural and semantic information to overcome the "semantic isolation" problem in traditional KG models. Extensive experiments on 38 datasets across multiple domains demonstrate that SCR achieves strong zero-shot and few-shot generalization, outperforming existing graph foundation models and supervised baselines on a range of inductive reasoning and classification tasks. The paper provides detailed ablations and discussions on the impact of semantic features, model components, scalability, and limitations.

Strengths and Weaknesses

  • Strengths

    • The paper provides a comprehensive empirical evaluation across 38 datasets spanning node-level, link-level, and graph-level tasks. This broad experimental scope convincingly demonstrates the generalizability and robustness of the proposed approach across a wide range of domains and graph types.
    • The motivation for bridging knowledge graph reasoning and general graph foundation models is well-articulated, and the paper offers a thorough description of the SCR framework and its components. The unified transformation of graph tasks into KG reasoning is clearly presented, and the introduction of Semantic Conditional Message Passing (SCMP) is well-explained. Most methodological details are accessible, with only minor sections that could benefit from further clarification.
  • Weaknesses

    • The main figure intended to illustrate the SCR framework does not significantly aid understanding of the method. As presented, it lacks clarity and does not effectively convey the model’s workflow or the intuition behind key components, which could hinder readers’ comprehension.
    • The main innovation appears to be a practical and effective integration of node semantic information based on the ULTRA framework. The introduction of virtual nodes to convert various tasks into reasoning tasks on graphs primarily serves this integration, which to some extent reduces the novelty of this work.
    • The largest performance gains are observed in graph classification tasks. However, it remains unclear to what extent the improvement is attributable to the introduction of semantic information. If such ablation studies have not been conducted, additional experiments isolating the effect of semantic features on graph classification would strengthen the claims. Additionally, did SCR (3g), ULTRA (3g) and ProLINK(3g) consistently use the same KGs?
    • The methodology section could be better organized. Currently, the lack of an initial high-level overview of the entire approach makes it difficult for readers to grasp the core ideas before delving into technical details. This affects the overall readability and the depth at which readers can understand the method.

Questions

Please see weaknesses.

Limitations

Yes

Final Justification

N/A

Formatting Issues

N/A

Author Response

We thank the reviewer cmwp for the insightful and valuable comments. We sincerely hope that our rebuttal adequately addresses your concerns. If so, we would deeply appreciate it if you could raise your score. If not, please let us know if you have further concerns, and we will continue actively responding to your comments.  

1. W1: The main figure lacks clarity and does not effectively convey the model’s workflow.

We sincerely thank you for this comment. Because external figure links are not permitted during the rebuttal phase, we describe the points we plan to revise. First, we will redraw all the arrows and reorganize the two bottom subfigures so that the workflows for pre-training on KGs and reasoning on general graphs are shown clearly. Besides, we will also revise the caption to clarify the role of each component. The updated figure will be included in the revised version.

2. W2: The main innovation appears to be a practical and effective integration of node semantic information based on the ULTRA framework. The introduction of virtual nodes to convert various tasks into reasoning tasks on graphs primarily serves this integration, which to some extent reduces the novelty of this work.  

Thank you for raising this point. We agree that the use of virtual nodes is a common design choice, but it remains essential to unify different graph tasks. The novelty of our work is to generalize the inductive KG reasoning method within the ULTRA framework to handle diverse tasks on general graphs, providing a practical recipe for building graph foundation models.

3. W3: Additional experiments isolating the effect of semantic features on graph classification would strengthen the claims. Additionally, did SCR (3g), ULTRA (3g), and ProLINK(3g) consistently use the same KGs?  

We sincerely appreciate these valuable comments.

In Section 5.5, we conducted ablation studies to verify the impact of semantic features on node/graph classification. As shown in the right panel of Figure 3, the reduced performance of “w/o NPSR” emphasizes its important role in leveraging semantic diversity. Meanwhile, the Semantic-Injected Entity Initialization and Relation Representations also play essential roles. Additionally, in Table 3, the two graph classification datasets we evaluated (IMDB-BINARY, COLLAB) are featureless, which indicates that the performance gain is not derived solely from semantic features.

The three compared methods (SCR, ULTRA, ProLINK) use identical underlying topological graphs. The critical distinction is that only SCR incorporates semantic features to construct additional semantically-similar edges, while ULTRA and ProLINK rely solely on raw graph structures.
 

4. W4: The methodology section could be better organized. An initial high-level overview of the entire approach is needed.  

We sincerely appreciate this constructive suggestion. To improve clarity, we will relocate the workflow description currently in Section 4.3 to the beginning of Section 4, serving as a high-level overview of the entire approach accompanied by Figure 1. This will provide readers with a clear roadmap before delving into implementation details.

Comment

Thank you for the authors' response. However, after reading the explanation, my main concern remains: to what extent is this work an incremental improvement over ULTRA? The authors' reply has not fully alleviated my doubts on this issue.

I must admit that this research area is not my primary expertise, so my evaluation of other aspects of the paper may not be entirely precise. In light of this, I will maintain my current score for now. I look forward to and trust that other reviewers, who are more specialized in this area, will provide a more professional assessment regarding the relationship with ULTRA, and I hope they will pay particular attention to the actual originality of this work.

Comment

We appreciate your feedback and clarify that our work differs from ULTRA in scope, methodology, and objectives. While ULTRA pioneers transferable representations for knowledge graphs (KGs), its design is intrinsically tied to KG-specific structures (e.g., entities/relations) and focuses exclusively on link prediction. In contrast, SCR addresses a broader challenge: enabling zero-shot reasoning across general graph tasks (node/link/graph-level) and diverse domains beyond KGs. Our key innovations directly tackle this goal:

1. Unified Task Formalization: We design task-specific KG structures to convert diverse graph tasks (e.g., social network classification, molecular property prediction) into a uniform topological framework, eliminating the need for task-specific fine-tuning.

2. Semantic-Structural Fusion: We propose semantic-conditioned message passing, a novel mechanism that jointly models structural and semantic invariance patterns. This explicitly resolves semantic isolation in traditional KG inductive reasoning, allowing generalization to non-KG graphs with unstructured features.

3. Extensive Cross-Domain Validation: SCR is evaluated on diverse datasets across multiple task types—far exceeding ULTRA’s KG-only link prediction setup. Our zero-shot results demonstrate consistent gains over foundation models (including ULTRA) and supervised baselines, proving unique adaptability to unseen graphs without fine-tuning.

Thus, we argue that SCR is not an incremental improvement but a paradigm shift toward a truly universal graph foundation model, bridging KG reasoning principles to general graph intelligence. We appreciate your time and expert guidance during this review. Your feedback significantly improved our paper, and we stand ready to address any further queries.

Final Decision

The authors cast node classification, graph classification and link prediction as instances of KG completion through suitable introduction of new nodes and edges and graph transformations. They adopt recent techniques to inform a GNN with an initial query to compute more capable edge relation representations (and induced contextual node representations). Together, these steps allow the authors to provide a "foundation" model for all these classes of graph inference without additional fine-tuning. Extensive experiments on 38 datasets spanning multiple domains show that SCR achieves strong zero-shot and few-shot performance.

Although the topic is densely populated in recent literature, and the submission does not have an easy time setting off its novelty against all of it, it seems to have enough merit for a poster accept.

There was a fairly active and productive rebuttal session where most concerns were addressed satisfactorily.

The title may be slightly misleading to readers who are interested in graph foundation models that solve more discrete or combinatorial problems around graphs. It would be nice to reword the title to make this clear.

The exposition can be greatly improved. At least the abstract and intro should set up terminology that is standard in the community and not coined in just one or two papers cited here. E.g., "the semantic isolation issue", "semantic-conditioned message passing" (in general avoid "semantic" anywhere), etc. Take the reviewers' comments about improving writeup seriously.

L28 "KG reasoning, also known as KG completion" ― many would disagree with this conflation. Instead say "KG completion, a popular KG reasoning task".

L51 "initiation" of what? Make this clear at this point, without expecting the reader to read ahead a couple sections.

L155 this limitation holds for only the Init you proposed. There can be other more natural Init functions without this limitation.

In eq. (4), $\tilde{\mathcal{X}}$ is computed but not used in the second SVD computation. Why are there two $\bm{U}$ matrices returned by the SVD if $\mathcal{X}$ is not PSD?

It is not clear why SVD works in this setting. It is possible for singular values close to each other to swing the corresponding singular vectors around by up to 90 degrees, leading to falsely low similarity between training and test situations.
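
A tiny numpy demonstration of this failure mode (illustrative, not from the paper):

```python
import numpy as np

# Two nearly equal singular values: a perturbation far smaller than the
# matrix entries rotates the corresponding singular vectors by ~45 degrees.
A = np.diag([1.0, 1.0 + 1e-9, 0.1])
E = 1e-6 * np.array([[0.0, 1.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0]])
U1, _, _ = np.linalg.svd(A)
U2, _, _ = np.linalg.svd(A + E)
print(np.round(np.abs(U1.T @ U2), 3))  # the top-2 directions mix heavily
```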

The intuition behind $\bm{v}_a$ is not presented properly.

Do you want a single is_attributed_with relation, or two such relations (one for nodes, one for whole graphs)?