From Theory to Practice: Rethinking Green and Martin Kernels for Unleashing Graph Transformers
We propose new structural encodings for graph transformers based on the Green and Martin kernels. Our approaches achieve SOTA performance on 7 out of 8 benchmark datasets, particularly excelling in molecular and circuit graphs.
Abstract
Reviews and Discussion
This paper proposes a new graph transformer model using the Green and Martin kernels. Specifically, GKSE and MKSE are defined as structural encodings (SEs) based on the finite-step Green kernel and an approximated finite-step Martin kernel, respectively, and are used in the graph transformer's attention mechanism.
Theoretical analysis compares the representational power of the proposed methods, GKSE and MKSE, with that of other SEs. Numerical experiments apply graph transformers with GKSE and MKSE to graph prediction tasks to evaluate their practical performance. In addition, encodings of molecules (non-aperiodic graphs) and circuits (DAGs) are visualized to examine the factors that improve prediction performance.
Update after rebuttal
I thank the authors for responding to my review comments. I am satisfied with their responses. Although the numerical evaluations are rigorous, the empirical results are not significant enough to warrant strong acceptance; the novelty and significance lie mainly in the theoretical aspects. Therefore, I keep my score (4: accept).
Questions For Authors
N.A.
Claims And Evidence
This paper makes the following claims about GKSE and MKSE:
- High performance in prediction tasks on various sizes of graph benchmark datasets.
- Ability to capture long-range interactions.
- High performance on molecular graphs, including non-aperiodic graphs, and circuit graphs, including DAGs.
These claims are well-supported by evidence for the following reasons:
- Datasets with various average numbers of nodes are adopted (Table 5).
- Long-range datasets are adopted (Table 2).
- A molecular dataset (PCQM4Mv2) and circuit datasets (Ckt-Bench101, Ckt-Bench301) are adopted.
This paper also claims in the abstract that the proposed method is theoretically grounded. This claim is well-supported because of the theoretical guarantees on the representation capability (Theorem 4.3, Corollary 4.4, Theorem 4.5) and the quantitative approximation between the Kernel used in MKSE and the finite-step Martin Kernel (Theorem 4.2).
Methods And Evaluation Criteria
Unless I missed something, the paper does not indicate (at least explicitly) what kind of problem it aims to solve or what issues existing graph transformer research has (in my understanding, this paper's motivation for employing the Green and Martin kernels is that kernel theory suggests they capture the topology of graphs). However, the numerical experiments employ diverse datasets and baseline models, which is appropriate for supporting claims 1--3 above.
Theoretical Claims
I checked the proofs of the theorems in the Appendix. I did not check them rigorously enough to reproduce the proofs myself; however, as far as I checked, I did not find any critical mistakes. In addition, the mathematical statements and proofs are clearly written and easy to read.
Experimental Design And Analyses
The proposed methods are applied to GRIT only in the main text. However, CKGConv, another type of graph transformer, is employed in the Appendix (Section B.2). Therefore, the performance improvement is not specific to a particular type of graph transformer, although it is difficult to say that the proposed methods are universally effective.
Regarding hyperparameters, a sensitivity analysis is performed for the number of steps, which is the most important parameter. However, since it is not explicitly stated how the hyperparameter search is conducted (unless I missed it), I cannot judge whether the comparison between the proposed and baseline methods is fair in terms of hyperparameter choice.
Supplementary Material
I checked all parts of the Appendix.
Relation To Broader Scientific Literature
Methods for structural encoding using random walks on graphs were proposed by [Dwivedi et al. 2022a, Geisler et al. 2023, Ma et al. 2023a, Zhou et al. 2024]. Since this paper proposes new structural encodings using the Green and Martin kernels, which are random-walk-based kernels that capture a graph's topology, we can place this paper in this line of research. The issues of existing studies are not explicitly stated in the abstract or introduction. However, for RRWP [Ma et al., 2023a], this paper points out in the experiments that it is unstable on non-aperiodic graphs and DAGs and that the proposed methods solve this problem.
Essential References Not Discussed
To the best of my knowledge, no essential references are missing.
Other Strengths And Weaknesses
The paper is clear and readable. The organization is good, and the mathematical description is accurate and easy to read. In addition, the appendix provides basic knowledge about Markov chains on graphs, which makes the paper accessible to those who are not familiar with this field.
Other Comments Or Suggestions
- l.253 (right): What does SPD stand for?
Concerns For Ethics Review
N.A.
We sincerely thank the reviewer for the constructive and encouraging feedback. We are glad that the mathematical rigor, clarity, and structure of our paper were positively noted. Below, we address each comment in detail.
1. Motivation and Problem Statement
We appreciate the suggestion to clarify our motivation. Our goal is to connect theoretical kernels from stochastic process theory with structural encodings (SEs) in graph neural networks.
Classical kernels such as the Green and Martin kernels capture the long-term behavior of random walks and global structural properties. While widely used in potential theory, they are often inapplicable in graph learning due to assumptions such as infiniteness or transience of the underlying graph. As explained in Section 3.3, many such kernels do not extend directly to real-world graphs.
Our contribution lies in reformulating these kernels into scalable and adaptable encodings, GKSE and MKSE, which retain their theoretical meaning while enabling practical use as absolute and relative SEs.
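As a brief illustration of the objects we start from (shown in their standard potential-theoretic form; the exact normalizations of GKSE and MKSE in the paper may differ), for a random walk with transition matrix $P$ and a reference vertex $o$:

$$
G_K(x, y) = \sum_{k=0}^{K} P^k(x, y), \qquad M_K(x, y) = \frac{G_K(x, y)}{G_K(o, y)},
$$

where $G_K$ truncates the classical Green kernel (the expected number of visits to $y$ starting from $x$) and $M_K$ approximates the Martin kernel as a ratio of Green kernels.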
2. Broader Applicability Beyond GRIT
Our main experiments use GRIT because it separates the effect of SEs and supports both absolute and relative encodings. This makes it ideal for isolating the impact of replacing RRWP with our proposed SEs.
To demonstrate generality, we also applied our SEs to CKGConv, a different GNN architecture (Appendix B.2). These results show consistent gains, indicating our SEs are not GRIT-specific. We plan to expand to more architectures in future versions.
3. Hyperparameter Sensitivity and Fairness
We agree that the number of steps is a key hyperparameter. For the OCB datasets (Ckt-Bench101, Ckt-Bench301), we reused hyperparameters from ZINC due to their similar size and structure (see Tables 6 and 8). We varied the number of steps linearly and present the results in Table 11, which show meaningful performance changes and confirm that an optimal value exists.
Importantly, even when the number of steps is not finely tuned, our SEs achieve performance comparable to strong baselines, suggesting that they perform well without extensive hyperparameter search. This also implies that a smaller number of steps can reduce precomputation time without sacrificing much performance.
We also evaluated whether similar trends hold for RRWP, a structurally comparable SE. However, we observed that RRWP does not yield the same level of performance as our proposed SEs under similar conditions, further validating the effectiveness of our kernel-based approach. These details will be added to Appendix B.3.
4. Clarification of SPD (l.253)
SPD stands for Shortest Path Distance, defined earlier in line 248. We will ensure the abbreviation is properly introduced upon first use to improve clarity.
5. Comparison with Prior Work and RRWP
We appreciate the reviewer’s summary that positions our work within the broader literature on random walk-based SEs. As noted, our method extends this line of research by connecting SEs to well-established mathematical kernels. Unlike methods such as RRWP, which compute expected walk counts over fixed-length paths, GKSE and MKSE derive from limiting behaviors of Markov chains, capturing deeper global structure.
The instability of RRWP on non-aperiodic or DAG-like graphs is discussed in our experimental results and qualitative analyses (e.g., Figures 1 and 2). These issues are theoretically motivated and practically observable in molecular and circuit datasets.
Additionally, prior works like [Dwivedi et al. 2022a, Geisler et al. 2023, Zhou et al. 2024] mainly design absolute SEs without supporting relative SEs—a key distinction from our approach.
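To make the comparison with RRWP concrete, the following minimal NumPy sketch (illustrative only; the function names are ours, and the paper's exact definitions may normalize differently) computes the stack of k-step transition probabilities that RRWP-style encodings use, together with the cumulative finite-step Green-kernel sum built from the same powers:

```python
import numpy as np

def transition_matrix(A):
    """Row-normalized random-walk transition matrix; rows with no out-edges stay zero."""
    deg = A.sum(axis=1, keepdims=True)
    return np.divide(A, deg, out=np.zeros_like(A, dtype=float), where=deg > 0)

def rrwp_style(A, K):
    """Stack of k-step transition probabilities [I, P, P^2, ..., P^K] (RRWP-style channels)."""
    P = transition_matrix(A)
    powers = [np.eye(len(A))]
    for _ in range(K):
        powers.append(powers[-1] @ P)
    return np.stack(powers)               # shape (K + 1, n, n)

def green_kernel_truncated(A, K):
    """Finite-step Green-kernel-style sum: expected visit counts within K steps."""
    return rrwp_style(A, K).sum(axis=0)   # shape (n, n)

# Toy undirected 4-node path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(green_kernel_truncated(A, K=8))
```

The fixed-length channels and the cumulative sum are computed from the same transition-matrix powers, which is why the two encodings can take very different values on periodic or directed structures even though they share a random-walk foundation.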
6. Summary of Contributions
To summarize:
- We introduce GKSE and MKSE, novel SEs grounded in Green and Martin kernels with strong theoretical backing.
- These SEs are efficiently computable and compatible with both absolute and relative forms.
- We show empirical gains in datasets where global or asymmetric structures are important (e.g., PCQM4Mv2, Ckt-Bench101).
We appreciate the reviewer’s recognition of the mathematical contributions and clarity of our work. We will revise the introduction and implementation sections to better communicate our motivation and ensure reproducibility.
Conclusion
We thank the reviewer once again for the detailed and constructive feedback. Your comments have helped refine both our presentation and experimental justification. We are encouraged by your evaluation and will incorporate all suggestions in the final version.
I thank the authors for answering my questions and comments. I am satisfied with the authors' responses. I will check the other reviewers' discussions and update the score if necessary.
This paper builds on the framework introduced in GRIT, where a carefully chosen set of relative positional/structural encodings (SEs) is defined at the input of the model and then updated at every layer. Specifically, the paper introduces two random-walk-based SEs, GKSE and MKSE, which are well motivated by the included theory. Experimental results are also provided on a variety of popular graph benchmarks.
Questions For Authors
- My main concern with this paper is regarding the strength of empirical results presented. This concern is described in the second point of "Claims And Evidence," and I would request the authors to take action to improve the statistical significance of the results presented.
- Isn't GKSE a linear transformation of RRWP? In that case, could the network not easily learn GKSE from RRWP in the first layer itself?
Claims And Evidence
The paper introduces GKSE and MKSE, and their formulation appears to be rigorously defined. The extension of random walks using Green and Martin kernels appears to be a valid theoretical contribution.
The paper claims that the method achieves SOTA results on 7 benchmarks, but based on the margin of error across different seeds, there is only evidence for SOTA performance on 3 benchmarks. In particular:
- Table 1: Only MNIST shows a statistically significant improvement over baselines.
- Table 2: Neither of the two datasets shows a statistically significant improvement over GRIT+RRWP. Moreover, the reported numbers do not come close to the leaders on the leaderboard at [2] (maintained by the original authors of the dataset's paper).
- Table 3: Improvement over baselines is statistically significant on PCQM4Mv2.
- Table 4: Only Ckt-Bench101 with GRIT+GKSE shows a statistically significant improvement.
Since improvements on the other benchmarks are small in comparison to the margin of error, if the paper wishes to claim SOTA performance on the other tasks presented, it should either (a) provide experiments with a higher N (number of seeds) so that reasonable statistical support for this claim can be assessed independently by readers, or (b) present a convincing statistical analysis of its own supporting this claim.
The claim that GKSE and MKSE handle non-aperiodic substructures (present in molecular datasets) is nicely aligned with their theoretical properties. However, clear empirical evidence for such an increased ability to deal with molecular datasets is only provided in the PCQM4Mv2 benchmark (Table 3). More examples of such clear improvements over baseline would be needed to better support this claim.
The paper claims that the theoretical properties of GKSE and MKSE make them suitable for datasets with DAG structures. However, similar to the previous point, a stronger empirical justification would help strengthen this claim. Of the two circuit datasets considered, the presented technique shows improvements larger than the margin of error (barely) on only one dataset.
Methods And Evaluation Criteria
Yes, evaluation on the included graph-level tasks makes sense for testing a relative structural encoding scheme.
Theoretical Claims
I have checked the proofs for Theorems 4.1 and 4.2, and they seem correct.
Experimental Design And Analyses
I have concerns regarding the statistical significance of many of the SOTA results claimed by the paper; these are detailed in point 2 of the "Claims And Evidence" part of the review.
Supplementary Material
Yes, I have reviewed parts A and B of the Appendix.
Relation To Broader Scientific Literature
Positional/structural encodings for graph transformers are an active and important area of research. This paper builds on an existing SE framework (GRIT) and proposes two new encodings, broadening the library of SEs presented in the literature.
Essential References Not Discussed
None noted
Other Strengths And Weaknesses
Strengths:
- The paper is well written, with easy-to-follow text and math.
- The development of the GRIT framework with well-motivated SE initializations is a valuable contribution.
- Experimental results are provided on a large collection of benchmarks.
- A well-chosen set of theorems and proofs is included in the paper.
Weaknesses:
- Since the paper presents general SEs that could be used with many base graph-transformer architectures, it would be necessary to see results on all datasets with architectures other than GRIT in the main text. Currently, results are only presented for some datasets in the Appendix.
Other Comments Or Suggestions
None
We thank the reviewer for the positive comments on our theoretical development, writing quality, and comprehensive benchmarking. We also appreciate the thoughtful concerns raised regarding statistical significance and generalization. Below, we respond to each point in detail.
1. Clarification on SOTA Claims and Statistical Significance
We fully agree that our previous statement claiming “SOTA on 7 benchmarks” was overstated. Since many improvements fall within the margin of error (N = 4), we will remove all such claims from the paper.
Regarding (a), we used N = 4 to match the experimental setup in GRIT and ensure consistent comparisons. However, we acknowledge this is insufficient for statistical rigor. We are currently running additional trials with a larger number of seeds across datasets and will include extended results in the final version.
Regarding (b), we conducted t-tests and observed statistically significant improvements (p < 0.05) on PCQM4Mv2 and OCB (results in Tables 3 and 4). Other cases did not show significance, likely due to the small sample size. We agree that a larger N is needed to draw solid conclusions, and we will re-run all key experiments with more seeds.
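For clarity about the procedure, the significance check is a standard two-sample t-test over per-seed results; a minimal sketch (the scores below are placeholders rather than our actual measurements, and the Welch variant is shown as one reasonable choice):

```python
from scipy import stats

# Per-seed validation metrics for one dataset (placeholder numbers, N = 4 seeds each).
baseline = [0.862, 0.858, 0.865, 0.860]   # e.g., GRIT + RRWP
proposed = [0.869, 0.871, 0.866, 0.872]   # e.g., GRIT + GKSE

# Welch's t-test (does not assume equal variances between the two runs).
t_stat, p_value = stats.ttest_ind(proposed, baseline, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # we report significance when p < 0.05
```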
2. Empirical Justification for Theoretical Advantages
We appreciate the reviewer’s acknowledgment of the connection between theory and empirical results. PCQM4Mv2 and Ckt-Bench101, with non-aperiodic and DAG-like structures, clearly demonstrate the strengths of our SEs.
We agree that additional evidence would be valuable. While benchmarks with similar structure are limited, we are exploring domains such as knowledge graphs and program analysis, where our methods may further excel.
3. Evaluation Beyond GRIT
The reviewer highlights the need to evaluate the generality of our SEs on other architectures. GRIT was selected because it cleanly isolates the effect of SEs by explicitly using both absolute and relative encodings. This allowed us to assess the performance impact of replacing RRWP with our SEs in a controlled setting.
Nonetheless, our SEs are not limited to GRIT. As shown in Appendix B.2, we applied them to CKGConv—a different architecture—and observed consistent gains. We plan to extend evaluations to additional GNNs in future versions.
4. Relationship Between GKSE and RRWP
As noted in Appendix D.4, GKSE can be expressed as a linear transformation of RRWP under specific conditions. However, despite this theoretical link, our experiments show that initializing with GKSE leads to better empirical performance.
This mirrors a familiar observation in deep learning: different initializations (e.g., Xavier vs. He) can lead to distinct learning behaviors even when the resulting networks can, in principle, represent the same functions. Thus, GKSE and RRWP differ in practice in meaningful ways.
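To spell out the linear relationship the reviewer points to (using the standard truncated form; the paper's exact GKSE normalization may differ):

$$
G_K = \sum_{k=0}^{K} P^k,
$$

so each entry of $G_K$ is an unweighted sum over the $K+1$ RRWP-style channels $\{I, P, \ldots, P^K\}$ at that node pair. A single linear layer acting on RRWP could therefore represent GKSE in principle; our point above is that supplying this sum explicitly at initialization still changes optimization behavior in practice.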
5. Experimental Scope
We thank the reviewer for acknowledging our broad experimental coverage. While gains vary across datasets, our SEs provide consistent improvements, particularly in graphs with structural complexity. Even when improvements are small, our SEs offer a principled, scalable, and interpretable alternative to existing approaches.
Their effectiveness is most pronounced in domains where global structure and asymmetry matter—settings that standard benchmarks may not fully reflect. We believe our work lays the foundation for broader adoption of mathematically grounded SEs.
6. Summary of Actions
- Removed all exaggerated SOTA claims.
- Expanded experiments with more seeds for reliable significance testing.
- Clarified the theoretical and empirical differences between GKSE and RRWP.
- Demonstrated broader applicability through CKGConv results (Appendix B.2).
Conclusion
We thank the reviewer again for the constructive and fair evaluation. Your feedback has helped us refine our claims and improve the clarity and rigor of the paper. We believe the theoretical insights, along with targeted empirical results and ongoing evaluations, support the relevance and contribution of this work.
This paper proposes two new structural encodings for graph transformers based on the Green and Martin kernels. These two kernels are extensions of the random-walk family. The paper also demonstrates that the proposed encodings outperform the existing one.
Questions For Authors
I encourage authors to answer the question above.
Claims And Evidence
The theoretical claims are interesting. Particularly, Thm 4.1 and Thm 4.2 give theoretical foundations of the proposed algorithms.
However, a real question is, why do we need this?
i) Almost all experimental results do not show a significant improvement over random walk + GRIT; the reported figures are very close to each other.
ii) The formulation of the Green kernel seems somewhat similar to the PageRank-based one in [1]. Can you please explain the difference between the Green kernel and [1]?
I am not quite sure why we need a newer version than the random walk. What is the limitation of the random-walk one? What is the advantage of the proposed ones? In which scenarios can they improve on the existing random walk? Since this paper aims to provide theoretical justification, it would be very nice if you could provide a theoretical explanation.
Therefore, I am not that convinced by the claims this paper made.
[1] Pan Li et al. Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning. NeurIPS 2020.
Methods And Evaluation Criteria
I am not quite sure which problem the authors are trying to address. What is wrong with the random walk, given the very marginal improvements in the experiments?
Theoretical Claims
Theoretical claims are interesting -- if this is a novel result.
Experimental Design And Analyses
The authors provided a fair experimental setting.
Supplementary Material
I haven't carefully read the supplementary material.
Relation To Broader Scientific Literature
As I have said repeatedly, I am not quite sure what the key contribution of this paper is. What is the improvement over existing work?
Essential References Not Discussed
It would be very nice if the authors could relate their work to [1].
Other Strengths And Weaknesses
N/A
Other Comments Or Suggestions
This comment does not directly affect my rating.
The authors may want to come up with a better prompt to avoid "big words" if they rely on GenAI writing tools. For example, in the abstract:
This work underlines the promise of theoretically grounded, AI-ready kernels in advancing the capability and scope of Transformer architectures for graph-based learning.
What is "the promise"? What are "AI-ready kernels"? These big words may blur the authors' real contributions.
I cannot rule out that I am indirectly influenced by these big words, since my questions concentrate on what the key contributions are.
We appreciate the reviewer’s detailed feedback and critical questions, which allow us to clarify both the motivation and contributions of our work. Below, we address the main concerns.
1. Limitations of Existing Random Walk-Based SEs
We agree that random walk (RW)-based SEs, such as RRWP, are already effective in many graph transformer models. Our work does not claim these methods are flawed; rather, we aim to reformulate the concept of structural encoding through a more mathematically grounded lens. Existing SEs are typically designed heuristically, based on network centrality theory or with limited theoretical support beyond local transition statistics.
Our SEs—GKSE and MKSE—are derived from stochastic process theory. Green and Martin kernels arise in the asymptotic analysis of random walks and are widely used in potential theory to describe the global behavior of Markov chains on graphs. They can uniquely encode specific structural information, such as non-aperiodic and DAG-like structures. As shown in Section 5.2, the values produced by our SEs differ significantly from RRWP across various graph topologies, highlighting their distinct representational properties.
2. Difference Between Green Kernel and PageRank
Although Green kernels are algebraically similar to Generalized PageRank (GPR) [1], they differ in key ways:
- Mathematical Origin: GPR focuses on signal smoothing and centrality, while Green kernels emerge from the resolvent of the Laplacian and describe expected visit counts in infinite random walks. This gives them a formal stochastic interpretation, which we rigorously connect to structural encoding via Theorems 4.1 and 4.2.
- Usage in GNNs: GPR is typically used as a node feature or in a precomputation step. In contrast, we use the Green and Martin kernels as absolute and relative SEs that directly guide attention in graph transformers.
To our knowledge, this is the first work to adapt Green and Martin kernels as structural encodings in GNNs.
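To make the contrast concrete in formulas (schematic forms; the exact parameterization of GPR-style encodings in [1] and related work may differ):

$$
\text{GPR-style:}\quad \sum_{k \ge 0} \gamma_k\, P^k x \quad\bigl(\text{e.g., personalized PageRank takes } \gamma_k = (1-\alpha)\,\alpha^k\bigr),
\qquad
\text{finite-step Green kernel:}\quad G_K = \sum_{k=0}^{K} P^k .
$$

The former applies a damped series to a node signal $x$ for smoothing or centrality; the latter is used entry-wise as a pairwise encoding with the visit-count interpretation described above.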
3. Why Use These SEs Despite Marginal Gains?
We agree that performance differences can be small on standard benchmarks, which often favor architectures over SE design. Still, our SEs show meaningful gains on topology-sensitive datasets like OCB (Ckt-Bench101, 301) and PCQM4Mv2, which contain DAG and non-aperiodic structures.
More importantly, our methods:
- Are model-agnostic and plug into any transformer architecture
- Are computationally efficient, computed recursively without eigendecomposition
- Are theoretically grounded, derived from stochastic process theory
Even modest gains can be impactful when combined with scalability and theoretical robustness, making these SEs useful in large-scale, real-world settings.
4. Theoretical Contributions of GKSE and MKSE
Our work provides several new theoretical insights:
- We connect Green and Martin kernels to SEs with provable structural meanings
- We establish formal results (Theorems 4.1–4.5) showing their expressiveness
- We show MKSE captures non-linear random walk behavior distinct from both RRWP and GKSE (Theorem D.3)
To the best of our knowledge, these are the first such results making these stochastic tools directly applicable to GNNs while preserving their mathematical essence.
5. Summary of Key Contributions
To summarize:
- Theoretical Innovation: Reformulating classical random walk kernels into structural encodings
- Architectural Generality: Easily applicable to transformer architectures
- Efficiency: No eigendecomposition, scalable to large graphs
- New Direction: Enables broader use of mathematical kernels in GNNs
We also acknowledge the reviewer’s note about wording in the abstract and will revise it to avoid vague phrases like “AI-ready” or “promise,” in favor of more precise claims.
6. Final Remarks
We appreciate the reviewer’s critical perspective and hope this rebuttal clarifies our contributions. Although the gains may be modest in some benchmarks, we believe the theoretical foundation, practical utility, and extensibility of our SEs make a compelling case.
We will revise the manuscript to clarify our distinctions from existing methods, including [1], and better communicate our theoretical and empirical contributions.
Thank you again for your thoughtful and constructive feedback.
This paper proposes to utilize the Green and Martin kernels to build new SEs for better graph transformers. The proposed method extends previous RW-based SEs such as RRWP in GRIT, demonstrating strong empirical accuracy across multiple datasets.
Questions For Authors
See above.
Claims And Evidence
- Both the theoretical and empirical analyses of the proposed method clearly show its effectiveness.
- Extending RW-based SEs with the Green and Martin kernels can potentially bring more insights and observations for future research. Nonetheless, beyond the observed connection between the Green/Martin kernels and SEs, the proposed method may not bring a fundamental improvement or breakthrough: the derived SEs seem to have only minor differences compared with RRWP, and the empirical accuracy across multiple datasets demonstrates only marginal improvement, which may raise questions about the necessity/benefits of the proposed method and the positioning of the paper.
Methods And Evaluation Criteria
The evaluation criteria follow the standard of this area, and the widely used datasets for evaluating graph transformers are included. However, this paper centers its comparison mostly on GRIT, without including newer results from the literature, e.g., [1]. Newer methods without RRWP-like SEs may exhibit even better empirical performance, which again raises concerns about the necessity/positioning of the proposed SEs.
[1] Ding, Yuhui, et al. "Recurrent distance filtering for graph representation learning." arXiv preprint arXiv:2312.01538 (2023).
Theoretical Claims
I did not check the appendix, but I have checked the theoretical part in the main text, which justifies the design of GKSE & MKSE and their expressiveness.
Experimental Design And Analyses
See above.
Supplementary Material
No.
Relation To Broader Scientific Literature
This paper is related to designing better graph transformers, in terms of both accuracy and efficiency. This paper centers mostly on the accuracy part by designing better SEs, a core part in graph transformers. More specifically, this paper is very related to GRIT, a paper utilizing RRWP to empower graph transformers with better performance. By extending RRWP with Green and Martin kernels, this paper further improves GRIT.
Essential References Not Discussed
There are graph transformer-like methods that can improve performance without requiring complex SE designs. The authors may want to discuss them, e.g., [1, 2].
[1] Ding, Yuhui, et al. "Recurrent distance filtering for graph representation learning." arXiv preprint arXiv:2312.01538 (2023).
[2] Wang, Xiyuan, Pan Li, and Muhan Zhang. "Graph as point set." arXiv preprint arXiv:2405.02795 (2024).
Other Strengths And Weaknesses
Even though the empirical improvement does not seem that significant, the observed connection between Green/Martin kernels and SEs may potentially bring more insights for future research.
Other Comments Or Suggestions
No.
We sincerely thank the reviewer for the detailed and thoughtful feedback. Below, we clarify the key motivations, contributions, and empirical implications of our work, while responding to specific concerns regarding novelty, positioning, and evaluation scope.
1. Novelty Beyond RRWP and Value of Kernel-based SEs
While our proposed SEs share a random-walk foundation with RRWP, they originate from Green and Martin kernels, classical constructs from potential theory and stochastic processes. These kernels reflect asymptotic behaviors of random walks, capturing structural phenomena such as non-aperiodicity, absorbing components, and hierarchical depth—aspects that finite-length walk-based encodings like RRWP may overlook.
Our contribution lies not in incremental variation, but in reformulating these kernels into scalable encodings for modern graph transformers, retaining their mathematical meaning. We believe this offers a new direction for structural encoding design, grounded in well-established theory.
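As a small, self-contained illustration of the DAG point (a toy example of ours, not taken from the paper's experiments): on a directed acyclic graph the transition matrix is nilpotent, so high-order fixed-length walk channels vanish entirely, whereas the cumulative Green-kernel-style sum stabilizes to a non-trivial pairwise encoding.

```python
import numpy as np

# Toy DAG: edges 0 -> 1, 0 -> 2, 1 -> 2 (node 2 is a sink).
A = np.array([[0., 1., 1.],
              [0., 0., 1.],
              [0., 0., 0.]])
deg = A.sum(axis=1, keepdims=True)
P = np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)

Pk = np.eye(3)                 # current power P^k
G = np.zeros((3, 3))           # cumulative finite-step Green-kernel-style sum
for k in range(5):
    G += Pk
    print(f"k = {k}: sum of entries of P^k = {Pk.sum():.2f}")  # drops to 0 beyond the longest path
    Pk = Pk @ P

print(G)                       # stops changing after k = 2 on this graph
```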
2. Empirical Gains and Practical Implications
We agree that performance gains on standard benchmarks may seem modest overall, which reflects the nature of these datasets where local attention often dominates. However, we observe notable improvements in datasets with more complex global structures—PCQM4Mv2 (molecules) and Ckt-Bench101 (circuits)—where the structural sensitivity of GKSE and MKSE becomes evident (Table 3, 4; Appendix B.4–B.5).
Crucially, these gains are obtained without changing model architecture, demonstrating the practicality of our SEs as drop-in modules. Even small but consistent improvements from theoretically grounded, efficient encodings can accumulate meaningfully across applications in circuits, molecules, or program graphs.
3. Positioning Relative to Recent Literature ([1], [2])
We thank the reviewer for referencing [1] Recurrent Distance Filtering and [2] Graph as Point Set. These represent powerful yet fundamentally different approaches:
- [1] adapts message passing using learned filters based on node distance.
- [2] maps graphs to sets via spectral regularization and permutation-invariant networks.
Both methods involve architecture-level changes and rely on eigendecomposition, which can hinder scalability. In contrast, our SEs are model-agnostic, require no spectral operations, and target transformer-style models that lack native graph inductive bias.
Though our work differs in intent and scope, we agree that comparing across such methods adds valuable context. We are preparing further experiments and will include comparative discussion in the final version.
4. Experimental Scope and Use of GRIT
We chose GRIT as our primary architecture because it provides an explicit SE framework, including both absolute and relative positional encoding. It also uses RRWP as a core module, making it a natural baseline for assessing our SEs under controlled conditions.
Nonetheless, our SEs are not limited to GRIT. In Appendix B.2, we apply them to CKGConv, a structurally different graph transformer, and observe consistent improvements. This supports the general applicability of our method. We plan to expand these evaluations to additional models in future work.
5. Broader Relevance and Future Directions
Our work aims to bridge mathematical theory with practical GNN design. Reformulating Green and Martin kernels into usable SEs represents a principled step toward designing structural encodings with formal guarantees.
This approach opens promising future directions:
- Exploring other theoretical kernels (e.g., hitting time, diffusion).
- Applying to domains with structured topology (e.g., program graphs, knowledge graphs).
- Integrating with adaptive architectures like [1] for further synergy.
We hope this inspires more mathematically grounded exploration in GNN research.
Conclusion
We thank the reviewer again for their valuable insights. In response, we will:
- Clarify the novelty of GKSE and MKSE versus RRWP.
- Highlight broader architectural compatibility beyond GRIT.
- Expand the discussion to include recent related works and future directions.
We hope these clarifications address the reviewer’s concerns and underscore the contribution and relevance of our work.
I thank the authors for the detailed response. I believe this paper can still be valuable for future research on PE/SE due to the clear reformulation of those kernels. Therefore, I will increase my rating.
This paper suggests structural encodings for graph transformers via random walks based on the Green and Martin kernels. Results across multiple benchmarks were consistently better, often, though not always, by small margins. The reviewers had no argument with the soundness of the paper, but some of them were not convinced by the importance of the problem or the significance of the empirical results. I recommend acceptance, if there is room, since overall this paper presents a novel idea with theoretical grounding and reasonably good empirical results.