PaperHub
Overall score: 6.4 / 10
Decision: Poster · 4 reviewers
Ratings: 3, 5, 5, 3 (min 3, max 5, std 1.0)
Confidence: 3.5
Sub-scores — Novelty: 2.5 · Quality: 3.0 · Clarity: 3.3 · Significance: 3.0
NeurIPS 2025

Interpretable and Parameter Efficient Graph Neural Additive Models with Random Fourier Features

OpenReview · PDF
Submitted: 2025-05-10 · Updated: 2025-10-29
TL;DR

Parameter efficient interpretable neural additive model for graph data

Abstract

Graph Neural Networks (GNNs) excel at jointly modeling node features and topology, yet their black-box nature limits their adoption in real-world applications where interpretability is desired. Inspired by the success of interpretable Neural Additive Models (NAMs) for tabular data, the Graph Neural Additive Network (GNAN) extends the additive modeling approach to graph data to overcome the limitations of GNNs. While interpretable, GNAN's representation learning overlooks the importance of local aggregation and, more importantly, suffers from parameter complexity. To mitigate these challenges, we introduce the Graph Neural Additive Model with Random Fourier Features (G-NAMRFF), a lightweight, self-interpretable graph additive architecture. G-NAMRFF represents each node embedding as a sum of feature-wise contributions, where each contribution is modeled via a Gaussian process (GP) with a graph- and feature-aware kernel. Specifically, we construct a Radial Basis Function (RBF) kernel with graph structure induced by the Laplacian and a learnable Finite Impulse Response (FIR) filter. We approximate the kernel using Random Fourier Features (RFFs), which transforms the GP prior into a Bayesian linear formulation whose weights are learned by a single-layer neural network with size equal to the number of RFF features. G-NAMRFF is lightweight, with $168\times$ fewer parameters than GNAN. Despite its compact size, G-NAMRFF matches or outperforms state-of-the-art GNNs and GNAN on node and graph classification tasks, delivering real-time interpretability without sacrificing accuracy.
Keywords

Interpretability · Graph Neural Networks · Finite Impulse Response Filters · Random Fourier Features

Reviews and Discussion

Review
Rating: 3

This paper proposes a new model for graph data named Graph-aware neural additive model with random features (G-NAMRFF). It assumes that the outputs are additive functions. The component functions are drawn from a Gaussian process (GP) prior with covariance generated from a graph-reweighted squared exponential kernel. Two theoretical results are derived in Section 4. The method is compared to standard procedures in Section 6 on node and graph classification datasets.

Strengths and Weaknesses

Strengths:

  • I did enjoy reading the article. Everything is well-explained and clean.
  • The approach seems to be motivated by proposing small models. It is highlighted that the approach requires 168 times fewer parameters than GNAN without degrading performance.

Weaknesses:

  • My overall impression is that everything is quite straightforward: The method is just regression with a specific GP, Theorem 4.1 seems (at least intuitively) to be clear. While there is nothing wrong with the paper, I did miss some deeper insights.
  • The expressivity of an additive model is very limited and in learning tasks often just inappropriate, e.g. if the inputs are highly correlated. The authors should have invested much more in motivating how the inclusion of the covariance could overcome this, such that one can learn large function classes with G-NAMRFF. Without some more theoretical insights on that, I find the approach hard to believe.

Questions

  • Could you make this deep, by stacking G-NAMRFF layers on top of each other?
  • I did not understand Theorem 2.1 (for instance, what do you mean by 'approximated'?) and the statement could be wrongly formulated. Could you provide a reference or a proof?
  • Can you address the issue of the expressivity?

Limitations

I did not find anything in the text discussing the potential limitations of this work.

Final Justification

The answers of the authors mainly confirmed my initial assessment. While this is a timely topic, I do not find the article sufficiently innovative and also think that it is sometimes a bit sloppy. An example is the discussion about Theorem 2.1. In my opinion, the way this theorem is stated is impossible for a reader to process.

Formatting Concerns

None

Author Response

We thank the reviewer for the positive feedback and are happy to hear that you enjoyed reading the article. We address the specific questions in detail below:

  • The method is just regression with a specific GP, Theorem 4.1 seems (at least intuitively) to be clear. While there is nothing wrong with the paper, I did miss some deeper insights.

We thank the reviewer for this comment. To emphasize the strengths of our approach and contrast it with traditional GP regression on graphs, we briefly review prior graph‑GP methods. Most of these methods place a full prior

$$f \sim \mathcal{GP}(0, K), \quad \text{where} \quad K \in \mathbb{R}^{N \times N},$$

where $K$ may be a graph-aware Matérn kernel [1] or a wavelet-inspired kernel [2], to name a few. Inference is done by obtaining the posterior distribution, which involves inverting the $K + I$ matrix, incurring $\mathcal{O}(N^3)$ time and $\mathcal{O}(N^2)$ memory. While they can quantify uncertainty (unlike GCN or GAT, which yield only point estimates), GP-based models still exhibit some key limitations: 1. lack of feature-wise interpretability, since every prediction mixes all features and nodes through the dense inverse; 2. poor scalability due to the matrix inversion operation; 3. they cannot be linearized, because most of these graph-aware kernels do not satisfy the conditions of Bochner's theorem. Importantly, making these black-box approaches interpretable/explainable requires post-hoc explainers, which adds complexity and lacks transparency. In contrast, our framework is self-interpretable by design.

Our framework is inspired by additive models and achieves feature-level interpretability. We treat each feature independently (see also the discussion on correlated features below), place a GP prior on its graph-filtered values, and choose a kernel that satisfies Bochner's conditions for shift invariance. More specifically, our kernel is an RBF kernel that operates on the filtered feature vector $\tilde{x}_i$, so it remains shift-invariant, unlike conventional graph kernels. This design admits an RFF linearization, yielding a lightweight, linear-in-parameters model with $\mathcal{O}(NMD)$ forward-pass complexity instead of a costly matrix inversion.
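To make this concrete, here is a minimal, self-contained sketch (not the authors' code; the graph construction, normalizations, and initializations are illustrative assumptions) of the forward pass this paragraph describes: an $R$-hop FIR graph filter per feature, followed by a univariate RFF map $\mathbf{\Phi}_a$ and an additive sum over features.

```python
# Sketch of a G-NAMRFF-style forward pass: per-feature FIR graph filtering,
# then a univariate RFF map, summed additively over features.
import numpy as np

def fir_filter(L, x, alpha):
    """x_tilde = sum_{h=0}^{R} alpha[h] * L^h x  (R-hop FIR graph filter)."""
    out, Lh_x = np.zeros_like(x), x.copy()
    for a_h in alpha:
        out += a_h * Lh_x
        Lh_x = L @ Lh_x                       # next Laplacian power applied to x
    return out

def rff_map(x, a, b, M):
    """Random Fourier features for a univariate RBF kernel: (N, M) matrix."""
    return np.sqrt(2.0 / M) * np.cos(np.outer(x, a) + b)

rng = np.random.default_rng(0)
N, D, R, M, theta = 50, 4, 3, 64, 1.0
A = (rng.random((N, N)) < 0.1).astype(float)
A = np.triu(A, 1); A += A.T                   # random symmetric adjacency (toy)
L = np.diag(A.sum(1)) - A                     # combinatorial Laplacian
X = rng.standard_normal((N, D))
alpha = rng.standard_normal(R + 1)            # learnable FIR coefficients
W = rng.standard_normal((D, M))               # per-feature RFF weights w_k

y = np.zeros(N)
for k in range(D):                            # additive over features
    a = rng.normal(0.0, 1.0 / theta, M)       # spectral samples for the RBF
    b = rng.uniform(0.0, 2 * np.pi, M)
    x_tilde = fir_filter(L, X[:, k], alpha)   # graph-aware filtering
    y += rff_map(x_tilde, a, b, M) @ W[k]     # f_k(x_tilde) = Phi_a^T w_k
```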

Importantly, this is the first work to introduce learnable FIR filters into the GP framework, where each coefficient $\alpha_h$ directly quantifies the contribution of the $h$-hop neighborhood, providing structural interpretability alongside feature-wise insights. To the best of our knowledge, this is also the first work to provide a theoretical justification for embeddings that are not only permutation equivariant but also stable under graph perturbations, owing to the use of finite impulse response (FIR) filters.

Finally, Theorem 4.1 states the permutation equivariance property that captures graph symmetry. We emphasize that this property is non-trivial due to the non-linearity involved (in contrast to standard GP-based regression algorithms). The novelty again lies in the integration of FIR filters and RFF-based approximations, which induces the structural dependencies while maintaining permutation equivariance.

In summary, G‑NAMRFF transforms a black‑box, cubic‑cost GP regression into a glass‑box, linear‑cost, interpretable model, going well beyond standard GP approaches.

[1] Borovitskiy et al., “Matérn Gaussian processes on graphs,” AISTATS ’21.

[2] Fang et al., “Wavelet kernels for graph Gaussian processes,” ICML ’21.

  • Comment on expressivity

We thank the reviewer for the insightful comment. As pointed out correctly, G-NAMRFF models the output as a sum of univariate functions, each acting on a single feature independently. This design avoids feature mixing for two main reasons: (1) to reduce the parameter complexity and obtain a very lightweight model, and (2) to position the model in line with recent interpretable models such as NAMs and GNAN. We acknowledge that this imposes a limitation on expressivity, as it does not explicitly capture inter-feature correlations. Despite this constraint, the lightweight G-NAMRFF matches or outperforms several black-box GNNs across benchmarks, demonstrating that the core design is effective for many practical tasks.

To further enhance expressivity while preserving interpretability, we highlight that G-NAMRFF can be readily extended using ideas from NODE-GAM [1] and GAMI-Net [2], which incorporate structured pairwise interactions. Specifically, G-NAMRFF can model both independent and pairwise feature effects as:

$$y_i = \sum_{k=1}^{D} f_k(x_{i,k}) + \sum_{k=1}^{D} \sum_{k'>k}^{D} f_{k,k'}(x_{i,k}, x_{i,k'}),$$

where $f_{k,k'}$ is modeled as a Gaussian process with a 2D input. These functions can be efficiently approximated using 2D random Fourier features (RFFs) as

$$f_{k,k'}(x_{i,k}, x_{i,k'}) = \mathbf{\Phi}_{a}^{T}\bigl([\tilde{x}_{i,k}, \tilde{x}_{i,k'}]\bigr)\,\mathbf{w}_{k,k'}.$$

The overall prediction for node $i$ then becomes

$$y_i = \sum_{k=1}^{D} \mathbf{\Phi}_{a}^{T}(\tilde{x}_{i,k})\,\mathbf{w}_{k} + \sum_{k=1}^{D} \sum_{k'>k}^{D} \mathbf{\Phi}_{a}^{T}\bigl([\tilde{x}_{i,k}, \tilde{x}_{i,k'}]\bigr)\,\mathbf{w}_{k,k'}$$

As discussed earlier, this further increases the complexity due to the increase in the number of learnable parameters. We again emphasize that the main motivation in the design of G-NAMRFF is low complexity without compromising performance relative to black-box models.
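For illustration, a hypothetical sketch of this pairwise extension follows; the 2-D spectral samples and weight names are our own placeholders, not the paper's implementation.

```python
# Sketch of the pairwise term: a 2-D RFF map over a pair of filtered
# features approximates f_{k,k'}; all names here are illustrative.
import numpy as np

def rff_map_2d(x_pair, A2, b, M):
    """Phi_a([x_k, x_k']) with 2-D spectral samples A2 of shape (2, M)."""
    return np.sqrt(2.0 / M) * np.cos(x_pair @ A2 + b)   # (N, M)

rng = np.random.default_rng(1)
N, M = 50, 64
x_k  = rng.standard_normal(N)          # filtered feature k   (x_tilde_{., k})
x_kp = rng.standard_normal(N)          # filtered feature k'  (x_tilde_{., k'})
A2   = rng.standard_normal((2, M))     # spectral samples for the 2-D RBF
b    = rng.uniform(0, 2 * np.pi, M)
w_kk = rng.standard_normal(M)          # pairwise weight vector w_{k,k'}

pair = np.stack([x_k, x_kp], axis=1)              # (N, 2) paired inputs
f_pair = rff_map_2d(pair, A2, b, M) @ w_kk        # pairwise contribution per node
```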

In the final draft, we will add a remark stating how G-NAMRFF can be extended to handle correlated features.

[1] Chang, Chun-Hao, Rich Caruana, and Anna Goldenberg. "NODE-GAM: Neural Generalized Additive Model for Interpretable Deep Learning." International Conference on Learning Representations 2022.

[2] Yang, Zebin, Aijun Zhang, and Agus Sudjianto. "GAMI-Net: An explainable neural network based on generalized additive models with structured interactions." Pattern Recognition 120 (2021): 108192.

  • Could you make this deep, by stacking G-NAMRFF layers on top of each other?

Yes — in principle, G-NAMRFF layers can be stacked to increase model capacity. However, doing so introduces significant interpretability overhead. Each additional layer brings its own set of FIR filter coefficients $\{\alpha_h^{(\ell)}\}$ and RFF weight vectors $\{\mathbf{w}_k^{(\ell)}\}$, making it harder to attribute predictions to specific input features or graph structure components.

Unlike message-passing GNNs such as GCNs, which expand their receptive field across layers via

$$X^{(\ell+1)} = \sigma\bigl(A X^{(\ell)} W^{(\ell)}\bigr),$$

our model uses explicit FIR graph filtering, which already aggregates $R$-hop neighborhood information in a single layer. Therefore, deeper stacking is not necessary purely for increasing the receptive field — a single G-NAMRFF layer with properly tuned filter order $R$ and RFF dimension $M$ is often sufficient.

Empirically, we evaluate stacking multiple G-NAMRFF layers on the node classification task on the Cora dataset and observe only marginal gains or even performance degradation (see the table below). We attribute this to two competing effects: (1) oversmoothing, where repeated graph filtering causes node features to become indistinguishable, and (2) overfitting from stacking multiple layers of high-dimensional RFF projections.

In summary, while stacking G-NAMRFF is technically feasible and may increase function class capacity, it compromises interpretability and offers limited performance benefits. Our results suggest that a carefully designed single-layer G-NAMRFF already achieves competitive performance with far greater transparency.

| RFF (M) | Filter Order (R) | Layers | Accuracy (%) |
| --- | --- | --- | --- |
| 100 | 3 | 2 | 78.40 ± 0.25 |
| 200 | 3 | 2 | 80.12 ± 0.36 |
| 100 | 5 | 2 | 75.50 ± 0.11 |
| 200 | 5 | 2 | 75.92 ± 1.98 |
| 100 | 3 | 3 | 76.56 ± 0.23 |
| 200 | 3 | 3 | 75.77 ± 0.45 |
| 100 | 5 | 3 | 72.35 ± 1.26 |
| 200 | 5 | 3 | 71.19 ± 2.49 |
  • I did not understand Theorem 2.1 (for instance, what do you mean by 'approximated'?) and the statement could be wrongly formulated. Could you provide a reference or a proof?

We thank the reviewer for the comment. The statement of Theorem 2.1 is a standard result: by Bochner's theorem, a shift-invariant kernel can be expressed as $K(x,x') = \mathbb{E}_{a,b}\bigl[\mathbf{\Phi}_{a}^{T}(x)\,\mathbf{\Phi}_{a}(x')\bigr]$. The finite-dimensional inner product is therefore an unbiased Monte Carlo estimate of the true kernel, approximating it up to sampling error. A complete proof, along with convergence and concentration bounds, can be found in Theorem 1 of [1].
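As a quick numerical illustration of this Monte Carlo statement (a sketch using the standard 1-D RBF setup, not code from the paper), the RFF inner product converges to the exact kernel value as $M$ grows:

```python
# RFF approximation of a 1-D RBF kernel: the error shrinks roughly as 1/sqrt(M).
import numpy as np

rng = np.random.default_rng(0)
x, xp, theta = 0.3, -1.1, 1.0
k_true = np.exp(-(x - xp) ** 2 / (2 * theta ** 2))   # exact RBF kernel value

for M in (10, 100, 10_000):
    a = rng.normal(0.0, 1.0 / theta, M)              # samples from the spectral density
    b = rng.uniform(0.0, 2 * np.pi, M)
    phi = lambda t: np.sqrt(2.0 / M) * np.cos(a * t + b)
    print(M, abs(phi(x) @ phi(xp) - k_true))         # Monte Carlo error
```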

We will add this reference in the revised manuscript.

[1] Rahimi, Ali, and Benjamin Recht. "Random features for large-scale kernel machines." Advances in neural information processing systems 20 (2007).

  • Comment on limitations

We thank the reviewer for the suggestion. Following your suggestion, we will include a section in the revised draft mentioning limitations and future directions as follows:

\section{Limitations and Future Directions}

G-NAMRFF, as discussed, is primarily developed under the assumption of a static graph structure and static node features. While this is consistent with most existing interpretable graph learning frameworks, it limits applicability to dynamic real-world data from social or communication networks, where both the graph topology and node attributes evolve over time.

Extending G-NAMRFF to dynamic graphs is a promising direction for future work. In particular, both continuous-time dynamic graphs (CTDGs) and discrete-time dynamic graphs (DTDGs) are widely used to model temporal graph data. Developing such lightweight, interpretable models for dynamic graphs remains an exciting and important research challenge.

Comment

Dear authors, thank you very much for the response.

I am not an expert on GPs, but I believe that in most cases there is a way around the O(N^3) time.

The response discusses pairwise interactions, but my point was the generalisation to a much broader class of interaction models.

I still do not understand Theorem 2.1. The authors refer to Theorem 1 in [1], which is very clear and indeed standard, but in my opinion not comparable with Theorem 2.1. More precisely, what is e.g. M in Theorem 2.1?

Comment

We thank the reviewer for taking the time to review the rebuttal.

  • I am not an expert on GPs, but I believe that in most cases there is a way around the O(N^3) time.

We agree that, to mitigate the computational challenges involved in estimating the posterior distribution, several approaches, such as sparse GPs relying on variational inference, have been proposed [1]. However, these models remain black-box, like traditional GPs, and lack both feature-level and structure-level interpretability. In contrast, G-NAMRFF not only provides feature-level interpretability but also introduces structure-level interpretability through its modular design involving learnable FIR filters.

[1] Snelson, Edward, and Zoubin Ghahramani. "Sparse Gaussian processes using pseudo-inputs." Advances in neural information processing systems 18 (2005).

  • I still do not understand Theorem 2.1. The authors refer to Theorem 1 in [1], which is very clear and indeed standard, but in my opinion not comparable with Theorem 2.1. More precisely, what is e.g. M in Theorem 2.1?

We clarify that $M$ in Theorem 2.1 refers to the number of Random Fourier Features (RFFs). As noted, the finite-dimensional inner product involving RFFs serves as an unbiased Monte Carlo estimate of the true kernel function. Specifically,

$$k(x,x') \approx \frac{1}{M} \sum_{m=1}^{M} \mathbf{\Phi}_{a,m}(x)\,\mathbf{\Phi}_{a,m}(x')$$

and as $M \rightarrow \infty$, the approximation converges to the exact kernel value. We acknowledge that the current form of Theorem 2.1 may cause confusion due to implicit notation. In the revised version, we will explicitly define $M$ and present the RFF-based kernel approximation more clearly, better aligning with the standard presentation in [1].

[1] Rahimi, Ali, and Benjamin Recht. "Random features for large-scale kernel machines." Advances in neural information processing systems 20 (2007).

  • The response discusses pairwise interactions, but my point was the generalisation to a much broader class of interaction models.

We interpret the reviewer's comment as referring to higher-order feature interactions, i.e., modeling interactions involving three or more features jointly. Following similar lines as the discussion above, G-NAMRFF can be extended using an ANOVA decomposition [1] to model higher-order interactions as

$$y_i = \sum_{k_1=1}^{D} f_{k_1}(x_{i,k_1}) + \sum_{k_1<k_2} f_{k_1,k_2}(x_{i,k_1}, x_{i,k_2}) + \sum_{k_1<k_2<k_3} f_{k_1,k_2,k_3}(x_{i,k_1}, x_{i,k_2}, x_{i,k_3}) + \cdots + \sum_{k_1<k_2<\cdots<k_d} f_{k_1,k_2,\ldots,k_d}(x_{i,k_1}, x_{i,k_2}, \ldots, x_{i,k_d})$$

where $f_{k_1,k_2,\ldots,k_d}(\cdot)$ models interactions among the $d$ features. In particular, these functions are modeled using a multi-dimensional Gaussian process. Following [2], the function capturing higher-order correlations can be modeled as $f_{k_1,k_2,\ldots,k_d}(x_{i,k_1}, x_{i,k_2}, \ldots, x_{i,k_d}) \sim \mathcal{GP}(0, K_d(\cdot))$, where $K_d$ models the interactions among the variables and can be expressed as a product of covariance kernels, $K_d = K_1(x_{i,k_1}, x_{j,k_1})\, K_2(x_{i,k_2}, x_{j,k_2}) \cdots K_d(x_{i,k_d}, x_{j,k_d}) = \prod_{p=1}^{d} K_p(x_{i,k_p}, x_{j,k_p})$, where each $K_p$ follows eqn. (4) of the main draft. As discussed earlier, each of these can in turn be approximated using Random Fourier Features (RFFs) with graph-filtered signals as inputs, capturing the structural (neighborhood) interactions.
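As a small sanity check of the product-kernel identity above (a sketch with unit bandwidths, assumed for simplicity), the product of univariate RBF kernels equals a single multivariate RBF kernel on the stacked inputs:

```python
# Verify K_d(x_i, x_j) = prod_p K_p(x_{i,k_p}, x_{j,k_p}) for d = 3 RBF kernels.
import numpy as np

def rbf(u, v, theta=1.0):
    return np.exp(-(u - v) ** 2 / (2 * theta ** 2))

rng = np.random.default_rng(2)
xi, xj = rng.standard_normal(3), rng.standard_normal(3)   # 3 selected features

K_d = np.prod([rbf(xi[p], xj[p]) for p in range(3)])       # product of 1-D kernels
K_joint = np.exp(-np.sum((xi - xj) ** 2) / 2.0)            # single 3-D RBF kernel
assert np.isclose(K_d, K_joint)   # product of 1-D RBFs == multivariate RBF
```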

While such a model increases expressivity, we emphasize that it comes with significant computational overhead, primarily due to the exponential increase in the number of parameters, and reduced interpretability, as stated in [2]. While the current method can, in principle, accommodate such higher-order interactions, developing a scalable approach that captures these correlations without substantially increasing the number of parameters remains a promising and important direction for future research, especially given the recent surge of interest in interpretable graph machine learning models. We will add this to the future directions in the revised draft.

[1] Durrande, Nicolas, et al. "ANOVA kernels and RKHS of zero mean functions for model-based sensitivity analysis." Journal of Multivariate Analysis 115 (2013): 57-67.

[2] Duvenaud, David K., Hannes Nickisch, and Carl Rasmussen. "Additive gaussian processes." Advances in neural information processing systems 24 (2011).

Comment

Dear authors, thank you for carefully responding to my questions.

Comment

We thank the reviewer for their time and for the positive feedback on the paper.

Review
Rating: 5

This paper introduces G-NAMRFF, a lightweight, interpretable model for graph learning. It improves upon GNAN (the only existing glass-box GNN architecture) by using Random Fourier Features (RFFs) to approximate Gaussian process kernels, reducing parameter complexity while maintaining interpretability. G-NAMRFF represents node embeddings as feature-wise contributions, with a graph-aware kernel based on a Radial Basis Function (RBF) and the Laplacian. It outperforms SOTA black-box GNNs and GNAN on node and graph classification tasks, providing real-time interpretability without sacrificing accuracy.

Strengths and Weaknesses

Strengths

  1. The proposed G-NAMRFF is a novel method that effectively reduces the parameters of existing graph neural additive networks while maintaining strong interpretability and performance.

  2. Extending the Random Fourier Features-based kernel approximation to the Graph Neural Additive Model is an intriguing approach and offers valuable potential for future research.

  3. This paper provides sufficient theoretical and empirical analysis, particularly through real-world node and graph classification experiments, runtime analysis, and ablation studies, all of which demonstrate the effectiveness of the proposed methods.

  4. The paper is well-written, with a clear logical structure that makes it easy to read.

Weaknesses

Although I believe this paper presents a novel contribution, it still has some issues.

  1. The paper contains several minor errors; I have provided a few examples and kindly ask the authors to review them:
    a) The description of Figure 1 does not specify what the x-axis represents.
    b) Theorem 2.1 lacks the necessary references.
    c) Some figures and tables are missing punctuation after the captions, such as Figures 1 and 2, and Table 2.
    d) The notation is inconsistent, for example, the Laplacian matrix in Theorem 4.2 is not written in boldface capital letters.
  2. The paper focuses solely on parameter complexity and does not address time complexity.
  3. No code is provided for reproducibility.

Questions

  1. Please analyze the time complexity of the method proposed in this paper, specifically the time complexity of Algorithm 1. How does it compare to that of GCN?
  2. In line 61, the authors mention that GNAN requires O(N+E) distance computations per node. Why is this considered 'quadratic scaling'?

Limitations

No. Please see the Weaknesses and Questions Section.

Final Justification

After reviewing the author’s rebuttal and the feedback from other reviewers, I have decided to support this paper and believe it deserves acceptance.

The use of Random Fourier Features to improve the complexity of Graph Neural Additive Networks and enhance interpretability is both novel and technically elegant, with strong theoretical foundations and empirical validation. The intuitive illustrations in Figures 3 and 4, in particular, are interesting.

Regarding reviewer wbe8's question, I believe it is somewhat outside the scope of this paper. Dynamic graphs should be addressed in future research, and the authors have already incorporated link prediction, demonstrating the effectiveness of the proposed method across three fundamental graph learning tasks: node classification, graph classification, and link prediction.

Regarding Theorem 2.1 in the Preliminaries section, the authors' writing is unclear, and references are non-standard. While this issue should be addressed in a revision, it does not constitute grounds for rejection.

Formatting Concerns

No.

Author Response

We thank the reviewer for the positive feedback on the model strengths and the presentation of the paper. We address the specific questions in detail below.

  • Comment on typos

We thank the reviewer for the thorough reading of the manuscript and for pointing out the typos. We will incorporate all the suggestions in the revised manuscript. More specifically, we will add details such as the label for the x-axis and a reference for Theorem 2.1, and will carefully check for other typos as well.

  • Comment on time complexity

The complexity of G-NAMRFF is dominated by two key steps, as outlined in Algorithm 1 in the supplementary material: (1) filtering and (2) RFF approximation. For a graph with $N$ nodes and $D$-dimensional features, we have the Laplacian $\mathbf{L} \in \mathbb{R}^{N \times N}$ and graph data $\mathbf{X} \in \mathbb{R}^{N \times D}$. The graph filtering operation then costs $\mathcal{O}(N^2 D)$, whereas the RFF approximation costs $\mathcal{O}(NMD)$. Hence the total time complexity for a dense graph, i.e., the worst-case complexity of the proposed algorithm, is $\mathcal{O}(N^2 D + NMD)$. Exploiting sparsity in the Laplacian (say each row of $\mathbf{L}$ has at most $d$ non-zero entries, i.e., the degree), the graph filtering operation costs $\mathcal{O}(dND)$, for a total time complexity of $\mathcal{O}(dND + NMD)$.

For dense graphs, the time complexity of a GCN with $K$ layers is $\mathcal{O}(KN^2F + KNF^2)$. Exploiting sparsity, the time complexity of the GCN is $\mathcal{O}(KEF + KNF^2)$, where $E$ is the total number of edges in the graph. So our time complexity is comparable to that of the black-box model.
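For illustration, a minimal sketch of the sparse filtering step follows (toy sizes and an assumed density; not the authors' implementation): each hop is a sparse matvec costing $\mathcal{O}(\mathrm{nnz}(\mathbf{L}) \cdot D)$, which is the $\mathcal{O}(dND)$ term above.

```python
# Sparse FIR filtering: R sparse matvecs instead of dense O(N^2 * D) products.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
N, D, R = 1000, 16, 3
L = sp.random(N, N, density=0.005, random_state=0, format="csr")  # sparse Laplacian (toy)
X = rng.standard_normal((N, D))
alpha = rng.standard_normal(R + 1)            # FIR coefficients alpha_0 .. alpha_R

X_tilde = alpha[0] * X
P = X
for h in range(1, R + 1):
    P = L @ P                                 # sparse matvec: O(nnz(L) * D) per hop
    X_tilde = X_tilde + alpha[h] * P          # accumulate alpha_h * L^h X
```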

  • The authors mention that GNAN requires O(N+E) distance computations per node. Why is this considered 'quadratic scaling'?

As explained in the introduction, GNAN requires computing the distance from a node $i$ to all $N$ nodes in the graph, leading to a per-node complexity of $\mathcal{O}(N + E)$. Aggregating over all $N$ nodes results in an overall complexity of $\mathcal{O}(N(N + E))$. In the case of a complete graph, where $E = \mathcal{O}(N^2)$, the total complexity becomes $\mathcal{O}(N^3)$, making GNAN computationally expensive. This quadratic (or even cubic) scaling limits its scalability, even if sparsity can be exploited in more realistic sparse graphs.

We will make the code public, incorporating all the suggestions, with the final draft.

Comment

Thank you for your response; it cleared up my concern. I will keep my supportive score.

Comment

We thank the reviewer for their time and the very positive feedback on the paper.

Review
Rating: 5

This paper introduces the Graph Neural Additive Model with Random Fourier Features (G-NAMRFF), a novel, lightweight, and self-interpretable model for graph-structured data. Interpretability is achieved by additive modelling, i.e., the output for a node is a sum over the individual channels of transformed features, without channel mixing. A core design choice, made to avoid the quadratic complexity of global feature aggregation, is to leverage random Fourier features to approximate a graph kernel that captures pairwise node interactions.

Strengths and Weaknesses

Strengths

  • The paper tackles the critical problem of computational complexity and parameter efficiency in interpretable GNNs.
  • A theoretical quantification of robustness is provided, which is crucial for model generalization.
  • Extensive experiments demonstrate interpretability with decent predictive performance.
  • The paper is well-written and easy to follow.

Weaknesses

  • There could be concerns about model capacity, since the overall model is pretty much linear. The expressivity problem could be worse since there is no feature mixing for interpretability.
  • In the main experiments, it is not clear how the number of parameters of the proposed method is compared to other baselines. This is of particular interest as being lightweight is one of the main motivations.
  • Minor: in equation (1), I believe it should be $f_k(x_{j,k})$.

Questions

Please see Weaknesses.

Limitations

The authors did not discuss any potential limitations. Please see Weaknesses for possible improvements.

Final Justification

My concern regarding parameter comparison has been addressed by the authors’ rebuttal. While expressive power could be a limitation, I view it more as an avenue for future work, particularly in studying the trade-offs between interpretability, robustness, and expressivity in this framework.

Regarding Reviewer wbe8's comment on the limitation to static graphs, I do not consider this to be a major limitation of the current contribution.

Regarding Reviewer 9JUV's comment on Theorem 2.1, I feel it is a standard statement about random features.

Taking all these aspects into account, I maintain my recommendation for acceptance.

Formatting Concerns

N/A

Author Response

We thank the reviewer for the very positive feedback on the strengths of the work and the presentation of the paper. We address the specific questions in detail below.

  • Comment on expressivity

We thank the reviewer for the insightful comment. As pointed out correctly, G-NAMRFF models the output as a sum of univariate functions, each acting on a single feature independently. This design avoids feature mixing for two main reasons: (1) to reduce the parameter complexity and obtain a very lightweight model, and (2) to position the model in line with recent interpretable models such as NAMs and GNAN. We acknowledge that this imposes a limitation on expressivity, as it does not explicitly capture inter-feature correlations. Despite this constraint, the lightweight G-NAMRFF matches or outperforms several black-box GNNs across benchmarks, demonstrating that the core design is effective for many practical tasks.

To further enhance expressivity while preserving interpretability, we highlight that G-NAMRFF can be readily extended using ideas from NODE-GAM [1] and GAMI-Net [2], which incorporate structured pairwise interactions. Specifically, G-NAMRFF can model both independent and pairwise feature effects as:

$$y_i = \sum_{k=1}^{D} f_k(x_{i,k}) + \sum_{k=1}^{D} \sum_{k'>k}^{D} f_{k,k'}(x_{i,k}, x_{i,k'}),$$

where $f_{k,k'}$ is modeled as a Gaussian process with a 2D input. These functions can be efficiently approximated using 2D random Fourier features (RFFs) as

$$f_{k,k'}(x_{i,k}, x_{i,k'}) = \mathbf{\Phi}_{a}^{T}\bigl([\tilde{x}_{i,k}, \tilde{x}_{i,k'}]\bigr)\,\mathbf{w}_{k,k'}.$$

The overall prediction for node $i$ then becomes

$$y_i = \sum_{k=1}^{D} \mathbf{\Phi}_{a}^{T}(\tilde{x}_{i,k})\,\mathbf{w}_{k} + \sum_{k=1}^{D} \sum_{k'>k}^{D} \mathbf{\Phi}_{a}^{T}\bigl([\tilde{x}_{i,k}, \tilde{x}_{i,k'}]\bigr)\,\mathbf{w}_{k,k'}$$

As discussed earlier, this further increases the complexity due to the increase in the number of learnable parameters.

In the final draft, we plan to add this as a remark.

[1] Chang, Chun-Hao, Rich Caruana, and Anna Goldenberg. "NODE-GAM: Neural Generalized Additive Model for Interpretable Deep Learning." International Conference on Learning Representations 2022.

[2] Yang, Zebin, Aijun Zhang, and Agus Sudjianto. "GAMI-Net: An explainable neural network based on generalized additive models with structured interactions." Pattern Recognition 120 (2021): 108192.

  • Comparison of number of parameters with other baselines

We thank the reviewer for the comment. Although in the main paper we compared parameter efficiency against GNAN, here we present the comparison against other black-box and glass-box models. In particular, we present the ratio of the number of parameters of black-/glass-box models to that of G-NAMRFF. Denoting by $H_u$ the size of the hidden layer, $D$ the input feature dimension, $M$ the number of RFFs, $R$ the filter order, $L$ the number of GCN or MLP layers, and $K$ the number of heads in GAT, the table below shows that the proposed model is parameter-efficient.

| Model | Parameter ratio | Learnable parameter count |
| --- | --- | --- |
| G-NAMRFF | 1.0 | $D \times M + R + 1$ |
| GCN | 5.12 | $(H_u \times D + H_u) + (L-1)(H_u^2 + H_u) + (H_u + 1)$ |
| GAT | 10.24 | $K(H_u \times D + 3H_u) + (L-1)K(H_u^2 + 3H_u) + K \times H_u + 1$ |
| NAM | 150 | $D \times (H_u^2 \times (L-1) + (L+2) \times H_u + 1)$ |
| GNAN | 168 | $(D+1) \times (H_u^2 \times (L-1) + (L+2) \times H_u + 1)$ |
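A minimal sketch that evaluates these formulas is given below; the hyperparameter values ($D$, $M$, $R$, $H_u$, $L$, $K$) are illustrative assumptions, not the settings behind the ratios in the table, so the printed ratios will differ from those reported above.

```python
# Evaluate the parameter-count formulas from the table with assumed sizes.
D, M, R = 1433, 100, 3          # feature dim / #RFFs / filter order (assumed)
H_u, L, K = 64, 2, 8            # hidden units / layers / GAT heads (assumed)

gnamrff = D * M + R + 1
gcn  = (H_u * D + H_u) + (L - 1) * (H_u**2 + H_u) + (H_u + 1)
gat  = K * (H_u * D + 3 * H_u) + (L - 1) * K * (H_u**2 + 3 * H_u) + K * H_u + 1
nam  = D * (H_u**2 * (L - 1) + (L + 2) * H_u + 1)
gnan = (D + 1) * (H_u**2 * (L - 1) + (L + 2) * H_u + 1)

for name, p in [("G-NAMRFF", gnamrff), ("GCN", gcn), ("GAT", gat),
                ("NAM", nam), ("GNAN", gnan)]:
    print(f"{name:9s} {p:>10,d}  ratio {p / gnamrff:6.2f}")
```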
  • Comment on equation 1

We thank the reviewer for pointing out the typo. We will correct it in the final draft.

  • Comment on limitations

We thank the reviewer for the suggestion. Following your suggestion, we will include a section in the revised draft mentioning limitations and future directions as follows:

\section{Limitations and Future Directions} G-NAMRFF, as discussed, is primarily developed under the assumption of a static graph structure and static node features. While this is consistent with most existing interpretable graph learning frameworks, it limits applicability to dynamic real-world data from social or communication networks, where both the graph topology and node attributes evolve over time.

Extending G-NAMRFF to dynamic graphs is a promising direction for future work. In particular, both continuous-time dynamic graphs (CTDGs) and discrete-time dynamic graphs (DTDGs) offer formal models for temporal graph data. The modular nature of G-NAMRFF—particularly its use of FIR graph filters and additive modeling—suggests that it can be adapted to these settings by incorporating time-varying graph filters and temporally evolving RFF features. Developing such lightweight, interpretable models for dynamic graphs remains an exciting and important research challenge.

Comment

Thank you for your rebuttal. Most of my concerns have been addressed. I will maintain my acceptance score.

Comment

We thank the reviewer for their valuable time in reviewing and for the very positive feedback on the manuscript.

Review
Rating: 3

This paper studies a graph neural additive model that leverages random Fourier features to yield a relatively less complex and self-interpretable graph additive architecture, referred to as G-NAMRFF.

Strengths and Weaknesses

Strengths

  1. G-NAMRFF offers interpretability, which makes it suitable for real applications that require transparent decisions.
  2. G-NAMRFF requires a substantially less complex computing architecture relative to existing approaches.
  3. G-NAMRFF is shown to be robust to small perturbations in the graph structure, lending stability to its decisions.
  4. G-NAMRFF achieves competitive performance in node and graph classification tasks.

Weaknesses

  1. The effectiveness of the model seems to depend on careful fine-tuning of a large set of hyperparameters.
  2. The interpretability aspect is limited to static data.
  3. The effectiveness of G-NAMRFF beyond graph or node classification tasks is not clear.

Questions

See strengths and weaknesses section.

Limitations

The authors claim that their work has no limitations. I disagree with this assessment (see weaknesses section above).

Final Justification

Among the concerns raised in my review, the authors provided a sufficient response only to the concern about the number of hyperparameters. While the authors are confident about the generalizability of their approach to broader settings involving dynamic graphs, they could not provide any convincing evidence or arguments resolving these aspects in their rebuttal. Additionally, the authors claim their approach to be task-agnostic. However, the concept of a 'task-agnostic' approach is not rigorously defined or explored in this paper or their rebuttal. With these concerns, my recommendation is to reject this paper in its current form.

Formatting Concerns

No paper formatting concerns.

Author Response

We thank the reviewer for the detailed review and also for the positive feedback on the strengths of the proposed model. We address specific questions in detail below:

  • The effectiveness of the model seems to depend on careful fine-tuning of a large set of hyperparameters.

The proposed G-NAMRFF model includes a limited and interpretable set of hyperparameters: the filter order ($R$), the number of random Fourier features ($M$), and the kernel bandwidth ($\theta$). Among these, $R$ controls the receptive-field depth and is typically constrained to small values (e.g., 1–5), as higher values lead to oversmoothing, a well-established limitation in graph representation learning; $M$ and $\theta$ are kernel approximation parameters that are common to any Random Fourier Feature (RFF) or Gaussian-process-based method and are not specific to our model.

We further highlight that, compared to GNAN, our model has significantly fewer hyperparameters. GNAN requires separate deep neural networks for each feature and for neighborhood weighting, each with a tunable number of layers and hidden units, plus additional regularization mechanisms (e.g., dropout), increasing tuning complexity.

We also emphasize that the performance of G-NAMRFF is relatively robust across different hyperparameter settings, as demonstrated in the supplementary material. This shows that G-NAMRFF is both parameter-efficient and easier to tune than prior interpretable baselines.

  • The interpretability aspect is limited to static data

G-NAMRFF, as correctly pointed out, is a self-interpretable and lightweight model developed primarily for static graphs. Our study explicitly focuses on static graph settings, as detailed in the paper, given the early stage of research in interpretable graph models—where, to the best of our knowledge, only one prior self-interpretable method (GNAN) has been proposed. The goal of this work is to develop a first-of-its-kind interpretable and parameter-efficient architecture that performs on par with black-box GNNs, while offering real-time interpretability.

That said, the modular design of G-NAMRFF readily allows for extensions to streaming or temporal graph data. In particular, the model's core strength lies in its use of graph-aware FIR filters, which are functions of the graph Laplacian. In the dynamic setting—commonly modeled via Discrete-Time Dynamic Graphs (DTDGs)—both the graph Laplacian $\mathbf{L}$ and the node features evolve over time. Existing works [1, 2] in graph signal processing (GSP) have extensively studied time-varying graph filters, which can be directly used to generate time-varying filtered features. These filtered signals can then be passed through the same RFF-based univariate modeling framework, preserving interpretability while capturing temporal dynamics. While our current implementation is for static graphs, the architectural components are well-positioned for a principled extension to dynamic graphs, which we consider an exciting direction for future work. We will add this to a future directions section in the final draft.

[1] Isufi, Elvin, et al. "Filtering random graph processes over random time-varying graphs." IEEE Transactions on Signal Processing 65.16 (2017): 4406-4421.

[2] Bohannon, Addison W., Brian M. Sadler, and Radu V. Balan. "A filtering framework for time-varying graph signals." Vertex-Frequency Analysis of Graph Signals. Cham: Springer International Publishing, 2018. 341-376.

  • The effectiveness of G-NAMRFF beyond graph or node classification tasks is not clear.

We thank the reviewer for the comment. Following prior works in interpretable and explainable graph learning, we focused on node and graph classification tasks to enable a fair and direct comparison. However, the core strength of G-NAMRFF lies in its design: graph signal filtering followed by additive modeling via RFFs. This structure is task-agnostic and readily extends to other downstream tasks, such as link prediction and graph regression. To validate this, we conducted additional experiments with link prediction as the downstream task. Given a node pair (say $u, v$), we compute their embeddings using G-NAMRFF and predict the likelihood of an edge via a dot-product decoder followed by a sigmoid activation. Importantly, since node embeddings are formed through additive combinations of filtered, RFF-transformed features, the model retains feature-level interpretability in the link prediction setting. Specifically, we can decompose the link score into contributions from individual features and hop depths for each node, allowing us to quantify which features most strongly influence the prediction. We present quantitative results on benchmark datasets with AUC as the metric in the table below; a minimal sketch of the decoder follows the table. From the results, we emphasize that G-NAMRFF is task-agnostic and offers interpretability without compromising accuracy.

| Model | Cora | Citeseer | Pubmed |
| --- | --- | --- | --- |
| GCN | 92.03 ± 0.27 | 94.78 ± 0.35 | 98.69 ± 0.06 |
| GAT | 93.08 ± 0.32 | 96.25 ± 0.20 | 98.20 ± 0.07 |
| GraphSAGE | 92.35 ± 0.15 | 93.12 ± 0.30 | 98.87 ± 0.04 |
| GPAM | 83.46 ± 0.37 | 88.65 ± 1.50 | 78.38 ± 0.08 |
| GNAN | 84.95 ± 0.55 | 81.12 ± 0.90 | 88.77 ± 0.40 |
| G-NAMRFF (ours) | 93.83 ± 0.35 | 93.96 ± 1.20 | 97.25 ± 0.21 |
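Below is a minimal sketch of the dot-product decoder described above (illustrative only; the embeddings are placeholders standing in for G-NAMRFF outputs):

```python
# Link prediction head: sigmoid of the dot product of two node embeddings.
import numpy as np

def link_probability(z_u, z_v):
    score = z_u @ z_v                    # dot-product decoder
    return 1.0 / (1.0 + np.exp(-score))  # sigmoid link likelihood

rng = np.random.default_rng(0)
z_u, z_v = rng.standard_normal(16), rng.standard_normal(16)  # stand-in embeddings
p_edge = link_probability(z_u, z_v)
# Because each embedding is an additive sum of per-feature terms, the score
# decomposes into per-feature contributions, giving feature-level attributions.
```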
  • Comment on limitations

We thank the reviewer for the suggestion. Following your suggestion, we will include a section in the revised draft mentioning limitations and future directions as follows:

\section{Limitations and Future Directions}

G-NAMRFF, as discussed, is primarily developed under the assumption of a static graph structure and static node features. While this is consistent with most existing interpretable graph learning frameworks, it limits applicability to dynamic real-world data from social or communication networks, where both the graph topology and node attributes evolve over time.

Extending G-NAMRFF to dynamic graphs is a promising direction for future work. In particular, both continuous-time dynamic graphs (CTDGs) and discrete-time dynamic graphs (DTDGs) are widely used to model temporal graph data. The modular nature of G-NAMRFF—particularly its use of FIR graph filters and additive modeling—suggests that it can be adapted to these settings by incorporating time-varying graph filters and temporally evolving RFF features. Developing such lightweight, interpretable models for dynamic graphs remains an exciting and important research challenge.

Comment

Dear reviewer wbe8,

please note that you have submitted an unfaithful declaration, since you have NOT engaged in the discussion or responded to the authors' rebuttal! Please do so as soon as possible. Specifically, are you satisfied with their rebuttal? Are there still pending issues?

Best AC

Comment

Dear Reviewer,

We have addressed all your comments. It would be really helpful if you could review the rebuttal and raise any remaining concerns. If the rebuttal answers your questions, could you please update the final score?

Comment

Dear Reviewer,

As the author-reviewer discussion is about to end, could you please check the rebuttal and respond? If you have any further queries, we are more than happy to respond.

Comment

Dear Area Chair,

I have considered all aspects of the authors' rebuttal (which is in line with the essence of the reviewing process) and chosen to retain my original rating. The concerns I raised in the review are corroborated by the authors' response, which limits my recommendation for this work.

Thank you.

Comment

We thank the reviewer for their response. However, we respectfully point out that the reviewer’s assertion that our rebuttal “corroborates” their concerns is not accurate. In fact, we have specifically addressed and disproved each of the concerns with detailed arguments, comparisons, and new evidence.

To summarize:

1. Hyperparameter tuning: We clarified that our model uses only three hyperparameters, which are significantly fewer and easier to manage than those in state-of-the-art interpretable models like GNAN.

2. Interpretability limited to static data: We clearly stated that G-NAMRFF is explicitly designed for static graphs, just like many foundational GNN models (e.g., GCN, GAT, Graph Transformers). Moreover, we explained how the model architecture is naturally extendable to dynamic graphs and supported this with citations to relevant literature.

3. Effectiveness beyond node/graph classification: While our main experiments followed standard tasks in the interpretability literature, we went further and added new results on link prediction. These demonstrated both accuracy and retained interpretability, showing the model is indeed task-agnostic.

Given these clarifications, we believe the reviewer’s summary does not reflect the content of our rebuttal. Rather than corroborating concerns, our rebuttal resolves them with direct and clear evidence.

We remain open to further discussion and are happy to clarify any specific points.

Final Decision

The paper proposes the Graph Neural Additive Model with Random Fourier Features (G-NAMRFF), an extension of Graph NAM that is claimed to be interpretable, computationally efficient, and lightweight, seamlessly integrating graph topology and feature mapping into a single module. Interpretability is inherited from GNAM, while a better integration of graph structure and node features (w.r.t. GNAM) is obtained via the adoption of a kernel that implements a Gaussian process prior, capturing integrated information for the one-hop neighbourhood in the graph. The kernel is approximated via Random Fourier Features, and multi-hop graph information is obtained by embedding the node features using Finite Impulse Response filters with trainable coefficients. The adoption of univariate functions supports interpretability. The experimental assessment provided evidence of the ability of G-NAMRFF to learn lightweight and interpretable models that reach comparable performance w.r.t. comparable approaches.

The main strengths of the paper can be summarised as follows:

  • addresses an important problem involving computational complexity and parameter efficiency in interpretable GNNs;
  • the paper's claims, i.e., compact interpretable models with very good predictive performance, seem to be supported by an extensive experimental assessment;
  • theoretical results on permutation equivariance, as well as robustness to perturbation of graph Laplacian are provided;
  • overall (with the exception of Theorem 2.1) the paper is well-written and easy to follow.

The paper also has some weaknesses, the main ones of which are listed below:

  • it is not clear how expressive the family of functions implementable by the proposed approach is. This could limit the scope of applicability.
  • the contribution could be considered incremental, given that it consists of the application of existing techniques.

After discussion with SAC, it was decided that the positive contributions outweigh the weaknesses, which can be addressed in future work.