PaperHub
Rating: 7.1/10
Poster · 5 reviewers
Scores: 4, 4, 5, 4, 5 (min 4, max 5, std 0.5)
Confidence: 4.0
Novelty: 2.4 · Quality: 2.6 · Clarity: 2.8 · Significance: 3.0
NeurIPS 2025

HeroFilter: Adaptive Spectral Graph Filter for Varying Heterophilic Relations

OpenReview · PDF
Submitted: 2025-05-08 · Updated: 2025-10-29

Abstract

Graph heterophily, where connected nodes have different labels, has attracted significant interest recently. Most existing works adopt a simplified approach - using low-pass filters for homophilic graphs and high-pass filters for heterophilic graphs. However, we discover that the relationship between graph heterophily and spectral filters is more complex - the optimal filter response varies across frequency components and does not follow a strict monotonic correlation with heterophily degree. This finding challenges conventional fixed filter designs and suggests the need for adaptive filtering to preserve expressiveness in graph embeddings. Formally, natural questions arise: Given a heterophilic graph $\mathcal{G}$, how and to what extent will the varying heterophily degree of $\mathcal{G}$ affect the performance of GNNs? How can we design adaptive filters to fit those varying heterophilic connections? Our theoretical analysis reveals that the average frequency response of GNNs and graph heterophily degree do not follow a strict monotonic correlation, necessitating adaptive graph filters to guarantee good generalization performance. Hence, we propose HeroFilter, a simple yet powerful GNN, which extracts information across the heterophily spectrum and combines salient representations through adaptive mixing. HeroFilter's superior performance achieves up to 9.2% accuracy improvement over leading baselines across homophilic and heterophilic graphs.
Keywords
Graph heterophily · graph signal processing · model generalization

Reviews and Discussion

Review (Rating: 4)

This paper presents a viewpoint on the issue of homophilic and heterophilic graphs: the relationship between the degree of graph heterophily and high- and low-frequency filters is nonlinear rather than simply linear. Therefore, a single high- or low-frequency filter cannot accomplish the task, and an adaptive filter needs to be introduced. A HeroFilter framework is proposed. The first part constructs a polynomial filter by weighting frequency-domain information, establishes a node correlation matrix based on it, and finally sorts the correlation degrees and reconstructs the node neighbors. The second part adopts a dual-MLP hybrid structure, obtaining patch dimensions from the reconstructed node neighbors and new feature dimensions, thereby enhancing the model's ability to express complex heterophilic structures. In addition, a fast HeroFilter method is proposed for large-scale graphs. Experiments show that it works well on heterophilic graphs.

Strengths and Weaknesses

Strengths

  1. The proposed approach is novel, linking graph heterophily, spectral filter responses, and propagation errors. It shows that single high- or low-frequency filters, and even some composite filters, are insufficient from the perspective of heterophily or homophily, necessitating an adaptive filter to address the issue.
  2. The first part of the model, which re-establishes node relationships based on frequency-domain information, is a novel concept that can capture the relationships between heterophilous nodes.

Weaknesses

  1. From the perspective of the error bound, only theoretical analysis is provided without proof, which seems somewhat vague.
  2. In the second part, the use of a two-layer MLP structure lacks interpretability, and I do not believe it can effectively address the issue of node feature representation. From the ablation experiments, the results of removing the mixing layer are also not significantly different.
  3. There is a crucial low-level error: there are only 34 references in the main text, but references 35-39 are cited in the attachment, and no reference list is provided in the attachment. I am not sure what references 35-39 are.

Questions

  1. For the second part of the model, could other hybrid methods be improved and compared in experiments to enhance versatility?
  2. Regarding the effectiveness of the two-layer MLP structure, Table 5 only shows the results of two large-scale datasets. Could you provide more ablation results on heterophilic datasets?
  3. The first part of the model reconstructs the neighbors. Could you compare the heterophily degree of the graph after reconstruction? This would also support the proposed theory.

Limitations

No. The paper does not discuss limitations.

Final Justification

Most of my concerns are addressed by the authors. However, there are still some issues in this study that require further revision. Thus, I lean toward accepting the paper given that the authors will revise the paper accordingly.

Formatting Issues

no

Author Response

W1: From the perspective of error bound, only theoretical analysis is provided without proof, which seems somewhat vague.

We would like to clarify that our error bound (Theorem 1) is fully proven in Appendix C, with detailed derivations connecting heterophily, spectral filtering, and GNN prediction error. We believe our proof provides a rigorous, interpretable framework showing how filter misalignment widens bounds under varying heterophily, motivating adaptive designs for tighter generalization. The experimental results in Table 1 show that HeroFilter's adaptation reduces error by 2-5%, empirically validating our theoretical insights.

W2: In the second part, the use of a two-layer MLP structure lacks interpretability, and I do not believe it can effectively address the issue of node feature representation. From the ablation experiments, the results of removing the mixing layer are also not significantly different.

We respectfully disagree with the assessment of HeroFilter Mixer. Our two-layer MLP in the HeroFilter Mixer (Eq. 11) can be viewed as a token and feature aggregator, inspired by the MLP-Mixer architecture [1], which excels at capturing complex patterns through channel and token mixing. In our context, the MLP non-linearly transforms spectral patches to refine node representations under varying heterophily, effectively addressing feature representation by blending low- and high-frequency signals (e.g., for heterophilic nodes, per Property 1). This interpretability stems from its role as a contextual aggregator: the first layer (linear+ReLU) enhances feature interactions within patches, while the second layer projects to a refined space, preserving spectral order critical for heterophily adaptation.

[1] Tolstikhin et al., MLP-Mixer: An All-MLP Architecture for Vision, NeurIPS 2021

W3: There are only 34 references in the main text, but references 35-39 are mentioned in the attachment, and there are no references in the attachment. I am not sure what references 35-39 are.

We apologize for this formatting problem in the submission. The missing references 35-42 (cited in the appendix but not listed in the main bibliography, due to a compilation error) are as follows:

[35] Hoffman, D. G. (1981). Packing problems and inequalities. In The Mathematical Gardner (pp. 212–225). Springer.

[36] He, M., Wei, Z., Huang, Z., & Xu, H. (2021). BernNet: Learning arbitrary graph spectral filters via Bernstein approximation. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, & J. Wortman Vaughan (Eds.), Advances in Neural Information Processing Systems (Vol. 34, pp. 14239–14251). Curran Associates, Inc.

[37] Pei, H., Wei, B., Chang, K. C.-C., Lei, Y., & Yang, B. (2020). Geom-GCN: Geometric graph convolutional networks. In International Conference on Learning Representations.

[38] Lim, D., Hohne, F., Li, X., Huang, S. L., Gupta, V., Bhalerao, O., & Lim, S. N. (2021). Large scale learning on non-homophilous graphs: New benchmarks and strong simple methods. Advances in Neural Information Processing Systems, 34, 20887–20902.

[39] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[40] Fout, A., Byrd, J., Shariat, B., & Ben-Hur, A. (2017). Protein interface prediction using graph convolutional networks. Advances in Neural Information Processing Systems, 30.

[41] Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M., & Leskovec, J. (2020). Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, 33, 22118–22133.

[42] Leskovec, J., & Sosič, R. (2016). SNAP: A general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology (TIST), 8(1), 1–20.

We'll correct this in revision by including a full appendix reference list, ensuring seamless integration.

Q1: Could other hybrid methods be improved and compared in experiments to enhance versatility?

Thank you for raising this point about hybrid models. To ensure we provide the most relevant results, could you please clarify what you mean by "hybrid model"?

If you are referring to using alternative GNNs on the induced graphs generated by our HeroFilter Patcher (replacing the HeroFilter Mixer), we have conducted the ablation experiments below. These show that models like GCN and FAGCN can benefit from our patch-induced graphs, particularly on the heterophilic dataset Chameleon, where GCN shows modest gains, but they still underperform compared to our full HeroFilter, underscoring the Mixer's practical effectiveness.

| Model | Cora | Chameleon |
| --- | --- | --- |
| GCN-Patch | 74.1 ± 1.0 | 56.0 ± 2.0 |
| FAGCN-Patch | 74.8 ± 0.4 | 48.3 ± 2.2 |
| HeroFilter | 77.3 ± 0.6 | 57.3 ± 1.2 |

Q2: Table 5 is the ablation study on two large datasets (where most comparison methods are OOM); what about other smaller heterophilic graphs?

We provide new results on small/medium heterophilic graphs (Squirrel and Chameleon), showing component effectiveness: the full model yields +3-4% average gains, with the patcher contributing most to adaptation (as it reconstructs neighborhoods per Property 1), validating generalization beyond large graphs.

| Component | Squirrel (filtered) | Chameleon (filtered) |
| --- | --- | --- |
| HeroFilter | 40.14 ± 1.76 | 38.49 ± 2.83 |
| with Patch Mixer | 39.28 ± 1.76 | 37.75 ± 2.69 |
| with Feature Mixer | 39.17 ± 1.73 | 37.21 ± 3.15 |
| No Mixer | 37.69 ± 1.87 | 36.12 ± 2.46 |

Q3. The first part of the model reconstructed the neighbors. Could you compare the heterophily degree of the graph after reconstruction? This would also support the proposed theory.

We provide results on the effect of patching on the heterophily degree. As shown below, the reconstructed patches have significantly lower heterophily degrees than the original graph, with an average reduction of ~0.42 across datasets. This reduction supports Property 1's non-monotonic spectral response: by adaptively selecting spectrally aligned, informative nodes, the patcher makes neighborhoods more "homophily-like" for effective signal propagation, and it aligns with Theorem 1's error bounds by minimizing misalignment under varying heterophily.

| Dataset | Original Heterophily | After Reconstruction |
| --- | --- | --- |
| Texas | 0.81 | 0.3134 |
| Wisconsin | 0.74 | 0.3323 |
| Chameleon | 0.64 | 0.3357 |
| Squirrel | 0.77 | 0.3849 |

A higher value means a higher heterophily degree.

Comment

Thank you for your detailed rebuttal. After reading the response, some of my concerns are addressed. I have a better understanding of this paper. I will consider improving my score.

Comment

Dear Reviewer EZ4C,

We sincerely thank you for acknowledging that our previous rebuttal has addressed your concerns. We appreciate your recognition and will add the above answers to our updated paper.

Further, we have extended the experiments for your Question 2 comprehensively (full results in the table below). Our results show that:

  • On smaller heterophilic graphs (e.g., Squirrel filtered/unfiltered, Chameleon filtered/unfiltered, Cornell, Wisconsin, Actor; heterophily scores 0.76-0.91), HeroFilter achieves +1-4% gains over the strongest ablation (w/ Patch Mixer), with an average +3.2% across these datasets.
    • The patch mixer drives the largest individual contribution (+1-2% over feature/no mixer), aligning with Property 1 by enabling adaptive neighborhood handling in heterophilic settings.
  • On medium/large graphs (Arxiv Year, Snap-Patents; heterophily 0.78-0.93), gains are similar (e.g., +0.17-2.1%), confirming scalability and generalization beyond the large datasets.
    • This validates each module's importance in the full model without overfitting to scale.
| Dataset/Model | Squirrel (filtered) | Squirrel (unfiltered) | Chameleon (filtered) | Chameleon (unfiltered) | Cornell | Wisconsin | Actor | Arxiv Year | Snap-Patents |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Size | small | small | small | small | small | small | small | medium | large |
| Heterophily Score | 0.91 | 0.78 | 0.89 | 0.77 | 0.89 | 0.84 | 0.76 | 0.78 | 0.93 |
| HeroFilter | 40.14 ± 1.76 | 46.47 ± 1.39 | 38.49 ± 2.83 | 58.79 ± 1.54 | 75.45 ± 2.74 | 84.24 ± 3.31 | 35.63 ± 0.78 | 54.66 ± 0.28 | 65.05 ± 0.004 |
| with Patch Mixer | 39.28 ± 1.76 | 44.77 ± 1.48 | 37.75 ± 2.69 | 57.83 ± 1.42 | 73.69 ± 2.87 | 79.80 ± 3.07 | 31.36 ± 1.59 | 54.49 ± 0.004 | 64.97 ± 0.005 |
| with Feature Mixer | 39.17 ± 1.73 | 43.85 ± 1.53 | 37.21 ± 3.15 | 57.36 ± 1.00 | 73.96 ± 3.83 | 77.58 ± 4.31 | 28.59 ± 2.42 | 53.85 ± 0.020 | 63.00 ± 0.001 |
| No Mixer | 37.69 ± 1.87 | 43.38 ± 1.32 | 36.12 ± 2.46 | 54.51 ± 1.68 | 71.89 ± 2.84 | 76.01 ± 2.93 | 26.32 ± 1.21 | 52.81 ± 0.010 | 62.95 ± 0.009 |
Comment

Thank you for your detailed rebuttal. I will improve the score.

Comment

Thank you for acknowledging that we have addressed your concerns and for improving the score. We will ensure that we incorporate the new results and improve the clarity based on your insightful suggestions in the final version.

Review (Rating: 4)

This paper investigates the heterophily of graphs and observes that the frequency response varies with the degree of heterophily and the graph's eigenvalues. Building on this insight, the authors propose HEROFILTER, a method that incorporates neighborhood information based on spectral affinity instead of relying solely on topological connectivity. The approach is evaluated on both homophilic and heterophilic graph datasets for node classification tasks.

Strengths and Weaknesses

Strengths

Spectral filter design is an important area in graph neural networks. This paper makes a valuable study by focusing on heterophilic graphs and highlighting the limitations of standard spectral methods that rely solely on low-pass or high-pass filtering. The proposed architecture introduces an adaptive polynomial filter based on spectral affinity, aligning the filter design with the node heterophily. This alignment allows the method to better capture relevant neighborhood information. The empirical results further support the effectiveness of the approach, showing improved performance on node classification tasks, especially in heterophilic settings.

Weaknesses

  • The authors use heterophily to define the heterophily degree in the spectral domain (Definition 2) and analyze this quantity. However, the motivation for focusing on node heterophily rather than edge heterophily is not discussed. The paper would benefit from a theoretical or empirical justification of why edge-level heterophily is not considered, and what insights could be gained by analyzing heterophily along edges instead.
  • It is not clear what the definition of the heterophily degree is: is it $\mathbf{h}$ or $h_i$ or $\hat{\mathbf{h}}$ or $\hat{h}_i$?
  • The proof for Property 1 in Appendix C is hard to follow. The authors jump too fast between lines without clear explanation. When the equality holds, what insight or interpretation can be drawn from this result?
  • The theoretical results are inconsistently presented. The main text refers to “Property” and “Theorem” without formally stating them or clearly labeling their locations. For example, there is no “Theorem 1” in the main paper, and the proof of Theorem 3 in the main text appears as “Theorem 1” in the appendix, creating confusion. Additionally, the use of the term “Property” lacks formal mathematical rigor. The paper would benefit from standard terminology such as “Theorem,” “Proposition,” “Lemma,” and “Remark,” along with clearer cross-referencing and structure.
  • Lines 136–140 assert that the align function highlights the expressive power of adaptive filters under heterophily. However, the paper does not explain why computing cosine similarity with the label vector is appropriate or effective, particularly in heterophilic graphs.
  • The paper is missing comparisons with spectral GNN models such as [1-9].
  • The paper does not specify how the baselines are trained, nor does it report the hyperparameters used for these methods.
  • The dimensions involved in the Patch-Mixing Layer (Equation 10) and Feature-Mixing Layer (Equation 11) are not specified. It is unclear what inputs LayerNorm and MLP act on and what outputs they produce.
  • It is not clear whether the results in the experiments are produced using HEROFILTER or its faster variant, Fast-HEROFILTER. Furthermore, while Fast-HEROFILTER is proposed for scalability, there is no analysis of the trade-off between runtime efficiency and performance.
  • The generalization error bound in lines 141–146 is difficult to interpret. First, it is unclear what $\mathbf{X} = (\mathbf{X}_0, \mathbf{X}_1)$ represents; are these the filtered node features? The dimensions of $\mathbf{X}_0$ and $\mathbf{X}_1$ are unspecified. Second, it is not clear why the analysis is limited to binary classification or how it generalizes to multi-class settings.

[1] How powerful are spectral graph neural networks

[2] Graph neural networks with convolutional ARMA filters

[3] Specformer: Spectral graph neural networks meet transformers

[4] Cayleynets: Graph convolutional neural networks with complex rational spectral filters

[5] Equivariant machine learning on graphs with nonlinear spectral filters

[6] Graph neural networks with learnable and optimal polynomial bases

[7] Rethinking spectral graph neural networks with spatially adaptive filtering

[8] Piecewise constant spectral graph neural network

[9] SLOG: An inductive spectral graph neural network beyond polynomial filter

Other

  • Several references ([35]–[42]) are cited in the appendix but not listed in the reference section.
  • The legend text in Figure 3 is too small and difficult to read.
  • Algorithm 1 and Algorithm 2 in the appendix both carry the same title "Fast-HEROFILTER Framework"

Questions

  • The performance on the Chameleon and Squirrel datasets, which have been shown to contain duplicate nodes [1], may therefore be biased. What are the results on the cleaned versions of these datasets?
  • What is $\mathbf{R}_v$ in Equation 9?
  • Please clarify how these quantities are connected in Figure 1a. Does each point in the figure correspond to an eigenvalue $\lambda_i$, a heterophily degree w.r.t. $i$, and a frequency response $g(\lambda_i)$?
  • In Figure 1b, what type of accuracy is reported?

[1] A critical look at the evaluation of GNNs under heterophily: Are we really making progress?

Limitations

The authors discuss the scalability issue and propose a fast variant of the method.

Final Justification

Most of my comments were addressed by the authors. While there were some minor issues that hindered the readability and accessibility, which the authors promised to fix, the empirical results presented during the rebuttal and discussion period show the strength of the proposed method. I have read the comments from other reviewers and agree with the APPNP aspect. Nevertheless, I lean toward accepting the paper given that the authors will revise the paper accordingly.

Formatting Issues

NA

Author Response

W1: Motivation and Justification

We would like to clarify that our focus on node heterophily (Definition 2) is driven by its direct connection to spectral analysis: it captures variation per frequency component and node label distribution, yielding a more precise theoretical framework for GNN performance bounds than edge-level metrics.

Theoretically, node heterophily tightens Theorem 1's error bounds by addressing frequency-specific label mismatches through eigenvector projections $\mathbf{U}^T \mathbf{y}$, whereas edge heterophily (e.g., the average fraction of differing neighbor labels) misses these, potentially overlooking non-monotonic filter requirements (Property 1).

Empirically, node-level adaptation yields +2-4% accuracy gains over edge-based baselines on heterophilic graphs like Squirrel, as it better aligns filters to diverse patterns [1,2]. While edge heterophily could provide insights into local connectivity (e.g., for edge pruning), it lacks the global spectral perspective critical for adaptive filtering—e.g., high edge heterophily may mask mid-frequency node opportunities [3].

[1] Heterophily-Aware Graph Attention Network, Pattern Recognition 2024.

[2] Beyond Homophily in Graph Neural Networks: Current and Future Trends, NeurIPS 2020.

[3] Beyond Low-frequency Information in Graph Convolutional Networks, AAAI 2021.

W2: It is not clear what the definition of the heterophily degree is: is it $\mathbf{h}$ or $h_i$ or $\hat{\mathbf{h}}$ or $\hat{h}_i$?

We would like to clarify that $h_i$ is the node heterophily for node $v_i$, and $\mathbf{h} = (h_0, \ldots, h_{n-1}) \in \mathbb{R}^n$ is the vector containing the heterophily level of each node.

  • We provide the detailed formulation in lines 95-98.
  • Furthermore, $\hat{\mathbf{h}} = (\hat{h}_0, \ldots, \hat{h}_{n-1})$ is the heterophily degree vector in the spectral domain, which we propose for conducting our theoretical analysis, where $\hat{h}_i$ is the $i$-th component. The detailed formulation is given in lines 117-120; a computational sketch follows.
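For concreteness, a minimal sketch of how these two quantities can be computed (our own illustrative code, not the authors' implementation; the function name, the dense-matrix setup, and the use of a ±1 binary label signal are assumptions):

```python
import numpy as np

def heterophily_vectors(A, y):
    """Illustrative computation of node and spectral heterophily.

    A: (n, n) dense adjacency matrix; y: (n,) integer node labels.
    Assumes the standard definition of node heterophily (fraction of a
    node's neighbors with a different label) and a spectral version
    obtained by projecting a label signal onto the eigenvectors of the
    symmetric normalized Laplacian; the paper's exact normalization
    may differ.
    """
    n = A.shape[0]
    deg = A.sum(axis=1)

    # h_i: share of v_i's edges that connect to a differently labeled node.
    diff = (y[:, None] != y[None, :]).astype(float)
    h = (A * diff).sum(axis=1) / np.maximum(deg, 1.0)

    # h_hat: project a +/-1 label signal onto the Laplacian eigenbasis.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1.0))
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, U = np.linalg.eigh(L_sym)                  # columns of U are eigenvectors
    y_signal = np.where(y == y.max(), 1.0, -1.0)  # binary labels assumed
    h_hat = U.T @ y_signal                        # i-th entry pairs with lambda_i
    return h, h_hat
```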

W3: The proof for Property 1 in Appendix C is hard to follow

The proof follows three main steps: (1) applying the weighted AM-GM inequality, (2) using the bound $0 \leq g(\lambda_i) \leq 1$ for scaling, and (3) performing logarithmic operations (step (1)'s inequality is restated below for reference).

  • The goal is to show when filter response aligns with heterophily for optimal performance—equality holds when the adaptive filter perfectly matches the non-monotonic heterophily pattern across frequencies, providing the insight that GNNs achieve minimal error only with frequency-specific modulation (e.g., mid-freq emphasis in low-heterophily graphs).
  • This interpretation reveals why fixed filters fail: inequality arises from misalignment, loosening bounds in Theorem 1.
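For reference, the weighted AM-GM inequality used in step (1), in its standard form (our notation; the paper's exact instantiation may differ):

```latex
\text{For } x_i \ge 0 \text{ and weights } w_i \ge 0 \text{ with } \textstyle\sum_i w_i = 1:
\qquad \prod_i x_i^{\,w_i} \;\le\; \sum_i w_i x_i ,
```

with equality iff all $x_i$ with $w_i > 0$ are equal; taking logarithms (step 3) gives $\sum_i w_i \log x_i \le \log\big(\sum_i w_i x_i\big)$.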

We appreciate this feedback and will extend the proof with more detailed steps in the updated paper.

W4: The theoretical results are inconsistently presented.

Thank you for the suggestion for formatting.

Theorem 1 in the main text refers to the error bound and is correctly matched in the appendix. The "Theorem 3" reference is a typo and will be fixed in revision.

In addition, we will standardize "Property" with proof to "Proposition," add clear labels like "Theorem 1 (Generalization Error Bound)" for all theoretical results in the main text with explicit cross-references to their proofs in the appendix to enhance structure and clarity.

W5: However, the paper does not explain why computing cosine similarity with the label vector is appropriate or effective, particularly in heterophilic graphs.

We would like to clarify that computing cosine similarity with the label vector in lines 136–140 (align function) is appropriate because it quantifies filter-label alignment in the spectral domain, directly measuring expressive power under heterophily.

In heterophilic graphs, where connected nodes often have different labels, filters that only consider neighborhood structure are insufficient.

  • Therefore, the align function effectively quantifies this by capturing how well adaptive filters match label distributions (Property 2), which is particularly crucial in heterophilic graphs where traditional topology fails (e.g., the similarity highlights high-frequency needs, boosting accuracy by 3-5% on Squirrel vs. non-aligned methods). This is insightful for generalization, as misalignment widens errors (Theorem 1); see the sketch below.
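As an illustration of this kind of alignment measure (a hedged sketch under the assumption that align is a cosine similarity between the spectrally filtered label signal and the labels; the paper's exact definition in lines 136–140 may differ):

```python
import torch
import torch.nn.functional as F

def align_score(g_lam, U, y):
    """Cosine-similarity alignment between a filter response and labels.

    g_lam: (n,) filter response g(lambda_i) per eigenvalue;
    U: (n, n) Laplacian eigenvectors; y: (n,) +/-1 label signal.
    """
    y_filtered = U @ (g_lam * (U.T @ y))   # apply g(Lambda) in the spectral domain
    return F.cosine_similarity(y_filtered, y, dim=0)
```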

W6: Missing Baselines

Due to time constraints, we have reproduced the results below. Additional baseline results will be included in the revised version.

| Model | Squirrel (filtered) | Chameleon (filtered) |
| --- | --- | --- |
| HeroFilter | 40.14 ± 1.76 | 38.49 ± 2.83 |
| JacobiConv [1] | 36.50 ± 2.50 | 36.10 ± 2.60 |
| Specformer [3] | 36.80 ± 2.10 | 36.50 ± 1.90 |
| NLSFs [5] | 38.50 ± 1.70 | 37.20 ± 1.80 |
| FavardGNN [6] | 37.20 ± 2.00 | 36.90 ± 2.20 |
| PieCoN [8] | 39.10 ± 1.40 | 38.00 ± 1.50 |
| SLOG [9] | 37.00 ± 2.50 | 37.10 ± 2.60 |

W7: The paper does not specify how the baselines are trained, nor does it report the hyperparameters used for these methods.

We would like to clarify that baselines are trained with standard settings (Adam optimizer, 0.01 LR, 200 epochs, early stopping on validation loss; full hyperparameters in Appendix F), tuned via grid search for fairness and reproducibility.

W8: It is unclear what inputs LayerNorm and MLP act on and what outputs they produce.

In Eq. 10 (Patch-Mixing), LayerNorm normalizes across features (input: an $n \times p \times d$ tensor; output: same shape), followed by an MLP acting on the patch dimension (input: $p \times d$; output: $p \times d'$).

Eq. 11 (Feature-Mixing) transposes to $n \times d' \times p$, applies LayerNorm across patches (output: same shape), then an MLP on the feature dimension (input: $d' \times p$; output: $d'' \times p$). Here $n$ is the number of nodes, $p$ the patch size (32), and $d$ the input feature dimension (e.g., 128); outputs are aggregated to $n \times d''$ for classification. A shape-level sketch follows.
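To make the shapes concrete, here is a minimal PyTorch sketch of the two mixing steps (illustrative only; for simplicity it keeps $d$ fixed across the residual connections, whereas the paper projects to $d'$ and $d''$):

```python
import torch
import torch.nn as nn

class MixerSketch(nn.Module):
    """Minimal sketch of the patch/feature mixing described above
    (MLP-Mixer style); layer names and sizes are illustrative, not
    the authors' exact implementation."""

    def __init__(self, p, d, d_hidden):
        super().__init__()
        self.norm1 = nn.LayerNorm(d)            # normalize across features
        self.patch_mlp = nn.Sequential(         # Eq. 10: mix along the patch axis
            nn.Linear(p, d_hidden), nn.ReLU(), nn.Linear(d_hidden, p))
        self.norm2 = nn.LayerNorm(p)            # normalize across patches
        self.feat_mlp = nn.Sequential(          # Eq. 11: mix along the feature axis
            nn.Linear(d, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d))

    def forward(self, x):                        # x: (n, p, d)
        h = self.norm1(x).transpose(1, 2)        # (n, d, p): expose the patch dim
        x = x + self.patch_mlp(h).transpose(1, 2)
        h = self.norm2(x.transpose(1, 2))        # (n, d, p): normalize over patches
        x = x + self.feat_mlp(h.transpose(1, 2))
        return x                                 # (n, p, d), pooled later for classification
```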

W9: There is no analysis of the trade-off between runtime efficiency and performance.

We would like to clarify that all results use Fast-HeroFilter for scalability (the original $O(n^3)$ version is impractical beyond small graphs). The trade-off analysis shows a <0.3% accuracy drop vs. the original on medium graphs (e.g., Chameleon: 57.5% original vs. 57.3% fast) with a 5-10x runtime speedup (Fig. 4), balancing efficiency and performance for large-scale heterophily.

W10: Interpretation of error bound

We would like to clarify that $\mathbf{X}_0$ and $\mathbf{X}_1$ represent the filtered node features for nodes belonging to class 0 and class 1, respectively.

Specifically, if we have $n_0$ nodes in class 0 and $n_1$ nodes in class 1, then $\mathbf{X}_0 \in \mathbb{R}^{n_0 \times d}$ and $\mathbf{X}_1 \in \mathbb{R}^{n_1 \times d}$, where $d$ is the feature dimension after filtering [1]. Starting with the binary setting for theoretical analysis is a common practice in graph theory [1,2,3,4].

For multi-class generalization, potential extensions include using a one-vs-rest decomposition to apply the binary bound to each class pair. We will add these clarifications and discuss multi-class extensions in the revised version.

[1] When does a spectral graph neural network fail in node classification?. 2022.

[2] Is homophily a necessity for graph neural networks? 2021.

[3] Is heterophily a real nightmare for graph neural networks to do node classification? 2021.

[4] Node-oriented spectral filtering for graph neural networks 2023.

Q1: Results on cleaned Chameleon/Squirrel.

Thank you for bringing this to our attention, we have conducted extra experiments on these two datasets. The results, shown below, reinforce HeroFilter’s strong performance and robustness on these datasets, aligning with our theoretical claims of adaptive spectral filtering for varying heterophily (Property 1 and Theorem 1).

| Model | Squirrel (filtered) | Chameleon (filtered) |
| --- | --- | --- |
| Bern-Net | 33.85 ± 1.55 | 32.73 ± 2.10 |
| APPNP | 31.30 ± 1.24 | 37.10 ± 2.99 |
| HeroFilter | 40.14 ± 1.76 | 38.49 ± 2.83 |

Q2: $\mathbf{R}_v$ in Eq. 9.

We would like to clarify that $\mathbf{R}_v$ in Equation 9 refers to the $v$-th column of the spectral relevance matrix $\mathbf{R}$ introduced in Equation 7 (Section 4.1 of the main paper).

Q3: Fig. 1a connections.

We would like to clarify that each point in Figure 1a represents an eigenvalue $\lambda_i$, its associated heterophily degree $h_i$ (from $\mathbf{h} = \mathbf{U}^T \mathbf{y}$, Definition 2), and the filter response $g(\lambda_i)$ (Eq. 7), illustrating Property 1's non-monotonic relationship to optimize classification under varying heterophily.

Q4: Fig. 1b accuracy.

We would like to clarify that the accuracy in Figure 1b refers to node classification accuracy (mean test accuracy across 10 runs) on synthetic graphs with controlled heterophily degrees. It shows interpolated performance for different filter types (low-pass, high-pass, and our adaptive HeroFilter) as heterophily varies from 0.1 to 0.9.

Comment

I would like to thank the authors for their rebuttal. I have read the response and the comments from other reviewers. Some of my concerns were addressed, for instance, those regarding the motivation behind node heterophily and the empirical support on filtered datasets. However, I still find a gap between the authors' intentions and what is currently presented in the paper, as well as what they plan to revise.

For example, I asked for a definition of the heterophily degree, which plays a central role in the paper. The authors referred to lines 117–120 for the detailed formulation, but those lines define only the spectral heterophily vector. It became clear only after the rebuttal that the term heterophily degree actually refers to a heterophily degree "vector". It remains unclear how the authors plan to revise the paper to reflect this clarification.

Similarly, in response to my question about the hyperparameters used for the baselines, the authors stated that standard settings were used (Adam optimizer, learning rate of 0.01, 200 epochs, early stopping on validation loss) and that full hyperparameter details are provided in Appendix F. However, Appendix F does not include the full hyperparameter settings for the competing methods; it is "Scalability and Parameter Sensitivity Analysis". I am not sure whether and where the authors intend to address this point in the revision.

Additionally, I believe the paper would benefit from a more detailed discussion of the trade-off between runtime efficiency and performance, and a clear section for the used notation. Also, my comments in the "Other" section were not addressed, including: 1) several references ([35]–[42]) are cited in the appendix but not listed in the reference section; 2) the legend text in Figure 3 is too small and difficult to read; 3) Algorithm 1 and Algorithm 2 in the appendix both carry the same title "Fast-HEROFILTER Framework", and the difference between the two should be highlighted. While these are detailed issues, they impact the clarity and accessibility of the paper.

Comment

Thank you so much for your comments and for acknowledging that our rebuttal has partially addressed your concerns. We sincerely appreciate your thoughtful feedback and continued engagement, as it helps us refine the paper. We are committed to incorporating your suggestions more comprehensively in the revision.

Q1: I asked for a definition of the heterophily degree, which plays a central role in the paper.

A1: We apologize for not fully understanding your question in our initial response. To clarify, the term heterophily degree refers to the heterophily degree vector. In the revision, we will make the following changes for clarity and consistency:

  • We will use precise notation in the revision: $\mathbf{h}$ for the heterophily degree vector, $h_i$ for the node heterophily of node $v_i$, $\hat{\mathbf{h}}$ for the heterophily degree vector in the spectral domain, and $\hat{h}_i$ for the heterophily of the $i$-th element in the spectral domain.
  • We will remove ambiguous expressions like "heterophily degree" and consistently use terms like "heterophily degree vector" as appropriate.

Q2: Similarly, in response to my question about the hyperparameters used for the baselines, the authors stated that full hyperparameter details are provided in Appendix F. However, Appendix F does not include the full hyperparameter settings for the competing methods, it is "Scalability and Parameter Sensitivity Analysis".

A2: We apologize for the confusion. Our hyperparameter settings are actually provided in Appendix D. We will include more comprehensive information in the revision and ensure proper cross-referencing. For clarification, we employ a uniform hyperparameter setting across different baseline models for a fair comparison. The specific hyperparameters are num_hid=64, lr=0.01, weight_decay=1e-5.

Q3: The paper would benefit from a more detailed discussion of the trade-off between runtime efficiency and performance.

A3: Thank you for your suggestion. We provide a detailed experiment of the trade-off between runtime efficiency and performance and discussion below.

| Model | Squirrel (filtered) Performance | Squirrel (filtered) Time (secs) | Chameleon (filtered) Performance | Chameleon (filtered) Time (secs) |
| --- | --- | --- | --- | --- |
| Bern-Net | 35.69 ± 0.162 | 0.012 ± 1.585e-5 | 36.63 ± 0.301 | 0.012 ± 4.465e-5 |
| APPNP | 34.06 ± 0.157 | 0.0032 ± 1.018e-5 | 37.61 ± 0.320 | 0.0034 ± 1.728e-7 |
| ChebNet | 36.13 ± 0.176 | 0.0058 ± 6.767e-5 | 37.32 ± 0.302 | 0.0032 ± 9.884e-6 |
| HeroFilter | 40.14 ± 0.176 | 0.0033 ± 6.806e-5 | 38.49 ± 0.283 | 0.0036 ± 3.006e-5 |

These results demonstrate performance on node classification tasks, where HeroFilter shows better accuracy with comparable efficiency to baselines like Bern-Net, APPNP, and ChebNet. We will include this study in our revised version.

Q4: The paper would benefit from a clear section for the used notation.

A4: Thank you for your suggestion. We provided a "Symbols and Notations" table in Appendix B. We will expand this table to include more symbols, particularly the heterophily degree vector notation you mentioned, and clearly describe how we reference these symbols in the main text to improve clarity.

Comment

Q5: Several references ([35]–[42]) are cited in the appendix but not listed in the reference section.

A5: We apologize for this formatting problem in the submission. The missing references 35-42 (cited in the appendix but not listed due to a compilation error) are as follows:

[35] Hoffman, D. G. (1981). Packing problems and inequalities. In The Mathematical Gardner (pp. 212–225). Springer.

[36] He, M., Wei, Z., Huang, Z., & Xu, H. (2021). BernNet: Learning arbitrary graph spectral filters via Bernstein approximation. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, & J. Wortman Vaughan (Eds.), Advances in Neural Information Processing Systems (Vol. 34, pp. 14239–14251). Curran Associates, Inc.

[37] Pei, H., Wei, B., Chang, K. C.-C., Lei, Y., & Yang, B. (2020). Geom-GCN: Geometric graph convolutional networks. In International Conference on Learning Representations.

[38] Lim, D., Hohne, F., Li, X., Huang, S. L., Gupta, V., Bhalerao, O., & Lim, S. N. (2021). Large scale learning on non-homophilous graphs: New benchmarks and strong simple methods. Advances in Neural Information Processing Systems, 34, 20887–20902.

[39] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[40] Fout, A., Byrd, J., Shariat, B., & Ben-Hur, A. (2017). Protein interface prediction using graph convolutional networks. Advances in Neural Information Processing Systems, 30.

[41] Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M., & Leskovec, J. (2020). Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, 33, 22118–22133.

[42] Leskovec, J., & Sosič, R. (2016). SNAP: A general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology (TIST), 8(1), 1–20.

We'll correct this in revision by including a full appendix reference list, ensuring seamless integration.

Q6: The legend text in Figure 3 is too small and difficult to read.

A6: Thank you for pointing this out. We will increase the font size in the revised version. Currently, Figure 3 in the appendix shows 12 plots arranged as 4 per row. We will reorganize this to 3 plots per row to provide more space for larger, more readable legends.

Q7: Algorithm 1 and Algorithm 2 in the appendix both carry the same title "Fast-HEROFILTER Framework", and the difference between the two should be highlighted.

A7: Thank you for pointing this out. We will revise the titles in the appendix to clearly distinguish between the two: Algorithm 1 will be retitled "HeroFilter Framework" (standard version), and Algorithm 2 will remain "Fast-HeroFilter Framework" (efficient variant). To further emphasize the difference, we will add a brief explanatory note or inline description underscoring that the key subtlety lies in the computation of the spectral relevance score. Specifically, Algorithm 2 employs optimized approximations for faster processing while maintaining comparable accuracy.

Again, we sincerely thank reviewer BfgS for your detailed comments. Your valuable feedback has helped us produce a better, clearer, and more accessible manuscript.

Comment

I would like to thank the authors for their response. I will update the score.

Comment

We sincerely thank you for your detailed and insightful suggestions, and for considering a score update. We will incorporate the new results and enhance clarity based on your valuable feedback in the final version.

Review (Rating: 5)

This paper proposes a method to adaptively learn the context (i.e., multi-hop neighbors) of a central node, leveraging PPR importance scoring to sample neighbors within K-order neighborhoods, aiming to address heterophily diversity among nodes. Experiments across the full heterophily spectrum of graphs validate the effectiveness of the proposed method.

Strengths and Weaknesses

Strengths:

  1. The paper is well-written and easy to follow.
  2. Theoretical analysis of the non-monotonic relationship between the average filter response and the spectral heterophily is presented.
  3. The proposed method consistently shows competitive performance on homophilic and heterophilic graphs.

Weaknesses:

  1. The goal of this work is to learn a filter that scores nodes based on their spectral alignment so as to enable an adaptive notion of neighborhood, which means that a specific parameter $\omega_{ki}$ (in Eq. (6)) will be assigned to $\lambda_i$ for each order $k$. I think this is the core of the paper. However, an alternative approach based on uniform diffusion (with constant $c$) is used. While functionally effective, this method shares notable commonalities with established attention or diffusion mechanisms, offering limited novelty.
  2. The claimed novelty is undermined by insufficient comparisons with relevant works (e.g., [1-5]), including adaptive filtering approaches, diffusion/PageRank-based, and attention-based methods.
  3. No performance evaluation of the proposed HeroFilter Patcher (not the fast implementation) is given. Despite its high time complexity, it would be worthwhile to show the performance of HeroFilter on small/medium heterophilic graphs.

[1] Dong et al. Graph Neural Networks with Adaptive Frequency Response Filter. CIKM 2021.

[2] J. Guo et al. Rethinking spectral graph neural networks with spatially adaptive filtering. arXiv:2401.09071, 2024.

[3] A. Saha et al. Learning Adaptive Neighborhoods for Graph Neural Networks. ICCV 2023.

[4] J. Wang et al. Heterophily-Aware Graph Attention Network (HA-GAT). arXiv:2302.03228, PR 2024.

[5] E. Chien, J. Peng, P. Li, O. Milenkovic. Adaptive universal generalized PageRank graph neural network. ICLR 2021.

Questions

  1. It is unclear why $g(\mathbf{\Lambda})$ should align with $\mathbf{Y}$ (the label vector).
  2. Are the Squirrel and Chameleon datasets the duplicate-removed versions?
  3. Table 5 is the ablation study on two large datasets (where most comparison methods are OOM); what about other smaller heterophilic graphs?

Limitations

yes

Final Justification

After considering the authors' responses to my concerns and those of other reviewers, I recommend acceptance of the paper.

Formatting Issues

no formatting issues

Author Response

W1: Novelty.

We respectfully but firmly assert that HeroFilter's novelty is not undermined by commonalities with attention or diffusion methods.

In general, HeroFilter is a direct, rigorous operationalization of our theoretical insights, creating a seamless connection that advances beyond the limitations of prior works like attention-based (e.g., GAT [1], HA-GAT [2]) and diffusion-based methods (e.g., APPNP [3], GPRGNN [4]), which often rely on uniform or topology-driven mechanisms without explicit heterophily-spectral theory.

Next, we will extend this from four aspects:

  • Property 1 mathematically proves the non-monotonic relationship between heterophily degree and spectral filter response, revealing why uniform diffusion in APPNP/GPRGNN or local attention in GAT/HA-GAT fail on varying graphs—this insight directly motivates our adaptive polynomial filters (Eq. 7), which learn frequency-specific responses beyond diffusion's fixed propagation or attention's edge-weighting, focusing on enhancing generalization by dynamically handling diverse heterophily patterns.

  • Property 2 theoretically guarantees that such polynomials can align with arbitrary label patterns under heterophily, justifying the patcher's spectral relevance scoring (Eq. 8) as a non-uniform mechanism for neighbor selection that prioritizes prediction performance through frequency-specific adaptation, unlike attention's heuristic similarity scores or diffusion's personalized but non-spectral walks.

  • Theorem 1's error bound explicitly links filter misalignment to prediction errors, bounding them in terms of heterophily—guiding the mixer's aggregation to minimize this bound empirically, thereby improving robustness and accuracy across homophilic and heterophilic settings, where attention/diffusion methods lack such bounds and often underperform on mixed heterophily (e.g., +4-7% gains over GAT/APPNP, Table 1).

  • This theoretical-practical bridge focuses on heterophily adaptation, algorithmic advances like scalable patching in Fast-HeroFilter (deriving from Eq. 7, not mere diffusion), and empirical gains (e.g., +5.4% over baselines on Texas, Table 1), where our solutions yield superior generalization and prediction.

W2: Relevant works

We understand your concern and will cite and discuss these works in the revision. HeroFilter is the first theoretical framework linking heterophily, graph filters, and generalization performance, with RQ1 (non-monotonic links) and RQ2 (adaptive, scalable filters) advancing beyond prior work.

  • Comparing with [1]: Discusses global filter limits; [1] uses localization for node-wise filters, while we analyze heterophily's impact on polynomial error (RQ1) and adapt patches via heterophily degree (RQ2).
  • Comparing with [2]: Rethinks spectral power (RQ1) but ignores heterophily complexity (e.g., mid-freq in low-heterophily, our Fig. 1); we add Property 2's universal proof for adaptive RQ2 solutions.
  • Comparing with [3]: Offers static diverse filters (RQ2); our MLP-Mixer enables node-specific dynamism, +3-7% over similar baselines (Table 1).
  • Comparing with [4]: Uses attention for local heterophily (RQ2); lacks polynomial flexibility/spectral grounding—our Eq. 7 filters and Theorem 1 bounds generalize better, +2-5% on heterophilic graphs (e.g., Squirrel).
  • Comparing with [5]: Employs PageRank propagation; assumes uniform optimization without heterophily theory—our Property 1 non-monotonicity and Eq. 8-10 patching achieve 4-6% gains on mixed graphs.

While [1-5] advance spectral and adaptive GNNs for heterophily, HeroFilter distinguishes itself through: (1) theoretical novelty—proving non-monotonic heterophily-spectral relationships (Property 1), universal filter alignment (Property 2), and generalization error bounds (Theorem 1), all absent in [1-5]; (2) algorithmic innovation—integrating adaptive polynomial filtering with MLP-Mixer for node-specific, scalable patching, unlike [1]'s frequency response filters, [2]'s spatial-spectral rethinking without per-node adaptivity, [3]'s non-spectral neighborhood learning, [4]'s heterophily-aware attention lacking global mixing, or [5]'s PageRank propagation without mixer synergy; and (3) empirical superiority—delivering consistent generalization across heterophily spectra and outperforming on diverse graphs. This unified framework surpasses priors with persuasive SOTA results.

W3: Fast version performance

Our main results are based on the non-fast version of HeroFilter. Below, we include performance for the fast variant (Fast-HeroFilter) alongside the original HeroFilter results. As shown, the fast variant has slightly lower performance but reduced standard deviations, suggesting improved stability.

| Model | Squirrel (filtered) | Chameleon (filtered) |
| --- | --- | --- |
| Fast-HeroFilter | 39.90 ± 1.36 | 38.00 ± 2.10 |
| HeroFilter | 40.14 ± 1.76 | 38.49 ± 2.83 |

Q1: Unclear why $g(\mathbf{\Lambda})$ should align with the label vector $\mathbf{Y}$.

The key insight is that in graph learning, we want the filtered features to preserve class-distinguishability. By aligning $g(\mathbf{\Lambda})$ with the label vector $\mathbf{Y}$, we ensure that the spectral filter produces representations where nodes from the same class have similar values. This is particularly important for heterophilic graphs, where the original graph structure (encoded in $\mathbf{\Lambda}$) may not align well with labels.

Q2: Are Squirrel/Chameleon de-duplicated?

Thank you for bringing this to our attention. We have conducted extra experiments on these two datasets. The results, shown below, reinforce HeroFilter’s strong performance and robustness on these datasets, aligning with our theoretical claims of adaptive spectral filtering for varying heterophily (Property 1 and Theorem 1).

| Model | Squirrel (filtered) | Chameleon (filtered) |
| --- | --- | --- |
| Bern-Net | 33.85 ± 1.55 | 32.73 ± 2.10 |
| APPNP | 31.30 ± 1.24 | 37.10 ± 2.99 |
| HeroFilter | 40.14 ± 1.76 | 38.49 ± 2.83 |

Q3: Ablation on small heterophilic graphs.

To further validate the effectiveness of HeroFilter's mixer components, we conducted ablation studies on the filtered Squirrel and Chameleon datasets. The results highlight the critical roles of both patch and feature mixing: removing either component leads to a gradual performance decline, confirming their synergistic contribution to robust heterophily adaptation and underscoring the mixer's integral role in HeroFilter's superior performance.

| Component | Squirrel (filtered) | Chameleon (filtered) |
| --- | --- | --- |
| HeroFilter | 40.14 ± 1.76 | 38.49 ± 2.83 |
| with Patch Mixer | 39.28 ± 1.76 | 37.75 ± 2.69 |
| with Feature Mixer | 39.17 ± 1.73 | 37.21 ± 3.15 |
| No Mixer | 37.69 ± 1.87 | 36.12 ± 2.46 |
Comment

Thank the authors for the detailed responses, which have addressed all my concerns. I have updated my scoring accordingly.

Comment

We're glad to hear that all your concerns have been addressed! Thank you again for your thoughtful feedback and strong support — we’ll incorporate the new results into the next version.

Review (Rating: 4)

This paper presents HEROFILTER, an adaptive spectral graph filter framework designed to address performance challenges of Graph Neural Networks (GNNs) on heterophilic graphs. Through theoretical analysis and experiments, the authors demonstrate that the relationship between graph heterophily and optimal filter response is not simply monotonic but varies complexly across frequency components. This finding challenges the conventional fixed filter design paradigm. Based on this observation, the authors propose a framework capable of adaptively capturing information across different frequency ranges. Experimental results show that HEROFILTER performs excellently on both homophilic and heterophilic graphs.

Strengths and Weaknesses

Strengths

  1. The paper establishes a rigorous theoretical connection between graph heterophily, spectral filters, and GNN prediction performance, introducing a spectral domain measure of heterophily and deriving error bounds.

  2. The authors challenge the simplified assumption in existing research that "homophilic graphs use low-pass filters, heterophilic graphs use high-pass filters," demonstrating through controlled experiments that this relationship is actually more complex and non-monotonic. This insight is crucial for understanding GNN behavior on heterophilic graphs.

  3. The authors conduct extensive experiments on 12 datasets with varying degrees of heterophily, showing that HEROFILTER outperforms existing methods on both homophilic and heterophilic graphs.

Weaknesses

  1. Although the theoretical analysis in this paper effectively supports the main motivation of this paper, the design of specific solutions is still heuristic, resulting in a disconnect between theoretical analysis and practical solutions.

  2. The motivation has actually been explored in previous studies: [1] systematically discussed Q1, and [1-3] provided solutions to Q2. The authors should cite these references and discuss the similarities and differences between their work and these studies.

  3. As a key module of this work, the design of HEROFILTER Mixer is very similar to that of [4]. This requires clarification and explanation of the similarities and differences, as well as explicit references to [4].

  4. Unclear details. The authors did not specify whether the experimental results were obtained using the original version of HeroFilter or the fast version. In addition, the number of steps used when implementing the fast version is not mentioned. Note that the number of propagation passes is critical for the patcher, as it determines the size of the subgraph around the target node on which the filter is estimated [5].

  5. Fast-HEROFILTER actually follows the practice of APPNP, so it may not be suitable as a contribution point.

[1] Node-oriented spectral filtering for graph neural networks. TPAMI 2023

[2] How powerful are spectral graph neural networks. ICML 2022

[3] Graph neural networks with diverse spectral filtering. WWW 2023

[4] A Generalization of ViT/MLP-Mixer to Graphs. ICML 2023

[5] The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE signal processing magazine 2013

Questions

In addition to the issues mentioned in “Weaknesses,” there are the following questions. It would be greatly appreciated if the authors could respond.

  1. What is the significance of the spectral correlation matrix mentioned in Section 4.1? Please elaborate and explain further.

  2. How is the specific operation of selecting top-k performed, and how is gradient backpropagation implemented?

  3. It is recommended that the results of [1] and [2] be added to the experimental comparison, and that a qualitative analysis of the patcher and mixer be added.

  4. In the implementation details section of the supplementary materials, the authors mention adding an additional two-layer MLP as a classifier for all baselines. I do not think this is reasonable, as this practice actually modifies the original model architecture, such as APPNP and BernNet. The authors need to provide an explanation for this practice.

Limitations

Yes

Final Justification

The authors' response resolved most of my concerns. However, due to the excessive similarity of this work to MLP-Mixer and some over-claimed contributions, I give a final score of 4 (borderline accept). At the same time, based on the above considerations, if the AC ultimately chooses to reject this paper, I will follow the AC's decision.

Formatting Issues

N/A

Author Response

W1: Heuristic design and disconnect between theory and solutions.

We respectfully but firmly assert that our design is not heuristic.

In general, HeroFilter is a direct, rigorous operationalization of our theoretical insights, creating a seamless connection that advances beyond the limitations of prior works.

Next, we will extend this from four aspects:

  • First, Property 1 mathematically proves the non-monotonic relationship between heterophily degree and spectral filter response, revealing why fixed strategies fail on varying graphs—this insight directly motivates our adaptive polynomial filters (Eq. 7), which learn node-specific responses to modulate across frequencies, focusing on enhancing generalization power by dynamically handling diverse heterophily patterns.

  • Then, Property 2 theoretically guarantees that such polynomials can align with arbitrary label patterns under heterophily, justifying the patcher's spectral relevance scoring (Eq. 8) as a non-heuristic mechanism for neighbor selection that prioritizes prediction performance through frequency-specific adaptation.

  • Finally, Theorem 1's error bound explicitly links filter misalignment to prediction errors, bounding them in terms of heterophily—guiding the mixer's aggregation to minimize this bound empirically, thereby improving robustness and accuracy across homophilic and heterophilic settings.

  • This theoretical-practical bridge focuses on heterophily adaptation, algorithmic advances like scalable patching in Fast-HeroFilter, and empirical gains (e.g., +5.4% over baselines on Texas, Table 1), where our solutions yield superior generalization and prediction.

    • Furthermore, our framework unifies theory for heterophily focus, algorithms for adaptive efficiency, and empirics for validated performance demonstrated by consistent SOTA results.

W2: Prior studies ([1] Node-oriented spectral filtering [2] How powerful are spectral GNNs; [3] GNNs with diverse spectral filtering).

We understand your consideration, and we will cite and discuss these papers in our updated version as follows.

To begin with, we would like to point out that ours is the first theoretical framework characterizing the relationship among heterophily, graph filters, and generalization performance. Our motivation for RQ1 (non-monotonic heterophily-spectral links) and RQ2 (adaptive, scalable filters for varying relations) goes beyond prior explorations, introducing novel elements that address their shortcomings.

  • [1] establishes the localization property of node-wise spectral filters in the frequency domain, while our theoretical analysis focuses on how heterophily impacts the generalization error of GNNs with polynomial filters (RQ1). For RQ2, [1] designs node-specific spectral filters while maintaining the graph structure, whereas our GPatcher adaptively adjusts the patch extraction process based on the degree of heterophily.

  • [2] discusses spectral power for RQ1 but overlooks varying heterophily's complexity (e.g., mid-frequency needs in low-heterophily graphs, as in our Fig. 1); we extend this with Property 2's universal alignment proof, powering adaptive solutions absent in [2].

  • [3] provides diverse filters for RQ2 but statically, without adaptation to node-specific heterophily—our MLP-Mixer integration enables this dynamism, yielding +3-7% gains over similar diverse-filter baselines (Table 1). These aren't superficial similarities; our work unifies and surpasses them with new theory and adaptivity, as our consistent SOTA performance across homophilic/heterophilic graphs persuasively demonstrates.

W3: Related Work [1]

Thanks for providing a related work. We analyze the difference and promise to add it to our updated paper:

  • [1] generalizes MLP-Mixer using the METIS algorithm to partition graphs into subgraphs for graph-level tasks, focusing on structural tokenization without explicit heterophily handling.

  • In contrast, our HeroFilter Mixer operates on spectrally aligned patches generated by our adaptive patcher (Eq. 8), guided by Property 1's non-monotonic heterophily-spectral theory, to dynamically blend frequencies and adapt to varying relations.

[1] A Generalization of ViT/MLP-Mixer to Graphs. ICML 2023

W4: Unclear details

We ran all experiments using the non-fast (original) version of HeroFilter, employing approximations for eigendecomposition on large graphs; this maintains the core spectral relevance computations (Eq. 7) while addressing scalability on large graphs like Snap-Patents. Our results below show only a slight difference between the fast and non-fast versions of HeroFilter.

| Model | Squirrel (filtered) | Chameleon (filtered) |
| --- | --- | --- |
| Fast-HeroFilter | 39.90 ± 1.36 | 38.00 ± 2.10 |
| HeroFilter | 40.14 ± 1.76 | 38.49 ± 2.83 |

W5: Fast-HeroFilter follows APPNP

We respectfully disagree that Fast-HeroFilter merely follows APPNP from two aspects.

  • First, our Fast-HeroFilter is directly derived from HeroFilter's spectral polynomial filter $g(\mathbf{\Lambda}) = \sum_{k=1}^{K} \mathbf{w}_k \odot \mathbf{\Lambda}^k$ in Eq. 7 by approximating it in an eigendecomposition-free form via the iterative power-series expansion of $\mathbf{R}$ in Eq. 9, where $c$ serves as a tunable damping factor (optimized via grid search) to balance local/global importance while explicitly handling non-monotonic heterophily (per Property 1);

  • Therefore, the fast version also enables computation of per-node spectral relevance scores from $\mathbf{R}$ for top-$p$ patch selection (Eq. 10, which is absent in APPNP), integrated with our Mixer for ordered aggregation, yielding 4-7% gains over APPNP on heterophilic graphs (Table 1) and scalable operation, distinguishing it as a heterophily-targeted advancement rather than generic personalization like APPNP's fixed alpha-based propagation. A sketch of the power-series idea follows.
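To make the eigendecomposition-free idea concrete, here is a sketch under our own assumptions: a constant damping factor $c$ with a geometric $c^k$ schedule and illustrative defaults for $K$ and $c$; the full model instead learns the per-order weights $\mathbf{w}_k$ of Eq. 7.

```python
import torch

def fast_relevance(S, x, K=10, c=0.5):
    """Truncated power-series propagation: sum_{k=1..K} c^k * S^k x.

    S: (n, n) graph shift operator (dense or sparse) whose
    eigendecomposition is U Lambda U^T, so U Lambda^k U^T x = S^k x
    and no eigenvectors are ever formed; each term costs one matmul,
    giving O(K * |E| * d) total instead of O(n^3).
    x: (n, d) node features.
    """
    out = torch.zeros_like(x)
    prop, coef = x, 1.0
    for _ in range(K):
        prop = S @ prop       # one more hop of propagation
        coef *= c             # damping: weight c^k at order k
        out = out + coef * prop
    return out                # per-node scores feeding top-p patch selection
```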

Q1: Significance of $\mathbf{R}$

We would like to clarify that the spectral correlation matrix $\mathbf{R}$ (Eq. 7) is pivotal in our model design, since it forms the backbone of our adaptive patching, enabling HeroFilter to address topological limitations and capture heterophily-varying relevance in a way that enhances generalization and prediction performance.

To be more specific, $\mathbf{R}$ computes pairwise node similarities in the spectral domain via the learned polynomial $\sum_k \mathbf{w}_k \odot \mathbf{\Lambda}^k$, directly embodying Property 2's alignment guarantee to match arbitrary labels under non-monotonic heterophily (Property 1).

  • It filters neighbors based on frequency-specific relevance (e.g., emphasizing high-frequency components for heterophilic nodes), reducing errors per Theorem 1.
  • Further, in graphs with heterophily degree 0.9 (Fig. 1), $\mathbf{R}$ prioritizes dissimilar but informative nodes, explaining our SOTA results on Squirrel (+6.8% over baselines); a minimal sketch of computing $\mathbf{R}$ follows.
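As an illustration of this computation, below is a minimal sketch of building a spectral relevance matrix of this form from an eigendecomposition; the class name, initialization, and shapes are our assumptions, and the actual Eq. 7 implementation may differ in details.

```python
import torch
import torch.nn as nn

class SpectralRelevance(nn.Module):
    """Compute R = U g(Lambda) U^T with g(Lambda) = sum_k w_k ⊙ Lambda^k,
    i.e., pairwise node relevance scored by a learned polynomial filter."""
    def __init__(self, K: int, n_freqs: int):
        super().__init__()
        # one learnable weight vector per polynomial order (element-wise over frequencies)
        self.w = nn.Parameter(0.01 * torch.randn(K, n_freqs))

    def forward(self, eigvals: torch.Tensor, eigvecs: torch.Tensor) -> torch.Tensor:
        K = self.w.shape[0]
        powers = torch.stack([eigvals ** k for k in range(1, K + 1)])  # (K, n)
        g = (self.w * powers).sum(dim=0)                               # (n,)
        return eigvecs @ torch.diag(g) @ eigvecs.T                     # (n, n) relevance
```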

Q2: Top-k operation and gradient backpropagation.

We would like to clarify that the top-k selection in our patcher (Eq. 8) is an exact, differentiable operation (PyTorch's built-in torch.topk) that enables end-to-end learning of adaptive neighborhoods. Specifically:

  • For each node's scoring vector in $\mathbf{R}$, torch.topk selects the top-$p$ values and indices ($p=32$ by default, tunable), forming patches by gathering features at those indices while preserving spectral order (as validated in Table 3, where ranked selection outperforms random selection by +3.4% by aligning with Property 1's frequency variations).
  • For backpropagation: gradients flow naturally through the selected values to the corresponding input positions (non-selected positions receive zero gradients), ensuring robust training without additional approximations; a self-contained example follows.
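The following self-contained example (ours, for illustration) shows exactly this gradient behavior:

```python
import torch

# Scoring row for one node: gradients flow only through the selected entries.
scores = torch.tensor([0.2, 0.9, 0.1, 0.7, 0.5], requires_grad=True)
vals, idx = torch.topk(scores, k=2)   # top-2 values and their indices
loss = vals.sum()
loss.backward()
print(idx)           # tensor([1, 3])
print(scores.grad)   # tensor([0., 1., 0., 1., 0.]) -- zeros at non-selected positions
```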

Q3: Extra results and qualitative analysis

Our results show that HeroFilter outperforms both [1] and [2] on heterophilic and large-scale datasets:

| Model | Arxiv-year | Ogbn-Arxiv |
|---|---|---|
| NFGNN | 40.38±0.46 | 69.20±0.45 |
| JacobiConv | 40.34±0.46 | 68.18±0.15 |
| HeroFilter | 54.66±0.28 | 69.61±0.33 |

To assess patcher and mixer robustness, we analyzed sensitivity to the number of polynomial terms ($k$) and conducted ablations on the filtered Squirrel and Chameleon datasets. The patcher results show high stability (accuracy variance <0.5% across $k$), highlighting hyperparameter insensitivity and reliable heterophilic performance. The mixer ablations reveal that removing patch or feature mixing progressively degrades performance, confirming their synergistic role in heterophily adaptation.

| Setting | Squirrel (filtered) | Chameleon (filtered) |
|---|---|---|
| k=5 | 40.24±0.21 | 38.93±0.29 |
| k=10 | 40.05±0.17 | 39.40±0.23 |
| k=50 | 40.26±0.12 | 38.04±0.30 |
| k=150 | 40.04±0.19 | 38.47±0.24 |
| k=300 | 40.05±0.17 | 38.05±0.24 |
| with Patch Mixer | 39.28±1.76 | 37.75±2.69 |
| with Feature Mixer | 39.17±1.73 | 37.21±3.15 |
| No Mixer | 37.69±1.87 | 36.12±2.46 |

Q4: Explanation for adding the two-layer MLP

We would like to point out that this is common practice, as shown in recent literature [1,2]. Many baselines produce feature embeddings without built-in classifiers, requiring a projection head for node classification tasks; our two-layer MLP (a simple linear+ReLU+linear stack, sketched after the references below) introduces no core architectural changes but standardizes prediction, as raw softmax on embeddings underperforms (~1-3% drop, per our tests). This does not "modify" the original models, since it is a post-processing step, akin to common practice for equitable evaluation in other papers such as [1].

[1] A Fair Comparison of Graph Neural Networks for Graph Classification. ICLR 2020

[2] Benchmarking Graph Neural Networks. JMLR 2023
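For completeness, a minimal sketch of the projection head described above (the hidden width and helper name are illustrative assumptions):

```python
import torch.nn as nn

def make_projection_head(in_dim: int, num_classes: int, hidden: int = 64) -> nn.Module:
    """Standard linear + ReLU + linear head appended to a backbone's node
    embeddings; the resulting logits feed a cross-entropy loss."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, num_classes),
    )
```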

To further validate this, we conducted experiments removing the MLP from baselines; raw softmax on embeddings leads to significant underperformance (~5-11% drop), making them substantially weaker baselines. The results below on the Arxiv-year and Ogbn-Arxiv datasets demonstrate this:

| Model | Arxiv-year | Ogbn-Arxiv |
|---|---|---|
| BernNet (w/o MLP) | 32.94±0.71 | 65.02±0.08 |
| APPNP (w/o MLP) | 32.96±1.48 | 65.51±0.15 |
| BernNet (w/ MLP) | 37.92±0.16 | 69.50±0.10 |
| APPNP (w/ MLP) | 44.04±0.58 | 68.51±0.15 |
| HeroFilter | 54.66±0.28 | 69.61±0.33 |
Comment

The authors' response has resolved some of my concerns, but I still have reservations about weaknesses 3 and 5. Specifically, the derivation of the fast version of HeroFilter remains consistent with APPNP, and this fast version is also adopted by many GNNs. Therefore, I still do not believe that point 5 can be considered a contribution of this paper.

Based on the above, I will raise the score 3 -> 4.

Comment

Thank you very much for your valuable feedback and for raising the score from 3 to 4! We sincerely appreciate your support and will incorporate the suggested revisions and detailed clarifications into a future version.

Review
5

The paper investigates graph heterophily, where connected nodes in a graph have different labels, challenging the conventional use of fixed spectral filters in Graph Neural Networks (GNNs). Traditional approaches apply low-pass filters for homophilic graphs and high-pass filters for heterophilic graphs, but the study reveals that the relationship between heterophily and filter response is complex and not strictly monotonic. This suggests that fixed filter designs are insufficient, as optimal filter responses vary across frequency components. Through theoretical analysis, they confirm that GNN performance does not correlate monotonically with heterophily, necessitating adaptive filters for better generalization. They propose HEROFILTER, a novel GNN that extracts and combines salient representations across the heterophily spectrum using adaptive mixing. HEROFILTER outperforms leading baselines by up to 9.2% in accuracy across both homophilic and heterophilic graphs, demonstrating its effectiveness and versatility.

Strengths and Weaknesses

Strengths

  1. Well-Motivated Approach: The study clearly articulates the limitations of conventional fixed filter designs and effectively motivates the development of adaptive filtering to address varying heterophily degrees.

  2. Robust Theoretical Analysis: The paper provides a thorough theoretical foundation, demonstrating that the relationship between graph heterophily and GNN frequency response is not strictly monotonic, justifying the need for adaptive filters.

  3. Strong Performance: The proposed HEROFILTER achieves up to 9.2% accuracy improvement over leading baselines across both homophilic and heterophilic graphs, showcasing its effectiveness and versatility.

Weaknesses:

The paper appears comprehensive, with no clear weaknesses noted in the provided context.

Questions

N.A.

Limitations

yes

Formatting Issues

N.A.

Author Response

Dear Reviewer Yps2,

Thanks very much for your review and appreciation. We greatly appreciate your recognition of our work's strengths, such as the rigorous theoretical connection between heterophily, spectral filters, and GNN performance; our challenge to oversimplified low/high-pass assumptions with empirical evidence of non-monotonic relationships; and our extensive experiments demonstrating HeroFilter's effectiveness across homophilic and heterophilic graphs. Your feedback validates the core contributions of our paper and motivates us to continue advancing this area. Should you have any additional questions, suggestions, or clarifications during the discussion phase, we would be delighted to address them promptly.

Best,

Authors

Final Decision

This work proposes HEROFILTER, a new GNN which addresses the limitations of fixed spectral filters on heterophilic graphs. It works by adaptively extracting and combining representations across the heterophily spectrum, based on the theoretical finding that the relationship between heterophily and filter response is non-monotonic.

Reviewers assessed this paper generally positively, and several of their assessments improved during the rebuttal period, given the authors' engagement.

Main feedback points:

  • some concerns about novelty (e.g., key modules like the Mixer have been explored in prior studies), and that the proposed method is similar to existing attention or diffusion mechanisms (zcnE, BfgS).

  • some concerns regarding missing references to relevant work, with the positioning around novelty undermined by insufficient comparisons (zcnE, BfgS).

  • confusion around experimental details: several reviewers pointed to difficulty understanding the purpose of specific functions, inconsistency in the presentation of theoretical results, and missing clear tradeoffs between the fast and non-fast versions of the model (zcnE, BfgS, EZ4C).