PaperHub

Average rating: 4.0 / 10 · Rejected · 3 reviewers
Individual ratings: 3, 6, 3 (min 3, max 6, std dev 1.4)
Average confidence: 3.3
Correctness: 2.7 · Contribution: 1.7 · Presentation: 2.3

ICLR 2025

Understanding When and Why Graph Attention Mechanisms Work via Node Classification

OpenReview · PDF
Submitted: 2024-09-26 · Updated: 2025-02-05

Abstract

Despite the growing popularity of graph attention mechanisms, their theoretical understanding remains limited. This paper aims to explore the conditions under which these mechanisms are effective in node classification tasks through the lens of Contextual Stochastic Block Models (CSBMs). Our theoretical analysis reveals that incorporating graph attention mechanisms is *not universally beneficial*. Specifically, by appropriately defining *structural noise* and *feature noise* in graphs, we show that graph attention mechanisms can enhance classification performance when structural noise exceeds feature noise. Conversely, when feature noise predominates, simpler graph convolution operations are more effective. Furthermore, we examine the over-smoothing phenomenon and show that, in the high signal-to-noise ratio (SNR) regime, graph convolutional networks suffer from over-smoothing, whereas graph attention mechanisms can effectively resolve this issue. Building on these insights, we propose a novel multi-layer Graph Attention Network (GAT) architecture that significantly outperforms single-layer GATs in achieving *perfect node classification* in CSBMs, relaxing the SNR requirement from $ \omega(\sqrt{\log n}) $ to $ \omega(\sqrt{\log n} / \sqrt[3]{n}) $. To our knowledge, this is the first study to delineate the conditions for perfect node classification using multi-layer GATs. Our theoretical contributions are corroborated by extensive experiments on both synthetic and real-world datasets, highlighting the practical implications of our findings.
Keywords
Graph attention mechanisms, node classification, contextual stochastic block model, over-smoothing
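To make the data model concrete, here is a minimal sketch of the CSBM generative process described in the abstract, assuming two balanced classes, one-dimensional Gaussian features with means $\pm\mu$ and standard deviation $\sigma$ (so the SNR is $\mu/\sigma$), and intra-/inter-class edge probabilities $p$ and $q$. The function and parameter names are illustrative, not the paper's exact notation.

```python
# A minimal sketch of the CSBM generative process, assuming two balanced
# classes with one-dimensional Gaussian features (means +/- mu, std sigma,
# so SNR = mu / sigma) and edge probabilities p (intra-class) and q
# (inter-class). Names are illustrative, not the paper's exact notation.
import numpy as np

def sample_csbm(n, p, q, mu, sigma, seed=0):
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=n)                  # class labels
    x = mu * y + sigma * rng.standard_normal(n)      # noisy scalar features
    same = np.equal.outer(y, y)                      # intra-class indicator
    upper = np.triu(rng.random((n, n)) < np.where(same, p, q), k=1)
    A = (upper | upper.T).astype(int)                # symmetric, no self-loops
    return y, x, A

# Structural noise grows as q approaches p; feature noise grows with sigma / mu.
y, x, A = sample_csbm(n=1000, p=0.05, q=0.01, mu=1.0, sigma=0.5)
```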

Reviews and Discussion

Review
Rating: 3

This paper analyzes the graph attention mechanism through CSBMs, revealing that graph attention mechanisms can enhance classification performance when structure noise exceeds feature noise; conversely, simpler graph convolution operations are more effective when feature noise predominates. The authors then design a new GAT that outperforms the traditional GAT.

Strengths

  • The paper is theoretically sound.

  • The proposed method is easy to follow.

Weaknesses

  • The main finding—that GAT performs better with higher structural noise and fails with predominant feature noise—seems intuitive. Recent research on graph heterophily also indicates that when structure is heterophilic (i.e., noisy as described here), GAT models tend to fail.

  • The paper's analysis is based on artificially generated datasets; how can one judge whether GAT is suitable in real applications?

  • The proposed method can be seen as hard attention, which is a discontinuous function, so I think gradients cannot be back-propagated through it. I hope the authors can provide further explanation to dispel my concerns.

  • Comparisons with many recent graph attention models [1,2,3] are missing.

  • The readability of the paper is not good.

Minor problem: should "homogeneous" be "homophilic" in line 235?

[1] Zhang, Heng-Kai, et al. "HONGAT: Graph Attention Networks in the Presence of High-Order Neighbors." AAAI 2024.

[2] Lee, S. Y., Bu, F., Yoo, J., and Shin, K. "Towards Deep Attention in Graph Neural Networks: Problems and Remedies." ICML 2023.

[3] Brody, S., Alon, U., and Yahav, E. "How Attentive Are Graph Attention Networks?" ICLR 2022.

Questions

See weaknesses.

Details of Ethics Concerns

No

Comment
  1. We sincerely thank the reviewers for their valuable feedback. We would like to clarify that although the primary finding of our paper—the graph attention mechanism does not always work—has been mentioned empirically in some prior works, the primary goal of our study is to provide a theoretical understanding of this phenomenon. Specifically, our work achieves the following:

    • (i) Characterizes theoretically when the graph attention mechanism works and when it fails (i.e., identifies the precise regime);
    • (ii) Quantifies the extent to which the mechanism improves performance when it works (i.e., provides the regime where multi-layer GATs achieve perfect node classification and demonstrates the improvement over GCNs);
    • (iii) Examines the role of GATs in mitigating the over-smoothing problem.

    Providing a theoretical foundation for these empirically observed conclusions is a significant contribution of our work.

  2. Thank you for raising this point. We emphasize that the CSBM used in our study is a random graph model abstracted from a wide range of real-world graph data. It captures fundamental properties of real-world graph data, especially featured graphs, and has been widely adopted for theoretical analysis in GNN research (as discussed in the related works section of our paper). Furthermore, the conclusions derived from our theoretical analysis have been validated on three commonly used real-world graph datasets (Cora, Citeseer, and Pubmed), which sufficiently demonstrates their correctness (see Figure 2 and related discussions in our paper).

    Regarding more complex scenarios, we note that the CSBM model offers many adjustable parameters and several variants, enabling the generation of datasets that reflect diverse real-world scenarios. Since our study is the first to analyze multi-layer GATs for node classification under the CSBM model, we opted for a relatively simple version of CSBM for clarity. We leave the analysis of more complex models for future work.

  3. This is a good question. The attention mechanism we designed in this paper serves two purposes:

    • (i) To facilitate theoretical analysis by enabling precise characterization of the node feature distributions after applying the attention layer;
    • (ii) To represent the core principle of most graph attention mechanisms, namely assigning higher weights to nodes with more similar features during message passing.

    This setup is used only in the theoretical analysis and experiments on synthetic datasets. For real-world datasets, we employed the default GAT layers provided in the PyG library (a minimal usage sketch is included at the end of this reply).

  4. As mentioned above, the graph attention mechanism we designed was tailored for theoretical analysis rather than for proposing a more advanced attention mechanism for practical applications. Additionally, our experiments on real-world datasets, using standard GAT layers, have validated the correctness of our conclusions.

  5. As a paper containing a substantial number of theoretical results, we have made every effort to improve readability, which has been acknowledged by another reviewer (Reviewer pv7Y: "The paper is reasonably well-written and easy to follow"). We warmly invite the reviewer to raise questions about any sections that remain unclear, and we are happy to provide clarifications.

  6. For the minor issue: We greatly appreciate you pointing this out. It was indeed a typo, and we have now corrected it.

Once again, we thank the reviewers for their insightful comments and suggestions, and we look forward to further discussions with you!
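As a concrete reference for point 3 above, the following is a minimal sketch of the kind of real-world-dataset setup described there, using the default `GATConv` layer from the PyG (PyTorch Geometric) library. The two-layer architecture and hyperparameters are our illustrative assumptions, not the paper's reported configuration.

```python
# Minimal two-layer GAT using PyG's default GATConv, as in the real-world
# experiments described above. Hidden size and head count are illustrative
# assumptions, not the paper's reported settings.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class TwoLayerGAT(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes, heads=8):
        super().__init__()
        self.conv1 = GATConv(in_dim, hidden_dim, heads=heads)
        self.conv2 = GATConv(hidden_dim * heads, num_classes, heads=1)

    def forward(self, x, edge_index):
        x = F.elu(self.conv1(x, edge_index))  # attention-weighted neighborhood aggregation
        return self.conv2(x, edge_index)      # per-node class logits
```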

Comment

Dear Reviewer Afdj,

We would like to provide a supplementary clarification regarding your summary of Weakness 1, which appears to misinterpret our results. Specifically, the statement “Recent research on graph heterophily also indicates that when structure is heterophilic (i.e., noisy as described here), GAT models tend to fail” conflicts with the conclusions presented in our paper.

You suggest that GAT models generally fail on graphs exhibiting heterophily. However, according to our findings, graphs with heterophilic structures—or less apparent homophily—are characterized by larger structural noise. In such cases, if the feature noise is relatively small, this falls within a regime where GAT models are effective. Conversely, for graphs with strong homophily (i.e., low structural noise), GAT models are less likely to perform effectively.

Given this misalignment with our conclusions, we respectfully question your assertion in Weakness 1 that “the main finding in our paper is intuitive.” We believe this interpretation may not fully reflect the nuances of our analysis and contributions.

As the rebuttal phase is nearing its conclusion, we have not yet received a response to our previous clarification. We would greatly appreciate it if you could review our explanation and share your feedback at your earliest convenience. Your insights are critical to ensuring an accurate assessment of our work.

Thank you again for your time and thoughtful review.

Comment

As the rebuttal phase is approaching its end, I would like to kindly request that you check whether your concerns have been addressed. Thank you for your efforts; we believe your feedback is essential for a fair evaluation of our paper.

Review
Rating: 6

This paper studies properties of graph attention mechanisms in the context of node classification in the contextual stochastic block model (CSBM). The authors derive several theoretical results. First, when compared with the simple graph convolution operation, graph attention can either be beneficial or not, depending on the level of structure noise in the graph and the level of feature noise in node features. This complements prior study of graph attention in a similar setting. Second, under assumptions of sufficiently small feature noise and sufficiently dense graph connectivity, the authors show that graph attention can effectively avoid the issue of over-smoothing, up to order n layers, where n is the number of nodes in the graph. Third, the authors show that a multi-layer graph attention network can achieve perfect node classification up to a signal threshold that is much smaller than what is required by a single-layer graph attention network. The authors empirically validate their theoretical claims on both synthetic data and semi-synthetic data (i.e., real-world networks with synthetic node features).

Strengths

  • The paper is reasonably well-written and easy to follow.
  • The results in this paper extend our understanding of graph attention to multi-layer settings. In particular, the separation between simple graph convolution and graph attention in terms of over-smoothing is intuitive and interesting.

Weaknesses

  • Given prior work on the analysis of single-layer graph attention [1] and the combination of simple graph convolution with graph attention in a multi-layer architecture [2], the current work does not offer a strikingly new perspective. The technical results are not surprising.

  • The assumptions on $p$, $q$, and the SNR are kind of strong. For example, Assumption 1 requires both $p$ and $q$ to be $\log^2 n / n$. There are also gaps in the SNR in Corollary 1. I understand that prior work also relies on similar assumptions. This paper does not offer an improvement in terms of the parameter regimes (i.e., the ranges of $p$, $q$, and the SNR) required to analyze graph attention.

  • The authors should cite [2] and compare with the results in [2] both theoretically and empirically. In [2], a combination of GCN and GAT is proposed and the authors show that the required SNR to achieve perfect node classification in CSBM can be significantly reduced. It seems that if one assumes both p and q are constants, then Corollary 2 in [2] is much stronger than Theorem 4 in this paper. The authors should discuss this.

  • Minor:

    • Line 57: CSBM has been used as a data model to analyze the performance of various GNNs; I believe that this is mostly due to the simplicity of CSBM. I don't think I can agree with the claim that CSBM is "a powerful tool ... to model real graph data". The authors should either revise this claim or provide justification for it.
    • Line 148, the authors use one-dimensional features throughout this paper. They should comment here if and how their results generalize to higher-dimensional features.
    • Line 226, I don't think Assumption 1 covers most practical graph data. On the contrary, a lot of graph data in practice are sparse and may not even be homophilic. I would suggest the authors change the word "most" to "many" at the minimum, and provide some context.
    • Line 317, and line 327, typo: F_noise and S_noise should be swapped.
    • Line 493, the authors used t = [0, 0.5, 0.5, 5] for GAT*. This is different from what is considered in Appendix J, where the first L layers have t = 0. The authors should explain why t = 0.5 was chosen for both the 2nd and the 3rd layers.

Refs:
[1] Kimon Fountoulakis, Amit Levi, Shenghao Yang, Aseem Baranwal, and Aukosh Jagannath. Graph Attention Retrospective. JMLR 2023.
[2] Adrián Javaloy, Pablo Sanchez-Martin, Amit Levi, Isabel Valera. Learnable Graph Convolutional Attention Networks. ICLR 2023.

Questions

Please see my comments above.

Comment

First, we sincerely thank the reviewers for their valuable suggestions. We noticed that the first three weaknesses identified are related to the comparison of our work with previous studies, particularly with [2], which was not discussed in our original manuscript. We would like to address this issue in detail here.

We appreciate the reviewer bringing this to our attention, as we had not identified this paper prior to submission. After carefully reading it, we acknowledge that there are some similarities in research motivation. However, there are significant differences in the research process and final results, which we outline as follows:

  1. While both our work and [2] highlight that GAT does not always outperform GCN, the focus of [2] is on when introducing graph convolution before GAT is beneficial and designing a learnable graph attention layer to determine whether such a preprocessing step is necessary. Specifically, [2] concludes that when structural noise is low, introducing graph convolution before GAT improves overall classification performance by expanding the regime where perfect node classification is achievable. However, it does not explicitly address when GCN outperforms GAT or vice versa.

    In contrast, our work innovatively designs a graph attention mechanism that enables precise computation of node feature distributions after the GAT layer (Theorem 2). This allows us to accurately discuss the conditions under which GCN or GAT performs better. We define two types of noise to characterize these regimes more intuitively. Our findings indicate that when structural noise is low and feature noise is high, GCN can outperform GAT even without using graph attention layers. Interestingly, this explains some results in Figure 2c of [1], specifically that "under low structural noise, GCN outperforms MLP-GAT," which was not discussed in [1]. Our conclusions can be viewed as a synthesis of [1] and [2], offering a more precise analysis and explanation.

  2. Our paper further analyzes the impact of multi-layer GAT on the over-smoothing problem (Theorem 3), which we are grateful the reviewer recognized. We emphasize that this analysis is also dependent on our ability to precisely compute the node feature distributions after the GAT layer (Theorem 2), underscoring its importance once again.

  3. Regarding the comparison between Theorem 4 in our paper and Corollary 2 in [2], we would like to clarify that Corollary 2 is arguably not "much stronger" than our result; rather, the two have their respective strengths. Specifically, when $\frac{\log^2 n}{n} \ll p, q \ll \frac{\log^k n}{n}$, where $k$ is any constant, our Theorem 4 provides stronger results than Corollary 2 in [2]. When $p, q = \Theta(1)$, as mentioned by the reviewer, the conclusion in Corollary 2 of [2] is stronger. In summary, our results are more advantageous for sparser graphs.

Finally, we would like to clarify that we have added a citation to [2] in the revised version of the paper, along with a corresponding summary.

Comment
  1. Thank you for pointing this out. We have addressed and corrected it in the revised version.

  2. We appreciate your comment. According to Lemma 5.1 in [3], an $m$-dimensional CSBM (where the covariance matrix of the $m$-dimensional features is a scalar matrix) can be transformed into a one-dimensional censored CSBM through a linear transformation, which can be implemented by a GCN or GAT layer. Based on this lemma, for high-dimensional features, we can first reduce them to a one-dimensional CSBM before further analysis (an illustrative sketch of this projection idea follows this reply). It is worth noting that directly computing the exact distribution of high-dimensional features after passing through a graph attention layer poses certain challenges, and we leave this analysis as future work.

  3. Thank you for your suggestion; we have made the necessary corrections.

  4. Thank you for pointing out this typo; it has been corrected.

  5. This is an excellent question. In Appendix J, we used GCN layers for the first $L$ layers for simplicity in the proof. In practice, based on the findings in Theorem 2, we infer that the graph attention mechanism can provide certain benefits when the structural noise and feature noise are comparable, but it requires a smaller $t$ (see Remark 3 for details). This explains why we set $t = 0.5$ for the second and third layers in our experiments in Line 493.

[3] Wang, R., Baranwal, A., and Fountoulakis, K. "Analysis of Corrected Graph Convolutions." NeurIPS 2024.
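To illustrate the reduction mentioned in point 2 above, the sketch below projects $m$-dimensional features onto the estimated class-mean-difference direction, the kind of linear map a single GCN or GAT layer can realize. It conveys the spirit of Lemma 5.1 in [3] rather than its exact construction; the helper name is hypothetical.

```python
# Illustrative sketch: reduce m-dimensional CSBM features to one dimension by
# projecting onto the (estimated) class-mean-difference direction. This mirrors
# the spirit of Lemma 5.1 in [3]; it is not the lemma's exact construction.
import numpy as np

def project_to_1d(X, y):
    """X: (n, m) feature matrix; y: (n,) labels in {-1, +1}."""
    w = X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)  # mean-difference direction
    w /= np.linalg.norm(w)                                # unit-norm projection vector
    return X @ w                                          # one-dimensional features
```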

Comment

Dear Reviewer pv7Y,

Thank you for your insightful comments and suggestions on our submission. We have been actively working to address your feedback during the rebuttal phase and greatly value your input in improving our work.

We noticed there hasn't been a follow-up response to our replies or clarifications. If there are any further concerns or suggestions, we would be more than happy to address them before the rebuttal period concludes.

Thank you for your time and effort in reviewing our work. We truly appreciate your help in this process and look forward to hearing back from you.

Review
Rating: 3

This paper provides a theoretical study of graph attention mechanisms (GATs) through the lens of the Contextual Stochastic Block Model (CSBM) on node classification tasks. It presents an analysis showing that when structure noise outweighs feature noise, graph attention mechanisms offer an advantage over simple graph convolutional operations. Additionally, the paper explores the well-known over-smoothing problem and demonstrates that multi-layer GATs can mitigate it under certain conditions related to the signal-to-noise ratio (SNR). Lastly, the authors propose a multi-layer GAT architecture that achieves perfect node classification under a relaxed SNR requirement.

Strengths

The paper investigates when graph attention mechanisms (GATs) outperform simpler graph convolutional networks (GCNs) in node classification tasks by analyzing the balance between two types of noise: structure noise and feature noise. Specifically, the authors identify that GATs are beneficial when structure noise is higher than feature noise. This provides, to a certain extent, an actionable rule of thumb: the decision to use GATs or GCNs should be informed by the relative levels of structure and feature noise in the graph data.

The paper explores the over-smoothing problem, a well-known challenge in graph neural networks (GNNs) where increasing the network depth leads to node representations becoming indistinguishable. It provides a refined definition of over-smoothing in GNNs, incorporating a formal measure of node similarity. The authors argue that GATs can mitigate over-smoothing, especially in high signal-to-noise ratio (SNR) scenarios. This structured approach to studying over-smoothing adds to the understanding of how attention-based models can maintain informative node representations over deeper layers. The paper also presents a synthetic experiment supporting this claim.

Weaknesses

Much of the paper builds directly on the study by Fountoulakis et al. (2023), which also analyzed graph attention mechanisms in noisy settings. While the paper provides an extension to multi-layer GATs and refines SNR requirements, these contributions are largely incremental and do not significantly advance the foundational insights established in previous work.

The paper’s reliance on the Contextual Stochastic Block Model (CSBM) framework and assumptions limits its applicability to real-world graphs, which often have more complex and varied structures. This strong dependence on CSBM makes it challenging to generalize the findings to other types of graph data, reducing the paper's practical value.

The experimental section is relatively narrow, relying heavily on synthetic data generated from CSBMs and only including three standard, small real-world datasets (Cora, Citeseer, and Pubmed). The lack of diverse and larger datasets limits the empirical validation of the findings and raises questions about their robustness in more complex, real-world settings.

Questions

Given that graph attention mechanisms (GATs) are no longer state-of-the-art in graph deep learning, how do you see the practical relevance of your findings for more recent models, such as Graph Transformers or advanced message-passing networks? Could your theoretical insights be extended or adapted to these contemporary architectures?

How would you suggest practitioners apply your findings in real-world scenarios, where graphs often do not conform to CSBM and the noise characteristics may be less controlled or well-defined? Are there specific graph properties or types of datasets where your approach would be most applicable?

Perfect node classification, as discussed in your paper, is a rigorous but often impractical benchmark since real-world graph data rarely allow for flawless classification, especially in noisy settings. Could you clarify how your findings on perfect node classification translate to more realistic, imperfect classification tasks? Are there insights from your work that could help improve performance on standard metrics used in practical graph learning applications?

The experimental section primarily focuses on synthetic CSBM data and three small real-world datasets. Given the assumptions in your theoretical analysis, have you considered evaluating your approach on larger and more complex datasets or on datasets with varied noise characteristics? How do you anticipate your findings will generalize to such settings?

Comment

Regarding questions:

1. Thank you for raising this point. Our study focuses on the mechanisms of graph attention, which are also used in many SOTA models, including the graph transformers mentioned by the reviewer. Thus, our analysis provides valuable insights for these models as well. A more targeted analysis of specific models is left for future work.

2. We explained the purpose of using the CSBM model in our response to Weakness 2. In practice, every graph dataset contains structural information (e.g., whether it has clear community structures) and feature information (e.g., whether the features of nodes from different classes are clearly distinguishable). By estimating the strength of these two aspects, one can decide whether to use graph attention mechanisms; an illustrative sketch follows this reply.

3. Perfect node classification is mainly a theoretical metric, less commonly used in practice, but it is strongly correlated with standard metrics like classification accuracy. The easier it is to achieve perfect node classification, the higher the accuracy, and vice versa.

In our experiments, we used classification accuracy on real-world datasets, not perfect node classification, and the results confirm the practical relevance of our analysis.

4. As a theory-focused study, our primary goal is to provide insights and theoretical explanations for certain phenomena, with less emphasis on the experimental aspect. Nevertheless, we have designed experiments to validate all our theoretical results on three representative real-world datasets, which are commonly used in GNN theoretical studies. We believe this is sufficient to support the conclusions presented in our paper.

If the reviewer anticipates that large-scale datasets might affect our results, we are open to adding relevant experiments in future versions.

Finally, we sincerely appreciate the reviewers' valuable feedback and warmly invite them to continue engaging in discussion with us.
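To make the rule of thumb in point 2 concrete, the sketch below computes two simple diagnostics on a labeled graph: edge homophily as a proxy for structural noise (lower homophily suggests higher structural noise) and a feature-separation ratio as a proxy for feature noise (lower separation suggests higher feature noise). Both estimators are hypothetical illustrations, not constructions from the paper.

```python
# Hypothetical diagnostics for the GAT-vs-GCN rule of thumb: estimate the
# relative strength of structural vs. feature information. Illustrative only.
import numpy as np

def edge_homophily(edge_index, y):
    """Fraction of edges joining same-class nodes (low -> high structural noise)."""
    src, dst = edge_index                    # (2, E) array of edge endpoints
    return float(np.mean(y[src] == y[dst]))

def feature_separation(X, y):
    """Class-mean separation relative to feature spread (low -> high feature noise)."""
    mu_diff = X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)
    return float(np.linalg.norm(mu_diff) / (2.0 * X.std(axis=0).mean()))
```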

Comment

Regarding weaknesses:

1. We sincerely thank the reviewer for raising this point. While our work is inspired by Fountoulakis et al. (2023), we must emphasize that it is far from a straightforward extension in either theoretical or applied aspects. It introduces numerous novel proof challenges, techniques, perspectives, and discoveries. Below, we provide a detailed explanation:

(i) Consideration of Noise: Fountoulakis et al. (2023) focuses solely on structural noise and examines the performance of graph attention mechanisms under different levels of structural noise. As a result, it only addresses scenarios where graph attention mechanisms are beneficial. However, the broader and more critical question of whether graph attention mechanisms are always effective remains unanswered in their work. This question is of great importance from both theoretical and practical perspectives.

In our paper, we formalize two types of noise to tackle this question. We identify specific regimes where graph attention mechanisms work effectively and others where they do not. These findings offer practical guidance on when to use graph attention mechanisms and when to avoid them.

Another interesting point is that our theoretical analysis even offers an explanation for a phenomenon observed in the experimental results of Fountoulakis et al. (2023), particularly in Figure 2c. It explains why, under low structural noise (i.e., when $p - q$ is large), GCN outperforms MLP-GAT. This experimental result was not discussed in their paper, likely because their theoretical framework could not account for it.

(ii) Extension to Multi-layer Networks: As noted by the reviewer, another contribution of our work is extending the analysis from single-layer GATs to multi-layer GATs. We must emphasize that this extension is non-trivial and involves several theoretical challenges.

First, we had to redesign a graph attention mechanism that is both simple and effective: amenable to theoretical analysis while still representative of the core functionality of most graph attention mechanisms. The attention mechanism in Fountoulakis et al. (2023) is overly complex and cannot be analyzed in the multi-layer setting. Details of this comparison can be found in Section 3.1 and Appendix B.

Second, multi-layer analysis demands precise derivation of node feature distributions, unlike the rough estimations (e.g., sign determination) in single-layer analysis. Notably, the graph attention mechanism is nonlinear, meaning that the node features after passing through a GAT layer no longer follow a Gaussian distribution. This makes deriving their expressions and analyzing their magnitude particularly challenging. In our paper, simply determining the precise distribution of node features after one GAT layer occupies pages 17 to 32 of the appendix. This part, rich in proof techniques and challenges, forms the core of our theoretical analysis.

Finally, we note that extending from single-layer to multi-layer analysis is a well-known challenge in deep learning theory, with related works published in top venues like FOCS, COLT, and JMLR. Below are some examples:

  • [1] Chen, Sitan, Adam R. Klivans, and Raghu Meka. "Learning deep ReLU networks is fixed-parameter tractable." FOCS 2022.
  • [2] Chen, Sitan, et al. "Learning narrow one-hidden-layer ReLU networks." COLT 2023.

(iii) Over-smoothing Problem: Another key contribution of our work is analyzing the impact of multi-layer GATs on the over-smoothing problem. Over-smoothing is a significant challenge for graph neural networks in practical applications, and the question of whether graph attention mechanisms alleviate this issue, and to what extent, remains open.

Our theoretical analysis shows that with sufficiently large SNR, multi-layer GATs can address the over-smoothing problem that GCNs cannot resolve for CSBM-generated graphs. This has significant practical implications:

  • Graph attention mechanisms can positively mitigate the over-smoothing problem.
  • The effectiveness of graph attention mechanisms in addressing over-smoothing depends on the SNR of the graph data. The higher the initial SNR, the more pronounced the role of the graph attention mechanism.

2. We use the CSBM model to simulate real-world graph data and perform theoretical analysis. As noted in the related works section, random graph models like CSBM are widely used in GNN theory.

Regarding the reviewer's mention of "more complex" and "varied structures," we clarify that the CSBM model is highly flexible, with numerous variants and adjustable parameters to capture different types of real graphs. For simplicity, this paper focuses on a basic version.

If the reviewer could specify the "complex" properties of the graph data, we would be happy to discuss whether a CSBM variant can capture these characteristics.

3. See point 4 in the response to "questions."

Comment

Dear Reviewer AYEX,

Thank you for your valuable feedback and constructive comments on our submission. We have carefully addressed each of your questions and provided detailed responses to clarify our methodology, highlight the key contributions of our work, and discuss the challenges we tackled.

In particular, we conducted a thorough comparison with prior work Fountoulakis et al. (2023) and emphasized three key contributions of our approach:

  1. A more detailed definition and explicit consideration of two types of noise, which provides a deeper understanding of the problem;
  2. An extension of the framework to multi-layer networks, significantly enhancing its applicability;
  3. A novel analysis of the over-smoothing issue.

These contributions demonstrate that our work is far from a straightforward extension of existing methods. Instead, it addresses substantial challenges and incorporates non-trivial innovations that advance the state of the art.

We greatly value your input and would sincerely appreciate any further thoughts or feedback you may have before the rebuttal phase concludes. Your perspective is crucial in helping us refine our work and ensure that our contributions are clearly communicated.

Thank you again for your time and effort in reviewing our submission. We look forward to your response.

AC Meta-Review

The authors present a theoretical exploration of the conditions under which graph attention mechanisms (GATs) are beneficial for node classification within the Contextual Stochastic Block Model (CSBM) framework, specifically in terms of feature and structural noise. The paper also examines when GATs can mitigate the over-smoothing problem.

The reviewers appreciated the motivation, presentation, and theoretical rigor of the submission.

However, several concerns were raised:

  • The work builds on prior studies, particularly Fountoulakis et al. (2023), and some reviewers felt the new contributions were incremental relative to this earlier work.
  • The reliance on the CSBM framework limits the generalizability of the findings to more complex real-world graphs, raising questions about their practical applicability.
  • The analysis does not sufficiently extend to more recent or advanced graph attention mechanisms.
  • The proposed attention mechanism could be compared more comprehensively—both theoretically and empirically—to other advanced (multi-layer) attention mechanisms.

Considering the reviews and the provided rebuttals, the meta-reviewer agrees with the reviewers that there is room for improvement in the areas highlighted by the reviewers. Therefore, the meta-reviewer recommends that the authors revise the paper to address these concerns and submit it to a future conference, rather than accepting it in its current form.

Additional Comments on Reviewer Discussion

Despite the detailed rebuttals and encouragement for further engagement, reviewer participation during the discussion phase remained limited. The meta-reviewer carefully examined the authors' rebuttals alongside the initial reviews and made a decision based on both.

Final Decision

Reject