PaperHub
Overall rating: 4.5 / 10 (Rejected; 4 reviewers)
Ratings: 5, 5, 3, 5 (min 3, max 5, std 0.9)
Confidence: 3.5
Correctness: 2.3 · Contribution: 2.0 · Presentation: 2.8
ICLR 2025

Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning

OpenReview · PDF
Submitted: 2024-09-28 · Updated: 2025-02-05

Abstract

Keywords
Message Passing Neural Networks, Graph Representation Learning, Large Graphs

Reviews and Discussion

Review
Rating: 5

In this paper, the authors propose a new scalable graph convolutional network. They develop a scalable message-passing block in which a GCN sub-block and a point-wise feed-forward layer (as in transformers) are combined through residual connections. This architecture retains computational efficiency. The authors also illustrate how their method addresses the over-smoothing problem: the original graph convolution is not a universal approximator, but with a residual connection it becomes one. Finally, they provide extensive experimental results to support the effectiveness of their method.
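For readers skimming the reviews, a minimal sketch of the kind of block being described is given below. This is not the authors' implementation: the use of PyTorch Geometric's GCNConv, the pre-normalization placement, and all names and dimensions are illustrative assumptions only.

    # Sketch of a "graph convolution + point-wise feed-forward" block with residual
    # connections (illustrative only; not the paper's code).
    import torch
    import torch.nn as nn
    from torch_geometric.nn import GCNConv

    class MessagePassingBlock(nn.Module):
        def __init__(self, dim: int, ff_mult: int = 2):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            self.conv = GCNConv(dim, dim)        # local aggregation over graph edges
            self.norm2 = nn.LayerNorm(dim)
            self.ff = nn.Sequential(             # point-wise feed-forward, as in transformers
                nn.Linear(dim, ff_mult * dim),
                nn.GELU(),
                nn.Linear(ff_mult * dim, dim),
            )

        def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
            x = x + self.conv(self.norm1(x), edge_index)  # residual around message passing
            x = x + self.ff(self.norm2(x))                # residual around feed-forward
            return x

Stacking several such blocks in place of attention layers is what the later questions about depth, over-smoothing, and memory usage refer to.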

Strengths


1: The motivation of this paper is clear. The authors adapt the transformer architecture to address the scalability issue in graph neural networks.

2: The authors present both theoretical analysis and experimental results for their method, offering a comprehensive treatment.

3: The authors provide extensive experiments on different datasets and relevant ablation studies to demonstrate the effectiveness of their method.

Weaknesses


1: SMPNN maintains its performance but does not gain any advantage from a deeper network. What causes this?

2: For larger graph datasets, the improvement from SMPNN is less pronounced. For example, on the ogbn-papers100M dataset, the improvement is only 0.2%. Could this suggest that the model size is still inadequate? If we use a larger network for both SMPNN and the Graph Transformer, SMPNN should experience less over-smoothing and demonstrate a more significant accuracy improvement.

Questions

Please see the weaknesses above.

Review
Rating: 5

This paper proposes Scalable Message Passing Neural Networks (SMPNNs), a deep, scalable GNN framework that omits attention in favor of local message-passing, achieving efficient performance on large graphs. It claims to outperform transformer-based models while avoiding issues like over-smoothing.

Strengths

  1. The paper provides solid theoretical support, particularly on residual connections and universal approximation, which strengthens the SMPNN design and its claims.

  2. By replacing attention with scalable message passing, SMPNN achieves efficient computation and good experimental results on large graphs, offering a notable advancement for scalable GNN applications.

Weaknesses

  1. In Section 3.2, the authors label their proposed block as a "transformer block." However, the SMPNN framework lacks any attention mechanism, which is a core component of transformers. Consequently, SMPNN functions more like a deep GCN with residual connections rather than a genuine transformer model. This categorization is misleading, as attention mechanisms fundamentally distinguish transformers by enhancing scalability and representation capacity in large graph models. Existing models such as GCNII [1], EGNN [2], and DeeperGCN [3] have already explored architectures with enhanced depth. Although these models improve scalability, they are still linear and inherently limited compared to transformers.

  2. Table 1 states that SMPNN has training-time complexity comparable to other transformer-like models, such as GraphGPS and Exphormer (see the rough cost comparison sketched after this list). However, the experiments do not include comparisons with these transformer-based models, especially models with higher computational complexity, like Graphormer, that might showcase different trade-offs between performance and scalability. While SMPNN is intended to scale to larger graphs, demonstrating performance across various data scales, especially small to medium datasets, would test its broader applicability.

  3. SMPNN relies on local message-passing operations without incorporating global attention mechanisms, which may limit its ability to capture long-range dependencies effectively. For tasks where global context is essential, SMPNN's performance could be suboptimal compared to transformer-based models that use global attention mechanisms. A discussion on this limitation and potential strategies would strengthen the paper.
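For context on the cost comparison raised in point 2, the standard per-layer asymptotics (not figures taken from the paper's Table 1) are roughly

    $\mathcal{O}(|E|\,d + |V|\,d^{2})$ for a sparse message-passing layer, versus $\mathcal{O}(|V|^{2}\,d)$ for dense global self-attention,

where $|V|$ is the number of nodes, $|E|$ the number of edges, and $d$ the feature width. This gap is why scalable graph transformers such as GraphGPS and Exphormer resort to sparse or approximate attention in the first place.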

[1] Chen, Ming, et al. "Simple and deep graph convolutional networks." International Conference on Machine Learning. PMLR, 2020.

[2] Zhou, Kaixiong, et al. "Dirichlet energy constrained learning for deep graph neural networks." Advances in Neural Information Processing Systems 34 (2021): 21834-21846.

[3] Li, Guohao, et al. "DeeperGCN: All you need to train deeper GCNs." arXiv preprint arXiv:2006.07739 (2020).

Questions

Same as the weaknesses above.

Review
Rating: 3

This paper proposes to substitute the multi-head attention in graph transformers with pure message passing. The authors provide theoretical analysis showing that, combined with residual connections, graph transformers with pure message passing can achieve universal approximation and alleviate over-smoothing.
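For concreteness, the residual message-passing update under discussion can be written (in illustrative notation, not copied from the paper) as

    $X^{(\ell+1)} = X^{(\ell)} + \sigma\big(\hat{A}\, X^{(\ell)} W^{(\ell)}\big)$,

where $\hat{A}$ is a normalized adjacency (message-passing) operator, $W^{(\ell)}$ a learnable weight matrix, and $\sigma$ a point-wise non-linearity. The identity branch $X^{(\ell)}$ is what keeps repeated applications of $\hat{A}$ from driving all node representations toward a common subspace, which is the over-smoothing behaviour the theoretical analysis targets.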

Strengths

  • The proposed method is simple.

  • The paper focuses on over-smoothing, an important problem in graph learning.

  • The paper provides both empirical and theoretical analyses.

Weaknesses

  • The contribution of this paper is weak. The main focus is replacing the attention module in the transformer with a message-passing module and using residual connections to alleviate the over-smoothing problem. However, the use of residual connections to address over-smoothing has already been explored in DeepGCN [1], which this paper does not mention or compare against. Additionally, the implementation of deep GNNs has been studied in several other works [1]-[4].

  • The paper contains a substantial amount of repetitive descriptions of existing work, such as message-passing neural networks and GCN (Eq. 2/3 vs. Eq. 6). Additionally, the theorem introduced in Section 4.1 is neither utilized later in the paper nor a contribution of this work.

  • The contributions of the paper also involve repetitive work, such as (a) using residual connections to mitigate over-smoothing, and (b) Theorem 4.4, which is evidently valid given reference [5].

  • The empirical analysis lacks heterophilic benchmarks to validate the effectiveness of SMPNN against over-smoothing (a standard quantitative proxy for over-smoothing is sketched after this list).

  • In Table 7, SMPNN achieves the best performance on ogbn-proteins with 12 layers. Does SMPNN require significantly more computational resources and parameters to achieve performance comparable to or better than the baseline models?
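On the over-smoothing point above, a standard quantitative proxy (a common diagnostic in the deep-GNN literature, not a quantity reported in this paper) is the Dirichlet energy of the node features,

    $E(X) = \tfrac{1}{2} \sum_{(i,j) \in E} \lVert x_i - x_j \rVert_2^2$,

which shrinks toward zero as node representations become indistinguishable. Tracking it across layers, on homophilic and heterophilic benchmarks alike, would make the over-smoothing claims easier to verify.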

[1] DeepGCNs: Can GCNs Go as Deep as CNNs? ICCV'19

[2] Graph Convolutional Networks via Initial residual and Identity Mapping, ICML'20

[3] Training Graph Neural Networks with 1000 Layers, ICML'21

[4] Revisiting Heterophily For Graph Neural Networks, NeurIPS'22

[5] How Powerful are Graph Neural Networks? ICLR'19

Questions

Please refer to weaknesses.

Review
Rating: 5

The paper proposes Scalable Message Passing Neural Networks (SMPNNs), a framework designed to scale traditional message-passing GNNs. By incorporating residual connections, it avoids the issue of oversmoothing. The method claims to work without the need for computationally and memory-intensive attention mechanisms.

Strengths

The paper is well-written with a clear motivation. The methodology is easy to follow, and the experiment section is well-structured.

Weaknesses

The experiments are not strong enough. For instance, all SMPNN variants should be included consistently across the tables and figures in the section, and the same applies to the baselines unless there are reasonable explanations for exclusions. Additionally, the distinctions among SMPNN variants and the strengths of SMPNN are not clear. For example, SMPNN uses significantly more GPU memory than SGFormer, yet the paper still claims that it does not use more memory than the baselines. Some of the presentation also needs to be modified; for example, the FF notation needs to be written consistently with the rest of the paper.

Questions

  • See weaknesses.
  • If SMPNN w/o FeedForward uses significantly less GPU memory and performs better in accuracy than SGFormer, why is it not considered the main variant?
  • The difference between Figure 2 and Figure 3 is unclear.
  • For the SMPNN variant with attention (detailed in the Appendix), how does it differ from GAT (Veličković et al., 2017)?
  • A real-time per-epoch runtime analysis should be included alongside the big-O analysis in the paper (a minimal measurement sketch follows this list).
  • Could you clarify why SMPNN, which uses message passing, primarily compares against an attention-based baseline?
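On the runtime question, a per-epoch wall-clock measurement is straightforward to add next to the big-O analysis; a minimal sketch is given below (plain PyTorch; train_one_epoch is a hypothetical placeholder, not a function from the paper).

    # Minimal per-epoch wall-clock timing (illustrative sketch only).
    import time
    import torch

    def mean_epoch_time(train_one_epoch, n_epochs: int = 10) -> float:
        """Run the given epoch function n times and return the mean seconds per epoch."""
        times = []
        for _ in range(n_epochs):
            if torch.cuda.is_available():
                torch.cuda.synchronize()   # flush pending GPU work before timing
            start = time.perf_counter()
            train_one_epoch()
            if torch.cuda.is_available():
                torch.cuda.synchronize()   # wait for this epoch's GPU kernels to finish
            times.append(time.perf_counter() - start)
        return sum(times) / len(times)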
AC Meta-Review

This paper proposes a simple yet effective message-passing-based graph neural network architecture that replaces the attention layers in transformers with standard convolutional message passing. The performance matches the versions that use attention. While the method is simple and effective, it lacks novelty. For example, as reviewer Hv1b raised, the use of residual connections to address over-smoothing has already been explored in DeepGCN, which this paper does not mention or compare against. The experiments also need some enhancements; e.g., all SMPNN variants should be included consistently across the tables and figures. Unfortunately, no rebuttal was given to address these reviewer concerns, and I therefore recommend rejection.

Additional Comments from the Reviewer Discussion

No rebuttal was given.

Final Decision

Reject