Rethinking the Expressiveness of GNNs: A Computational Model Perspective
Summary
Reviews and Discussion
In this paper, the authors first explain the limitations and unrealistic assumptions of several current approaches in analyzing the expressive power of GNNs, including underestimated preprocessing time, anonymous WL tests with non-anonymous features, and unrealistic assumptions in the CONGEST model. Next, the authors propose the RL-CONGEST model to address these issues. Several results are derived: (1) GNNs require substantial width and depth to simulate the WL test; (2) virtual nodes can help reduce computation costs, although they do not improve theoretical expressive power; (3) the RL-CONGEST model can solve the PNF model-checking problem with k-WL graph transformation in rounds.
Strengths
- The paper is well-structured and nicely presented.
- The stated limitations of existing approaches make sense to me, and the examples are intuitive.
- The new results derived by the RL-CONGEST model are interesting.
Weaknesses
My main concern is about the practical implication of the proposed model beyond what the author presented.
- One question is how we can use the RL-CONGEST model to effectively estimate and compare the representational power of different GNN variants or even predict their performance in real-world applications.
- The authors claim that the proposed framework can be used for analyses involving non-anonymous node features. I wonder how this framework can be leveraged to truly evaluate differences between various added features, such as SPD or resistance distance. In my view, although the broken symmetry introduced by these additional features is undoubtedly a source of improved expressivity, different features have varying degrees of power; some can help count more complex graph structures than others.
Questions
See above.
Dear Reviewer b62D,
Thank you for taking the time to review our paper. We would like to address your concerns as follows:
W1:
Our primary goal is to reveal limitations in current analyses of GNNs' expressive power and to introduce a new analytical approach that addresses these issues, rather than to develop a specific GNN model with enhanced performance or expressiveness. Specifically, as demonstrated in Theorems 5-8, we leverage the RL-CONGEST framework to provide a more reasonable evaluation of GNNs' expressive power in simulating one iteration of the WL test. Additionally, in Section 5, we also propose open questions that may be investigated within the RL-CONGEST framework. It is important to note that our RL-CONGEST framework is designed to assess a model's expressive power in executing algorithmic tasks or achieving "algorithmic alignment", rather than to predict its quantitative performance on learning tasks such as node classification.
W2:
Yes, the second half of your question, "different features have varying degrees of power; some can help count more complex graph structures than others", precisely reflects what we aim to convey. Under the non-anonymous setting, CONGEST and MPGNNs can exhibit greater expressiveness than the anonymous WL test. Our main argument in Section 3.2 is that while existing works claim their models' expressiveness advantage by proving they can perform tasks beyond the WL test's scope, this approach is questionable. Equating anonymous WL with MPGNNs, as previous works have done, is not entirely reasonable, and consequently, concluding that MPGNNs are weak because the WL test is weak is also debatable. In fact, MPGNNs can perform certain algorithms (such as solving edge biconnectivity in rounds within the CONGEST model [Pritchard, 2006]).
Our logical flow is as follows:
- Numerous studies claim that the vanilla WL test has limited expressive power --- a claim that we affirm, as discussed in Figure 2. However, the appropriateness of using the anonymous WL test to characterize MPGNNs is debatable, given that real-world graphs often contain rich features. Additionally, [Loukas, 2020] demonstrated that with unique IDs (and other assumptions), MPGNNs can perform a wide range of algorithmic tasks.
- To address the "limited" expressiveness of MPGNNs (stemming from the limitation of the vanilla WL test), some works incorporate additional features (e.g., [Loukas, 2020]) to enhance the expressiveness of their proposed models. Nonetheless, as outlined in (1), the suitability of the anonymous WL test as a characterization for MPGNNs is questionable. Consequently, the practice in some studies of demonstrating the advantage of their model's expressiveness by proving it can perform algorithmic tasks beyond the WL test's capabilities may not be entirely valid. A more reasonable approach would be to compare these models with MPGNNs under a non-anonymous setting (as suggested in [Loukas, 2020]). Further, evidence from [Loukas, 2020; Suomela, 2013; den Berg et al., 2018; You et al., 2021; Abboud et al., 2021; Sato et al., 2021] suggests that the non-anonymous setting can enhance model expressiveness, highlighting a mismatch in works that argue for a "weak MPGNN" yet use additional features that break the anonymous setting in the WL test to improve expressiveness.
- As you mentioned, "different features have varying degrees of power; some can help count more complex graph structures than others", our RL-CONGEST analysis framework can be applied in studies proposing new GNN variants that use additional features and claim the ability to perform certain algorithmic tasks, with the only requirement being a clear specification of the preprocessing time complexity of the features.
Additionally, points (1) and (2) highlight the need to reconsider the validity of comparing a proposed model's expressiveness directly with the vanilla WL test. We hope this discussion encourages the community to more accurately assess existing results on GNNs' expressiveness.
Thank you again for reviewing our paper, and we are looking forward to any further discussions with you.
References:
[Pritchard, 2006] David Pritchard. An Optimal Distributed Edge-Biconnectivity Algorithm. arXiv 2006.
[Loukas, 2020] Andreas Loukas. What Graph Neural Networks Cannot Learn: Depth vs Width. ICLR 2020.
[Zhang et al., 2023] Bohang Zhang, Shengjie Luo, Liwei Wang, and Di He. Rethinking the Expressive Power of GNNs via Graph Biconnectivity. ICLR 2023.
[Xu et al., 2019] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How Powerful are Graph Neural Networks? ICLR 2019.
[Suomela, 2013] Jukka Suomela. Survey of Local Algorithms. ACM Computing Surveys (CSUR), 45(2):24, 2013.
[den Berg et al., 2018] Rianne van den Berg, Thomas N Kipf, and Max Welling. Graph Convolutional Matrix Completion. KDD 2018.
[You et al., 2021] Jiaxuan You, Jonathan M Gomes-Selman, Rex Ying, and Jure Leskovec. Identity-aware Graph Neural Networks. AAAI 2021.
[Abboud et al., 2021] Ralph Abboud, Ismail Ilkan Ceylan, Martin Grohe, and Thomas Lukasiewicz. The Surprising Power of Graph Neural Networks with Random Node Initialization. IJCAI 2021.
[Sato et al., 2021] Ryoma Sato, Makoto Yamada, and Hisashi Kashima. Random Features Strengthen Graph Neural Networks. SDM 2021.
Thank the authors for the detailed response to my concerns. I have some follow-up questions.
RL-CONGEST framework is designed to assess a model's expressive power in executing algorithmic tasks or achieving "algorithmic alignment"
I am a little confused about this. First, I don't find such a statement in the paper. I assumed that the authors were talking about the general expressivity of GNNs and their downstream performance.
I quite agree with the authors on the arguments they made in the paper: (1) hidden precomputation time; (2) constrained analysis on the anonymous WL test. However, I agree with them mainly from a more general perspective. If we confine the discussion to algorithmic tasks, things become different. First, most papers discuss the expressiveness of GNNs in the general setting: to improve the performance of GNNs on downstream tasks. However, downstream tasks do not consist only of algorithmic tasks. For example, GD-WL [1] frames its story around the bi-connectivity problem, which can be solved with less complexity than computing resistance distance. However, GD-WL can be used to approximate many other graph properties or count important substructures [2, 3], which is crucial for downstream tasks and may not be achievable with an algorithm of lower complexity.
Back to my original question (W1), I think the authors did a great job of formulating all these conclusions in the paper, and I do find many conclusions interesting and original. However, when I start to think about these conclusions from a broader perspective, for example, whether they can help me gain more insight into evaluating or comparing existing GNNs, or help me design new expressive GNNs, these conclusions seem limited. That is to say, GNNs are ultimately designed for solving real-world problems like node classification or graph classification, which are much more boring than algorithmic tasks.
anonymous WL vs MPNN + ID
I think the authors are totally right in the statement that, by equipping MPNNs with non-anonymous node features, MPNNs can solve many algorithmic tasks that previous literature claims they cannot. However, I think the discrepancy here is still the scope of the discussion. Most existing works use anonymous WL tests as a tool because they want to make sure the resulting expressive GNNs are still permutation invariant and equivariant. Without this assumption, the resulting GNNs cannot have good performance on real-world tasks. It's true that given unique IDs, MPNNs can solve many algorithmic tasks, but this does not transfer to real-world tasks. For example, [4] injects random features into MPNNs to improve expressiveness, but many follow-up experiments actually show it achieves bad performance on real-world tasks.
Back to my original question, what I really want to ask is this: additional features can improve the expressive power of MPNNs by (1) breaking symmetry and leveraging message passing to learn on that and enhance performance; (2) directly adding additional knowledge about graph structures. Can the proposed framework quantitatively or qualitatively analyze the contribution of these two parts given node features? Or can the proposed model be used to analyze the effect of node features in real-world datasets on the expressiveness of MPNNs?
I still hold a positive perspective on the paper, but the above concerns somehow prevent me from further increasing my score.
References
[1] Zhang, Bohang, et al. Rethinking the Expressive Power of GNNs via Graph Biconnectivity. ICLR 2023.
[2] Zhang, Bohang, et al. A Complete Expressiveness Hierarchy for Subgraph GNNs via Subgraph Weisfeiler-Lehman Tests. ICML 2023.
[3] Zhang, Bohang, et al. Beyond Weisfeiler-Lehman: A Quantitative Framework for GNN Expressiveness. ICLR 2024.
[4] Sato, Ryoma, et al. Random Features Strengthen Graph Neural Networks. SDM 2021.
Dear Reviewer b62D,
Thank you for your continued discussion and for maintaining an overall positive perspective on our work. We would like to further clarify our ideas and address your remaining questions.
On the "RL-CONGEST framework is designed to ..." part:
In the Introduction, we aimed to convey our idea of analyzing GNN expressiveness from the perspective of performing algorithmic tasks or algorithm alignment by surveying related works that use WL tests or other algorithmic tasks to evaluate GNNs. To make this point clearer, we have added a one-sentence explicit description in the abstract (Line 24, highlighted in blue) in our revised manuscript.
On your "General Perspective":
We respectfully disagree with your assertion due to inconsistencies in your position. It confuses us that you cite these papers as examples of what you wish our results to achieve, yet their expressiveness analysis is also limited to algorithm alignment, with downstream task results being purely empirical.
In the works you cited [1-3], the authors first theoretically analyze model expressiveness by evaluating whether the models can perform WL tests, biconnectivity decision, or subgraph counting (all algorithmic tasks, e.g., WL tests correspond to graph isomorphism tests, while the other two are direct algorithmic tasks). They then empirically evaluate the models' performance on downstream tasks. Notably, their theoretical analysis of expressiveness is also limited to the algorithmic tasks the models can perform, without guaranteeing real-world performance. Our focus aligns with this theoretical aspect of expressiveness. Besides, there are many well-known purely theoretical papers on the expressiveness of k-WL tests from an algorithmic alignment perspective, such as [Cai et al., 1989; Grohe, 1998; Grohe, 2017].
You agree that analyzing expressiveness by examining the algorithmic tasks models can perform is valid, as demonstrated by your citation of [1-3]. This is precisely what we have done—evaluating models' expressiveness through the algorithmic tasks they can perform.
On "whether these conclusions can help me ... design new expressive GNNs":
Loukas has already shown that MPGNNs can solve any computable problem if nodes are provided with sufficient computational resources. Therefore, designing more expressive GNNs should prioritize enhancing the expressiveness of the update function rather than pursuing higher levels in the WL hierarchy. For instance, researchers might explore replacing MLPs with LLM agents empowered with CoTs, which are claimed to have the expressiveness of the class, rather than relying on feature precomputation.
On your concern about breaking equivariance or invariance:
Reviewer YAM3 raised a similar concern, which we respectfully disagree with. Providing unique identifiers does not inherently break equivariance or invariance. Our RL-CONGEST framework allows nodes to know their IDs but does not enforce their use as features, ensuring flexibility. Consider the following points:
- The RL-CONGEST model only requires nodes to have unique identifiers to ensure they are distinguishable. Researchers are free to analyze equivariance or invariance by permuting node IDs during experiments.
- In practical implementations (e.g., PyG), nodes are typically assigned IDs to manage their features. This setting does not conflict with equivariance or invariance, as models can freely decide whether to use these IDs as input features. RL-CONGEST explicitly states that nodes can be uniquely identified, which aligns with practical implementations and does not impose stricter conditions.
As an example, consider Zhang et al.'s GD-WL test. Under a non-anonymous setting, RL-CONGEST can solve edge-biconnectivity if nodes have unique IDs. However, this result only assumes nodes are distinguishable and does not require a specific "canonical" ID assignment. If one ID assignment solves the problem, any permuted ID assignment would also work, preserving permutation-invariance.
We hope these clarifications address your concerns and provide further insight into the rationale behind our framework. Thank you again for your thoughtful engagement.
References:
[Cai et al., 1989] Jin-yi Cai, Martin Furer, and Neil Immerman. An optimal lower bound on the number of variables for graph identification. FOCS 1989.
[Grohe, 1998] Martin Grohe. Finite variable logics in descriptive complexity theory. Bull. Symb. Log., 4(4):345–398, 1998.
[Grohe, 2017] Martin Grohe. Descriptive Complexity, Canonisation, and Definable Graph Structure Theory, volume 47 of Lecture Notes in Logic. Cambridge University Press, 2017.
To make this point clearer, we have added a one-sentence explicit description in the abstract (Line 24, highlighted in blue) in our revised manuscript.
Thanks for that.
General Perspective
Let me try to make my point more precisely. I was wrong in assuming the analysis in the mentioned papers does not concern algorithmic tasks. But what I want to bring up is that models like GD-WL are not only able to solve one particular algorithmic task; they have been proven to be capable of much more than that. The authors state that the "RL-CONGEST framework is designed to assess a model's expressive power in executing algorithmic tasks or achieving 'algorithmic alignment'". I am wondering how the RL-CONGEST framework is able to assess whether a model achieves algorithmic alignment on all tasks it can perform, in order to decide that a particular model is not algorithmically aligned. I believe only if a model requires more complexity than all tasks it can perform are we safe to state that the model is not algorithmically aligned.
Loukas has already shown that MPGNNs can compute any computable problem if nodes are provided sufficient computational resources.
I didn't go deep into this reference, but I believe that to make it true, you still need to break permutation invariance or assign unique IDs for the GNN. However, the ultimate goal for GNNs, and for all other deep learning models, is that we want to use them to solve real-world problems by training the model only on the training set and hoping it generalizes to unseen samples. However, by breaking permutation invariance or equivariance, the model is simply not able to generalize well compared to models that preserve the permutation symmetry [1].
Therefore, designing more expressive GNNs should prioritize enhancing the expressiveness of the update function rather than pursuing higher levels in the WL hierarchy.
Still, my opinion is that all theoretical models or results should ultimately have practical implications. Enhancing the expressiveness of the update function is indeed important, as shown in the paper from the theoretical view. However, empirical experiments still show that by continuously improving the expressive power (from MPNNs [2] to subgraph GNNs [3-4], and finally to even more expressive GNNs [5]), we witness better and better results on the ZINC dataset. In contrast, the comparison in [2] shows that simply varying the architecture of an MPNN makes only a marginal difference.
Our RL-CONGEST framework allows nodes to know their IDs but does not enforce their use as features, ensuring flexibility
The point here is that whether we use unique ID or not in GNNs can have a significant impact on the downstream performance, which may imply the discrepancy between the theoretical model and real-world scenarios. Basically, a GNN that is trained on a specific ID assignment algorithm will not work if, in the test set, we use a different ID assignment algorithm or even if the graph distribution (like graph size) changes. Of course, we can permute the ID during the training. However, the model must see all different permutations to have appropriate generalization ability.
In practical implementations (e.g., PyG), nodes are typically assigned IDs to manage their features
The ID used in the code implementation is not the ID I am talking about. PyG uses node IDs just to implement the MPNN algorithm. However, permuting the IDs used in PyG will not result in a difference in the final computation result, whereas permuting the IDs used in the input features will.
Or I can ask it in another way, given an RL-CONGEST model, how do you train an MPNN to solve connectivity problems using a train graph set and predict an unseen graph set with maybe a different graph distribution?
[1] Elesedy, Bryn, et al. "Provably Strict Generalisation Benefit for Equivariant Models". ICML 2021.
[2] Dwivedi, Vijay, et al. "Benchmarking Graph Neural Networks". arXiv.
[3] Zhang, Muhan, et al. "Nested Graph Neural Networks". NeurIPS 2021.
[4] Zhao, Lingxiao, et al. "From Stars to Subgraphs: Uplifting Any GNN with Local Structure Awareness". ICLR 2022.
[5] Feng, Jiarui, et al. "Extending the Design Space of Graph Neural Networks by Rethinking Folklore Weisfeiler-Lehman". NeurIPS 2023.
Dear Reviewer b62D,
As a supplement, we have updated our PDF, replacing "anonymous" with "identical-feature" and "non-anonymous" with "distinct-feature" or "unique-feature" to make these concepts clearer and more accessible to readers. Additionally, we have included a discussion on the four common unique-feature settings in Section 3.2, highlighted in magenta.
Thank you.
Dear Reviewer b62D,
Thank you for your kind reply. We believe there are two key points where we may not yet have reached a consensus, and we would like to further clarify our perspective.
On "node IDs" and "non-anonymity":
These terms refer to unique features that allow nodes to be distinguishable (e.g., would also suffice). The distinct-feature setting is commonly applied in almost all existing models, as listed below:
- LINKX [Lim et al., 2021]:
LINKX uses:
The term can be reformulated as , which uses the identity matrix (unique node features).
- GCN, GAT, etc., on real-world datasets:
These models often use real-world features, which are unique and distinguishable with high probability. Actually, models that are applicable to real-world datasets fall into this category.
- GNN expressiveness works (e.g., [Loukas, 2020; Sato et al., 2021]):
These works use random features, which are unique with high probability. For example, assigning each node a feature randomly chosen from would result in distinct features with high probability.
- GD-WL framework:
In the GD-WL framework by Zhang et al., resistance distances are used as features. Since R(u, v) = 0 if and only if u = v, each row of the resistance distance matrix is unique, creating distinguishable node features.
Actually, according to our theory, they are all capable of solving the biconnectivity problem using the unique features.
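This row-uniqueness argument can be checked numerically. Below is a small sketch (our illustration, not from the paper; the name `resistance_distance_matrix` is ours) that computes effective resistances on a 4-cycle via the pseudoinverse of the graph Laplacian and confirms that every row of the matrix is distinct, because each row has its zero entry in a different position:

```python
import numpy as np

def resistance_distance_matrix(adj):
    """R[u, v] = Lp[u, u] + Lp[v, v] - 2 * Lp[u, v], where Lp is the
    Moore-Penrose pseudoinverse of the graph Laplacian."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj   # Laplacian L = D - A
    lp = np.linalg.pinv(lap)
    d = np.diag(lp)
    return d[:, None] + d[None, :] - 2 * lp

# A 4-cycle: resistance is 3/4 between adjacent nodes and 1 between
# opposite nodes; each row's zero entry sits in a different position,
# so all rows are pairwise distinct even though the graph is
# vertex-transitive.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]])
R = resistance_distance_matrix(adj)
```

Note that the 4-cycle is vertex-transitive, so the vanilla WL test assigns every node the same color; the resistance rows nonetheless distinguish all nodes, which is exactly the distinct-feature effect discussed above.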
On whether unique features break equivariance or invariance:
Unique features do not break permutation equivariance or invariance. Instead, it is the properties of the update functions and pooling layers that determine whether a GNN model is equivariant or invariant. For example, in LINKX, when performing node classification, ensures permutation equivariance. To achieve permutation invariance for graph classification, we only need to add a permutation-invariant pooling layer after this step.
Similarly, consider Dijkstra's single-source shortest path algorithm. Unique IDs are used solely to determine whether the shortest path to a node has been found. The resulting shortest path distance vector is always permutation equivariant. This demonstrates that it is not the presence of unique IDs but rather the design of the update function that determines whether a GNN model is permutation equivariant or invariant.
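The Dijkstra analogy can be made concrete with a short, self-contained sketch (illustrative only, not code from the paper): we run single-source shortest paths on a graph and on an ID-relabeled copy, and observe that the output distance vector is relabeled in exactly the same way, i.e., unique IDs by themselves do not break permutation equivariance:

```python
import heapq

def dijkstra(n, edges, src):
    """Single-source shortest paths; nodes carry unique IDs 0..n-1."""
    adj = {u: [] for u in range(n)}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [float("inf")] * n
    dist[src] = 0
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

n = 4
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (0, 3, 5.0)]
perm = [2, 0, 3, 1]  # relabel node u as perm[u]
p_edges = [(perm[u], perm[v], w) for u, v, w in edges]

d = dijkstra(n, edges, 0)
d_perm = dijkstra(n, p_edges, perm[0])
# Equivariance: relabeling the inputs relabels the outputs identically,
# i.e., d_perm[perm[u]] == d[u] for every node u.
```

The IDs only make nodes addressable; the distances themselves depend on the graph structure alone, mirroring the point that the update function, not the presence of IDs, determines equivariance.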
We hope these clarifications address your concerns and further show the flexibility of our framework. Thank you again for your engagement and constructive feedback.
Reference:
[Lim et al., 2021]. Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods. NeurIPS 2021. [Sato et al., 2021] Ryoma Sato, Makoto Yamada, and Hisashi Kashima. Random Features Strengthen Graph Neural Networks. SDM 2021.
The LINKX is only applicable to transductive settings, where we only have a single graph and do not require the model to generalize. At this time, by assigning each nodes a unique ID, the single MLP can achieve a "universal approximation" on any functions within this particular graph. If this is the unique ID you are referring, I think the statement becomes somehow meaningless. All GNN's expressiveness is analyzed with an assumption of the inductive setting, that is we want to train a GNN on some train graphs set and generalize it to unseen graphs with different sizes and graph distribution. And that's why the permutation invariant and equivariant are important.
I have a similar feeling to reviewer YAM3 that authors always try not to answer my concerns directly and ignore some of my questions. So I try to make my question even more direct:
Could the authors give me a concrete example of how to use RL-CONGEST model and distinct-feature to train an MPNN model that can solve the bi-connectivity problem with some training graphs and labels? How can the model generalize to unseen graph samples with different graph structures and size?
I believe if your example is reasonable, most of my concerns can be solved.
Dear Reviewer b62D,
Thank you for your discussion. We now have a clear understanding of your main concern: a concrete construction of an RL-CONGEST model capable of solving the biconnectivity problem.
A Concrete RL-CONGEST Example for Edge-Biconnectivity
First, we would like to clarify that, as with most expressiveness results, our claims focus on existence results and impossibility results. Whether a real-world model can be trained to solve specific algorithmic tasks is out of the scope of our paper and may depend on the flexibility and strength of the update functions.
Since the RL-CONGEST model uses distributed algorithms to characterize the message-passing process in GNNs, each distributed algorithm corresponds to an RL-CONGEST model. Here, we provide a sketch of constructing such a concrete RL-CONGEST model for edge-biconnectivity. The algorithm is designed by Pritchard [Pritchard, 2006], and we encourage reviewers to refer to Pritchard's slides (http://ints.io/daveagp/research/2006/ac-bicon.pdf) for visual aids and proofs of correctness.
Steps:
- Build a spanning tree with the FLOOD algorithm rooted at node (Since nodes have unique features and are distinguishable, we can "rename" them as for the description):
- Compute the number of descendants on the spanning tree:
- Step 2.1: The root node sends a message to its children: "Compute the number of descendants". This message propagates down the tree.
- Step 2.2: Leaf nodes determine their size as 1 (here we define each node to be its own descendant) and report this value to their parent. Internal nodes wait for responses from all their children, sum the values, add 1 for themselves, and report the total to their parent.
- Assign preorder labels to the nodes (i.e., the label of a vertex is smaller than the label of each of its children):
- Step 3.1: The root assigns itself label 1.
- Step 3.2: When a node assigns itself label x, it determines labels for its children c_1, c_2, \ldots in some arbitrary order. For child c_i, the label is computed as: \ell_i = x + 1 + \sum_{j < i} \#\text{desc}(c_j).
- Marking cycles (from this step on, we refer to nodes by their preorder labels):
- Step 4.1: For a given non-tree edge \{u, v\}, a message is sent along the edge in both directions: "If you are an ancestor of both u and v, ignore this message. Otherwise, pass the message to your parent and mark the edge connecting you to your parent". A node w checks the ancestry condition by verifying whether \{u, v\} \subseteq \{w, w + 1, \ldots, w + \#\text{desc}(w) - 1\}.
- Step 4.2: Each node tracks the cumulative and of all messages received.
- Step 4.3: Even if a node determines that its edge to its parent should not be marked, it sends a token message to its parent.
- Step 4.4: Once a node has received all non-to-parent edge messages, it sends a message to its parent.
After completing phases 1-4, the non-marked tree edges are exactly the bridges.
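For readers who prefer code, the four phases can be simulated sequentially (this is a centralized sketch of the distributed algorithm; the function name `find_bridges` and the root-label convention of 0 are our own choices for illustration, not Pritchard's notation):

```python
from collections import deque

def find_bridges(n, edges):
    """Sequential sketch of the four phases on a simple connected graph:
    (1) build a spanning tree, (2) count descendants, (3) assign preorder
    labels, (4) let every non-tree edge mark the tree edges on the cycle
    it closes. Unmarked tree edges are bridges."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Phase 1: spanning tree via BFS (stands in for the FLOOD algorithm).
    parent = [-1] * n
    order = [0]
    seen = [False] * n
    seen[0] = True
    q = deque([0])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                order.append(v)
                q.append(v)
    children = [[] for _ in range(n)]
    for v in range(n):
        if parent[v] != -1:
            children[parent[v]].append(v)

    # Phase 2: descendant counts (each node counts as its own descendant).
    desc = [1] * n
    for u in reversed(order):
        for c in children[u]:
            desc[u] += desc[c]

    # Phase 3: preorder labels; child c_i gets x + 1 + sum of earlier
    # siblings' descendant counts, where x is its parent's label.
    label = [0] * n
    for u in order:
        nxt = label[u] + 1
        for c in children[u]:
            label[c] = nxt
            nxt += desc[c]

    def is_ancestor(w, u):
        # w is an ancestor of u iff u's label lies in w's preorder interval.
        return label[w] <= label[u] < label[w] + desc[w]

    # Phase 4: each non-tree edge {u, v} marks tree edges upward from u
    # and from v until reaching a common ancestor of both endpoints.
    tree = {frozenset((v, parent[v])) for v in range(n) if parent[v] != -1}
    marked = set()
    for u, v in edges:
        if frozenset((u, v)) in tree:
            continue
        for x in (u, v):
            while not (is_ancestor(x, u) and is_ancestor(x, v)):
                marked.add(frozenset((x, parent[x])))
                x = parent[x]

    return sorted(tuple(sorted(e)) for e in tree - marked)

# Two triangles joined by the single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
bridges = find_bridges(6, edges)
```

On this barbell-shaped example, every triangle edge lies on a cycle and gets marked, so only the joining edge survives unmarked, matching the characterization of bridges.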
This also shows that in scenarios where distinct node features are available (as is common), enhancing the expressiveness of update functions would further enhance the expressiveness of GNNs.
On LINKX and the Inductive Setting
Actually, LINKX itself can be directly applied in the inductive setting, similar to GCN and GCNII (GCNII's paper includes inductive learning experiments). The high-level idea is that the model learns a weight matrix from a graph and features (or a training set of graphs and features) and then directly uses this learned representation for inference on new data. Although the authors of LINKX have not explicitly tested it in the inductive setting, two similar models, SA-MLP [Chen et al., 2024] and SymphoNEI [Kim et al., 2024], both of which utilize MLP(A), have shown effectiveness in inductive scenarios.
We hope these explanations provide clarity and address your concerns. Thank you again for your engagement.
Reference:
[Pritchard, 2006] David Pritchard. An Optimal Distributed Edge-Biconnectivity Algorithm. arXiv 2006.
[Chen et al., 2024] Jie Chen, Mingyuan Bai, Shouzhen Chen, Junbin Gao, Junping Zhang, and Jian Pu. SA-MLP: Distilling Graph Knowledge from GNNs into Structure-Aware MLP. TMLR 2024.
[Kim et al., 2024] Kyusik Kim, and Bongwon Suh. SymphoNEI: Symphony of Node and Edge Inductive Representations on Large Heterophilic Graphs. DASFAA 2024.
I believe I tried my best to explain my question multiple times, but now I feel the authors either do not have enough understanding of the related topic or are deliberately avoiding my central concern through sophistry, answering something that looks reasonable but is actually not even close to my question. Therefore, I will stop discussing with the authors here and leave my discussion to the AC-reviewer phase. This is my final response to the authors.
A Concrete RL-CONGEST Example for Edge-Biconnectivity
I know there are algorithms that can solve edge-connectivity problems. But my question is: is there a concrete approach that can train an MPNN model to solve edge-connectivity problems for unseen graphs with different sizes and distributions based on the RL-CONGEST framework? I do not expect the authors to really train a model or achieve 100% accuracy (you are free to assume that your update function is powerful enough in this conceptual question). I am just asking if it is possible and how, as the authors continue to say that unique IDs can improve the expressiveness of MPNNs and enable MPNNs to solve edge-connectivity problems. Using a statement like "out of the scope of our paper" is a sign of deliberately avoiding a direct answer and indicates the incapability of the proposed model.
Inductive learning
GCNII still falls under the message-passing category, which is fundamentally different from MLP(A). Therefore, GCNII can do inductive learning, but that does not mean LINKX can. By using MLP(A), you already assume an ID for each node in A; if you permute the order of A, the result will change, and an MLP trained on a graph of one size cannot be applied to a graph of another. Therefore, it only works in transductive settings for graph data.
SA-MLP focuses on point clouds, where each sample has the same size (or, say, the same number of nodes, and each node actually has its absolute position). SymphoNEI is simply wrong in its statement about inductive learning. Inductive learning means a model can generalize to graphs with different sizes (node numbers) and distributions (structures).
Using GCNII, SA-MLP, and SymphoNEI as examples indicates the author either doesn't understand the meaning of inductive learning or deliberately avoids answering my central concern.
Dear Reviewer b62D,
Thank you for your response. We would like to further clarify our ideas and address your concerns.
On Existence and Trainability
We have clearly stated in our previous response that our paper focuses on existence and impossibility results and does not address how to use real-world optimizers to train a model to solve problems like biconnectivity. These are two aspects of independent interest. It is not fair to criticize us as "avoiding problems" simply because we state that trainability is beyond the scope of this paper. Furthermore, we are not aware of any GNN expressiveness paper (proving that GNNs can solve specific algorithmic tasks) that has theoretically shown how to train a GNN using SGD or other optimization techniques to solve such tasks. If you are aware of such works, please list them, as we would be eager to learn from their techniques for future improvements.
In our paper, as in the works we cite, we prove theorems of the form: "for each graph G, there exists an RL-CONGEST model that operates on it and solves the algorithmic task". These existence results are universal and hold for every graph. However, we do not claim to prove how such a model can be practically trained.
On LINKX in the Inductive Learning Setting
Simply stating that LINKX is not applicable to the inductive learning setting is a misunderstanding. We can address this by setting a maximum number of nodes for graphs, say N, and padding all adjacency matrices to N × N. This approach mirrors the padding technique commonly used in the NLP domain.
Your concerns are akin to questions in the NLP area like, "The length of inputs varies, how can they be input into the same Transformer?" or "Your model can only handle sequences of the same length and cannot be applied to the inductive setting". The first concern is resolved using padding, and the second has been validated by the success of Transformer-based large language models.
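The padding argument above can be sketched in a few lines. This is a minimal illustration, not LINKX's actual implementation; the helper name `pad_adjacency` and the size budget of 5 are ours:

```python
import numpy as np

def pad_adjacency(A, max_nodes):
    """Zero-pad an n x n adjacency matrix to max_nodes x max_nodes so that
    graphs of different sizes share one fixed input dimension."""
    n = A.shape[0]
    if n > max_nodes:
        raise ValueError("graph exceeds the assumed maximum size")
    padded = np.zeros((max_nodes, max_nodes), dtype=A.dtype)
    padded[:n, :n] = A  # original graph occupies the top-left block
    return padded

# A 3-node path graph padded to an assumed budget of 5 nodes.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
P = pad_adjacency(A, 5)
```

The padded rows and columns act like padding tokens in NLP: they carry no edges, so the original graph's structure is preserved inside a fixed-size input.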
Overall, we deeply appreciate your effort in engaging in discussions with us.
The authors very correctly point out that the current theoretical analysis of GNNs is lacking in a few key ways (e.g., granularity and taking computational expense into account). To remedy that, they propose using the Resource-Limited CONGEST model instead of the usual CONGEST, and relating WL tests to model-checking problems, which can provide more granular expressivity testing.
Strengths
I agree with the authors that the theoretical expressivity analysis of GNNs is quite lacking. It makes a lot of sense to limit the computational power of the nodes (GNN update functions), as that is more realistic. The idea of using model-checking problems instead of WL to judge the theoretical power of GNNs is novel and, I think, quite promising, as it allows for higher granularity.
This work also provides interesting motivation for why virtual nodes help, as they are a very common tool in practice. It is one of the first works to look at this theoretically, to the best of my knowledge.
It's generally well written and easy to follow.
Weaknesses
The authors stress that the "unlimited computational resources of CONGEST" are an issue and choose to just use a more restrictive computation class for the node updates. Ideally, I'd like to see this contrasted with the universal approximation theorem for MLPs. As the update function is usually an MLP, its power, I'd say, is defined more by the approximation quality of whatever computation it needs to perform.
In the section "Additional Features Empower Models by Breaking Anonymity?", the authors say that it is not good that some expressive GNNs might be breaking the anonymous setting by using additional features. I would say that this is not a good way to look at it. In my opinion, the point of a good chunk of more-expressive-GNN research is precisely how to add pseudo-identifiers to a graph with as few negative impacts (e.g., bad generalization) as possible.
Speaking of the negative impacts of node identifiers, the proposed computation model permits "nodes to be aware of their own unique IDs". This doesn't make much sense from an ML perspective, as generalization will be terrible if a stable ID assignment is not possible, and normally it is not possible on general graphs. So, for a paper arguing about making theoretical GNN analysis more realistic, I think this is a notable issue. The authors do motivate this choice by saying that "real-world graph datasets are rich in node features". I'd argue that this is still very far from node IDs, e.g., if the features are just a few different atom types, as in many molecular tasks. I'd like to see some data analysis showing the unique identifiability of nodes in a multitude of real-world datasets to convince me that this is the case.
The work also lacks direct applicability to fixing or ranking GNN architectures, which would be the main benefit of the newly proposed GNN analysis. To make the paper complete, I would like to see an analysis/ranking of a few popular GNN architectures, hopefully showing that this translates to some real tasks, for example ones for which the assumptions, such as unique identifiability by node features, more or less hold.
Also, speaking about popular GNN architectures, authors skipped the two first subgraph GNN papers, when discussing subgraph GNNs (https://arxiv.org/abs/2110.00577 https://arxiv.org/abs/2111.06283)
Questions
Distributed computing already has various computation models besides LOCAL and CONGEST. It would be nice if the authors dug a bit deeper into the distributed computing literature to see what alternatives exist and whether they would be more fitting than CONGEST. It's been a while since I looked at those myself, but, for example, https://arxiv.org/pdf/1202.1186 investigates a very restricted computational model that should still be able to simulate a WL test (it was also used in some simplified GNNs: https://arxiv.org/pdf/2205.13234). I'm sure that others exist as well.
Dear Reviewer DTJH,
Thank you for reviewing our paper. We are very grateful for your detailed feedback and appreciate the opportunity to address some misunderstandings that may have arisen in the “Weaknesses” section of your review.
Regarding Unlimited Computational Resources in the CONGEST Model:
We respectfully disagree with the comment that we "just use a more restrictive computation class for the node updates". Our goal is to introduce flexible constraints on the resource class to derive different independent results, as discussed in Lines 381-389. For instance, setting the resource class to the recursive languages (those decidable by Turing machines), together with a suitable network width, transforms our RL-CONGEST framework into the CONGEST model. By setting the class to one that reflects the capabilities of MLPs, the resulting model would resemble "real-world" GNNs with MLPs as update functions. Alternatively, if node update functions used transformer-based LLM agents enhanced by Chain-of-Thought (CoT) reasoning, which are claimed to solve exactly the problems in P [Merrill et al., 2024; Li et al., 2024], we could set the class to P and derive new theoretical results based on this adjustment. We hope that our framework can inspire future research on graph agents, and we have added this in red in the revised PDF (Lines 384-387). As discussed in Lines 381-389, adjusting the resource class in different ways may yield diverse outcomes, making our RL-CONGEST framework a "framework scheme" or "framework template".
We respect your statement that an update function's power "is more defined by approximation quality of whatever computation it needs to perform". We recognize the importance of the Universal Approximation Theorem (UAT) in machine learning and are aware of work addressing the approximation capabilities of GNNs, such as [Azizian et al., 2021; Wang et al., 2022]. However, as indicated by our paper's title, our work follows a different research path, focusing on a model's expressiveness through its capability to perform algorithmic tasks. For example, [Loukas, 2020] uses the CONGEST model to analyze MPGNNs' algorithmic abilities, while the Outstanding Paper at ICLR 2023 [Zhang et al., 2023] assesses GNNs' power to determine graph biconnectivity. These two lines of research, expressiveness for algorithmic tasks versus approximation quality, are largely orthogonal and have developed independently. Additionally, discussions in the literature (e.g., Section 1.1 in [Loukas, 2020]) note that "Turing completeness is a strictly stronger property than universal approximation". Therefore, we believe that our focus on computability is sufficiently general and without loss of scope.
On Additional Features Enhancing Models by Breaking Anonymity:
Our framework permits nodes to access unique IDs, but this does not imply that models must use them. This flexible setting is compatible with various feature types, including pseudo-identifiers or molecular types, as you mentioned. This choice is motivated by our observation that existing works often equate MPGNNs' expressive power with the anonymous WL test, which we find to be a mismatch due to the questionable anonymous setting. In Section 3.2, we aim to point out that previous works' equating of the anonymous WL test with MPGNNs is not entirely reasonable, and thus concluding that MPGNNs are weak because the WL test is weak is also debatable. In fact, MPGNNs can perform certain algorithms (such as solving edge biconnectivity in O(D) rounds within the CONGEST model [Pritchard, 2006], Lines 311-313).
For clarity, we summarize the logical flow of Section 3.2 as follows:
- Numerous studies following the seminal work GIN [Xu et al., 2019] claim that the vanilla WL test has limited expressive power --- a claim that is true, as shown in Figure 2. However, the appropriateness of using the anonymous WL test to characterize MPGNNs is debatable, given that real-world graphs frequently contain rich features. Additionally, [Loukas, 2020] demonstrated that with unique IDs (and other assumptions), MPGNNs can perform a wide range of algorithmic tasks.
- To address the "limited" expressiveness of MPGNNs (stemming from the WL test's limitations), some works incorporate additional features (e.g., [Zhang et al., 2023]) to increase their models' expressiveness. Nonetheless, as discussed in (1), the anonymous WL test may not be the appropriate characterization for MPGNNs. Consequently, some studies' approach of demonstrating their model's expressiveness advantage by proving it can perform tasks beyond the WL test's capabilities may not be entirely valid. A more reasonable comparison would use MPGNNs in a non-anonymous setting (as suggested in [Loukas, 2020]). Further, evidence from [Loukas, 2020; Suomela, 2013; den Berg et al., 2018; You et al., 2021; Abboud et al., 2021; Sato et al., 2021] shows that non-anonymous settings can enhance model expressiveness, highlighting a mismatch when studies argue for "weak MPGNNs" yet use features that break the WL test's anonymity to boost expressiveness.
- Our framework allows nodes to know their unique IDs, though this is optional. This flexibility is compatible with the use of features such as "a few different atom types in molecular tasks". Our RL-CONGEST analysis framework can apply to studies proposing new GNN variants that leverage additional features and claim the ability to perform specific algorithmic tasks, with the only requirement being a clear specification of the preprocessing time complexity for these features.
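To make the anonymity point concrete, here is a minimal sketch of one round of 1-WL color refinement (the helper name and the 6-cycle example are ours, not from the paper). Starting all nodes with identical colors, as in the anonymous WL test, leaves a regular graph in a single color class, whereas distinguishing a single node's feature immediately breaks the symmetry:

```python
def wl_iteration(adj, colors):
    """One round of 1-WL color refinement: each node's new color encodes
    its own color plus the multiset of its neighbors' colors."""
    signatures = {
        v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
        for v in adj
    }
    palette = {}            # relabel distinct signatures as small ints
    new_colors = {}
    for v in adj:
        sig = signatures[v]
        if sig not in palette:
            palette[sig] = len(palette)
        new_colors[v] = palette[sig]
    return new_colors

# A 6-cycle: every node has degree 2.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# Anonymous WL: identical initial colors never separate the cycle.
refined = wl_iteration(adj, {i: 0 for i in range(6)})

# Breaking anonymity at a single node propagates distinctions outward.
seeded = {i: 0 for i in range(6)}
seeded[0] = 1
refined2 = wl_iteration(adj, wl_iteration(adj, seeded))
```

After refinement, `refined` still has one color class, while `refined2` splits the cycle into four classes based on distance from the distinguished node, illustrating how a non-anonymous feature raises distinguishing power.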
On "Lacks Direct Applicability to Fixing or Ranking GNN Architectures":
Our RL-CONGEST analysis framework has practical applications, as illustrated through results like model checking. For example, we show that k-WL GNNs can perform PNF model checking, a class of significant problems in theoretical computer science, while previous research aligned with WL tests, which are equivalent to the model-equivalence problem. These results are discussed in detail in Section 4.3. However, please note that our paper's primary goal is to highlight issues in existing studies on GNN expressiveness and to propose a new analytical framework that avoids these issues. We do not aim to design a specific GNN model with improved performance or expressiveness, nor to provide guidance for such future work. Rather, we hope our framework will assist future research by helping to avoid the issues discussed in Section 3 and encouraging a re-evaluation of common assumptions in GNN expressiveness studies.
On Subgraph GNNs:
Thank you for providing references to additional models. We have incorporated these references into the paper and marked them in red (Lines 43-44).
Regarding Other Distributed Computing Models:
Indeed, we are aware of various distributed computing models, such as the CONGEST-CLIQUE, Coordinator, and Blackboard models. Some of these models can be considered special cases of the CONGEST model. For example, the CONGEST-CLIQUE model can be implemented by adding virtual edges to make the original graph a complete graph; the Coordinator model can be implemented by adding a virtual node connected to all other nodes. However, the LOCAL and CONGEST models are still the most widely mentioned in distributed computing books [Peleg 2000], courses [Hirvonen et al., 2020; Ghaffari, 2022], and conferences, so we chose to focus our discussion on these two. Additionally, some of our ideas are inspired by [Loukas, 2020], which explores the relationship between GNNs and these two models. Our framework generalizes their results, but it is based on the CONGEST model.
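As a small illustration of the Coordinator-style construction mentioned above, here is a sketch (the helper name and adjacency-dict representation are ours) of augmenting a graph with a virtual node connected to every original node:

```python
def add_virtual_node(adj):
    """Return a new adjacency dict with one extra 'virtual' node connected
    to every original node -- the Coordinator-model construction."""
    v = max(adj) + 1                          # fresh integer id for the hub
    new_adj = {u: nbrs + [v] for u, nbrs in adj.items()}
    new_adj[v] = sorted(adj)                  # hub sees every original node
    return new_adj, v

# A path 0-1-2-3 has diameter 3; after augmentation, any two original
# nodes are at most 2 hops apart (via the virtual node).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
aug, hub = add_virtual_node(path)
```

The same idea underlies the observation that virtual nodes reduce communication cost: every pair of nodes gains a 2-hop route through the hub, mimicking the Coordinator model inside a standard message-passing graph.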
Thank you again for your detailed feedback. We hope our response clarifies our approach and addresses your concerns. We look forward to any further discussions.
References:
[Merrill et al., 2024] William Merrill, and Ashish Sabharwal. The Expressive Power of Transformers with Chain of Thought. ICLR 2024.
[Li et al., 2024] Zhiyuan Li, Hong Liu, Denny Zhou, and Tengyu Ma. Chain of Thought Empowers Transformers to Solve Inherently Serial Problems. ICLR 2024.
[Azizian et al., 2021] Waiss Azizian, and Marc Lelarge. Expressive Power of Invariant and Equivariant Graph Neural Networks. ICLR 2021.
[Wang et al., 2022] Xiyuan Wang, and Muhan Zhang. How Powerful are Spectral Graph Neural Networks. ICML 2022.
[Loukas, 2020] Andreas Loukas. What Graph Neural Networks Cannot Learn: Depth vs Width. ICLR 2020.
[Pritchard, 2006] David Pritchard. An Optimal Distributed Edge-Biconnectivity Algorithm. arXiv 2006.
[Xu et al., 2019] Keyulu Xu*, Weihua Hu*, Jure Leskovec, Stefanie Jegelka. How Powerful are Graph Neural Networks? ICLR 2019.
[Zhang et al., 2023] Bohang Zhang, Shengjie Luo, Liwei Wang, and Di He. Rethinking the Expressive Power of GNNs via Graph Biconnectivity. ICLR 2023.
[Suomela, 2013] Jukka Suomela. Survey of Local Algorithms. ACM Computing Surveys (CSUR), 45(2):24, 2013.
[den Berg et al., 2018] Rianne van den Berg, Thomas N Kipf, and Max Welling. Graph Convolutional Matrix Completion. KDD 2018.
[You et al., 2021] Jiaxuan You, Jonathan M Gomes-Selman, Rex Ying, and Jure Leskovec. Identity-aware Graph Neural Networks. AAAI 2021.
[Abboud et al., 2021] Ralph Abboud, Ismail Ilkan Ceylan, Martin Grohe, and Thomas Lukasiewicz. The Surprising Power of Graph Neural Networks with Random Node Initialization. IJCAI 2021.
[Sato et al., 2021] Ryoma Sato, Makoto Yamada, and Hisashi Kashima. Random Features Strengthen Graph Neural Networks. SDM 2021.
[Peleg, 2000] David Peleg. Distributed Computing: A Locality-Sensitive Approach. SIAM, 2000.
[Hirvonen et al., 2020] Juho Hirvonen and Jukka Suomela. Distributed Algorithms (course). 2020. https://jukkasuomela.fi/da2020/
[Ghaffari, 2022] Mohsen Ghaffari. Distributed Graph Algorithms (course). 2022. https://people.csail.mit.edu/ghaffari/DA22/Notes/DGA.pdf
Dear Reviewer DTJH,
As we are now midway through the rebuttal phase, we wanted to kindly follow up to ensure that our responses have adequately addressed your concerns. Your feedback is highly valued, and we are still looking forward to further discussion to clarify or expand on any points as needed. Please feel free to share any additional thoughts or questions you might have.
Thank you once again for your time and effort in reviewing our paper.
Thank you for your rebuttal.
I have a few additional followups on your response:
Our framework permits nodes to access unique IDs, but this does not imply that models must use them.
Can you give an example of how a model without such features (or better yet with say 5 potential features/classes, think atom types, with number of nodes n >> 5) would be analysed in your framework?
We do not aim to design a specific GNN model with improved performance or expressiveness or to provide guidance for such future work. Rather, we hope our framework will assist future research by helping to avoid issues discussed in Section 3 and encouraging a re-evaluation of common assumptions in GNN expressiveness studies.
I never asked for any state-of-the-art results. What I meant, similar to the comment above, is that it would be helpful if the paper actually used RL-CONGEST to analyse a few different popular existing GNN architectures, to show how it is done and that RL-CONGEST can provide a meaningful differentiation between GNN approaches that was not possible so far.
Dear Reviewer DTJH,
Thank you for your response. We would like to further clarify our ideas and address your concerns.
On the Usage of Unique IDs
First, let us state "unique IDs" more precisely: they refer to nodes being uniquely identifiable, such as through distinct features.
Second, our intended meaning can be formally described as follows: when analyzing GNN expressiveness from an algorithmic alignment perspective using the RL-CONGEST model, the RL-CONGEST model requires unique IDs solely to identify nodes and compare whether two nodes are distinct, as in traditional graph algorithms. The model does not rely on the concrete values of the IDs, as these values are typically arbitrary and carry no intrinsic meaning.
In our paper, in line with related works, we primarily focus on the algorithmic tasks that GNNs can perform rather than downstream tasks, so we provide an example of constructing an RL-CONGEST model (or equivalently, a distributed algorithm) to solve the edge-biconnectivity problem. This algorithm is designed by Pritchard [Pritchard, 2006]. For further details, we encourage reviewers to consult Pritchard's slides (http://ints.io/daveagp/research/2006/ac-bicon.pdf), which include visual aids and proofs of correctness.
Steps:
- Build a spanning tree T with the FLOOD algorithm, rooted at a node r (since nodes have unique features and are distinguishable, we can "rename" them 1, ..., n for the description):
- Compute the number of descendants of each node on T:
- Step 2.1: The root node sends a message to its children: "Compute the number of descendants". This message propagates down the tree.
- Step 2.2: Leaf nodes determine their size as 1 (since each node is its own descendant) and report this value to their parent. Internal nodes wait for responses from all their children, sum the values, add 1 for themselves, and report the total to their parent.
- Preorder-label the nodes (i.e., the label of a vertex is smaller than the label of each of its children):
- Step 3.1: The root assigns itself label 1.
- Step 3.2: When a node assigns itself label $x$, it determines labels for its children $c_1, c_2, \ldots$ in some arbitrary order. For child $c_i$, the label is computed as $\ell_i = x + 1 + \sum_{j < i} \#\mathrm{desc}(c_j)$.
- Mark cycles (from this step on, we refer to nodes by their preorder labels):
- Step 4.1: For a given non-tree edge $\{u, v\}$, a message is sent along the edge in both directions: "If you are an ancestor of both $u$ and $v$, ignore this message. Otherwise, pass the message to your parent and mark the edge connecting you to your parent". A node $w$ checks the ancestry condition by verifying whether $\{u, v\} \subseteq \{w, w + 1, \ldots, w + \#\mathrm{desc}(w) - 1\}$.
- Step 4.2: Each node tracks the cumulative and of all messages received.
- Step 4.3: Even if a node determines that its edge to its parent should not be marked, it sends a token message to its parent.
- Step 4.4: Once a node has received messages over all of its non-parent edges, it sends a message to its parent.
After completing phases 1–4, the non-marked edges are bridges.
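The four phases above can be simulated sequentially. The sketch below assumes a connected undirected graph given as an adjacency dict; the function name and representation are ours, not Pritchard's distributed implementation:

```python
def bridges_via_tree_marking(adj):
    """Find bridges by: (1) building a rooted spanning tree, (2) computing
    subtree sizes, (3) preorder-labeling, (4) marking every tree edge
    covered by a non-tree edge. Unmarked tree edges are bridges."""
    root = next(iter(adj))
    parent, order, stack = {root: None}, [root], [root]
    while stack:                              # phase 1: spanning tree flood
        u = stack.pop()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                order.append(w)
                stack.append(w)
    children = {u: [] for u in adj}
    for u in adj:
        if parent[u] is not None:
            children[parent[u]].append(u)
    desc = {u: 1 for u in adj}                # phase 2: sizes, leaves upward
    for u in reversed(order):
        if parent[u] is not None:
            desc[parent[u]] += desc[u]
    label = {root: 0}                         # phase 3: preorder (0-based)
    for u in order:
        nxt = label[u] + 1
        for c in children[u]:
            label[c] = nxt
            nxt += desc[c]
    def is_ancestor(w, u):                    # u in w's preorder interval
        return label[w] <= label[u] < label[w] + desc[w]
    marked = set()                            # phase 4: climb non-tree edges
    for u in adj:
        for v in adj[u]:
            if parent.get(v) == u or parent.get(u) == v:
                continue                      # skip tree edges
            w = u
            while not (is_ancestor(w, u) and is_ancestor(w, v)):
                marked.add(frozenset((w, parent[w])))
                w = parent[w]
    tree = {frozenset((u, parent[u])) for u in adj if parent[u] is not None}
    return {tuple(sorted(e)) for e in tree - marked}

# Two triangles joined by a single edge: that edge is the only bridge.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
                 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
bridges = bridges_via_tree_marking(two_triangles)  # {(2, 3)}
```

Each phase corresponds to a round structure in the distributed version: the flood and the upward/downward sweeps take time proportional to the tree depth, and the upward walks in phase 4 mark exactly the tree edges lying on some cycle.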
Using RL-CONGEST to Analyze Existing Models
Theorem 2 and Theorems 6-8 correspond to existing GNN models. Since we analyze the expressiveness of GNNs from an algorithmic alignment perspective, our results focus on the models' ability to solve algorithmic tasks rather than traditional node classification or link prediction tasks. Therefore, at this initial stage of using the RL-CONGEST analysis framework, we do not have results for models designed specifically for downstream tasks, such as GCN and GAT. Instead, our results focus exclusively on models proposed in GNN expressiveness studies. The findings are summarized in the table below.
| Preprocessing Time (Content) | Nodes' Computational Resources Class | Message-Passing Rounds | Algorithmic Task Solved | Corresponding Model (Reference) |
|---|---|---|---|---|
| | | | Edge-Biconnectivity | MPGNN ([Pritchard, 2006]) |
| (All-Pair RDs) | | | Edge-Biconnectivity | GD-WL (Thm. 2) |
| (Tarjan) | | | Edge-/Vertex-Biconnectivity | Any MPGNN (with Computed Answers) |
| | | | One Iteration of WL Test | MPGNN (Thm. 6) |
| (Virtual Node) | | | One Iteration of WL Test | MPGNN + Virtual Node (Thm. 7) |
| (k-WL Graph & Features) | | | PNF Model Checking | High-Order GNNs (Thm. 8) |
Reference:
[Pritchard, 2006] David Pritchard. An Optimal Distributed Edge-Biconnectivity Algorithm. arXiv 2006.
Thank you for your further clarification and the model comparison.
From your answer about unique IDs, it stands to reason that models must have access to IDs and use them for the theory to hold, as pretty much all algorithms in distributed computing, as you say, rely on nodes knowing which node sent which message (the same holds in your example). This contradicts your previous statement.
With the not-fully-unique node example I was asking about, I was also interested, again, in how your theory meshes with real-world scenarios. In the paper you claim that features make nodes uniquely identifiable in most real tasks, but as far as I can see this holds at best partially, not to an absolute degree (partial identifiability). So I was wondering whether your theory can deal with that. Your answer doesn't really help in this regard.
I also looked at your discussion with the other reviewers who also largely expressed doubts. Thus I will keep my score.
Dear Reviewer DTJH,
Thank you for your reply and engagement in the discussion.
Regarding your first concern, please allow us to provide further clarification. First, it seems we share the understanding that the WL test requires identical initial features for all nodes. This requirement imposes a limitation on the form of input features. We just remove this limitation by allowing nodes to access unique IDs. Consequently, depending on the specific task, the model's learned mapping may or may not rely on the concrete values of the IDs, thereby making it more general compared to the WL test.
As far as we know, existing works that analyze GNN expressiveness in the context of algorithmic tasks (e.g., biconnectivity or distinguishing certain graph types, as targeted by WL tests) have not provided theoretical guarantees for improving quantitative metrics in practical tasks such as node classification or graph classification. At most, these works show the ability to distinguish certain graph pairs, which relates to problems of model equivalence or model checking. Our Theorem 8 shows that RL-CONGEST can also perform such analyses, and we would greatly appreciate it if you could take a closer look.
We understand that it is challenging to persuade all reviewers to fully agree with all our claims. However, we believe that our preliminary work provides value to the community by encouraging a reevaluation of the reasonableness of existing approaches.
Once again, thank you for your comments and for engaging in this discussion. We hope our explanation addresses some of your concerns.
This paper examines the limitations of the theoretical expressiveness of GNNs and introduces a novel computational framework, RL-CONGEST, which factors out pre- and postprocessing and limits the computational power of nodes. The authors further analyze the WL-test within this framework and contribute some theoretical insights. RL-CONGEST, while positioned primarily for GNNs, also offers implications for understanding computational constraints in other computation models.
Strengths
- The paper introduces RL-CONGEST, a new computational model that addresses aspects previously overlooked in the GNN literature, particularly computational constraints at the node level.
- Some shortcomings in prior work are highlighted and critically analyzed, including preprocessing complexities and computational limits.
- RL-CONGEST has potential standalone value beyond GNNs, as it provides a framework to study computational complexity and expressiveness that could benefit other areas.
Weaknesses
- Section 3.1: The authors argue that preprocessing time complexity is often underestimated in the GNN literature, with Wollschläger et al. (2024) as an example. However, this appears to be an isolated case rather than a trend in the field. A more robust case for this claim could be made by referencing additional studies or a systematic analysis that demonstrates the prevalence of overlooked preprocessing complexities. Zhang et al. (2023), which the authors cite and analyze, actually discusses preprocessing time explicitly, which weakens the generality of this argument. While it is valuable to account for preprocessing, demonstrating that this issue extends across multiple papers would strengthen the point. Further, as most of these papers mainly focus on expressiveness, computational complexity might simply not be their main focus.
- Section 3.2: The "mismatch" claim between models with and without features lacks clear evidence. The advantage provided by features in model initialization is well known, and the WL test is adaptable to both anonymous and pre-colored contexts. More detail and examples of specific instances where this mismatch has led to issues in the literature would clarify and strengthen the claim. The authors tend to write around what the mismatch actually is in this section and should define it clearly.
- Section 3.3: The assertion that CONGEST is "inappropriate" for direct use is somewhat unconvincing, as it can still serve as an upper bound for computational capacity. While RL-CONGEST's constraint on node computation is a useful contribution, existing models remain relevant for the purpose of their analysis. Furthermore, Theorem 4 should explicitly assume a connected graph; the version stated in the paper is technically wrong. It is also worth noting that many GNN studies focus on expressiveness rather than computational complexity, so adding computational constraints could shift the narrative and purpose of the study. If the authors are proposing RL-CONGEST as a practical standard for GNNs, specific examples and a discussion of which complexity classes should be used for GNNs would help contextualize it within the field.
- Adding computational constraints to CONGEST is an interesting approach, but it becomes very detached from the application to GNNs. For example, the authors do not go into detail on which complexity classes we should allow for GNNs. One could argue that, as GNNs are usually implemented with fixed-size networks that run in constant time, the computational envelope should also be constant to yield the most realistic bounds. RL-CONGEST is interesting on its own, but how the computational constraints should best be put to use ought to be discussed in a paper that claims to investigate GNNs. The paper would benefit from more guidance on how GNN practitioners should employ RL-CONGEST, along with concrete examples of its benefits. A more precise articulation of the expected impact or practical value this framework could offer would also strengthen the contribution.
Overall, the paper makes several claims and only backs up some of them. In the end, it is not clear how the newly proposed model is supposed to be used in future work (should everybody just use their own complexity classes for the local computation? what benefit would this have?), and it leaves open the question of what impact this work can have. The authors should address this issue and formulate some clear benefits of their framework.
Questions
- Could the authors clarify specific insights from the RL-CONGEST model that would be practically useful for GNN practitioners?
- Do the authors envision RL-CONGEST serving as a new standard or benchmark model for GNN complexity analysis? If so, could they suggest specific complexity classes for GNN applications or examples that showcase RL-CONGEST’s advantages?
- Could you clarify your position on CONGEST's usefulness as an upper bound and discuss whether RL-CONGEST complements rather than replaces existing models?
- Could you add a discussion on appropriate complexity classes for GNN analysis using RL-CONGEST? In that context, can you provide guidelines or a framework for GNN practitioners on how to effectively use RL-CONGEST in their research or applications?
W3:
Yes, the CONGEST model can still serve as an upper bound for computational capacity. Our point is that selecting the unrestricted CONGEST model as the computational model for GNNs yields impractical outcomes, such as polynomial-depth GNNs that could theoretically solve NP-hard problems on connected graphs (we appreciate your feedback on Theorem 4 and have now corrected this condition in red, Line 323). In reality, we cannot expect a polynomial-sized neural network to solve NP-hard problems without error (unless P = NP). This misalignment results from overly strong assumptions about nodes' computational power. Our intention is to introduce flexible constraints on the computational resource class to derive independent results, as discussed in Lines 381-389. For instance, by setting the class to one reflecting the capabilities of MLPs, the resulting model would resemble "real-world" GNNs with MLPs as update functions. Alternatively, if nodes' update functions used transformer-based LLM agents enhanced by Chain-of-Thought (CoT) reasoning, which are claimed to solve exactly the problems in P [Merrill et al., 2024; Li et al., 2024], we could set the class to P and derive new theoretical results based on this adjustment. We hope our framework can inspire future research on graph agents, and have added this in red in the revised PDF (Lines 384-387). As discussed in Lines 381-389, adjusting the class in different ways may yield diverse outcomes. Thus, our RL-CONGEST framework serves as a "framework scheme" or "framework template".
W4:
As noted in the first open problem in Section 5, deriving general resource-round tradeoffs for the RL-CONGEST model is challenging, and we leave this problem for future work.
We also believe your statement that "GNNs are usually implemented with fixed-size networks that run in constant time" may not be accurate, as a node's aggregation function takes time at least proportional to its degree. Moreover, studies aligning GNNs' expressiveness with the WL test assume that MLPs can compute injective functions for node recoloring, an assumption whose practicality is also debatable. Your question supports our idea that WL tests may not be as straightforward as prior studies suggest, which aligns with our findings in Theorem 5. Regarding concerns about our framework's practicality, Theorems 5-8 illustrate our RL-CONGEST model's application in analyzing the unreasonableness of certain assumptions in previous studies. We invite you to kindly review these examples.
Q1:
Please note that our paper aims to conduct a theoretical analysis that identifies issues in existing studies on the expressive power of GNNs and to propose a new framework that avoids these issues. We do not intend to design a specific GNN model with improved performance or expressiveness, nor to offer guidance for future work directed toward these goals.
Q2:
As mentioned in our response to W3, we do not treat our entire framework, including the RL-CONGEST model with preprocessing and postprocessing, as a "benchmark model". Rather, it functions as a "framework scheme" or "framework template". We hope our framework will assist future research on GNN expressiveness by helping to avoid issues discussed in Section 3 and encouraging a re-evaluation of the validity of common assumptions in the field.
Q3:
We believe this point is addressed in Lines 381-389, and we reiterate it in our response to W3. Adjusting the computational resource class in different ways may lead to varied outcomes. For example, setting it to the recursive languages (which Turing machines can decide), together with a suitable network width, turns our RL-CONGEST model into the CONGEST model. Thus, the RL-CONGEST model can be seen as a generalization of the standard CONGEST model, allowing flexible settings of the computational resource class.
Q4:
As discussed in our response to W3, there is no universally "appropriate" complexity class for all GNN researchers. Researchers focused on current MPGNNs with MLP-based update functions might set the class to one capturing MLPs' capabilities to derive their theoretical results, while those interested in graph agents could set it to P. By setting the class to the recursive languages, our model also connects to CONGEST algorithms, so the results proposed by Loukas [Loukas, 2020] are special cases within our RL-CONGEST framework.
Thank you again for your detailed feedback. We hope our response addresses your concerns and questions to some extent, and we look forward to further discussions with you.
References:
[Zhang et al., 2023] Bohang Zhang, Shengjie Luo, Liwei Wang, and Di He. Rethinking the Expressive Power of GNNs via Graph Biconnectivity. ICLR 2023.
[Thiede et al., 2021] Erik H. Thiede, Wenda Zhou, Risi Kondor. Autobahn: Automorphism-based Graph Neural Nets. NeurIPS 2021.
[Bouritsas et al., 2022] Giorgos Bouritsas, Fabrizio Frasca, Stefanos Zafeiriou, and Michael M. Bronstein. Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting. TPAMI, 2022.
[Wollschlager et al., 2024] Tom Wollschläger, Niklas Kemper, Leon Hetzel, Johanna Sommer, and Stephan Günnemann. Expressivity and Generalization: Fragment-biases for Molecular GNNs. ICML 2024.
[Pritchard, 2006] David Pritchard. An Optimal Distributed Edge-Biconnectivity Algorithm. arXiv 2006.
[Loukas, 2020] Andreas Loukas. What Graph Neural Networks Cannot Learn: Depth vs Width. ICLR 2020.
[Suomela, 2013] Jukka Suomela. Survey of Local Algorithms. ACM Computing Surveys (CSUR), 45(2):24, 2013.
[den Berg et al., 2018] Rianne van den Berg, Thomas N Kipf, and Max Welling. Graph Convolutional Matrix Completion. KDD 2018.
[You et al., 2021] Jiaxuan You, Jonathan M Gomes-Selman, Rex Ying, and Jure Leskovec. Identity-aware Graph Neural Networks. AAAI 2021.
[Abboud et al., 2021] Ralph Abboud, Ismail Ilkan Ceylan, Martin Grohe, and Thomas Lukasiewicz. The Surprising Power of Graph Neural Networks with Random Node Initialization. IJCAI 2021.
[Sato et al., 2021] Ryoma Sato, Makoto Yamada, and Hisashi Kashima. Random Features Strengthen Graph Neural Networks. SDM 2021.
[Merrill et al., 2024] William Merrill and Ashish Sabharwal. The Expressive Power of Transformers with Chain of Thought. ICLR 2024.
[Li et al., 2024] Zhiyuan Li, Hong Liu, Denny Zhou, and Tengyu Ma. Chain of Thought Empowers Transformers to Solve Inherently Serial Problems. ICLR 2024.
Thank you for your detailed response. However, I believe some claims still lack sufficient support.
Regarding W2: It is indeed standard practice to use the anonymous Weisfeiler-Lehman (WL) test for analyzing graphs without node features, subsequently demonstrating that additional features can enhance a GNN's expressiveness by providing nodes with pseudo-identifiers. In this context, comparisons between unlabeled graphs (or the anonymous WL algorithm) and their labeled counterparts are entirely natural to illustrate how additional features improve expressiveness. While other graph features might contribute similarly, these are often more challenging to analyze, and they do not diminish the impact of the features under investigation.
One of your main claims is that RL-CONGEST addresses issues identified in the literature, yet the explanation of how or why this is achieved remains unclear. On one hand, you propose your work as a framework template; on the other, you suggest that its specific application is left for future exploration. This raises a key question: If one were to design a new GNN architecture to overcome the issues highlighted in your paper, aside from reporting processing times (which is indeed essential for many reasons), how else would RL-CONGEST and your work be beneficial?
For instance, considering your critique of Zhang et al.'s paper as a case study: How would you expect them to apply your framework to avoid the identified weaknesses? They already provide processing times, and your criticism relates to the edge biconnectivity task, where an analysis is required that goes beyond simply aligning to another computational model. Your point about underestimating preprocessing times seems to be based on the observation that the preprocessing has comparable or greater computational complexity than the problem being addressed. However, the objective of these GNN architectures is not necessarily to solve problems in the most efficient manner possible but rather to demonstrate that specific node features enable the GNN to make the right predictions for certain tasks. It is generally understood that GNNs will not match the efficiency of the best classical algorithms, and the works you reference do not make such claims.
Dear Reviewer YAM3,
We are very grateful for your detailed feedback and appreciate the opportunity to address some misunderstandings.
W1:
Yes, we use two examples to demonstrate that preprocessing complexity is often underestimated in the literature. Please note that the GD-WL paper [Zhang et al., 2023] was awarded Outstanding Paper at ICLR 2023. We chose this work because it is representative: its recognition by the community underscores that even well-regarded papers can exhibit this "underestimated preprocessing complexity" issue, making it a persuasive example to support our claim. However, we have also identified other examples, such as [Thiede et al., 2021; Bouritsas et al., 2022], which use hand-crafted features obtained by recognizing subgraphs. Their theoretical analyses likewise suggest that the proposed models achieve full expressiveness only when the class of recognized subgraphs is unrestricted, as is also the case in [Wollschlager et al., 2024]. We have added them to Section 3.1 in red (Lines 212-216 in the revised PDF). Listing every example exhaustively is infeasible, so we have selected a recent example from ICML'24 [Wollschlager et al., 2024] and the notable ICLR'23 outstanding paper [Zhang et al., 2023] to substantiate our point in Section 3.1.
Additionally, we respectfully disagree with the suggestion that "computational complexity might just not be the main focus".
- First, for [Zhang et al., 2023] (and similar works), the authors show the model's expressiveness by assessing its capability in performing algorithmic tasks, making time complexity crucial in evaluating their results.
- Second, while the authors discuss preprocessing time complexity (as noted in Lines 223-230), their GD-WL framework requires O(n^3) time (via matrix inversion) to precompute all-pair resistance distances (RDs), though the target algorithmic task --- determining biconnectivity --- only requires O(n + m) time. Additionally, as stated in our Theorem 2, RDs can directly imply edge biconnectivity; thus, the message-passing phase is actually unnecessary for this task in the GD-WL framework once RDs are precomputed. We argue that overlooking the comparison between preprocessing time and the task's time complexity leads to questionable conclusions.
- Third, we also found that a CONGEST algorithm proposed by [Pritchard, 2006] can solve the edge biconnectivity problem in O(D) rounds, where D is the graph diameter (we added this in Lines 311-313 in red color). [Loukas, 2020] further suggests that the CONGEST model can handle many algorithms. These findings highlight that, with unique IDs, MPGNNs might indeed solve the biconnectivity problem, supporting our view and challenging studies that rely on WL tests --- which they deem "weak" --- to define MPGNN expressiveness.
W2:
Thank you for your suggestion on clarifying this mismatch. Your review aligns with our discussion in the paper. Our main argument is that while existing works claim their proposed models' expressiveness advantage by proving they can perform tasks beyond the WL test's scope, this approach is questionable. Previous works' equating of the anonymous WL test with MPGNNs is not entirely reasonable, and thus concluding that MPGNNs are weak because the WL test is weak is also debatable. In fact, MPGNNs can perform certain algorithms (such as solving edge biconnectivity in O(D) rounds within the CONGEST model [Pritchard, 2006], Lines 311-313). The logical flow of Section 3.2 is as follows:
- The claim that the vanilla WL test has limited expressive power is true, as discussed in Figure 2. However, real-world graphs often contain rich features, and [Loukas, 2020] demonstrated that with unique IDs (and other assumptions), MPGNNs (which Loukas characterized via the CONGEST model) can perform a wide range of algorithmic tasks. Thus, using the anonymous WL test to characterize MPGNNs is debatable.
- To address MPGNNs' "limited" expressiveness (stemming from the vanilla WL test's limitations, as many works use the WL test to characterize GNNs), some studies, such as [Zhang et al., 2023], incorporate additional features to enhance model expressiveness. Nonetheless, as outlined in (1), using the anonymous WL test as a characterization of MPGNNs is questionable. Consequently, demonstrating a model's expressiveness by proving it can perform tasks beyond the WL test's capabilities may not be entirely valid.
- A more reasonable approach would be to compare these models to MPGNNs in a non-anonymous setting, as suggested in [Loukas, 2020]. Furthermore, evidence from [Suomela, 2013; den Berg et al., 2018; You et al., 2021; Abboud et al., 2021; Sato et al., 2021] indicates that the non-anonymous setting can enhance expressiveness, again highlighting the mismatch when works argue that MPGNNs are "weak" yet use additional features, breaking the WL test's anonymous setting, to enhance expressiveness.
We have revised the introduction in Section 3.2 (Lines 253-258) and highlighted the changes in red to clarify our points more effectively.
Dear Reviewer YAM3,
Thank you for engaging in further discussion with us. We will begin by addressing your concerns regarding our case study, followed by a discussion of your other concerns.
Regarding Zhang et al.'s Paper as a Case Study
While the total runtime of GNNs for solving an algorithmic task does not necessarily need to outperform classical algorithms, this is not the focus of our argument. Instead, we emphasize that researchers need to be careful when the preprocessing time for features or graphs exceeds the algorithmic task's time complexity. If it happens, the theoretical results may become questionable, as the precomputed features may directly solve the task, rendering the GNNs less relevant. In such cases, attributing the results to GNN expressiveness might not be entirely appropriate.
Take Zhang et al.'s paper as an example. In our Theorem 2, we demonstrate that R(u, v) = 1 is equivalent to the edge (u, v) being a cut edge, and hence the graph is not edge-biconnected. For the edge-biconnectivity task, the precomputed features (RDs) directly provide the solution, which reduces the role of message-passing to unnecessary redundancy. As such, this result highlights the expressiveness of the precomputed features rather than that of the GNN itself.
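As a concrete illustration of why message passing becomes redundant here, the criterion that an edge is a cut edge exactly when its effective resistance equals 1 can be checked directly from the precomputed RDs. Below is a small pure-Python sketch of ours (not code from either paper), computing effective resistance by grounding one node of the Laplacian and solving the resulting linear system:

```python
def effective_resistance(n, edges, u, v):
    """Effective resistance between u and v, treating each edge as a 1-ohm resistor."""
    # Graph Laplacian L = D - A.
    L = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        L[a][a] += 1.0
        L[b][b] += 1.0
        L[a][b] -= 1.0
        L[b][a] -= 1.0
    # Ground node n-1 (drop its row/column) and solve L' x = e_u - e_v
    # by Gaussian elimination with partial pivoting.
    m = n - 1
    A = [L[i][:m] + [(1.0 if i == u else 0.0) - (1.0 if i == v else 0.0)]
         for i in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n  # node potentials; the grounded node stays at 0
    for r in range(m - 1, -1, -1):
        x[r] = (A[r][m] - sum(A[r][c] * x[c] for c in range(r + 1, m))) / A[r][r]
    return x[u] - x[v]

# Two triangles joined by the bridge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
r_bridge = effective_resistance(6, edges, 2, 3)    # cut edge: resistance exactly 1
r_triangle = effective_resistance(6, edges, 0, 1)  # non-cut edge: 2/3 on this graph
```

On this graph the bridge (2, 3) has resistance 1 while the triangle edge (0, 1) has resistance 2/3, so reading off the precomputed feature alone decides edge-biconnectivity, with no message passing involved.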
To illustrate further, consider a binary classification task with labels y_i, where the preprocessing generates a binary feature z_i for each sample i, such that z_i = 1 iff y_i = 1, and z_i = 0 iff y_i = 0. A simple model, such as an MLP or linear SVM, could easily solve the task using z_i as features. Does this suggest the model itself is expressive? We believe the answer is "No". It only shows "feature expressiveness" or "preprocessing expressiveness" rather than "model expressiveness". It is the preprocessing step, not the model, that contributes the expressiveness. Similarly, in Zhang et al.'s work, the RDs alone suffice to solve edge-biconnectivity (Theorem 2), and message-passing adds no significant value. Moreover, the preprocessing time is substantially higher than the direct complexity of solving the problem algorithmically. (A more reasonable approach for this problem can be found in Lines 312-313, where, with unique IDs, the CONGEST model solves the problem in O(D) rounds without costly preprocessing such as computing RDs.)
We believe that disregarding preprocessing time relative to the algorithmic task's time complexity can lead to problematic interpretations. For another instance, consider using GNNs to solve NP-hard problems such as Minimum-Vertex-Cover or Hamiltonian-Cycle. Without constraints on preprocessing, the answers could be computed as binary features (e.g., whether each node is in the vertex cover or each edge is in the cycle) using classical algorithms. This would enable a trivial one-layer GNN (or even a single neuron) to "solve" NP-complete problems with these precomputed features, leading to an absurd conclusion about the model's expressiveness.
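The point can be made concrete with a toy sketch of ours (the `one_neuron` "model" below is hypothetical): exponential-time preprocessing solves Minimum-Vertex-Cover outright and stores the answer as a binary node feature, after which a single neuron "solves" the task by copying the feature.

```python
from itertools import combinations

def min_vertex_cover(n, edges):
    # Exhaustive "preprocessing": exponential time, solves the NP-hard task outright.
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            chosen = set(subset)
            if all(u in chosen or v in chosen for u, v in edges):
                return chosen
    return set(range(n))

# Path graph 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]
cover = min_vertex_cover(4, edges)
features = [1 if v in cover else 0 for v in range(4)]  # precomputed binary feature z_v

one_neuron = lambda z: z  # the "GNN": a single neuron reading off the feature
predictions = [one_neuron(z) for z in features]
```

The "model" attains perfect accuracy on the task, yet all the expressiveness plainly resides in the preprocessing step.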
We hope this helps to clarify why we emphasize the importance of researchers exercising caution when the preprocessing time (not total running time) exceeds the algorithmic task's time complexity.
We then address your other concerns as follows:
1. About Specific Applications
We have already demonstrated some initial results using our RL-CONGEST framework, as shown in Theorems 5-8. We kindly invite you to review these results. Please note that our exploration of this framework is in its early stages, and there is significant potential for future work, as outlined in Section 5.
2. How Our RL-CONGEST Framework Addresses the "Underestimated Preprocessing Time Complexity" Issue
The main point is the importance of carefully analyzing the relationship between preprocessing time complexity and the time complexity of the chosen algorithmic task to correctly evaluate a model's expressiveness. If the preprocessing time is less than or comparable to the algorithmic task's time, there are no inherent issues. However, if the preprocessing time significantly exceeds the algorithmic task's time, it becomes crucial to analyze whether the resulting features or graphs can directly imply the solution to the algorithmic task, as in the discussions above. If this is the case, the subsequent GNN model and message-passing steps are unnecessary. In such scenarios, it is more appropriate to attribute the success to "feature expressiveness" or feature engineering rather than GNN expressiveness, as the message-passing component becomes redundant.
3. How Our RL-CONGEST Framework Addresses the "Mismatch Between WL Test and Features" Issue
It is essential to remember that our framework is designed to analyze the expressive power of GNNs in solving algorithmic tasks, including the WL test and biconnectivity. The RL-CONGEST framework permits nodes to access unique IDs and utilize various features (e.g., distances), with the key requirement being that the time complexity of computing these additional features must be explicitly stated. When proposing a new model and demonstrating its expressiveness, authors should (or other researchers analyzing the model can) compare the preprocessing time complexity with the algorithmic task's complexity and ensure that the "feature expressiveness" issue is avoided.
We hope this response addresses your concerns and provides clarity regarding the issues you raised. Thank you for your continued engagement in this discussion!
Thank you for your detailed response. I will be more direct with my questions, as I feel that the core issues are still not being addressed:
Problem 1: While I agree that reporting processing times and being mindful of them is important, I do not see why RL-CONGEST is necessary for this analysis. You have yet to clearly explain how RL-CONGEST would be beneficial in future work on GNNs or how it could have been used by Zhang et al. to avoid the problems you mentioned. Furthermore, its applicability is limited by the fact that it assumes node IDs, which most models do not.
Problem 2: The “Mismatch Between WL Test and Features” issue is still not fully clarified. Your recent reply largely repeated points from your previous answers, which do not directly address this specific concern. See also the comment in my previous comment "regarding W2". The paper still lacks clarity in this regard.
Dear Reviewer YAM3,
Thank you for your additional comments and for summarizing your questions in a more direct way. We will address your concerns concisely as follows:
For Problem 1:
Regarding your concern about the RL-CONGEST model's benefits for future work:
The RL-CONGEST framework partly addresses this issue by requiring the explicit reporting of both preprocessing time and the algorithmic task's time complexity. In applications such as Zhang et al.'s work or future studies, if these two complexity bounds are reported and it is observed that the preprocessing time exceeds the algorithmic task's time, researchers are reminded to immediately re-evaluate whether the additional features directly solve the algorithmic task. While our RL-CONGEST model cannot entirely prevent such issues — just as no computational model (e.g., the RAM model) can stop someone from spending significantly more time to solve a simpler problem — it does provide a structured framework to warn researchers of this potential issue. By adhering to RL-CONGEST's analytical approach, researchers are prompted to consider this problem critically.
We have revised Lines 239-242 and Lines 376-379 in blue color in our updated PDF to make this point clearer.
Regarding "its applicability is limited by the fact that it assumes node IDs, which most models do not":
We respectfully disagree with this for the following reasons:
- Allowing nodes to access IDs actually relaxes the constraints and generalizes the WL tests, as nodes in the RL-CONGEST model have the flexibility to decide whether or not to use node IDs as input features.
- Models that incorporate additional features inherently assume the availability of node IDs, and we will elaborate on this point further in our response to your Problem 2.
For Problem 2:
Also on benefits for future work:
In brief, although the anonymous WL test has almost become a standard benchmark for works on GNN expressiveness, our suggestion for future work is that aligning GNN expressiveness with the anonymous WL test is not an appropriate approach; it is more reasonable to analyze GNN expressiveness by showing which algorithms models can perform under non-anonymous settings.
Regarding your comment "comparisons between unlabeled graphs and their labeled counterparts are entirely natural to illustrate how additional features improve expressiveness" in "regarding W2":
We disagree for the following reasons:
- Computing additional features (e.g., resistance distances) inherently requires treating nodes in a non-anonymous and distinguishable manner, thereby violating the anonymous setting. For instance, in Zhang et al.'s work, if matrix inversion is used, nodes must first be assigned unique IDs for the computation.
- Additional features computed "externally" cannot be considered as enhancing the GNN model's expressiveness. These features are derived from models outside the GNN itself. Drawing concepts from theoretical computer science, if a GNN can compute the required features internally with node IDs, it can be considered expressive enough to solve the task. Otherwise, the GNN is merely solving the task with the aid of a feature oracle (e.g., a resistance distance oracle in Zhang et al.'s work), which shifts the expressiveness to the oracle rather than the GNN.
Since computing features implicitly relies on node IDs, our claim is: why not explicitly allow nodes to know their IDs? As an application example, prior work has shown that with unique node IDs, CONGEST (and also our RL-CONGEST framework, since it is a generalization) can solve the edge-biconnectivity problem in O(D) rounds. This is a more reasonable expressiveness result, achieved by removing the anonymity constraint and allowing nodes to access their IDs or other features, as proposed by our RL-CONGEST framework.
We have revised Lines 284-290 and Lines 381-383 in blue color in our updated PDF to make this point clearer.
Thank you again for your feedback. We hope this response clarifies our points further and addresses your concerns.
Thank you for your response. I believe I now understand part of the misunderstanding. As you have reiterated multiple times in your last reply and now explicitly state in the paper, you assume that the “precomputation of additional features (e.g., through matrix inversion to compute RDs) requires nodes to be assigned IDs”. This assumption underpins your argument for always providing node IDs. However, this conclusion is fundamentally flawed. Popular features like subgraph counts (e.g., triangle counts for nodes) do not require fixed node IDs and can be computed in a permutation-equivariant manner. Any arbitrary ordering of the graph suffices for the computation of these features. Indeed, GNNs rarely use node IDs, even when such features are employed, because node IDs inherently break permutation-equivariance, a core design principle of GNNs that facilitates generalization. Consequently, incorporating node IDs into your proposed computational framework compromises its relevance for the majority of GNN applications.
Without assuming node IDs, we can also still conclude that externally computed features enhance the GNN's expressiveness while maintaining permutation-equivariance. These features, therefore, should be analyzed under the framework of a (non-anonymous) WL test, as has been done in many prior works.
Regarding your question:
Since computing features implicitly relies on node IDs, our claim is: why not explicitly allow nodes to know their IDs?
While features can be designed to maintain permutation-equivariance, this is not guaranteed for a GNN if it relies on an arbitrary ordering of node IDs. Moreover, computing canonical IDs to address this issue is computationally infeasible. Again, in practice, node IDs are rarely used.
For Problem 1 and the necessity of RL-CONGEST: I believe we agree that, beyond accurately reporting processing times (as, for example, Zhang et al. already does) and considering whether tasks can be addressed directly from precomputed features, RL-CONGEST is not essential. Specifically, RL-CONGEST does not provide insight into how predictions can be directly derived from features. Asserting that researchers should adopt RL-CONGEST merely because they report processing times or analyze features feels like an overreach.
Please correct me if I'm wrong.
Dear Reviewer YAM3,
Thank you for your reply. We appreciate the opportunity to further clarify our claims.
For your first concern:
In two sentences: Providing unique identifiers does not necessarily break equivariance or invariance. Our RL-CONGEST framework allows nodes to know their IDs but does not enforce their use as features, thereby offering flexibility.
We disagree with the assertion that "node IDs inherently break permutation-equivariance", as this is a misunderstanding for the following reasons:
- The RL-CONGEST model only requires nodes to have unique identifiers to ensure they are uniquely distinguishable. There are no constraints preventing researchers from analyzing equivariance or invariance by permuting node IDs and examining the resulting outputs.
- In practical implementations (e.g., PyG), nodes are also assigned IDs to manage their features. However, this setting does not conflict with equivariance or invariance, since models can freely choose whether or not to use unique IDs as input features. Our RL-CONGEST model simply makes explicit that nodes can be uniquely identified, which is no stricter than practical implementations.
Thus, our framework represents a relaxation of the WL tests rather than a contradiction.
Again, consider Zhang et al.'s GD-WL test. Under a non-anonymous setting, RL-CONGEST can solve the edge-biconnectivity problem if nodes have unique IDs. However, this result only assumes that nodes are distinguishable, and no specific "canonical" ID assignment is required. If one ID assignment solves the problem, any permuted ID assignment would also work, preserving the flexibility inherent in permutation-invariance.
For your second concern:
In one sentence: While a computational model alone cannot entirely prevent certain issues, our analysis framework (comprising preprocessing, message-passing within the RL-CONGEST model, and postprocessing) functions as a whole to mitigate these concerns.
It is true that the RL-CONGEST computational model alone cannot entirely avoid these issues. However, the integrated framework, when applied comprehensively, helps highlight and address these problems. This does not imply the RL-CONGEST model alone is meaningless; rather, the framework as a whole must be considered in its entirety and cannot be divided into isolated components.
Thank you again for your thoughtful engagement.
Once again, I appreciate the effort you’ve made to clarify your points. However, I feel that some of my concerns remain unresolved, and I’d like to summarize them clearly one last time:
RL-CONGEST assumes node IDs. While it’s true that any unique assignment of IDs can work, this misses the point entirely. Canonical node IDs, which would be one way to assign such identifiers that are permutation-equivariant, are computationally infeasible to compute. Moreover, many GNNs do not rely on node IDs or random features that make nodes unique. However, you are acting like all of them would. You should at least admit that this does not hold for all GNNs and clearly communicate this.
In practical implementations (e.g., PyG), nodes also have been assigned IDs to manage their features.
This is merely an implementation detail entirely hidden from the GNN itself. PyG ensures computations are conducted in a permutation-equivariant manner, independent of graph order. This is fundamentally different from models that explicitly use IDs as input features, which would violate these guarantees. I’m not sure why you are even mentioning this here, because it’s really not relevant to the problem we are discussing.
However, this setting does not conflict with equivariance or invariance since models can freely choose whether or not to use unique IDs as input features.
While it is true that models can technically choose their inputs, GNNs that align with RL-CONGEST by incorporating node IDs diverge from widely used GNN practices. This raises significant concerns about how RL-CONGEST can meaningfully analyze GNNs, which deliberately avoid using node IDs to preserve their core properties. Again, analyzing a GNN that is not able to uniquely identify nodes with RL-CONGEST is not sensible, as the additional IDs make RL-CONGEST more powerful by providing this capability. Consequently, the complexity bounds you will get in the RL-CONGEST model will generally not hold for the GNN you want to analyze.
Regarding the second concern: Your responses do not address my repeated concerns about its practical relevance or necessity for analyzing GNNs. Beyond reporting processing times, RL-CONGEST offers no clear benefit in understanding how GNNs utilize features or make predictions.
This represents my final attempt to clarify this issue, as I sense there has been a persistent reluctance to directly address the core of my concerns. Until this issue is adequately addressed, I feel compelled to lower my score to a reject, as the paper’s assumptions currently appear misaligned with some widely-used GNN practices. If you still hold a differing perspective, I would encourage you to consider the points I’ve outlined carefully.
Dear Reviewer YAM3,
It seems that there are two key points where we may not yet have reached a consensus, and we would like to further clarify our perspective.
On "node IDs" and "non-anonymity":
These terms refer to distinct features that allow nodes to be distinguishable (e.g., any injective assignment of features to nodes would also suffice).
We have updated our PDF, replacing "anonymous" with "identical-feature" and "non-anonymous" with "distinct-feature" or "unique-feature" to make these concepts clearer and more accessible to readers. The modifications are highlighted in magenta, and we invite you to review them.
The distinct-feature setting is commonly applied in almost all existing models, as listed below:
- LINKX [Lim et al., 2021]:
LINKX computes h_A = MLP_A(A), i.e., it feeds the adjacency matrix A itself into an MLP. The term MLP_A(A) can be reformulated as MLP_A(A · I), which uses the identity matrix I (unique node features) as the node-feature input.
- GCN, GAT, and other models which can be applied to real-world datasets:
These models use real-world node features which are unique and distinguishable with high probability. Actually, all models that are applicable to real-world datasets fall into this category.
- GNN expressiveness works (e.g., [Loukas, 2020; Sato et al., 2021]):
These works use random features, which are unique with high probability. For example, assigning each node a feature chosen uniformly at random from a sufficiently large range (say, {1, ..., n^3}) would result in distinct features with high probability.
- GD-WL framework: In the GD-WL framework by Zhang et al., resistance distances are used as features. Since R(u, v) = 0 iff u = v, each row of the resistance distance matrix is unique, creating distinguishable node features.
Actually, according to our theory, they are all capable of solving the biconnectivity problem using the unique features.
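The uniqueness claims above can be sanity-checked numerically; in this small sketch of ours, identity-matrix rows are unique by construction, and random features drawn from a range of size n^3 (our illustrative choice) are distinct with high probability:

```python
import random

n = 1000

# Identity-matrix features: row v of I is one-hot, hence unique to node v.
onehot_rows = [tuple(1 if i == v else 0 for i in range(n)) for v in range(n)]

# Random features drawn from a range of size n^3: by a union bound over all
# pairs, the collision probability is at most n^2 / n^3 = 1 / n.
random.seed(0)
random_feats = [random.randrange(n ** 3) for _ in range(n)]
```

With n = 1000 the collision probability is roughly 1/(2n), so the random features are distinct for almost every seed.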
We have also updated our PDF to include the above discussions in Section 3.2, highlighted in magenta.
On whether unique features break equivariance or invariance:
Unique features do not break permutation equivariance or invariance. Instead, it is the properties of the update functions and pooling layers that determine whether a GNN model is equivariant or invariant. For example, in LINKX, when performing node classification, the design of the update function ensures permutation equivariance. To achieve permutation invariance for graph classification, we only need to add a permutation-invariant pooling layer after this step.
Similarly, consider Dijkstra's single-source shortest path algorithm. Unique IDs are used solely to determine whether the shortest path to a node has been found. The resulting shortest path distance vector is always permutation equivariant. This demonstrates that it is not the presence of unique IDs but rather the design of the update function that determines whether a GNN model is permutation equivariant or invariant.
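The Dijkstra analogy can be checked directly. In this sketch of ours, node IDs are pure bookkeeping: relabeling the nodes by any permutation pi relabels the output distance vector in exactly the same way, i.e., the result is permutation equivariant.

```python
import heapq

def dijkstra(n, adj, src):
    # Node IDs serve only as dictionary keys and heap tie-breakers.
    dist = {u: float("inf") for u in range(n)}
    dist[src] = 0
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

# A small weighted digraph and an arbitrary relabeling pi of its nodes.
adj = {0: [(1, 1), (2, 4)], 1: [(2, 1), (3, 5)], 2: [(3, 1)], 3: []}
pi = {0: 2, 1: 0, 2: 3, 3: 1}
adj_perm = {pi[u]: [(pi[v], w) for v, w in nbrs] for u, nbrs in adj.items()}

d = dijkstra(4, adj, 0)
d_perm = dijkstra(4, adj_perm, pi[0])
# Equivariance: d_perm[pi[v]] == d[v] for every node v.
```

The final distances do not depend on which labels were used internally, only on the graph structure, which is exactly the point about update-function design determining equivariance.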
We hope these clarifications address your concerns and further illustrate the flexibility of our framework. Thank you again for your feedback.
Reference:
[Lim et al., 2021] Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods. NeurIPS 2021.
[Sato et al., 2021] Ryoma Sato, Makoto Yamada, and Hisashi Kashima. Random Features Strengthen Graph Neural Networks. SDM 2021.
Your justification for assuming unique IDs (or the distinct-feature setting, as you now call it) is becoming increasingly convoluted. While it is true that there are specific GNNs or datasets where unique IDs may be applicable, this assumption does not hold universally across a wide array of tasks. In particular, many synthetic tasks often do not satisfy this condition.
Your claim that unique and distinguishable features exist for all models on real-world tasks is especially problematic:
These models use real-world node features which are unique and distinguishable with high probability. Actually, all models that are applicable to real-world datasets fall into this category.
This assertion is unsupported in your response and your paper, there is not even a single citation provided to substantiate it. Is this really the case for widely used datasets? If so, can you provide evidence to demonstrate this?
Moreover, my earlier argument regarding the impact of adding unique IDs on permutation equivariance/invariance was misunderstood again. I never claimed that unique features would generally break permutation equivariance/invariance. But adding IDs in an arbitrary order as node features (let’s say to a featureless dataset to make it align with the “distinct-feature setting”) inherently breaks this property because the output from the GNN will depend on the chosen order. Unless this order is constructed in a permutation-equivariant or invariant way (e.g., using computationally infeasible canonical IDs), there is no guarantee that permutation-equivariance or invariance will be preserved. Just assume that you are running GIN on graphs with node IDs, the output will depend on the chosen order. This serves as the justification for why most GNNs do not simply add unique identifiers as node IDs, just to provide context for this part of the discussion. However, this point is less critical compared to the key issue I highlighted earlier.
Your previous reply continues to sidestep the core issue of why distinct features or node IDs can be assumed, and by now, I am beginning to question whether this is being done intentionally.
Based on the comments from other reviewers, it seems I am not alone in raising concerns about your argumentation regarding the assumption of unique features or node IDs. Despite several opportunities to address these issues, you have not provided sufficient support for your claims. Instead, you have introduced additional unsupported assertions to justify key aspects of your paper. Consequently, I have decided to lower my score to a reject.
Dear Reviewer YAM3,
We are deeply grateful for your continued engagement in the discussion with us. We regret your decision to lower the score, but we respect it. We would like to make the following clarifications for you and the other reviewers:
- In scenarios where unique features are not necessary (e.g., molecular property classification), the expressive power of MPGNNs and high-order GNNs has been extensively studied [Xu et al., 2019; Cai et al., 1989; Grohe, 1998; Grohe, 2017]. However, their capabilities remain limited to (anonymous) WL tests and are insufficient for tasks such as deciding biconnectivity. Consequently, while analyzing GNNs without features (or with identical features) is important in applications such as molecular property classification, this was not the focus of our paper.
- Many researchers have explored adding features to enhance expressiveness. For example, in Zhang et al.'s work, distances were added as features to solve the biconnectivity decision problem. Therefore, our goal is to determine the types of features a GNN requires to solve specific graph problems, such as biconnectivity.
- Surprisingly, our theory demonstrates that any distinct features (e.g., the rows of the identity matrix I) can achieve this goal, thereby generalizing Zhang et al.'s results. Moreover, our findings align with Loukas's experimental results.
We deeply appreciate your effort and are not pressing for a score change. Our intention is solely to ensure that our ideas are clearly conveyed to you and the other reviewers, minimizing any potential misunderstandings.
Reference:
[Xu et al., 2019] Keyulu Xu*, Weihua Hu*, Jure Leskovec, Stefanie Jegelka. How Powerful are Graph Neural Networks? ICLR 2019.
[Cai et al., 1989] Jin-yi Cai, Martin Furer, and Neil Immerman. An optimal lower bound on the number of variables for graph identification. FOCS 1989.
[Grohe, 1998] Martin Grohe. Finite variable logics in descriptive complexity theory. Bull. Symb. Log., 4(4):345–398, 1998.
[Grohe, 2017] Martin Grohe. Descriptive Complexity, Canonisation, and Definable Graph Structure Theory, volume 47 of Lecture Notes in Logic. Cambridge University Press, 2017.
This paper introduces a new computational model—the Resource Constrained CONGEST (RL-CONGEST) model—designed to address the inconsistencies and irrationalities in the current analysis of GNNs' expressivity. The RL-CONGEST model forms a framework for analyzing the expressivity of GNNs by introducing resource constraints and optional pre-processing and post-processing stages. Through this framework, it can reveal computational issues, such as the difficulty of hash function computation in the WL test and the role of virtual nodes in reducing network capacity, thereby providing theoretical support for understanding and improving the expressivity of GNNs.
Strengths
- This paper clearly identifies three key issues that are commonly overlooked in the current analysis of GNNs' expressivity, which represents a relatively novel perspective.
- The RL-CONGEST model proposed in this paper provides a theoretical framework for the expressivity of GNNs.
- The paper conducts an in-depth analysis of the computational complexity of the WL test, which is valuable for understanding the potential and limitations of GNNs and also demonstrates the paper's solid theoretical foundation.
Weaknesses
- Lack of Empirical Validation: The paper lacks empirical experiments to support the theoretical results.
- Lack of Guidance on Model Design: The paper does not clearly propose how to use the RL-CONGEST model to enhance the expressive power of GNNs. Although a theoretical framework is presented, there are no specific implementation details or design principles provided.
Questions
1. Can you provide some empirical experiments to verify the correctness of the analysis results of the RL-CONGEST model?
2. Is the RL-CONGEST model applicable to the analysis of all different types of GNNs and tasks on graphs?
3. Do the computational resource limitations mentioned in the article reflect the constraints in the real world? Are these limitations applicable to all types of GNNs?
4. Can you further provide design guidance on how to use this method to improve the model's expressive power?
5. Since the article mentions analyzing the expressive power of GNNs under resource constraints, is the RL-CONGEST model applicable to learning tasks on large graphs that are also resource-constrained?
Dear Reviewer ftna,
Thank you for your time and effort in reviewing our paper. We would like to address your concerns and questions as follows:
W1, W2, and Q1:
Thank you for these suggestions. Please note that our primary goal is to conduct a theoretical analysis to highlight issues in existing studies on the expressive power of GNNs and to propose a new analytical framework that avoids these issues. Our intention is not to design a specific GNN model with improved performance or expressiveness, nor to provide guidance for future work aimed at doing so.
Q2:
Yes, we believe the answer is affirmative. As discussed in Section 3.1 (Lines 201-209 in the revised PDF) of our paper, many GNNs conform to the "preprocessing-then-message-passing" framework. From a practical standpoint, mainstream GNN libraries, such as PyG (torch-geometric), implement GNNs with a "MessagePassing" base class, meaning that models built with these libraries naturally align with this framework. High-order GNNs, subgraph GNNs, and GNNs with additional features can also be implemented in these libraries by first constructing the k-WL graphs, subgraphs, or graphs with additional features, followed by message-passing operations, thereby fitting into the "preprocessing-then-message-passing" framework. As a result, we believe our analytical framework applies to the analysis of most GNNs.
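For illustration, the "preprocessing-then-message-passing" framework described above can be sketched in a few lines of plain Python (a hypothetical toy, not actual PyG code; the function names and the degree-augmentation example are ours):

```python
# Hypothetical sketch of "preprocessing-then-message-passing": a preprocessing
# step transforms the graph/features (e.g., builds a k-WL graph, subgraphs, or
# adds extra features), after which plain message-passing rounds run on the result.
def run_gnn(adj, feats, preprocess, update, rounds):
    adj, feats = preprocess(adj, feats)
    for _ in range(rounds):
        feats = [update(feats[v], [feats[u] for u in adj[v]])
                 for v in range(len(adj))]
    return feats

# Example preprocessing: append each node's degree as an extra feature.
add_degree = lambda adj, feats: (adj, [feats[v] + (len(adj[v]),)
                                       for v in range(len(adj))])

# Example update: elementwise sum of the node's and its neighbors' features.
update = lambda h, msgs: tuple(sum(x) for x in zip(h, *msgs))

# One round on the path graph 0-1-2 with constant initial features.
out = run_gnn([[1], [0, 2], [1]], [(1,), (1,), (1,)], add_degree, update, rounds=1)
```

Swapping `add_degree` for a k-WL or subgraph construction changes only the preprocessing stage; the message-passing loop is untouched, which is the structural point made above.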
Q3:
Yes, exactly. The resource limitations we consider reflect the constraints of real-world GNNs. Many GNNs use MLPs (or similar neural network models) as update functions, but these are far from Turing-complete, as required in condition (3) of Theorem 3 (Lines 308-310). For instance, we cannot expect an MLP of polynomial size to solve NP-hard problems without error (unless P = NP). However, directly fixing the resource limitation class to one specifically reflecting MLPs (e.g., TC^0) would reduce flexibility, as future GNNs may adopt new architectures for update functions. Our framework remains adaptable by allowing this class to be adjusted. For a hypothetical example, if we implemented the nodes' update functions with transformer-based LLM agents enhanced by Chain-of-Thought (CoT), which are claimed to solve problems within P [Merrill et al., 2024; Li et al., 2024], we could set the class to P and derive new theoretical results based on this adjustment. We hope that our analysis framework can also inspire future work on graph agents, and we have added this point in red in the revised PDF (Lines 384-387). As already discussed in Lines 381-389 (of the revised PDF), adjusting the class in different ways may lead to varied outcomes. In this way, our RL-CONGEST framework serves as a "framework scheme" or "framework template".
Q4:
As noted in our response to W1, W2, and Q1, our focus is on identifying issues in the existing analysis of GNN expressiveness and introducing a new framework for this analysis, rather than providing a guideline for future work to enhance expressiveness. The idea is that once researchers propose a new GNN model, they can analyze its expressive power using our framework, rather than using ad-hoc methods, which may have limitations as discussed in Section 3.
Q5:
Yes, we believe this is correct. We leave the exploration of specific tasks to future studies, and we also outline other open questions for further research in Section 5.
Thank you again for reviewing our paper. We hope this response clarifies our approach and addresses your questions. We look forward to any further discussions with you.
References:
[Merrill et al., 2024] William Merrill, and Ashish Sabharwal. The Expressive Power of Transformers with Chain of Thought. ICLR 2024.
[Li et al., 2024] Zhiyuan Li, Hong Liu, Denny Zhou, and Tengyu Ma. Chain of Thought Empowers Transformers to Solve Inherently Serial Problems. ICLR 2024.
Dear Reviewer ftna,
As we are now midway through the rebuttal phase, we want to kindly follow up to ensure that our responses have adequately addressed your concerns. Your feedback is highly valued, and we are still looking forward to further discussion to clarify or expand on any points as needed. Please feel free to share any additional thoughts or questions you might have.
Thank you once again for your time and effort in reviewing our paper.
The paper assumes that GNNs can leverage unique node IDs or distinct features to enhance expressiveness, a practice fundamentally misaligned with the core principles of GNN design. Node IDs break permutation invariance/equivariance, which is critical for generalization across graph distributions. While the authors assert that their RL-CONGEST framework does not enforce the use of IDs, the reliance on them undermines the framework’s relevance to most real-world GNN applications. Multiple reviewers raised concerns about the framework's practical relevance and assumptions, particularly regarding the use of unique IDs and the inductive learning setting. The authors repeatedly failed to address these concerns directly. Overall, a recommendation of rejection is made.
Additional Comments from Reviewer Discussion
- Node IDs and Practical GNNs:
Concerns Raised: Reviewers (ftna, YAM3, b62D, DTJH) highlighted that using unique node IDs in the RL-CONGEST framework contradicts the design principles of permutation invariance/equivariance in GNNs, which are critical for generalization to unseen graphs. They also questioned how unique IDs align with real-world scenarios and datasets, where node features are often not unique.
Author Response: The authors argued that unique IDs are optional in their framework and do not break equivariance/invariance. They claimed that real-world datasets often have distinguishable features, allowing nodes to be uniquely identified. However, the authors failed to provide concrete evidence for this claim, and their examples (e.g., LINKX) were shown to be inapplicable to inductive settings.
- Practical Relevance of RL-CONGEST:
Concerns Raised: Reviewers (YAM3, b62D, DTJH) questioned the practical applicability of RL-CONGEST, particularly in training GNNs for tasks like edge connectivity on unseen graphs with varying sizes and distributions. They asked for concrete examples of how the framework could be used to train MPNNs or compare existing GNN architectures.
Author Response: The authors provided an example of a distributed algorithm for edge-biconnectivity but admitted that their framework does not address the practical trainability of GNNs for such tasks. They claimed that training models falls outside the scope of their paper, which reviewers viewed as avoiding the core question of applicability.
- Inductive Learning and Distinct Features:
Concerns Raised: Reviewers (b62D, YAM3) highlighted that the authors misunderstood inductive learning. They noted that using unique IDs or fixed-size matrices (e.g., LINKX) is incompatible with generalizing across graphs with varying sizes and distributions.
Author Response: The authors attempted to address this by suggesting padding techniques to handle varying graph sizes, comparing their approach to NLP. However, reviewers argued that this response failed to address inductive learning requirements for graphs and further demonstrated a misunderstanding of the core issue.
- Framework’s Scope and Impact:
Concerns Raised: Reviewers (b62D, YAM3) expressed doubts about the framework's ability to analyze expressiveness in real-world tasks. They questioned whether RL-CONGEST could quantify the impact of features like resistance distance or directly compare existing GNN architectures.
Author Response: The authors reiterated that RL-CONGEST focuses on algorithmic tasks and not downstream performance. They suggested their work encourages reevaluation of existing assumptions but provided no actionable insights for practitioners.
The first point weighs most heavily in my decision, as it reflects the authors' lack of a basic understanding of permutation invariance.
Reject