PaperHub

Overall rating: 7.3/10 · Spotlight · 4 reviewers
Ratings: 4, 5, 4, 5 (min 4, max 5, std 0.5)
Confidence: 3.8
Novelty: 2.3 · Quality: 3.0 · Clarity: 2.8 · Significance: 3.0
NeurIPS 2025

What Expressivity Theory Misses: Message Passing Complexity for GNNs

OpenReview · PDF
Submitted: 2025-05-12 · Updated: 2025-10-29

Abstract

Keywords

gnn, mpnn, expressivity, theory, complexity, message passing, over-squashing, under-reaching, over-smoothing, WL, practice, higher-order

Reviews and Discussion

Review
Rating: 4

The core argument is that classical expressivity theory, which centers on whether a GNN can distinguish graphs under variants of the Weisfeiler-Lehman (WL) test, fails to explain the practical performance of GNNs. This paper introduces Message Passing Complexity (MPC), a continuous complexity measure for analyzing Graph Neural Networks (GNNs) based on a probabilistic variant of the WL test. The authors argue that MPC better captures real-world limitations (e.g., over-squashing, under-reaching) and correlates more closely with empirical behavior.

Strengths and Weaknesses

Strengths:

S1: The paper is novel -- shifting focus from binary expressivity to continuous and task-specific expressivity. The chosen expressivity measure also connects well with the known GNN limitation of over-squashing.

S2: The theoretical analysis is thorough, including the preservation of impossibility results and proofs of key properties of MPC.

Weaknesses:

W1: The relation between MPC and existing phenomena such as over-smoothing remains vague. Although the authors claim to capture it in their "retaining information" task, there is no direct experimental study of over-smoothing, nor a clear connection to existing literature on the subject.

W2: The paper does not compare with existing expressivity measures beyond WL-based ones, which is my main concern.

W3: Visual presentation (e.g., figure readability and color scheme) detracts from clarity. Some figures are overly dense or poorly captioned.

Questions

Q1: How does MPC explicitly capture over-smoothing?

Q2: Can you compare MPC empirically or theoretically (with use of simple examples) to other expressivity frameworks?

Q3: Have you tried computing MPC for real-world benchmarks (beyond synthetic graph families)? If not, can MPC be feasibly computed or approximated for those settings?

Q4: Can MPC differentiate between two models with identical MP graphs but different aggregation schemes or activation functions?

Limitations

yes

Justification for Final Rating

The authors’ rebuttal addresses the key concerns raised in my review. Most notably, they establish a clear theoretical and empirical connection between MPC and over-smoothing by relating information retention to random walk return probabilities. They also provide a thoughtful comparison to existing expressivity measures, arguing that MPC uniquely offers a continuous, task-specific framework that avoids the limitations of binary or task-agnostic approaches. These clarifications and additions motivated my score increase from 3 to 4.

Formatting Concerns

No formatting concerns

Author Response

We thank the reviewer for their very valuable suggestions and questions that have helped us to improve our paper. We are pleased that you found our approach novel and liked our thorough theoretical analysis.

W3: Visual presentation

We appreciate your feedback on visual presentation, which is very important to us. The color schemes in our figures are specifically designed to visually compare MPC complexity values with empirical accuracy trends. With the higher page limit for the camera-ready version, we have extended figure captions to be more detailed and informative. We would be very happy to revise specific figures based on any additional feedback you might have to ensure maximum clarity.

Q1 & W1: Connection to over-smoothing

Over-smoothing is usually defined as the convergence of node representations to one global representation as the number of message passing layers increases, with a progressive loss of initial node feature information [53,54]. This loss of individual node feature information is precisely what our retaining information task measures (Sec. 5): whether nodes can recover their initial features $f_v(G) = X_v$ after $L$ layers of message passing.

Our theoretical framework captures this phenomenon through the random walk connection. From Lemma 4.10, we can lower bound the complexity of information retention as $\text{MPC}(f_v, G) \ge -\log((I^L)_{vv})$, where $(I^L)_{vv}$ is the $L$-step random walk return probability. A high return probability indicates that the node retains its local information, resulting in low MPC complexity. Conversely, a low return probability indicates that information diffuses throughout the graph, leading to high complexity. The scaling of complexity with depth depends on how quickly the random walk approaches its stationary distribution. This is in line with established over-smoothing literature, which shows a direct connection between the rate of over-smoothing and how fast random walks approach their stationary distribution [52,54].
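As an illustrative sketch (not the paper's implementation), the lower bound above can be evaluated directly from a graph's adjacency matrix; we assume here that $(I^L)_{vv}$ is the return probability of a simple uniform random walk and use the natural logarithm:

```python
import numpy as np

def mpc_retention_lower_bound(A, v, L):
    """Lower bound MPC(f_v, G) >= -log((I^L)_vv) for the information-
    retention task, where (I^L)_vv is the L-step random-walk return
    probability of node v (sketch; assumes a uniform random walk)."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1, keepdims=True)
    P = A / np.maximum(deg, 1.0)  # row-stochastic transition matrix
    return_prob = np.linalg.matrix_power(P, L)[v, v]
    return -np.log(return_prob)

# Example: on a 4-cycle, a 2-step walk returns with probability 1/2,
# so the bound equals log(2).
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
bound = mpc_retention_lower_bound(A, v=0, L=2)
```

As the response notes, the bound grows as the walk mixes toward its stationary distribution, i.e., as the return probability decays with depth $L$.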

Empirically, our results in Section 5 validate this connection: standard MPNNs like GCN, known to suffer from over-smoothing, show increasing MPC complexity with depth (at least linear growth per Lemma 5.1) and correspondingly lower empirical information retention abilities.

We have added these theoretical connections to Section 4.2 and have expanded the information retention paragraph in Section 5 to explicitly discuss how MPC captures over-smoothing through this task.

Q2 & W2: Comparison to other expressivity measures

Existing expressivity measures have fundamental limitations for measuring task difficulty that make meaningful comparison with MPC across tasks difficult:

Task-agnostic measures: Most expressivity frameworks characterize architectures through specific capabilities (e.g., substructure recognition [49], biconnectivity [50]) and provide only global architecture rankings independent of the task (see Sec. 7, App. A). For example, the substructure-based measure would rank: standard MPNNs = MPNNs+virtual node < GSN < FragNet/CIN. However, our empirical results show that performance varies significantly across tasks—GCN-VN excels at information propagation while CIN excels at ring transfer—contradicting any fixed ranking.

Binary characterizations: Logic-based expressivity measures [3,22] determine theoretical feasibility but, like our WLC baseline, offer only binary possible/impossible classifications. Since all our tasks are theoretically learnable by standard MPNNs, these measures cannot reveal or explain any performance differences between architectures or tasks.

Restrictive assumptions: The only other task-specific measure, based on maximal "mixing" [14], requires twice-differentiable tasks with continuous features independent of graph topology, assumptions that do not apply to our tasks. Moreover, it also provides only binary possible/impossible statements.

To our knowledge, MPC is the first framework with continuous difficulty quantification for arbitrary graph tasks, thereby flexibly capturing practical difficulties beyond theoretical feasibility.

We have made this important comparison clearer in our Related Work and Evaluation sections.

Q3: Real-world benchmarks

Yes, we have evaluated MPC on graphs from real-world benchmarks, specifically graphs from the ZINC dataset (>100,000 molecular graphs, Fig. 23) and the LRGB peptides benchmark (>15,000 larger peptide molecules, Fig. 17). MPC can be very efficiently simulated (< 1 minute on consumer CPUs, see App. D) on these datasets for the tasks we tested using the approach detailed in App. D.5.

Moreover, we have now added in Sec. 5 an additional experiment on the ogbn-products graph, one of the largest OGB datasets [55] (>2 million nodes, >60 million edges) where MPC can still be efficiently simulated (< 10 minutes for simulation compared to >15 hours of training GNNs). We evaluated the retaining information task using GCN with a varying number of layers, with MPC trends aligning with empirical performance:

| Num Layers | MPC | Accuracy |
|---|---|---|
| 2 | 3.6 | 0.44 |
| 4 | 4.7 | 0.26 |
| 6 | 5.2 | 0.10 |
| 8 | 5.3 | 0.10 |

Besides this, MPC also allows the derivation of complexity bounds that require no computation whatsoever (e.g., Lemma 5.1 - 5.3).

Q4: Different Aggregation/Update Functions

In its current formulation, MPC cannot differentiate between models with identical message passing graphs, as it abstracts from implementation details to isolate topological complexity (Sec. 6, App. B).

Rationale: While different update functions have distinct theoretical properties, the common choice of MLPs with sum aggregation can approximate any update function [56]. This suggests that for many tasks and models, message-passing topology may dominate difficulty over specific aggregation choices. Empirically, almost all of our experiments across diverse architectures (GCN, GIN, GraphSage, FragNet, CIN) show strong MPC-performance correlations despite different aggregation schemes, indicating our abstraction captures the primary complexity source for the tasks we tested.

Extension potential: MPC can incorporate aggregation-specific effects when needed. Currently, we assume uniform message loss probabilities $I_{uv}$ for messages to $v$, but these can be adapted; e.g., for GIN's emphasis on self-loops in the aggregation, we could set $I_{vv} > I_{wv}$ (for $w \neq v$).
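As a toy illustration of this extension (the helper and the retention values are hypothetical, not taken from the paper), non-uniform retention probabilities $I_{uv}$ can be encoded as a matrix whose diagonal entry exceeds the off-diagonal ones:

```python
import numpy as np

def gin_style_retention(A, self_keep=0.9, nbr_keep=0.5):
    """Hypothetical non-uniform message-retention probabilities I_{uv}:
    messages along graph edges are kept with probability nbr_keep,
    while the self-loop gets I_vv = self_keep > I_wv, sketching the
    GIN-style adaptation mentioned above (values are assumptions)."""
    A = np.asarray(A, dtype=float)
    I = np.where(A > 0, nbr_keep, 0.0)  # retention along existing edges
    np.fill_diagonal(I, self_keep)      # emphasized self-loop retention
    return I

# Two connected nodes: self-loops retain more than the cross edge.
I = gin_style_retention([[0, 1], [1, 0]])
```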

While MPC currently abstracts from the choice of update function, it successfully isolates topological constraints that appear to be the dominant complexity source in our experiments. Our framework's flexibility allows for incorporating aggregation-specific effects when they significantly impact task complexity. We have expanded our discussion of this in the Limitations section.

Summary

We thank the reviewer for their detailed and very valuable review that has helped us improve our paper greatly by more clearly presenting the connection to over-smoothing (Sec. 4 & 5), adding an additional real-world benchmark (Sec. 5) and improving the comparison to other expressivity measures (Sec. 5 & 7).

Additional References

[52] Xu et al. Representation Learning on Graphs with Jumping Knowledge Networks. ICML 2018.

[53] Zhao et al. PairNorm: Tackling Oversmoothing in GNNs. ICLR 2020.

[54] Giraldo et al. On the Trade-off between Over-smoothing and Over-squashing in Deep Graph Neural Networks. CIKM 2023.

[55] Hu et al. Open Graph Benchmark: Datasets for Machine Learning on Graphs. NeurIPS 2020.

[56] Zaheer et al. Deep Sets. NeurIPS 2017.

Comment

I thank the authors for their rebuttal. They have addressed most of my concerns -- I have therefore raised my score from 3 to 4 and now lean toward acceptance.

Comment

We thank the reviewer for their thoughtful engagement that has helped to improve our manuscript and are pleased that our revisions have addressed your main concerns.

Review
Rating: 5

The authors highlight limitations of iso expressivity: higher expressivity is often not required for a task, the measure is binary, and it makes idealized assumptions. To overcome this, the authors present Message Passing Complexity (MPC). Effectively, this measures how hard it is to learn a function on a node under lossy message passing. The authors show how this relates to some current expressivity measures and how it aligns with practically relevant tasks.

Strengths and Weaknesses

Strengths

  • Well-described problem definition. The limitations of currently used iso expressivity are very clearly described and your approach is thus extremely well motivated. In general, this problem is a huge concern for the community and addressing this problem in a positive way is very impactful.

  • Excellent theoretical comparison to iso expressivity. I really appreciate Section 4.2. It clearly outlines the benefits of the continuous measure and how it relates to and encompasses Iso expressivity.

  • Toy examples are well motivated and clearly identify benefits of the approach. The experiments are thoughtfully designed and clearly show the benefits of the approach in terms of being more fine-grained and aligning with over-smoothing/over-squashing in comparison to iso expressivity.

Weaknesses

  • The main weakness is the reliance on knowing the function f_v. This restricts the analysis to toy examples and we have no way of knowing how to compare models and approaches on datasets where we can't easily obtain this node-level function. Given that one of the highlighted limitations of iso expressivity is not aligning with real-world tasks, I don't think MPC solves this issue. Additionally, if the task is graph-level and depends on a complex aggregation of node-functions at different layers L (as expected in practical scenarios), then it is not clear how useful MPC can be in such a case.

  • The paper focuses on iso expressivity and only briefly or doesn't mention expressivity related to learning polynomial filters [1], counting substructures [2] and over-squashing [3]. The differences between these expressivity measures and iso expressivity are sometimes not clarified in the paper. For example, in line [123], the authors state that "adding a virtual node does not increase expressivity". However, in [4], although it doesn't improve iso expressivity, they do show that it improves in terms of the expressivity described in both [1] and [3]. In general, I believe a stronger discussion of [3] is warranted given the strong connection to your approach (both indicate that a task is hard if it requires combining information from many nodes through messages of low probability such as messages through bottlenecks.)

  • I think the paper would be improved if some suggestions could be proposed for how MPC would be used beyond previous measures in improving model design. The success of iso expressivity has arguably been how it's been quite easily used to suggest more powerful and expressive architectures. For MPC, it is not clear to me how it can be used in the same way beyond current measures [3]. What would be the difference between reducing MPC and reducing the commute time between nodes [3]? How could we use this to design better models? I think understanding this would go a long way to assessing the impact this paper would have on the field.

[1] Wang et al. How powerful are spectral graph neural networks. In International Conference on Machine Learning. PMLR 2022.

[2] Bouritsas et al. Improving graph neural network expressivity via subgraph isomorphism counting. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023.

[3] Di Giovanni et al. How does over-squashing affect the power of gnns? TMLR 2023.

[4] Southern et al. Understanding Virtual Nodes: Oversquashing and Node Heterogeneity. ICLR 2025.

Questions

My score will be increased with a better understanding of the impact MPC can have on the field and addressing my weaknesses. Concretely:

  • How can MPC be used to align expressivity to accuracy on real-world tasks?
  • How can MPC be used to improve models and how would this differ from reducing the commute time?

Limitations

yes

Justification for Final Rating

The authors have addressed the few concerns I originally had.

Formatting Concerns

none

Author Response

We thank the reviewer for their thoughtful and detailed review, which has helped us improve our paper. We are very pleased that you appreciated our discussion of the limitations of iso expressivity and the theoretical as well as the empirical comparison to MPC.

1. Reliance on knowing task $f_v$

You raise an important point about the dependency on knowing $f_v$. While we acknowledge this limitation (Sec. 6), MPC can provide valuable insights, even if we do not know the true task, through well-chosen proxy tasks:

  1. Fundamental capabilities: Test universal MPNN requirements (information retention, propagation) that capture empirically relevant failure modes (over-smoothing, over-squashing) affecting real-world performance regardless of specific tasks.
  2. Domain-informed proxy tasks: Use domain knowledge to identify key capabilities—e.g., ring identification for small molecule predictions, long-range propagation for proteins.

Empirical validation: We demonstrate on the ZINC benchmark that proxy task insights can translate to unknown target tasks. While the exact function for predicting penalized logP is unknown, domain knowledge indicates that identifying rings is crucial [42]. This suggests our ring-transfer task as a suitable proxy for the complicated unknown graph-level task. Indeed, our ring-transfer task yields MPC complexities on ZINC graphs (Fig. 22) that accurately match performance trends on the actual benchmark task:

| Model | MPC (Ring Task) | MAE (ZINC) |
|---|---|---|
| GraphSage | 7.3 | 0.40 |
| GCN | 7.3 | 0.37 |
| GSN | 3.6 | 0.12 |
| FragNet | 3.2 | 0.078 |
| CIN | 2.9 | 0.077 |

Of course, no measure can capture the full complexity of unknown tasks. But unlike most existing measures limited to one specific capability, MPC enables analysis of arbitrary domain-relevant proxy tasks. In the experiment on the ZINC benchmark (that we have now included in Sec. 5), we have shown that insights from these proxy tasks can translate to the unknown task. MPC therefore narrows the theory-practice gap by flexibly focusing on capabilities that matter for specific applications.

2. Connection to other expressivity measures

Thank you for the pointer to line 123, we have now clarified in our revision that our statement about virtual nodes (line 123) refers specifically to iso expressivity. As you correctly note, virtual nodes can improve other expressivity measures while maintaining identical WL expressivity. We now highlight this explicitly in the theoretical discussion of the information propagation task in Sec. 5.

Additionally, we have expanded our discussion of non-iso expressivity measures in Sec. 7, particularly the connection to Di Giovanni et al. [14]:

Expanded related work on expressivity measures in Sec. 7:

Beyond iso expressivity, several other frameworks characterize MPNN expressivity through specific capabilities. Wang et al. [52] analyze spectral MPNNs through their ability to learn polynomial filters. Zhang et al. [49] rank architectures by the set of substructures they can recognize, while Zhang et al. [50] compare models through their ability to solve the graph biconnectivity problem. Another line of research characterizes learnable functions through fragments of first-order logic [3, 22], offering a more nuanced understanding by considering the effects of different activation functions [28]. However, these expressivity characterizations share key limitations with iso expressivity: they remain binary (can/cannot solve) and assume lossless information propagation, limiting their insights for real-world MPNNs. For a more in-depth comparison, we refer the reader to App. A.

Most relevant to our approach is the work by Di Giovanni et al. [14], which derives expressivity limitations from over-squashing theory. They show tasks become impossible when the required "mixing" between nodes (measured via maximal Hessian entries) exceeds what MPNNs with smooth activations can generate. Like MPC, this can capture practical impossibilities arising from over-squashing. However, their framework considers only pairwise interactions with restrictive assumptions on the task (twice differentiable tasks, not dependent on graph topology). In contrast, MPC applies to arbitrary graph tasks, captures difficulties beyond pairwise interactions (e.g., substructure identification, over-smoothing), subsumes impossibility results from iso expressivity, and is empirically validated across diverse tasks.

3. Impact of MPC on the future of graph learning research

Lastly, we address your two key questions about MPC's practical impact:

Q1: Better alignment between theory and practice

How can MPC be used to align expressivity to accuracy on real-world tasks?

MPC is specifically designed to better align with empirical performance (Sec. 3 & 4) by:

  • Capturing practical limitations arising from the lossy information propagation of real-world MPNNs (e.g., over-smoothing, over-squashing)
  • Providing a continuous difficulty measure rather than binary possible/impossible statements
  • Flexibly considering arbitrary tasks relevant to the domain instead of focusing on one fixed task.

Our empirical evaluation in Section 5 validates this: across 8 architectures, we demonstrate strong correlation between MPC complexity and performance of trained MPNNs on fundamental tasks covering common failure modes.

Moreover, our ZINC benchmark analysis (see Point 1) shows that even when the true target functions are unknown or too complex to evaluate, MPC insights from domain-informed proxies can successfully predict real-world performance trends.

While we, of course, cannot completely close the gap between theory and practice, we can at least narrow it.

Q2: Role of MPC for developing new architectures

How can MPC be used to improve models and how would this differ from reducing the commute time?

First, we show how MPC can help the theoretical analysis/design of future MPNNs, and then we give concrete practical examples of potential architectural modification guided by MPC.

Theoretical contributions on future architectural design

A first and important contribution of our work is clearly identifying the limitations of the predominant iso expressivity framework. We hope that this will already shift attention in future model design toward problems that more meaningfully constrain practical performance.

Second, MPC enables a more fine-grained theoretical analysis. For example, MPC can reveal when architectures theoretically capable of solving a task will struggle in practice due to high complexity, making previously hidden problems visible to researchers. For instance, we showed poor complexity scaling for recent higher-order MPNNs on the propagating information task.

Third, rather than seeking universally "best" models, MPC recognizes that graph tasks have diverse requirements. By analyzing complexity for specific capabilities (e.g., substructure identification), researchers can precisely optimize architectures for domain-specific needs rather than generic expressivity maximization.

Concrete examples of how MPC can guide model improvement for specific capabilities needed in certain domains:

  • Multi-node aggregation: Many tasks require aggregated information from groups of related nodes (e.g., identifying ring types in molecules, transaction analysis in relational learning [54]). MPC analysis suggests introducing dedicated hub nodes that consolidate information within each group (e.g., all nodes in a transaction or ring). This reduces MPC complexity by replacing many individual messages with a single message from hub to target node. This principle (partly) explains the empirical success of molecular architectures like HIMP [53] which introduce higher-order nodes for rings and junctions. Future research could apply this principle to other domains (e.g., relational deep learning) to reduce MPC complexity.
  • Substructure identification tasks: MPC complexity bounds (Lemma 5.3) reveal that even simple ring detection is very complex and scales at least linearly with ring size, indicating that architectural modifications alone may be insufficient for learning complex substructures, even when theoretically possible. This MPC insight suggests that expressive positional encodings or preprocessing steps (as used by FragNet [42] or CIN [6]) that directly reduce task complexity may be more effective than purely architectural approaches, highlighting a promising research direction.

Comparison to commute time: Commute time approaches [14] are limited to pairwise interactions, while MPC can quantify the difficulty of arbitrary tasks that often require coordinated information from multiple nodes (as expected in many real-world tasks and demonstrated by our multi-node aggregation and substructure identification examples). Moreover, MPC provides impossibility statements by subsuming iso expressivity results (Lemma C.3), offering a more unified theoretical framework.

We have now included a discussion of the concrete ways in which MPC can benefit future MPNN design in Sec. 6.

Summary

We thank the reviewer for the very helpful review that helped us improve our manuscript greatly by 1) comparing MPC more clearly to existing measures in Sec. 7 and 2) more concretely demonstrating how the insights of MPC can capture real-world tasks and guide future model design in Sec. 5.

Additional References

[52] Wang et al. How powerful are spectral graph neural networks. ICML 2022.

[53] Fey et al. Hierarchical Inter-Message Passing for Learning on Molecular Graphs. arXiv:2006.12179

[54] Fey et al. Relational Deep Learning: Graph Representation Learning on Relational Databases. arXiv:2312.04615.

评论

We hope our response has addressed your concerns. Please let us know if all your questions have been resolved or if there are any remaining points we can clarify. Thank you again for the thoughtful review that has already helped us improve our paper.

评论

Thank you for addressing my concerns. I have raised my score from borderline accept to accept in light of this.

Review
Rating: 4

This paper proposes a novel framework, Message Passing Complexity (MPC), to evaluate the capabilities of GNNs beyond traditional isomorphism-based expressivity theory. Different from prior work that primarily focuses on whether a GNN can distinguish graph structures in theory, MPC provides a continuous, task-specific measure that quantifies the difficulty of solving a task via message passing under realistic constraints such as over-squashing and under-reaching. Built on a probabilistic extension of the WL test, MPC better explains empirical performance across diverse tasks and architectures. Extensive theoretical analysis and empirical validation demonstrate that MPC correlates well with real-world performance.

Strengths and Weaknesses

Strengths:

  • The paper introduces the MPC framework as a novel extension to traditional expressivity theory. It provides a continuous, task-specific, and new measure for evaluating GNNs.
  • MPC explicitly models real-world issues such as over-squashing and lossy information propagation, bridging the gap between idealized expressivity theory and the practical performance of GNNs.
  • The paper conducts extensive experiments across different tasks and widely used general GNN architectures to support the practical relevance of MPC.

Weaknesses:

  • It does not evaluate MPC on 3D GNNs like DimeNet++ [1], SphereNet [2], or PaiNN [3], which are widely used in molecular tasks. Whether MPC extends to architectures with geometric or directional message passing remains unclear.

  • The MPC is not evaluated on large-scale benchmarks such as ogbl-ppa, ogbn-products, and ogbg-molhiv [4]. Thus, MPC's scalability and practical applicability in real-world settings remain unverified.

[1] Gasteiger J, Giri S, Margraf J T, et al. Fast and uncertainty-aware directional message passing for non-equilibrium molecules[J]. arXiv preprint arXiv:2011.14115, 2020.

[2] Liu Y, Wang L, Liu M, et al. Spherical message passing for 3d graph networks[J]. arXiv preprint arXiv:2102.05013, 2021.

[3] Schütt K, Unke O, Gastegger M. Equivariant message passing for the prediction of tensorial properties and molecular spectra[C]//International Conference on Machine Learning. PMLR, 2021: 9377-9388.

[4] https://ogb.stanford.edu/docs/dataset_overview/

Questions

  • The framework assumes that node features are reliable and informative. In practice, GNNs often deal with noisy or missing features and labels, and it is unclear how MPC accounts for this.
  • Can the proposed MPC framework extend to dynamic or temporal graphs where structure and features evolve?

Limitations

Please see weaknesses and questions.

Justification for Final Rating

The authors have addressed all the issues I proposed during the review process, so I raised my score from 3 to 4.

Formatting Concerns

None.

Author Response

We thank the reviewer for their thoughtful review and valuable suggestions for extending MPC. We are particularly pleased that you found our work successfully bridges the gap between idealized expressivity theory and the practical performance of GNNs.

While our work focuses on establishing MPC in static graphs with general-purpose MPNNs, we want to address your specific concerns about broader applicability. Below, we describe how MPC's framework can extend to the settings you mentioned (demonstrating its flexibility), along with some first empirical evidence.

Weakness 1: 3D GNNs

While our work focuses on general purpose MPNNs (covering 8 different standard, higher-order, and topological architectures across multiple tasks), MPC can also capture 3D GNNs through appropriate message passing graph transformations, defined in our general framework (App. B).

To illustrate this, consider DimeNet [1]: This architecture maintains representations for all directed edges and incorporates angular information. In our framework, this can be captured by using the directed line graph $G' = (V', E')$ of the original graph $G$ as the message passing graph, where $V' = E$ (the original edges become nodes) and, for any $e_1 = (v_j, v_k), e_2 = (v_k, v_i) \in E$, we have $((v_j, v_k), (v_k, v_i)) \in E'$. The angular and distance features can be incorporated as edge/node features in this transformed message passing graph.
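A minimal sketch of this transformation (our own plain-Python helper, not the paper's code; backtracking edges are kept, following the definition above, whereas DimeNet itself may exclude them):

```python
def directed_line_graph(edges):
    """Build the edge-based message passing graph described above:
    every directed edge (j, k) of G becomes a node, and there is an
    edge (j, k) -> (k, i) whenever both directed edges exist in G."""
    # Both orientations of each undirected edge of G.
    directed = set()
    for u, v in edges:
        directed.add((u, v))
        directed.add((v, u))
    nodes = sorted(directed)
    # Connect (j, k) to every directed edge leaving k.
    lg_edges = [((j, k), (k2, i))
                for (j, k) in nodes
                for (k2, i) in nodes
                if k2 == k]
    return nodes, lg_edges

# A triangle has 6 directed edges; each can be followed by either of
# the 2 directed edges leaving its head, giving 12 line-graph edges.
nodes, lg_edges = directed_line_graph([(0, 1), (1, 2), (2, 0)])
```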

This highlights the flexibility of MPC to capture new architectures. We have added the complete MP graph definition for DimeNet to our general framework in App. B and added a paragraph on the potential of studying the complexities of 3D GNNs to our future work section.

Weakness 2: Large-scale Benchmarks

First, we note that we already evaluated MPC on graphs from large real-world datasets: the ZINC dataset (>100,000 molecular graphs, Fig. 23) and the LRGB peptides benchmark (>15,000 larger peptide molecules, Fig. 17). As described in App. D.1, MPC can be simulated in <10 seconds for our considered tasks on these graphs and has linear complexity in the number of nodes and edges reachable from $v$. Additionally, MPC enables the derivation of theoretical bounds (e.g., Lemma 5.1-5.3) requiring no computation, providing architectural insights without any expensive training.

To directly address scalability concerns, we conducted additional experiments on ogbn-products [3], one of the largest OGB datasets (>2 million nodes, >60 million edges). We evaluated the retaining information task using GCN with a varying number of layers:

| Num Layers | MPC | Accuracy |
|---|---|---|
| 2 | 3.6 | 0.44 |
| 4 | 4.7 | 0.26 |
| 6 | 5.2 | 0.10 |
| 8 | 5.3 | 0.10 |

Key observations:

  • MPC trends match trends in empirical performance.
  • GCN training required >15 hours on A100 GPUs while MPC simulation required <10 minutes on consumer CPU (using approach in App. D.5).

This demonstrates that MPC maintains both its predictive power and computational advantages for this large-scale dataset. We have included this additional experiment in Sec. 5 and thank the reviewer for this suggestion, which has strengthened our manuscript.

Question 1: Missing features

While our current work specifically focuses on isolating message-passing complexity from other sources, the probabilistic foundation of MPC offers potential for extension to additional complexity sources like missing features.

We can extend MPC by modifying the initial step in Def. 4.3 to model feature availability: $\text{lossyWL}_v^0 = L_v \cdot X_v$, where $L_v \sim \text{Bernoulli}(p_v)$ models feature availability with probability $p_v$.
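A minimal sketch of this modified initialization (our own hypothetical helper; a zero vector stands in for an uninformative missing feature):

```python
import numpy as np

def lossy_wl_init(X, p_available, seed=None):
    """Extended lossyWL initialization: lossyWL_v^0 = L_v * X_v with
    L_v ~ Bernoulli(p_available), i.e. each node's initial feature is
    kept with probability p_available and dropped (zeroed) otherwise."""
    rng = np.random.default_rng(seed)
    keep = rng.random(X.shape[0]) < p_available  # one draw of L_v per node
    return X * keep[:, None]

X = np.ones((5, 3))
all_dropped = lossy_wl_init(X, p_available=0.0, seed=0)  # every L_v = 0
all_kept = lossy_wl_init(X, p_available=1.0, seed=0)     # every L_v = 1
```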

For a first empirical demonstration, we conduct our information propagation experiment (D=5, GCN) with varying levels of missing node feature probabilities:

| Missing Feature Probability | MPC | Accuracy |
|---|---|---|
| 0.0 | 4.9 | 0.69 |
| 0.2 | 5.1 | 0.47 |
| 0.4 | 5.4 | 0.14 |
| 0.6 | 5.8 | 0.11 |

The results show that the adapted MPC accurately captures the increasing difficulty with feature loss in this initial experiment.

More broadly, this demonstrates the potential of our probabilistic formulation to extend to other complexity sources beyond message passing — something existing expressivity measures based on logic or WL tests cannot seamlessly accommodate. We have added a discussion of this extension to the future work section of our manuscript.

Question 2: Temporal Graphs

While our current work focuses on static graphs, MPC can be naturally extended to temporal graphs: MPC becomes time-dependent, i.e., $\text{MPC}_t(f_v, G)$, by performing our probabilistic WL test (lossyWL) on snapshot graphs $G^t$ as defined in [2].
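To make the snapshot setting concrete, the following sketch generates a densifying random dynamic graph of the kind used in the experiment below; `densifying_snapshots` is an illustrative helper, and the actual experimental code may differ:

```python
import numpy as np

def densifying_snapshots(n, p_step, T, rng=None):
    """Generate snapshot graphs G^1, ..., G^T of a densifying dynamic graph.

    Starting from the empty graph on n nodes, every absent undirected edge
    is added independently with probability p_step at each timestep.
    Returns a list of (n, n) boolean adjacency matrices; MPC_t can then be
    evaluated on each snapshot separately.
    """
    rng = rng or np.random.default_rng()
    A = np.zeros((n, n), dtype=bool)
    snapshots = []
    for _ in range(T):
        # sample candidate edges in the upper triangle (no self-loops)
        new = np.triu(rng.random((n, n)) < p_step, k=1)
        A = A | new | new.T  # add them symmetrically
        snapshots.append(A.copy())
    return snapshots
```

Because edges are only ever added, the edge set grows monotonically over time, mirroring the densification described above.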

For an empirical demonstration of this capability, we conducted an experiment on random dynamic graphs where edges are added progressively (p=0.06 per timestep), modeling the densification common in real-world networks (e.g., social interactions, biological systems). We evaluated MPC for a 4-layer GCN on the retaining information task:

| Time Step | MPC | Accuracy |
|---|---|---|
| 1 | 1.9 | 0.52 |
| 2 | 2.9 | 0.35 |
| 3 | 3.4 | 0.22 |
| 4 | 3.6 | 0.17 |
| 5 | 3.7 | 0.16 |
| 6 | 3.8 | 0.15 |

Key observations:

  • MPC correctly captures the decreasing empirical accuracy as graphs densify over time.
  • MPC thereby quantifies how structural evolution over time affects task difficulty.

This demonstrates MPC's adaptability to temporal settings and opens interesting research directions for understanding GNN behavior in dynamic environments. We have added this extension of MPC to the future work section of our manuscript.

Summary

We thank the reviewer again for their valuable suggestions. They demonstrate the flexibility of our probabilistic MPC formulation and its potential for extensions beyond standard graph learning settings, and have greatly improved our manuscript.

[1] Gasteiger et al. Directional Message Passing for Molecular Graphs. ICLR 2020.

[2] Longa et al. Graph Neural Networks for temporal graphs: State of the art, open challenges, and opportunities. arXiv:2302.01018

[3] Hu et al. Open Graph Benchmark: Datasets for Machine Learning on Graphs. NeurIPS 2020.

Comment

Thanks for your response. I still have one concern. In Q1, my question concerned noisy or missing features, but the authors did not address the impact of noisy data. Can you provide more discussion of noisy data on graphs with your proposed method in this manuscript, e.g., for graph classification?

Comment

We thank the reviewer for their thoughtful engagement and valuable suggestions. We are pleased that our revisions have addressed your concerns and appreciate your constructive feedback, which has helped improve the paper throughout this process.

Comment

We thank the reviewer for the question and clarification. You are right: we only showed the extension to missing features. While our focus is to isolate the difficulty arising from the message passing process (Sec. 4), we can also extend our framework to include complexity arising from noise.

Extension to noisy features: Very similar to how feature loss can be incorporated into our framework, we can also incorporate noisy features. In general, let $X_v$ be the true feature of node $v$, and let $\tilde{X}_v$ be the observed noisy feature according to some noise distribution $\tilde{X}_v \sim N_v(X_v)$.

We modify the initial step in Definition 4.3 to incorporate noise: $\text{lossyWL}_v^0 = \tilde{X}_v$ where $\tilde{X}_v \sim N_v(X_v)$. Notice that this approach subsumes feature loss as a special case.

Empirical demonstration: Similar to our feature loss experiment, we conducted an initial experiment on the information propagation task ($D=4$, GCN) with noisy features. For our categorical features (with $k$ classes), we consider a noise distribution where $\mathbb{P}[\tilde{X}_v = c] = 1 - p_{\text{noise}}$ for $c = X_v$ and $\mathbb{P}[\tilde{X}_v = c] = p_{\text{noise}}/(k-1)$ for $c \neq X_v$. This models uniform noise on discrete node features.
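A minimal sketch of this uniform categorical noise model; the function name `perturb_labels` and the integer-label representation are our own assumptions for illustration:

```python
import numpy as np

def perturb_labels(X, k, p_noise, rng=None):
    """Apply uniform noise to categorical node features.

    X: (n,) integer array of class labels in {0, ..., k-1}. Each label is
    kept with probability 1 - p_noise; otherwise it is replaced by one of
    the k - 1 other classes uniformly at random, matching
    P[X~_v = c] = p_noise / (k - 1) for c != X_v.
    """
    rng = rng or np.random.default_rng()
    flip = rng.random(len(X)) < p_noise
    # shifting by 1..k-1 modulo k always yields a *different* class
    offset = rng.integers(1, k, size=len(X))
    return np.where(flip, (X + offset) % k, X)
```

Setting `p_noise = 0` recovers the clean features, so the noiseless experiment is the special case of this model.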

| $p_{\text{noise}}$ | MPC | Accuracy |
|---|---|---|
| 0.0 | 4.9 | 0.69 |
| 0.2 | 5.1 | 0.29 |
| 0.4 | 5.4 | 0.13 |
| 0.6 | 5.8 | 0.10 |

The results show that MPC successfully captures the increasing task difficulty as feature noise increases, with performance degrading correspondingly.

Broader significance: This demonstrates an advantage of our probabilistic framework—it can flexibly incorporate various sources of complexity beyond the difficulty arising from message passing, including both missing and noisy features. To our knowledge, no other graph expressivity framework studies these sources of difficulty or can seamlessly accommodate them.

We have modified our future work discussion to cover this general approach to feature noise, and we thank the reviewer for this valuable suggestion, which highlights MPC's broader applicability to real-world scenarios.

Comment

Thanks for your response. My concern has been solved. I will raise my score. Good luck!

Review
5

This paper studies the problem of understanding the (expressive or complexity) power of a graph neural network model, which has traditionally been done using Weisfeiler-Leman tests. Such existing theory for GNNs is limited, which is well known. This paper provides an excellent discussion of this limitation and proposes Message Passing Complexity (MPC) as an alternative framework. It argues that traditional WL expressivity measurement has two limitations: (i) it assumes idealized conditions that often may not reflect practical GNN behavior, and (ii) it provides only binary characterizations rather than quantifying task difficulty. As a result, this paper introduces MPC, a continuous complexity measure based on a probabilistic variant of the Weisfeiler-Leman test called lossyWL. Through lossyWL, message loss during propagation can be measured with probabilities based on the influence matrix, which is a function of the graph structure. Controlled experiments on information retention, propagation, and topological characteristics are conducted on both synthetic and real graphs, where the MPC measure tracks task complexity and model performance, which traditional WL-like tests fail to capture. Overall, the paper is clearly written, and the proposed method helps explain phenomena such as virtual nodes and long-distance complexity, which are of much interest in the GNN literature.

Strengths and Weaknesses

Strengths:

  • The paper includes a convincing discussion of the limitations of the existing WL expressivity framework and shows how the normalized adjacency matrix can be used to improve the WL-style coloring.
  • The proposed MPC framework provides a new continuous measure for understanding how difficult a task is and what its effect would be on message-passing-based GNNs.
  • MPC helps explain the effectiveness of components such as virtual nodes and the challenges of distant information propagation, which are not traditionally explained by the WL test.
  • The proxy task design on both synthetic and real-world graphs, together with the evaluation across multiple architectures, demonstrates the correlation between MPC and performance and is well executed.
  • Overall, the insights provide a better understanding of message passing as a mechanism and of how certain shortcomings can be addressed, for instance why a virtual node improves reachability.

Weaknesses:

  • One analysis that seems missing (and this perhaps may not be straightforward) is how the proposed framework quantifies/ranks current representative GNN architectures. This has often been done in prior studies with WL tests.
  • Computing MPC for real tasks can be costly.
  • The implications of MPC for future architecture design are not made explicit in the paper, whether through a resulting model or through guidance on how the presented insights can help develop better architectures.

Questions

included in weaknesses

Limitations

yes

Formatting Issues

NA

Author Response

We thank the reviewer for their thorough review of our work. We are particularly happy that you found that our discussion of the limitations of WL-based expressivity is convincing and that the insights of MPC provide a better understanding of message passing.

1. Complexity in computing MPC

First, note that for our evaluated tasks, MPC can be efficiently simulated using the approach detailed in App. D. However, we acknowledge that computing exact MPC values can become challenging for sophisticated tasks (see our discussion in Sec. 6).

Practical solutions: We address this limitation through two complementary approaches:

  1. Proxy task analysis: Focus on fundamental capabilities required across domains (e.g., information retention, propagation) or domain-specific operations informed by the true task or prior knowledge (e.g., ring identification for molecular tasks). Our results (see Answer to Reviewer 46h7) show that these proxy insights can translate to real-world performance trends even on sophisticated tasks.
  2. Theoretical bounds: Even when exact computation is infeasible, complexity bounds (Lemmas 5.1-5.3) can provide sufficient insight to capture empirical performance trends (Sec. 5) requiring no computation at all.

In summary, exact MPC computation faces limitations for sophisticated tasks—as one would expect for any practical complexity measure. However, our framework still provides valuable task-specific architectural insights through proxy analysis and theoretical bounds that do capture practical performance trends—insights that binary expressivity measures cannot offer.

2. MPC for future architecture design

First, we show how MPC can help the theoretical analysis/design of future MPNNs, and then we give concrete practical examples of potential architectural modification guided by MPC.

Theoretical Contributions on future architectural design

  • By clearly identifying the limitations of iso expressivity, we shift attention in future model design from iso expressivity maximization towards problems that more meaningfully constrain practical performance (e.g. lossiness of message passing).
  • MPC enables a more fine-grained theoretical analysis that enables uncovering previously hidden problems (e.g., high complexity scaling for substructure identification, bottlenecks in long-range propagation) and allows a more nuanced comparison of models.
  • By analyzing complexity for specific tasks (e.g., substructure identification), researchers can precisely optimize architectures for domain-specific needs rather than generic expressivity maximization (see also Point 3).

Concrete examples of how MPC can guide model improvement for specific capabilities needed in certain domains:

  • Multi-node aggregation: Many tasks require aggregated information from groups of related nodes (e.g., identifying ring types in molecules, transaction analysis in relational learning [4]). MPC analysis suggests introducing dedicated hub nodes that consolidate information within each group (e.g., all nodes in a transaction or ring). This reduces MPC complexity by replacing many individual messages with a single message from hub to target node. This principle (partly) explains the empirical success of molecular architectures like HIMP [1] which introduce higher-order nodes for rings and junctions. Future research could apply this principle to other domains (e.g., relational deep learning) to reduce MPC complexity.
  • Substructure identification tasks: MPC complexity bounds (Lemma 5.3) reveal that even simple ring detection is very complex and scales at least linearly with ring size, indicating that architectural modifications alone may be insufficient for learning complex substructures, even when theoretically possible. This MPC insight suggests that expressive positional encodings or preprocessing steps (as used by FragNet [2] or CIN [3]) that directly reduce task complexity may be more effective than purely architectural approaches, highlighting a promising research direction.
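To make the hub-node principle concrete, here is a minimal sketch of the graph modification it suggests; `add_hub_node` is an illustrative helper, not code from HIMP or from the paper:

```python
import numpy as np

def add_hub_node(A, group):
    """Append a hub node connected to every node in `group`.

    A: (n, n) adjacency matrix; group: indices of related nodes (e.g., the
    atoms of a ring in a molecule). The hub becomes node n, so a target
    node can receive the aggregated group information through a single
    hub -> target message instead of many multi-hop messages, which is
    exactly the MPC reduction described above.
    """
    n = A.shape[0]
    A_new = np.zeros((n + 1, n + 1), dtype=A.dtype)
    A_new[:n, :n] = A           # keep all original edges
    for v in group:
        A_new[v, n] = A_new[n, v] = 1  # connect hub to each group member
    return A_new
```

Applied per ring or per transaction group, this preprocessing leaves the original topology intact while adding short aggregation paths.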

We have included a discussion of the concrete ways in which MPC can benefit future MPNN design in Sec. 6.

3. Ranking of MPNNs

While many existing expressivity works rank architectures globally based on one specific binary measure (e.g., distinguishing non-isomorphic graphs, identifying substructures), a key result of our work is that architecture performance varies significantly across tasks: Our evaluation shows no single architecture dominates across all tasks. For example, GCN-VN performs best for information propagation while FragNet/CIN excel at ring transfer.

Therefore, we provide task-specific complexity comparisons (e.g., Lemmas 5.1 - 5.3) rather than universal rankings:

  • Ring Transfer task: FragNet/CIN < GSN < standard MPNNs (with/without virtual node)
  • Information propagation task: GCN-VN < all others

These complexity rankings align very well with trends in empirical performance for these individual tasks: Lower MPC complexity correlates with better empirical performance, explaining the superior performance of the respective models in the respective tasks (Sec. 5).

In summary, the task-specific formulation of MPC enables a more fine-grained and application-specific architecture comparison than universal rankings.

Summary

We thank the reviewer for the thoughtful review. It helped improve our manuscript greatly by clarifying how MPC can guide future model design (in Sec. 5), how MPC can be used for complex tasks (in Sec. 5), and stressing the importance of considering individual tasks (in Sec. 3 & 4).

[1] Fey et al. Hierarchical Inter-Message Passing for Learning on Molecular Graphs. arXiv:2006.12179

[2] Wollschläger et al. Expressivity and Generalization: Fragment-Biases for Molecular GNNs. ICML 2024.

[3] Bodnar et al. Weisfeiler and Lehman Go Cellular: CW Networks. NeurIPS 2021.

[4] Fey et al. Relational Deep Learning: Graph Representation Learning on Relational Databases. arXiv:2312.04615.

Comment

We hope our response has addressed your concerns. Please let us know if all your questions have been resolved or if there are any remaining points we can clarify. Thank you again for the thoughtful review that has already helped us improve our paper.

Comment

Thank you for addressing my questions on the complexity and implications of the MPC framework.

Final Decision

The paper argues that the current dominance of WL-style iso-expressivity for analyzing GNNs is misaligned with practical performance. It proposes Message Passing Complexity (MPC)—a continuous, task-specific measure derived from a probabilistic WL (lossyWL)—to quantify how difficult it is for an MPNN to solve a task under lossy message passing. MPC preserves impossibility results from iso-expressivity, captures practical phenomena such as over-squashing and over-smoothing, and correlates with empirical performance across multiple architectures and tasks. The authors support these claims with theory (e.g., lemmas bounding complexity) and controlled studies on synthetic and real graphs, plus added analyses during rebuttal.

Strengths

  • Timely reframing: Moves beyond binary expressivity to a continuous measure aligned with practice; well-motivated and clearly articulated.

  • Bridges theory and practice: Explains phenomena (over-squashing/over-smoothing, long-range propagation) and why components like virtual nodes help.

  • General framework: Clean abstraction via message-passing graphs; preserves impossibility results while enabling task-specific analysis.

  • Empirical validation: Consistent correlation between MPC and accuracy across 8+ architectures and tasks, with real-graph evidence (ZINC, LRGB) and a large-scale ogbn-products demonstration added in rebuttal.

  • Actionable insights: Task-specific complexity suggests design levers (e.g., hub nodes / higher-order nodes for multi-node aggregation; substructure-aware encodings such as CIN/FragNet) rather than chasing iso-expressivity.

Weakness

  • Computation at scale / complex tasks: Exact MPC can be costly or intractable; reliance on proxy tasks and bounds is sensible but not a full substitute.

  • Presentation: Some figures dense; authors commit to improved captions/visual clarity.

Reasons for decision

During rebuttal, the authors focused on making MPC practical and broadly applicable. They outlined an efficient estimation path using proxy tasks and theoretical bounds (App. D), and showed that MPC tracks accuracy on large graphs. They mapped the framework to 3D/directional GNNs via directed line graphs, and discussed extensions to settings with missing/noisy features and to temporal graphs. They also clarified the link to over-smoothing through random-walk return probabilities, and strengthened baselines, figures, and exposition. Collectively, these updates addressed the central concerns, prompting score increases and a confident Accept (Spotlight) consensus.