Towards characterizing the value of edge embeddings in graph neural networks
We characterize the added representational power conferred by edge embeddings, considering additional constraints like memory and symmetry.
Abstract
Reviews and Discussion
In this paper, the authors investigate the influence of embedding edges instead of embedding nodes in GNN models. By proving a series of theorems, the authors claim that the representational power of edge-based GNNs is inherently superior to that of node-based GNNs. The authors further validate their theorems through multiple experiments.
Strengths
- The authors provide a systematic analysis of edge-based GNN models, which have been under-investigated in previous literature.
- The writing is well-structured and helps readers quickly grasp the main ideas, despite the theory-heavy style of the paper.
Weaknesses
- I am not sure if I missed anything, but I feel the proof of the main theorem is somewhat inconsistent. First, the authors assume the graph has O(N) edges, implying a constant degree for the graph. However, in the proof, the authors leverage a path-graph construction where a hub node connects to every node in the graph. In this situation, the assumption no longer holds, as the hub node has a degree of O(N).
- Following the above point, in Theorem 1, it seems that we assume constant memory for each embedding. However, if there is a hub node with a degree of O(N), the total memory leveraged to absorb the hub node's information will be O(N) for edge-based GNNs, whereas for node-based GNNs, we only have constant memory. This difference could explain why the authors achieve their final proof result. However, this comparison is unfair and not realistic for real-world applications. If a real-world graph has an approximately constant average degree, I believe that by increasing the memory by a constant factor for node-based GNN models, they could achieve the same theoretical power as edge-based models. On the other hand, if the graph has a degree around O(N), the edge-based GNN model would use significantly more memory, which becomes impractical if N is extremely large.
- In the real-world experiments, although the absolute metrics of edge-based GNN models are better, the improvement is marginal. Considering the extra memory used by the edge-based GNN model to store embeddings, I question the practical value of this approach.
- Regarding the synthetic task, I wonder if the authors constrained the total memory available to node-based and edge-based models. For example, suppose a graph with N nodes and E edges, where the embedding sizes for node-based and edge-based models are d_v and d_e, respectively. We should have N · d_v = E · d_e for each GNN layer to ensure a fair comparison.
Questions
See above. Please correct me if I misunderstand it.
We thank the reviewer for their time and effort, and for appreciating our study of an “under-investigated” model and our “well-structured” writing. We address the reviewer’s comments below:
- “I am not sure if I missed anything, but I feel the proof of the main theorem is somewhat inconsistent. First, the authors assume the graph has O(N) edges, implying a constant degree for the graph.”
Indeed, the graph has O(1) average degree, and it has O(n) maximum degree. These facts are consistent with each other, and not even unnatural in real-world settings where graph degrees often roughly follow e.g. Zipfian distributions.
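To make the compatibility of these two facts concrete, here is a minimal sketch (our own illustration using a star graph, not the construction from the paper): the average degree stays constant while the maximum degree grows linearly with n.

```python
# Illustrative example (a star graph, not the paper's construction):
# a star on n nodes has n - 1 edges, so its average degree is ~2
# (constant), yet its hub node has degree n - 1 = O(n).

def star_degrees(n):
    """Degree sequence of a star: one hub joined to n - 1 leaves."""
    return [n - 1] + [1] * (n - 1)

n = 1000
degrees = star_degrees(n)
avg_degree = sum(degrees) / n   # 2 * (n - 1) / n, approaches 2
max_degree = max(degrees)       # n - 1, grows linearly with n
```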
- “Following the above point, in Theorem 1, it seems that we assume constant memory for each embedding…If a real-world graph has an approximately constant average degree, I believe that by increasing the memory by a constant factor for node-based GNN models, they could achieve the same theoretical power as edge-based models”
If the graph has constant maximum degree, then this statement is true (Proposition 3). If the graph has constant average degree, then this statement is false (Theorem 1) unless one allows a different amount of memory per node (i.e. scaling with the node’s degree), in which case one essentially recovers the edge model. However, non-uniformity in the dimensions of different node embeddings adds engineering challenges and hence computational overheads—motivating our work, which studies when and whether this overhead is worthwhile.
Finally, we remark that in Theorem 1, the constructed edge protocol does not use extra total memory compared to the node protocols, since the graph has constant average degree.
- “In the real-world experiments, although the absolute metrics of edge-based GNN models are better, the improvement is marginal. Considering the extra memory used by the edge-based GNN model to store embeddings, I question the practical value of this approach."
Indeed, the improvement is marginal. This means that the common benchmarks may be too easy to witness dramatic performance gains, but the theoretical results and synthetic tasks (e.g. Table 2) demonstrate that settings with dramatic gains do exist. As with any architectural design choice, there is always a trade-off: in this case, there is a trade-off between computational overhead and performance/representational advantage.
The practical value of edge GNNs has already been robustly demonstrated; see the references on lines 64–67. The purpose of this work was not to redo those experiments. As discussed in the paper (lines 67–70, lines 449–450), the purpose of this paper is to ablate the effect of maintaining edge embeddings via rigorous theory and controlled experiments. From this perspective, we would argue that our experiments are sufficient evidence for our main point: the edge-embedding architectural choice, when disentangled from the (many other) design choices made in prior works, does not hurt performance (across many common benchmarks) and can significantly help (in theoretically-grounded synthetic experiments).
- “Regarding the synthetic task, I wonder if the authors constrained the total memory available to node-based and edge-based models.”
Yes. In the synthetic tasks, the graph is a tree, so E = N-1. We gave both models the same embedding dimension (d = 10). Thus, the comparison is fair: the only architectural change we made was placing embeddings on edges vs nodes (and giving the node GNN up to 2x the depth, which makes its high error an even stronger result).
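The memory parity on a tree can be checked with one line of arithmetic. In this sketch, N is a hypothetical graph size of our choosing; only d = 10 comes from the setup described above.

```python
# Total-memory comparison on a tree (hypothetical N; d = 10 as above).
N = 1000
E = N - 1        # a tree has exactly N - 1 edges
d = 10           # embedding dimension shared by both models

node_model_memory = N * d    # 10,000 numbers
edge_model_memory = E * d    # 9,990 numbers: essentially identical
```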
We hope these clarifications are helpful. Please let us know if you have further questions!
I thank the authors for their detailed responses to my questions. Now I think I understand most parts of the story in the paper. However, I still have some follow-up questions.
maximum degree vs average degree
So basically, my understanding is that the edge-based model can theoretically have better representation power only when some hub node exists with a degree of O(N). However, in this case, the corresponding edge model will also need O(N) embeddings for encoding all edges linked to a single hub node. So, it is not about saving memory, but about how the memory is allocated making a difference (allocating memory to each of the O(N) edges, versus allocating O(N) memory to a single hub node, which is not practical given fixed-size embeddings).
In Theorem 1, you first assume O(N) edges without mentioning that the maximum degree needs to be O(N) for Theorem 1 to hold. This may secretly hide the trade-off mentioned above. I suggest the authors add a discussion in the main text about it, or explicitly mention it in the theorem, to avoid confusion.
The purpose of this paper is to analyze the effect of maintaining edge embeddings via rigorous theory and controlled experiments.
I understand that the paper is not to propose a new architecture but to analyze the effectiveness of an existing model. However, a complete analysis should shed light on the further use cases or practical value of the model. A good theoretical result should ultimately guide future model/architecture/data design and selection for practical challenges. For example: in what real-world tasks can the edge-based model be better than the node-based model? What type of graph data is more suitable for the edge-based model? How should one balance the embedding dimension between the node-based and edge-based models (as the edge-based model introduces at least a constant factor more embeddings, I would expect a smaller embedding size to make the edge-based model work, but is there a quantitative analysis of that)?
I have raised my score accordingly given the authors' clarification on my first and second concerns, but the remaining concern is that I feel the current theoretical results and experiments are too divorced from practical scenarios, and the practical value of these results is somewhat limited.
Dear Reviewer fq9o,
Thanks for your thoughtful follow-up!
Maximum degree vs avg degree:
Your understanding of the implications of Theorem 1 is basically correct, though we would add that it’s not necessary for the max degree to be O(n): for any bound d, a straightforward corollary of Theorem 1 is that edge embeddings yield better representational power (by a factor of sqrt{d}) even when the graph is restricted to have max degree d.
You are also correct that it’s not an issue of saving memory if we allow variable-dimension embeddings per node. But it’s much more practically reasonable to have fixed-dimension embeddings, and in this case the edge-based embeddings can (and do, in the construction from Theorem 1) save memory.
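As a back-of-envelope illustration of the fixed-dimension point (our own arithmetic, not the construction from Theorem 1): if a degree-n hub must hold roughly n units of information and all node embeddings must share one dimension, every node's dimension is forced up to roughly n, while fixed-width edge embeddings stay linear in the edge count.

```python
# Hypothetical memory accounting on a star-like graph with n nodes.

def node_model_memory(n):
    # Uniform node embeddings: dimension ~n at every node so the hub
    # alone has enough capacity, giving n * n total numbers.
    return n * n

def edge_model_memory(n, d):
    # One fixed-width embedding per edge; a star has n - 1 edges.
    return (n - 1) * d

n, d = 10_000, 16
# node model: 100,000,000 numbers; edge model: 159,984 numbers
```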
We will clarify above Theorem 1 that the max-degree in the construction is large, as you suggest.
The purpose of this paper is to analyze the effect of maintaining edge embeddings via rigorous theory and controlled experiments:
We find all the questions the reviewer raises interesting! It would be great to have fine-grained understanding of properties of data different architectures help with—even better if we can have quantitative guidance. These are difficult questions that seem unlikely to be resolved in a single paper. For example, though we have gained some mathematical understanding about how depth helps feedforward [1] and Transformer architectures [2], it’s unclear we have precise quantitative guidance on how to choose depth given a particular dataset of choice—despite years of work on the problem.
The guidance of our theoretical results is largely qualitative: namely, that high-degree nodes may create representational issues for node-based models that could be alleviated by edge-based models. This may seem quite intuitive in retrospect, but prior to our work the machinery for even formally posing this question did not exist.
Given the conceptual scaffolding is now set from our paper, we hope more people will work on the problem, and a more fine-grained and quantitative understanding (including the effect of the task and not just graph topology) will be eventually gained.
Regarding the question about recommendations for “real-world tasks”: our paper is largely methods-centric and not application driven, but we did our due diligence by evaluating on a range of “standard” benchmarks. However, benchmarks are never exhaustive, and for GNNs in particular benchmarking is not yet a mature science — for example, many papers like [3] point out how the dataset collection procedure introduces undesirable artifacts that lead to evaluation biases. To us, one takeaway of our benchmark evaluations (where the separation between architectures is much smaller than in the synthetic “planted” experiment) is that common GNN benchmarks may be too easy to be useful discriminators for various architecture design choices. In fact, such sentiments have been echoed much more broadly than just GNNs, e.g. [4] in the domain of PDE solvers.
[1] Telgarsky, “Benefits of depth in neural networks.” COLT 2016.
[2] Sanford et al. “Transformers, parallel computation, and logarithmic depth” ICML 2024.
[3] Platonov et al., “A critical look at the evaluation of GNNs under heterophily: are we really making progress?” ICLR 2023.
[4] McGreivy and Hakim. “Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations”, Nature Machine Intelligence 2024.
Dear Reviewer fq9o,
Thanks again for your time reviewing our paper. We would greatly appreciate hearing if we have addressed your follow-up questions! Since the discussion period has been extended, we are happy to provide further clarifications if needed.
The authors investigate the conditions under which a local edge processing framework is better than the more common, node-wise message passing framework in the processing of graphs. They present theoretical results that indicate a limitation of the latter on certain tasks and under severe memory restrictions, whereas the local edge processing framework is more efficient in these tasks. What remains unclear is whether the converse also holds, namely how the performance of the local edge processing framework competes with classical message passing on tasks that are more suited to the node-centric view.
The authors also show that the two paradigms are almost equivalent, up to a round of “message passing”, if no memory constraints are imposed. Finally, the authors support the theoretical findings with experiments that show the potential benefits of the local edge processing framework.
Strengths
The authors rightly point out that theoretical contributions in the graph machine learning field are quite scarce if one leaves the “expressiveness” bubble. Contributions like this one are very welcome, especially when they bridge the gap with a set of well-known frameworks like LOCAL that were mostly ignored over the decades by the graph machine learning community.
The contributions of the paper are novel (to the best of my knowledge), and the authors make us think about kinds of problems that are easier to solve with slightly different processing frameworks than classical message passing. This is a worthy contribution as it contributes to the development of a mostly stagnating field.
The empirical contribution focuses mostly on showing how the edge processing paradigm can lead to better performances in some synthetic and real world benchmarks. The authors’ intent is not to provide a full-scale empirical comparison nor to establish a better performing model, which is ok considering the theoretical contributions and nature of the paper. One might argue that with more rounds of message passing GCN would have obtained the same result as Edge-GCN in Table 1, but the assumption here is that GCN had been properly cross-validated by the authors of Dwivedi et al. [*]
Practically speaking, the results of Section 7 are the most realistic and relevant, as they establish an equivalence between the two methodologies up to an extra round of message passing, where the rather unrealistic memory restrictions of the previous sections are relaxed. This might open to new ways of studying the nature of node and edge embeddings under different families of graphs, which looks like an interesting research direction.
[*] actually we know this not to be true, since Tönshoff et al, 2023 showed how GCN could achieve much better results when properly cross-validated. The reviewer is assuming the authors performed a thorough model selection to identify the best number of hidden neurons and graph convolutional layers.
Finally, I did not check thoroughly the proofs in the Appendix, but overall the technical contribution of the paper seems valuable to the graph machine learning community.
Weaknesses
Although one might argue that the memory assumptions of the Sections up to 6 are quite unrealistic, and therefore some results might not be of great interest after all, the biggest weakness of the paper is definitely the presentation and its organization.
In general, it is extremely difficult not to get lost. The organization of the paper seems to reflect more the thought process of the authors, instead of developing a narrative that helps the reader to follow. Truth be told, there seems to be an effort of the authors in presenting the same topics in an informal way before moving to the details, but in doing so the authors achieved two unintended effects: 1) duplicating a lot of information, some of which is also inconsistent (for example O(1) in lines 137-138 becomes O(log n) in lines 381-382); and 2) not having the space to explain the theoretical and empirical setups in detail. For instance, proof sketches take entire pages of the main paper, and the proofs in the appendix are not easy to follow, so merging them into a unique proof in the appendix might be useful; most empirical details of the experiments are missing from the paper/appendix (hyper-parameters, evaluation procedures, anything that might help to reproduce results without accessing the code) except for partial information in Appendix E. Notation is also extremely verbose, and there would have been tons of ways to improve their readability.
Another problem is that the introduction of the paper does not have a proper structure. It is generally not a good idea to introduce message passing in an introduction for a general reader (being the paper submitted to ICLR and not LoG), and notation is introduced before its time in Section 4.
The paper makes a few overstatements, claiming that the edge framework is more expressive, then it is stated that it is “at least as expressive”, and in Section 7 it becomes “equivalent” when memory constraints are relaxed. My recommendation is to take particular care when making statements about expressiveness that cannot be proven.
Overall, it is my current belief that the structure of the paper needs to be substantially revised and the paper resubmitted. The contributions are certainly valid but the authors should improve the accessibility of the paper if they want it to have success and spread through the graph community and perhaps beyond it. I will not propose a full reject, though, because the contribution has technical merits.
Questions
- Are there more practical/real-world problems where you envision a particularly successful application of the edge framework compared to classical message passing?
- Do you think it would be better to replace "symmetry" with "weight sharing"? Symmetry might refer to group theory concepts being applied to GNNs.
- Could the authors at least try to revise the notation during the rebuttal to make it less verbose and more readable?
I acknowledge the authors' response, which shows complete disinterest in taking my comments seriously, disregards my effort spent reviewing the paper and trying to give an objective but constructive perspective on the work. The authors do not seem to perceive the rebuttal as a place to discuss how to improve the paper and get it accepted but rather as a "battle arena".
I will nonetheless continue with my reviewer's duties and provide further comments below:
- The fact that the other reviewers found a high-level summary of the results readable does not necessarily mean (statistically speaking) that the paper is well structured or organized. The other reviewers, due to the same time constraints, might have decided to focus more on empirical aspects (indeed, see the general response of the authors) and not on how the theoretical concepts have been conveyed. I have read many theoretical papers whose presentation elegantly mixes theoretical results and high-level discussions without the redundancy of the manuscript under review, and for this reason I think that the manuscript could have been much improved. So no, this is not "common practice" in theoretical papers.
- I think the authors are underestimating ICLR reviewers, who are well aware that rigor is necessary. Following previous papers does not guarantee that the authors' choice of notation is the best out there. More importantly, verbosity may imply rigor but the converse is not true. For instance, in most parts of the paper the subscript of G can be removed when clear from the context without extreme loss of rigor (the authors could just write a note in the paper); putting some effort into reducing the verbosity of terms like I({u,v}) would not hurt readability either. These are just a few of many tricks that would not break rigor but would reduce verbosity.
- I would not dare to underestimate the ICLR audience. I simply recognize good practice: a manuscript submitted to a general ML venue should be written accordingly for a general ML audience, and it is the reviewer's job to make sure this happens. Not all readers of ICLR proceedings know about graph machine learning, in fact I know quite a few myself. The authors' comment is presumptuous and unprofessional to say the least.
- I was referring to the assumption of O(1) bits per node processor. It is likely I did not understand something, but this is not something that happens in practice, where embeddings can have much higher dimensionality without needing to go to O(n).
Questions:
- Lines 64-67: thank you for the clarification. What I can see is a list of application areas, not a specific set of use cases where the inductive bias of the edge-processor is helpful. As for previous answers, the "this is the list of previous works" is not a valid answer to my question. I was expecting a more fine-grained discussion on real-world problems, perhaps in the domain of "molecular property prediction" where predicting the edge information is necessary.
- I still believe that the "connection" might be the source of confusion, but I will not argue further about it.
- See comment above.
Overall, I do not think the authors gave me convincing reasons to believe that my arguments are incorrect. I will therefore keep my score.
Dear Reviewer YXsg,
We apologize for the tone our rebuttal conveyed; we should have thought more carefully about our phrasing. We assure you, we are hugely appreciative of the time and effort you put into reviewing our paper. In order to engage with your constructive suggestions (and to indicate our respect for your effort in reviewing our paper), we have uploaded a revision making a host of changes (the changes are in blue, to make them easier to track):
Introduction. We have edited the introduction so that the message-passing equations are not introduced until Section 4, replacing them with informal descriptions of node embeddings vs edge embeddings, that we hope may be more accessible to readers unfamiliar with GNNs.
Consistency. We have removed or edited phrases such as “at least as expressive” and “more expressive” that were informal and not backed by formal proofs; we also fixed the typo pointed out by the reviewer.
Proof sketches. In an attempt to make the proofs in the appendix easier to follow, we have significantly expanded the explanatory text within the proofs (e.g. with intuition about the application of Lemma 2 in Theorem 1, and the construction of the edge-based protocol for Theorem 1).
Notation. We have suppressed the subscript of G whenever clear from context. We have also made a concerted attempt to simplify the notation I({u,v}) to I(e) where possible (e.g. Definition 6; unfortunately it’s not possible in Definition 5). For verbose proofs such as that of Theorem 5, we have introduced auxiliary notation which shortens and hopefully clarifies long chains of equations.
We hope that this revision helps address some of your concerns with the presentation of the paper, and if you have other concrete suggestions we would be deeply appreciative.
Dear Reviewer YXsg,
Thanks again for your time reviewing our paper! We apologize again for the tone in our original response, and we hope that the paper revision addressed some of your concerns, as detailed above.
I thank the authors for their response. I have had a look at the paper and the minor changes do not justify a score increase, but they are in the right direction in my opinion.
We thank the reviewer for their time and effort. We are glad that the reviewer found our contributions “worthy” and “very welcome”, and for appreciating our attempt to “bridge the gap” with well-known frameworks that have received little attention in the graph machine learning community. We address the reviewer’s comments below:
- “The organization of the paper seems to reflect more the thought process of the authors, instead of developing a narrative that helps the reader to follow.”
We regret that the reviewer found the exposition confusing. Since the other reviewers found that the paper is “well-structured”, “clear and accessible”, and that the writing “helps the readers quickly grasp the main ideas”, we are not sure how to respond.
(1a) “there seems to be an effort of the authors in presenting the same topics in an informal way before moving to the details”
This is common practice in theoretical papers, particularly in notationally heavy papers such as ours: Section 2 discusses our results in English, without the burden of formality, and subsequent sections make the results mathematically rigorous.
We are happy to add additional proof sketches in the appendix, but removing proof sketches from the main body seems counterproductive for non-theory readers. We are also happy to add details about e.g. hyperparameter sweeps to the appendices.
(1b) “Notation is also extremely verbose, and there would have been tons of ways to improve their readability.”
Notation is a necessary evil for theoretical rigor. We have tried to follow notation in prior works as closely as possible, see e.g.:
- (Xu et al., 2019) “How Powerful Are Graph Neural Networks?”
- (Huang and Villar, 2021) “A Short Tutorial On The Weisfeiler-Lehman Test And Its Variants”
As such, we believe our notation is as simple as possible, but if the reviewer has concrete suggestions we are happy to adjust.
(1c) “It is generally not a good idea to introduce message passing in an introduction for a general reader”
We believe that the reviewer is underestimating the ICLR audience.
- “O(1) in lines 137-138 becomes O(log n) in lines 381-382”
Thanks for pointing out this typo. We will update lines 137-138 to say O(log n).
- “One might argue that the memory assumptions of the Sections up to 6 are quite unrealistic”
What is the argument for this? We would argue that memory limitations are highly realistic. In large graphs it would be completely prohibitive to store O(n) memory per node. Space constraints have been a fundamental consideration in computer science and machine learning for decades.
Next, we address the reviewer’s questions.
- “Are there more practical/real-world problems where you envision a particularly successful application of the edge framework compared to classical message passing?”
On lines 64–67, we discuss the extensive list of real-world applications where the edge GNN framework has already found success. Do these answer the reviewer’s question?
- “Do you think it would be better to replace "symmetry" with "weight sharing"? Symmetry might refer to group theory concepts being applied to GNNs.”
“Symmetry” is the most common terminology in the theoretical GNN literature. The connection to group theory is not accidental, since symmetry constraints imply e.g. that the computation respects graph homomorphisms.
- “Could the authors at least try to revise the notation during the rebuttal to make it less verbose and more readable?”
Again, we believe that the notation is already as minimal as possible without sacrificing rigor. However, if the reviewer has any concrete suggestion we would be happy to discuss.
The paper introduces the significance of edge embeddings in GNNs to improve understanding of GNN architecture design. The authors present theoretical insights indicating that edge embeddings can allow for shallower network architectures in specific graph topologies by considering memory constraints, thus offering a fresh perspective beyond traditional invariance-based views. Empirical results demonstrate that GNNs utilizing edge embeddings consistently outperform those based solely on node features, particularly in graphs featuring hub nodes. Additionally, the paper proposes a novel method for layer parameterization using edge embeddings, establishing its expressiveness compared to conventional approaches. The authors highlight the computational costs of using edge embeddings in dense graphs and suggest future research directions aimed at developing more efficient edge-based architectures that retain their advantages.
Strengths
- The paper presents a novel perspective on the comparative advantages of edge-based architectures in GNNs, contributing fresh theoretical insights to an evolving field. The separation results are particularly noteworthy, addressing a significant gap in the understanding of message-passing protocols.
- Theoretical rigor is evident throughout the paper, with clear definitions, theorems, and proofs that are well-structured and logically sound. The methodology for proving depth separation between node and edge message-passing protocols is a solid contribution to the literature.
- The writing is generally clear and accessible, effectively communicating complex ideas in a manner that is understandable for readers familiar with GNNs.
Weaknesses
- While the theoretical contributions are substantial, the empirical validation relies on a relatively small set of benchmarks and synthetic tasks. Expanding the experiments to include more diverse datasets and real-world applications would strengthen the claims made about the edge-based architecture's advantages.
- The paper primarily focuses on comparing two architectures (node-based and edge-based) without considering hybrid approaches or other recent advancements in GNN architectures that might offer competitive performance. A broader comparative analysis could provide a more comprehensive view of the landscape.
- While theoretical insights are valuable, the paper could benefit from a deeper discussion on the practical implications of adopting edge-based architectures. Addressing potential challenges in implementation, particularly regarding scalability and computational resources, would provide a more rounded perspective.
- The exploration of symmetry constraints in Theorem 5 is interesting, but the practical relevance of this focus could be questioned. Clarifying how these theoretical results translate into actionable insights for GNN design would enhance the paper's impact.
Questions
- Could the authors elaborate on the intuition behind the theoretical separation results for edge and node message-passing protocols? Specifically, what practical implications arise from the hub node bottleneck, and how might this influence GNN architecture decisions?
- How generalizable are the findings in Theorems 1 and 4 to other types of graphs beyond the specific constructions discussed? Are there certain classes of graphs where edge-based protocols are expected to perform particularly poorly, or vice-versa?
- Could you consider incorporating visual representations of key concepts to enhance reader understanding and engagement in the main content?
Details of Ethics Concerns
N/A
We thank the reviewer for their time and effort. We are glad they find our results “noteworthy” and “contributing fresh theoretical insights”, and that our proof methodology is “a solid contribution to the literature”. Moreover, we are glad they found our paper to be “effectively communicating complex ideas”. See below for our responses to the reviewer’s comments:
- “While the theoretical contributions are substantial, the empirical validation relies on a relatively small set of benchmarks and synthetic tasks. Expanding the experiments to include more diverse datasets and real-world applications would strengthen the claims made about the edge-based architecture's advantages.”
We would argue that our empirical validation—five common benchmarks, and two carefully-designed synthetic tasks, along with a fairly extensive sweep over hyperparameters for each considered architecture—is not “relatively small”. Please note that the purpose of this paper is not to propose a new architecture. Edge GNNs have already seen robust successes in many real-world settings (lines 64–67), well beyond common benchmarks.
As discussed in the paper (lines 67–70, lines 449–450), the purpose of this paper is to ablate the effect of maintaining edge embeddings via rigorous theory and controlled experiments. From this perspective, we would argue that our experiments are sufficient evidence for our main point: the edge-embedding architectural choice, when disentangled from the (many other) design choices made in prior works, does not hurt performance (across many common benchmarks) and can significantly help (in theoretically-grounded synthetic experiments).
- “The paper primarily focuses on comparing two architectures (node-based and edge-based) without considering hybrid approaches or other recent advancements in GNN architectures that might offer competitive performance. A broader comparative analysis could provide a more comprehensive view of the landscape.”
To reiterate a previous point, the point of our paper is not to introduce a new architecture, but rather to isolate one important design choice (edge vs node embeddings) and understand its impact. To this end, simplicity is a virtue. A “broader comparative analysis” would be counterproductive, and even risks producing spurious scientific findings.
- “While theoretical insights are valuable, the paper could benefit from a deeper discussion on the practical implications of adopting edge-based architectures. Addressing potential challenges in implementation, particularly regarding scalability and computational resources, would provide a more rounded perspective.”
We do point out in the conclusion that edge GNNs have computational overhead, and developing more efficient architectures that do not sacrifice the representational advantages is an interesting direction for future research. We are happy to expand this discussion if the reviewer wishes.
- “The exploration of symmetry constraints in Theorem 5 is interesting, but the practical relevance of this focus could be questioned. Clarifying how these theoretical results translate into actionable insights for GNN design would enhance the paper's impact.”
Symmetry constraints are the standard theoretical lens for studying GNNs in prior literature; see e.g. the ICLR 2019 paper “How Powerful Are Graph Neural Networks” by Xu et al., which currently has over 9000 citations.
But the reviewer is not wrong: in fact, the purpose of Theorem 5 was precisely to show that the lens of symmetry constraints alone may not be the right theoretical tool for principled GNN design, since it fails to adjudicate the distinctions between edge and node embeddings (lines 152–153).
To address the reviewer’s questions:
- “Could the authors elaborate on the intuition behind the theoretical separation results for edge and node message-passing protocols? Specifically, what practical implications arise from the hub node bottleneck, and how might this influence GNN architecture decisions?”
As the reviewer alludes to, the basic intuition is that hub nodes serve as bottlenecks to information flow for node-based protocols, but not for edge-based protocols which essentially have “more scratch space” near the hub node. This is made formal using techniques inspired by time-space tradeoffs and distributed computation lower bounds, as discussed in Remark 9. We hope that our results may inspire more efficient architectures that avoid the representational limitations of node GNNs, perhaps by adding more embeddings near bottlenecks, but we leave this interesting research question to future work.
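To make the "more scratch space" intuition concrete, here is a minimal sketch (our own illustration, not the paper's formal construction): on a star graph with one hub and n leaves, a node-based protocol keeps a single constant-width embedding at the hub, while an edge-based protocol keeps one embedding per incident edge, i.e. Θ(n) memory sitting at the bottleneck. The function names below are hypothetical.

```python
def star_graph(n_leaves):
    """Hub node 0 connected to leaves 1..n_leaves."""
    hub = 0
    edges = [(hub, leaf) for leaf in range(1, n_leaves + 1)]
    return hub, edges

def memory_at_hub(n_leaves, d, scheme):
    """Count embedding memory adjacent to the hub, with d the (constant)
    width of a single embedding."""
    hub, edges = star_graph(n_leaves)
    if scheme == "node":
        return d                       # one d-dim embedding at the hub node
    elif scheme == "edge":
        incident = [e for e in edges if hub in e]
        return d * len(incident)       # one d-dim embedding per incident edge
    raise ValueError(scheme)

print(memory_at_hub(100, 16, "node"))  # constant: 16
print(memory_at_hub(100, 16, "edge"))  # grows with degree: 1600
```

Any information routed through the hub by a node-based protocol must pass through the single constant-size embedding, which is the bottleneck the lower bound formalizes.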
- “How generalizable are the findings in Theorems 1 and 4 to other types of graphs beyond the specific constructions discussed? Are there certain classes of graphs where edge-based protocols are expected to perform particularly poorly, or vice-versa?”
Intuitively, we expect that edge GNNs will be better in graphs with hub node bottlenecks. Lemma 2 is a quite general statement, and a separation arises whenever there is an edge-based protocol that “beats” the lower bound from Lemma 2.
From a representational perspective, it’s not hard to show that edge GNNs cannot be worse than node GNNs. We believe it unlikely that there is a clean, complete characterization of when edge GNNs are strictly better, due to the similarity between this question and deep questions in communication complexity and circuit complexity.
- “Could you consider incorporating visual representations of key concepts to enhance reader understanding and engagement in the main content?”
If the reviewer has any concrete suggestions of useful visualizations, we are happy to discuss.
Thank you for your thoughtful response. I appreciate the time and effort you've invested in addressing my questions. However, I have a few additional points I would like to clarify.
While I understand that the goal of the paper is to isolate the effect of edge embeddings, and that the chosen benchmarks and synthetic tasks are tailored to support this investigation, I would like to raise a point for further clarification regarding the types of tasks evaluated in the experiments. Specifically, I noticed that the experiments cover graph classification and regression tasks (e.g., ZINC, Peptides-func, Peptides-struct) as well as image classification tasks (MNIST, CIFAR-10). However, link prediction tasks, which are often seen as a natural fit for edge-based models, do not appear to be included. Since edge embeddings are particularly suited to modeling the relationships between nodes, I would expect link prediction to be a highly relevant task for evaluating edge-based architectures. Could you provide insights into why this task was not included?
Regarding the suggestion of incorporating visual representations, I appreciate your openness to this idea. In particular, I was thinking of diagrams that could help illustrate the following concepts:
- The intuition behind the "hub node bottleneck" and how it impacts the performance of node-based versus edge-based protocols.
- A visual summary of the main theoretical results (e.g., from Theorems 1, 4, and 5), potentially including a depiction of the performance trade-offs between the two architectures.
Would you be able to provide any suggestions or examples of visualisations that could make these concepts more accessible and engaging to readers? I believe that well-designed diagrams could make the theoretical ideas much clearer and more intuitive.
Dear Reviewer eeGn,
Thanks for your follow-up!
About link prediction tasks: for our benchmark evaluations, we wanted tasks that can be naturally implemented by both edge-based and node-based architectures, and where the only salient architectural difference is “edge-based vs node-based”. Implementing link prediction with a node-based architecture requires an extra step at the end where each edge collates its prediction from the embeddings of the incident nodes. The choice of collation function is an important design element for which there is no “canonical” choice (e.g. [1] uses a sigmoid function, [2] uses a Fermi-Dirac decoder, etc.). Since an edge-based architecture would not have this collation step, this introduces a confounder into any comparative experiment.
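To illustrate the confounder, here is a hedged sketch of two collation functions from the cited works: an inner-product decoder with a sigmoid (as in [1]) and a Fermi-Dirac decoder (as in [2]). The hyperparameters r and t are illustrative values of our choosing, and we use Euclidean distance for simplicity where [2] uses hyperbolic distance.

```python
import math

def sigmoid_dot_decoder(z_u, z_v):
    """p(edge) = sigmoid(<z_u, z_v>), as in the VGAE-style decoder [1]."""
    score = sum(a * b for a, b in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-score))

def fermi_dirac_decoder(z_u, z_v, r=2.0, t=1.0):
    """p(edge) = 1 / (exp((d(u,v)^2 - r) / t) + 1), Fermi-Dirac style [2].
    Euclidean distance is used here purely for illustration."""
    dist2 = sum((a - b) ** 2 for a, b in zip(z_u, z_v))
    return 1.0 / (math.exp((dist2 - r) / t) + 1.0)

z_u, z_v = [0.5, 1.0], [0.4, 0.9]
print(sigmoid_dot_decoder(z_u, z_v))
print(fermi_dirac_decoder(z_u, z_v))
```

The two decoders induce different inductive biases over node embeddings, so any node-vs-edge comparison on link prediction would also be comparing decoder choices.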
Additionally, in line with the message of the paper, we wanted to show that even in benchmarks where the only benefit of the edge-based architecture is computational, rather than the task being a more “natural” fit for edge-based architectures, there are still benefits from edge embeddings.
About diagrams: Thanks for the suggestions! In the revision we just uploaded, we have moved a figure illustrating the construction for Theorem 1 which was previously in the appendix to the main body. We have also added a new figure (to the appendix, due to space constraints) that visually illustrates the main intuition behind Theorems 1 and 4: namely, how hub nodes lead to an “information bottleneck”.
We hope that these address your remaining concerns. If so, we would greatly appreciate it if you would consider raising your score!
[1] Kipf and Welling, “Variational Graph Auto-Encoders”. NeurIPS Bayesian DL Workshop, 2016.
[2] Chami et al., “Hyperbolic Graph Convolutional Neural Networks”. NeurIPS 2019.
Dear Reviewer eeGn,
Thanks again for your time reviewing our paper. Please let us know if you have any remaining concerns! Since the discussion period has been extended, we are happy to provide further clarifications if needed.
This paper explores the architectural benefits of graph neural networks (GNNs) that maintain and update edge embeddings, highlighting both theoretical and empirical advantages. Theoretically, under certain computational and memory constraints, edge-embedding-based architectures can achieve greater efficiency, solving some graph tasks with fewer layers compared to node-based models. Empirically, these architectures often outperform node-only models in graph structures with hub nodes.
Strengths
1. The paper is well-structured, with clear explanations of the theoretical models and empirical methodologies.
2. The paper provides proofs showing that edge-based GNNs can achieve better depth/memory trade-offs than node-based GNNs under certain conditions.
3. The cases where edge embeddings substantially help are interesting.
Weaknesses
- For dense graphs, maintaining edge embeddings requires substantial memory and can introduce considerable computational overhead, which may limit the scalability of edge-based architectures.
- Although the paper provides insights into the role of edge embeddings into graphs with hub nodes. The paper generally lacks practical guidance on which types of graphs would benefit most from edge embeddings. Providing examples or criteria for when edge embeddings are advantageous would enhance its applicability.
- With more depths, node embedding can simulate edge embeddings. Would be good to see such experiments and memory consumption comparison
Questions
Please see weaknesses.
We thank the reviewer for their time and effort, and we are glad they appreciate our exposition and “insights into the role of edge embeddings into graphs with hub nodes”. See below for responses to the reviewer’s comments:
- “For dense graphs, maintaining edge embeddings requires substantial memory and can introduce considerable computational overhead, which may limit the scalability of edge-based architectures.”
Indeed, we are not suggesting that edge GNNs are a “one-size-fits-all” solution to graph machine learning. As we point out in the conclusion, we agree that on dense graphs, edge GNNs incur significant computational overhead over node GNNs. Despite this overhead, edge GNNs have found various applications in practice (see lines 64–67).
In any architectural design choice (even beyond just graph machine learning), there are inevitable trade-offs. Our theoretical results demonstrate potential representational advantages of edge GNNs—which, in any application, must be weighed against the potential computational drawbacks.
Finally, we remark that our separations hold on graphs with O(n) total degree, i.e. the representational advantages of edge GNNs can arise even in an “apples-to-apples” comparison where the total memory of the edge-based protocol (and the parallelized runtime) is comparable to that of the node-based protocol.
- “Although the paper provides insights into the role of edge embeddings into graphs with hub nodes. The paper generally lacks practical guidance on which types of graphs would benefit most from edge embeddings. Providing examples or criteria for when edge embeddings are advantageous would enhance its applicability.”
“Providing examples or criteria for when edge embeddings are advantageous” is exactly what our paper does! Theorem 1 provides an example where edge embeddings are provably advantageous from a representational perspective. As discussed after Theorem 1, the intuition is that in a graph with high-degree “hub” nodes, these nodes may be a bottleneck to the flow of information for any node-based protocol, whereas they do not bottleneck edge-based protocols.
We complement this result with the observation that in graphs with low maximum degree (i.e. no “hub” nodes), there is no representational advantage (Proposition 3).
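A minimal sketch of the simulation idea behind this observation (our own naming and simplification, not the paper's proof of Proposition 3): when the maximum degree is at most D, each node can store the embeddings of all its incident edges, so a node-based protocol simulates an edge-based one at the cost of a factor-D blowup in per-node memory.

```python
def simulate_edge_embeddings_at_nodes(edges, edge_emb, max_degree):
    """Each node's state maps its incident edges to their embeddings, so
    per-node memory is at most max_degree times one embedding's size."""
    node_state = {}
    for (u, v) in edges:
        node_state.setdefault(u, {})[(u, v)] = edge_emb[(u, v)]
        node_state.setdefault(v, {})[(u, v)] = edge_emb[(u, v)]
    # sanity check: the blowup is bounded by the max degree
    assert all(len(s) <= max_degree for s in node_state.values())
    return node_state

edges = [(0, 1), (1, 2), (2, 0)]                  # triangle, max degree 2
edge_emb = {e: [float(i)] for i, e in enumerate(edges)}
state = simulate_edge_embeddings_at_nodes(edges, edge_emb, max_degree=2)
print(len(state[1]))  # node 1 stores both of its incident edges' embeddings
```

With a hub of degree n, the same simulation would require Θ(n) memory at a single node, which is exactly where the low-degree assumption is needed.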
If the reviewer is asking for a complete characterization of which graphs or tasks are easier for edge-based protocols — we believe it is unlikely for a clean characterization to exist, given the connections between this problem and theoretically deep areas of research such as communication complexity.
- “With more depths, node embedding can simulate edge embeddings. Would be good to see such experiments and memory consumption comparison”
We prove that with more memory, node embeddings can simulate edge embeddings (Proposition 3). With the same memory but much more depth, it is an open theoretical question whether node embeddings can simulate edge embeddings.
Our first synthetic experiment does address this question empirically: Table 2 shows that even if the node GNN is given 2x the depth, it cannot simulate a planted edge-based protocol (whereas the edge GNN does learn this planted protocol).
If the reviewer has a concrete experimental setup in mind that differs from this one, we would be happy to discuss it.
Dear Reviewer 59h4,
Thanks again for your time reviewing our paper. Please let us know if you have any further questions! Since the discussion period has been extended, we are happy to provide further clarifications if needed.
Thank you for your response.
I would like to encourage the reviewers to engage with the author's replies if they have not already done so. At the very least, please acknowledge that you have read the rebuttal.
The paper studies the benefits of edge embeddings in GNNs. The goal is not to introduce a new architecture, but rather to understand the (theoretical) benefit of edge embeddings, which have shown empirical improvements over GNNs that only use node embeddings (albeit only marginal improvements, and only on certain tasks). The key insight is that hub nodes induce an information bottleneck. The findings indicate that edge embeddings can be provably beneficial assuming certain topologies and memory constraints (which were debated).
While the theoretical contributions seem to be solid (based on the reviews, without reading the paper myself in detail), it seems that the paper would benefit from incorporating the reviewers' feedback before being ready for acceptance. I am in agreement with 3/4 reviewers that the paper is marginally below the acceptance threshold. For future iterations, including an in-depth discussion on which real-world graphs (and tasks) match the assumptions in the theory (at least approximately) would strengthen the paper, as mentioned by one of the reviewers.
As a side note: I find it surprising that there is no discussion on the connections to the problem of oversquashing.
Finally, I would like to advise the authors to be more careful in the future with accusations towards the reviewers, especially when it comes to the question of whether some text is "LLM-generated" or not. The reviews seem genuine to me. Checker tools are unreliable; indeed, the authors' reply, the abstract of the paper, and the reviews were all flagged as "generated" in all checkers that I tried, which is a clear false positive. Moreover, as acknowledged by the authors, the tone of some of the responses could be improved.
Additional Comments On Reviewer Discussion
The reviewers and authors discussed whether the memory constraints are "realistic" or not. While the authors made some good counterpoints, I tend to agree with the reviewers. In any case, the discussion shows that the issue is more nuanced than it might seem at first glance, and the paper should be updated to reflect this. There was a disagreement between the reviewers regarding the clarity, structure, and readability of the paper.
Reject