On Explaining Equivariant Graph Networks via Improved Relevance Propagation
Abstract
Reviews and Discussion
The paper introduces a novel method called EquiGX aimed at enhancing the explainability of equivariant graph neural networks (GNNs) designed for 3D geometric graphs via the deep Taylor decomposition framework. In the initial version, the authors used incorrect notation (e.g., the relevance score for spherical harmonics in Eq. (7)), which led to a major misunderstanding and prompted me to question whether the model might violate equivariance. This has been clarified during the discussion, and I have raised my score from "clear reject (1)" to "clear accept (4)".
Questions for Authors
See review above.
Claims and Evidence
Yes.
Methods and Evaluation Criteria
Yes.
Theoretical Claims
After the notation misunderstanding was resolved, I found no other theoretical problems.
Experimental Design and Analyses
Yes.
Supplementary Material
I checked the demo notebook provided by the authors during the rebuttal.
Relation to Broader Scientific Literature
The subject of this article is "Equivariant Graph Networks", yet the article only discusses TFN. After discussion, the authors agreed to add a "Future Work" section to discuss the following models:
- Invariant Models: SchNet, SphereNet, ComENet
- Scalarization-based Models: EGNN, SaVeNet, LEFTNet
- Tensor-product-based Models: EquiformerV2, MACE, PACE
- Spherical-scalarization Models: SO3KRATES, HEGNN, GotenNet
SchNet: A continuous-filter convolutional neural network for modeling quantum interactions, NIPS'17.
SphereNet: Spherical Message Passing for 3D Molecular Graphs, ICLR'22.
ComENet: Towards Complete and Efficient Message Passing for 3D Molecular Graphs, NeurIPS'22.
EGNN: E(n) Equivariant Graph Neural Networks, ICML'21.
SaVeNet: A Scalable Vector Network for Enhanced Molecular Representation Learning, NeurIPS'23.
LEFTNet: A new perspective on building efficient and expressive 3D equivariant graph neural networks, NeurIPS'23.
EquiformerV2: EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations, ICLR'24.
MACE: MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields, NeurIPS'22.
PACE: Equivariant Graph Network Approximations of High-Degree Polynomials for Force Field Prediction, TMLR'24.
SO3KRATES: A Euclidean transformer for fast and stable machine learned force fields, Nature Communications'24.
HEGNN: Are High-Degree Representations Really Unnecessary in Equivariant Graph Neural Networks?, NeurIPS'24.
GotenNet: Rethinking Efficient 3D Equivariant Graph Neural Networks, ICLR'25.
Essential References Not Discussed
The calculation of the relevance scores for spherical harmonics in Eq. (7) is very similar to the core formula of spherical-scalarization models (e.g. SO3KRATES, HEGNN, GotenNet), which may explain why this method makes sense.
Other Strengths and Weaknesses
All the problems have been resolved; they stemmed mainly from misunderstandings caused by the notation. I therefore recommend that the authors revise their writing using the notation system of HEGNN and GotenNet so that readers do not confuse scalars with tensors. The authors said they will adopt this suggestion in the subsequent revision of the manuscript.
Although ICML does not allow submission of new PDFs, I choose to believe that the authors will make appropriate changes.
Other Comments or Suggestions
- [Line 76, right] in Eq. (2) should be .
- [Line 205, left] should be .
We thank Reviewer 6J4F for comments on the paper. We have provided pointwise responses below.
According to the formula in this paper, the contribution of an atom may change with the reference frame in which the molecule is observed.
We believe there is a misunderstanding. The node explanations by EquiGX are invariant to rotations. In other words, the node importance scores from EquiGX remain unchanged when the input molecule is rotated by a random rotation matrix. Therefore, EquiGX preserves the equivariance of the network. We have also conducted experiments to verify and confirm this invariance.
Can EquiGX be extended to other equivariant models?
EquiGX can be readily extended to other spherical equivariant models, such as EquiformerV2, MACE, and PACE, which also rely on tensor product operations. However, due to the significant computational cost, we leave such extensions for future work. We would also like to point out that models like SphereNet and ComENet are different, as they operate solely on invariant features without incorporating equivariant ones, and thus do not rely on tensor product operations. We believe these models are beyond the scope of this paper.
In line 205, should be . In Eq. (2), should be .
Thanks for pointing out the typos. We will update the manuscript.
I appreciate the authors' responses, but they were very brief and did not resolve my queries at all.
In addition, I would like to ask the other reviewers to check whether the authors' model is equivariant. I do not see its equivariance at all, yet the other reviewers have not commented on this. I tried to implement the authors' formula, and the result is still not equivariant.
R1. About Equivariance.
The authors claim that "node explanations by EquiGX are invariant to rotations", but I don't see any proof of equivariance/invariance. I would like to point out that since the authors' dataset seems to contain a variety of poses, the model may have learned equivariance driven by the data (rather than from the architecture). Therefore, the experimental verification of equivariance is not convincing.
I think the following should be added (all of them):
- Rigorous theoretical proof, especially for all formulas using Hadamard multiplication and division.
- Equivariant loss results for randomly rotated inputs tested on untrained models.
- Open source code for reproducing the results.
Since my own implementation shows that the algorithm is not equivariant, I am not sure whether there is some omission on my side. I urge the authors to open-source their code so that reproducibility can be checked, or to provide a notebook demo.
R2. About More Models to Analyze.
I find it unreasonable that the authors claim generalization is easy yet decline to add analyses of other models to this paper. In fact, scalarization-based models such as EGNN and LEFTNet, and spherical-scalarization models such as HEGNN and GotenNet, are also special cases of tensor products (and for invariant models, '0e' x '0e' -> '0e' is of course a special case of the tensor product; see e3nn, and the minimal check below). Simply asserting that the analysis can be extended is not convincing.
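For instance, the following minimal check with the public e3nn API (my own sketch, not the authors' code) confirms that the '0e' x '0e' -> '0e' tensor product reduces to ordinary scalar multiplication up to a learned weight:

```python
import torch
from e3nn import o3

torch.manual_seed(0)

# '0e' x '0e' -> '0e': the tensor product of two scalar irreps is just a
# (weighted) elementwise multiplication of the two inputs.
tp = o3.FullyConnectedTensorProduct("0e", "0e", "0e")
a, b = torch.randn(4, 1), torch.randn(4, 1)
out = tp(a, b)

# All rows share one constant of proportionality c, i.e., out = c * a * b.
c = out[0] / (a[0] * b[0])
print(torch.allclose(out, c * a * b))  # True
```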
Given the phrase "Equivariant Graph Networks" in the paper's title, I find it very inappropriate that the paper only includes the analysis of a single model, TFN.
[2nd Update] I have just found a way to add replies and hope the authors notice it. It seems everyone can directly edit the second reply to interact.
R3. New Guesses and Suggestions
Following Reviewer V1gi's suggestion, I have formed a new conjecture. If I understand correctly, the quantity here represents a scalar coefficient associated with the $l$-th-degree steerable features, which is actually an invariant.
Taking the last line of Eq. (7) as an example, the situation is as follows: the first two quantities are scalars combined by Hadamard division, so the result is still a scalar. The latter pair is combined by a Hadamard product with the summation sign (mentioned by the authors in their reply) omitted, so it is actually an inner product of the two, similar to Eq. (6) in HEGNN and Eq. (11) in GotenNet.
So, the whole formula should actually be described as follows:
$$\mathcal{R}\big(Y^{(l_2)}(\vec{\boldsymbol{r}}_{ij})\big)=\sum_{l_3}\left(\frac{1}{3}\,\mathcal{R}\big(\boldsymbol{M}_{i\to j}^{(l_3)}\big)\oslash \boldsymbol{M}_{i\to j}^{(l_3)}\right)^{\top}\left\langle\frac{\partial \boldsymbol{M}_{i\to j}^{(l_3)}}{\partial\tilde{\boldsymbol{v}}_{ij}^{(l_2)}},\;\tilde{\boldsymbol{v}}_{ij}^{(l_2)}\right\rangle,$$
where $\tilde{\boldsymbol{v}}_{ij}^{(l_2)}$ is an $l_2$-th-degree steerable feature and the other quantities are all scalars.
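For concreteness, here is my own minimal numerical check of this reading (restricted to degree-1 features, where rotation acts as an ordinary 3x3 matrix; all variable names are stand-ins, not the authors' code). The Hadamard-division part yields a scalar coefficient, and the inner product of two degree-1 steerable features is unchanged under rotation, so the whole expression is rotation-invariant:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: degree-1 (l = 1) steerable features rotate as ordinary 3D vectors.
v = rng.normal(size=3)      # plays the role of the steerable feature v~
dM = rng.normal(size=3)     # plays the role of one row of dM/dv~
R_M = rng.normal(size=5)    # relevance scores of the message (scalars)
M = rng.normal(size=5)      # the corresponding message coefficients (scalars)

# Random rotation matrix with det = +1, built via QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Q *= np.sign(np.linalg.det(Q))

# Collapse the per-channel scalar coefficients into one (a simplification).
coeff = (R_M / (3 * M)).sum()          # Hadamard division of scalars -> scalar
before = coeff * (dM @ v)              # scalar coefficient times inner product
after = coeff * ((Q @ dM) @ (Q @ v))   # rotate both degree-1 features

print(np.isclose(before, after))       # True: the score is rotation-invariant
```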
If so, the problem is clear, though I am not sure whether the authors see it the same way. The main cause of such a large misunderstanding seems to be that the authors omitted the crucial summation sign and that the notation system is very confusing. I think it must be revised.
If my conjecture is correct, this is probably a promising direction. Since this formula is essentially the core formula of spherical-scalarization models, it may also partly explain why these newer models are superior. I suggest the authors add more discussion along these lines.
I would raise my rating if the authors later confirm that my conjecture is correct and revise their formula.
Thank you for your response. We provide further clarification regarding your concerns below.
About Equivariance.
Thank you for the suggestion. We have provided a notebook demo to illustrate both the equivariance of the model and the invariance of EquiGX. The notebook is available at the following link: [https://anonymous.4open.science/r/EquiGX_Demo-5797/TestEquivariance.ipynb].
We would like to clarify that the model's equivariance is not learned from data, but is inherently guaranteed by the model's architectural design. In the demo, we use a randomly initialized model and compare the outputs for inputs that are randomly rotated. The difference in outputs is less than 1e-6, which is a negligible numerical difference. This confirms that the model is intrinsically equivariant, rather than learning equivariance from the dataset. Additionally, we demonstrate that the node importance scores computed by EquiGX are invariant to input rotations. In other words, the node importance scores remain unchanged when random rotations are applied to the inputs. This invariance is also verified on randomly initialized models, further avoiding any potential reliance on learned equivariance or invariance. Finally, we would like to emphasize that, when computing the node importance scores, we sum the relevance scores across the feature dimensions. Thus, the use of Hadamard multiplication and division does not break the invariance of the node importance scores. Overall, our experimental verification strongly supports the model's equivariance and EquiGX's invariance.
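For readers who cannot open the notebook, the following self-contained sketch built on the public e3nn API (in the spirit of our demo; the irreps and shapes here are illustrative) shows the style of the check: a randomly initialized tensor product layer is exactly equivariant by construction, with no training involved.

```python
import torch
from e3nn import o3

torch.manual_seed(0)

irreps_in = o3.Irreps("2x0e + 1x1o")   # scalar and vector feature channels
irreps_sh = o3.Irreps("1x1o")          # degree-1 spherical harmonics
irreps_out = o3.Irreps("2x0e + 1x1o")

# Randomly initialized tensor product layer: no training is involved.
tp = o3.FullyConnectedTensorProduct(irreps_in, irreps_sh, irreps_out)

x = irreps_in.randn(1, -1)             # random node features
r = torch.randn(1, 3)                  # random relative position
R = o3.rand_matrix()                   # random rotation matrix

sh = o3.spherical_harmonics(irreps_sh, r, normalize=True)
sh_rot = o3.spherical_harmonics(irreps_sh, r @ R.T, normalize=True)

D_in = irreps_in.D_from_matrix(R)      # rotation acting on input features
D_out = irreps_out.D_from_matrix(R)    # rotation acting on output features

# Rotating the inputs matches rotating the output up to numerical precision.
assert torch.allclose(tp(x @ D_in.T, sh_rot), tp(x, sh) @ D_out.T, atol=1e-5)
```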
About More Models for Analysis.
We are open to extending EquiGX to additional backbone models. We also agree that invariant models can be seen as a special case of tensor products. However, the widely adopted implementations of these models are typically not based on tensor products via the e3nn library. Incorporating them as backbones would require re-implementing existing models using tensor product operations, which is a non-trivial task. In addition, training these models is time-consuming and would demand substantial engineering effort, making it infeasible within the limited timeframe of the rebuttal period. We hope the reviewer understands these constraints.
Explaining equivariant GNNs for 3D geometric graphs is challenging due to their complex architectures and the difficulty of handling positional data. Existing explainability (XAI) methods mainly focus on 2D graphs and struggle to adapt to equivariant GNNs. To address this, this paper introduces EquiGX, a novel explanation framework based on Deep Taylor decomposition, extending layer-wise relevance propagation to spherical equivariant GNNs. It decomposes prediction scores and back-propagates relevance through each layer, providing detailed insights into how geometric and positional data influence model decisions. Experiments on synthetic and real-world datasets show that EquiGX effectively identifies critical geometric structures and outperforms existing baselines, offering significantly improved explanations for equivariant GNNs.
Update after rebuttal
The authors have resolved most of my questions. I will maintain my score.
Questions for Authors
See the weaknesses.
Claims and Evidence
Yes
Methods and Evaluation Criteria
Yes
Theoretical Claims
See the weaknesses.
Experimental Design and Analyses
See the weaknesses.
Supplementary Material
Appendix
Relation to Broader Scientific Literature
See the summary.
Essential References Not Discussed
No
Other Strengths and Weaknesses
Strengths: Refer to the summary.
Weaknesses: This paper presents an insightful analysis of LRP and its extension to equivariant GNNs. This is an interesting topic, and the framework might be helpful for other research on equivariant GNNs. However, several details could be further refined for clarity and completeness.
- Figure 1 is confusing. It is recommended to explicitly illustrate the cube motif and highlight the data points with high importance within the motif. Additionally, the meaning of "the other areas" in the figure caption should be clarified.
- Clarification on importance calculation. The authors separate the layer-wise relevance into TP-based and invariant feature-based propagation. Are these components eventually merged? If so, what is the weighting scheme between equivariant and invariant features? Providing a clear explanation would enhance understanding.
- Figures 3 and 4, while visually compelling, are somewhat unclear. For instance, in Figure 3, the authors state: "Since the sample is an all-beta protein, ideally the β-sheets should have high importance scores, i.e., be red in the figure." However, it is unclear which part of the visualization corresponds to β-sheets. Additionally, the LRI-Bern method results in an entirely red visualization; does this indicate it is the most effective? The authors should provide a more precise explanation.
Other Comments or Suggestions
No.
We are very glad Reviewer ffwb had a positive initial impression and appreciate your constructive comments. We provide pointwise responses below.
Figure 1 is confusing.
We apologize for the confusion. In Figure 1, the ground truth is shown in the upper-left corner. Nodes forming the cube motif are highlighted in red. Across all examples, node positions remain consistent, so better alignment with the ground truth reflects a more accurate explanation. In the figure caption, “the other areas” refers to nodes that do not belong to the cube motif. In other words, these nodes form a pyramid shape. We also highlight the nodes in blue in the ground truth.
Clarification on importance calculation.
As described in Section 3.2, for a single TP-based message passing layer, we decompose the relevance score of each message into three components, including the hidden features, the directional part, and the distance part. The relevance score attributed to the hidden features is further backpropagated toward the input. The relevance scores of the directional and distance components do not propagate beyond this layer. Within each message passing layer, the scores assigned to edge direction and edge distance capture their respective contributions to the final prediction. To compute overall contributions of edge directions and edge distances, we sum these directional and distance relevance scores across all layers. To obtain the final importance score for each input node, we combine the relevance score of the node itself with the relevance scores of its connected edges. In other words, the node-level explanation is calculated as the sum of a node’s own relevance plus half of the relevance of its neighboring edges.
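As a minimal illustration of this final aggregation step (a sketch with hypothetical array-based interfaces, not our actual implementation):

```python
import numpy as np

def node_importance(node_rel, edge_rel, edges):
    """Combine each node's own relevance with half of the relevance of each
    incident edge. `node_rel` holds one score per node, `edges` is a list of
    (i, j) index pairs, and `edge_rel` holds one score per edge."""
    scores = np.asarray(node_rel, dtype=float).copy()
    for (i, j), r in zip(edges, edge_rel):
        scores[i] += 0.5 * r   # half of the edge relevance to each endpoint
        scores[j] += 0.5 * r
    return scores
```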
Figures 3 and 4, while visually compelling, are somewhat unclear. It is unclear which part of the visualization corresponds to β-sheets.
Beta sheets typically appear as flat, arrow-shaped ribbons pointing in a specific direction, often aligned side-by-side to form sheet-like structures. In Figure 3, the regions with arrows represent the beta-sheets. For reference, the red arrows in this Wikipedia image illustrate their typical appearance. We will revise the manuscript to clarify this in the figure captions and text.
Additionally, we would like to point out that the entirely red visualization produced by the LRI-Bern method does not necessarily imply higher effectiveness. An ideal explanation should selectively highlight the beta-sheets in red, while keeping less relevant regions such as the coiled structure near the top and the thin, string-like lines in blue.
The paper proposes a new method, EquiGX, to explain equivariant GNNs. The method is based on Deep Taylor Decomposition and extends it to perform layer-wise relevance propagation for spherical equivariant GNNs. Specifically, the authors propose new rules to attribute tensor product operations. The experiments show that EquiGX performs better than existing baselines.
Questions for Authors
- The experiments focus on graph-level prediction tasks. Why not include node-level or edge-level predictions?
- Why isn’t ActsTrack included in the qualitative evaluation?
Claims and Evidence
The claims are well supported.
Methods and Evaluation Criteria
The synthetic and real-world datasets make sense and evaluation criteria are common choices for model explainability.
Theoretical Claims
Yes, I checked the equations and did not find any issues.
Experimental Design and Analyses
The experiments are well designed, and the paper includes qualitative and quantitative evaluations.
Supplementary Material
No
Relation to Broader Scientific Literature
The paper extends LRP to 3D equivariant GNNs and proposes a new LRP rule for tensor product operations, which can be useful for other works that use equivariant GNNs or tensor product operations.
Essential References Not Discussed
Not found.
Other Strengths and Weaknesses
Strengths:
- The method is well-founded theoretically, and the derivation is clear.
- The writing is easy to follow.
- Including real-world protein datasets in the experiments is a nice addition.
Weaknesses:
- The method is limited to TP-based models.
- The source code is not provided.
- The limitations are not discussed.
Other Comments or Suggestions
The Shapes and Spiral Noise datasets are very similar, and their visualizations look a bit repetitive. It might be better to show the visualization of one and move the other to the appendix. The same applies to the protein datasets.
We are very glad Reviewer KEDe had a positive initial impression and appreciate your constructive comments. We provide pointwise responses below.
The method is limited to TP-based models.
We admit that our method focuses on spherical equivariant GNNs (TP-based models) and leaves generalization to other architectures as future work. However, we would like to highlight that TP-based equivariant networks represent a broad and widely adopted family of models in the AI-for-science domain. Developing explainability techniques for these models is important, as it can enhance our understanding of existing architectures and potentially lead to their improvement.
The limitations are not discussed.
Thanks for the suggestion. One limitation of our method is its current focus on tensor product operations, specifically within spherical equivariant GNNs. While this represents a subset of equivariant GNNs, it is a significant and impactful area. Consequently, our method does not yet extend to all types of equivariant GNNs. However, this focus allows for a deep and thorough exploration of tensor product operations, and we plan to generalize our approach to other architectures in future work.
The Shapes and Spiral Noise datasets are very similar, and their visualizations look a bit repetitive. It might be better to show the visualization of one and move the other to the appendix.
Thanks for the suggestion. We will update the visualizations during the camera-ready stage.
The experiments focus on graph-level prediction tasks. Why not include node-level or edge-level predictions?
Evaluating explainability in equivariant networks is inherently challenging, largely due to the difficulty of obtaining meaningful and verifiable ground truth for explanations. In this paper, we address this by constructing synthetic datasets and carefully selecting real-world datasets where such ground truths are available. However, edge-level tasks like force prediction present additional complexity. While molecular dynamics can provide an approximate reference, the underlying rationale behind force prediction remains an open research question. Thus, we leave the exploration of node-level and edge-level tasks to future work.
Why isn’t ActsTrack included in the qualitative evaluation?
ActsTrack has an average of over 100 nodes per graph, and unlike proteins, it lacks a well-established visualization standard. Additionally, the ground-truth explanations are not immediately obvious, making clear and interpretable visualizations more challenging. As a result, including ActsTrack in the qualitative evaluation could lead to confusing visuals. We plan to include it in the paper later with more refined visualization methods.
The source code is not provided.
We plan to release our code during the camera-ready stage.
The paper introduces EquiGX, an explanation method for (3D) equivariant GNNs. Existing graph explanation methods mainly focus on 2D GNNs and struggle to explain 3D GNNs. EquiGX extends Deep Taylor decomposition to derive layer-wise relevance propagation (LRP) rules for equivariant GNNs. The method is evaluated on synthetic and real-world datasets, outperforming existing explanation methods.
Update after rebuttal
After reading the other reviewers' comments and engaging in discussions with them, I now recognize that this work makes a highly significant contribution toward explaining tensor-product GNNs. Its impact goes beyond simply clarifying how TP-GNNs make decisions—it also provides insight into why TP-GNNs outperform invariant GNNs. The perspectives offered in this paper represent a fundamental advancement in the field, with the potential for broader impact. In light of these considerations, I have raised my score from 3 to 4.
Questions for Authors
- Equivariant GNNs, including the ones referred to in the paper (e.g., Equiformer, SE(3)-Transformer), are typically evaluated on widely used molecular datasets such as QM9 and MD17. Is there a specific reason why experiments on these datasets were not conducted?
- In the last paragraph of Sec. 2, the authors claim that LRI treats the model as a black box and overlooks the equivariance of the model. Can the authors further explain this part? There are two methods, LRI-Gaussian and LRI-Bern, in the LRI paper; which one does this refer to? What does it mean for them to overlook the equivariance of the model?
- In both Equations 2 and 5, shouldn’t the be in ?
Claims and Evidence
Yes, the main claims are all clear with convincing evidence.
Methods and Evaluation Criteria
Yes, the method and evaluation criteria make sense.
Theoretical Claims
I checked the derivations of the layer-wise decomposition in the main paper. I believe that they are correct.
Experimental Design and Analyses
Yes, I read through all of Sec. 4 (Experiments), and I don't see any issues.
Supplementary Material
I briefly checked the Appendix.
Relation to Broader Scientific Literature
It is highly relevant to the field of AI for science.
Essential References Not Discussed
Not that I’m aware of.
Other Strengths and Weaknesses
Strengths:
- Significance: Though explanation methods for (2D) GNNs have advanced over the past few years, explaining equivariant 3D GNN models remains challenging. This is the first few works that target 3D GNN explanation. I appreciate this paper’s contribution in extending Deep Taylor decomposition to equivariant architectures, addressing the limitations of prior graph explanation methods for 3D geometric graphs.
- Relevance: Understanding how equivariant GNNs utilize geometric and positional information is crucial for their application in AI for science, especially in molecular modeling.
Weaknesses:
- Technical Contribution: This paper is an extension of the LRP framework that has been widely applied for vision tasks. Although tensor-product networks are different from normal MLPs or CNNs, the technical contribution of this work could be limited.
- Limited Backbone Models: Only one model, TFN, is tested. However, this could be due to the extensive computational requirements of equivariant GNNs.
Other Comments or Suggestions
- The overall presentation of this work could be improved, particularly by addressing issues such as the excessive white space around some equations.
- I do think it would be beneficial to put more effort into discussing and presenting convincing arguments as to why existing (2D) GNN methods are not suited for explaining 3D-equivariant GNNs.
Minor issues:
- Line 205, not
- Line 151, “One way to compute such relevance is to the whole neural network as a mathematical function and use the first-order term from the Taylor series expansion.” Should be “… is to treat the whole neural network as…”
We are very glad Reviewer V1gi had a positive initial impression and appreciate your constructive comments. We provide pointwise responses below.
This paper is an extension of the LRP framework that has been widely applied for vision tasks. Although tensor-product networks are different from normal MLPs or CNNs, the technical contribution of this work could be limited.
We would like to humbly clarify our contribution. We acknowledge that EquiGX extends LRP rules to spherical equivariant graph neural networks. However, we would like to point out that there are currently no established LRP rules for spherical equivariant GNNs, and developing propagation rules for new architectures is a challenging task. For instance, references [1] and [2] extend LRP rules to transformers, and references [3] and [4] extend them to traditional 2D GNNs. Additionally, different approaches to applying Taylor decomposition can yield different LRP rules, such as the rule and the rule. Furthermore, spherical equivariant GNNs, despite having aggregation operations similar to those of traditional GNNs, fundamentally differ due to their reliance on tensor product operations. We are the first to explicitly consider tensor product operations in developing new LRP rules for spherical equivariant GNNs; a generic single-layer LRP rule is sketched below for orientation.
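For readers less familiar with LRP, the following is a minimal sketch of the standard epsilon-rule for a single linear layer (a textbook rule given here only for orientation, not our TP-specific rule; all names are illustrative):

```python
import torch

def lrp_epsilon_linear(W, b, x, R_out, eps=1e-6):
    """LRP epsilon-rule for a linear layer z = W x + b: the output relevance
    R_out is redistributed to the inputs in proportion to their contributions."""
    z = W @ x + b                          # pre-activations, shape (out,)
    s = R_out / (z + eps * torch.sign(z))  # stabilized per-output factors
    return x * (W.T @ s)                   # per-input relevance, shape (in,)
```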
Limited Backbone Models: Only one model, TFN, is tested.
Given the widespread use of powerful spherical equivariant GNNs, understanding their key components, i.e. the Tensor Product (TP), is a fundamental step toward explainability in equivariant models. In this paper, we focus on TFN, the most classical and representative spherical equivariant GNN. Our method is general and can be extended to other spherical equivariant GNNs. However, due to the high computational cost, we leave such extensions to future work.
Why existing (2D) GNN methods are not suited for explaining 3D-equivariant GNNs?
Thanks for the question. Most existing 2D GNN explanation methods are developed for graph data that capture only topological relationships. However, 3D equivariant GNNs go beyond topology by incorporating rich geometric information, such as interatomic distances, angles, and torsion angles, which are essential for many tasks in scientific domains. These models are specifically designed to respect spatial symmetries, enabling consistent predictions under 3D transformations. 2D explanation methods typically do not consider atomic positions in space or spatial relationships such as angles and distances. As a result, they often fail to capture critical geometric cues. For instance, a small change in a node's 3D position can lead to significant shifts in angles or torsion angles, which may affect the model’s prediction. Since 2D methods are not sensitive to these changes, they tend to provide limited or misleading explanations when applied to 3D-equivariant GNNs.
Equivariant GNNs are conducted on widely-used molecular datasets such as QM9 and MD17. Is there a specific reason why the experiments on these datasets are not conducted?
While Equivariant GNNs are commonly evaluated on datasets like QM9 and MD17, these datasets lack ground-truth explanations. The underlying rationale for quantum chemical properties and energy predictions is still an open research question, with no definitive consensus. As a result, these datasets are not well-suited for evaluating explainability methods. Therefore, in this paper, we construct synthetic datasets and carefully select real-world datasets where meaningful and verifiable explanations can be obtained.
What does it mean that LRI treats the model as a black box and overlooks the equivariance of the model?
Both LRI-Gaussian and LRI-Bern, like other perturbation-based XAI methods, treat the model as a black box. This means they rely solely on input-output behavior without requiring access to the model’s internal parameters or gradients. In other words, they evaluate how changes in the input influence the output, without considering the internal structure of the model.
Besides, LRI-Gaussian overlooks the equivariance of the model, because the learned Gaussian noise is not equivariant. Specifically, when the input point cloud is rotated, the learned noise does not necessarily rotate accordingly. As a result, the explanations can become sensitive to the choice of the input’s reference frame, which is undesirable for explaining equivariant models.
In both Eq 2 and 5, should it be () in ?
We apologize for the typo. Yes, it should be .
Reference
[1] Transformer interpretability beyond attention visualization. CVPR 2021.
[2] XAI for transformers: Better explanations through conservative propagation. ICML 2022.
[3] Higher-order explanations of graph neural networks via relevant walks. TPAMI 2021.
[4] Relevant walk search for explaining graph neural networks. ICML 2023.
Thank you for your response. Most of my concerns have been resolved. After reading the other reviewers' comments and the discussion, I believe this work can be an important step toward explaining tensor product networks, especially toward understanding why they are better than invariant networks.
I have raised my score.
This paper introduces EquiGX, a novel method for explaining equivariant graph neural networks (GNNs). Building on Deep Taylor Decomposition, the authors extend layer-wise relevance propagation to spherical equivariant GNNs by proposing new attribution rules for tensor product operations. Experimental results demonstrate that EquiGX outperforms existing baselines.
Reviewers commend the paper for its novelty, significance, and clarity. While some questions remain regarding equivariance, technical contribution, baseline comparisons, and experimental results, all reviewers unanimously support acceptance after reviewing the authors' rebuttal. I concur with their assessment and recommend acceptance.