Quadruple Attention in Many-body Systems for Accurate Molecular Property Predictions
Abstract
Reviews and Discussion
The paper introduces MABNet, a machine-learning model that aims to improve molecular property predictions by explicitly modeling four-body interactions. The model is designed to be computationally efficient and maintains E(3) equivariance, ensuring consistency with physical symmetries. Experiments on the MD22 and SPICE datasets suggest that MABNet outperforms existing methods in predicting molecular energies and forces.
Questions for Authors
N/A
Claims and Evidence
The claims are clear:
- Introduction of an attention layer to model four-body terms in molecular conformations.
- Competitive performance on two relevant benchmarks.
Given the importance of many-body interactions in determining molecular properties, the proposed architecture is sensible, and the reported experimental results are reasonable.
Methods and Evaluation Criteria
The evaluation is based on well-established molecular property benchmarks (MD22 and SPICE), using metrics like Mean Absolute Error (MAE) for energy and force predictions. The comparison against multiple baselines ensures a reasonable assessment.
Theoretical Claims
The paper is methodological and does not present any new theoretical claims.
Experimental Design and Analysis
The experimental design follows a standard energy-plus-forces regression paradigm. However, I did not find any information concerning:
- The model hyperparameters for the baselines. It would be good to compare models that are "close" either in terms of the number of weights or in terms of the embedding dimension. (Table 8 discusses the embedding dimension of MABNet only, right?)
- Timings for the other baselines.
- Error bars. These could come from multiple independent initializations of the networks; if that is too costly, reporting the error variance on the test set would also suffice. This would help in understanding whether the superior performance of MABNet is statistically significant (a minimal sketch of such a computation follows this list).
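For concreteness, error bars of the kind requested could be computed as below; this is a minimal sketch in which synthetic arrays stand in for real model outputs:

```python
import numpy as np

# Synthetic stand-ins: per-seed test predictions, shape (n_seeds, n_test),
# and shared targets, shape (n_test,). Real values would come from the runs.
rng = np.random.default_rng(0)
targets = rng.normal(size=500)
preds = targets + rng.normal(scale=0.05, size=(3, 500))

# MAE per independently initialized run, then mean and std across runs.
maes = np.abs(preds - targets).mean(axis=1)
print(f"MAE = {maes.mean():.5f} +/- {maes.std(ddof=1):.5f}")
```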
Supplementary Material
I checked mainly the additional details on the experiments.
Relation to Prior Work
The work builds on advances in geometric deep learning, molecular property prediction, and many-body physics. It references prior methods such as SE3Set, VisNet, and QuinNet. The "Related Work" section provides enough context for understanding the paper.
Missing Important References
N/A
Other Strengths and Weaknesses
Strengths:
- Novel direct modeling of four-body interactions.
- Strong empirical performance on molecular benchmarks.
- Theoretical grounding in many-body physics and equivariance.
Weaknesses:
- Limited discussion of computational cost.
- Limited discussion of the baseline hyperparameters.
- While the paper is well written overall, I found the equations in Section 3 difficult to understand: the authors only explain the q and k variables without defining the "s", "a", or "v" variables.
Other Comments or Suggestions
Throughout the paper, architectures based on message passing and attention are presented as two distinct options. However, they are not mutually exclusive, and it is not clear where MABNet stands. As far as I understood, MABNet can be classified as a specific form of graph attention, and giving clearer context on the fundamental paradigms used to model molecules would improve the paper.
We sincerely thank the reviewer for the detailed review and insightful comments. Below, we provide point-by-point responses.
Limited discussion of computational cost.
R: We have added a detailed discussion of computational cost in the revised manuscript, along with timing comparisons against key baselines, to provide a better understanding of the computational cost of our method relative to others. The updated results will be included in the supplementary material.
| Methods | Memory Used | Training Time (mins) | Inference Time (mins) |
|---|---|---|---|
| ViSNet | 15962 MiB | 7.12 | 38.95 |
| MABNet (3-body) | 16542 MiB | 8.05 | 42.65 |
| MABNet (4-body) | 17400 MiB | 9.39 | 49.79 |
Limited discussion of baseline hyper-parameters.
R: We have added a detailed description of the hyperparameters used for baselines in the revised manuscript and supplementary materials. Additionally, we have ensured that the comparisons involve models with similar embedding dimensions or comparable parameter counts.
| Parameter | MABNet | ViSNet | Equiformer |
|---|---|---|---|
| emb. dim. | 256 | 256 | 128 |
| layers | 9 | 9 | 6 |
| Param. | 12.3M | 9.8M | |
Missing error bars to assess the statistical significance of MABNet’s performance.
R: We have conducted multiple independent runs of MABNet with different initializations and computed error variances on the test set. These error bars are now included in the revised manuscript to demonstrate the statistical significance of MABNet’s performance.
| Molecule | Energy | Forces |
|---|---|---|
| Ac-Ala3-NHMe | 0.05336 ± 0.00117 | 0.07725 ± 0.00126 |
| AT-AT | 0.06375 ± 0.00183 | 0.07323 ± 0.00043 |
Equations in Section 3 are difficult to follow.
R: We apologize for the lack of clarity in our equations. In the revised manuscript, we have provided clear definitions of all variables, including "s," "a," and "v," along with detailed explanations of their roles in the four-body attention mechanism. This should make Section 3 more accessible to readers.
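As a generic illustration of the typical roles such variables play in equivariant attention layers (an assumption based on common conventions in models like ViSNet, not the paper's actual equations: s for invariant scalar features, v for equivariant vector features, a for attention weights):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

n, d = 8, 16
rng = np.random.default_rng(0)
s = rng.normal(size=(n, d))      # invariant (scalar) atom features
v = rng.normal(size=(n, 3, d))   # equivariant (vector) atom features
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

q, k = s @ Wq, s @ Wk
a = softmax(q @ k.T / np.sqrt(d), axis=-1)  # attention weights from scalars
s_out = a @ (s @ Wv)                        # scalar update
v_out = np.einsum('ij,jcd->icd', a, v)      # vector update mixes atoms only,
                                            # so rotational equivariance holds
print(s_out.shape, v_out.shape)
```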
We hope these revisions address your concerns and further strengthen our manuscript. Thank you again for your valuable feedback.
This paper introduces MABNet, an attention-based framework for molecular property prediction that explicitly models four-body interactions. The authors argue that current Graph Neural Networks (GNNs) and Transformers, while promising, struggle to directly capture complex many-body interactions, often relying on approximations. MABNet addresses this by enabling direct communication between atomic quartets through a new "quadruple attention" mechanism. This mechanism incorporates geometric features and E(3) equivariance, and employs spatial sparsity and quantum-inspired pruning to manage computational cost. The key algorithmic innovation is the many-body attention module within equivariant message-passing blocks. The paper claims state-of-the-art performance on challenging molecular property prediction benchmarks, MD22 and SPICE.
Questions for Authors
- The term "many-body" is broad. Can the authors clarify whether MABNet focuses only on dihedral/torsional interactions, or whether it is intended to capture broader quantum many-body effects?
- How does the performance of MABNet scale with molecular size, particularly for systems larger than those in the MD22 and SPICE datasets?
- Could you clarify the relationship between your implementation and VisNet? Specifically, which modules/sections in the methodology are completely novel, and which are the same as previous work?
- Your method appears to be specifically designed for many-body systems, whereas VisNet is a more general framework. Could you discuss the trade-offs of your more specialized approach versus a more general framework that could potentially be adapted for many-body interactions?
Claims and Evidence
The claims of state-of-the-art performance on the MD22 and SPICE benchmarks are presented, but the evidence is undermined by the lack of transparency regarding the codebase's origin. If the performance is primarily achieved by leveraging and slightly modifying existing VisNet code without proper attribution, then the claim of a "state-of-the-art contribution" is significantly weakened. Most of the codebase is adapted from VisNet without any proper citation.
The central issue is not whether the numbers are good, but whether the claimed algorithmic contribution is genuine and properly attributed. If the performance gains are largely due to the underlying VisNet framework, which is not acknowledged or properly cited in the methodology, then the claims are misleading and unsupported in terms of originality. A convincing demonstration of independent algorithmic contribution, beyond incremental modification of VisNet, is severely lacking.
Methods and Evaluation Criteria
The evaluation criteria (MD22, SPICE benchmarks, MAE metrics) are standard and appropriate if the presented method were genuinely novel. However, the methodology section is deeply flawed by its complete omission of any mention or citation of VisNet or any related works, despite strong indications that the MABNet codebase is substantially derived from it. This omission is a critical ethical and scientific failing. The evaluation, therefore, becomes questionable because it's unclear what is truly being evaluated: a novel method, or a minor modification of VisNet presented as a major breakthrough.
In addition, the authors did not compare against any baselines in the efficiency analysis, which is crucial in many cases.
Lastly, it is generally recommended to use rMD17 with its standard splits rather than MD17 with random splits. Since the codebase provided by the authors includes the rMD17 dataset file, the dataset choice is questionable, and it is unclear why the authors chose MD17.
Theoretical Claims
The authors claim the network performs E(3)-equivariant message passing but do not provide any theoretical proof of this.
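Short of a formal proof, a standard way to substantiate such a claim is a numerical check: rotate and translate the input and verify that the energy is unchanged and the forces co-rotate. The sketch below uses a toy pairwise energy as a stand-in for the model (a minimal illustration under that assumption, not the paper's code; reflections could be tested analogously):

```python
import numpy as np
from scipy.spatial.transform import Rotation


def energy_fn(pos):
    """Toy pairwise energy standing in for the model's energy head."""
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    i, j = np.triu_indices(len(pos), k=1)
    return np.sum(1.0 / d[i, j])


def forces_fn(pos, eps=1e-5):
    """Numerical forces F = -dE/dpos via central differences."""
    f = np.zeros_like(pos)
    for i in range(pos.shape[0]):
        for k in range(3):
            p, m = pos.copy(), pos.copy()
            p[i, k] += eps
            m[i, k] -= eps
            f[i, k] = -(energy_fn(p) - energy_fn(m)) / (2 * eps)
    return f


pos = np.random.default_rng(0).normal(size=(5, 3))
R = Rotation.random().as_matrix()
t = np.array([1.0, -2.0, 0.5])

# E(3) check: energy is invariant, forces co-rotate with the input frame.
assert np.isclose(energy_fn(pos @ R.T + t), energy_fn(pos))
assert np.allclose(forces_fn(pos @ R.T + t), forces_fn(pos) @ R.T, atol=1e-6)
print("numerical E(3) equivariance check passed")
```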
Experimental Design and Analysis
The ablation studies in Table 4, while showing some performance changes, are insufficient to demonstrate genuine algorithmic novelty independent of the underlying VisNet architecture or of the different modules discussed in the methodology section. There is no ablation study that removes individual modules discussed in the methodology section. The computational cost analysis is also questionable, as it lacks baselines.
Supplementary Material
I reviewed the appendices included in the manuscript, which contain additional details on feature embedding, datasets, baselines, hyperparameters, and computational costs.
Relation to Prior Work
The paper discusses its relation to prior work in molecular property prediction, Graph Neural Networks, Transformers, and methods for modeling many-body interactions. However, the literature review could be more comprehensive. The authors should provide a broader overview of molecular property prediction works, even if they aren't used for direct comparison in the experiments. This would give readers a clearer understanding of the current landscape of the domain.
Missing Important References
The paper discusses its relation to prior work in molecular property prediction, Graph Neural Networks, Transformers, and methods for modeling many-body interactions. However, the literature review could be more comprehensive. The authors should provide a broader overview of molecular property prediction, equivariant GNN, and geometric graph Transformer works, even if they are not used for direct comparison in the experiments. This would give readers a clearer understanding of the current landscape of the domain and where this article fits in the big picture. Review [1] gives a great overview of geometric GNNs; several references are missing from this article, especially the equivariant GNNs and geometric graph Transformers listed in its Fig. 4.
[1] Han, Jiaqi, et al. "A survey of geometric graph neural networks: Data structures, models and applications." arXiv preprint arXiv:2403.00485 (2024).
Other Strengths and Weaknesses
- The computational cost analysis in Table 10 shows that their 4-body attention approach is significantly slower than 3-body attention (2.35 it/s vs. 3.93 it/s for training), which could limit its applicability to very large molecular systems.
- The paper could benefit from a more detailed discussion of how their approach might scale to even higher-order interactions (e.g., 5-body or beyond).
- The presentation of the method could be improved, with more consistent notation and clearer symbolic systems throughout the paper.
- The paper narrows the use of the network to many-body systems, while VisNet offers a more general framework that could potentially be adapted for many-body systems if needed.
Other Comments or Suggestions
- A visualization of the attention patterns learned by their model could provide additional insight into how it captures four-body interactions.
- The presentation can be further improved. It is very hard to read and understand this article because of the presentation. There are notation inconsistencies throughout the manuscript (e.g., "Sotfmax" vs. "Softmax", inconsistent bold/non-bold for learnable matrices).
- For MD17 evaluations, the authors should use the standard predefined splits of the rMD17 dataset rather than random splits of the MD17 dataset to ensure a fair comparison with existing methods. The rMD17 dataset, a revised version of MD17, addresses the numerical noise present in the original data and is thus preferred over the original MD17 dataset.
We sincerely thank the reviewer for the detailed feedback and constructive suggestions to improve our manuscript. Briefly, the reviewer has two main concerns: (1) the contribution of MABNet compared to VisNet, and (2) missing details for baselines, computational efficiency, and visualization of attention patterns. Below, we summarize the reviewer’s questions and provide point-by-point responses.
The contribution of MABNet compared to VisNet
We respectfully disagree with the reviewer’s assertion and would like to clarify the difference between MABNet and VisNet.
VisNet’s primary contribution is the runtime geometry calculation (RGC) mechanism, which extracts angular and dihedral torsion information using vectors that connect the target node to its neighbors. This mechanism relies on pairwise interactions (message passing) to simulate higher-order (three-body, four-body) interactions.
In contrast, MABNet introduces a novel many-body attention mechanism, which directly enables communication among multiple atoms (e.g., four atoms involved in a dihedral angle). For a dihedral angle, MABNet computes attention scores directly among the four atoms involved. This allows their features to interact directly, which results in a single joint update for the features of all four atoms.
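For illustration, here is a minimal sketch of what "one joint update for all four atoms" can look like. It uses scalar features only and ignores the geometric and equivariant components; the helper names and the scoring function are hypothetical, not the authors' implementation:

```python
import numpy as np


def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()


def quadruple_attention(h, quartets, Wq, Wk):
    """Minimal sketch of joint attention over atomic quartets.

    h:        (n_atoms, d) atom features
    quartets: (n_quartets, 4) atom indices, e.g. the four atoms of a dihedral
    """
    q, k = h @ Wq, h @ Wk
    # One hypothetical joint score per quartet: the sum of all q·k pairs
    # inside it (the paper's actual scoring function may differ).
    scores = np.einsum('qad,qbd->q', q[quartets], k[quartets]) / np.sqrt(h.shape[1])
    a = softmax(scores)
    # Each quartet emits a single joint message, added to all four members,
    # so the four atoms' features are updated together.
    msg = a[:, None] * h[quartets].mean(axis=1)   # (n_quartets, d)
    out = np.zeros_like(h)
    np.add.at(out, quartets, msg[:, None, :])
    return out


h = np.random.default_rng(0).normal(size=(6, 8))
quartets = np.array([[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]])
Wq, Wk = (np.random.default_rng(s).normal(size=(8, 8)) for s in (1, 2))
print(quadruple_attention(h, quartets, Wq, Wk).shape)  # (6, 8)
```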
Additionally, we have included ablation studies in the revised manuscript to validate the impact of these modules. MABNet shows a significant improvement (a 32.9% reduction in energy MAE) over VisNet on the MD22 dataset.
| MD22 (Ac-Ala3-NHMe) | MAE |
|---|---|
| VisNet | 0.079 |
| MABNet (2-body) | 0.072 |
| MABNet (3-body) | 0.061 |
| MABNet | 0.053 |
Q1. Broader quantum many-body effects
While our current implementation of MABNet focuses on explicitly modeling four-body interactions (e.g., dihedral/torsional interactions), the underlying many-body attention mechanism is designed to be flexible and can be extended to capture broader many-body effects, including higher-order interactions (e.g., five-body or beyond).
Q2. How MABNet's performance scales with molecular size
The computational complexity of MABNet is O(N² · E), where N is the number of atoms and E is the number of edges in the molecular graph. By controlling the number of combinations considered during many-body attention, MABNet can be scaled efficiently to larger molecular systems. We will include additional experiments on larger molecular systems to demonstrate MABNet's scalability.
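As a concrete (hypothetical) illustration of how the combination count can be controlled, the sketch below enumerates dihedral-like quartets around each directed edge and prunes them with a distance cutoff; tightening the cutoff shrinks both the edge set and the neighbor lists:

```python
import numpy as np


def enumerate_quartets(pos, cutoff):
    """Build 4-body candidates (i, j, k, l) around each directed edge (j, k):
    i is a neighbor of j and l a neighbor of k, all cutoff-pruned."""
    n = len(pos)
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    nbrs = [np.flatnonzero((d[a] < cutoff) & (np.arange(n) != a)) for a in range(n)]
    quartets = []
    for j in range(n):
        for k in nbrs[j]:              # dihedral axis along each edge (j, k)
            for i in nbrs[j]:
                for l in nbrs[k]:
                    if len({int(i), j, int(k), int(l)}) == 4:
                        quartets.append((int(i), j, int(k), int(l)))
    return quartets


pos = np.random.default_rng(0).uniform(0.0, 5.0, size=(20, 3))
for cutoff in (1.5, 2.0, 2.5):
    print(f"cutoff={cutoff}: {len(enumerate_quartets(pos, cutoff))} quartets")
```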
Q3. Module Comparison between MABNet and VisNet
While MABNet shares conceptual similarities with VisNet in graph construction, feature embedding, and message-passing techniques, the core interaction mechanism is fundamentally different:
- VisNet relies on pairwise interactions and runtime geometry calculation (RGC) to indirectly simulate higher-order interactions. MABNet introduces a novel many-body attention mechanism that directly models four-body interactions by enabling simultaneous communication among multiple atoms (e.g., four atoms in a dihedral angle).
- VisNet considers pairwise interactions between all atom pairs. MABNet, by contrast, must consider combinations of atomic groups. To manage computational cost, MABNet restricts the number of many-body combinations (to O(N² · E), where N is the number of atoms and E the number of edges) while maintaining high accuracy and efficient computation.
Q4. The generalizability of MABNet
MABNet is explicitly designed to model four-body interactions, but it can also capture two- and three-body interactions effectively. Results on MD17 and QM9 show that MABNet achieves state-of-the-art performance on small molecular systems, while its ability to handle higher-order interactions gives it a significant advantage on larger molecular systems. Thus, MABNet is a general framework that is effective across molecular sizes.
S1. The visualization of the attention patterns
We agree that visualizing the attention patterns enhances interpretability. We will include visualizations of the learned attention patterns, showing how MABNet captures meaningful four-body interactions.
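One way such a figure could be produced (a sketch with synthetic weights; a real version would use MABNet's learned scores) is to project each quartet's attention weight onto the atom pairs it contains:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_atoms, n_quartets = 12, 40
quartets = np.array([rng.choice(n_atoms, size=4, replace=False)
                     for _ in range(n_quartets)])
weights = rng.dirichlet(np.ones(n_quartets))   # stand-in attention weights

# Project each quartet's weight onto the atom pairs it contains,
# giving a 2D heatmap of where the four-body attention concentrates.
heat = np.zeros((n_atoms, n_atoms))
for quartet, w in zip(quartets, weights):
    for a in quartet:
        for b in quartet:
            if a != b:
                heat[a, b] += w

plt.imshow(heat, cmap="viridis")
plt.colorbar(label="aggregated quartet attention")
plt.xlabel("atom index")
plt.ylabel("atom index")
plt.title("Pairwise projection of four-body attention (synthetic)")
plt.show()
```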
S2. The performance on the rMD17 dataset
We acknowledge the reviewer’s concern regarding the use of random splits in our original MD17 evaluation. To address this, we have re-evaluated MABNet on the rMD17 dataset. The updated results, included in the revised manuscript, show that MABNet maintains competitive performance and further validate its robustness.
| rMD17 | Azobenzene | Paracetamol |
|---|---|---|
| VisNet | 0.0156 | 0.0258 |
| MABNet | 0.0153 | 0.0252 |
S3. Efficiency analysis
We have included timing information for key baselines in the supplementary material.
| Methods | Training(mins) | Inference(mins) |
|---|---|---|
| ViSNet | 7.12 | 38.95 |
| MABNet 3-body | 8.05 | 42.65 |
| MABNet 4-body | 9.39 | 49.79 |
This paper introduces MABNet (MAny-Body interaction Network), a novel geometric attention framework designed to explicitly model four-body interactions for accurate molecular property predictions. MABNet aims to address the limitations of existing models, which often approximate these complex interactions implicitly or handle at most triplets of nodes. The paper demonstrates that MABNet achieves state-of-the-art performance on challenging benchmarks such as MD22 and SPICE.
Questions for Authors
- How does the training and inference cost compare against other geometric transformers or models?
Claims and Evidence
- Explicit modeling of 4-body interactions: there is convincing evidence.
- State-of-the-art performance on MD22 and SPICE: the results demonstrate this.
- Efficient handling of higher-order interactions: while the complexity is stated as O(|N|^2 · |E|), the ablation study in Section 5.4 and the computational cost analysis in Appendix B.2 (Table 10) indicate that the increase in computational cost compared to methods considering fewer-body interactions is manageable.
Methods and Evaluation Criteria
- The MD22 and SPICE datasets were chosen because they contain larger graphs and are well-known benchmarks in this field, which makes sense.
Theoretical Claims
N/A
Experimental Design and Analysis
- [Table 2] Compares results with previous state-of-the-art approaches on the MD22 benchmark.
- [Table 3] Compares results with previous state-of-the-art approaches on the SPICE benchmark.
- [Table 4] Ablation study demonstrating the impact of equivariance and of 4-body attention with different cutoffs.
- [Table 10] Computational cost comparison across the ablations
All of these experiments make sense.
Supplementary Material
Yes, I reviewed the Supplementary Material.
- Table 8: The hyperparameters chosen across datasets are consistent, so hyperparameter choice is not the driving factor behind the accuracies.
- Table 10: To examine the memory, training, and inference costs.
Relation to Prior Work
- Prior work has explicitly encoded triplets for attention [TGT, Graphormer]; this work extends it to 4-body attention and makes it scalable with cutoffs.
Missing Important References
- This work neither discusses the vast literature on geometric Transformers nor compares against it [1, 2, 3, 4, 5], even though none of these explicitly model 4-body attention the way this work does. Had this work benchmarked its results on the QM9 dataset, it would be easy to compare against the vast literature in this field.
- Wang, Yusong, et al. "Geometric transformer with interatomic positional encoding." Advances in Neural Information Processing Systems 36 (2023): 55981-55994.
- Kwak, Bumju, et al. "Geometry-aware transformer for molecular property prediction." arXiv preprint arXiv:2106.15516 (2021).
- Shi, Yu, et al. "Benchmarking graphormer on large-scale molecular modeling datasets." arXiv preprint arXiv:2203.04810 (2022).
- Luo, Shengjie, et al. "One transformer can understand both 2d & 3d molecular data." arXiv preprint arXiv:2210.01765 (2022).
- Choukroun, Yoni, and Lior Wolf. "Geometric transformer for end-to-end molecule properties prediction." arXiv preprint arXiv:2110.13721 (2021).
Other Strengths and Weaknesses
- A stronger comparison against prior literature on geometric transformers would solidify the importance of this work.
Other Comments or Suggestions
- There appears to be a typographical error in Equation (3) on line 19, where "Sotfmax" should likely be "Softmax".
We appreciate the reviewer’s careful review of our paper, positive feedback, and recognition of our work, particularly the novelty of our approach and the meaningfulness of our experiments. Below, we address the concerns point by point:
Comparison with Geometric Transformers
R: We have included a discussion of the geometric transformer literature [1, 2, 3, 4, 5] in the revised manuscript. Additionally, we conducted benchmarking experiments on the QM9 dataset to provide a more direct comparison. Due to time constraints, we focused on two properties. Our method achieves an MAE of 0.038, compared to Transformer-M (0.041) and Geoformer (0.040). These results demonstrate that our approach achieves state-of-the-art performance, even on smaller molecular systems where higher-order multi-body interactions are less prominent. This highlights the strength and versatility of our method.
| QM9 | Property 1 (MAE) | Property 2 (MAE) |
|---|---|---|
| Transformer-M | 0.041 | 17.5 |
| Geoformer | 0.040 | 18.4 |
| MABNet | 0.038 | 14.4 |
Training and Inference Costs
R: In Section B.2 of the supplementary material, we have included a detailed discussion of the training and inference costs of our method. Additionally, we performed a comparative analysis of inference costs against Geoformer [1] on the MD22 dataset. Our results indicate that the improvement in prediction accuracy comes at a modest cost, with no significant degradation in training speed. This demonstrates the efficiency of our approach compared to similar models.
| Methods | Memory Used | Inference Time (mins) |
|---|---|---|
| Geoformer | 14362 MiB | 33.43 |
| MABNet (3-body) | 16542 MiB | 42.65 |
| MABNet (4-body) | 17400 MiB | 49.79 |
We hope these revisions address your concerns and further strengthen our manuscript. Thank you again for your valuable feedback.
Thanks for the additional results clarifying my concern about an apples-to-apples comparison. In light of these results, I've updated my score.
Thank you for your valuable feedback. We sincerely appreciate your thoughtful comments and the constructive discussion, which have been very helpful in improving the quality of our paper.
We will incorporate these suggested results and discussions in the revised manuscript to enhance clarity and completeness.
The authors present a novel machine-learned force field based on a four-body formulation of self-attention, which goes beyond existing approaches like the triangle attention used by AlphaFold. This enables the inclusion of dihedral angles and other rich four-body geometry in these force fields. They demonstrate state-of-the-art results on MD22. Many reviewers had concerns about the additional computational overhead of the 4-body attention compared to existing methods, and it would be good to show whether comparable accuracy could be achieved simply by increasing model size. One reviewer felt that the paper did not give VisNet sufficient credit and claimed that the method in the paper was only an incremental advance; however, the authors provided ablation experiments showing a noticeable increase in performance on MD22 relative to VisNet. As long as VisNet is properly credited and compared against in the paper, I am happy to accept.