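PaperHub
Average rating: 6.3/10
Decision: Poster · 4 reviewers
Lowest rating 5 · highest 8 · standard deviation 1.1
Individual ratings: 5, 6, 8, 6
Confidence: 3.5
ICLR 2024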

Long-Short-Range Message-Passing: A Physics-Informed Framework to Capture Non-Local Interaction for Scalable Molecular Dynamics Simulation

Submitted: 2023-09-22 · Updated: 2024-04-21
TL;DR

We propose a long-short-range message-passing framework to capture non-local interactions and demonstrate state-of-the-art results, with up to 40% error reduction, for molecules in the MD22 and Chignolin datasets.

Keywords
Molecular Modeling, Quantum Chemistry, Fragmentation, Non-Local Interactions, EGNN

Reviews and Discussion

Review
Rating: 5

This paper proposes new message-passing neural networks that capture long-range interactions by generalizing equivariant graph neural networks, inspired by fragmentation-based approaches. For the implementation, BRICS fragmentation was leveraged. The authors demonstrated the effectiveness of the proposed method with a recently proposed architecture, ViSNet, and achieved considerable improvement on large-molecule benchmarks: the MD22 and Chignolin datasets. To evaluate the proposed method's applicability, the authors provided results with other EGNNs such as Equiformer, PaiNN, and ET. They show consistent improvement.

Strengths

  1. Strong empirical results. The proposed method achieved competitive performance on the MD22 and Chignolin datasets. In Table 2 for MD22, the proposed method with ViSNet achieved the best performance in various settings. More importantly, the proposed method, LSRM, shows consistent improvement compared to the vanilla model without LSRM.
  2. General applicability. The proposed method shows consistent performance improvement with various EGNN base networks.
  3. Computational efficiency. The proposed method shows great performance without significant computational overhead. Rather, the proposed method has the smallest model size and the shortest training time.
  4. Comprehensive experimental results. The paper provides many details and additional experiments.

Weaknesses

  1. Limited impact. The technical contributions of this paper have limited impact. Although the method shows overall comparable performance, it is a model manually designed from domain knowledge.
  2. Narrow perspective. Basically, the proposed method uses two different types of graphs. The problem can then be viewed as learning on heterogeneous graphs. In recent years, learning graph neural networks on heterogeneous graphs by manually or automatically transforming graphs has been actively studied. The authors may want to include the related work and potentially compare with it. Beyond long-range dependency, non-local/semantic relations have also been utilized.
  3. Far-fetched claim. In Figure 2, I do not see anything but an overall performance gap. I do not think that the graphs support the claim that LSRM helps models capture long-range dependency. All three models exhibit similar behaviors.

Questions

  1. How about inference time? It was not clear how ViSNet has more parameters than ViSNet-LSRM. Also, the training time was reported, but inference time was not available. In real-world applications, inference time is more important for deployment. I believe that shorter training time would imply shorter inference time, but it should be explicitly discussed to be more comprehensive. Fig. 2 (c) and (f) partially show the inference time for a subset of baselines.
  2. Figure 3 is confusing. The legend should be updated.
  3. Table 4 is not explicitly referred to in the text, although the paragraph for Q3 in Section 5.2 discusses the result. It will be a quick fix.
  4. Typo (?) in Proposition 4.1: Hamdard -> Hadamard product (?)

Comment

Thank you for your constructive comments and suggestions; they are exceedingly helpful for improving our paper. Our point-by-point responses to your comments are given below:

Limited impact. The technical contributions of this paper have limited impact. Although the method shows overall comparable performance, it is a model manually designed from domain knowledge.

We would highlight that LSR-MP combines ML techniques with domain knowledge, and this combination should not be considered a trivial manual design. Furthermore, molecular system modeling inherently requires a deep understanding of domain knowledge. This is evident in successful models that incorporate concepts such as the Clebsch-Gordan (CG) tensor product (which combines the angular momenta of atomic orbitals, facilitating the calculation of molecular orbital properties and electron interactions) [1] and angular descriptions (empirical molecular potentials always consider the angles between different atoms or bonds) [2]. In AI4Science, the integration of domain knowledge is not just beneficial but often crucial for achieving meaningful and accurate results. Our approach aligns with this paradigm, using the fragmentation approach prevalent in quantum chemistry to improve the model's accuracy on real-world molecular systems.

In terms of broader impact, our proposed methodology holds significant potential for application in molecular property prediction and drug design, and it represents a promising avenue for handling large biomolecular systems effectively. Furthermore, the use of equivariant networks in our approach opens up possible applications in other areas, such as point cloud analysis in computer vision. However, it is important to note that our expertise does not currently extend to the point cloud domain.

[1]: Edmonds, A. R. (1957). Angular Momentum in Quantum Mechanics. Princeton, New Jersey: Princeton University Press. 

[2]: Abell, G. C. (1985). "Empirical chemical pseudopotential theory of molecular and metallic bonding". Phys. Rev. B 31 (10): 6184–6196.

Narrow perspective. Basically, the proposed method uses two different types of graphs. Then the problem can be viewed as learning on heterogeneous graphs. In recent years, learning graph neural networks on heterogeneous graphs by manually/automatically transforming graphs has been actively studied. The authors may want to include the related work and potentially compare with them. Beyond long-range dependency, non-local/semantic relations also have been utilized.

Indeed, our proposed method incorporates learning on heterogeneous graphs. We have thoroughly reviewed related literature in this area, which is detailed in our revised appendix.

In response to your suggestion, we have conducted benchmarking against a well-known heterogeneous graph learning framework, specifically the Heterogeneous Graph Attention Network (HAN) [3]. Our comparative analysis involved substituting our proposed Distance-Dependent Bipartite Geometric Transformer with the HAN in the LSR-MP framework. The results are as follows:

| Long-range module | Force MAE (kcal/mol/Å) |
|---|---|
| Distance-Dependent Bipartite Geometric Transformer | 0.1063 |
| HAN | 1.512 |

A critical aspect we wish to highlight is that employing HAN in our framework would inevitably disrupt the symmetry, leading to a non-equivariant function. Achieving equivariance with HAN is non-trivial and presents significant challenges. To our knowledge, few works have been dedicated to modeling heterogeneous geometric graphs. Our Distance-Dependent Bipartite Geometric Transformer, in contrast, maintains the necessary symmetry and equivariance, ensuring the physical relevance and robustness of our model in capturing molecular interactions.

[3]: Wang, Xiao, et al. "Heterogeneous graph attention network." The world wide web conference. 2019.

Comment

In Figure 2, I do not see anything but an overall performance gap. I do not think that the graphs support the claim that LSRM helps models capture long-range dependency. All three models exhibit similar behaviors.

We apologize for any confusion caused by the presentation of Figure 2 and appreciate the opportunity to clarify its intended message. We highlight two key points:

  1. Increasing the cutoff induces over-squashing (Panels a, b, c):
    • In these panels, we aimed to illustrate that a straightforward approach to long-range dependency, such as increasing the cutoff radius, can lead to information over-squashing. This issue is prevalent across all EGNNs, including the short-range component of ViSNet-LSRM. Merely extending the cutoff therefore does not effectively address the long-range dependency challenge.
  2. Effectiveness of the long-range module (Panels d, e, f):
    • The key comparison here is between deepening the short-range model and adding long-range modules. Comparing horizontally, increasing the depth of the short-range model brings only marginal improvements, which we attribute to vanishing gradients induced by information over-squashing. Comparing vertically, our results show that augmenting a 3-layer short-range model with 2 long-range modules significantly outperforms an 8-layer purely local model, not only in accuracy but also in inference speed, where it is up to twice as fast.

Based on your valuable comments, we have updated the captions and added additional visual cues to better illustrate Figure 2.

How about inference time? It was not clear how ViSNet has more parameters than ViSNet-LSRM. Also, the training time was reported, but inference time was not available. In real-world applications, inference time is more important for deployment. I believe that shorter training time would imply shorter inference time, but it should be explicitly discussed to be more comprehensive. Fig. 2, (c)(f) partially show the inference time for the subset of baselines.

The inference time is attached below, and the detailed settings of each model can be found in Appendix L.2.

| Model | Force MAE (kcal/mol/Å) | # of Parameters | Inference time per molecule (ms) |
|---|---|---|---|
| ViSNet | 0.16 | 2.21M | 14.9 |
| ViSNet-LSRM | 0.13 | 1.70M | 7.45 |
| PaiNN | 0.35 | 3.20M | 7.23 |
| ET | 0.29 | 3.46M | 10.52 |
| Allegro | 0.13 | 15.11M | 295.4 |
| Equiformer | 0.13 | 3.02M | 154.52 |

  2. Figure 3 is confusing. The legend should be updated.
  3. Table 4 is not explicitly referred to in the text, although the paragraph for Q3 in Section 5.2 discusses the result. It will be a quick fix.
  4. Typo (?) in Proposition 4.1: Hamdard -> Hadamard product (?)

We are grateful for your keen observations and suggestions regarding Figure 3, Table 4, and Proposition 4.1. We have addressed these points as follows in our revised manuscript:

  1. Figure 3: Updated the legend for improved clarity.
  2. Table 4: Added an explicit reference in Section 5.2.
  3. Proposition 4.1: Corrected the typo from "Hamdard" to "Hadamard product."

Comment

Dear Reviewer GkgW,

Thank you again for your valuable feedback and comments!

As the discussion period is ending soon, we would greatly appreciate it if you could let us know whether you are satisfied with our response. We will be happy to address any remaining concerns.

Sincerely,

Authors

Review
Rating: 6

This paper proposes a novel framework for molecular dynamics simulations using machine learning. The framework, called Long-Short-Range Message-Passing (LSR-MP), combines equivariant graph neural networks (EGNNs) with fragmentation-based methods to capture both short-range and long-range interactions among atoms. The authors demonstrate that LSR-MP can achieve state-of-the-art results on large molecular datasets, while being more efficient and effective than existing methods. The authors also conduct ablation studies and analysis to validate the importance of incorporating long-range components and the advantages of using BRICS fragmentation.

Strengths

  • Problem Definition: This paper addresses a challenging and important problem of modeling large molecular systems with high accuracy and low computational cost.

  • Methodology: This paper introduces a novel message-passing framework that leverages domain knowledge from quantum chemistry to incorporate long-range interactions efficiently and effectively.

  • Performance: This paper shows significant performance improvements over existing methods on various benchmarks, while using fewer parameters and offering faster speed.

  • Generalizability: This paper illustrates the general applicability and robustness of the LSR-MP framework by applying it to different EGNN backbones and showing consistent improvements.

  • Implementation: This paper provides sufficient details on the experimental setups and on how the method is implemented.

Weaknesses

  • Novelty: I could not find any distinct weaknesses in this paper, but I might have missed one since I am not an expert in Molecular Modeling. One major concern is regarding the novelty of the proposed long-range message-passing module. As far as I know, long-range message-passing is one of the highlighted research topics in GNN literature. It would be better to discuss this line of work.

Questions

  • As far as I know, there are many long-range message-passing modules designed for graph-structured data. Can you compare the proposed method with other long-range message-passing modules?

Details of Ethics Concerns

N/A

Comment

Thank you for your constructive comments and suggestions; they are exceedingly helpful for improving our paper. Our point-by-point responses to your comments are given below:

One major concern is regarding the novelty of the proposed long-range message-passing module. As far as I know, long-range message-passing is one of the highlighted research topics in GNN literature. It would be better to discuss this line of work. As far as I know, there are many long-range message-passing modules designed for graph-structured data. Can you compare the proposed method with other long-range message-passing modules?

Regarding the novelty you mentioned, our work is distinctly inspired by quantum chemistry principles. We introduce a fragmentation-based approach that divides large molecules into smaller subsystems to model their long-range interactions more efficiently and effectively. This simple and novel approach has demonstrated superior performance when implemented on various existing EGNNs.

We have thoroughly reviewed the literature pertinent to long-range interactions, which is now included in our revised manuscript (Appendix A). A critical observation from our study is that most existing networks would disrupt symmetry (SE(3) equivariance, see Appendix J for more details) when directly applied to geometric graphs. This disruption could detrimentally impact performance. To demonstrate the efficacy of our proposed model, we performed comparative analyses against So3krates [1], a specialized model designed for long-range interactions in molecular systems that incorporates hidden-space rewiring techniques. Additionally, we benchmarked our model against an Equivariant Transformer [2], for which we employed a complete graph to represent long-range interactions. The outcomes of these comparative evaluations are detailed as follows:

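| Molecule | Metric | So3krates | ViSNet-LSRM | Equivariant Transformer (Complete) |
|---|---|---|---|---|
| Ac-Ala3-NhMe | Energy | 0.337 | 0.0654 | 4.535 |
| | Force | 0.244 | 0.0902 | 5.522 |
| DHA | Energy | 0.379 | 0.0873 | 3.354 |
| | Force | 0.242 | 0.0598 | 4.095 |
| Stachyose | Energy | 0.442 | 0.1055 | 5.531 |
| | Force | 0.435 | 0.0767 | 2.382 |
| ATAT | Energy | 0.178 | 0.0722 | 7.523 |
| | Force | 0.216 | 0.0781 | 3.125 |
| ATATCGCG | Energy | 0.345 | 0.1135 | 6.452 |
| | Force | 0.332 | 0.1063 | 3.235 |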

The table shown above demonstrates that our proposed ViSNet-LSRM provides the best accuracy among all the baseline methods.

[1]: Frank et al. So3krates: Equivariant attention for interactions on arbitrary length-scales in molecular systems. NeurIPS 2022.

[2]: TorchMD-NET: Tholke et al. Equivariant Transformers for Neural Network based Molecular Potentials. ICLR 2022.

Review
Rating: 8

This paper proposes a fragment-based approach to propagate long-range information in Graph Neural Networks (GNNs). A set of fragments is constructed using the BRICS fragmentation method, which leverages chemical structures to define well-behaved fragments. The framework operates at two levels: first, it performs a message-passing step on the short-range graph, and then uses these results to define fragment-level features that are also message-passed at the fragment-level graph. The method demonstrates competitive accuracy on the MD22 benchmark, which contains large structures, and shows some improvements over short-range baselines.
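
The step from atom-level results to fragment-level features summarized above can be illustrated with a toy pooling routine. This is only a minimal sketch under stated assumptions: sum pooling, plain Python lists, and the names `pool_atoms_to_fragments`, `atom_features`, and `fragment_ids` are all hypothetical and are not taken from the paper, which may aggregate atom features differently.

```python
def pool_atoms_to_fragments(atom_features, fragment_ids):
    """Sum-pool per-atom feature vectors into one vector per fragment.

    atom_features: list of equal-length lists of floats, one per atom.
    fragment_ids:  fragment index of each atom (hypothetical labels, e.g. from
                   a BRICS-style partition).
    Sum pooling is an assumption made for illustration only.
    """
    if len(atom_features) != len(fragment_ids):
        raise ValueError("need exactly one fragment id per atom")
    dim = len(atom_features[0])
    n_frag = max(fragment_ids) + 1
    pooled = [[0.0] * dim for _ in range(n_frag)]
    for feats, frag in zip(atom_features, fragment_ids):
        for k, value in enumerate(feats):
            pooled[frag][k] += value
    return pooled


# Four atoms with 2-dimensional features, split into two fragments.
atom_features = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]]
fragment_ids = [0, 0, 1, 1]
fragment_features = pool_atoms_to_fragments(atom_features, fragment_ids)
print(fragment_features)
```

In a two-level scheme of this kind, the pooled vectors would then serve as node features for message passing on the fragment-level graph.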

Strengths

  • The paper is clearly written and pedagogical.
  • Various ablation studies were conducted on both the long-range modules and the fragmentation methods.
  • Results show improvements over short-range baselines for large molecules.
  • Some limitations of the fragmentation methods are discussed.

Weaknesses

  • The main weakness of the method, as shared in the paper, is the definition of fragments. First, as mentioned, it is not clear how to define these for most systems, including materials. My biggest concern is the issue of smoothness. In molecular dynamics (MD) simulations, it is crucial to ensure that the predictions are smooth. I can envision many MD scenarios where such partitioning might cause problems, and I would be very interested in seeing the behavior of this model over long simulations.

  • The Equiformer and VisNet models have 4 layers with a 4Å cutoff, resulting in a receptive field of 32Å in diameter. Most of the MD22 molecules fit well within their receptive fields. While this does not detract from the improvement offered by the method, it should be clearly highlighted.

  • The importance of long-range effects beyond a 12Å radius is subtle, as large effects are usually screened in most systems. One would expect to see little difference in errors between a short-range and a long-range model. However, observables computed from MD simulations might vary significantly, as these long-range effects do not average out over long timescales. To capture these observables accurately, the most crucial factor is the decay of interactions, rather than raw accuracy. There is no reason to believe that your approach would correctly capture this decay, enabling accurate observables in these simulations. I want to stress that long-range effects in large biomolecular systems are mostly relevant for observables, and justifying the method solely through raw accuracy has limited scientific relevance.

  • One of the main challenges of long-range modeling is transferability, especially for models without typical decay behaviors. I would be very interested in seeing how this model extrapolates to longer, unseen molecules, and whether it performs better than a local model in this context. This is the only relevant setting for practical applications, particularly for modeling systems where ab initio computations are not feasible.

Questions

  • How well do you expect your model to transfer to new, unseen systems, particularly those of larger sizes?

  • Could you plot the typical decay learned by your interactions, assuming the fragmentation approach allows for it? You could try separating two molecules and plotting the energy as a function of distance. Without sensible decay, the model stands little chance of extrapolating effectively.

Comment

One of the main challenges of long-range modeling is transferability, especially for models without typical decay behaviors. I would be very interested in seeing how this model extrapolates to longer, unseen molecules, and whether it performs better than a local model in this context. This is the only relevant setting for practical applications, particularly for modeling systems where ab initio computations are not feasible.

To answer your question, we conducted three experiments to verify the extrapolation capability of our system.

  • Zero-Shot Experiment: To study transferability, we commence by adopting a zero-shot setup. We trained on molecules including ATAT, Stachyose, DHA, and Ac-Ala3-NhMe, and then tested directly on a larger molecule, ATATCGCG. The zero-shot results are shown in the table below. This experiment revealed that direct transferability without demonstration is challenging for MD22 trajectories.

| Zero-shot on ATATCGCG | Energy (kcal/mol) | Force (kcal/mol/Å) |
|---|---|---|
| ViSNet | 182.4363 | 10.93 |
| ViSNet-LSRM | 150.2343 | 10.22 |

  • Few-Shot Learning Experiment: To further explore transferability, we conducted a few-shot learning experiment, as shown in the table below. By adding a small set of 50 ATATCGCG training samples to the original zero-shot training set, our model demonstrated significant improvement over the baseline model. This suggests that with minimal additional training data, our model can adapt to new, larger molecular systems more effectively than local models.

| Few-shot on ATATCGCG | Energy (kcal/mol) | Force (kcal/mol/Å) |
|---|---|---|
| ViSNet | 2.575 | 0.7448 |
| ViSNet-LSRM | 2.167 | 0.6556 |

  • PubChem: In light of the few-shot learning experiments, we further assessed our model's capabilities using the PubChem [1] dataset, as elaborated in Appendix E.1. The dataset features heterogeneous molecules ranging in size from 40 to 100 atoms. We recalculated the dataset using the def2-TZVP basis set and the B3LYP exchange-correlation functional to improve accuracy. Notably, we included molecular forces, which remain an informative signal because the molecules were relaxed only through a semi-empirical approach. For the dataset split, we used molecules with fewer than 60 atoms (30,545 samples) for training and those with more than 60 atoms (3,455 samples) for testing. Our results are shown in the table below. Compared to the baseline ViSNet model, our model showed enhanced performance on larger, unseen molecules, underlining its robust transferability and wide applicability in diverse molecular contexts.

| PubChem | Energy | Force |
|---|---|---|
| ViSNet | 4.458 | 0.3303 |
| ViSNet-LSRM | 3.339 | 0.2395 |
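
The size-based split described for the PubChem experiment (train on smaller molecules, test on larger ones) can be sketched as follows. This is an illustrative sketch, not the authors' data pipeline: the record layout, the field name `n_atoms`, and the function `split_by_atom_count` are hypothetical, and the handling of molecules with exactly 60 atoms is an assumption because the text only mentions "fewer than" and "more than" 60.

```python
def split_by_atom_count(molecules, threshold=60):
    """Size-based split: small molecules for training, large ones for testing.

    The text describes "fewer than 60" and "more than 60" atoms, so molecules
    with exactly `threshold` atoms are left out of both sets here (assumption).
    """
    train = [m for m in molecules if m["n_atoms"] < threshold]
    test = [m for m in molecules if m["n_atoms"] > threshold]
    return train, test


# Hypothetical records; real entries would also carry coordinates, energies, and forces.
molecules = [{"id": i, "n_atoms": n} for i, n in enumerate([40, 59, 60, 61, 100])]
train, test = split_by_atom_count(molecules)
print([m["n_atoms"] for m in train], [m["n_atoms"] for m in test])
```

Splitting by size rather than at random is what makes the test set probe extrapolation to molecules larger than any seen in training.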

[1]: Nakata M, Maeda T. PubChemQC B3LYP/6-31G*//PM6 Data Set: The Electronic Structures of 86 Million Molecules Using B3LYP/6-31G* Calculations. J Chem Inf Model.

Comment

I thank the authors for their comprehensive responses. I appreciate the additional experiments, and the results are convincing, particularly regarding the decay of the interactions. I would recommend that the authors continue to plot the DFT points, as their absence might raise concerns. While I still perceive significant limitations in the applicability of the methods, as highlighted in my review, I believe the paper is a valuable contribution to the community working on machine learning force fields. Hence, I recommend its acceptance for the conference.

I would also like to emphasize an important point. The systems mentioned in reference [1], specifically cumulenes, are extreme examples of electronic delocalization. While they are intriguing, these systems do not represent the majority of long-range effects in biological systems, which are predominantly screened beyond 12 Ångströms, though they remain significant in the dynamics of these systems. The impact of long-range interactions on the dynamics of neutral molecules is most likely to manifest in the terahertz spectrum.

Comment

The Equiformer and VisNet models have 4 layers with a 4Å cutoff, resulting in a receptive field of 32Å in diameter. Most of the MD22 molecules fit well within their receptive fields. While this does not detract from the improvement offered by the method, it should be clearly highlighted.

Thank you for your valuable feedback. We will highlight this point in our revised paper.
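
The 32 Å figure quoted in the comment follows from simple arithmetic: 4 layers with a 4 Å cutoff reach at most 16 Å from an atom, which gives a 32 Å diameter. A minimal sketch of that calculation, assuming each message-passing layer extends an atom's reach by at most one cutoff radius (the function name is hypothetical, and the result is an upper bound that is only attained when a chain of neighbors exists):

```python
def receptive_field_diameter(num_layers, cutoff_angstrom):
    """Upper bound on the receptive-field diameter of a local message-passing model.

    Assumes every layer can pass information at most one cutoff radius further.
    """
    radius = num_layers * cutoff_angstrom
    return 2 * radius


diameter = receptive_field_diameter(num_layers=4, cutoff_angstrom=4.0)
print(diameter)  # 32.0
```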

The importance of long-range effects beyond a 12Å radius is subtle, as large effects are usually screened in most systems. One would expect to see little difference in errors between a short-range and a long-range model. However, observables computed from MD simulations might vary significantly, as these long-range effects do not average out over long timescales. To capture these observables accurately, the most crucial factor is the decay of interactions, rather than raw accuracy. There is no reason to believe that your approach would correctly capture this decay, enabling accurate observables in these simulations. I want to stress that long-range effects in large biomolecular systems are mostly relevant for observables, and justifying the method solely through raw accuracy has limited scientific relevance. Could you plot the typical decay learned by your interactions, assuming the fragmentation approach allows for it? You could try separating two molecules and plotting the energy as a function of distance. Without sensible decay, the model stands little chance of extrapolating effectively.

Thank you for your valuable insights and for highlighting the importance of accurately capturing long-range effects in large biomolecular systems. We appreciate your emphasis on the subtleties of these effects and their significance for observables derived from molecular dynamics (MD) simulations. To address your concerns, we would like to politely clarify that long-range effects beyond a 12 Å radius are indeed critical and cannot be neglected, as evidenced in multiple systems [1]. Raw accuracy therefore remains a valid measure in such settings.

In our study, detailed in the updated Appendix E.2, we focused on a system with significant electrostatic interactions. We studied the decay of interactions by separating two molecules in a dimer configuration and plotting the energy as a function of distance. The decay curve can be found in Appendix E.2.1 as well as https://i.ibb.co/TMnMrPG/nov-16-decay.jpg. These experiments demonstrate that, compared to a local model, our model exhibits a more appropriate decaying behavior. This finding is crucial as it suggests that our model captures the long-range interactions more effectively, addressing the key issue you raised.
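
The separation scan described above (move one molecule of a dimer away from the other and record the energy at each distance) can be sketched as below. This is a toy illustration under stated assumptions: the helper names are hypothetical, the molecules are rigid point sets, and `toy_coulomb` is a stand-in energy in arbitrary units that would be replaced by a trained model or a DFT call in practice.

```python
import math


def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) positions."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))


def place_at_distance(mol_a, mol_b, distance):
    """Rigidly translate mol_b so the centroid-centroid distance equals `distance`."""
    ca, cb = centroid(mol_a), centroid(mol_b)
    axis = [b - a for a, b in zip(ca, cb)]
    norm = math.sqrt(sum(x * x for x in axis))
    if norm == 0.0:
        raise ValueError("centroids coincide; separation axis is undefined")
    shift = [ca[i] + distance * axis[i] / norm - cb[i] for i in range(3)]
    return [tuple(p[i] + shift[i] for i in range(3)) for p in mol_b]


def toy_coulomb(mol_a, mol_b, charges_a, charges_b):
    """Stand-in energy (arbitrary units): sum of q_i * q_j / r_ij over cross pairs."""
    return sum(
        qa * qb / math.dist(pa, pb)
        for pa, qa in zip(mol_a, charges_a)
        for pb, qb in zip(mol_b, charges_b)
    )


def scan_decay(mol_a, mol_b, distances, energy_fn):
    """Return (distance, energy) pairs for a rigid separation scan."""
    return [(d, energy_fn(mol_a, place_at_distance(mol_a, mol_b, d))) for d in distances]


# Two oppositely charged point "molecules"; energy should decay as -1/d.
mol_a, q_a = [(0.0, 0.0, 0.0)], [1.0]
mol_b, q_b = [(3.0, 0.0, 0.0)], [-1.0]
curve = scan_decay(mol_a, mol_b, [4.0, 8.0, 16.0],
                   lambda a, b: toy_coulomb(a, b, q_a, q_b))
print(curve)
```

Plotting such a curve for a learned model next to reference points shows whether the model's interaction decays sensibly or is cut off abruptly at the local cutoff.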

[1]: Frank et al. So3krates: Equivariant attention for interactions on arbitrary length-scales in molecular systems. NeurIPS 2022.

Comment

Thank you for your constructive comments and suggestions; they are exceedingly helpful for improving our paper. Our point-by-point responses to your comments are given below:

The main weakness of the method, as shared in the paper, is the definition of fragments. First, as mentioned, it is not clear how to define these for most systems, including materials. My biggest concern is the issue of smoothness. In molecular dynamics (MD) simulations, it is crucial to ensure that the predictions are smooth. I can envision many MD scenarios where such partitioning might cause problems, and I would be very interested in seeing the behavior of this model over long simulations.

We appreciate your feedback. Research in recent decades has demonstrated that the smoothness issues associated with fragmentation can be mitigated to a negligible level, particularly in molecular dynamics (MD) simulations (https://doi.org/10.1021/ar500038z). In the following experiments, we aim to demonstrate that smoothness is not a practical issue in long simulations with our design.

Here, we performed an MD simulation of a relatively large molecule, AT-AT, for 20 ps, matching the duration of the AT-AT simulation in the MD22 dataset. The simulation was run in the constant-energy (NVE) ensemble. The simulations were driven by our ViSNet-LSRM and by DFT with a time step of τ = 1 fs, allowing us to analyze the vibrational spectra of the AT-AT molecule. As depicted in Fig. 5 and at https://i.ibb.co/pzKwkjX/vel-auto.jpg, the MD22 trajectory and the trajectory simulated by ViSNet-LSRM show similar vibrational spectra, albeit with minor differences in peak intensities compared to DFT. This suggests that our simulations can accurately mimic the actual vibrational modes of the molecule over relatively long time periods.

To test the model's performance over a longer time span, we ran a 200 ps NVE simulation of this molecule with the same time step of τ = 1 fs. The total energy profile is displayed in Fig. 6 and at https://i.ibb.co/pZ2Xz33/etot-vs-steps-0-200000.png. The total energy stays reasonably conserved (it fluctuates within 0.0001% of the total energy), unlike a problematic energy profile, which might drastically increase or decrease over time (fluctuating by more than 0.01% of the total energy). This implies that LSR-MP adapts well to longer simulations.
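
The conservation criterion above can be checked on any trajectory of total energies. The sketch below is an assumption-laden illustration: it defines the fluctuation as the max-min spread relative to the mean total energy, which is one plausible reading of "fluctuates within X% of the total energy", and both energy series are made-up numbers rather than values from the simulation.

```python
def energy_fluctuation_percent(total_energies):
    """Max-min spread of the total energy, as a percentage of its mean magnitude.

    This definition of "fluctuation" is an assumption for illustration.
    """
    mean = sum(total_energies) / len(total_energies)
    if mean == 0.0:
        raise ValueError("mean total energy is zero; relative spread is undefined")
    spread = max(total_energies) - min(total_energies)
    return 100.0 * spread / abs(mean)


# Made-up NVE energy traces (arbitrary units): one well conserved, one drifting.
stable = [-1000.0000, -1000.0004, -999.9996, -1000.0002]
drifting = [-1000.0, -999.9, -999.8, -999.7]

stable_pct = energy_fluctuation_percent(stable)
drift_pct = energy_fluctuation_percent(drifting)
print(stable_pct < 1e-4, drift_pct > 1e-2)
```

The two thresholds used in the final line are the 0.0001% and 0.01% figures quoted in the paragraph above.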

Review
Rating: 6

This paper proposes a new framework for machine learning of molecular dynamics, called Long-Short-Range Message-Passing (LSR-MP). LSR-MP combines short-range and long-range message passing on graphs to capture both local and non-local interactions in chemical and biological systems. LSR-MP uses a fragmentation-based method inspired by quantum chemistry to divide large molecules into smaller subsystems and model their long-range interactions efficiently and effectively. LSR-MP is implemented on top of an existing equivariant graph neural network (EGNN) called ViSNet, and achieves state-of-the-art results on large molecular datasets with fewer parameters and faster speed. LSR-MP is also applied to other EGNN models and shows consistent improvements, demonstrating its general applicability and robustness.

Strengths

The paper presents a novel and elegant framework for long-short-range message passing on graphs, which can capture both local and non-local interactions in chemical and biological systems. The paper draws inspiration from quantum chemistry and adopts a fragmentation-based method to divide large molecules into smaller subsystems and model their long-range interactions efficiently and effectively. This is a clever and creative way to overcome the computational and memory challenges of existing methods. The paper implements the proposed framework on top of an existing equivariant graph neural network (EGNN) called ViSNet, and demonstrates its superior performance on two large molecular datasets, MD22 and Chignolin. The paper shows that the proposed method achieves state-of-the-art results with fewer parameters and faster speed than the baselines, which is impressive and convincing. The paper also applies the proposed framework to other EGNN models, such as PaiNN, ET, and Equiformer, and shows consistent improvements across different architectures and datasets. This demonstrates the general applicability and robustness of the proposed framework, and suggests that it can be easily integrated with other existing methods.

Weaknesses

The paper does not provide a clear analysis of stability and error bounds, or of how sensitive the method's performance is to the choice of modules and parameters.

Questions

Q1. How do you justify the choice of the LSR-MP framework as a generalization of the existing EGNNs? What are the advantages and limitations of this framework compared to other possible ways of incorporating long-range interactions, such as attention mechanisms, continuous filters, or Fourier features?

Q2. How do you ensure the stability and accuracy of the BRICS fragmentation method for different types of molecules and systems? How sensitive is the performance of the LSR-MP framework to the choice of the fragmentation method and the number and size of the fragments?

Q3. How do you evaluate the scalability and efficiency of the LSR-MP framework for larger and more complex molecular systems? What are the computational and memory costs of the LSR-MP framework, and how do they compare with the conventional quantum chemical methods and other machine learning methods? How do you handle the trade-off between accuracy and efficiency in the LSR-MP framework?

Comment

Q2. How do you ensure the stability and accuracy of the BRICS fragmentation method for different types of molecules and systems? How sensitive is the performance of the LSR-MP framework to the choice of the fragmentation method and the number and size of the fragments?

In response to your query, we generally apply the default settings of the BRICS method to most molecules, which serves as an effective starting point for good performance. We have also developed a merging strategy to manage the fragment size, detailed in Appendix G. Additionally, we have included an analysis of the choice of fragmentation method in Table 7. The sensitivity of performance to the number and size of the fragments is shown in the table below.

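| Average Fragment Size on AT-AT-CG-CG | Fragment Number | Force MAE | Energy MAE |
|---|---|---|---|
| w/o long range | NA | 0.1563 | 0.1995 |
| k-means (8.43) | 14 | 0.1276 | 0.1246 |
| 118.00 | 1 | 0.1488 | 0.1698 |
| 23.60 | 5 | 0.1144 | 0.1384 |
| 14.75 | 8 | 0.1096 | 0.1372 |
| 8.43 | 14 | 0.1064 | 0.1135 |
| 1.00 | 118 | 0.1496 | 0.1610 |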
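
The average fragment sizes in the table above are consistent with dividing the total atom count by the number of fragments. The short sketch below reproduces that arithmetic. The total of 118 atoms is read off the table rows (one fragment of size 118.00, or 118 fragments of size 1.00), and the function name is illustrative rather than taken from the authors' code.

```python
# Illustrative check: average fragment size = total atoms / number of fragments.
# N_ATOMS = 118 is inferred from the table (1 fragment of size 118.00).
N_ATOMS = 118


def average_fragment_size(n_atoms: int, n_fragments: int) -> float:
    """Return the mean number of atoms per fragment."""
    if n_fragments <= 0:
        raise ValueError("n_fragments must be positive")
    return n_atoms / n_fragments


# Fragment counts listed in the table.
fragment_counts = [1, 5, 8, 14, 118]
avg_sizes = {k: round(average_fragment_size(N_ATOMS, k), 2) for k in fragment_counts}
print(avg_sizes)
```

In the table, both extremes (a single fragment, or one atom per fragment) give higher errors than the intermediate settings.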

How do you evaluate the scalability and efficiency of the LSR-MP framework for larger and more complex molecular systems? What are the computational and memory costs of the LSR-MP framework, and how do they compare with the conventional quantum chemical methods and other machine learning methods? How do you handle the trade-off between accuracy and efficiency in the LSR-MP framework?

To evaluate the scalability for larger and more complex molecular systems, we performed the following experiments:

  • PubChem: Our model was evaluated on the PubChem dataset (Appendix E.1), which includes molecules of varying sizes (40-100 atoms). We recalculated the dataset using the def2-TZVP basis set and the B3LYP exchange-correlation functional for accuracy, and added molecular forces as an additional training signal. We trained on molecules with fewer than 60 atoms (30,545 samples) and tested on those with more than 60 atoms (3,455 samples). The results below indicate that our model scales better than the baseline ViSNet model, particularly for larger molecules.

| PubChem | Energy | Force |
|---|---|---|
| ViSNet | 4.458 | 0.3303 |
| ViSNet-LSRM | 3.339 | 0.2395 |
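
The size-based extrapolation split described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline. The function names and the toy atom counts are hypothetical, and the handling of molecules with exactly 60 atoms is an assumption, since the response does not state it.

```python
from typing import Dict, List, Tuple


def split_by_size(atom_counts: Dict[str, int], threshold: int = 60) -> Tuple[List[str], List[str]]:
    """Train on molecules with fewer than `threshold` atoms, test on larger ones.

    Molecules with exactly `threshold` atoms are left out, following the literal
    wording of the response; how the authors treated that case is not stated.
    """
    train = [name for name, n in atom_counts.items() if n < threshold]
    test = [name for name, n in atom_counts.items() if n > threshold]
    return train, test


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional error reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Toy atom counts, made up for illustration (not taken from PubChem).
toy = {"mol_a": 42, "mol_b": 59, "mol_c": 60, "mol_d": 61, "mol_e": 98}
train_ids, test_ids = split_by_size(toy)
print(train_ids, test_ids)

# Error values copied from the PubChem table in the response.
energy_gain = relative_reduction(4.458, 3.339)
force_gain = relative_reduction(0.3303, 0.2395)
print(f"energy: {energy_gain:.1%}, force: {force_gain:.1%}")
```

With the reported numbers, the relative reduction is roughly a quarter for both energy and force errors on the larger test molecules.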

For a formal investigation on scalability and efficiency, we conducted an analysis of the computational complexity. The results are displayed in the table shown below:

| Method | Efficiency | Many-body / Long-range | Accuracy |
|---|---|---|---|
| Ab initio Methods | $O(N^a m^b)$ $[a \geq 3,\ m = 2 \sim 4]$ | ✔️ ✔️ | Most Accurate |
| Fragmentation Methods | $O(N) + O(N_{frag} (N/N_{frag})^a m^b)$ $[a \geq 3,\ m = 2 \sim 4]$ | ✔️ | 1 - 5 kcal/mol |
| EGNN | $O(N^a)$ $[a = 1 \sim 2]$ | ✔️ | Optimally < 1 kcal/mol |
| EGNN + LSR-MP | $O(N^a)$ $[a = 1 \sim 2]$ | ✔️ ✔️ | Optimally < 1 kcal/mol |

$N$: number of atoms; $m$: number of basis functions. In particular, LSR-MP scales well to larger biomolecules with minimal computational overhead when compared with quantum chemistry methods.
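
To make the scaling expressions above more concrete, the sketch below evaluates them with all constant prefactors dropped, so only the relative growth is meaningful and the outputs are not runtimes. The specific values chosen for `a`, `b`, `m`, the atom count, and the fragment count are illustrative assumptions, not figures from the paper.

```python
def ab_initio_cost(n_atoms: int, m: int, a: int = 3, b: int = 2) -> float:
    """O(N^a m^b) with the constant prefactor dropped."""
    return n_atoms**a * m**b


def fragmentation_cost(n_atoms: int, n_frag: int, m: int, a: int = 3, b: int = 2) -> float:
    """O(N) + O(N_frag (N/N_frag)^a m^b) with constant prefactors dropped."""
    return n_atoms + n_frag * (n_atoms / n_frag) ** a * m**b


def egnn_cost(n_atoms: int, a: int = 2) -> float:
    """O(N^a) with a between 1 and 2; a = 2 is the upper end of that range."""
    return n_atoms**a


# Illustrative setting (assumed): 118 atoms, 14 fragments, m = 3, a = 3, b = 2.
n, n_frag, m = 118, 14, 3
costs = {
    "ab_initio": ab_initio_cost(n, m),
    "fragmentation": fragmentation_cost(n, n_frag, m),
    "egnn": egnn_cost(n),
}
for name, value in costs.items():
    print(f"{name}: {value:.3g}")
```

Under these assumed exponents, splitting into fragments lowers the dominant term by a factor of about `n_frag**(a - 1)`, while the EGNN expression grows at most quadratically in the number of atoms.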

In considering the trade-off between accuracy and efficiency, our findings reveal that the utilization of LSR-MP (Long-Short Range Message Passing) significantly boosts accuracy while concurrently providing superior efficiency, as evidenced in Table 3. In practical implementations, we recommend using 4 to 6 layers of short-range and 2 layers of long-range message passing. This configuration provides the best balance between accuracy and efficiency.

Comment

Thank you for your constructive comments and suggestions; they are exceedingly helpful for improving our paper. Our point-by-point responses to your comments are given below:

The paper does not provide a clear analysis of stability and error bounds, or of how sensitive the method's performance is to the choice of these modules and parameters.

Thank you for your insightful feedback. We recognize the importance of a clear and detailed analysis in our manuscript. We have included a comprehensive sensitivity analysis in the main text and appendix. Specifically, it examines the impact of the number of short-range layers (Figure 2), the average fragment size (Figure 11), the fragmentation method (Table 7), the cutoffs for short-range (Figure 2) and long-range (Table 9) interactions, and the ablation of the proposed modules (Table 4). We have also added an analysis of learning rate and batch size for ATATCGCG, shown below. We hope this provides a thorough analysis of our model's stability. If there are any other parameters you are interested in, feel free to post them here.

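| Batch Size | Energy MAE | Force MAE |
|---|---|---|
| 4 | 0.1138 | 0.1065 |
| 8 | 0.1135 | 0.1064 |
| 16 | 0.1404 | 0.1224 |

| Learning Rate | Energy MAE | Force MAE |
|---|---|---|
| 1e-3 | 0.1422 | 0.1243 |
| 4e-4 | 0.1135 | 0.1064 |
| 1e-4 | 0.1492 | 0.1252 |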

How do you justify the choice of the LSR-MP framework as a generalization of the existing EGNNs? What are the advantages and limitations of this framework compared to other possible ways of incorporating long-range interactions, such as attention mechanisms, continuous filters, or Fourier features?

The strength of our model lies in its superior accuracy and efficiency in characterizing long-range interactions. However, there are limitations within our framework. For instance, when using the MD22 dataset, it became evident that traditional fragmentation methods in quantum chemistry struggle with supramolecules. This necessitated the adoption of canonical clustering methods as an alternative.

We have thoroughly reviewed the literature on long-range interactions, which is now included in our revised manuscript. It is important to clarify that directly applying most of these networks can break symmetry, which could lead to poor performance on geometric graphs. To illustrate our model's capabilities, we compared it with a model tailored for characterizing long-range interactions in molecular systems: So3krates [1], a method involving hidden space rewiring. We also compared our model with an Equivariant Transformer using a complete graph to capture long-range interactions [2]. The results of these comparisons are as follows:

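| Molecule | Metric | So3krates | ViSNet-LSRM | Equivariant Transformer (Complete) |
|---|---|---|---|---|
| Ac-Ala3-NhMe | Energy | 0.337 | 0.0654 | 4.535 |
| | Force | 0.244 | 0.0902 | 5.522 |
| DHA | Energy | 0.379 | 0.0873 | 3.354 |
| | Force | 0.242 | 0.0598 | 4.095 |
| Stachyose | Energy | 0.442 | 0.1055 | 5.531 |
| | Force | 0.435 | 0.0767 | 2.382 |
| ATAT | Energy | 0.178 | 0.0722 | 7.523 |
| | Force | 0.216 | 0.0781 | 3.125 |
| ATATCGCG | Energy | 0.345 | 0.1135 | 6.452 |
| | Force | 0.332 | 0.1063 | 3.235 |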

It is evident that our proposed ViSNet-LSRM provides the best accuracy among all the baseline methods.

[1]: Frank et al. So3krates: Equivariant attention for interactions on arbitrary length-scales in molecular systems. NeurIPS 2022.

[2]: Thölke et al. TorchMD-NET: Equivariant Transformers for Neural Network based Molecular Potentials. ICLR 2022.

Comment

Dear Reviewers,

Based on your valuable comments, we have made the following modifications to our manuscript:

  • Three transferability experiments (Appendix E.3), including zero-shot, few-shot on MD22, and PubChem extrapolation experiments, based on comments from reviewers gmV7 and cAHU.
  • Visualization of the decay of interactions (Appendix E.2.1), based on comments from reviewer gmV7.
  • Molecular dynamics simulation on ATAT (Appendix E.4), with visualization of the velocity autocorrelation function and total energy function, based on comments from reviewer gmV7.
  • Two more baselines, GemNet-OC and So3krates (Table 1), based on comments from reviewers cAHU and 4VzQ.
  • Related work on heterogeneous graph neural networks and learning from long-range dependency (Appendix A), based on comments from reviewers GkgW, cAHU, and 4VzQ.
  • Updated Figure 2 caption with additional visual cues to avoid confusion, based on comments from reviewer GkgW.
  • Improved clarity of the legend in Figure 3, based on comments from reviewer GkgW.
  • Fixed typos and reference issues, based on comments from reviewer GkgW.

We hope our modifications address your concerns. Please feel free to raise any other issues if you have additional ones.

Warmly,

Authors

AC Meta-Review

This paper proposes a new approach for modeling long-range dependencies in graph neural networks. The proposed approach combines equivariant graph neural networks (EGNNs) with fragmentation-based methods to capture both short-range and long-range interactions. The authors demonstrate state-of-the-art results on large molecular datasets, along with complementary and extensive ablations. The reviewers were generally appreciative of both the methodological and empirical contributions, while raising some concerns about the definition of fragments and the generalizability of the approach both within chemistry domains (i.e., to unseen molecules) and to non-chemistry domains.

Why Not a Higher Score

From a practical perspective, the approach seems very specialized to the target domain and may not generalize broadly to unseen molecules.

Why Not a Lower Score

Good methodology and empirical results.

Final Decision

Accept (poster)