Higher-Rank Irreducible Cartesian Tensors for Equivariant Message Passing
We introduce higher-rank irreducible Cartesian tensors and their products for equivariant message passing.
Abstract
Reviews and Discussion
This work builds upon advances in equivariant and many-body architectures for the construction of neural network potentials. It lays out the formalism to replace the conventionally used spherical tensors in higher-rank models with Cartesian tensors. Taking the MACE architecture as a reference, the authors aim to show that this is an approach competitive with SOTA models in terms of accuracy and computational efficiency.
Strengths
The exploration of new methods to learn machine learning force fields more efficiently is an active area of research, and the use of higher-rank Cartesian approaches is quite novel, in contrast to the use of the spherical basis. The authors demonstrate that they can obtain results on benchmark datasets that compete with SOTA models (both spherical and higher-rank Cartesian). The exposition of mathematical concepts is quite clear for a reader familiar with the literature. In terms of accuracy, the model is very satisfactory.
Weaknesses
Although the achievement of competitive performance compared to SOTA models is relevant enough, and the mathematical machinery is novel for the neural network potential field, I am not sure whether the authors have been able to demonstrate why their method should be chosen over MACE, for example. The comparison of inference times does not seem to favor the use of ICTP. However, I am aware that the formalism laid out in the paper allows the construction of other architectures, and that the design space of these models could be further investigated to find even more efficient models.
Questions
-
It seems that the first model to explore the idea of higher-rank Cartesian tensors was TensorNet [30], even though it is not flexible enough to incorporate arbitrary ranks and does not explicitly account for many-body interactions. I miss a more thorough discussion of their differences, especially considering that [30] seems to display performance competitive with ICTP without making use of those more sophisticated approaches. I would encourage the authors to include some discussion in this regard. Is ICTP a combination of CACE and TensorNet?
-
I do not expect the authors to address the following question with more experiments (I acknowledge the limited time frame), but: experiments have been conducted on datasets consisting of single systems. Do the authors have any reference for how the model performs on datasets with varying chemical composition?
Limitations
The authors have adequately addressed the limitations.
We thank the reviewer for their positive assessment of the manuscript and have addressed each point they raised below. All numerical results for the experiments conducted for this review are presented in Tabs. 1, 2, and 3 of the attached PDF. We also include Fig. 1, illustrating the inference time and memory consumption of ICTP and MACE as a function of L and ν.
W1: To demonstrate the advantage of a Cartesian approach, we can consider the asymptotic computational complexities of the equivariant convolutions and the product basis for ICTP and MACE; for more details, see W2.1 and Q1 by Reviewer 69Cm and Q3 by Reviewer qF29. Furthermore, Tab. 1 and Fig. 1 assess the inference time and memory consumption for varying ranks of the messages and of the tensors embedding the atomic environments; we also vary the number of contracted tensors ν. Our results in W2.1 by Reviewer 69Cm and Q3 by Reviewer qF29 demonstrate that MACE scales worse than ICTP when increasing ν. In particular, for MACE with L = 3, we could raise ν only to a maximal value of 4 since, for larger values, we obtained an OOM error on an NVIDIA A100 GPU with 80 GB. Spanning the ν-space more efficiently may be important to improve the model's expressive power for tasks requiring correlations of higher body orders. These correlations become more important when, e.g., environments are degenerate with respect to lower body orders and higher accuracy is required [B, G].
Apart from the theoretical complexity and the measured inference times, symmetric tensors allow, from an implementation perspective, for more efficient general matrix-matrix multiplication (GEMM) implementations and algorithms, which PyTorch does not yet provide. Finally, our approach can exploit the symmetry of tensors when computing forces and stresses, omitting the transpose calculation.
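To make the last two points concrete, the following minimal PyTorch sketch (our own illustration, not the paper's implementation; all names are ours) builds an irreducible rank-2 Cartesian feature from a unit bond vector, checks numerically that it is equivariant (it transforms as R T R^T under a rotation R), and shows that it equals its own transpose, which is what allows transposes to be skipped when computing forces and stresses.

```python
import torch

def irreducible_rank2(r):
    """Symmetric traceless (rank-2 irreducible) Cartesian tensor built from a unit vector r."""
    outer = torch.outer(r, r)
    return outer - torch.eye(3) * outer.trace() / 3.0

# Random unit bond vector and a random rotation (orthogonalized via QR).
r = torch.nn.functional.normalize(torch.randn(3), dim=0)
R, _ = torch.linalg.qr(torch.randn(3, 3))
if torch.linalg.det(R) < 0:  # flip the sign to obtain a proper rotation
    R = -R

T = irreducible_rank2(r)

# Equivariance: the feature of the rotated vector equals the rotated feature.
assert torch.allclose(irreducible_rank2(R @ r), R @ T @ R.T, atol=1e-6)

# Symmetry: the feature equals its transpose, so explicit transposes can be omitted downstream.
assert torch.allclose(T, T.T)
```

In an actual model the same check applies channel-wise and for higher ranks; we include it here only to make the symmetry argument tangible.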
Q1: Our approach includes TensorNet and CACE as special cases. A TensorNet-like architecture could be defined with L = 2 and ν = 2, though with equivariant convolution filters. We provide the corresponding results in Tab. 3. We found that other model configurations outperform the TensorNet-like configuration (L = 2, ν = 2) by factors of 1.4 and 1.2 in energy and force RMSEs, respectively. Tab. 2 in the manuscript demonstrates a better accuracy for ICTP by a factor of 2.3 compared to CACE. We will add this discussion to the revised manuscript.
More specifically, TensorNet uses rank-2 reducible Cartesian tensors to embed atomic environments and decomposes them into irreducible ones before computing products with invariant radial filters, i.e., before computing messages. It includes explicit 3-body features in a message-passing layer since it computes a matrix-matrix product between node features and messages. CACE uses reducible higher-rank Cartesian tensors to embed local atomic environments and their full tensor contractions (see also MTP or GM-NN) to build invariant many-body filters. Our approach uses exclusively irreducible Cartesian tensors for embedding environments, equivariant convolutions, and product basis. Thus, we do not mix irreducible representations during our many-body message passing. We use irreducible tensor products for the equivariant convolution and go beyond invariant filters. Finally, we systematically construct equivariant many-body messages.
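For readers less familiar with this distinction, the decomposition that TensorNet performs explicitly on its rank-2 features is the textbook identity (shown here only for orientation, not taken from the manuscript):

$$
T_{ij} \;=\; \underbrace{\tfrac{1}{3}\,\delta_{ij}\,T_{kk}}_{\ell = 0}
\;+\; \underbrace{\tfrac{1}{2}\left(T_{ij} - T_{ji}\right)}_{\ell = 1}
\;+\; \underbrace{\tfrac{1}{2}\left(T_{ij} + T_{ji}\right) - \tfrac{1}{3}\,\delta_{ij}\,T_{kk}}_{\ell = 2},
$$

where the trace part transforms as a scalar, the antisymmetric part as a (pseudo)vector, and the symmetric traceless part as a rank-2 irreducible tensor. ICTP, in contrast, operates on the irreducible parts directly at all ranks.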
Q2: We added results for ICTP and MACE for the large-scale Ta-V-Cr-W data set [C], which is diverse and includes 0 K energies, forces, and stresses for 2-, 3-, and 4-component systems and 2500 K properties in 4-component disordered alloys [C]. It contains 6711 configurations with sizes ranging from 2 to 432 atoms in the periodic cell. We were experimenting with this data set before the rebuttal and performed a hyperparameter search for both models to obtain suitable relative weights for energy, force, and virial losses. No configuration for MACE provides competitive accuracy for energies and forces simultaneously. Tab. 3 shows that MACE at most matches the accuracy of ICTP on forces but is typically outperformed by a factor of 2.0 on energies.
I would like to thank the authors for their rebuttal. They have addressed my concerns satisfactorily, providing extensive clarifications, particularly on how TensorNet and CACE are related to the present work, a computational complexity comparison to MACE, and additional experiments. Furthermore, they provide good additional results, both in terms of inference times and in terms of accuracy on a more diverse dataset. Given this, and after reading other reviewers' impressions and how they are addressed by the authors, I raise my score.
Dear Reviewer,
We thank you for your prompt response and for raising your score. Your feedback and suggestions have significantly improved our work.
Results are obtained by averaging over 10 independent runs. Best performances are highlighted in bold. Inference time and memory consumption are measured for a batch size of 50. Inference time is reported per atom in μs; memory consumption is provided for the entire batch in GB.
| Subsystem | | ICTP (L = 2) | ICTP (L = 1) | ICTP (L = 0) | MACE (L = 2) | MACE (L = 1) | MACE (L = 0) | ICTP (L = 2, ν = 2) | MTP | GM-NN | EAM |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TaV | E | 1.02 ± 0.27 | 1.21 ± 0.54 | 1.65 ± 1.06 | 1.72 ± 0.67 | 1.76 ± 0.53 | 2.24 ± 1.34 | 1.24 ± 0.50 | 1.94 | 1.54 | 32.0 |
| | F | 0.020 ± 0.002 | 0.022 ± 0.002 | 0.024 ± 0.002 | 0.019 ± 0.002 | 0.020 ± 0.003 | 0.022 ± 0.002 | 0.023 ± 0.002 | 0.050 | 0.029 | 0.404 |
| TaCr | E | 1.81 ± 0.29 | 1.94 ± 0.23 | 2.13 ± 0.19 | 3.26 ± 0.42 | 3.31 ± 0.44 | 4.18 ± 0.56 | 2.4 ± 0.33 | 3.26 | 2.98 | 43.6 |
| | F | 0.025 ± 0.007 | 0.024 ± 0.006 | 0.027 ± 0.005 | 0.029 ± 0.01 | 0.026 ± 0.007 | 0.028 ± 0.007 | 0.026 ± 0.006 | 0.057 | 0.038 | 0.343 |
| TaW | E | 1.75 ± 0.11 | 1.87 ± 0.14 | 2.45 ± 0.31 | 2.73 ± 0.53 | 3.21 ± 0.55 | 3.57 ± 0.48 | 2.19 ± 0.54 | 2.72 | 2.99 | 44.8 |
| | F | 0.017 ± 0.002 | 0.018 ± 0.002 | 0.020 ± 0.002 | 0.017 ± 0.002 | 0.018 ± 0.002 | 0.019 ± 0.002 | 0.018 ± 0.002 | 0.038 | 0.025 | 0.248 |
| VCr | E | 1.74 ± 1.2 | 2.52 ± 2.43 | 2.13 ± 1.24 | 2.19 ± 0.78 | 2.82 ± 1.28 | 3.11 ± 1.42 | 1.89 ± 1.27 | 2.29 | 2.82 | 44.8 |
| | F | 0.016 ± 0.002 | 0.018 ± 0.001 | 0.019 ± 0.001 | 0.016 ± 0.001 | 0.017 ± 0.001 | 0.018 ± 0.002 | 0.019 ± 0.001 | 0.036 | 0.025 | 0.270 |
| VW | E | 1.32 ± 0.2 | 1.46 ± 0.16 | 1.69 ± 0.21 | 1.9 ± 0.19 | 1.94 ± 0.23 | 2.42 ± 0.24 | 1.61 ± 0.16 | 2.50 | 2.00 | 21.3 |
| | F | 0.014 ± 0.002 | 0.015 ± 0.002 | 0.018 ± 0.003 | 0.014 ± 0.002 | 0.015 ± 0.002 | 0.017 ± 0.002 | 0.016 ± 0.002 | 0.037 | 0.023 | 0.292 |
| CrW | E | 2.18 ± 0.93 | 2.45 ± 1.53 | 2.76 ± 1.15 | 2.31 ± 1.18 | 2.84 ± 0.98 | 4.14 ± 1.38 | 3.12 ± 1.90 | 4.35 | 2.87 | 23.4 |
| | F | 0.018 ± 0.004 | 0.020 ± 0.005 | 0.024 ± 0.008 | 0.020 ± 0.009 | 0.019 ± 0.006 | 0.023 ± 0.007 | 0.022 ± 0.006 | 0.041 | 0.029 | 0.248 |
| TaVCr | E | 0.79 ± 0.08 | 0.92 ± 0.17 | 1.00 ± 0.24 | 2.26 ± 0.54 | 2.71 ± 0.66 | 3.92 ± 0.77 | 0.97 ± 0.13 | 2.43 | 1.97 | 34.1 |
| | F | 0.027 ± 0.001 | 0.029 ± 0.002 | 0.033 ± 0.002 | 0.023 ± 0.002 | 0.024 ± 0.001 | 0.028 ± 0.001 | 0.031 ± 0.002 | 0.054 | 0.045 | 0.313 |
| TaVW | E | 1.00 ± 0.2 | 0.98 ± 0.18 | 1.26 ± 0.23 | 1.8 ± 0.35 | 1.97 ± 0.44 | 2.29 ± 0.86 | 0.95 ± 0.25 | 1.67 | 1.70 | 39.6 |
| | F | 0.021 ± 0.001 | 0.022 ± 0.001 | 0.025 ± 0.001 | 0.021 ± 0.002 | 0.023 ± 0.001 | 0.026 ± 0.001 | 0.023 ± 0.001 | 0.043 | 0.034 | 0.321 |
| TaCrW | E | 1.16 ± 0.15 | 1.28 ± 0.13 | 1.58 ± 0.29 | 1.67 ± 0.38 | 1.48 ± 0.50 | 2.08 ± 0.57 | 1.24 ± 0.11 | 2.08 | 2.19 | 23.6 |
| | F | 0.022 ± 0.001 | 0.024 ± 0.001 | 0.027 ± 0.001 | 0.028 ± 0.002 | 0.030 ± 0.002 | 0.033 ± 0.002 | 0.026 ± 0.001 | 0.051 | 0.039 | 0.327 |
| VCrW | E | 1.00 ± 0.16 | 1.07 ± 0.14 | 1.37 ± 0.13 | 1.97 ± 0.5 | 2.21 ± 0.42 | 2.86 ± 0.64 | 1.10 ± 0.14 | 1.37 | 1.94 | 19.4 |
| | F | 0.018 ± 0.001 | 0.019 ± 0.001 | 0.022 ± 0.001 | 0.017 ± 0.001 | 0.019 ± 0.001 | 0.021 ± 0.001 | 0.020 ± 0.001 | 0.040 | 0.031 | 0.314 |
| TaVCrW (0 K) | E | 1.22 ± 0.07 | 1.30 ± 0.1 | 1.48 ± 0.16 | 2.26 ± 0.55 | 2.48 ± 0.46 | 3.60 ± 0.54 | 1.33 ± 0.17 | 2.09 | 2.16 | 50.8 |
| | F | 0.021 ± 0.002 | 0.022 ± 0.002 | 0.025 ± 0.002 | 0.022 ± 0.001 | 0.023 ± 0.002 | 0.027 ± 0.001 | 0.024 ± 0.002 | 0.049 | 0.037 | 0.488 |
| TaVCrW (2500 K) | E | 1.63 ± 0.07 | 1.74 ± 0.11 | 2.09 ± 0.09 | 2.22 ± 0.48 | 2.34 ± 0.59 | 3.68 ± 0.70 | 2.06 ± 0.09 | 2.40 | 2.67 | 59.4 |
| | F | 0.116 ± 0.002 | 0.121 ± 0.002 | 0.141 ± 0.003 | 0.119 ± 0.007 | 0.126 ± 0.006 | 0.150 ± 0.003 | 0.140 ± 0.002 | 0.156 | 0.179 | 0.521 |
| Overall | E | 1.38 ± 0.09 | 1.56 ± 0.21 | 1.80 ± 0.18 | 2.19 ± 0.31 | 2.42 ± 0.31 | 3.17 ± 0.28 | 1.67 ± 0.21 | 2.43 | 2.32 | 37.14 |
| | F | 0.028 ± 0.001 | 0.029 ± 0.001 | 0.034 ± 0.001 | 0.029 ± 0.001 | 0.030 ± 0.001 | 0.034 ± 0.001 | 0.032 ± 0.001 | 0.054 | 0.043 | 0.443 |
| Inference time | | 51.78 ± 1.18 | 25.09 ± 0.02 | 14.59 ± 0.01 | 29.48 ± 0.23 | 15.37 ± 0.04 | 4.43 ± 0.00 | 14.97 ± 0.09 | 17.57 | 7.25 | 0.50 |
| Memory consumption | | 36.78 ± 0.00 | 16.93 ± 0.00 | 8.48 ± 0.00 | 28.82 ± 0.00 | 13.87 ± 0.00 | 5.91 ± 0.00 | 13.15 ± 0.00 | – | – | – |
All values are obtained by averaging over five independent runs. Best performances are highlighted in bold. Inference time and memory consumption are measured for a batch size of 10. Inference time is reported per structure in ms; memory consumption is provided for the entire batch in GB.
| | ICTP (L = 1) | MACE (L = 1) | ICTP (L = 2) | MACE (L = 2) | ICTP (L = 3) | MACE (L = 3) |
|---|---|---|---|---|---|---|
| Inference times | ||||||
| ν = 1 | 0.76 ± 0.17 | 1.02 ± 0.03 | 0.87 ± 0.18 | 1.38 ± 0.04 | 0.98 ± 0.26 | 1.88 ± 0.03 |
| ν = 2 | 0.59 ± 0.20 | 1.12 ± 0.03 | 1.03 ± 0.21 | 1.52 ± 0.05 | 1.34 ± 0.08 | 2.0 ± 0.10 |
| ν = 3 | 0.79 ± 0.22 | 1.23 ± 0.03 | 1.15 ± 0.08 | 1.67 ± 0.03 | 1.85 ± 0.13 | 2.23 ± 0.03 |
| ν = 4 | 0.94 ± 0.17 | 1.41 ± 0.11 | 1.31 ± 0.21 | 1.83 ± 0.01 | 2.07 ± 0.20 | 2.53 ± 0.01 |
| ν = 5 | 1.02 ± 0.17 | 1.52 ± 0.08 | 1.72 ± 0.07 | 2.26 ± 0.03 | 3.61 ± 0.02 | OOM |
| ν = 6 | 1.00 ± 0.07 | 1.77 ± 0.05 | 1.83 ± 0.16 | 27.85 ± 0.01 | 16.76 ± 0.35 | OOM |
| Memory consumption | ||||||
| ν = 1 | 0.05 ± 0.00 | 0.04 ± 0.00 | 0.08 ± 0.00 | 0.06 ± 0.00 | 0.21 ± 0.00 | 0.13 ± 0.00 |
| ν = 2 | 0.05 ± 0.00 | 0.04 ± 0.00 | 0.08 ± 0.00 | 0.07 ± 0.00 | 0.28 ± 0.09 | 0.13 ± 0.00 |
| ν = 3 | 0.05 ± 0.00 | 0.04 ± 0.00 | 0.10 ± 0.00 | 0.08 ± 0.00 | 0.51 ± 0.03 | 0.23 ± 0.00 |
| ν = 4 | 0.05 ± 0.00 | 0.05 ± 0.00 | 0.18 ± 0.08 | 0.30 ± 0.00 | 1.07 ± 0.10 | 4.16 ± 0.00 |
| ν = 5 | 0.05 ± 0.00 | 0.07 ± 0.00 | 0.35 ± 0.07 | 3.18 ± 0.00 | 5.07 ± 0.02 | OOM |
| ν = 6 | 0.11 ± 0.09 | 0.22 ± 0.00 | 0.93 ± 0.00 | 50.49 ± 0.00 | 28.48 ± 0.03 | OOM |
All values are obtained by averaging over five independent runs. Best performances are highlighted in bold. Inference time and memory consumption are measured for a batch size of 100. Inference time is reported per structure in ms; memory consumption is provided for the entire batch in GB.
| | | ICTP (L = 2) | MACE (L = 2) |
|---|---|---|---|
| 300 K | E | 12.90 ± 1.06 | 13.50 ± 1.71 |
| | F | 29.90 ± 0.25 | 30.18 ± 0.38 |
| 600 K | E | 29.97 ± 0.94 | 31.32 ± 2.16 |
| | F | 62.80 ± 0.45 | 63.04 ± 0.73 |
| 1200 K | E | 81.03 ± 1.64 | 81.54 ± 2.02 |
| | F | 146.96 ± 1.30 | 149.44 ± 1.94 |
| Dihedral slices | E | 22.84 ± 2.96 | 28.08 ± 4.04 |
| | F | 48.82 ± 5.25 | 49.62 ± 2.92 |
| Inference time | | 2.62 ± 0.02 | 2.96 ± 0.06 |
| Memory consumption | | 32.57 ± 0.00 | 23.32 ± 0.00 |
This paper introduces the use of higher-rank irreducible Cartesian tensors as an alternative to spherical tensors for equivariant message passing in machine learning interatomic potentials. The authors clearly illustrate how to construct these tensors and their products, prove equivariance properties, and evaluate the approach empirically on several molecular datasets.
Strengths
- The mathematical foundations are clearly illustrated, with detailed explanations of how to construct irreducible Cartesian tensors and compute their products.
- The experiments on out-of-domain extrapolation, particularly on the 3BPA and acetylacetone datasets, provide valuable insights into the generalization capabilities of the proposed method.
- The paper demonstrates that irreducible Cartesian tensor-based models can achieve comparable or sometimes better performance than state-of-the-art spherical tensor models.
Weaknesses
-
The empirical evaluation is limited to relatively simple molecular datasets. The paper would be strengthened by including experiments on more challenging datasets such as MD22 or heterogeneous datasets like QM9.
-
The efficiency gain and the performance gain are not that appealing to my eye. It seems little more than “instead of using that math, you can use this math!” without a very strong theoretical justification for why to do so. The authors could have done a better job of explaining the fundamental difference/advantage of the proposed Cartesian tensors compared with spherical tensors.
Questions
- What are the core differences between this method and TensorNet? A clearer comparison would help position this work in the context of existing literature.
- Is the proposed model compatible with Hamiltonian prediction? This could be an interesting avenue for future work.
- Can the authors provide plots showing how their model scales with increasing L?
- The paper mentions "transferability" in line 225. Could the authors clarify what they mean by this term in this context?
- The claim that Cartesian tensors are advantageous to spherical tensors requires further explanation. From a representation power perspective, aren't they equivalent? Is it possible that the observed performance gains are due to hyperparameter tuning rather than fundamental differences in representation power?
Limitations
See above.
We thank the reviewer for their constructive assessment of the manuscript. We have addressed each point they raised below. All numerical results for the performed experiments are presented in Tabs. 1–3 and Fig. 1 of the attached PDF.
W1: We added a large-scale data set that aims to assess the model's performance on a varying number of atom types/components and relaxed (0 K) as well as high-temperature structures. The Ta-V-Cr-W data set is diverse and includes 0 K energies, forces, and stresses for 2-, 3-, and 4-component systems and 2500 K properties in 4-component disordered alloys [C]. It contains 6711 configurations with sizes ranging from 2 to 432 atoms in the periodic cell. We were experimenting with this data set before the rebuttal and performed a hyperparameter search for both models to obtain suitable relative weights for energy, force, and virial losses. No configuration for MACE provides competitive accuracy for energies and forces simultaneously. Tab. 3 shows that MACE at most matches the accuracy of ICTP on forces but is typically outperformed by a factor of 2.0 on energies.
We decided not to use MD22 and QM9 since they do not include variations in atom types or MD trajectories, respectively.
W2: We agree that we could better motivate our approach regarding the computational advantages compared to spherical tensors. Considering the results from Q3 and the asymptotic computational complexities of the equivariant convolutions and the product basis for ICTP and MACE (Q1 by Reviewer 69Cm), employing irreducible Cartesian tensors offers more than simply replacing one mathematical formalism with another. Indeed, ICTP is more efficient than MACE in covering larger ν (and larger L if ν is large); see also Q3.
We expect our approach to inspire the development of computationally efficient models and frameworks using strategies other than those employed for spherical ones. We also demonstrate improved efficiency by leveraging the symmetries of tensor products and coupled product features; see Tab. 2 in the manuscript. Apart from the above, symmetric tensors allow, from an implementation perspective, for more efficient general matrix-matrix multiplication (GEMM) implementations and algorithms, which PyTorch does not yet provide. Finally, our approach can exploit the symmetry of tensors when computing forces and stresses, omitting the transpose calculation.
Q1: Our approach includes TensorNet as a special case with L = 2 and ν = 2, though with equivariant convolution filters; for results, see Tab. 3. Please see also W1 and Q2 by Reviewer 69Cm. TensorNet uses reducible Cartesian tensors to embed atomic environments and decomposes them into irreducible ones before computing products with invariant radial filters. It includes explicit three-body features since it computes a matrix-matrix product between node features and messages. Our approach uses exclusively irreducible Cartesian tensors for embedding atomic environments, equivariant convolutions, and the product basis. Thus, we do not mix irreducible representations during message passing. We use irreducible tensor products for the equivariant convolution and go beyond invariant filters. Finally, we systematically construct equivariant many-body messages.
Q2: We see no hurdle to applying our approach to N-center properties such as single-particle Hamiltonian matrices; see [D]. Our approach also includes all operations required for [E] and [F].
Q3: We agree that an ablation study on L and ν would improve our work; see also W2.1 and Q1 by Reviewer 69Cm. Tab. 1 and Fig. 1 show the inference time and memory consumption for varying ranks of the messages and of the tensors embedding the atomic environments; we also vary the number of contracted tensors ν. Indeed, ICTP outperforms MACE for most parameter values. In particular, ICTP allows spanning the ν-space more efficiently and thus improves the models' expressive power for tasks requiring correlations of higher body orders. These correlations become more important when, e.g., environments are degenerate with respect to lower body orders and higher accuracy is required [B, G].
ICTP is also more computationally efficient for most combinations of L and ν. Note that moderate maximal ranks are sufficient for most applications in physics; see, e.g., [A]. For neighborhood orientations with n-fold symmetries, however, at least rank-n tensors may be required [B]. These symmetries are typically lifted in atomistic simulations. Fig. 1 shows the combinations of L and ν for which Cartesian models remain advantageous. These results agree with our complexity analysis in Q1 by Reviewer 69Cm.
Q4: When we mention an interatomic potential's transferability, we refer to its ability to accurately predict energies, forces, and stresses for crystal structures, temperatures, and stoichiometries on which it was not trained; see [77] in the manuscript.
Q5: We agree that spherical and irreducible Cartesian tensors (also reducible ones) should have comparable expressive power for a fixed rank since both are related through a linear transformation; see [40, 49, 59-61] in the manuscript. However, our results in Q3 demonstrate that MACE scales worse than ICTP when increasing ν, i.e., when increasing the expressive power of the model. In particular, for MACE with L = 3, we could raise ν to a maximal value of 4 since, for larger values, we obtained an OOM error on an NVIDIA A100 GPU with 80 GB. We also conducted careful hyperparameter tuning for MACE and ICTP to ensure a fair comparison, expecting similar energy and force errors due to the comparable expressive power of spherical and Cartesian tensors.
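As a concrete illustration of this linear relation (using a standard convention, not necessarily the manuscript's normalization), for rank 1 the real spherical harmonics are simply a fixed reordering and rescaling of the Cartesian components:

$$
Y_{1,-1} = \sqrt{\tfrac{3}{4\pi}}\,\frac{y}{r},\qquad
Y_{1,0} = \sqrt{\tfrac{3}{4\pi}}\,\frac{z}{r},\qquad
Y_{1,1} = \sqrt{\tfrac{3}{4\pi}}\,\frac{x}{r},
$$

and for higher ranks the two bases are connected by a fixed invertible change-of-basis matrix, so neither representation is more expressive than the other at fixed rank; the differences discussed above concern computational cost, not representational power.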
Thank you for your response.
I am curious how you trained the Ta-V-Cr-W systems. Did you train them jointly or separately? If separately, can a model trained on low temperatures extrapolate to high temperatures? How long does it take to train the system? Also, how do you handle the heterogeneity in the system when predicting the energies and forces? Can I find a reference for the dataset?
Sorry for the oversight. I was not paying so much attention to the general response. That addresses most of my concerns. However, I am still worried that the performance/efficiency improvements are not significant and that stronger baselines (such as Equiformer V2) or more standardized datasets (such as MD22/OCP, as mentioned by Reviewer 69Cm) are needed.
Dear Reviewer,
We thank you for your prompt response. We noticed that other reviewers cannot read your comment. Therefore, we will add it to allow them to follow our discussion:
> Thank you for your response.
>
> I am curious how you trained the Ta-V-Cr-W systems. Did you train them jointly or separately? If separately, can a model trained on low temperatures extrapolate to high temperatures? How long does it take to train the system? Also, how do you handle the heterogeneity in the system when predicting the energies and forces? Can I find a reference for the dataset?
We have addressed each of your questions below:
-
We train ICTP and MACE using all Ta-V-Cr-W subsystems simultaneously. In particular, as already stated in the general response, all models are trained using 5373 configurations (4873 are used for training and 500 for early stopping), while the remaining 1338 configurations are reserved for testing the models' performance. The performance is tested separately using 0 K binaries, ternaries, quaternaries, and near-melting temperature four-component disordered alloys.
-
Models trained exclusively on 0 K subsystems are not expected to generalize to near-melting temperature four-component disordered alloys. The 0 K subsystems span: (i) different atomic combinations for relaxed binary, ternary, and quaternary systems; (ii) different low-temperature ordering in the Ta-V-Cr-W family (B2 ordering, B32 ordering, random binary solid solution, BCC interface); (iii) all possible phase separations on the TaVCrW lattice (B2/B2 ordering, B2/B32 ordering, B32/B32 ordering, B2/random binary ordering, B32/random binary ordering, random binary/random binary ordering). None overlaps sufficiently in local environments with high-temperature (2500 K) disordered structures. For more details on the data set, we refer to the "Description of the data set" section of the original publication [C].
-
Training a single model requires up to 12 hours on an NVIDIA A100 GPU with 80GB.
-
We did not implement any specific step for handling the heterogeneity in the Ta-V-Cr-W data set. We only increased the mini-batch size to 32 for both models to account for energy statistics and reduced the relative weight of the force loss.
-
For the dataset, we have referenced [C] in our previous response (K. Gubaev, V. Zaverkin, P. Srinivasan et al.: Performance of two complementary machine-learned potentials in modelling chemically complex systems. npj Comput. Mater. 9, 129 (2023)). The data set can be accessed via the link: https://doi.org/10.18419/darus-3516.
We hope we have properly addressed your questions and await your response.
Dear reviewer,
We again noticed that other reviewers cannot read your comment. Therefore, we will add it to allow them to follow our discussion:
> Sorry for the oversight. I was not paying so much attention to the general response. That addresses most of my concerns. However, I am still worried that the performance/efficiency improvements are not significant and that stronger baselines (such as Equiformer V2) or more standardized datasets (such as MD22/OCP, as mentioned by Reviewer 69Cm) are needed.
We want to point out that the official review does not mention comparing to an additional baseline, such as EquiformerV2. Besides that, the baselines used in our work, such as MACE, Allegro, NequIP, TensorNet, and CACE, are current state-of-the-art models. Therefore, we do not see how adding experimental results for EquiformerV2 could further contribute to demonstrating the performance and efficiency advantages of our approach.
We evaluated ICTP using rMD17, 3BPA, and Acetylacetone, which are commonly used to evaluate other state-of-the-art models. Also, as requested by the reviewer, we included another, more challenging data set (Ta-V-Cr-W) and motivated our choice. Including yet another benchmark data set, such as MD22 or OC20/22, would not improve the value of our work. As we already explained, the suggested MD22 data set does not include variations in atom types. Thus, it would not provide additional insights beyond those we acquired with rMD17; the models' performance would again be tested on the vibrational degrees of freedom of a single molecule. Furthermore, and contrary to the Ta-V-Cr-W data set, OC20/22 does not allow a systematic evaluation of the models' performance across different crystal structures, temperatures, and stoichiometries.
We would appreciate further clarification on the reviewer's concerns regarding the models and data sets used in the manuscript and the response to the official review. We are eager to better understand the specific reasons why our current evaluation is not convincing so far.
Thank you for your prompt response. I will raise my score. However, I hope that more challenging datasets will be included in the revised version. MD22 can measure long-range effects, and OCP has numerous other baseline performances reported in the literature. To provide a more comprehensive evaluation, it would be convincing to add Equiformer V2 as a baseline to the Ta-V-Cr-W system. This should be relatively straightforward, and I am interested in understanding the relative performance of this model.
Dear reviewer,
We thank you for your feedback on our work and for raising your score.
In this work, the authors propose the higher-rank irreducible Cartesian tensor product and explore its use in the design of equivariant neural networks for scientific applications such as molecular modeling. The authors first prove that the irreducible Cartesian tensor product is equivariant to the O(3) group, and further show that higher-rank (e.g., > 2) operations can be used efficiently in models with many-body interactions. Experiments are conducted to demonstrate the effectiveness of the proposed approach.
Strengths
- The problem this work aims to tackle is of great significance in real-world scientific applications.
- The proposed approach is interesting and can potentially improve a new class of equivariant neural networks for crucial tasks.
- The paper is easy to follow.
Weaknesses
-
The motivation of this work needs to be better explained and presented. As stated in the Introduction and Related Work, the major disadvantage of spherical tensors is that they are computationally demanding, which motivates the development of Cartesian-tensor-product-based approaches. However, the authors do not adequately discuss the disadvantages of existing Cartesian-tensor-product-based approaches (e.g., inefficiency in scaling up tensor ranks), which is necessary to solidly support the motivation of this work. Besides, the lack of a comprehensive discussion and comparison between this work and existing Cartesian-tensor-product-based approaches makes the actual value of this work doubtful for readers who are not familiar with the context.
-
Experimental results are weak in demonstrating the superiority of the proposed approach:
- Lack of detailed efficiency comparisons: in this work, the authors demonstrate that the proposed irreducible Cartesian tensor product can be used for both two-body and many-body feature interactions or equivariant convolutions with better theoretically proven efficiency. However, there are no comprehensive comparisons covering different operations and different ranks L.
- Lack of large-scale experiments: all datasets used in this work (rMD17, 3BPA and Acetylacetone) have limited scales of molecular systems and number of samples. Since the proposed approach is claimed to bring benefits in efficiency, it would be necessary to verify it on larger-scale datasets such as OC20/22.
- Lack of experiments on applying iCTP for two-body operations only: MACE is mainly used to compare iCTP and spherical tensor products. However, two-body operations like equivariant feature interaction/equivariant convolution are also widely used in equivariant networks. It would be better to further verify the effectiveness of iCTP on these operations only to demonstrate its generality.
Overall, it is of great significance to design more powerful and efficient equivariant networks for real-world applications. However, several issues exist in the current submission. My recommendation is Borderline Accept, and I will carefully read the rebuttal and other reviews to decide whether to decrease or increase my scores.
Questions
-
Could you explain in detail the difference in computational complexity between iCTP and spherical tensor products for both two-body and many-body interactions?
-
Could you comprehensively compare this work with TensorNet and discuss their respective strengths and weaknesses?
Limitations
The authors have discussed the limitations.
We thank the reviewer for their valuable feedback and comments. We have addressed each of their points below and included all new results in Tabs. 1–3 and Fig. 1 of the attached PDF.
W1: We agree that discussing these points would improve our work. Indeed, our approach includes TensorNet and CACE as special cases. A TensorNet-like architecture could be defined with L = 2 and ν = 2, though with equivariant convolution filters; see Tab. 3. For the computational complexity, see W2.1 and Q1.
Unlike TensorNet and CACE, which use invariant radial (both) and many-body (CACE) filters, we propose equivariant convolution filters based on irreducible Cartesian tensors. Additionally, while TensorNet and CACE embed atomic environments using reducible tensors, we use exclusively irreducible ones and ensure that irreducible representations are not mixed during message-passing. TensorNet decomposes reducible rank-2 tensors before computing messages and builds explicit 3-body features. In contrast, our approach introduces the product basis using irreducible Cartesian tensor products and systematically constructs equivariant many-body messages. The limited message-passing mechanisms in TensorNet and CACE restrict their architectures and expressive power. Our approach enables the systematic construction of O(3) equivariant models and enhances their expressive power.
W2.1: We agree that an ablation study on L and ν would improve our work. Tab. 1 and Fig. 1 show the inference time and memory consumption for varying ranks of the messages and of the tensors embedding the environments; we also vary the number of contracted tensors ν. Indeed, ICTP outperforms MACE for most parameter values. In particular, ICTP allows spanning the ν-space more efficiently and thus improves the models' expressive power for tasks requiring correlations of higher body orders. These correlations become more important when, e.g., environments are degenerate with respect to lower body orders and higher accuracy is required [B, G].
ICTP is also more computationally efficient for most combinations of L and ν. Note that moderate maximal ranks are sufficient for most applications in physics; see, e.g., [A]. For environments with n-fold symmetries, however, rank-n tensors may be required [B]. These symmetries are typically lifted in atomistic simulations. Fig. 1 shows the combinations of L and ν for which Cartesian models remain advantageous. These results agree with our complexity analysis in Q1.
W2.2: We added a large-scale data set that aims to assess the model's performance on a varying number of atom types/components and on relaxed (0 K) as well as high-temperature structures. The Ta-V-Cr-W data set is diverse and includes 0 K energies, forces, and stresses for 2-, 3-, and 4-component systems and 2500 K properties of 4-component disordered alloys [C]. It contains 6711 configurations with sizes ranging from 2 to 432 atoms in the periodic cell. We ran a hyperparameter search for both models to obtain suitable relative weights for the energy, force, and virial losses. No configuration for MACE provides competitive accuracy for energies and forces simultaneously. Tab. 3 shows that MACE at most matches the accuracy of ICTP on forces but is typically outperformed by a factor of 2.0 on energies.
We decided not to use OC20/22 as it does not allow systematic evaluation of models' performance across different crystal structures, temperatures, and stoichiometries, which can facilitate further method development.
W2.3: Tab. 2 shows ICTP/MACE results with ν = 1, i.e., using only two-body interactions. ICTP has accuracy comparable to or better than that of MACE and inference times smaller by a factor of 1.13.
Q1: For irreducible Cartesian tensors, the computational complexity of a single tensor product is derived in Section B3. From this, we obtain the complexity of the equivariant convolution (two-body interactions) in terms of the number of edges, the tensor rank, and the number of features. The Clebsch-Gordan (CG) tensor product used in spherical models has a different complexity, from which the complexity of the spherical equivariant convolution follows. For many-body interactions, the complexities of ICTP and MACE additionally depend on the number of nodes, the number of contracted tensors ν, and the number of all possible ν-fold tensor contractions. Spherical models can use generalized CG coefficients; in MACE, one factor is further removed by restricting the parameterization to uncoupled features. However, this choice of the product basis makes MACE more computationally efficient than ICTP only for large L and small ν; see also W2.1. Tab. 2 in the manuscript also shows that leveraging the symmetry of tensor products and coupled features improves the computational efficiency of ICTP.
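For orientation, the Clebsch-Gordan coupling that spherical models evaluate for every admissible rank triple $(l_1, l_2, l_3)$ has the standard form (textbook notation, not the manuscript's):

$$
\left(A^{(l_1)} \otimes B^{(l_2)}\right)^{(l_3)}_{m_3}
\;=\; \sum_{m_1=-l_1}^{l_1}\;\sum_{m_2=-l_2}^{l_2}
C^{\,l_3 m_3}_{\,l_1 m_1\, l_2 m_2}\, A^{(l_1)}_{m_1}\, B^{(l_2)}_{m_2},
$$

so the cost grows both with the size of the double sum over magnetic indices and with the number of admissible rank triples up to the maximal rank, which is the source of the unfavorable scaling discussed above.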
Q2: ICTP includes TensorNet as a special case with L = 2 and ν = 2, though with equivariant convolution filters; see also W1 and Tab. 3. TensorNet uses reducible Cartesian tensors to embed atomic environments and decomposes them into irreducible ones before computing products with invariant radial filters. It includes explicit three-body features since it computes a matrix-matrix product between node features and messages. Our approach uses exclusively irreducible Cartesian tensors for embedding environments, equivariant convolutions, and the product basis. Thus, we do not mix irreducible representations during message passing. We use irreducible tensor products for the equivariant convolution and go beyond invariant filters. Finally, we systematically construct equivariant many-body messages.
Thank you for your clarifications. Most of my concerns have been addressed. I choose to increase my rating to 6.
Dear reviewer,
We thank you for raising your score. Your feedback and suggestions have significantly improved our work.
Dear Reviewers,
We thank you for your time and effort in evaluating the manuscript and providing positive feedback and constructive suggestions. Below, we provide a general response to your comments, with more details available in the individual discussions. We will revise the manuscript accordingly, incorporating all results produced during this review process.
All reviewers shared similar concerns:
-
We improved the motivation for our work by discussing its differences from and advantages over recent Cartesian models, like TensorNet and CACE. We clarified that we are using irreducible Cartesian tensors for all parts of our message-passing layers, ensuring that irreducible representations are not mixed. Additionally, we define equivariant convolution filters that extend beyond the invariant ones used by recent work. We construct equivariant many-body messages using the product basis. Thus, our work enables a systematic construction of message-passing architectures that are equivariant under the action of O(3) using irreducible Cartesian tensors, enhances the expressivity of the resulting models, and captures TensorNet/CACE as special cases. We support our discussion by evaluating the TensorNet-like model on the Ta-V-Cr-W data set, which we included for this review.
-
We extended our complexity analysis to further demonstrate the advantages of models based on irreducible Cartesian tensors over their spherical counterparts. We derived the asymptotic complexities of the equivariant convolutions (two-body interactions) and of the product basis (many-body interactions) for ICTP and MACE in terms of the number of nodes, the number of edges, the number of feature channels, the maximal tensor rank L, the number of contracted tensors ν, and the number of all possible ν-fold tensor contractions. The obtained complexities show that ICTP becomes more computationally efficient than MACE as ν increases. Larger values of ν become more important when, e.g., local atomic environments are degenerate with respect to lower body orders and higher accuracy is required [B, G]. We support our complexity analysis by performing an ablation study on the hyperparameters L and ν using the 3BPA data set.
-
We included the large-scale Ta-V-Cr-W data set in our analysis of the models' performance [C]. This data set includes 0 K energies, atomic forces, and stresses for binaries (i.e., two different atom types), ternaries (i.e., three different atom types), and quaternaries (i.e., four different atom types), as well as near-melting-temperature properties of four-component disordered alloys. In total, this benchmark data set contains 6711 configurations computed with density functional theory (DFT). More precisely, there are 5680 0 K structures: 4491 binary, 595 ternary, and 594 quaternary structures, along with 1031 structures sampled from molecular dynamics (MD) at 2500 K. Structure sizes range from 2 to 432 atoms in the periodic cell. All models are trained using 5373 configurations (4873 are used for training and 500 for early stopping), while the remaining configurations are reserved for testing the models' performance. The performance is tested separately using 0 K binaries, ternaries, quaternaries, and near-melting temperature four-component disordered alloys. ICTP systematically provides better energies and forces than the current state of the art (MTP/GM-NN) for the Ta-V-Cr-W data set. In contrast, for MACE, we were not able to identify a set of relative weights for the energy, force, and virial losses that consistently yields competitive results for both energies and forces.
All numerical results for the experiments conducted for this review are presented in Tabs. 1, 2, and 3 of the attached PDF. We also include Fig. 1, illustrating the inference time and memory consumption of ICTP and MACE as a function of L and ν.
We look forward to your feedback and await your opinions on our responses.
Yours Sincerely,
The authors.
References
[A] I. Grega, I. Batatia, G. Csányi et al.: Energy-conserving equivariant GNN for elasticity of lattice architected metamaterials. Int. Conf. Learn. Represent. https://arxiv.org/abs/2401.16914 (2024)
[B] C. K. Joshi, C. Bodnar, S. V. Mathis et al.: On the Expressive Power of Geometric Graph Neural Networks. Int. Conf. Learn. Represent. https://arxiv.org/abs/2301.09308 (2023)
[C] K. Gubaev, V. Zaverkin, P. Srinivasan et al.: Performance of two complementary machine-learned potentials in modelling chemically complex systems. npj Comput. Mater. 9, 129 (2023)
[D] J. Nigam, M. J. Willatt, and M. Ceriotti: Equivariant representations for molecular Hamiltonians and N-center atomic-scale properties. J. Chem. Phys. 156, 014115 (2022)
[E] O. T. Unke, M. Bogojeski, M. Gastegger et al.: SE(3)-equivariant prediction of molecular wavefunctions and electronic densities. Adv. Neural Inf. Process. Syst. https://arxiv.org/abs/2106.02347 (2021)
[F] I. Batatia, L. L. Schaaf, G. Csányi et al.: Equivariant Matrix Function Neural Networks. Int. Conf. Learn. Represent. https://arxiv.org/abs/2310.10434 (2024)
[G] S. N. Pozdnyakov, M. J. Willatt, A. P. Bartók et al.: Incompleteness of Atomic Structure Representations. Phys. Rev. Lett. 125, 166001 (2020)
This paper suggests SO(3) equivariant networks which are based on irreducible representations of tensor representations, rather than the more standard spherical representations. This is a novel and refreshing approach, though ultimately it is equivalent, in a sense, to the spherical approach. Initially, the reviewers were not convinced that this approach yielded a significant performance advantage over spherical approaches such as MACE. These concerns were somewhat alleviated by later efficiency comparisons in the rebuttal, both theoretical and empirical, that show the proposed approach has better complexity when several tensor products are computed simultaneously.
Ultimately, this is a refreshing new perspective to an important problem, and I recommend acceptance.