Kinetic Langevin Diffusion for Crystalline Materials Generation
Abstract
Reviews and Discussion
The paper presents Kinetic Langevin Diffusion for Materials (KLDM), a groundbreaking diffusion model designed for generating crystalline materials. KLDM tackles the challenge of modeling fractional coordinates on a hypertorus by introducing auxiliary Euclidean velocity variables, which eliminates the approximations inherent in Riemannian diffusion and ensures consistent training objectives. The model is tested on two key tasks, Crystal Structure Prediction (CSP) and De-novo Generation (DNG), and achieves competitive results compared to state-of-the-art models, especially on large datasets such as MP-20 and MPTS-52.
Questions for Authors
- Why does a zero-mean velocity field ensure that the noisy sample and the clean sample share the same group element? Please provide more intuition and explanation.
- The core operation of this paper is the introduction of the zero-mean velocity constraint, which ensures the consistency of the target score function. Why does the paper not provide an ablation study for this operation? I believe such an experiment could highlight the key contribution of the work.
Claims and Evidence
The claims in this paper are supported by clear evidence.
Methods and Evaluation Criteria
This article provides a fairly detailed explanation of the methodology, and the evaluation is also quite reasonable.
Theoretical Claims
The theoretical claims presented in this paper are well-founded.
Experimental Design and Analyses
This paper conducts experiments on the CSP and DNG benchmarks, comparing with the mainstream models. The results serve as evidence supporting the effectiveness of the method.
Supplementary Material
The Appendix provides sufficient supplementary information.
Relation to Broader Scientific Literature
This paper addresses the issue of inconsistent training objectives in crystal generation tasks, a problem that had not been adequately resolved in previous works such as DiffCSP [A] and EquiCSP [B]. Compared to these models, this paper introduces the Kinetic Langevin Diffusion process, inspired by TDM [C]. By incorporating an auxiliary velocity variable, the modeling of fractional coordinates is simplified, eliminating the need to work on Riemannian manifolds. Compared to TDM [C], this paper extends the method to the crystal generation task, which requires considering additional symmetries.
[A] Jiao, Rui, et al. "Crystal structure prediction by joint equivariant diffusion." NeurIPS 2023.
[B] Lin, Peijia, et al. "Equivariant diffusion for crystal structure prediction." ICML 2024.
[C] Zhu, Yuchen, et al. "Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups." ICLR 2025.
Essential References Not Discussed
The references in this paper are sufficient, requiring no further supplementation.
Other Strengths and Weaknesses
Strengths:
- This paper tackles the challenge of inconsistent training objectives in crystal generation tasks.
- The paper aligns both datasets and metrics with prior models, achieving state-of-the-art (SOTA) results in CSP tasks and comparable results in DNG tasks.
Weaknesses:
- Technically, the diffusion framework they employ is derived from TDM [C], while the backbone model is based on DiffCSP [A]. Although the paper tackles a key issue and achieves good results, the level of technical innovation appears to be somewhat limited.
Other Comments or Suggestions
See "Questions for Authors".
We thank the reviewer for their positive consideration and suggestions to improve the paper. We address questions and comments below.
De-Novo generation task results Due to the character limit, we refer to the answer provided to Reviewer MrGy on this topic.
Zero-net translation intuition We agree with the reviewer that the intuition was lacking from the submitted manuscript. Here, we provide a simple example to build intuition. We will include it in the updated version.
Consider a datapoint with a single atom in 1D, i.e. one that consists of just a single coordinate. In this simple setup, every point can be seen as a periodic translation of any other, hence also of the clean sample itself.
With no constraint on the velocity field, the forward dynamics results in noisy samples corresponding to periodic translations of the clean sample (i.e. almost surely represented by different group elements) with non-zero target scores pointing back to it. Since all of these noisy samples effectively represent the same datapoint, modelling this degree of freedom is unnecessary.
By constraining the velocity field to be zero-mean, the single velocity has to be zero for the constraint to be satisfied. When the forward dynamics is simulated, all noisy samples are then exactly the clean sample (i.e. they share the same group element), with an associated zero target.
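The single-atom argument can be checked numerically. The snippet below is a minimal sketch, not the authors' implementation: it assumes a single translate-and-wrap step with a constant velocity in place of the full forward dynamics, and the helper names (`wrap`, `move`, `remove_mean`) are hypothetical.

```python
import random

def wrap(x):
    """Map a fractional coordinate onto the unit circle [0, 1)."""
    return x % 1.0

def move(x0, velocities, t=1.0):
    """Translate each coordinate along its velocity for time t, then wrap."""
    return [wrap(x + t * v) for x, v in zip(x0, velocities)]

def remove_mean(velocities):
    """Subtract the mean velocity so that there is no net translation."""
    mean = sum(velocities) / len(velocities)
    return [v - mean for v in velocities]

rng = random.Random(0)

# Single atom in 1D: the clean sample is one fractional coordinate.
x0 = [0.3]
v = [rng.gauss(0.0, 1.0)]

# Unconstrained velocity: the noisy sample is a periodic translation of x0.
unconstrained = move(x0, v)

# Zero-mean velocity: with one atom the only admissible velocity is zero,
# so the noisy sample coincides with the clean one.
constrained = move(x0, remove_mean(v))

# With several atoms the constraint removes only the common translation.
v3 = remove_mean([0.5, -0.2, 0.9])
```

With more than one atom the relative motion between atoms is preserved; only the shared translation, which carries no information about the crystal, is projected out.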
Ablation of design choices
While the proposed constraint of the velocity field is an important part of the paper, we do not see it as being the main contribution. The core of the paper is instead the extension of the TDM framework to crystalline materials generation. To obtain fast convergence and competitive results, we find that zero initial velocities and the resulting simplified parameterization are key elements (see Figure). We show that non-zero initial velocities (see Figure) systematically lead to subpar performance.
As suggested by the reviewer, we provide an ablation of the effect of the zero net translation for zero initial velocities (see Figure) and non-zero ones (see Figure). By removing this unnecessary degree of freedom, we observe a benefit in all cases, in particular with non-zero initial velocities.
This paper proposes a new diffusion model for modeling crystalline materials.
The model is built upon a Kinetic Langevin Diffusion on the fractional coordinates, and standard Euclidean diffusion for the lattice vector and atom types (one-hot embedded).
The core contributions of the paper are:
- proposing to use a velocity noising process in the fractional coordinate diffusion to make the noising process itself invariant to fractional translations.
- proposing a simplified score parameterisation for the combined model.
- application of the model to standard benchmark tasks.
Questions for Authors
Other than the questions posed in other boxes, could the Authors discuss why they think this method appears to be working better than previous methods? Could they discuss why this does not appear to be the case for the de novo task?
Could the Authors discuss if they see impact for this work and the modeling developments outside the application area of crystal structure generation?
Claims and Evidence
For the most part the claims are well supported.
The claim I have the most issue with is the discussion in the subsection "Score parametrization and targets".
It is not clear to me that it is a problem for the conditional target scores to differ across translations of the same sample. Is this not an expected result of score matching? The point of the denoising score matching loss is to minimise the average squared error over the conditional scores to give you the score function. Perhaps I have misunderstood the issue?
The solution proposed still seems interesting to me - but in that it reduces the complexity of the learnt function by quotienting out an additional symmetry of the model, namely by the noising process.
This feature is not ablated in the experimental results, although the simplification in the parameterisation of the score function is, and I think it would be quite important to show that this procedure does indeed help with better model performance.
Methods and Evaluation Criteria
The benchmarks are in line with prior work in the area, and appear to be sufficient.
Theoretical Claims
The pieces of analysis in the paper, such as the loss derivation, are correct. There are no other claims made.
Experimental Design and Analyses
Overall the design is sound.
I have one small nitpick and that is that for some of the experiments there are error bars computed, and for others there are not. Could the authors explain why?
Additionally, in the De Novo generation task most of the methods appear to be very close together in performance on most metrics. Could the authors comment on which of these metrics is most important, and why there is little gain on this task compared to the tasks presented in Table 1?
Supplementary Material
I checked the sections regarding the derivation of the loss function and background material. No issues.
Relation to Broader Scientific Literature
The paper builds upon other work in the crystal structure generation literature, and is compared well to other baseline methods such as CDVAE, DiffCSP, EquiCSP, and FlowMM. The paper is most related to DiffCSP, where it replaces the diffusion on the fractional coordinates with the Kinetic Langevin Diffusion.
Essential References Not Discussed
None to my knowledge.
Other Strengths and Weaknesses
I appreciate the value in the combination of previous ideas presented here, and the innovations in the modelling process regarding the score function parameterisation and noising process. The results in Table 1 tasks suggest that the new model does perform better than competitors in some settings.
There are quite a few typos in the paper:
- 248R differ -> defer.
- 407R does not make sense. For example.
Other Comments or Suggestions
None
We thank the reviewer for their positive consideration and suggestions to improve the paper. Thank you for pointing out some typos; we will correct them in the updated version. We address questions and comments below.
Invariant network and equivariant target inconsistency We agree with the reviewer that in settings where no symmetries are involved, there is no issue with the denoising score matching loss. In the present case (i.e. a target distribution with translational symmetry), the problem stems from using a periodic-translation-invariant score network to match an equivariant target. Consider, for example, a noisy point cloud and a periodically translated version of it: the two datapoints are equivalent from the network's perspective, while their target scores differ. Although this averages out over the course of training and does not prevent models from learning a useful score approximation (e.g. DiffCSP and MatterGen), it is undesirable. For an intuition, refer to the reply "Zero-net translation intuition" given to Reviewer WGaL; for an alternative discussion, see also [1].
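The mismatch between an invariant network input and a translation-dependent target can be made concrete with a toy 1D example. This is a hedged sketch under stated assumptions: pairwise wrapped differences stand in for what a translation-invariant network sees, and the shortest periodic displacement back to the clean sample stands in for the direction of the conditional score (the actual target involves a wrapped normal). All function names and numerical values are hypothetical.

```python
def wrap_centered(d):
    """Wrap a displacement into the interval [-0.5, 0.5)."""
    return (d + 0.5) % 1.0 - 0.5

def pairwise_features(x):
    """Translation-invariant description: all wrapped pairwise differences."""
    return [wrap_centered(xi - xj) for xi in x for xj in x]

def displacement_target(x_t, x0):
    """Shortest periodic displacement from each noisy atom to its clean position."""
    return [wrap_centered(a - b) for a, b in zip(x0, x_t)]

# Three atoms on a 1D unit cell (illustrative numbers only).
x0 = [0.10, 0.40, 0.75]    # clean fractional coordinates
x_t = [0.15, 0.38, 0.80]   # a noisy version of x0
shift = 0.25               # a global periodic translation
x_t_shifted = [(x + shift) % 1.0 for x in x_t]

# The invariant description cannot tell the two noisy samples apart ...
features_a = pairwise_features(x_t)
features_b = pairwise_features(x_t_shifted)

# ... but the regression targets differ by exactly the global shift.
target_a = displacement_target(x_t, x0)
target_b = displacement_target(x_t_shifted, x0)
```

Here `features_a` and `features_b` agree up to floating-point error, whereas every entry of `target_a` differs from the corresponding entry of `target_b` by the shift, which is the inconsistency described in the reply.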
To further support this, we ablate the effect of the zero net translation in the next paragraph.
Ablation on zero net translation and initial zero velocity We investigate the effect of the zero net translation in terms of the match rate on the validation set of MP-20 (see Figure), where we obtain (slightly) better results by enforcing zero net translation.
We also analyse the impact of non-zero initial velocities for different variances (see Figure and Figure). We observe that zero net translation gives better results regardless of the initial distribution, and that forcing the initial velocity to be zero makes the model converge faster and reach a higher match rate on the validation set.
Error bars We agree with the reviewer that we are inconsistent, as we present error bars only for some of the experiments. For the baselines, results are taken from the previous papers. We will add error bars also for the DNG task in the updated version. For CSP@20, the omission was due to the computational cost, but we can add them in the updated version.
New metrics de novo generation task Due to the character limit, we refer to the answer provided to Reviewer MrGy on this topic.
Why does this work? We hypothesize that the added momentum on the fractional coordinates dynamics is the main driver behind the improved performance over DiffCSP. We find that zero initial velocities and velocity fields with zero net translation are critical for better results and faster convergence. Exploring different noise schedules for the velocities is an interesting direction for further improving KLDM.
Possible future applications Our model can also be applied to other tasks that involve the generation of periodic systems. A natural application can be surfaces or other lower dimensional periodic systems, e.g. 2D or 1D materials. The generation of metal-organic frameworks (MOF) is another interesting future application, with the main challenge being the additional modelling of rotational frames.
References
[1] Lin, Peijia, et al. "Equivariant diffusion for crystal structure prediction." ICML 2024.
This paper proposes a diffusion model tailored for crystalline material generation. It exploits the specific manifold structure of the data and applies the framework of the Trivialized Diffusion Model, a diffusion model that operates on Lie groups. This framework avoids Riemannian diffusion by working in the tangent space and defining the noising process on the velocity, which lies in a Euclidean space; this largely simplifies the computation. The paper demonstrates empirical performance on structure prediction and de novo generation tasks that is comparable with existing methods.
Update after rebuttal
Thank you for adding these empirical results, comparison and explanations. I have raised my score accordingly.
Questions for Authors
Can you provide a complexity analysis and comparison with the existing methods, especially the ones using Riemannian Diffusion models on manifolds? Is the matrix exponential step slow to compute, or is it simplified with trigonometric functions? Compared to existing methods (DiffCSP), are there fewer parameters?
Claims and Evidence
The claims made in the submission are supported by clear and convincing evidence.
Methods and Evaluation Criteria
Designing the diffusion process specifically for the coordinate parametrization of crystalline data structures makes sense.
For the structure prediction task, it compares RMSE with ground truth and Match Rate. For RMSE computation, I wonder if it considered the symmetry of the coordinates as described in section 2.1.
The metrics for de novo generation make sense and align with the literature.
Theoretical Claims
I checked the main ideas, the transition kernels and objectives, and they make sense. I did not look into the details of the derivation in the appendix.
Experimental Design and Analyses
The experimental designs are sound. The structure prediction and de novo generation make sense. The ablation study shows the simplified parameterization improves the accuracy of the prediction. The paper also mentions “the simplified parameterization” leads to faster convergence, but I did not see quantitative results supporting this.
Supplementary Material
I reviewed the related work and experimental details and they are well-written.
Relation to Broader Scientific Literature
This paper mainly uses the Trivialized Diffusion Model, which enables simpler training of diffusion model for data with a Lie group structure. It provides an interesting direction of designing the diffusion process specific to the algebraic and geometric structure of crystals. It has application in structure prediction and crystal generation.
Essential References Not Discussed
N/A
Other Strengths and Weaknesses
Strengths: The idea of designing the diffusion process and score-matching objective specific to the crystal problem is novel, and the application of the trivialized diffusion model for this data with the group structure is interesting.
Weaknesses: Analyses and results on complexity and convergence are missing; these would support the benefit of this approach over existing methods, especially given that it does not outperform them on de novo generation tasks.
Other Comments or Suggestions
For de novo generation, the performance is not as good as existing methods. Maybe a future direction would be adding some guidance of those desirable structures.
We thank the reviewer for their positive consideration and suggestions for improving the paper. We address their questions and comments below.
RMSE computation Similar to previous work, we compute the RMSE of the generated samples with respect to the ground truth using StructureMatcher from pymatgen, after filtering for structural and compositional validity. The algorithm internally accounts for the symmetries in the data.
Simplified parameterization To support this, we provide a plot showing the evolution of the match rate on the validation set of MP-20 (see Figure), where the simplified parameterization is shown to converge significantly faster and to higher values than the direct one.
Other design choices ablation We note that this simplified parameterization is only possible when the initial velocities are zero. To further support this design choice, we evaluate the effect of the initial velocity standard deviation on the convergence / performance of the model (see Figure). When the standard deviation is non-zero, the models do not reach convergence within the allocated budget of k epochs, as is also the case for the direct parameterization with zero initial velocities.
Architecture compared to previous models KLDM and DiffCSP are comparable in terms of the NN architecture. We use the same backbone as that of DiffCSP (and EquiCSP), with the minor difference being that our score network now receives an additional input representing the velocity, resulting in a limited increase in learnable parameters.
Matrix exponential computation and difference with other Riemannian score based models (RSBMs) The main difference with DiffCSP is that our diffusion process is defined on the velocity variables and not directly on the fractional coordinates. Our transition kernel has an additional distribution (wrapped normal + normal), resulting in the joint modelling of fractional coordinates and velocities instead of fractional coordinates only. Compared to RSBM (Algorithm 1 in [1]), which DiffCSP builds upon, our process (Eq. 12 and Algorithm 3 in the submitted manuscript) has an additional momentum term, resulting in velocities displaying some inertia.
Intuitively, this can be thought of as the difference between gradient descent (DiffCSP) and gradient descent with momentum (KLDM).
Regarding the exponential map, our implementation follows what we presented in Eq. 15 in the paper and Appendix C.2. In the case of a torus, this is simply equivalent to a translation and wrapping operation.
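To illustrate why no matrix exponential is needed on a torus, the sketch below compares a translate-and-wrap update with the same update written as a group multiplication on the unit circle. It is an illustration only: representing a fractional coordinate as the complex number exp(2πi·θ) and the function names are assumptions, not taken from the paper.

```python
import cmath
import math

def step_wrapped(theta, v, dt):
    """Translate a fractional coordinate by dt * v and wrap into [0, 1)."""
    return (theta + dt * v) % 1.0

def step_group(theta, v, dt):
    """Same update as a multiplication of unit complex numbers.

    theta is mapped to g = exp(2*pi*i*theta); the exponential of the
    Lie-algebra element dt * v is exp(2*pi*i*dt*v), and the product is
    mapped back to a fractional coordinate through its phase.
    """
    g = cmath.exp(2j * math.pi * theta)
    g_next = g * cmath.exp(2j * math.pi * dt * v)
    return (cmath.phase(g_next) / (2.0 * math.pi)) % 1.0

a = step_wrapped(0.9, 0.7, 0.5)
b = step_group(0.9, 0.7, 0.5)
```

Both routes give the same fractional coordinate up to floating-point error, so the cheaper modular arithmetic can replace the explicit exponential in this setting.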
De-Novo generation task results We acknowledge the limitations of the presented metrics, and therefore provide more meaningful discovery-related metrics in this new table. Given the timeline and the available resources, the evaluation is performed using a machine-learning interatomic potential, based on the open-source MatterGen pipeline.
For completeness, we compare ways of performing diffusion on the discrete atom types: continuous diffusion on one-hot encoded atom types (C), continuous diffusion on analog bits (C-AB), and discrete diffusion with absorbing state (D). Notably, when relying on analog bits or discrete diffusion to model the atom types, KLDM performs better than DiffCSP in terms of RMSD (lower values mean generated structures closer to relaxed ones), energy above the hull (lower values mean generated materials closer to stability) and stability, while being slightly subpar on S.U.N.
We note, however, that the compared DiffCSP and MatterGen-MP were trained on a re-optimized version of MP-20 from which some chemical elements have been removed, specifically noble gases, radioactive elements and elements with atomic number greater than 84. Samples with energy above the hull greater than 0.1 eV / atom have also been filtered out. Our model was trained on the original MP-20.
Regarding MatterGen-MP, we believe that the gap can be explained by different elements: (1) a more expressive denoiser operating in real space, (2) a PC sampler on the lattice parameters, and (3) the effect of the pre-processing of MP-20.
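For readers unfamiliar with the two continuous encodings, the following sketch contrasts a one-hot vector with an analog-bit vector for an atomic number. It is a generic illustration and not the encoding used in the rebuttal experiments, whose details are not given here; the bit width of 7, the class count of 100, and all function names are assumptions.

```python
def one_hot(z, n_classes=100):
    """One-hot vector for atomic number z (1-based), length n_classes."""
    return [1.0 if i == z - 1 else 0.0 for i in range(n_classes)]

def to_analog_bits(z, n_bits=7):
    """Binary code of z with bits mapped to -1.0 / +1.0, most significant first."""
    return [1.0 if (z >> k) & 1 else -1.0 for k in reversed(range(n_bits))]

def from_analog_bits(bits):
    """Decode by thresholding each (possibly noisy) real value at zero."""
    z = 0
    for b in bits:
        z = (z << 1) | (1 if b > 0 else 0)
    return z

iron = 26
dense = one_hot(iron)            # 100 entries, a single 1.0
compact = to_analog_bits(iron)   # 7 entries in {-1.0, +1.0}

# Thresholding tolerates moderate continuous noise on the bits.
noisy = [b + 0.3 * (-1) ** i for i, b in enumerate(compact)]
decoded = from_analog_bits(noisy)
```

The analog-bit vector is much shorter than the one-hot vector, and a continuous diffusion model can operate on it directly before the final thresholding step.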
References
[1] De Bortoli, Valentin, et al. "Riemannian score-based generative modelling." NeurIPS 2022.
This paper addresses the interesting problem of crystalline material generation using the state of the art tools of diffusion modelling. While the reviewers had some initial doubts, the rebuttal addressed all their concerns. The properly conducted evaluations indeed show the advantage of the presented approach. All reviewers recommend acceptance and the AC agrees. AC kindly asks that the paper reflects all discussions provided in the rebuttal.