Equivariant Blurring Diffusion for Hierarchical Molecular Conformer Generation
Abstract
Reviews and Discussion
In this work, the authors introduce a hierarchical diffusion model for molecular conformer generation. The framework starts with fragment positions initialized by RDKit and designs a diffusion process that generates atomic positions from substructure positions. The reverse diffusion process is modeled by an equivariant neural network. Experiments on standard molecular conformer generation benchmarks (GEOM-QM9 and GEOM-DRUGS) show better performance than some previous baselines. The authors also include ablation studies that validate some important design choices in the model.
Strengths
- The model investigates a popular and important problem: molecular conformer generation with deep generative models.
- The proposed model is benchmarked on standard datasets to demonstrate its performance.
Weaknesses
- The model is introduced as if derived from blurring diffusion. However, the final formulation does not seem strongly related to that concept.
- The work misses comparison with some strong baselines in molecular conformer generation.
- Though there are ablation studies to validate some design choices, there are still some important framework designs that are not well discussed.
More details can be found in the following Questions section.
Questions
- Following #1 in weaknesses, the final blurring operator (Eq. 7) is basically a linear interpolant in Euclidean space between the substructure coordinate space and the atom coordinate space. Unlike the usual linear interpolant that starts from random Gaussian noise, here it starts from the substructure coordinate space. I feel it is a bit confusing to introduce the framework as a variant of blurring diffusion, which can obscure the actual contributions of this paper.
- Following #2 in weaknesses, there are recent deep generative models for molecular conformation generation [1,2] that achieve state-of-the-art performance but are not included in the comparison with the proposed method. The authors are strongly recommended to include the comparison to better validate the performance.
- Following # 3 in weakness, the work relies on principal subgraph (PS) to obtain molecular fragments. I wonder if the authors have tried other cheminformatic methods like BRICS [3].
- Also, the vocabulary size |S| is set to 50, which means there are quite some isolated atoms. I wonder what are the ratios of isolated atoms.
- The work designs a diffusion process from substructure space to atom space; I wonder how it compares with a standard diffusion model conditioned on substructure coordinates. Have the authors by any chance investigated similar settings?
- How is δ in Eq. 8 determined in the diffusion process?
References:
[1] Torsional Diffusion for Molecular Conformer Generation: https://arxiv.org/abs/2206.01729
[2] Swallowing the Bitter Pill: Simplified Scalable Conformer Generation: https://arxiv.org/abs/2311.17932
[3] On the Art of Compiling and Using 'Drug-Like' Chemical Fragment Spaces: https://chemistry-europe.onlinelibrary.wiley.com/doi/abs/10.1002/cmdc.200800178
Limitations
The authors adequately addressed the limitations in the work.
Q1. Following #1 in weaknesses, the final blurring operator (Eq. 7) is basically a linear interpolant in Euclidean space between the substructure coordinate space and the atom coordinate space. Unlike the usual linear interpolant that starts from random Gaussian noise, here it starts from the substructure coordinate space. I feel it is a bit confusing to introduce the framework as a variant of blurring diffusion, which can obscure the actual contributions of this paper.
Answer.
- First and foremost, we want to emphasize that the purpose of the blurring operation is to transform the atomic positions in 3-dimensional Euclidean space from a coarse-grained structure (fragment coordinates) to a fine-grained structure (atomic coordinates).
- We attempted to use the blurring operator in the spectral domain (Eq. 2) to transform the atomic coordinates in the spatial domain. However, as explained starting from line 154 of the manuscript, the eigendecomposition of the graph Laplacian for each fragment requires excessive time. Additionally, a large T is required for the atomic positions to converge to the prior fragment coordinates; because fragments vary in size and structure, it is difficult to model uniform atomic movement across different fragments with a single value of T. Lastly, there is a discrepancy between the ground truth fragment coordinates, which are the convergence result of the spectral operator, and the prior RDKit fragment coordinates.
- Therefore, we aimed to introduce an operator that maintains the essence of blurring, meaning the gradual transition from coarse-grained to fine-grained structures, while being computationally efficient, less affected by the varying sizes and structures of fragments, and considering the discrepancy between fragment coordinate distributions. The result is a linear interpolation between the coarse-grained and fine-grained distributions in the spatial domain (Eq. 7).
- In summary, while the operator is a linear interpolation in the spatial domain, it is derived from characteristics in the spectral domain.
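To make the operator concrete, below is a minimal NumPy sketch of a spatial-domain blurring step of this kind. The function name, the interpolation weight `alpha_t`, the noise scale `sigma`, and the fragment-assignment array are illustrative assumptions rather than the exact formulation of Eqs. 6-7.

```python
import numpy as np

def blur_step(x_atoms, frag_ids, frag_coords, alpha_t, sigma=0.01):
    """Move fine-grained atomic positions toward their coarse-grained fragment positions.

    x_atoms:     (n, 3) atomic coordinates
    frag_ids:    (n,)   index of the fragment each atom belongs to
    frag_coords: (m, 3) fragment coordinates (e.g., centroids from the RDKit prior)
    alpha_t:     interpolation weight in [0, 1]; alpha_t = 1 collapses atoms onto fragments
    sigma:       small constant noise scale
    """
    target = frag_coords[frag_ids]                         # broadcast each fragment's position to its atoms
    mean = (1.0 - alpha_t) * x_atoms + alpha_t * target    # linear interpolation in Euclidean space
    return mean + sigma * np.random.randn(*x_atoms.shape)  # small Gaussian perturbation
```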
Q2. Following #2 in weaknesses, there are recent deep generative models for molecular conformation generation [1,2] that achieve state-of-the-art performance but are not included in the comparison with the proposed method. The authors are strongly recommended to include the comparison to better validate the performance.
Answer.
- Regarding the comparison with strong baselines, please check the general response of G-Q3.
Q3. Following # 3 in weakness, the work relies on principal subgraph (PS) to obtain molecular fragments. I wonder if the authors have tried other cheminformatic methods like BRICS [3].
Answer.
- We chose Principal Subgraphs for the following reasons:
- There are no overlapping atoms between fragments, which prevents the case where an atom in the prior distribution is present in the coordinates of more than one fragment.
- We can set the size of the fragment vocabulary, allowing us to observe the impact of fragment granularity on generative performance.
- We also considered using the well-known BRICS [1] and tree decomposition [2] methods but encountered the following issues:
- With BRICS, it is impossible to adjust the size of the vocabulary, preventing us from observing performance based on fragment granularity. Additionally, BRICS generates large fragments with very low frequencies, which can impact generalization performance. For GEOM-Drugs, the BRICS vocabulary contains 11,356 fragments with an average size of 19.98 atoms. Moreover, 73.3% of the fragments in the vocabulary occur fewer than ten times in the entire dataset.
- Tree decomposition, as observed in the analysis in the Principal Subgraphs paper, generates overly fine fragments. For GEOM-Drugs, tree decomposition produces fragments of size 1 (isolated atoms) and size 2 that account for 95.56% of fragment occurrences in the entire dataset. Additionally, there are overlapping atoms between different fragments.
[1] Degen, Jorg, et al. "On the art of compiling and using 'drug-like' chemical fragment spaces." ChemMedChem 3.10 (2008): 1503.
[2] Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction tree variational autoencoder for molecular graph generation." International conference on machine learning. PMLR, 2018.
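For reference, BRICS fragments can be enumerated directly with RDKit. The sketch below (illustrative only, not the preprocessing used in the paper) shows how a BRICS vocabulary and its fragment sizes could be collected to reproduce statistics like those above.

```python
from collections import Counter
from rdkit import Chem
from rdkit.Chem import BRICS

def brics_vocab(smiles_list):
    """Count BRICS fragments (as SMILES) and record their heavy-atom sizes."""
    vocab, sizes = Counter(), {}
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue
        for frag_smi in BRICS.BRICSDecompose(mol):
            vocab[frag_smi] += 1
            frag = Chem.MolFromSmiles(frag_smi)
            if frag is not None:
                sizes[frag_smi] = frag.GetNumHeavyAtoms()
    return vocab, sizes

vocab, sizes = brics_vocab(["CC(=O)Oc1ccccc1C(=O)O"])  # aspirin as a toy example
print(len(vocab), sizes)
```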
Q4. Also, the vocabulary size |S| is set to 50, which means there are quite some isolated atoms. I wonder what are the ratios of isolated atoms.
Answer.
| | PS50 | PS200 | PS1000 | BRICS | Tree |
|---|---|---|---|---|---|
| Occurrence frequency of single-atom fragments | 0.4699 | 0.4478 | 0.4362 | 0.8857 | 0.6139 |
- We measured the occurrence frequency of single-atom fragments for Principal Subgraphs (|S| = 50, 200, 1000), BRICS, and tree decomposition on GEOM-Drugs.
- The results in the table show that PS had similar frequency values across different vocabulary sizes. In contrast, BRICS and tree decomposition, whose occurrence frequencies vary significantly with fragment size, showed notably high frequencies for single-atom fragments.
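A minimal sketch of how such a frequency could be computed, assuming a per-molecule list of fragment sizes has already been obtained from the chosen decomposition (names are illustrative):

```python
def single_atom_frequency(fragment_sizes_per_mol):
    """Fraction of fragment occurrences across the dataset that are single atoms."""
    sizes = [s for mol_sizes in fragment_sizes_per_mol for s in mol_sizes]
    return sum(s == 1 for s in sizes) / len(sizes)

print(single_atom_frequency([[1, 3, 1, 6], [2, 1, 5]]))  # toy example: 3 of 7 fragments, ~0.43
```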
Q5. The work designs a diffusion process from substructure space to atom space; I wonder how it compares with a standard diffusion model conditioned on substructure coordinates. Have the authors by any chance investigated similar settings?
Answer.
- Regarding the comparison with DecompDiff in the ablation study, please check the general response of G-Q4.
Q6. How is δ in Eq. 8 determined in the diffusion process?
Answer.
- Regarding the choice of noise scales in forward and reverse processes, please check the general response of G-Q2.
I thank the authors for answering the questions and adding extra experiments (i.e., analysis to PS and comparison to DecompDiff). I have raised my score.
We are pleased to hear that our rebuttal addressed the reviewer's concerns. We sincerely appreciate your thoughtful consideration.
The paper focuses on a fundamental biochemical problem: generating 3D molecular conformers from molecular graphs in a multiscale manner. The approach consists of two stages:
- Generating a coarse-grained fragment-level 3D structure from the molecular graph.
- Generating fine atomic details from the coarse-grained approximated structure while allowing simultaneous adjustments to the latter.
Strengths
- The paper proposed the EDB, which can generate atomic details from a coarse-to-grained estimation of fragment structures using equivariant networks
- The paper proposes a novel blurring scheduler and a revised loss function that significantly impact performance, instead of directly applying those of the existing image blurring diffusion model.
- The experiments and analysis demonstrate more plausible conformers compared to SOTA denoising diffusion models.
Weaknesses
None
Questions
None
Limitations
None
We would like to thank you for appreciating our work and for providing a great summary.
The paper proposes a novel diffusion method for molecular conformers based on blurring diffusion. The method utilizes RDKit to predict the 3D structure of small molecule fragments and trains a diffusion model to generate the full-atomistic molecule from the RDKit prior, leveraging hierarchical modeling. The method is evaluated based on the GEOM dataset, testing both the quality of the sampled geometries and physical properties of the conformers, as is standard in the field.
Strengths
- The authors apply blurring diffusion, developed for image applications, to the domain of molecular conformer generation to leverage hierarchical modeling in molecular settings.
- The problem of coarse-to-fine prediction is a core problem in coarse-grained molecular modeling (referred to as backmapping in the respective literature), and the proposed approach might be applicable to these (large-scale) problems as well.
- The proposed method demonstrates good performance compared to other diffusion-based approaches with models of comparable size in the literature.
Weaknesses
- The authors did not consider the molecular conformer fields (MCF) paper (https://arxiv.org/pdf/2311.17932), which is the current state-of-the-art approach to molecular conformer generation. The MCF method is more performant than the proposed approach.
- The approach depends strongly on coarse-grained fragment generation via RDKit, which could become a problem for larger systems of practical relevance.
- The authors should also consider reporting their performance metrics with the stricter threshold on GEOM-Drugs (delta = 0.75 Å).
Questions
- How many parameters does the model use?
- Did the authors try to build an end-to-end pipeline, where a generative model predicts the coarse-grained coordinates instead of RDKit?
Limitations
The authors discuss the limitations of RDKit in application to larger molecular structures and the additional cost of the deblurring function.
W1. The authors did not consider the molecular conformer fields (MCF) paper, which is the current state-of-the-art approach to molecular conformer generation. The MCF method is more performant than the proposed approach.
Answer.
- Regarding the comparison with strong baselines, please check the general response of G-Q3.
W2. The approach depends strongly on coarse-grained fragment generation via RDKit, which could become a problem for larger systems of practical relevance.
Answer.
- Regarding the analysis on the relationship between the quality of RDKit fragment coordinates and the performance of the proposed model, please check the general response of G-Q1.
W3. The authors should also consider reporting their performance metrics with the stricter threshold on GEOM-Drugs (delta = 0.75 Å).
Answer.
| | COV-R mean | COV-R med | COV-P mean | COV-P med |
|---|---|---|---|---|
| RDKit DG | 12.29 | 2.5 | 7.25 | 1.04 |
| GeoDiff | 38.29 | 32.82 | 20.8 | 14.38 |
| EBD | 42.07 | 35.5 | 21.73 | 13.3 |
- We measured the coverage scores for RDKit DG, GeoDiff, and the proposed method on GEOM-Drugs when delta is 0.75 Å.
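For clarity, the coverage scores follow the standard recall/precision definitions used in conformer generation benchmarks; below is a small sketch of how they can be computed from a matrix of pairwise RMSDs between reference and generated conformers (the paper's exact evaluation script may differ):

```python
import numpy as np

def coverage_scores(rmsd, delta=0.75):
    """COV-R and COV-P from an RMSD matrix of shape (n_reference, n_generated).

    COV-R (recall):    fraction of reference conformers matched by at least one
                       generated conformer within delta.
    COV-P (precision): fraction of generated conformers within delta of at least
                       one reference conformer.
    """
    cov_r = float((rmsd.min(axis=1) <= delta).mean())
    cov_p = float((rmsd.min(axis=0) <= delta).mean())
    return cov_r, cov_p

rmsd = np.array([[0.4, 1.2], [0.9, 0.8], [1.5, 1.1]])  # toy: 3 references x 2 samples
print(coverage_scores(rmsd, delta=0.75))                # (0.333..., 0.5)
```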
Q1. How many parameters does the model use?
Answer.
- Our equivariant deblurring network has 2,457,356 parameters when the number of layers (line 531) is 6 and the feature dimension (line 533) is 128. Each layer consists of an invariant fragment feature update function, an invariant atom feature update function, and an equivariant atom coordinate function.
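As a rough illustration of this layer structure (our own simplified EGNN-style sketch; the names, message functions, and update rules are assumptions, not the authors' implementation), one layer combining an invariant fragment feature update, an invariant atom feature update, and an equivariant atom coordinate update could look like:

```python
import torch
import torch.nn as nn

class DeblurLayer(nn.Module):
    """Simplified sketch of one layer: invariant fragment feature update,
    invariant atom feature update, and equivariant atom coordinate update."""

    def __init__(self, dim=128):
        super().__init__()
        self.frag_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.atom_mlp = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.coord_mlp = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, 1))

    def forward(self, h_atom, x_atom, h_frag, frag_ids, edge_index):
        # h_atom: (n, d) atom features, x_atom: (n, 3) coordinates,
        # h_frag: (m, d) fragment features, frag_ids: (n,) atom-to-fragment map,
        # edge_index: (2, e) atom-atom edges (src, dst)
        src, dst = edge_index
        # 1) invariant fragment update: pool member-atom features into each fragment
        pooled = torch.zeros_like(h_frag).index_add_(0, frag_ids, h_atom)
        h_frag = h_frag + self.frag_mlp(torch.cat([h_frag, pooled], dim=-1))
        # 2) invariant atom update: message passing over squared inter-atomic distances
        d2 = ((x_atom[src] - x_atom[dst]) ** 2).sum(-1, keepdim=True)
        msg = self.atom_mlp(torch.cat([h_atom[src], h_atom[dst], d2], dim=-1))
        h_atom = h_atom + torch.zeros_like(h_atom).index_add_(0, dst, msg)
        # 3) equivariant coordinate update: displace atoms along relative direction vectors
        step = (x_atom[dst] - x_atom[src]) * self.coord_mlp(msg)
        x_atom = x_atom + torch.zeros_like(x_atom).index_add_(0, dst, step)
        return h_atom, x_atom, h_frag
```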
Q2. Did the authors try to build an end-to-end pipeline, where a generative model predicts the coarse-grained coordinates instead of RDKit?
Answer.
- Thank you for your insightful suggestion. End-to-end multi-scale learning is indeed our ultimate goal. Among the two stages—generating coarse-grained structures from random noise and generating fine-grained structures from coarse-grained structures—we have focused more on developing the coarse-to-fine generation stage. This is because generating coarse-grained structures from random noise can be handled by many existing, successful denoising diffusion models or off-the-shelf tools like RDKit, whereas methods for coarse-to-fine generation have not been sufficiently explored.
- We believe that the proposed method can be effectively utilized in the coarse-to-fine generation stage, and we plan to explore combining these two stages into a single end-to-end model. An end-to-end generative model needs to generate m fragments and then n atoms, which presents the challenge of changing the dimension of the state value as the time steps increase or decrease. Exploring methods to handle dimension changes during the generation process [1] is a promising future direction.
[1] Campbell, Andrew, et al. "Trans-dimensional generative modeling via jump diffusion models." Advances in Neural Information Processing Systems 36 (2023).
Given that the results in the more difficult setting (delta = 0.75 Å) are not very impressive, it seems very important to also show results based on the larger split (train/val/test = 243,473/30,433/1,000 molecules) that most recent works have adopted.
The proposed approach does not seem competitive with recent approaches such as MCF, indicating that the proposed approach might in the near future be viewed as a fancy engineering approach that has been superseded by more expressive models trained on more data. This possibility underlines the need for evaluation on the larger test split to gauge whether the approach remains competitive as the amount of data increases.
Thank you very much for your constructive suggestions.
- First and foremost, we would like to clarify that our primary objective is the design of a coarse-to-fine generative model for multi-scale learning on 3D geometric data. As Reviewer UJjj mentioned, our model is the first attempt at a hierarchical method that fits well with the multi-scale structure of molecular data. This allows our model to be applicable at various levels of granularity. While MCF is an excellent model, its contribution is orthogonal to ours: it learns a distribution over functions rather than pursuing our primary goal of multi-scale (coarse-to-fine) generative learning. We also believe that our proposed multi-scale generative model could potentially be extended to sequentially learn distributions over functions.
- Like the reviewer, we also believe that performance should be measured under similar experimental conditions to ensure a fair comparison. We aimed to match the experimental environment as closely as possible to MCF, including the data split, number of model parameters, training time, and GPU resources. MCF used between 13M and 242M parameters and at least 8 to 16 A100 GPUs on GEOM-Drugs. In contrast, our model leverages a hierarchical approach with inductive bias to achieve efficiency in coarse-to-fine generative modeling. In other words, MCF required at least 5 times more parameters and at least 8 times more GPUs than our model. Given these significant differences in parameters, GPU resources, and training time, simply comparing the numbers in the MCF paper is not appropriate. While we tried to ensure a fair comparison, we want to emphasize that it is not possible to do so fully, as MCF's code and training time are not publicly available.
- This paper presents a model for small-molecule 3D structure generation, conditioned on 2D molecular graphs.
- The authors propose a two-step process to address the problem: 1) first, using an off-the-shelf cheminformatics tool, RDKit, to generate a template scaffold structure; 2) then focusing on training a diffusion model to generate the fine-grained atom positions given the scaffold.
- In essence, the model in the second step should learn to: 1) generate fine-grained atom positions given coarse fragment centers; 2) correct potential biases from RDKit-generated fragments.
- To achieve this, they proposed a diffusion-like deblurring process inspired by heat diffusion (IHDM) but over a linear trajectory in Euclidean space, from fragment-averaged coordinates to predicted atomic coordinates. This design allows efficient training and sampling for the targeted problem.
- Experiments on two small-molecule benchmarks show the model's superior performance compared to other generative model baselines.
Strengths
- The proposed model comprises a novel combination of rational design choices:
- A scaffold-to-atom two-step generation process that defers the first step to well-established tools, converting the problem into generating atomic details and correcting the prior distribution.
- Borrowing the idea from heat diffusion, it uses constant noise instead of varying noise levels as in regular diffusion models.
- It uses an updated trajectory-matching objective, matching the ground-truth endpoint instead of the next intermediate step.
- Empirical analysis shows these design choices bring noticeable improvement in sampling coverage and accuracy over previous diffusion models. They also present several ablation studies and analyses to understand the performance and pinpoint some design factors: 1) fragment size; 2) diffusion trajectory and noise schedules; 3) loss reparameterization.
- The manuscript is presented in a clear manner and is easy to follow.
- Overall, this paper demonstrates that certain design choices can lead to improved performance and can be valuable for further research in small-molecule generation tasks.
Weaknesses
- While the authors demonstrated better empirical performance and conducted ablation studies to verify selected design factors, some questions still remain on why and how some of the factors are critical, particularly:
- The effect of using RDKit as prior distribution: see Q1 - Q2
- Experiment details on comparing constant noising schedule (proposed) to regular diffusion (DecompDiff like): see Q3
- The effect of choosing noising levels: see Q4
- Minor typo: page 16, Pseudo-code 1: the label is not for the training code but for the RDKit conformer generator.
Questions
Q1. The superior performance of the proposed models might be due to 1) the accurate generation of fine-grained atomic positions and/or 2) correcting the biases from RDKit-generated scaffolds. However, which component plays a more important role is not clearly addressed. The authors showed in Section 5.2 that a small fragment vocabulary (|S| = 50) achieves the best performance due to the decreased atomic-level details that need to be learned, raising a natural question of whether the main benefit was from correcting the prior biases. For example, can the model achieve similar or better performance without fragments (i.e., treating each atom as its own fragment so that the model only learns an error-correction trajectory from the RDKit-predicted structure)?
Q2. Related to Q1, one may wonder to what extent the model's performance relies on the quality of RDKit-generated scaffolds. Despite the discussion in the limitations, can the authors provide more analysis on EBD’s performance vs. RDKit’s performance?
Q3. As discussed in 5.2, Effects of Data Corruptions, DecompDiff is the most similar model except for the choice of the diffusion process. However, the comparison with DecompDiff was limited: 1) it was not included in the full benchmark (Table 1); 2) T=50 steps were used for DecompDiff compared to T>200 in their paper, which might lead to different performance; 3) it was not clear if the authors retrained DecompDiff following the same setup, given that the original DecompDiff was proposed for a different task (pocket-conditioned ligand generation). Can the authors provide clarification on the above concerns?
Q4. Sampling noise (σ, δ) is a key hyperparameter in the proposed diffusion process. Can the authors provide theoretical or empirical analysis on the choices of these two parameters? Are the results sensitive to their choices?
Limitations
The authors have included discussion on the limitation in Appendix F.
Q1-1. The superior performance of the proposed models might be due to 1) the accurate generation of fine-grained atomic positions and/or 2) correcting the biases from RDKit-generated scaffolds. However, which component plays a more important role is not clearly addressed.
Q2. Related to Q1, one may wonder to what extent the model's performance relies on the quality of RDKit-generated scaffolds. Despite the discussion in the limitations, can the authors provide more analysis on EBD’s performance vs. RDKit’s performance?
Answer.
- Regarding the analysis on the relationship between the quality of RDKit fragment coordinates and the performance of the proposed model, please check the general response of G-Q1.
Q1-2. The authors showed in Section 5.2 that a small fragment vocabulary (|S| = 50) achieves the best performance due to the decreased atomic-level details that need to be learned, raising a natural question of whether the main benefit was from correcting the prior biases. For example, can the model achieve similar or better performance without fragments (i.e., treating each atom as its own fragment so that the model only learns an error-correction trajectory from the RDKit-predicted structure)?
Answer.
- Thank you for your interesting suggestion. While learning the trajectory from the atom coordinates generated by RDKit to the ground truth atom coordinates can be similarly achieved in EBD, it would deviate from the primary goal of this paper, which is to develop a coarse-to-fine generative model for multi-scale learning.
Q3. As discussed in 5.2, Effects of Data Corruptions, DecompDiff is the most similar model except for the choice of the diffusion process. However, the comparison with DecompDiff was limited: 1) it was not included in the full benchmark (Table 1 – actually Table 2?); 2) T=50 steps were used for DecompDiff compared to T>200 in their paper, which might lead to different performance; 3) it was not clear if the authors retrained DecompDiff following the same setup, given that the original DecompDiff was proposed for a different task (pocket-conditioned ligand generation). Can the authors provide clarification on the above concerns?
Answer.
- Regarding the comparison with DecompDiff in the ablation study, please check the general response of G-Q4.
Q4. Sampling noise (σ, δ) is a key hyperparameter in the proposed diffusion process. Can the authors provide theoretical or empirical analysis on the choices of these two parameters? Are the results sensitive to their choices?
Answer.
- Regarding the choice of noise scales in forward and reverse processes, please check the general response of G-Q2.
I thank the authors for their additional results and references - they have resolved most of my questions.
The remaining concern is Q1-2: whether the coarse-to-fine approach is indeed better than atom-level correction, especially when 47% of the fragments are actually single atoms (as in their response to reviewer sBnm Q4). Although the primary goal of this paper is to develop a coarse-to-fine generative model, the lack of such a comparison undermines the motivation and significance of using a "coarse-to-fine" model. This limitation is factored into my rating.
We are pleased to hear that our rebuttal addressed most of the reviewer's concerns.
- First, as the reviewer pointed out, the occurrence frequency of single-atom fragments is approximately 47%. However, we would like to draw your attention to Table 1 in the manuscript. Molecular graphs in the Drugs dataset contain 40 particles (atoms) on average. When |S|=50, the average number of particles (fragments) in the coarse-grained structures is 11.77. If we calculate the resolution based on the number of particles, the resolution of the coarse-grained structure is, on average, reduced by 70% compared to the fine-grained structure. Therefore, our proposed method can indeed be considered a coarse-to-fine model that generates high resolution from low resolution.
- We appreciate and agree with the reviewer's constructive feedback. Currently, we are training a diffusion model from RDKit all-atom to GT all-atom coordinates. We aim to share the results before the response period ends. However, if that is not feasible, we will definitely include the experimental results in the paper to demonstrate the performance at another granularity level (all-atom). We believe this will further strengthen the motivation and significance of the proposed method.
- In our model, disentangling the two elements of i) the accurate generation of fine-grained atomic positions and ii) correcting the biases from RDKit-generated scaffolds is challenging. This is due to our definition of fragment coordinates as the average of their constituent atom coordinates. Therefore, accurately generating fine-grained atomic positions inherently involves correcting the RDKit-generated scaffolds.
- The table below presents results generated from |S|=50 and from RDKit-generated atomic coordinates (RDKit all atom). (Due to the limited time available for the author-reviewer response, we kindly ask for your understanding that the RDKit all atom results were obtained during an intermediate stage of training.) As expected, RDKit all atom shows more accurate results compared to |S|=50, as it includes more detail in the prior distribution. These results are predictable because the trajectory required to accurately generate fine-grained atomic positions is much more challenging from the significantly lower resolution of |S|=50 than from RDKit all atom.
- We hope the additional results have adequately addressed your concerns. For the RDKit all-atom case, we will include the performance after the training is completed in the manuscript.
| | COV-R mean | COV-R med | MAT-R mean | MAT-R med | COV-P mean | COV-P med | MAT-P mean | MAT-P med |
|---|---|---|---|---|---|---|---|---|
| \|S\|=50 | 0.9260 | 0.9873 | 0.8216 | 0.8279 | 0.6624 | 0.6839 | 1.1237 | 1.0916 |
| RDKit all atom | 0.8886 | 0.9762 | 0.8456 | 0.8434 | 0.7235 | 0.8270 | 1.0808 | 1.0069 |
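To make the particle-count argument concrete, here is a small sketch (with an illustrative fragment assignment) of how coarse-grained fragment coordinates are obtained as the mean of their constituent atoms:

```python
import numpy as np

def fragment_coords(x_atoms, frag_ids, n_frags):
    """Coarse-grained coordinates: mean position of the atoms belonging to each fragment."""
    coords = np.zeros((n_frags, 3))
    counts = np.zeros(n_frags)
    np.add.at(coords, frag_ids, x_atoms)
    np.add.at(counts, frag_ids, 1)
    return coords / counts[:, None]

# toy example mirroring the GEOM-Drugs averages: 40 atoms grouped into 12 fragments (~70% fewer particles)
x_atoms = np.random.randn(40, 3)
frag_ids = np.repeat(np.arange(12), [4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3])
print(fragment_coords(x_atoms, frag_ids, 12).shape)  # (12, 3)
```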
I appreciate the authors' efforts in addressing the questions and running additional experiments in a short period. The intermediate results do help to illustrate how the proposed method can improve conformer generation, from the perspectives of both coarse-to-fine generation and position correction. It would be great if the authors could include the final results in the manuscript and discuss the seeming trade-off (if it still exists) between the recall (COV-R) and precision (COV-P) of the two experiments. Given the new results, I am updating the rating from 6 to 7.
We are pleased to hear that our response addressed the reviewer's concerns. We sincerely appreciate your thoughtful consideration.
The paper introduces Equivariant Blurring Diffusion (EBD), a generative model for hierarchical molecular conformer generation. The model presents a coarse-to-fine generation process, producing fragment-level structures first and then refining them down to atomic details. The method guarantees SE(3) equivariance, which is necessary for molecular structures. Comparisons with the most recent models on drug-like molecules reveal that EBD performs better in geometric and chemical evaluations, demonstrating its effectiveness.
Strengths
An important step forward is the two-step process of creating fragment-level structures and then honing in on atomic details. This hierarchical method fits nicely with the multiscale nature of molecular structures. To preserve the geometrical and physical integrity of molecular conformers throughout the generation process, the model guarantees SE(3) equivariance. The experimental results show that EBD can generate accurate and diverse molecular conformers with fewer diffusion steps than state-of-the-art models. Comprehensive ablation studies and in-depth comparisons with current models are included in the study, offering a thorough understanding of the design decisions and how they affect performance. The examination of chemical properties, including HOMO-LUMO gaps and energy estimates, adds considerable value by demonstrating that EBD can produce stable and chemically realistic conformers.
Weaknesses
- The implementation and computational resource requirements may become more complex due to the hierarchical structure and requirement for fragmentation. This might make the model less useful and accessible for wider applications.
- RDKit is used extensively in the first generation of fragment coordinates. The quality of the initial fragment structures that RDKit provides could limit the model's performance.
- The concept works well for drug-like molecules, but it hasn't been fully investigated how well it scales to larger and more complex molecular structures. The claims would be strengthened by additional validation using larger datasets or more complicated compounds.
- The geometric (RMSD) and chemical properties are the main metrics used for evaluation. A more thorough evaluation of the model's capabilities could be obtained by incorporating further metrics for the novelty and diversity of the generated conformers.
Questions
- Have you looked into any other options except RDKit for creating initial fragment coordinates? To what extent does the quality of these initial coordinates affect EBD performance?
- Could you elaborate on how EBD scales up to more complex molecules? Have you used these datasets for any preliminary experiments?
- Although the conformers developed exhibit chemical plausibility, what is their performance in real-world scenarios like docking simulations or property prediction? Is it planned to validate the conformers that are generated in these kinds of real-world situations?
- There is a brief mention of generation times and training. Could you elaborate on the amount of computing power needed to train EBD and produce conformers? In what way does this differ from the resources required for other cutting-edge models?
Limitations
NA
Q1 (W2). RDKit is used extensively in the first generation of fragment coordinates. The quality of the initial fragment structures that RDKit provides could limit the model's performance. Have you looked into any other options except RDKit for creating initial fragment coordinates? To what extent does the quality of these initial coordinates affect EBD performance?
Answer.
- Thank you for your feedback. Regarding the analysis on the relationship between the quality of RDKit fragment coordinates and the performance of the proposed model, please check the general response of G-Q1.
- Multi-scale generative models consist of two stages: i) generating a coarse-grained structure from random noise, and ii) generating a fine-grained structure from the coarse-grained structure. We have prioritized developing the coarse-to-fine generation stage. This is because the development of 3D molecular conformer models for coarse-to-fine generative processes, such as designing data corruption that preserves the coarse-grained structure, has not been sufficiently explored. In contrast, generating a coarse-grained structure from random noise can be handled by various existing successful denoising diffusion models or off-the-shelf tools like RDKit.
- As the reviewer mentioned, existing denoising diffusion models can also be applied to generate coarse-grained structures. However, this approach requires first training and generating coarse-grained structures, and then training and generating fine-grained structures, which may result in higher accuracy and diversity compared to using RDKit but will likely take more training time.
Q2 (W3). The concept works well for drug-like molecules, but it hasn't been fully investigated how well it scales to larger and more complex molecular structures. The claims would be strengthened by additional validation using larger datasets or more complicated compounds. Could you elaborate on how EBD scales up to more complex molecules? Have you used these datasets for any preliminary experiments?
Answer.
- We sincerely appreciate your constructive suggestions. We believe that the hierarchy utilized in EBD exists widely across molecular systems, ranging from proteins as linear polymers of amino acids to materials as lattices of molecules. As reviewer pba2 mentioned, we believe that our proposed model could also be applied to backmapping problems for proteins, in addition to drug-like molecules. The goal of the protein backmapping problem is to predict the coordinates of side-chain atoms given the protein backbone structure. Compared to drug-like molecules, proteins have repeating linear structures and larger sizes. While we do not yet have results from preliminary experiments, we anticipate that our proposed model, with modifications such as the addition of an internal-coordinate loss, can achieve promising results.
Q3. Although the conformers developed exhibit chemical plausibility, what is their performance in real-world scenarios like docking simulations or property prediction? Is it planned to validate the conformers that are generated in these kinds of real-world situations?
Answer.
- Thank you for your suggestions regarding the future extensions of our proposed method. While our primary target task is generating molecular conformers through an unconditional coarse-to-fine generative model for 3D structures, we believe our approach can be extended to the (conditional) docking problem through maintaining SE(3) equivariance to the conditioning protein structures or pockets. When decomposing the ligand compound, we could use a decomposition method optimized for the docking problem instead of the principal subgraph approach we used. For example, DecompDiff decomposes the ligand into arms that interact with the pocket and a scaffold that connects these arms. By taking the averaged coordinates of these decomposed arms and scaffold as the coarse-grained structure of the prior distribution, and conditioning the deblurring networks on the protein pockets, we could apply our method to the docking problem.
Q4 (W1). The implementation and computational resource requirements may become more complex due to the hierarchical structure and requirement for fragmentation. This might make the model less useful and accessible for wider applications. There is a brief mention of generation times and training. Could you elaborate on the amount of computing power needed to train EBD and produce conformers? In what way does this differ from the resources required for other cutting-edge models?
Answer.
- Given a molecular graph, performing decomposition and calculating the prior distribution of fragment coordinates before training the generative model is a key difference in resource requirements compared to other cutting-edge models. This preprocessing step does not require GPU usage. For GEOM-Drugs, calculating the coarse-grained prior distribution took 38 hours on 16 Intel Xeon 8352Y CPUs, averaging 3 seconds per molecule.
- Training the proposed model requires similar resources to other cutting-edge models. We trained our model on a single A100 GPU for 3.8 days. The comparison model, GeoDiff, also required a similar training time. The primary factor influencing training time is the number of parameters in the deblurring networks. We used a 6-layer, 128-feature-dimension deblurring network with 2,457,356 parameters, which is about 3 times larger than the encoder of GeoDiff (803,858 parameters).
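For reference, the RDKit prior mentioned above can be produced with standard RDKit distance-geometry embedding; a minimal sketch follows (generic RDKit usage, not the authors' exact preprocessing script; whether force-field relaxation is applied is our assumption):

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def rdkit_prior_conformer(smiles, seed=0):
    """Generate one RDKit (ETKDG) conformer to serve as the coarse-grained prior."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    params = AllChem.ETKDGv3()
    params.randomSeed = seed
    AllChem.EmbedMolecule(mol, params)        # distance-geometry embedding
    AllChem.MMFFOptimizeMolecule(mol)         # optional quick force-field relaxation
    return mol.GetConformer().GetPositions()  # (n_atoms, 3) array of coordinates

print(rdkit_prior_conformer("CCO").shape)
```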
We hope our response has adequately addressed your questions. We sincerely appreciate your insightful feedback and suggestions.
We sincerely appreciate all the reviewers for their constructive feedback and suggestions. Below, we provided general responses to the questions raised by several reviewers.
G-Q1. Quality of prior vs Performance.
Answer.
- To observe the model's performance based on the quality of fragment coordinates, we measured how accurately the fragment coordinates generated by RDKit were corrected towards the ground truth. For 200 molecules in the GEOM-Drugs test set, we measured the RMSD between RDKit fragment coordinates and ground truth fragment coordinates (RMSD(x_{RDKit}^f, x_{gt}^f)) and the RMSD between fragment coordinates generated by our model and ground truth fragment coordinates (RMSD(x_{EBD}^f, x_{gt}^f)). If RMSD(x_{EBD}^f, x_{gt}^f) is lower than RMSD(x_{RDKit}^f, x_{gt}^f) for a molecule, it indicates that the model has accurately corrected the fragment coordinates.
- In Figure 1 of the attached PDF, the points below the red line represent cases where the model corrected the coordinates accurately. We observed that the greater the RMSD(x_{RDKit}^f, x_{gt}^f) (points further to the right on the x-axis), the larger the reduction in RMSD towards RMSD(x_{EBD}^f, x_{gt}^f). In other words, the lower the quality of the coarse-grained prior, the more accurately the model tends to make corrections.
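For reproducibility of this analysis, here is a sketch of how each RMSD could be computed with a standard Kabsch alignment (variable names such as `x_rdkit_f`, `x_ebd_f`, and `x_gt_f` are illustrative; the paper's evaluation script may differ):

```python
import numpy as np

def kabsch_rmsd(a, b):
    """RMSD between two point sets of shape (m, 3) after optimal rigid alignment."""
    a = a - a.mean(axis=0)                      # remove translation
    b = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)           # optimal rotation via SVD (Kabsch)
    d = np.sign(np.linalg.det(u @ vt))          # reflection correction
    r = u @ np.diag([1.0, 1.0, d]) @ vt
    return float(np.sqrt(((a @ r - b) ** 2).sum(axis=-1).mean()))

# improved = kabsch_rmsd(x_ebd_f, x_gt_f) < kabsch_rmsd(x_rdkit_f, x_gt_f)
```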
G-Q2. Choice of noise scales in forward and reverse processes.
Answer.
- We apologize for not including the details about the noise scale in the manuscript. We aimed to use low noise scale values in the proposed blurring and deblurring processes. Thus, in all experiments, we used a noise scale of 0.01 for the forward process (σ in Eq. 6) and 0.0125 for the reverse process (δ in Eq. 8), based on the noise scale analysis in IHDM (Appendix C.1 of IHDM). Following that analysis, we ensured that the δ/σ noise scale ratio (0.0125/0.01 = 1.25) was slightly above 1. We observed that using a noise scale that is small but not too close to 0, and setting the δ/σ ratio slightly above 1, was suitable for training the blurring diffusion model.
G-Q3. Strong baselines.
Answer.
- Thank you for bringing up these highly relevant studies. We found that, due to the different data splits used in these papers, it is challenging to directly compare their performance with our results. For GEOM-Drugs, we used the data split proposed by ConfGF: train/val/test = 40,000/5,000/200 molecules. In contrast, MCF and Torsional Diffusion used the data split proposed by GeoMol: train/val/test = 243,473/30,433/1,000 molecules. Additionally, for MCF, we kindly ask for your understanding, as comparing performance is difficult without access to their implementation code. As for Torsional Diffusion, we are currently running experiments comparing our model with it and will include some of these experiments in future revisions.
G-Q4. Comparison with DecompDiff in the ablation study.
Answer.
- We appreciate your feedback and would like to elaborate on the motivation, experimental settings, and results of the ablation study on the effects of data corruption (line 279).
- Motivation: DecompDiff is a denoising diffusion model conditioned on coarse-grained structures, where the number of prior distributions corresponds to the number of fragments, and the mean of each prior is the respective fragment coordinates. By comparing the proposed method with DecompDiff in a controlled manner, we aimed to isolate the effect of the proposed blurring scheduler and random noise injection on learning in the coarse-to-fine molecular conformer generation task.
Our use of DecompDiff was not to demonstrate its suitability for the molecular conformer generation task but rather to show that the stochastic trajectory from random noise corruption is more challenging for the coarse-to-fine generation task than the proposed blurring schedule, even when the prior distributions are conditioned on the coarse-grained structures. As reviewer 78eu clearly pointed out, DecompDiff has shown effectiveness in generating ligand compound structures docked to the target protein. It proposed a fragment decomposition method (scaffold-arms) specialized for protein-ligand complexes, rather than the Principal Subgraphs we used. Since DecompDiff was designed for protein-ligand complex problems and the fragment decomposition (Principal Subgraphs) used in our experiments is not aligned with its target task, we did not include the full report of its performance in Table 2.
- Experimental settings: Except for the data corruption methods, specifically the proposed blurring schedule versus random noise, we used the same coarse-grained prior distribution, encoder design, ground truth estimator (please note that DecompDiff also used a ground truth state estimator (Eq. (8) of DecompDiff)), and number of time steps when comparing our method and DecompDiff. This controlled setting was adopted to isolate the contribution of data corruption to the coarse-to-fine generative task fairly and clearly, without entanglement with other factors.
- Results: Table 1 in the attached PDF is a full report of the performance of DecompDiff in the experimental settings above, and detailed results were presented in Figure 3(c) and Figure 4 of the manuscript. We observed that the conformers generated by EBD show better diversity scores compared to the stochastic trajectory, since the proposed blurring schedule of EBD facilitates the learning process of coarse-to-fine generative models.
The paper introduces Equivariant Blurring Diffusion (EBD), a novel generative model for hierarchical molecular conformer generation. The model employs a coarse-to-fine process, starting with the production of fragment-level structures and refining them down to atomic details. It ensures SE(3) equivariance, which is crucial for maintaining the geometric integrity of molecular structures.
Strengths:
- The proposed model features a novel combination of rational design choices, including a scaffold-to-atomic two-step generation process, an innovative blurring scheduler, and a revised loss function.
- Empirical analysis demonstrates that these design choices lead to significant improvements in sampling coverage and accuracy over previous diffusion models.
- The manuscript is presented in a clear manner and is easy to follow.
Weaknesses:
- The quality of these initial structures provided by RDKit could potentially limit the model's overall performance.
- The work misses comparison with some strong baselines in molecular conformer generation.
In their rebuttal, the authors provided additional experiments and explanations on how accurately the proposed method corrected the fragment coordinates generated by RDKit towards the ground truth. They also considered reporting their performance metrics using a stricter threshold on GEOM-Drugs (delta = 0.75 Å). Additional evaluations were conducted on diffusion from RDKit's all-atom positions, along with further ablations regarding DecompDiff. Three of the five reviewers were satisfied with the authors' response, and two upgraded their scores. However, during the discussion period, Reviewer pba2 continued to express concerns regarding the comparison with the MCF method and the model's reliance on RDKit. After reviewing the authors' explanations on these points, I believe they have made valuable contributions to the problem of coarse-to-fine molecular conformation generation.
Overall, this paper demonstrates that specific design choices can significantly improve performance and contribute to further research in small-molecule generation tasks. The authors are strongly suggested to carefully consider the reviewers' comments and revise their final version accordingly.