PaperHub
Overall: 8.7/10
Poster · 4 reviewers
Ratings: 5, 5, 6, 5 (min 5, max 6, std 0.4)
Confidence: 4.5
Novelty: 3.5 · Quality: 3.0 · Clarity: 2.8 · Significance: 3.0
NeurIPS 2025

Accelerating 3D Molecule Generative Models with Trajectory Diagnosis

OpenReview · PDF
Submitted: 2025-05-10 · Updated: 2025-10-29
TL;DR

A novel method for fast 3D molecule generation

Abstract

Keywords
3D Molecule Generation · Fast Generation · Drug Design

Reviews and Discussion

Official Review
Rating: 5

The paper proposes a novel approach to accelerate the generative process for 3D molecular data. Unlike in the Euclidean data domain, such as images, the authors observe that the generation of 3D molecular structures involves two distinct phases. In the first phase—permutation ordering—the model focuses on aligning the current sample with the correct atom ordering of the final structure. Once this is resolved, the second phase involves refining the atomic features. To accelerate the process, the authors introduce a geometry-informed prior to speed up the first phase and apply consistency learning to improve efficiency in the second phase. As a result, the proposed model achieves competitive performance with a significantly reduced number of function evaluations (NFEs).

Strengths and Weaknesses

Strengths

I find the discovery of two distinct phases in 3D molecular generative modeling both novel and interesting. As for the proposed solutions, using a geometry-informed prior instead of the standard BFN prior is a reasonable approach to accelerate molecular structure generation. Moreover, applying consistency training in the parameter space of BFNs also strikes me as a novel and promising idea.

Weaknesses

  • The work builds on prior research [1,2] to identify the problem and design its solutions. From the perspectives of modeling, benchmarking, and architecture, the contribution appears incremental and not particularly substantial within the broader context of 3D molecular generation.
  • The method requires storing a separate geometry-informed prior for each group of molecules with the same number of atoms. This limits its generalization to out-of-distribution cases involving molecules with atom counts not seen during training, and may reduce scalability to datasets with a wide range of molecular sizes.
  • Some statements in the paper appear to be unclear (see Questions).

References

[1] Equivariant Flow Matching with Hybrid Probability Transport for 3D Molecule Generation. In NeurIPS23.

[2] Unified Generative Modeling of 3D Molecules via Bayesian Flow Networks. In ICLR24.

Questions

  • Could the authors elaborate on how the novelty metric is computed? Is it based on SMILES string comparison with the training set?

  • In the context of 3D molecular data, multiple symmetries exist. Could the authors discuss how rotational symmetry evolves throughout the generative process, in addition to permutation symmetry?

Limitations

yes

Final Rating Justification

The paper is very well written. Their method appears novel in the field of 3D molecular generation. In the rebuttal, the authors also addressed all my questions. I believe this is a very good paper, and I recommend accepting it.

Formatting Issues

No formatting error found.

Author Response

Weakness 1: Relationship to prior works

We thank the reviewer for their careful assessment of our contribution. While our framework follows prior research [1, 2], our core contribution lies in identifying and addressing a fundamental, previously unappreciated characteristic of 3D molecular generation that has significant implications for efficiency.

Driven by this novel insight, we introduce two distinct and highly effective techniques. Our empirical results demonstrate that this dual approach delivers an order-of-magnitude improvement in sampling speed over the state of the art. Furthermore, our techniques are complementary and can be integrated into various generative frameworks, highlighting their broad utility.

Beyond the demonstrated speedup in 3D molecule generation, we believe our work offers insights into essential elements for modeling 3D molecular structures. Specifically, our findings suggest that a strong inductive bias towards geometric priors can significantly simplify the inherent complexity of 3D geometric generation.

Weakness 2: Out-of-distribution generation

Thanks for your insightful comment! Current generative frameworks, training objectives, and even evaluation metrics tend to encourage models to fit the empirical distribution of the training data. As you mentioned, we also recognize the importance of generating novel and valid molecules beyond the training set, which holds significant practical value. However, evaluating generalization ability remains an open challenge. To address this, we conducted additional empirical studies and highlight three key points to demonstrate the model's generalization capabilities:

Novelty and Validity as proxies for generalization:

As shown in Table 1 of the original manuscript, our method achieves a state-of-the-art novelty metric, V&N&U (67.03% on QM9), while maintaining high validity (96.04%). This suggests that the geometric prior does not collapse to memorizing training structures but instead guides exploration of chemically plausible regions beyond the training distribution.

Generalization across datasets:

As mentioned in the main text, we have explicitly tested the prior's transferability to an external dataset. We constructed the geometric prior using training data from GEOM-DRUG and plugged it into the generative model trained on QM9—a dataset with distinct structural patterns (smaller molecules, different atom types). Using 12 NFEs, we generated 10000 molecules and evaluated on QM9:

| Model | Atom stability % | Molecule stability % | Validity % |
| --- | --- | --- | --- |
| GeoBFN | 100 | 98.6 | 87.2 |
| MolTD with QM9 prior | 99.4 | 92.53 | 96.04 |
| MolTD with DRUG prior | 99.2 | 90.22 | 94.58 |

As the results show, using a prior constructed from an external dataset achieves the same accelerating effect as the original MolTD prior, demonstrating that the prior captures fundamental geometric structure shared across datasets.

Generalization to atom counts absent from the training set

For this task, we generate molecules with 28 atoms, as QM9 does not contain any molecule with 28 atoms and most of its molecules have fewer than 28 atoms (>99%). We constructed the prior using the GEOM-DRUG dataset, plugged it into MolTD trained and evaluated on QM9, and generated 1000 molecules with 28 atoms. The results show performance competitive with the SOTA using only 12 steps:

| Method | NFE | Atom stability % | Molecule stability % | Validity % |
| --- | --- | --- | --- | --- |
| EquiFM | 200 | 99.0 | 83.5 | 90.2 |
| GeoBFN | 100 | 98.8 | 80.5 | 91.2 |
| MolTD | 12 | 98.5 | 78.1 | 89.1 |
| MolTD with DRUG prior | 12 | 98.8 | 80.3 | 90.9 |

Q1: How is the novelty metric computed?

We appreciate the opportunity to clarify the computation of the novelty metric, which consists of three steps. First, we select the generated molecules that can be successfully converted to SMILES strings using RDKit. Then, among these valid SMILES strings, we select the subset of unique SMILES strings. Finally, if a unique and valid SMILES string does not appear in the training set, we call it a novel molecule; the novelty metric is the percentage of novel molecules among the unique ones. Furthermore, the metric V&N&U is the percentage of novel molecules among all generated molecules.
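The three-step procedure described above can be sketched in Python (a minimal illustration assuming canonical SMILES have already been obtained, e.g. via RDKit's canonicalization; not the authors' exact evaluation script):

```python
def novelty_metrics(valid_smiles, n_generated, training_smiles):
    """Sketch of the novelty computation described above.

    `valid_smiles`: canonical SMILES of the generated molecules that
    could be parsed (step 1); `n_generated`: total number of generated
    molecules; `training_smiles`: canonical SMILES of the training set.
    """
    unique = set(valid_smiles)                     # step 2: valid & unique
    novel = unique - set(training_smiles)          # step 3: also novel
    novelty = 100.0 * len(novel) / len(unique) if unique else 0.0
    vnu = 100.0 * len(novel) / n_generated         # V&N&U over all generated
    return novelty, vnu
```

For example, with 5 generated molecules of which 4 are valid and one duplicates another, `novelty_metrics` returns the novelty among unique molecules and the V&N&U fraction over all samples.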

Q2: How does rotational symmetry evolve throughout the generative process?

Thanks for the insightful question! As detailed in the main text, the geometric-informed prior utilizes the EOT algorithm to align the geometric structures of molecules, which eliminates variation due to rotation and translation. As a result, the geometric-informed prior collapses the many rotational symmetries to a single orientation and accelerates rotational stabilization. Proposition 4.5 formalizes this intuition as the rotational-equivariance property of generated molecules.

Furthermore, we provide empirical evidence for the acceleration effect. Similar to the main text, we analyzed the generative trajectory with respect to rotation by measuring the rotation angle between intermediate molecules and the final molecule. The results show that MolTD requires a smaller rotational transform to align intermediate molecules with the final molecule, demonstrating the acceleration effect of our method with respect to rotational symmetry.
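The rotation-angle measurement described here can be sketched with the standard Kabsch algorithm (a common construction; the authors' exact procedure may differ):

```python
import numpy as np

def rotation_angle(P, Q):
    """Residual rotation angle (radians) that optimally aligns point set P
    onto Q (rows are atom coordinates), via the Kabsch algorithm."""
    P = P - P.mean(axis=0)                  # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)       # SVD of the cross-covariance
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T # optimal proper rotation
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.arccos(cos_theta)
```

Applied to each intermediate structure against the final molecule, this yields the per-step rotation-angle curve along the trajectory.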

Following the NeurIPS 2025 instructions, we cannot upload images at this stage of the rebuttal. We will include these results in the revised manuscript.

Reference

[1] Equivariant Flow Matching with Hybrid Probability Transport for 3D Molecule Generation. In NeurIPS23.

[2] Unified Generative Modeling of 3D Molecules via Bayesian Flow Networks. In ICLR24.

Comment

Thank you for your detailed answers. I think this is a good paper! I am happy to keep my score and raise my confidence score to 5.

Comment

We sincerely appreciate your support and your recommendation for our work! Thanks again for your time and effort in reviewing our paper.

Official Review
Rating: 5

This paper addresses the problem of 3D molecular generation. The authors empirically demonstrate that diffusion- and flow-based generative models tend to establish atom permutations early in the sampling trajectories. To analyze this phenomenon, they decompose the generation process into two distinct phases: (1) permutation reordering, which finds a coarse-grained 3D configuration, and (2) atom feature refinement. To accelerate the first phase, the authors propose a geometry-informed prior that initializes the generation process with a representative molecular structure, thereby facilitating early stabilization of atom permutations. Additionally, they introduce a novel training objective aimed at improving the consistency of model parameters throughout the generative trajectory. The method effectively accelerates both the permutation stabilization and atom type assignment phases.

Strengths and Weaknesses

Strengths:

  1. The analysis in the section Decomposition of the Generative Trajectory is well-designed and provides a compelling justification for the proposed approach.
  2. The method effectively accelerates both the permutation stabilization and atom type assignment phases, leading to improvements in the Number of Function Evaluations (NFE) metric.

Weaknesses:

  1. Evaluation Scope:
    • One of the most practically relevant scenarios for molecular generation is conditional generation, where molecules are optimized for specific properties. Without demonstrating successful conditional generation, the practical significance of the work remains limited.
    • A key evaluation metric, Validity, Uniqueness, and Novelty (V&U&N), is missing from the DRUG benchmark table.
    • Actual runtime measurements are not reported. Although NFE is informative, real-world runtime is crucial, especially when comparing different generative frameworks.
  2. Clarity and Terminology:
    • Several parts of the manuscript are confusing or imprecise:
      • Line 149: Referring to transpositions or swaps as "permutations" may be misleading; a clarification is needed.
      • Line 24: The phrase “comparable” lacks a clear reference—comparable to what?
      • Line 39: The claim that "3D molecular generation requires determining the permutation order of atoms" is overly general. It applies to some model families but not all; for example, fragment-based generation methods followed by conformer prediction do not require this step.
      • Line 40: Permutation ordering is not unique to 3D molecular generation; point cloud generation models may exhibit similar characteristics.
      • Lines 53–57: This section describing the contributions should be more concrete. The current formulation is overly vague and lacks specific technical insights.
      • Line 104: The permutation should be defined over {0, 1}, not the real numbers.
    • Several design choices and methodological details would benefit from further explanation (see the Questions section for specific points).

Overall Assessment

I find the paper to be interesting and promising. The proposed analysis and methods address a meaningful challenge (speed) in generative modeling for molecules. However, for the work to reach its full potential, it would benefit from a more comprehensive evaluation, particularly in the context of conditional generation, as well as clearer presentation. If these concerns are addressed in a revision, I will be inclined to increase my score.

Questions

1. Table 1.:

  1. Key Metrics: Could the authors clarify which metrics in Table 1 are considered most indicative of model performance and why? From a drug design perspective, the V&U&N metric appears to be the most relevant. However, it is notably low for MolTD on QM9 and is omitted entirely for DRUG. Could the authors elaborate on the rationale behind this omission?
  2. NFE Definition: What specific function evaluations are counted under NFE (Number of Function Evaluations)? Is this definition consistent across all compared models?
  3. Runtime Reporting: The paper reports NFE but does not include actual runtime measurements. Given that BFN-based models may be computationally expensive, could the authors provide wall-clock runtimes or comment on how well NFE reflects real-world performance?

2. Figure 2.:

  1. GOAT Exclusion: Could the authors explain why the GOAT model is excluded from Figure 2?

  2. Figure Clarity: The third subfigure in Figure 2 is somewhat difficult to interpret. Specifically:

    • What is the boundary of x_2^1?
    • What do the circles around x_0^1 and x_2^1 represent?
    • What does the "Phase I" region denote?
    • What is the distinction between "Permutation" and "Phase I", especially since Phase I seems to be when permutation is established?
    • A more detailed explanation in the figure caption or main text, and possibly a refinement of the visualization, would be appreciated.

3. Geometric-Informed Prior:

  1. Construction of Representative Structure: Could the authors clarify how the representative structure is constructed? Is it derived by aligning a set of molecules and averaging their features, including spatial coordinates?
  2. Number of Representatives: How many representative structures are used in total? Is there one representation structure per stratified subset of molecules?
  3. Inference Procedure: How is the representative structure selected during inference? The notation suggests θ_p is defined for a single representative—how is it determined?

4. Consistency Parameter:

  1. Training Phase Separation: In the introduction, the authors state that "consistency training can be significantly beneficial when specifically adapted for geometric generation in the adjustment phase." This implies that decoupling the permutation and adjustment phases is crucial. Could the authors provide experimental evidence for this claim? For example, what happens if t_stable is removed from Equation (9) and the consistency objective is optimized across the entire trajectory?
  2. Terminology: Could the authors clarify what is meant by “naturally aligned” in line 251?
  3. Proposition 4.6: How does Proposition 4.6 support the joint training of the BFN and consistency objectives? A more explicit connection would help the reader understand the motivation.

5. Training:

  1. Stability Determination: How does the method handle cases where the structure does not stabilize?

6. Ablation:

  1. Component Impact: It would be helpful to include results for MolTD without the permutation and consistency components in Figure 2a and 2b. This would clarify the contribution of each component to the respective metrics.

Limitations

yes

Final Rating Justification

Following the authors' clarifications and the inclusion of additional benchmarks, I have raised my initial score and now recommend the paper for acceptance.

Formatting Issues

Author Response

Thank you for the thorough review and insightful suggestions! Your feedback helps us significantly improve our work. We address your concerns as follows:

W1: Conditional generation

We apply our method to structure-based drug design, integrating it with MolCRAFT [1], a state-of-the-art method in this domain. The prior is constructed using Algorithm 1 from the set of target ligands within the training set. Given that MolCRAFT also utilizes BFNs as its generative backbone, we were able to seamlessly apply our consistency parameter objective to its parameter space. We adhered to MolCRAFT's established evaluation settings to report our results:

| Model | NFE | QED | SA | Vina Score (mean) | Vina Score (median) | Vina Min (mean) | Vina Min (median) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| TargetDiff | 1000 | 0.48 | 0.58 | -5.47 | -6.30 | -6.64 | -6.83 |
| Decomp-R | 1000 | 0.51 | 0.66 | -5.19 | -5.27 | -6.03 | -6.00 |
| MolCRAFT | 100 | 0.50 | 0.69 | -6.59 | -7.04 | -7.27 | -7.26 |
| MolCRAFT | 25 | 0.51 | 0.65 | -5.95 | -6.70 | -6.73 | -6.89 |
| MolTD | 25 | 0.54 | 0.72 | -6.60 | -6.91 | -7.36 | -7.24 |

For Vina-based metrics, lower values indicate better performance; for the others, higher values are better. Our method demonstrates comparable or superior performance with significantly fewer NFEs, which underscores its considerable practical value.

W2: V&U&N metric on DRUG dataset

On DRUG, both our model and all compared works consistently achieve near-perfect scores for uniqueness and novelty. Our model achieves 99.7% Uniqueness and 99.9% Novelty. Given this saturation, we followed the established practice of prior works [2,3] by primarily reporting the validity metric.

Furthermore, we are one of the few works to report molecular stability on the DRUG dataset, as detailed in Table 3. Including this metric provides a more comprehensive evaluation of generation quality.

W3: Actual runtime

The following table presents the wall-clock times required to generate 1000 samples on the QM9 dataset, using a single RTX 3090 GPU with a batch size of 64:

| Model | NFEs | Time (seconds) |
| --- | --- | --- |
| MolTD | 12 | 4.86 |
| GeoBFN | 100 | 34.52 |
| EquiFM | 200 | 180 |
| EDM | 1000 | 760 |

As the results demonstrate, MolTD achieves a sampling speed over 100x faster than diffusion-based models. Furthermore, the practical speed-up is even greater than what the reduction in NFEs alone suggests, as MolTD benefits from a more efficient implementation.

W4: Clarity and Terminology

We have revised the paper based on your valuable comments and provide a point-by-point response below:

  1. Line 149: We changed "permutations" to "transpositions" to improve clarity.
  2. Line 24: Comparable to the official results reported by AlphaFold3.
  3. Line 39: We changed the phrase to "..., 3D molecular generative models that generate the structure holistically require determining the permutation order ...", which is more rigorous.
  4. Line 40: We changed the phrase to "This two-phase segmentation is unique to 3D geometric generation and..."
  5. Lines 53–57: We have revised the paragraph to more precisely articulate our contribution and provide high-level insights.
  6. Line 104: We have corrected it in the revised manuscript.

Q1: Which metrics in Table 1 are most indicative?

Both Validity and Stability are crucial for assessing a model's capacity to generate chemically sound molecules. However, Stability is arguably a more robust indicator of model performance than Validity, primarily because Validity is susceptible to being artificially inflated.

We borrow the discussion from [2] : 'Experimentally, it is observed that validity could artificially be increased by reducing the number of bonds. For example, predicting only single bonds was enough to obtain close to 100% of valid molecules on GEOM-DRUGS. On the contrary, the stability metrics directly model hydrogens and cannot be tricked as easily.'

Furthermore, the Novelty metric, which quantifies the percentage of generated molecules absent from the training set, also requires careful interpretation. While high novelty is generally desired, an extremely high score can paradoxically suggest underfitting of the training data distribution.

Given these considerations, we believe the V&U&N metric is less indicative of a model's true generation performance; it should be viewed as a signal that ideally falls within a reasonable range. Our method achieves a V&U&N of 67.03%, aligning well with the 50-70% range reported by previous works in Table .

We report two additional metrics to evaluate quality and diversity: Validity & Uniqueness (V&U) and Atom Stability & Validity & Uniqueness (S&V&U):

| Model | NFE | V&U | S&V&U |
| --- | --- | --- | --- |
| EDM | 1000 | 90.7 | 89.5 |
| GeoBFN | 100 | 91.5 | 90.2 |
| EquiFM | 200 | 93.5 | 92.4 |
| GOAT | 90 | 91.9 | 91.1 |
| MolTD | 12 | 93.1 | 92.5 |

As the results show, our method achieves a significant speed-up while maintaining superior generation quality.

Q2: Definition of NFEs

One NFE denotes one forward pass of the neural network, which is equivalent to one sampling step in the denoising process. This definition is consistent across all compared models.

Q3: Exclusion of GOAT from Figure 2

We agree that GOAT is a key state-of-the-art model, and we have aimed to compare against it thoroughly.

As shown in Table 1, we conducted a comprehensive comparison with GOAT on all standard metrics, where our method demonstrates a significant speed-up while maintaining comparable generation quality.

The analysis in Figure 2, however, requires access to intermediate generation results and thus to the model's implementation. Unfortunately, at the time of our experiments, the official implementation of GOAT was still being actively updated. Attempting our own reproduction, especially within a limited timeline, would risk introducing implementation biases.

We are keen to include GOAT in Figure 2 once we can base the analysis on the official implementation or a faithful reproduction.

Q4: More explanation of the third subfigure in Figure 2

We have provided a detailed explanation of the subfigure in the revised manuscript; a point-by-point response follows:

  • The boundary of x_2^1 refers to one sheet of a two-sheeted hyperboloid with foci at x_0^i and x_0^j, passing through x_2^1. This surface defines the decision boundary for whether a permutation is needed between the i-th and j-th rows, as supported by Propositions 4.3 and 4.4.
  • The circles around x_0^1 and x_2^1 represent the neighborhoods V(x_0^i; r), where the radius r is chosen so that these neighborhoods are non-overlapping. Within V(x_0^i; r), the optimal matching is the identity permutation.
  • Phase I denotes the permutation phase, during which permutations are still needed. Phase II corresponds to the adjustment phase, during which no further permutation occurs.

Q5: Geometric-Informed Prior

1. Construction: for each molecule size (number of atoms), the prior is created by:

  • Alignment: using Equivariant Optimal Transport to align all geometric structures within the subset.
  • Averaging: after alignment, computing the mean of all atomic features (including spatial coordinates) across all molecules in the subset.

2. Number of Representatives: we construct one representative structure for each molecule size present in the training set.

3. Inference: following the previous frameworks [2,3], we first decide the number of atoms by sampling from the histogram of molecule sizes in the training data. Once the number of atoms is fixed, we retrieve the corresponding prior and plug it into Equation (7) to create θ_p.
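The construction steps above can be summarized in a short sketch (the `eot_align` helper is a hypothetical stand-in for the Equivariant Optimal Transport alignment; this is an illustration, not the authors' implementation):

```python
import numpy as np

def build_priors(molecules_by_size, eot_align):
    """Construct one representative structure per molecule size.

    `molecules_by_size` maps n_atoms -> list of (n_atoms, d) arrays of
    atomic features (coordinates + atom types); `eot_align` is an assumed
    routine that aligns a structure to a reference, removing
    rotation/translation/permutation variation.
    """
    priors = {}
    for n, structs in molecules_by_size.items():
        ref = structs[0]
        aligned = [eot_align(s, ref) for s in structs]  # step 1: alignment
        priors[n] = np.mean(aligned, axis=0)            # step 2: averaging
    return priors

# Inference: sample n from the training-size histogram, then use priors[n]
# to form the prior parameter (Equation (7) in the paper).
```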

Q6: Consistency Parameter

1. Training Phase Separation: Consistency training can suffer from high variance because the consistency objective enforces self-consistent outputs across all time steps, creating a complex optimization goal [4].

By introducing t_stable, we apply the consistency objective only during the adjustment phase—once a stable atomic structure has been largely determined. This simplifies the learning task and substantially reduces training variance.

We trained our model on QM9 with and without the t_stable threshold from Equation (9), repeated each experiment 5 times, and report the mean and standard deviation:

| Method | Molecule Stability % |
| --- | --- |
| With t_stable (our method) | 92.5 ± 1.8 |
| Without t_stable (full trajectory) | 82.7 ± 10.3 |

The results demonstrate the acceleration and variance-reduction benefits of our method.
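The role of t_stable can be illustrated schematically (hypothetical scalar losses and a normalized trajectory time t running from 0 to 1; the actual objective operates in the BFN parameter space):

```python
def training_loss(t, bfn_loss, consistency_loss, t_stable):
    """Apply the consistency term only in the adjustment phase (t >= t_stable),
    where the atomic structure has largely stabilized; earlier steps are
    trained with the BFN loss alone. Schematic sketch only."""
    in_adjustment_phase = t >= t_stable
    return bfn_loss + (consistency_loss if in_adjustment_phase else 0.0)
```

Removing the gate (setting t_stable to 0) recovers full-trajectory consistency training, the high-variance variant in the table above.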

2. Terminology and Proposition 4.6: Proposition 4.6 establishes that the original BFN objective intrinsically trains the network to predict ground-truth molecules from intermediate time steps, which is conceptually identical to the objective of consistency models. In this sense the two objectives are "naturally aligned": jointly training the original BFN and consistency parameter objectives stabilizes the training process without introducing conflicting optimization goals.

3. How does the method handle cases where the structure does not stabilize? For the permutation phase, the network is trained using only the BFN loss, meaning the acceleration observed in this phase primarily stems from the geometric-informed prior.

Q8: Include ablation result in Figure 2

Thank you for the constructive suggestion. We have included the visualization in Figure 2 of the revised manuscript.

References

[1] Qu, Yanru, et al. "MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space." ICML 2024

[2] Hoogeboom, Emiel, et al. "Equivariant diffusion for molecule generation in 3d." ICML 2022

[3] Xu, Minkai, et al. "Geometric latent diffusion models for 3d molecule generation." ICML 2023

[4] Song, Yang, and Prafulla Dhariwal. "Improved Techniques for Training Consistency Models." ICLR 2024

Comment

I'm pleased to see the detailed response, particularly the conditional generation benchmark and actual runtime results. Based on this, I am raising my score and recommend the paper for acceptance.

Comment

Thanks for acknowledging our work and for raising your score. Thanks again for your time and effort in reviewing our paper!

Official Review
Rating: 6

This paper proposes a 3D molecule generation model that can generate molecules in fewer sampling steps than existing flow-matching and diffusion-based models. The generation process is split into a permutation phase and an adjustment phase. A geometric-informed prior is introduced to reduce the number of permutations, and the adjustment phase is made efficient using consistency training.

Strengths and Weaknesses

Strengths

  • This is a well-written paper with a clear flow. The method is well motivated and introduced in an easy-to-understand manner.
  • It achieves better performance than existing 3D molecule generation methods such as EquiFM and GOAT with fewer sampling steps.

Weaknesses

The weaknesses I noticed in this paper are very minor and do not affect the contributions.

  • The relative speed of MolTD is expressed as a proportion of sampling steps (i.e., 100x faster than diffusion-based models). Are these speedups reflected in wall-clock time?
  • The geometric-informed prior is calculated separately for each number of atoms N. How does the distribution of molecule sizes affect the quality of the generated molecules?

Questions

  • Does a geometric-informed prior calculated for one dataset differ significantly from the prior for same-sized molecules obtained from a different dataset? If so, how would you transform the geometric-informed prior to another dataset?
  • Typo in line 44: analysis -> analyze

Limitations

yes

Final Rating Justification

I confirm my final review score.

Formatting Issues

None

Author Response

W1: Wall clock time comparison

We appreciate the opportunity to clarify the computational efficiency of our model. Beyond requiring fewer sampling steps, the efficiency advantages in wall-clock time are even more significant. The following table presents the wall-clock times required to generate 1000 samples on the QM9 dataset using various generative models, all measured on a single RTX 3090 GPU with a batch size of 64:

| Model | Number of Function Evaluations (NFEs) | Wall-Clock Time (seconds) |
| --- | --- | --- |
| MolTD | 12 | 4.86 |
| GeoBFN | 100 | 34.52 |
| EquiFM | 200 | 180 |
| EDM | 1000 | 760 |

As the results demonstrate, MolTD achieves a sampling speed over 100x faster than diffusion-based models like EDM and 20x faster than the flow-matching model EquiFM. Furthermore, the practical speed-up is even greater than what the reduction in NFEs alone suggests, as MolTD benefits from a more efficient implementation.

W2: How does the distribution of molecule size affect the quality of the generated molecules?

Thanks for the insightful question! Our approach leverages geometric-informed priors tailored to the number of atoms (N) in a molecule, since molecules with different numbers of atoms show distinct structural patterns (Figure 6 in appendix). This approach allows us to extract fine-grained geometric information within each cluster of molecular structures.

Our method was evaluated on the public QM9 and DRUG datasets, following the established settings of previous work [1,2]. The number of molecules available for each size varies within these datasets. Two factors contribute to improved generation quality:

  • A Better Prior: A structural prior derived from more extensive data better captures the full spectrum of geometric features.
  • A Better Generative Model: More training data yields a more robust generative model. To empirically validate this, we conducted an experiment on the QM9 dataset. For molecules with 12 and 19 atoms respectively, we generated 1000 samples and evaluated the results:
| Number of atoms | 12 | 19 |
| --- | --- | --- |
| Number of molecules in QM9 | 807 | 13364 |
| Atom stability % | 98.8 | 99.6 |

As the results illustrate, molecule sizes that are better represented in the training data exhibit demonstrably better generation quality than those with sparser coverage.

However, we highlight that our method can boost generation efficiency even with only 10 samples used to construct the prior. We randomly sampled 10 molecules for each N to create the prior and generated 10000 molecules on QM9:

| Method | NFE | Atom Stability | Mol Stability | Validity |
| --- | --- | --- | --- | --- |
| GeoBFN | 100 | 98.6 | 87.2 | 93.0 |
| EquiFM | 200 | 98.9 | 88.3 | 94.7 |
| MolTD | 12 | 99.4 | 92.53 | 96.04 |
| MolTD (prior constructed from 10 samples) | 12 | 99.1 | 90.1 | 95.3 |

This result demonstrates the substantial practical value of the geometric-informed prior, even in data-scarce settings.

Q1: How to transform prior to another dataset, the difference of prior constructed from different dataset and its impact on generation quality

We thank the reviewer for this insightful question. Our method utilizes distinct structural priors for each molecule size, derived from the DRUG dataset, and applies them to the QM9-trained model during inference. The effectiveness of this cross-dataset prior depends on the molecule size (N).

For small N, the space of valid molecular geometries is relatively constrained. In this scenario, priors derived from different datasets (QM9 vs. DRUG) are quite similar, and we observe only a marginal performance difference when applying the DRUG prior.

For large N, however, the geometric configuration space becomes vast and complex. The DRUG dataset provides significantly more data for these larger sizes, leading to a much richer and more informative prior. A prior constructed from this much larger and more diverse set captures a broader spectrum of valid geometric features, leading to a notable improvement in generation quality.

To empirically validate this, we conducted an experiment for larger molecules (N = 25 to 29), generating 1000 samples for each size and comparing the results when using priors derived from QM9 versus those from DRUG.

| Number of Atoms | 29 | 28 | 27 | 26 | 25 |
| --- | --- | --- | --- | --- | --- |
| Number of molecules in QM9 | 25 | 0 | 266 | 481 | 506 |
| Atom stability of MolTD | 96.4 | 98.5 | 98.8 | 99.0 | 99.4 |
| Number of molecules in DRUG | 56697 | 43809 | 31647 | 25672 | 17419 |
| Atom stability of MolTD with DRUG prior | 98.3 | 98.8 | 99.2 | 99.2 | 99.4 |

These results demonstrate the general applicability of the geometric prior, which can leverage external datasets to improve generation quality.

Q2: Typo in line 44: analysis -> analyze

Thanks for your careful assessment and for catching this typo! We will correct it in the revised manuscript.

References

[1] Hoogeboom, Emiel, et al. "Equivariant diffusion for molecule generation in 3d." ICML 2022

[2] Xu, Minkai, et al. "Geometric latent diffusion models for 3d molecule generation." ICML 2023

Comment

Thank you very much for the detailed responses addressing each comment separately. I appreciate the effort the authors put into running additional experiments to support the responses. I believe adding these results to the camera-ready version will further strengthen the paper.

Comment

Thanks for acknowledging our work! We'll make sure to incorporate the additional experimental results into the camera-ready version as suggested :)

Review
5

In this paper, the authors introduce MOLTD, a novel approach to accelerating 3D molecule generative models by addressing geometric generation challenges. Through theoretical and empirical analysis, they identify a two-phase generative pattern—permutation reordering and atomic feature adjustment—and propose two key techniques for accelerating each phase: a geometric-informed prior for faster re-ordering and a consistency parameter objective for accelerated adjustment.

Strengths and Weaknesses

Strength

  1. The idea of Trajectory Diagnosis is interesting, as it also reflects, to some extent, how the molecular generative model understands the process of molecular generation. Moreover, the authors propose targeted improvements for both stages of the generation process.
  2. The authors propose a quantitative framework for analyzing the generative trajectory. This approach allows researchers to identify key considerations for developing improved methods while highlighting the fundamental differences between 3D molecular generation and general domains.
  3. The speed-up of molecular generation is impressive.

Weaknesses

  1. Including more comparisons with relevant acceleration methods from the graph generation field would make the paper more comprehensive, especially considering that molecular generation is naturally more closely related to graph generation tasks.
  2. The term ‘Permutation’ is not that straightforward; a ‘(graph) geometry’ or ‘structure’ stage may be clearer, since the ‘permutation’ actually arises from structure alignment. From the viewpoint of graph structure generation, more methods from protein structure generation and graph structure generation could be discussed.

Questions

  1. Can the findings and the generative framework proposed in this paper be extended to protein structure generation or Structure-based Drug Design?
  2. The Permutation stage and Refinement stage are similar to graph structure generation and graph feature generation in the graph generation domain. Could you please provide some insights and open discussion on this interesting connection?
  3. The model architecture and loss could be SE(3)- and permutation-invariant; how about incorporating a permutation-invariant loss that aligns molecular structures first?
  4. The geometric prior is connected with fragment-based molecular generation. How does MOLTD perform against these fragment-based methods? Will the structural prior degrade the diversity of generated samples? Diversity is important for exploring the vast molecular space.

Limitations

Yes, the authors have discussed the limitations in the Appendix.

Final Justification

This paper provides a two-stage acceleration pipeline for various generative models in SBDD. I think this paper is above the bar in its specific domain but offers fewer insights on the ML side, as the components and insights the authors propose are not new in generative models, or even in related application domains such as SBDD, protein, and graph generation. In general, I recommend acceptance of this paper.

Formatting Issues

N/A

Author Response

W1: Comparison with acceleration methods from graph generation

We thank the reviewer for this insightful suggestion regarding comparisons with graph generation acceleration methods.

While related, a key distinction makes direct comparison challenging: 2D graph generation primarily models discrete states (e.g., the adjacency matrix and node types), whereas 3D molecule generation must model continuous states (i.e., the 3D atomic coordinates), which adds a significant layer of complexity. This fundamental difference in data modalities means that acceleration techniques are often not directly transferable.

Nonetheless, we agree that a comparison is valuable. We evaluated our method against two state-of-the-art 2D graph generation models known for their efficiency:

  • GraphBFN [1] : Using probabilistic interpolation for acceleration of graph generation
  • DruM [2]: Utilizes a mixture of diffusion processes to model graph topology.
| Model | NFE | Atom Stability on QM9 | Molecule Stability on QM9 | Atom Stability on DRUG | Molecule Stability on DRUG |
| --- | --- | --- | --- | --- | --- |
| GraphBFN | 100 | 99.4 | 94.7 | - | - |
| DruM | 1000 | 98.8 | 87.3 | 83.0 | 0.51 |
| MolTD | 12 | 99.4 | 92.5 | 86.9 | 6.37 |

The results show that while 2D methods perform well on stability metrics—a natural result of focusing on the simpler, discrete task of graph generation—our method (MolTD) achieves comparable stability with a drastically lower NFE (12 vs. 100-1000), highlighting its high efficiency.

W2: Terminology choice of 'Permutation' stage

We thank the reviewer for this useful comment! We agree that 'permutation stage' could be ambiguous in the context of our full pipeline, and 'geometry alignment stage' is more concise. We will update it in the next version.

Q1: Application on Structure-based Drug Design

Thanks for the insightful comments! We have applied our method to structure-based drug design, integrating it with MolCRAFT [3], a state-of-the-art method in this domain.

The geometric-informed prior is constructed using Algorithm 1 from the set of target ligands within the training set. Given that MolCRAFT also utilizes BFNs as its generative backbone, we were able to seamlessly apply our consistency parameter objective to its parameter space. We adhered to MolCRAFT's established evaluation settings to report our results:

| Model | NFE | QED | SA | Vina Score (mean) | Vina Score (median) | Vina Min (mean) | Vina Min (median) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| TargetDiff | 1000 | 0.48 | 0.58 | -5.47 | -6.30 | -6.64 | -6.83 |
| Decomp-R | 1000 | 0.51 | 0.66 | -5.19 | -5.27 | -6.03 | -6.00 |
| MolCRAFT | 100 | 0.50 | 0.69 | -6.59 | -7.04 | -7.27 | -7.26 |
| MolCRAFT | 25 | 0.51 | 0.65 | -5.95 | -6.70 | -6.73 | -6.89 |
| MolTD | 25 | 0.54 | 0.72 | -6.60 | -6.91 | -7.36 | -7.24 |

For Vina-based metrics, lower values indicate better performance; for the others, larger values indicate better performance. Our method demonstrates comparable or superior performance with significantly fewer NFEs. This outcome underscores its considerable practical value.

Q2 & W3: Relation to protein structure generation

Thank you for the insightful question! AlphaFold3, the state-of-the-art model for protein structure prediction, uses a diffusion model to generate the spatial coordinates of proteins. Our ablation study in Figure 3 demonstrates that our method can be successfully applied to diffusion-based models, yielding significant acceleration. Thus, we believe protein structure generation could be similarly decomposed into a permutation phase and an adjustment phase, and applying our method with proper modifications would boost generation efficiency. Furthermore, we anticipate the permutation phase would be less computationally intensive for proteins, as the fixed amino acid sequence already provides strong conditional information that constrains the possible ordering of nodes.

We are actively exploring this promising direction as part of our ongoing research.

Q3: Discussion on the relation between permutation stage and refinement stage to graph structure generation and graph feature generation

We thank the reviewer for this insightful question, which highlights an interesting analogy between our method and graph generation:

  • Our Permutation Phase is analogous to generating the 2D graph topology (e.g., an adjacency matrix), as it establishes the fundamental geometrical structure.
  • Our Refinement Phase is then similar to the 2D-to-3D "lifting" step, where 2D topology is mapped to 3D geometric space.

However, we wish to clarify a critical distinction: 3D molecular generation models do not explicitly model edges or an adjacency matrix. Instead, the final molecular topology is implicitly defined by the geometric configuration of the generated atoms, whereas in graph generation the topology is generated directly as an adjacency matrix. This distinction may suggest why specialized methods, such as those we introduce for the permutation and adjustment phases, are more beneficial for advancing 3D molecule generation than general graph-generation techniques.
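As a small illustration of topology being implicit in geometry (not the paper's evaluation code; the element radii and tolerance below are illustrative, and real stability metrics such as EDM's use fuller, bond-order-aware tables), an adjacency matrix can be recovered from interatomic distances with covalent-radius cutoffs:

```python
import numpy as np

# Illustrative covalent radii in angstroms for a few elements.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}

def infer_bonds(symbols, coords, tol=0.4):
    """Recover the adjacency matrix implicitly defined by a 3D geometry:
    two atoms are bonded when their distance falls below the sum of
    their covalent radii plus a tolerance."""
    n = len(symbols)
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            cutoff = COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]] + tol
            if np.linalg.norm(coords[i] - coords[j]) < cutoff:
                adj[i, j] = adj[j, i] = 1
    return adj
```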

Q4: Using permutation invariant loss

We thank the reviewer for this insightful suggestion. The suggested approach, which involves aligning molecular structures before applying a permutation-invariant objective, is a valid strategy similar to methods like EquiFM.

In our preliminary experiments, we explored this exact direction. We used Equivariant Optimal Transport (EOT) to align molecular structures first, with the goal of simplifying the subsequent generation task. However, we found this approach to be less effective than our current method.

The primary issue was that pre-aligning the spatial features made it significantly harder for the model to predict the discrete atom types. As a result, the generation process became less efficient, nullifying the potential acceleration benefits. This performance trade-off is empirically demonstrated in Figure 2 of our manuscript, represented by EquiFM.

Consequently, we opted for our current architecture, which we found addresses the challenges of both discrete and continuous generation more effectively, leading to a more robust and efficient model.

Q5. Comparison with fragment-based molecular generation

This is a great question! Adapting fragment-based methods to 3D generation is a challenge due to the combinatorial complexity of assembling fragments in 3D space. Consequently, few models have demonstrated high performance in this area.

For our comparison, we selected two representative state-of-the-art models:

  • Symphony: An auto-regressive model with E(3)-equivariance that iteratively adds molecular fragments using a spherical harmonics framework [4].
  • HierDiff: A hierarchical diffusion model that generates a coarse fragment-based structure before refining it to a full atomic representation [5].

We recognize that these models were developed for distinct tasks and were originally evaluated with different metrics: Symphony is primarily evaluated on validity and the statistical difference between training data and generated molecules, while HierDiff focuses on drug-likeness. To ensure a fair and direct comparison, we report Validity and Uniqueness, which are standard metrics used consistently across all compared methods.

| Model | Type | NFE | Validity % on QM9 | Validity & Uniqueness on QM9 | Validity % on DRUG | Validity & Uniqueness on DRUG |
| --- | --- | --- | --- | --- | --- | --- |
| Symphony | Auto-regressive | - | 68.1 | 66.5 | - | - |
| HierDiff | Diffusion | 1000 | 87.8 | 85.9 | 94.0 | 94.0 |
| MolTD | BFN | 12 | 96.04 | 93.2 | 95.33 | 94.5 |

As the results show, our method achieves superior generation quality and efficiency.

Q6: Impact of geometric-informed prior on diversity

We thank the reviewer for raising this important point regarding sample diversity.

You are correct that a structural prior can introduce a trade-off between sample quality and diversity. Our method is designed to make this trade-off fully controllable.

As shown in our ablation study in Figure 5, the influence of the prior can be precisely modulated. By applying a prior with a lower accuracy level (or, equivalently, injecting the prior only at the first few generation steps), the model retains significant freedom to explore diverse chemical space while still benefiting from the prior information.
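A minimal sketch of this controllable trade-off, assuming a Gaussian geometry-informed prior and using a hypothetical `level` knob (not the paper's exact parameterization) that interpolates between a unit Gaussian (level = 0, maximal diversity) and the full informed prior (level = 1, maximal guidance):

```python
import numpy as np

def blended_prior_sample(geo_mean, geo_std, level, rng):
    """Interpolate between an uninformative unit Gaussian (level=0) and
    a geometry-informed Gaussian prior (level=1). `level` is an
    illustrative knob for the quality/diversity trade-off."""
    n = geo_mean.shape[0]
    mean = level * geo_mean                      # shrink toward the origin
    std = level * geo_std + (1.0 - level) * 1.0  # blend toward unit variance
    return mean + std * rng.standard_normal((n, 3))
```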

Crucially, our primary results demonstrate that this control does not hinder the model's ability to generate a diverse set of molecules. As reported in Table 1, our method achieves Validity, Uniqueness, and Novelty scores (67.03 combined) comparable to state-of-the-art models. Furthermore, we compare Validity & Uniqueness on QM9 below:

| Model | NFE | V&U |
| --- | --- | --- |
| EDM | 1000 | 90.7 |
| GeoBFN | 100 | 91.5 |
| EquiFM | 200 | 93.5 |
| GOAT | 90 | 91.9 |
| MolTD | 12 | 93.1 |

This confirms that our approach successfully explores the vast molecular space to generate valid and diverse structures.

References

[1] Song, Yuxuan, et al. "Smooth Probabilistic Interpolation Benefits Generative Modeling for Discrete Graphs." ICML 2025

[2] Jo, Jaehyeong, Dongki Kim, and Sung Ju Hwang. "Graph generation with destination-predicting diffusion mixture." (2023).

[3] Qu, Yanru, et al. "MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space." ICML 2024

[4] Daigavane, Ameya, et al. "Symphony: Symmetry-Equivariant Point-Centered Spherical Harmonics for 3D Molecule Generation." ICLR 2023

[5] Qiang, Bo, et al. "Coarse-to-fine: a hierarchical diffusion model for molecule generation in 3d." ICML 2023

Comment

I appreciate the efforts the authors put into the rebuttal! The responses are point-by-point and thoughtful, and I did not expect the authors to conduct extra experiments on SBDD, fragment-based methods, etc. I would like to thank the authors for these efforts and will update my score.

Comment

Thanks for acknowledging our work and for updating the score! Thanks again for your time and effort in reviewing our paper.

Final Decision

This paper introduces a method to accelerate 3D molecule generative models by analyzing and decomposing the generative process into two phases: permutation reordering and atomic feature adjustment. The authors propose a geometric-informed prior to reduce inefficiencies in the permutation phase and a consistency parameter objective to improve the adjustment phase. Empirical results on QM9 and GEOM-DRUG show that MOLTD achieves state-of-the-art performance with large speed-up compared to baselines.

The reviewers were mainly concerned about limited comparisons to baselines and terminological ambiguity. These concerns were largely addressed in the rebuttal. Overall, the reviewers agree that the paper is technically solid and impactful. I recommend acceptance.