PaperHub
Rating: 8.2/10 · Poster · 4 reviewers
Reviewer scores: 5, 5, 5, 5 (min 5, max 5, std 0.0)
Confidence: 3.8
Novelty: 3.0 · Quality: 3.0 · Clarity: 3.0 · Significance: 3.0
NeurIPS 2025

Straight-Line Diffusion Model for Efficient 3D Molecular Generation

Submitted: 2025-05-05 · Updated: 2025-10-29

Abstract

Keywords
Molecule generation; Diffusion model

Reviews and Discussion

Review
Rating: 5

The paper introduces a novel diffusion process characterized by near-straight-line trajectories. To achieve this, the authors propose a new perturbing forward kernel designed to minimize the second-order truncation error in ODE discretization schemes (e.g., Euler’s method) for sampling. The primary application is 3D molecular generation, where the authors argue that the proposed diffusion process is particularly suitable due to the high noise sensitivity of molecular data. The model achieves faster sampling speeds without sacrificing performance.

Strengths and Weaknesses

Strengths

The paper proposes a reverse-engineering approach in which the authors first analyze the truncation error of first-order ODE solvers in sampling. They then design a diffusion process that cancels out higher-order terms, enabling the use of larger step sizes and accelerating the sampling process. To the best of my knowledge, this idea appears novel and interesting.
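For readers less familiar with the argument, the truncation-error reasoning summarized above can be written as a standard Taylor expansion. This is a generic sketch for one first-order (Euler) step of size $\Delta t$ along a sampling trajectory $x(t)$; the symbols $x$ and $\Delta t$ are illustrative assumptions and are not taken from the paper's notation.

```latex
x(t - \Delta t)
  = \underbrace{x(t) - \Delta t\,\dot{x}(t)}_{\text{Euler step}}
  + \underbrace{\frac{\Delta t^{2}}{2}\,\ddot{x}(t)}_{\text{leading truncation error}}
  + \mathcal{O}(\Delta t^{3})
```

If the trajectory is a straight line traversed at constant velocity, then $\ddot{x}(t) \equiv 0$ and the Euler step incurs no truncation error at any step size, which is the property a near-straight-line process tries to approximate.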

Weaknesses

  • In terms of molecular generation, this paper heavily relies on the existing framework of UniGEM [1], adopting its model architecture and generation process for molecular atom types and coordinates. However, it remains unclear whether the proposed method offers substantial improvements, given that UniGEM [1] has already achieved strong performance. Notably, Table 1 lacks a direct comparison with UniGEM [1], making it difficult to assess the claimed advantages. As a result, the paper appears to contribute only marginal novelty to the field of 3D molecular generation.

  • The paper does not provide sufficient justification for why a straight-line diffusion process is particularly suitable for molecular data. The explanation given between lines 41 and 45 is unconvincing. Intuitively, whether the diffusion follows a straight-line trajectory or not, small perturbations would still introduce noise and potentially generate invalid molecular conformations. In addition, one could argue that a straight-line diffusion process might struggle to adequately explore the whole molecular conformation or atom coordinate distributions, while standard stochastic perturbations could achieve better coverage, leading to more reliable and smoother data score estimation. Although the authors attempt to address this by introducing a small constant variance, they do not include an ablation study to show how varying this variance impacts model performance.

  • I remain skeptical about the novelty of molecules generated by the model trained under the small-variance straight-line diffusion process, particularly since the authors have not reported any novelty metrics to address this important aspect.

  • Furthermore, several key theoretical derivations lack clarity (see Questions).

Questions

  1. Line 43: It remains unclear why straight-line diffusion aligns well with molecular data (see Weaknesses).

  2. Could the authors incorporate the novelty metric employed in the GeoBFN [2] framework? Specifically, how does the novelty of the generated molecules compare to the training set? It would also be helpful to discuss whether the low stochasticity in the straight-line constant-variance diffusion process increases the risk of the model memorizing training samples.

  3. Line 154: Could you clarify what is meant by a “more stable generative process” and why your proposed schedule leads to such stability? It is not immediately clear what Figure 2 is illustrating—is it showing the generative process or merely the forward noising process? If it is indeed the generative process, how is the same conformation generated across all three models? Based on the figure, it appears that GeoBFN achieves a stable conformation as early as step 4, whereas your method reaches a similar conformation only at the final step. This seems counterintuitive and appears to contradict the claim that your method accelerates the sampling of stable conformations.

  4. Table 1: Did you evaluate the results using multiple random seeds? It would be important to report the standard deviations as in the baselines to ensure a fair and meaningful comparison.

  5. How does the model's performance vary when using different constant sigma values? It would be helpful to understand the sensitivity of your method to this important parameter.

  6. Lines 145-147: It is unclear why the use of Langevin dynamics would make the trajectories appear more linear or straight at the initial steps. Could you elaborate on the underlying mechanism or intuition behind this observation?

  7. Equation 7: It is unclear where the parameter $\beta_t$ comes from. Although Appendix A.5 discusses it, the choice of $\beta_t$ remains non-trivial and would benefit from further explanation. Moreover, in Algorithm 2, the purpose of the inner condition is not clearly stated. Could you elaborate on its role?

  8. It would be helpful to include a discussion comparing the training processes of UniGEM and the proposed framework. Specifically, why does the straight-line diffusion approach result in significantly longer training times than the standard diffusion method—10 days versus 7 days on QM9, and 16 days versus 6.5 days on GEOM-Drugs—despite both methods using the same architecture, learning rate, and optimizer? Intuitively, one might expect that using a simpler straight-line constant-variance diffusion would accelerate training as well. Clarifying this discrepancy would strengthen the experimental section.

References

[1] UniGEM: A Unified Approach to Generation and Property Prediction for Molecules. In ICLR25.

[2] Unified Generative Modeling of 3D Molecules via Bayesian Flow Networks. In ICLR24.

Limitations

yes

Final Justification

The paper introduces a straight-line diffusion model (SLDM), which is mathematically grounded and performs well on 3D molecular generation. During the rebuttal, the authors addressed my primary concern about SLDM relying on the previous UniGEM model. However, the authors still need to clarify Figure 2 in the revised version. The authors also added several new results during the rebuttal, which I believe will further strengthen the paper. I therefore recommend accepting the paper.

Best,

Formatting Issues

No formatting error found

Author Response

Thank you for the detailed and thoughtful feedback, which has greatly helped improve the clarity of our work. We carefully incorporated these explanations in the revised version. We also sincerely appreciate your recognition of our novel contribution.

We believe the main concern may stem from misunderstandings, and we hope our responses will clarify the key points. If so, we would be truly grateful if you would consider increasing the score.

Lack a direct comparison with UniGEM

Reply: We believe there may be misunderstandings of the experimental results. Contrary to the reviewer’s impression, our work does include direct comparisons with UniGEM. Specifically:

  • The EDM(U) baseline used in Table 3 and Figure 7 corresponds to the main model proposed in the UniGEM paper, which uses EDM as diffusion model.
  • Our results demonstrate that, under the UniGEM framework, SLDM achieves significant improvements in sampling efficiency over UniGEM. For example, for mol_stable:
    • T=1000: UniGEM 89.8 vs. SLDM 95.42
    • T=50: UniGEM 85.73 vs. SLDM 93.37
    • T=10: UniGEM 23.56 vs. SLDM 87.46

We intentionally did not include UniGEM in Table 1, because Table 1 is focused on comparing different diffusion models (e.g. BFN, DDPM/EDM, Flow matching), while UniGEM is not a new generative model, but rather a change in the generation paradigm—i.e., decoupling the generation of coordinates and atom types. In fact, UniGEM can be combined with EDM, BFN, or other diffusion models. But if you believe it would improve clarity, we are happy to include UniGEM in Table 1 as well.

Why a straight-line diffusion process is particularly suitable for molecular data (lines 41-45) ? Small perturbations would still introduce noise and potentially generate invalid molecular conformations.

Reply: We believe there might be some misunderstanding of the role of noise scheduling and our statements. For any diffusion-based model, the forward (noising) process inevitably drives the signal-to-noise ratio (SNR) toward zero, resulting in fully corrupted, uninformative structures regardless of whether the trajectory is straight or not.

Our contribution is not to avoid this endpoint, but to control the rate at which information is destroyed. In particular, SLDM employs a slower decay of SNR, meaning the molecular structure is not immediately and completely perturbed at early steps. Instead, noise is introduced more gradually, preserving meaningful spatial relationships longer and enabling a more uniform and gradual reconstruction in the reverse process. This contrasts with EDM schedules that overly disrupt structure early, potentially making recovery harder.

We also note that the brief explanation between lines 41–45 is expanded in detail in Section 3.2 for the benefits of such a schedule in the molecular setting.

Incorporate the novelty metric? SLDM may struggle to explore the whole distributions

Reply: We thank the reviewer for raising this important point regarding novelty metric in molecular generation.

  1. On novelty evaluation in QM9: As discussed in the EDM paper [1], novelty metrics are not meaningful on QM9 dataset because it is an exhaustive enumeration of small organic molecules within a narrow chemical space. EDM further observes that novelty consistently decreases as training progresses. Following standard practice, we thus omit novelty reporting on QM9, as do prior works including EquiFM, END, GeoLDM, and UniGEM.
  2. On meaningful novelty in broader chemical space: To provide a more meaningful evaluation, we report novelty scores on GEOM-Drugs, a dataset sampling sparsely from a much larger chemical space. As shown below, our method (SLDM) achieves higher novelty than EDM (implemented in UniGEM), while maintaining strong validity and uniqueness:

| | Atm_stab (%) | Validity (%) | Valid&Unique (%) | Valid&Unique&Novel (%) |
|-|-|-|-|-|
| EDM (U) (T=1000) | 84.93 | 98.36 | 98.31 | 98.18 |
| SLDM (T=1000) | 88.19 | 99.96 | 99.91 | 99.72 |
| SLDM (T=50) | 89.16 | 99.52 | 99.49 | 99.42 |

These results confirm that SLDM can generate novel, valid, and unique molecules in realistic settings. We appreciate the reviewer’s concern regarding the potential risk of memorization in a low-variance setting. However, SLDM does not remove stochasticity; rather, it carefully balances stochasticity over time through the annealed temperature scheme (Eq. 9). This ensures early-stage exploration while gradually refining the structure in later stages for stability. This design prevents collapse into duplicated training samples while maintaining high fidelity. As shown in Appendix C.4, we tune the annealing rate to maximize the unique & valid score, achieving an effective balance between diversity and chemical stability.
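To make the column definitions in the table above concrete, here is a minimal sketch of how validity, valid-and-unique, and valid-and-unique-and-novel rates are commonly computed. The function name `generation_metrics`, the `is_valid` callback, and the choice to normalize every rate by the total number of generated samples are assumptions for illustration; the rebuttal does not state the exact normalization used.

```python
def generation_metrics(generated, training_set, is_valid):
    """Return validity, valid&unique, and valid&unique&novel rates.

    All three rates are fractions of the total number of generated samples
    (an assumed convention; some papers normalize differently).
    """
    if not generated:
        raise ValueError("generated must be non-empty")
    valid = [m for m in generated if is_valid(m)]   # passes the validity check
    unique = set(valid)                             # duplicates collapsed
    novel = unique - set(training_set)              # not seen during training
    n = len(generated)
    return {
        "validity": len(valid) / n,
        "valid_unique": len(unique) / n,
        "valid_unique_novel": len(novel) / n,
    }


# Toy usage with made-up SMILES strings and a placeholder validity check.
samples = ["CCO", "CCO", "CC", "C#N", "bad"]
train = {"CC"}
metrics = generation_metrics(samples, train, is_valid=lambda s: s != "bad")
print(metrics)
```

In practice the validity check would come from a cheminformatics toolkit and the strings would be canonicalized before comparison; both steps are omitted here.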

Ablation of constant sigma

Reply: We find that our method remains robust across a reasonable range of σ values. We trained for only 1280 epochs due to the time limits of the rebuttal period.

| epoch=1280 | Atom_sta (%) | Mol_sta (%) | Valid (%) | V*U (%) |
|-|-|-|-|-|
| σ=0.03 | 99.3 | 92.6 | 95.2 | 92.4 |
| σ=0.04 | 99.2 | 93.3 | 95.3 | 92.7 |
| σ=0.05 | 99.3 | 92.7 | 95.7 | 92.8 |

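Random seeds for Table 1

| | QM9 mol_stab | QM9 atm_stable | QM9 Validity | QM9 V*U | GEOM-Drugs atm_stab | GEOM-Drugs Validity |
|-|-|-|-|-|-|-|
| Seed 1 | 93.35 | 99.30 | 96.33 | 93.38 | 89.15 | 99.45 |
| Seed 2 | 93.22 | 99.28 | 96.31 | 93.41 | 89.12 | 99.53 |
| Seed 3 | 93.46 | 99.31 | 96.29 | 93.61 | 89.01 | 99.56 |
| Mean ± Std | 93.34±0.10 | 99.30±0.01 | 96.31±0.02 | 93.47±0.13 | 89.10±0.06 | 99.51±0.05 |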
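As a sanity check on how a "Mean ± Std" row can be derived from per-seed values, the sketch below recomputes the QM9 mol_stab entry from the three seeds listed above. It assumes a population standard deviation (`statistics.pstdev`), which reproduces the reported 0.10 for this column; the rebuttal does not say which convention was used, and a sample standard deviation would give a slightly larger value.

```python
from statistics import mean, pstdev

# QM9 mol_stab for seeds 1-3, copied from the table above.
mol_stab = [93.35, 93.22, 93.46]


def summarize(values):
    """Format values as 'mean±std' with two decimals (population std assumed)."""
    return f"{mean(values):.2f}±{pstdev(values):.2f}"


summary = summarize(mol_stab)
print(summary)
```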

Other issues on clarity:

Line 154: What is “more stable generative process” and why?

Reply: By “more stable,” we refer to the fact that our schedule injects significantly less noise (a slower SNR drop) than EDM in the early steps of the reverse process, resulting in a more gradual, less chaotic trajectory. This stability avoids large, abrupt shifts that lead to invalid conformations at small t.

Fig2 : backward or forward process?

Reply: It is the forward noising process. Showing the reverse sampling process would not ensure that all models generate the same molecule, making comparison unclear.

Fig2: GeoBFN faster?

Reply: We respectfully note that this is a misinterpretation. The figure shows the diffusion process from t = 1 to t = 0 (left to right). In diffusion models, even when using fewer sampling steps, all methods simulate the full trajectory from t = 1 to t = 0 by discretizing it differently, not by stopping early (e.g., at t = 0.5). GeoBFN’s schedule allocates many steps focusing on fine-tuning geometry, while compressing the challenging structure formation phase into very few early steps, limiting model capacity.

L145-147: why Langevin dynamics make trajectories linear?

Reply: We clarify that Langevin dynamics does not directly make the sampling trajectory more linear, but rather mitigates the impact of deviations from the ideal trajectory at early steps, i.e. improving robustness to initial sampling error. We agree the wording in the original text may be misleading and will revise the manuscript accordingly.

An ablation supports this: SLDM maintains high validity and stability (surpassing the other baselines in the paper) even when the initial distribution deviates from the theoretical optimum (ε: standard Gaussian noise; sampling steps T=50).

| Prior distribution | Mol_stab (%) | Atom_stab (%) | Validity (%) | V*U (%) |
|-|-|-|-|-|
| σ·ε (ideal) | 93.43 | 99.32 | 96.10 | 93.31 |
| 2·σ·ε | 91.81 | 99.12 | 95.00 | 92.79 |
| 0.1·σ·ε | 93.31 | 99.29 | 96.07 | 93.46 |
| 0.01·σ·ε | 92.94 | 99.24 | 96.06 | 93.06 |

Eq.7: where does β come from?

Reply: Eq. 7 approximates the true reverse transition distribution $p(x_{t-\Delta t} \mid x_t)$ using a Gaussian with a mean and variance parameterized by β. While the true reverse distribution is intractable and non-Gaussian, it is possible to derive its true conditional mean analytically under our forward process assumptions, as shown in Eq. 41. To ensure our Gaussian approximation is as faithful as possible, we determine β such that the mean of Eq. 7 matches this true conditional mean.

Alg 2: why inner condition?

Reply: The inner condition (i.e., skipping noise injection at the final step) is a common technique used to improve the final sample quality. Intuitively, this step allows the sampler to directly apply the learned score function to denoise the sample, effectively implementing Tweedie’s formula. This idea has been adopted in previous works such as DDPM [1, Alg. 2] and improved score-based models [2, Alg. 1], and we follow the same practice here.

[1] Denoising Diffusion Probabilistic Models

[2] Improved Techniques for Training Score-Based Generative Models

We additionally provide ablation results below to show the effect of this choice:

| T | Last-step noise | Mol Stable (%) | Atom Stable (%) | Validity (%) | U×V |
|-|-|-|-|-|-|
| 50 | w/ noise | 90.91 | 98.99 | 94.94 | 92.25 |
| 50 | w/o noise (ours) | 93.37 | 99.30 | 96.24 | 93.63 |
| 1000 | w/ noise | 94.97 | 99.36 | 96.91 | 90.55 |
| 1000 | w/o noise (ours) | 95.42 | 99.43 | 97.07 | 90.42 |
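The final-step rule discussed in this reply can be sketched with a generic stochastic sampling loop. This is not the paper's Algorithm 2: `sample`, `toy_denoiser`, the `noise_on_last_step` flag, and the toy target are all hypothetical stand-ins, and the only point illustrated is the `if` that skips noise injection on the last iteration.

```python
import random


def sample(denoise_step, x, num_steps, sigma, rng, noise_on_last_step=False):
    """Alternate a denoising update with noise injection, from step num_steps down to 1."""
    for i in range(num_steps, 0, -1):
        x = denoise_step(x, i)
        # Inner condition: inject noise on every step except (by default) the last,
        # so the returned sample is the denoised estimate itself.
        if i > 1 or noise_on_last_step:
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    return x


TARGET = [1.0, -2.0]  # made-up "clean" sample


def toy_denoiser(x, i):
    """Hypothetical stand-in for a learned model: move 1/i of the way to TARGET."""
    return [xi + (ti - xi) / i for xi, ti in zip(x, TARGET)]


clean = sample(toy_denoiser, [0.0, 0.0], 10, 0.05, random.Random(0))
noisy = sample(toy_denoiser, [0.0, 0.0], 10, 0.05, random.Random(0),
               noise_on_last_step=True)
print(clean, noisy)
```

With the toy denoiser, the default setting returns the target up to floating-point error, while enabling last-step noise leaves a residual perturbation of scale `sigma` on the output.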

Why longer training time?

Reply: Thanks for the suggestion. We confirm that SLDM already outperforms UniGEM under the same number of epochs and training time:

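| Model (Epochs, T=1000) | QM9 atm_stable | QM9 mol_stab | QM9 Validity | QM9 V*U | GEOM-Drugs (GD) atm_stab | GEOM-Drugs (GD) Validity |
|-|-|-|-|-|-|-|
| UniGEM (QM9:2000, GD:13) | 99.00 | 89.80 | 95.00 | 93.20 | 85.10 | 98.40 |
| SLDM (QM9:2000, GD:13) | 99.41 | 95.32 | 97.04 | 90.48 | 87.87 | 99.55 |
| SLDM (QM9:2980, GD:32) | 99.43 | 95.42 | 97.07 | 90.42 | 88.30 | 99.95 |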

While longer training leads to slight performance gains, we found that the mol_stable and atm_stable metrics remain stable around 2000 epochs on QM9 and 13 epochs on GEOM-Drugs during the training process. This indicates that extended training time is not a necessary factor for SLDM’s performance advantage.


If there are any further questions or clarifications needed, we would be glad to continue the discussion.

Comment

Thank you for your detailed responses. However, I still remain unconvinced about them.

1. Lack a direct comparison with UniGEM

I understood what UniGEM [1] proposed. However, your framework still relies significantly on the UniGEM framework. The results you presented in the rebuttal are unconvincing, as UniGEM's results can be applied to different diffusion models. I would expect to see if SLDM can achieve good performance without the UniGEM approach, meaning predicting atom types and coordinates separately as EDM did. If your model performs well, I would be satisfied with your contributions. If not, I would be afraid that your approach to success relies significantly on the UniGEM's success.

In addition, the claim that UniGEM is not a new generative model is somewhat of a bold statement. I read the UniGEM paper, where the authors claimed that it is the first diffusion-based unified model to successfully integrate molecular generation and property prediction, delivering superior performance in both tasks, as stated in the paper's abstract. Even though they are not a new diffusion model as you said, an ablation study of work on using SLDM alone without UniGEM will strengthen your contributions.

2. Incorporate the novelty metric? SLDM may struggle to explore the whole distribution

Novelty metrics are not meaningful on the QM9 dataset. I have seen some papers using this fact to avoid evaluating the novelty metric on QM9. I think this is not a good excuse and not a good practice. What would make sense to learn and generate your training data again? If it's not a good dataset, it should be excluded from the evaluation.

Moreover, GeoBFN [2] is a framework that evaluates the novelty metric on QM9, achieving reasonable results ranging from 65% to 70%. You compared SLDM with GeoBFN on QM9, but ignored the novel metric.

In contrast, I think GEOM-Drugs is not a good dataset to evaluate the novelty metric, as its search space is very huge due to a wide range of atom sizes. To the best of my knowledge, most diffusion models can easily achieve 100% novelty of this dataset. That's also a reason why several models ignored the novelty metric on GEOM-Drugs.

Again, I am not yet convinced about the linear straight-line diffusion model with a small noise variance in your experimental results. It would be beneficial to measure the novelty metric of this approach on the challenging dataset QM9, as you did for GEOM-Drugs.

3. FIGURE 2

I am still unclear on your answers to Figure 2. "The figure shows the diffusion process from t = 1 to t = 0 (left to right)", which is exactly how I read the figure from left to right. I could not interpret your phrase, "the diffusion process from t = 1 to t = 0". When I think about a diffusion process, it is a forward noise process that should go from t=0 to t=1. Could you make your interpretation clearer?

"It is the forward noising process. Showing the reverse sampling process would not ensure that all models generate the same molecule, making comparison unclear." I still could not understand this statement. If, as you said, this is a forward process, I still see the same molecules on the right side of the figure. Why would you say they are not the same molecules?

I began to become skeptical about Figure 2. I thought that you would try to show the generation/ sampling process as GeoBFN did in their Figure 3 [2], in which I read that GeoBFN generated stable molecules faster than EDM. However, through the rebuttal, it seems not to be what I thought. And I think it's a good way to visualize, as GeoBFN did in Figure 3, because SLDM also tried to generate stable molecules faster. So, I am still unsatisfied with your answers on Figure 2. In addition, I think this is the most important and intuitive figure of your manuscript, visualizing the performance of different approaches.

I understand that you have attempted to strengthen your approach through theoretical development and additional experimental results in the rebuttal. I would appreciate this and consider it a good point. There are still some minor things that can be discussed in your manuscript. However, given the limitation on the rebuttal period, it would be sufficient to clarify the above questions. Thank you!

[1] Feng et al. UniGEM: A Unified Approach to Generation and Property Prediction for Molecules. In ICLR 2025.

[2] Song et al. Unified Generative Modeling of 3D Molecules via Bayesian Flow Networks. In ICLR 2024.

Comment

Concerns about Figure 2:

Sorry for the confusion. Figure 2 was generated by adding noise to the same initial molecule following the timestep schedule of the forward diffusion process. This is what we meant by stating "Figure 2 illustrates the forward noise process" in our previous response. To further confirm, we consulted the authors of GeoBFN, who clarified that the visualization in their paper (Fig. 3) follows the same logic: it also depicts the forward noise-adding process (drawn by adding noise to the same initial molecule). Both the visualization method and our analytical approach align with those used in GeoBFN’s work.
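The procedure described above (noising one fixed molecule at several timesteps) can be sketched as follows. The sketch assumes the forward kernel $x_t = \mu(t)\,x_0 + \sigma\,\varepsilon$ with $\mu(t) = 1 - t/T$ and constant $\sigma$, as the schedule is described elsewhere in this thread; the coordinates, `sigma=0.05`, `T=1.0`, the function names, and the reuse of one noise draw across timesteps are illustrative choices, not details confirmed by the authors.

```python
import math
import random


def mu(t, T=1.0):
    """Linearly decaying mean coefficient, mu(t) = 1 - t/T."""
    return 1.0 - t / T


def forward_noise(x0, t, eps, sigma=0.05, T=1.0):
    """Noised coordinates x_t = mu(t) * x0 + sigma * eps (constant sigma assumed)."""
    return [mu(t, T) * xi + sigma * ei for xi, ei in zip(x0, eps)]


def snr(t, sigma=0.05, T=1.0):
    """Signal-to-noise ratio mu(t)^2 / sigma^2 of the assumed kernel."""
    return (mu(t, T) / sigma) ** 2


rng = random.Random(0)
x0 = [0.0, 1.2, -0.7]                    # made-up coordinates of one molecule
eps = [rng.gauss(0.0, 1.0) for _ in x0]  # one noise draw reused at every t
snapshots = {t: forward_noise(x0, t, eps) for t in (0.0, 0.25, 0.5, 0.75, 1.0)}
for t, xt in snapshots.items():
    print(t, round(snr(t), 1), [round(v, 3) for v in xt])
```

Because the same noise draw is reused, each coordinate moves along a straight line in t, and at t = T only the $\sigma\,\varepsilon$ term remains, matching the "σ·ε (ideal)" prior named in the earlier ablation.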

Regarding the terminology: In our paper, the term "diffusion process" may refer to either the forward (noise adding) process or the reverse (generation) process, depending on the context. This usage is consistent with the convention adopted in previous work e.g.[1]. If we were referring to the forward noise process, it would indeed go from t = 0 to t = 1, which is consistent with your understanding.

[1] Score-Based Generative Modeling through Stochastic Differential Equations, ICLR 2021

Thank you again for your careful and detailed feedback. Please feel free to reach out with any additional questions or concerns.

Comment

Thank you very much for your prompt response and constructive discussion. We hope our clarifications have addressed your concerns effectively, and we remain happy to provide further details or explanations to facilitate a thorough evaluation.

  1. I would expect to see if SLDM can achieve good performance without the UniGEM approach, meaning predicting atom types and coordinates separately as EDM did.

Reply: Compared to EDM, UniGEM provides a more advantageous generative framework. In a similar way to how many prior methods were developed based on EDM, our proposed SLDM is built upon the UniGEM framework.

In our previous response, we have already compared the accuracy between SLDM and UniGEM, showing that SLDM achieves better performance. To further address your new question regarding whether SLDM can work independently of UniGEM, we introduce a variant termed SLDM (joint diffusion). In this setting, the atom type is generated using SLDM’s own schedule rather than UniGEM’s prediction strategy. The results are shown in the table below.

| | Atom Sta (%) | Mol Sta (%) | Valid (%) | V×U (%) |
|-|-|-|-|-|
| EDM (T=1000) | 98.7 | 82.0 | 91.9 | 90.7 |
| GeoBFN (T=1000) | 99.08 | 90.87 | 95.31 | 92.96 |
| SLDM (joint diffusion) (T=1000) | 99.01 | 91.57 | 95.2 | 91.57 |
| GeoBFN (T=500) | 98.78 | 88.42 | 93.35 | 91.78 |
| SLDM (joint diffusion) (T=500) | 99.89 | 90.57 | 94.00 | 91.11 |
| GeoBFN (T=100) | 98.64 | 87.21 | 93.03 | 91.53 |
| SLDM (joint diffusion) (T=100) | 98.92 | 90.31 | 93.84 | 89.36 |
| GeoBFN (T=50) | 98.28 | 85.11 | 92.27 | 90.72 |
| SLDM (joint diffusion) (T=50) | 98.70 | 88.09 | 92.84 | 89.53 |

Although the parameters may not be fully optimized due to the limited time available during the rebuttal period, SLDM (joint diffusion) still outperforms EDM and either outperforms or is comparable to GeoBFN, with this advantage being particularly pronounced in scenarios with limited sampling steps T.

  2. It would be beneficial to measure the novelty metric of this approach on the challenging dataset QM9.

Thank you for your thoughtful follow-up. We understand your concern about evaluating the novelty metric on QM9, and we appreciate the opportunity to further clarify our position.

As noted in our previous response, it is standard practice in recent molecular generative modeling literature not to report the novelty metric on QM9. This is not merely an excuse, but a reasoned and widely accepted decision, grounded in the nature of the dataset itself. QM9 is a filtered subset of the GDB-17 universe, containing all 133,885 stable small organic molecules with up to 9 heavy atoms (C, N, O, F), exhaustively enumerated under predefined rules of valency, chemical stability, and synthetic feasibility.

In fact, the validation and test sets combined account for only about 23.7% of QM9. Thus, any generated molecule outside this portion, yet still marked as “novel”, is likely invalid under the QM9 and GDB-17 filtering rules. Therefore, reporting high novelty scores on QM9 may not indicate improved performance; it may instead suggest a model’s failure to faithfully learn the target distribution.

This view has been widely recognized and adopted by the community. For example, EDM and [1] explicitly state this position. EDM and [2] also observe that novelty steadily decreases during training. We observed the same trend in our model as well as in the training of GeoBFN.

However, QM9 remains a valuable benchmark, as it is widely used to assess a model’s ability to generate chemically valid and stable molecules, a nontrivial task, given that many models still struggle with valency and stability constraints (which are evidently improved by our SLDM).

Besides, we believe GEOM-Drugs better reflects real-world molecular design, where training data sparsely covers a much larger and more diverse chemical space, in which case the novelty metric is reasonable.

[1] Top-n: Equivariant set and graph generation without exchangeability, 2021

[2] Geometric Representation Condition Improves Equivariant Molecule Generation, ICML 2025

(Please see next post for remaining replies...)

Comment

Thank you for your thoughtful feedback! I believe your new results importantly show that SLDM can work independently with GeoBFN. Thus, my primary concern has been resolved. However, my belief about Figure 3 of GeoBFN remains unchanged. It should be a generation process, as explicitly stated in Section 3.3 and in the GeoBFN authors' discussion with Reviewer mPXU on OpenReview. As the authors have provided substantial new results during the rebuttal, I will increase the score to 5 and recommend acceptance.

Best,

Comment

Dear reviewer,

We sincerely thank you for your dedication during the review process. Your professional insights and constructive suggestions have significantly improved the overall quality of the manuscript. We have carefully recorded all points raised and will incorporate them into the revised version to ensure better rigor and clarity. Thank you again for your time, expertise, and thoughtful suggestions.

Wishing you all the best!

Review
Rating: 5

The paper proposes a type of diffusion model specifically designed for 3D molecular generation. The core idea is to choose a schedule $\mu$ and $\sigma$ of the SDE to achieve a near straight-line backward process. This allows the model to produce high-quality samples with a significantly smaller number of steps.

Strengths and Weaknesses

Strengths:

  1. The goal of the method is to generate molecular samples with a small number of steps. The technical motivation for that is natural and clear: To reduce the error of first order discretization, choose a schedule of the diffusion SDE so that the backward process is linear. The exposition revolves around this motivation and is clear and self-contained.

  2. The experimental results shown in Table 1 and Figure 4 are impressive, demonstrating that the method achieves significant speed up over the baselines with better quality.

Weaknesses:

  1. There are plenty of works focusing on accelerating diffusion generation for other domains such as image generation (e.g. [1]). In the experimental setup (Section 4.1) the authors only seem to compare to methods already used in molecular generation. I suspect directly adapting the progress in fast image generation to molecular generation can already give significant speed-up. More such experimental comparison is favorable.

References: [1] Consistency Models, Song et al.

Questions

  1. In Appendix A.4, I don't quite understand the logic jumping from the toy example of delta distribution to a general data distribution. Is it just a loose analogy, without rigorous mathematical reasoning or intuition?

Limitations

I do not have much to say about it.

Final Justification

Overall I think the paper is well motivated and the solution is on point. I keep my original positive rating.

Formatting Issues

None

Author Response

We sincerely thank the reviewer for the thoughtful comments and for recognizing the motivation and contributions of our work. If deemed useful, we will include the discussion in our paper.

I suspect directly adapting the progress in fast image generation, e.g. consistency model, to molecular generation can already give significant speed-up. More such experimental comparison is favorable.

Reply: Thank you for this insightful suggestion. We fully agree that progress in accelerating diffusion models from other domains is potentially inspiring for molecular generation. We have discussed their connection with SLDM in Appendix D.2.

However, in practice there are nontrivial domain-specific challenges that limit their effectiveness in molecular settings. For example, a recent blog post, “Equivariant Diffusion for Molecule Generation in 3D using Consistency Models”, attempted to apply Consistency Models to the 3D molecule generation task using the EDM framework (which shares the same architecture and evaluation protocol as ours). Although they achieved up to a 24× speed-up, the best atom stability reported was only 19%, which is significantly lower than both our method and most baselines (>95%). In our view, this reflects a domain gap that requires careful adaptation beyond simple plug-and-play transfer, underscoring the importance of developing specialized and efficient generative models for molecular data, which is precisely the goal of our work. We believe transferring image-domain techniques to molecules is an important direction for future research.

In Appendix A.4, I don't quite understand the logic jumping from the toy example of delta distribution to a general data distribution. Is it just a loose analogy, without rigorous mathematical reasoning or intuition?

Reply: The delta-distribution case in Appendix A.4 serves as an intuitive starting point, not a derivation for general data distributions. Specifically, our goal is to design a schedule that yields sampling trajectories that are as linear as possible for any data distribution. As a necessary condition, linearity should hold for any particular data distribution. Because it is analytically tractable, we use the delta-distribution case to derive a necessary condition on the forward process: a constant noise level (σ = const) and a linearly interpolating mean (μ(t) linear). This necessary condition, together with the boundary conditions (lines 85-86), determines the SLDM schedule.

We then move to the general case and provide a separate, formal analysis (Theorem 3.1) showing that SLDM produces approximately linear trajectories even for realistic data distributions. So the toy case is used only as motivation, while the general result is proved independently. We believe this structured approach, from intuition to formal validation, provides both interpretability and generality.


If there are any further questions or clarifications needed, we would be glad to continue the discussion. If our responses have addressed your concerns or contributed to a clearer understanding of our work, we would deeply appreciate your consideration for a higher score or confidence, which would greatly support the visibility and potential impact of our work within the community!

Comment

Dear reviewer QjdK,

After considering the rebuttal to your review and the other reviews/rebuttals, how and why has this affected your position on this submission? Please reply with an official comment (not just the mandatory acknowledgement) reflecting your current view, any follow-up questions/comments etc.

Note the Aug 6 AoE deadline, make sure to respond in time for the authors to be able to submit a response if necessary.

Official Review
5

This paper presents a straight-line diffusion method (SLDM) for generating molecules in 3D. A unified framework is presented to understand truncation errors that lead to low-efficiency sampling across existing diffusion, flow matching, and Bayesian flow methods for molecular generation, framing them all as continuous-time ODEs. It is observed that large-magnitude second-order terms in the numerical approximation to the backward solution trajectory of the ODE necessitate small step sizes that limit efficiency. To mitigate this, SLDM is proposed to enforce a near-linear sampling trajectory that minimizes second-derivative magnitudes and allows for larger step sizes and more efficient generation. Additional algorithmic components are introduced to improve sampling fidelity, including a Langevin dynamics term with time-annealing temperature control. Results are included and compared to baseline methods for unconditional and conditional generation with QM9, GeomDRUGs, and toy datasets.
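
The truncation-error argument summarized above can be made concrete with the standard Taylor expansion behind Euler's method. The block below is a generic textbook sketch, not the paper's own notation: the velocity field $f$, the step size $\Delta t$, and the intermediate point $\xi$ are symbols assumed here for illustration.

```latex
% Generic reverse-time ODE dx/dt = f(x, t), integrated backward with step \Delta t.
\begin{align*}
x(t - \Delta t) &= x(t) - \Delta t\, x'(t) + \tfrac{1}{2}\Delta t^{2}\, x''(\xi),
  \qquad \xi \in [t - \Delta t,\, t], \\
\text{Euler step: } \hat{x}(t - \Delta t) &= x(t) - \Delta t\, f(x(t), t), \\
\text{local error: } \bigl\| x(t - \Delta t) - \hat{x}(t - \Delta t) \bigr\|
  &= \tfrac{1}{2}\Delta t^{2}\, \bigl\| x''(\xi) \bigr\| .
\end{align*}
```

When the trajectory is close to a straight line, $x''$ is close to zero, so the local error stays small even for a large $\Delta t$; this is the sense in which a near-linear path permits fewer sampling steps.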

Strengths and Weaknesses

Strengths:

  1. The approach taken “[to] understand why existing methods suffer from low efficiency, we analyze the issue through the lens of truncation error in sampling” is well-posed
  2. Establishing a unified framework for assessing this over diffusion, flow matching, and BFNs is nice, framing them all as “as continuous-time Ordinary Differential Equations (ODEs)”
  3. The observation (line 31) that second-order terms can be large and connecting this to why small step sizes are required to reduce truncation error (ie readout) is very useful. Proposing a solution to “strive for a linear sampling trajectory," i.e., to push the 2nd-order term toward 0 is also well-posed
  4. Introducing a linearly decaying $\mu(t) = 1 - \frac{t}{T}$ w/ small constant $\sigma$ to guarantee a near-linear trajectory is really nice
  5. Strong results in Figure 1 (speedup w/ high stability) and Table 1 (generation quality metrics). In particular, it’s quite impressive to achieve rather high validity & stability while predicting bond types from coordinates, meaning that the geometries generated are quite good
  6. Introducing the Langevin component to mitigate observed initial deviations from linear sampling in the stochastic sampler is really nice
  7. Figure 2 demonstrating the slow diffusion, stable process, unfolding uniformly from the origin appears quite realistic and elegant (though see questions)

Weaknesses:

  1. Predicting atom types based on coordinates instead of diffusing jointly. I see that UniGEM is referenced as having "superior performance," but I don’t understand the chemical feasibility of generating 3D coordinates without knowing atom types, as the same set of atoms could have any number of reasonable geometries depending on what the atom types of each node are. Or, ie, the atom identities are typically thought to determine the geometry (and they do). So is the idea here to learn geometries of nodes that sort of implicitly contains atom-type information and purely from data, then fill in atom types post-hoc? This seems like it would limit generalization capacity, but I could be wrong. Please comment
  2. The baseline ($N_{atoms}$ classifier) for conditional generation is not super relevant – better would be to train a real classifier from, e.g., 2D graphs or 1D strings. Or, more realistically, to compare to other conditional or classifier-guided generation methods from the literature, particularly given that the conditioning mechanism here seems to be fairly simplistic, though it is barely formulated or discussed. More information on how the conditional generation is done would be nice, as this is an important setting
  3. Molecular stability is not included in GeomDRUGs, which leads one to assume it’s fairly low. This might be reasonable given it is challenging, but there are methods that can achieve strong performance (e.g., VoxMol). Please comment, strong results here would increase significance of the work
  4. Very slow training (Appendix E, line 822)

Questions

  1. Could you clarify why the SLDM formulation results in the deviation from linear sampling at early steps, ultimately correcting to linear?
  2. After line 154 (and in Figure 2), it is said that “the process unfolds uniformly from the origin, preserving the relative spatial relationships of atomic coordinates in intermediate states.” How is this so? I also don’t quite understand how the process unfolds from the origin, is this in the forward or reverse process? And is this a part of the implementation? Please clarify. One could imagine with a set of atom types initialized from the origin, there are many ways to unfold the process; if chemical/spatial relationships are preserved, you might observe that the final structure is essentially determined in early time steps. Is this what you see and is this beneficial? Or, do you observe that the initial unfolding involves some structure (atom-type) rearrangements, ie, that the early states have a degree of flexibility or stochasticity before the final molecule is largely determined? One might imagine this would be beneficial to prevent forms of memorizing from the initial origin (ie v high noise state), which seems to be what plagues other methods. Is this what's taken care of by time-annealing temperature schedule for the Langevin in (9)?
  3. Is Figure 2 showing forward or reverse processes? Sort of confusing matching with the main text description in 155–162 and noise scheduling in Figure 3

Limitations

yes

Final Justification

During the rebuttal process, the authors added clarification to the questions raised by this reviewer. This reviewer's rating for clarity of the manuscript was already high at "good" and remains there, reserving the maximum score "excellent" for exceptionally clear manuscripts. However, it is because of the clarifications that this reviewer's confidence in the work's acceptance has increased, and the confidence rating has been increased accordingly.

Formatting Concerns

NA

Author Response

Thank you for your recognition and the thoughtful suggestions, many of which inspired us to improve both the clarity and completeness of the paper. We will carefully revise the manuscript to address these points.

UniGEM rationale discussion

Reply: We would first like to clarify that the generation framework used in UniGEM is not the contribution of our paper, but we are happy to share our understanding of this design choice. We fully agree with the reviewer that atom identities typically determine the geometry. However, in de novo molecule generation, both atom types and coordinates need to be modeled. The typical approach is to refine atom types and coordinates jointly and iteratively with a diffusion model. However, modeling discrete variables (like atom types) in diffusion models is challenging. UniGEM notes that atom type generation by diffusion can lead to instability and mode oscillation (i.e., frequent switching between atom types during denoising), which in turn negatively impacts coordinate generation. By first generating continuous coordinates alone, UniGEM avoids this problem and achieves more stable and accurate 3D predictions.

This idea has been further supported by [1] (Sec. 4.3 last paragraph), showing that generating positions first and predicting bonds/types later leads to better results.

Regarding "Is the idea here to learn geometries that implicitly contain atom-type information purely from data, then fill in atom types post-hoc?": Yes, that is indeed the key assumption behind UniGEM.

[1] Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule

Conditional generation and baseline (classifier).

Reply: The classifier is not part of the generative model, but is used post hoc to evaluate whether generated molecules match the conditioning properties. We apologize for any confusion and will clarify the setup in the revised version. All settings follow baselines. Briefly:

  • QM9 is split into two halves: one to train the property predictor 𝜙, the other for training the conditional generative model.

  • 𝜙 evaluates generated molecules; conditional accuracy is measured by MAE between predicted and target properties.

  • Conditioning mechanism: property values are concatenated to atom types as EGNN's invariant input. Though simple, this method performs well and even outperforms recent guidance-based approaches:

| Category | Model | Cv | µ | α | ∆ε | HOMO | LUMO |
|-|-|-|-|-|-|-|-|
| Classifier-guidance | EEGSDE | 0.941 | 0.777 | 2.50 | 487 | 302 | 447 |
| Training-free guidance | TFG-Flow | 1.750 | 0.817 | 2.32 | 804 | 364 | 941 |
| Conditional generation | Cond-Flow | 1.520 | 0.962 | 3.10 | 805 | 435 | 693 |
| Conditional generation | Cond-EDM | 1.065 | 1.123 | 2.78 | 671 | 371 | 601 |
| Conditional generation | SLDM | 0.745 | 0.797 | 1.46 | 440 | 320 | 348 |
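
The evaluation protocol in the bullets above can be sketched as follows. This is a minimal illustration only: `conditional_mae`, `toy_phi`, and the dictionary-based "molecules" are hypothetical stand-ins introduced here, not the property predictor or data format used in the paper.

```python
from typing import Callable, Sequence


def conditional_mae(
    predictor: Callable[[object], float],
    molecules: Sequence[object],
    targets: Sequence[float],
) -> float:
    """Mean absolute error between predicted and requested property values."""
    if len(molecules) != len(targets):
        raise ValueError("molecules and targets must have the same length")
    if len(molecules) == 0:
        raise ValueError("need at least one generated molecule")
    errors = [abs(predictor(m) - y) for m, y in zip(molecules, targets)]
    return sum(errors) / len(errors)


# Hypothetical stand-in for the property predictor phi: each "molecule" is a
# dict carrying a precomputed property value, so prediction is a lookup.
def toy_phi(mol: dict) -> float:
    return mol["alpha"]


generated = [{"alpha": 70.0}, {"alpha": 81.5}, {"alpha": 64.0}]
requested = [71.0, 80.0, 64.0]  # conditioning values fed to the generator
mae = conditional_mae(toy_phi, generated, requested)
print(f"MAE = {mae:.4f}")
```

In the real setup the predictor would be a network trained on the held-out half of QM9, and a lower MAE indicates that generated molecules match the requested property more closely.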

Classifier-guided methods require an auxiliary classifier at each diffusion step, which incurs high computational cost. Since our goal is to evaluate the ability and efficiency of the generative model, we chose not to include such methods, but we’re glad to include them if you find it helpful.

Lack of mol_stable metric on GEOM-Drugs and comparison to VoxMol

Reply: We agree that molecular stability is a key metric for 3D molecule generation. For GEOM-Drugs, the low molecular stability is not specific to our method. As noted in [1], EDM-style bonding heuristics are not well-suited for this dataset due to its diverse structures. Even ground truth molecules score poorly under this evaluation (see Table below). Despite this, our model (SLDM) shows clear improvement, validating the benefit of our low-temperature sampling (lines 189–193).

| | Mol Stab (%) | Atm Stab (%) | Valid (%) | Unique (%) |
|-|-|-|-|-|
| Data | 2.80 | 86.50 | 99.90 | 100.00 |
| SLDM (T=50) | 7.09 | 89.03 | 99.57 | 99.97 |

[1] GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation

Regarding VoxMol, we appreciate the reference. VoxMol adopts a different evaluation setup, which uses OpenBabel to infer bonds and allows charged atoms. We re-evaluated SLDM under their protocol, and it outperforms VoxMol:

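| | Mol Stab (%) | Atm Stab (%) | Valid (%) | Unique (%) |
|-|-|-|-|-|
| Data | 99.90 | 99.90 | 99.80 | 100.00 |
| EDM | 40.30 | 97.80 | 87.80 | 99.90 |
| VoxMol | 75.00 | 98.10 | 93.40 | 99.10 |
| VoxMol (Oracle) | 81.90 | 99.00 | 94.70 | 97.40 |
| SLDM (T=50) | 84.88 | 99.26 | 99.58 | 99.99 |
| SLDM (T=1000) | 89.40 | 99.71 | 99.97 | 99.92 |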

Allowing charged atoms may overly relax evaluation and may be unsuitable for models that do not generate charges. Thus, we did not adopt this metric, but we’re happy to include the comparison in the appendix if you find it helpful.

Slow training

Reply: Thanks for pointing this out. We compared the performance of SLDM with other baselines, such as UniGEM.

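| Model (Epochs, T=1000) | QM9 mol_stab | QM9 atm_stable | QM9 Validity | QM9 V*U | GEOM-Drugs (GD) atm_stab | GEOM-Drugs (GD) Validity |
|-|-|-|-|-|-|-|
| UniGEM (QM9:2000, GD:13) | 99.00 | 89.80 | 95.00 | 93.20 | 85.10 | 98.40 |
| SLDM (QM9:2000, GD:13) | 99.41 | 95.32 | 97.04 | 90.48 | 87.87 | 99.55 |
| SLDM (QM9:2980, GD:32) (reported in our paper) | 99.43 | 95.42 | 97.07 | 90.42 | 88.30 | 99.95 |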

We observe that SLDM already achieves state-of-the-art results under the same number of training epochs and training time. While longer training leads to slight performance gains, we find that the mol_stable and atm_stable metrics plateau around 2000 epochs on QM9 and 13 epochs on GEOM-Drugs during training. This indicates that extended training time is not a necessary factor for SLDM’s performance advantage.

Clarify why the SLDM formulation results in the deviation from linear sampling at early steps, ultimately correcting to linear?

Reply: The deviation from a linear trajectory in SLDM primarily originates from the trajectory crossing phenomenon in the forward diffusion process, as discussed in Rectified Flow. Although the forward noising process is defined as $x_t=(1-t)x_0+\sigma(t)\epsilon$, which suggests a linear mean trajectory $(1-t)x_0$, the added stochasticity introduces crossing between diffusion paths, i.e., the same $x_t$ can be derived from multiple different $x_0$. This crossing effect becomes especially pronounced at early sampling steps ($t\to1$), where the noise dominates and leads to high uncertainty in the reverse mapping.

Specifically, in SLDM, the reverse ODE is given by Eq. 36: $\frac{dx_t}{dt} = -E[x_0|x_t]$. If $E[x_0|x_t]$ were constant, the sampling path would be linear. But due to forward crossing, $E[x_0|x_t]$ shifts over time: from the dataset average at high $t$ to a more precise $x_0$ as $t\to0$, resulting in curved trajectories. This behavior is quantitatively characterized in Theorem 3.1 and visualized in Figure 6.
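
To make the crossing effect tangible, here is a toy 1D sketch, not taken from the paper. It assumes a data distribution with two equally likely points at -1 and +1 and a constant noise level `sigma`; under those assumptions the posterior mean has the closed form `tanh((1 - t) * x_t / sigma**2)`. The function names and the choice `sigma = 0.05` are illustrative only.

```python
import math
from typing import List


def posterior_mean(x_t: float, t: float, sigma: float) -> float:
    """E[x_0 | x_t] for x_0 uniform on {-1, +1} and x_t = (1 - t) x_0 + sigma * eps."""
    return math.tanh((1.0 - t) * x_t / sigma**2)


def euler_sample(x_1: float, sigma: float, n_steps: int) -> List[float]:
    """Integrate dx/dt = -E[x_0 | x_t] backward from t = 1 to t = 0 with Euler steps."""
    h = 1.0 / n_steps
    x = x_1
    path = [x]
    for k in range(n_steps):
        t = 1.0 - k * h
        # Backward step: x_{t-h} = x_t - h * dx/dt = x_t + h * E[x_0 | x_t].
        x = x + h * posterior_mean(x, t, sigma)
        path.append(x)
    return path


sigma = 0.05
path = euler_sample(x_1=sigma, sigma=sigma, n_steps=50)
increments = [b - a for a, b in zip(path, path[1:])]
print("first increments:", [round(d, 4) for d in increments[:6]])
print("last increment:  ", round(increments[-1], 4))
print("final sample:    ", round(path[-1], 3))
```

In this toy setting the per-step increments grow over the first few steps, because the posterior mean starts at the dataset average (zero) and moves toward one of the two data points, and then stay essentially constant, which corresponds to the straight part of the path.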

Notably, this issue is common in diffusion models. SLDM controls this deviation within initial generative steps (Theorem 3.1), and our stochastic sampling helps mitigate error accumulation over time. Other approaches often rely on costly training techniques like distillation or solving OT to overcome this issue, as discussed in Appendix D.2.

Clarify Figure 2 setup, memorizing issue

Reply: We apologize for the confusion. Figure 2 shows the forward noising process. We used this for consistent comparison across EDM, GeoBFN, and SLDM on the same target molecule. That said, we also visualized the reverse sampling process (not shown here due to rebuttal limitations on links/media), and found consistent trends:

  • EDM produces very noisy intermediates throughout;
  • GeoBFN quickly settles, barely changing after early steps, underusing later steps;
  • SLDM shows less noisy and more progressive refinement.

Importantly, our sampling method is stochastic, allowing flexible rearrangement of atomic positions. This helps avoid rigid memorization and promotes diversity. To demonstrate this, we show that SLDM trained on GEOM-Drugs can generate novel molecules not seen in the training data:

| | SLDM (T=1000) | SLDM (T=50) |
|-|-|-|
| Valid&Unique&Novel | 99.72% | 99.93% |

As for annealed Langevin temperature in Eq.9, it enables fine-grained control over the stochasticity by injecting more randomness early (for exploration), and reducing it later (for precision), which we found helpful for balancing chemical stability and diversity.


If there are any further questions or clarifications needed, we would be glad to continue the discussion. If our responses have addressed your concerns or contributed to a clearer understanding of our work, we would deeply appreciate your consideration for a higher score or confidence, which would greatly support the visibility and potential impact of our work within the community!

Comment

Dear reviewer 1HAM,

After considering the rebuttal to your review and the other reviews/rebuttals, how and why has this affected your position on this submission? Please reply with an official comment (not just the mandatory acknowledgement) reflecting your current view, any follow-up questions/comments etc.

Note the Aug 6 AoE deadline, make sure to respond in time for the authors to be able to submit a response if necessary.

Comment

The reviewer thanks the authors for the thorough responses clarifying questions raised and adding additional supplementary evaluations. This reviewer's confidence in the work's acceptance has increased and the rating has been increased accordingly. No further clarifications or evaluations are needed for this reviewer.

Comment

Dear reviewer,

We sincerely thank you for your time and effort in evaluating our work. We are delighted that our clarifications have resolved your concerns, and we truly value your engagement in this process.

Wishing you all the best!

Official Review
5

This paper proposes Straight-Line Diffusion Models (SLDM) for 3D molecular generation. The authors design a special noise schedule for training diffusion models, which can then be used to generate novel molecules with time-annealing sampling. Experiments show that SLDM has a noticeable advantage in generation speed and quality compared with other models.

Strengths and Weaknesses

Strengths:

  1. Research on fast molecule generation is important for the AI for Science area as the generation speed of existing methods is still unsatisfactory.
  2. The paper is clearly written and easy to follow.
  3. SLDM indeed shows strong performance when sampling with only 50 steps. It seems very promising. Overall, SLDM could be a valuable contribution to the molecule generation community.

Weaknesses:

  1. Recently, fast sampling from diffusion models usually relies on ODE solvers instead of directly sampling from the SDE trajectory. Could the authors add ODE results for the experiments in Section 4? I wonder if using ODE solvers could yield an even faster generative model.
  2. It would be better if the authors can provide some qualitative examples of the generated molecules to help readers intuitively understand the generation quality.

Questions

See Weaknesses.

Limitations

N/A

Final Justification

After reading the other reviews and rebuttals, I would like to thank the authors for their additional results and detailed explanations. They further solidify the methodology proposed in the paper, especially regarding the faster sampling speed. Therefore, I agree with the other reviewers' consensus opinion on accepting the paper. I changed my rating to 'accept' accordingly.

I encourage the authors to include these new discussions and quantitative results in the revised version as promised.

Formatting Concerns

N/A

Author Response

We sincerely thank the reviewer for the thoughtful comments. We will carefully revise the manuscript to address these points.

Fast sampling from diffusion models usually relies on ODE solvers instead of directly sampling from the SDE trajectory. Could the authors add ODE results for the experiments in Section 4? I wonder if using ODE solvers can get even faster generative model.

Reply: Indeed, ODE-based sampling is a common strategy to accelerate generation in diffusion models. In our work, this is taken into account when comparing with the baseline EquiFM, which leverages flow matching with multiple efficient ODE solvers. We have already compared with the best result of EquiFM in Figure 1 and Table 1, showing that SLDM outperforms in both efficiency and sample quality under comparable settings. To further address your suggestion, we will include EquiFM results in Table 2 for the conditional generation task on QM9. The following table summarizes the results (lower is better):

| Property | α | ∆ϵ | ϵ_HOMO | ϵ_LUMO | μ | Cv |
|-|-|-|-|-|-|-|
| EquiFM | 2.41 | 591 | 337 | 530 | 1.106 | 1.033 |
| SLDM | 1.46 | 440 | 320 | 348 | 0.797 | 0.745 |

It would be better if the authors can provide some qualitative examples of the generated molecules to help readers intuitively understand the generation quality.

Reply: Thank you for your constructive suggestion. We randomly generated 40 molecules with each of our trained models (QM9 and GEOM-Drugs) in the unconditional setting and visualized their 3D structures using PyMOL. These examples clearly demonstrate the structural plausibility and diversity of our generated molecules. Unfortunately, the rebuttal phase does not allow us to include external links or supplementary material. However, we will definitely include representative visualizations in the appendix of the revised manuscript to provide readers with a more intuitive understanding of the generation quality.


We sincerely thank the reviewer for recognizing the strong performance and contribution of our work. We've open-sourced our code at submission, as we genuinely hope our method can be useful to the community. We’d greatly appreciate a more favorable score to help increase the visibility and impact of this contribution! If there are any further questions or clarifications needed, we would be glad to continue the discussion.

Comment

Thank you for addressing my concerns! I will keep my ratings unchanged.

Comment

In light of the rebuttal and the other reviews/rebuttals, please edit this comment and update with reasoning for /why/ you keep the rating unchanged.

Comment

After reading the other reviews and rebuttals, I would like to thank the authors for their additional results and detailed explanations. They further solidify the methodology proposed in the paper. Therefore, I agree with the other reviewer's opinion on accepting the paper. I changed my rating to 'accept' accordingly.

Comment

Dear reviewer,

Thank you very much for carefully reading the reviews and rebuttals. We have carefully recorded all points raised during rebuttal and will incorporate them into the revised version to ensure better rigor and clarity. Thank you again for your time and expertise.

Wishing you all the best!

Final Decision

The paper proposes (near) straight-line diffusion models for 3D molecule generation. The approach relies on a relatively small modification to the forward process compared to e.g. EDM that adds a small amount of noise that is constant across time. The method performs competitively at both high and moderate number of function evaluations across a range of datasets.

The reviewers found the approach interesting and reached a consensus recommending acceptance, and I find no reason to disagree. When revising, please take care to add the new discussions, results, and requested clarifications that were brought up by the reviewers.