Empower Structure-Based Molecule Optimization with Gradient Guided Bayesian Flow Networks
Gradient-based BFN for targeted drug design
Abstract
Reviews and Discussion
In this paper, the authors propose a method that leverages gradient guidance in the context of structure-based drug design. In particular, they augment MolCRAFT (which uses Bayesian Flow Networks as the generative model for structure-conditioned ligand design) to be compatible with gradient guidance according to some (learned) energy function. The guidance is applied to both continuous (atom coordinates) and discrete (atom types) variables. The authors also propose a backward correction strategy for more effective optimization. They show good results on the CrossDocked2020 benchmark and on sub-structure-conditioned generation.
Update after rebuttal
I thank the authors for their rebuttal. However, my main concerns were not really addressed, mainly the lack of novelty (yet another classifier guidance for a generative model very similar to diffusion) and irrelevant experiments (trained only on CrossDocked, on properties that are totally irrelevant). If the authors argue that this is a general framework for classifier guidance on BFNs, results should have been shown on other datasets/tasks (molecular data or not) and other properties. One model trained on one dataset (known to be very flawed) with irrelevant properties does not show empirically that this is a "general method". Therefore, I will keep my rating.
Questions for the Authors
See above.
Claims and Evidence
- The paper is not very well written and is difficult to follow. It would be helpful to have a better general overview of the BFN/MolCRAFT models in the main paper for more clarity. The paper also misses a lot of experimental details on how guidance is done, making the interpretation of the experimental results a bit tricky.
- The main contribution of the paper is to show that it is possible to do gradient guidance with Bayesian Flow Networks. This is not particularly surprising, given the relation between BFNs and diffusion models/flow matching (Xue et al. ICML24).
- Moreover, the authors show results on only a single dataset, one that has been highly overfitted in the last few years. It would be nice to see results of guidance in the context of BFNs on other datasets/tasks to show that the proposed model really works in practice (either molecular datasets or other modalities like images, language, etc.).
- A lot of design choices need to be made (guidance temperature, the backward correction, the property predictors, training/sampling hyperparameters, etc.). It is not trivial to conclude whether this approach only works because it has been heavily fine-tuned for this dataset or whether it would work in other settings.
- It seems to me that the hyperparameters of the model (and there are many of them) have been tuned on the test set of CrossDocked.
Methods and Evaluation Criteria
- The proposed method (gradient guidance on top of MolCraft) makes sense.
- The authors show results on a single dataset (CrossDocked2020), which has been highly studied/overfitted in the last few years.
- It is well known that neither this dataset nor the metrics (e.g., docking score, QED, SA) are very relevant for actual drug design. Some other metrics (like those from the PoseCheck/PoseBusters papers) provide a bit more insight into the quality of the molecules. I think the results from PoseCheck should be displayed in the main document instead of in the appendix. From Table 9, we can see that MolJO has PoseCheck metrics similar to MolCRAFT's, which hints that the gradient guidance does not improve the quality of the generated conformations.
Theoretical Claims
The theoretical claims seem coherent, but I did not go through the details.
Experimental Design and Analysis
- I feel that comparing the proposed method with other approaches that do not use guidance is not very informative. The most relevant comparison is between the proposed approach and MolCRAFT (since the latter is a version of the model without gradient guidance).
- It would also be nice to see the improvement from gradient guidance on tasks/datasets other than CrossDocked. This could be other molecule datasets (either conditioned on a target pocket or not) or other modalities (proteins, images, or anything else where BFNs have been applied).
- The properties optimized in this paper are not relevant for drug discovery, and it is difficult to say whether any contribution of the paper would actually translate into improvement on real use cases.
- With respect to inference time: what is the computational overhead of the proposed gradient guidance? How does the inference time compare with MolCRAFT's?
Supplementary Material
I quickly skimmed the supplementary material on the appendix of the manuscript. I did not go through details of the provided source code.
Relation to Prior Work
This work proposes to perform gradient guidance on top of Bayesian Flow Networks (BFNs). In particular, it builds on MolCRAFT, a BFN-based generative model for pocket-conditioned ligand generation. Structure-based drug design is an important problem in drug discovery; however, the dataset used by ML practitioners, as well as the metrics used to measure performance on it, are known to not be very useful in practice.
Essential References Not Discussed
N/a
Other Strengths and Weaknesses
See above.
Other Comments or Suggestions
n/a
Ethics Review Issues
n/a
We sincerely appreciate the reviewer's thorough reading and insightful feedback, which have helped us identify areas for improved clarity and presentation. We shall address each point in our responses below, and we welcome further questions.
Q1: Explanation of BFN Fundamentals
We thank the reviewer for highlighting the need for improved clarity. We will enhance the paper's readability by providing a better overview and more detailed explanations of our guidance approach.
For experimental details of guidance, our method combines a pretrained MolCRAFT with plug-and-play energy functions whose gradients steer sampling. We describe the guided sampling in Algorithm 1 and Appendix D.1, and we will make them clearer.
Q2: Novelty
Thanks for raising this important question. While the work of Xue et al. establishes connections between BFNs and SDEs, our contribution goes beyond merely showing the theoretical feasibility. We derive a principled approach to gradient guidance specifically within the BFN framework, rather than reducing BFNs to SDEs and applying existing techniques.
Our contributions lie both in the methodology (deriving gradient guidance within BFNs and proposing a generalized sampling strategy) and in implementing and evaluating SBMO applications. (1) Methodologically, we show how the gradient acts through the guided Bayesian update, offering a perspective distinct from discretizing an SDE, and we believe contextualizing the guidance within BFNs is a novel contribution. (2) Practically, the empirical results in Table 3 also show MolJO's distinct advantage compared to simply reducing the BFN to an SDE.
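To make the guided Bayesian update concrete, here is a deliberately simplified numerical sketch; the function names, the way the gradient enters the update, and the toy energy are our illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def bayesian_update(mu, rho, y, alpha):
    """Standard BFN Bayesian update: conjugate Gaussian posterior of
    a prior N(mu, 1/rho) after a sender sample y with precision alpha."""
    rho_new = rho + alpha
    mu_new = (rho * mu + alpha * y) / rho_new
    return mu_new, rho_new

def guided_bayesian_update(mu, rho, y, alpha, grad_energy, scale=1.0):
    """Illustrative guidance: shift the sender sample along -grad E before
    the Bayesian update, steering the posterior mean toward lower energy."""
    y_guided = y - scale * grad_energy(y)
    return bayesian_update(mu, rho, y_guided, alpha)

# Toy energy pulling coordinates toward the origin: E(x) = ||x||^2 / 2
grad_E = lambda x: x

mu = np.array([1.0, 1.0])
y = np.array([2.0, 0.0])
mu_plain, _ = bayesian_update(mu, rho=1.0, y=y, alpha=1.0)
mu_guided, _ = guided_bayesian_update(mu, 1.0, y, 1.0, grad_E, scale=0.5)
print(mu_plain)   # [1.5, 0.5], the unguided posterior mean
print(mu_guided)  # [1.0, 0.5], pulled toward the low-energy region
```

The unguided posterior keeps its conjugate Gaussian form; the guidance only biases the incoming evidence, which is why the update remains a valid Bayesian update at every step.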
Q3: Limited Dataset Evaluation
We appreciate this concern. We add the evaluation on PoseBusters V2 test set (180 out of 384 complexes, after excluding those with sequence identity > 30% or with non-standard residues) as a held-out test. We also report the PoseBusters passing rate (PB-Valid), showing that MolJO's improvements are not dataset-specific or the result of overfitting.
| | PB-Valid | RMSD < 2 | PB-Valid & RMSD < 2 | Vina Score Avg | Vina Score Med | Vina Min Avg | Vina Min Med | Vina Dock Avg | Vina Dock Med | SA | QED | Connected | Success Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MolCRAFT | 56.4% | 43.6% | 26.5% | -6.93 | -6.95 | -7.18 | -7.14 | -7.77 | -7.76 | 0.65 | 0.45 | 93.9% | 20.6% |
| MolJO | 68.0% | 53.1% | 36.3% | -7.74 | -7.73 | -8.16 | -8.09 | -8.66 | -8.69 | 0.74 | 0.57 | 95.7% | 43.5% |
Q4: Hyperparameter Choices and Potential Overfitting
As mentioned in Q3, the consistent performance on both datasets confirms that MolJO generalizes well beyond the specific choices made for CrossDocked. Moreover, the results remain robust across reasonable hyperparameter ranges. For the property predictors and training/sampling, we used the same architecture and the same BFN hyperparameters as MolCRAFT, without tuning them specifically for the task.
Q5: PoseCheck Metrics in Main Paper
We agree that the PoseCheck metrics should be presented in the main text rather than the Appendix for better accessibility, and we will revise our manuscript accordingly.
Though not directly incorporating strain energy as an objective, MolJO indeed improves the conformation quality as suggested by Figure 9, Appendix G, where our CDF is consistently above that of MolCRAFT. Furthermore, the evaluation on PoseBusters (see Q3) reveals notable improvements in PB-Valid.
Q6: Baseline Comparisons
We appreciate this excellent point. Our ablation studies in Section 5.4, Table 3 indeed provide direct comparisons, showing how each component contributes to performance improvements. Furthermore, we are expanding our comparison to include additional optimization baselines such as DecompDPO, against which MolJO remains competitive. Please refer to Q5 in our response to Reviewer fSvs.
Q7: Relevance of Optimized Properties
We appreciate the reviewer's expertise in drug discovery. Our primary contribution is a general optimization framework whose efficacy can be validated through these in-silico metrics. Notably, we have observed that the improvements correlate with enhanced key interactions and PB-Valid rates, which are relevant to drug design.
Q8: Computational Overhead
Compared to MolCRAFT, MolJO requires approximately 2× or more inference time in our updated study. The computational overhead depends on the complexity of the energy functions employed. MolCRAFT generally takes ~22s, the previous MolJO with 9-layer energy proxies took ~146s, and we have experimented with 4-layer proxies that take ~45s. MolJO can be further accelerated with an efficient strategy where the gradient is applied only at selected timesteps rather than at every step [2], which we leave for future work. We thank the reviewer for motivating this analysis, as it led to more efficient implementations.
[1] A Periodic Bayesian Flow for Material Generation.
[2] Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models.
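The interval-limited guidance of [2] could look roughly like the following hypothetical sketch (the interval bounds, step functions, and loop structure are illustrative assumptions, not an actual implementation):

```python
def sample_with_interval_guidance(n_steps, step_fn, guided_step_fn,
                                  t_lo=0.2, t_hi=0.8):
    """Run plain sampling steps outside [t_lo, t_hi] and guided (more
    expensive) steps inside it, counting how often the gradient is used."""
    state, n_guided = 0.0, 0
    for i in range(n_steps):
        t = i / n_steps  # normalized time in [0, 1)
        if t_lo <= t <= t_hi:
            state = guided_step_fn(state, t)
            n_guided += 1
        else:
            state = step_fn(state, t)
    return state, n_guided

# Dummy steps standing in for the real (un)guided BFN updates.
plain = lambda s, t: s + 1
guided = lambda s, t: s + 1
_, n_guided = sample_with_interval_guidance(200, plain, guided)
print(n_guided)  # 121: the gradient is evaluated on ~60% of the 200 steps
```

With a 9-layer proxy dominating the cost, skipping the gradient on the remaining steps would shrink the overhead roughly in proportion to the interval width.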
This paper proposes a gradient-based molecule optimization framework for the SBDD task, which achieves state-of-the-art performance on the CrossDocked2020 benchmark in experiments. Besides, it extends MolJO to a wide range of optimization settings, including multi-objective optimization and challenging drug design tasks such as R-group optimization and scaffold hopping, further underscoring its versatility.
Questions for the Authors
- Although the optimization in this work is based on BFN, in the case of Gaussian BFN (where coordinates are treated as variables), the optimization process is structurally similar to guided diffusion. Could you explain the similarities and differences between the two in terms of their formulation and underlying principles?
- How is the guidance term's energy E obtained? It would be helpful to introduce this at the beginning of the Methods section for clarity.
- Moreover, this guidance mechanism bears some resemblance to energy-based compositional diffusion [6], as it can be seen as a superposition of sampling across two energy landscapes. However, since this work is BFN-based (modeling in parameter space), its physical interpretation is less explicit. Could this aspect be further discussed?
- In Claim 2 of Section 5.3: “Optimized molecules form more key interactions for binding,” while this supports the idea that interaction expansion allows the model to explore a broader chemical space, in many real-world drug optimization tasks, certain lead compounds possess specific key interactions (e.g., π-π stacking or hydrogen bonding). During optimization, it is often desirable to preserve these critical pharmacophoric interactions rather than modify them arbitrarily. For example, in cases like 1A2G and 2PC8, the optimized molecules retain the original interactions while expanding upon them, which enhances their practical applicability. How do the authors perceive this issue in the context of their model’s optimization strategy?
If my suggestions can be adopted and the questions I raised can be clarified, I will consider appropriately increasing my rating.
Claims and Evidence
Yes, most of the claims are clear and supported by the evidence.
Methods and Evaluation Criteria
The method mainly operates in the probability parameter space instead of the raw data space, following the BFN paradigm. The range of application is wide and appropriate for the task.
Theoretical Claims
The theory and propositions in the paper follow BFN and related works such as GeoBFN and MolCRAFT. The formulation of the parameter updates is correct.
Experimental Design and Analysis
The experiments are fairly complete, but some methods are missing. Several baselines should be included for a more comprehensive comparison, such as VoxBind [1], DiffBP [2], and D3FG [3], whose metrics can be found in the recently proposed benchmark [4]. Besides, for the optimization methods, could DecompDPO be a competitor? If the comparison is hard to conduct, please explain why.
Finally, for specific scenarios like fragment growing and scaffold hopping, CBGBench [4] also takes these tasks into consideration; please discuss the related work and appropriately include baselines evaluated in that benchmark to demonstrate the superiority of the proposed method.
[1] https://arxiv.org/abs/2405.03961
[2] https://pubs.rsc.org/en/content/articlelanding/2025/sc/d4sc05894a
[3] https://arxiv.org/abs/2306.13769
[4] https://arxiv.org/abs/2406.10840
Supplementary Material
Yes, I have reviewed most of them. Specifically, the experimental-related parts are reviewed in detail.
Relation to Prior Work
The underlying class of probabilistic models was introduced in BFN, and the SBDD task has been partially addressed by MolCRAFT. However, I have not previously observed molecule optimization based on BFNs in prior work.
Essential References Not Discussed
To enhance the completeness of the Pocket-Aware Molecule Generation section, it is essential to include VoxBind and DiffBP to provide a more comprehensive overview of recent advancements in the field. Additionally, in flexible SBDD, the recently proposed FlexSBDD [5] represents a state-of-the-art (SOTA) approach to SBDD-related drug design. It is recommended to include it in the Related Work section.
For the Gradient-Based Molecule Optimization section, additional optimization methods, such as DecompDPO and various 2D-based approaches, should be incorporated. This will facilitate a smoother introduction for readers unfamiliar with molecule optimization and help contextualize the proposed approach within the broader landscape.
In the experimental evaluation, particularly in the tasks of scaffold hopping and fragment growing, incorporating CBGBench for reference would be beneficial. This would not only provide a standardized benchmark for assessing performance but also help clarify the significance of these tasks in molecular optimization.
Other Strengths and Weaknesses
All of them are listed and mentioned.
Other Comments or Suggestions
In conclusion, I suggest that:
- Add relevant related work and baselines, such as DiffBP, D3FG, and VoxBind, as these methods have been evaluated on previous benchmarks with established metrics. This will enhance the completeness of the paper.
- Include a discussion and reference to CBGBench in the constraint optimization section.
We sincerely appreciate the reviewer's thorough reading and insightful feedback, which have helped us improve clarity and presentation. We shall address each point in our responses below, and we welcome further questions.
Questions
Q1: Similarities and Differences between Gaussian BFN and Guided Diffusion
Thank you for this insightful observation. We explain the fundamental differences and structural similarities between Gaussian BFN and guided diffusion as follows:
- Key Differences: Our approach guides in the parameter space (θ) rather than in the data space (x). In the continuous case, θ exhibits lower input variance and therefore provides more informative signals for guiding the generative process toward the desired output. Guided diffusion typically steers the sampling process in the data space directly, while our BFN-based approach performs inference in the parameter space. Such parameter-space guidance arguably connects more deeply to the final target properties due to the lower input variance, which allows for more explicit Bayesian modeling of uncertainty and more reliable property optimization.
- Similarities: Both approaches use gradient information to steer the generative process toward desired outputs, operating on Gaussian-distributed variables.
Q2: Regarding the Guided Term's Energy Function E
We thank the reviewer for suggesting this clarification. While we included this discussion in the Appendix, we agree it should be introduced earlier for clarity and will revise accordingly in revision.
The energy function forms part of a Boltzmann distribution over the parameters θ. Although the target property is explicitly defined only at the final timestep, Bayesian inference defines the parameter space at any timestep t. This allows us to associate property values with every intermediate θ throughout the generative process, enabling prediction of time-dependent properties during intermediate optimization stages.
We train a predictor directly over the parameter space to estimate the property given θ at different accuracy levels. This approach proves effective as it provides a consistent guidance signal throughout the sampling process.
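A toy sketch of fitting such a parameter-space predictor: the noise schedule in `flow_params`, the linear least-squares "predictor", and the synthetic property below are all illustrative stand-ins for the actual proxy network, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_params(x, t, rng):
    """Noisy stand-in for Bayesian-flow parameters of clean data x at
    accuracy level t in (0, 1]: higher t means a less noisy theta."""
    return t * x + np.sqrt(1.0 - t) * rng.normal(size=x.shape)

# Toy dataset: the "property" is the mean coordinate value.
X = rng.normal(size=(256, 8))
y = X.mean(axis=1)

# Build training pairs (theta_t, t) -> y at random accuracy levels.
t = rng.uniform(0.1, 1.0, size=(256, 1))
theta_t = flow_params(X, t, rng)
feats = np.concatenate([theta_t, t], axis=1)

# Least-squares fit as a stand-in for gradient training of the proxy.
w, *_ = np.linalg.lstsq(feats, y, rcond=None)
mse = np.mean((feats @ w - y) ** 2)
print(mse < np.var(y))  # the time-aware predictor beats the trivial baseline
```

Conditioning on the accuracy level t is what lets a single predictor supply a consistent guidance signal at every step of sampling.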
Q3: Connection to Energy-Based Compositional Diffusion
We appreciate the connection to [6], which introduced an interpretation of diffusion models from the energy-based model (EBM) perspective and thereby applies EBM additivity to diffusion models.
In our approach, the energy function can be viewed as the negative log-likelihood estimated by a conditional model for given properties. In practice, our method resembles classifier guidance since we train property predictors to serve as the energy function. However, our framework doesn't require explicit "labels" as conditions for controllable generation.
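In a multi-objective setting, the EBM-style additivity amounts to composing gradients of a weighted sum of energies. The sketch below is a toy illustration (the energies, weights, and function names are hypothetical, not the paper's predictors):

```python
import numpy as np

def combined_grad(x, energy_grads, weights):
    """Gradient of a weighted sum of energies: sum_i w_i * grad E_i(x)."""
    return sum(w * g(x) for g, w in zip(energy_grads, weights))

# Toy energies: an "affinity-like" pull toward a target point and a
# "QED-like" pull toward the origin.
grad_aff = lambda x: x - np.array([1.0, 0.0])
grad_qed = lambda x: x

x = np.array([2.0, 2.0])
g = combined_grad(x, [grad_aff, grad_qed], [0.5, 0.5])
print(g)  # [1.5, 2.0], the equal-weight average of the two gradients
```

Because the composed energy is still an energy, the same guided update applies unchanged whether one or several properties are optimized.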
Q4: Preservation of Key Interactions
We appreciate the reviewer's expertise in drug design and the important point raised about preserving key pharmacophoric interactions. We fully agree that in real-world drug optimization, certain critical molecular substructures forming intermolecular interactions (e.g., π-π stacking, hydrogen bonding) should be preserved rather than arbitrarily modified.
Our MolJO framework actually addresses this concern by design. The framework allows for precise control over which molecular substructures should be modified and which should be preserved, while our energy function can guide the redesign of remaining substructures. This better aligns with practical applications such as scaffold hopping.
Other Suggestions
Q5: Additional Related Work and Baselines
We thank the reviewer for suggesting these important methods for comparison. We agree that including these baselines will enhance the completeness of our evaluation. We commit to including comprehensive comparisons with these methods in our revision.
We have cited the results from CBGBench (DiffBP, D3FG) and used the released samples to calculate the median Vina affinities and Success Rate based on the statistics available for VoxBind. For DecompDPO, we directly cite the numbers from their paper. It can be seen that our MolJO maintains superiority in optimizing the overall properties, as reflected by its highest Success Rate (51.3%).
| | Success Rate | Vina Score Avg | Vina Score Med | Vina Min Avg | Vina Min Med | Vina Dock Avg | Vina Dock Med | QED | SA | Div |
|---|---|---|---|---|---|---|---|---|---|---|
| DiffBP | - | - | - | - | - | -7.34 | - | 0.47 | 0.59 | - |
| D3FG | - | - | - | -2.59 | - | -6.78 | - | 0.49 | 0.66 | - |
| VoxBind | 21.4% | -6.16 | -6.21 | -6.82 | -6.73 | -7.68 | -7.59 | 0.54 | 0.65 | - |
| DecompDPO | 36.2% | -6.10 | -7.22 | -7.93 | -8.16 | -9.26 | -9.23 | 0.48 | 0.64 | 0.62 |
Q6: CBGBench for Constraint Optimization
We appreciate the reviewer highlighting CBGBench. This is indeed an excellent work that sets up extensive experimental design for structure-based molecule optimization. We will add a thorough discussion of CBGBench, particularly in the context of constraint optimization tasks.
This paper proposes MolJO, a framework that jointly guides the continuous 3D coordinates and discrete atom types of 3D molecules based on the geometry of the target protein pocket and one or more molecular property classifiers. The paper also proposes a backward correction strategy that corrects the parameters of the Bayesian update distribution based on the current optimized sample and parameters from previous timesteps. It is shown that MolJO outperforms existing methods at generating molecules with better docking scores and molecular properties such as QED and SA, notably methods that do not involve gradient-based guidance or do not perform guidance over discrete atom types.
update after rebuttal
Thank you to the authors for providing explanations and running additional experiments for the Top-of-N comparison and error bars. It would also be good to include a Top-of-N comparison for TagMol in future versions.
For Table 3, does w/o guidance mean no guidance over both atom types and coordinates? If so, authors should include an ablation row where guidance is done only over coordinates and not atom types (equivalent to TagMol if I understand correctly), so that we can observe the impact of only turning on guidance over atom types (current row 4 of table 3), since this is a key argument being made by authors (and also TagMol + BC). Also, it doesn't seem like there is a huge difference between the docking scores of rows 1 and 4, which measures the impact of guidance--a p-value would be helpful here. Do authors have any intuition for why BC and guidance work synergistically?
I think the method is interesting but the clarity of the work can be significantly improved, I will keep my score as-is.
Questions for the Authors
[1] Why are the values different between Table 3 row 6 and Table 1 row 14? Based on the writing it seems like these should be the same settings; if they are different, then authors should clarify this in the writing.
[2] Can authors clarify experimental information like: how they restrict the size of generated molecules in Fig. 3; how classifier guidance SDE differs from "without and with guidance" settings in Table 3?
[3] What are "me-better" molecules? I couldn't find a definition in the paper.
[4] Are the gains in Table 1 rows 13 vs 14 only due to guidance over atom types, or also due to BC? Authors should revise the claims if the latter is the case, or ideally provide an ablation for this (run MolJO at k=1 or MolCRAFT with BC).
[5] Why don't authors do a top-of-N comparison for other methods?
Claims and Evidence
The paper claims that gradient guidance is needed over discrete atom types in the setting of SBMO, since optimizing for molecular properties requires knowledge of the molecular topology, hence existing methods such as TAGMol suffer because they only do guidance over atom coordinates. MolJO proposes two novelties: gradient-based guidance over discrete atom types and backward correction.
- One thing that is not clear to me: do the values in Tables 1-2 incorporate backward correction? The difference between TAGMol and MolJO in Table 1 (lines 13-14) is about the same or smaller than the difference between MolJO with BC vs MolJO without BC in Table 3 and Figure 8 in terms of docking score. Since the authors claim that their improvement over TAGMol is due to guidance over atom types, it would be helpful to clarify this.
- The backward correction section is a bit hard to follow -- what exactly do the authors mean by "correcting the past" if parameters from past timesteps (n-k) are being used to update p_U at the current n-th timestep? Also, is the goal of the method essentially variance reduction as in [1]?
- The guidance component based on molecular properties is very important since this is one of the key problems the authors are tackling, but the implementation of property guidance is in the appendix. In my opinion this should be in the main paper.
[1] https://link.springer.com/article/10.1007/s10107-016-1030-6
Methods and Evaluation Criteria
The authors demonstrate that their method improves molecule generation on an established benchmark for both constrained and unconstrained optimization, evaluating on three types of molecular properties (affinity, QED, SA) and their combinations. The proposal of gradient-based guidance over discrete atom types jointly with continuous coordinates for optimizing molecular properties is interesting.
Theoretical Claims
I have gone through the proofs in the main paper and they seem reasonable. Equations (11) and (12) are a bit hard to read; it might help to put the RHS all on one line instead of splitting it.
Experimental Design and Analysis
The experiments generally make sense and the authors compare with a lot of baselines in Table 1, which is good. There are several things that I would like to clarify.
- Why don't the tables contain error bars?
- Why did authors only report top-10 for MolJO and not other methods?
- For Figure 3, how did authors restrict the size of generated molecules? The table seems to have different sizes for each method (also should this be called a Figure or a Table?)
- For the ablation in Table 3, should the values in the last row (row 6) match line 14 in Table 1? If not, what are the difference between Table 3, row 6 and Table 1, row 14?
- For Table 3, it's confusing to me why the SDE from Xue et al. is called "SDE with classifier guidance", but then authors do an ablation of methods without and without guidance. Can authors clarify what they mean by SDE classifier guidance?
Supplementary Material
I have gone over sections D and F in the supplementary material.
Relation to Prior Work
The paper tackles an important problem in the scientific community: the generation of molecules based on the structure of a protein target site while also optimizing for specific molecular properties, since molecules often have to satisfy several property criteria at once.
Essential References Not Discussed
To my knowledge, the work is not missing essential references.
Other Strengths and Weaknesses
Strengths:
- The combination of gradient-based guidance over discrete and continuous data types is an interesting application of BFNs.
- The empirical gains from the proposed backward-correction are notable based on the ablation study.
- Authors compare with a large amount of baselines and show gains on several metrics.
Weaknesses:
- The novelty of the backward correction sampling is not clear, since the authors write that Qu et al. implement the sampling for k=n, and in Fig. 8 write that the "strategy is robust within the range k \in (50, 200]" with n=200, so the need for a variable window size is not obvious to me since (if I understand correctly) all samples are generated over 200 steps?
- Some experimental information is unclear or not described in the main paper (listed above; importantly, information on property guidance).
Other Comments or Suggestions
- page 6: "fourthfold" --> "four-fold"?
We sincerely thank the reviewer for the careful reading and insightful feedback that helps improve the clarity and completeness of our work.
Questions
Q1: Table Value Inconsistencies
We apologize for the confusion. The values in Table 1 and 3 were obtained from different runs. For Table 3 (ablation studies), we sampled 10 molecules per pocket instead of 100 per pocket. We will clarify this in our revision.
Q2: Fig (Table) 3 Clarification
Thanks for bringing up this issue that helps improve our clarity. All models use the same size specifications from DecompDiff. The differences result from some methods having failed molecule generations (e.g., invalid, not connected). We apologize for the confusion and will correct this labeling (from "Figure" to "Table") in our revision.
For the guided SDE, we followed Xue et al. and performed guidance by modifying the conditional score, following common practice.
Q3: "Me-better" Definition
Thank you for noting this oversight. "Me-better" refers to molecules with improved specific properties over existing compounds—a term borrowed from traditional drug discovery [1]. We will add the definition in the revision.
[1] Me-Better Drug Design Based on Nevirapine and Mechanism of Molecular Interactions with Y188C Mutant HIV-1 Reverse Transcriptase. https://pubmed.ncbi.nlm.nih.gov/36364174/
Q4: Effect of Joint Guidance and Backward Correction
Thank you for this insightful question. The ablation study is actually presented in Table 3, which directly addresses this question by isolating the contributions of each component. We apologize for the confusion caused by our initial abbreviation "w/ (guidance)" (MolJO-based) and "w/o (guidance)" (MolCRAFT-based) that can be misinterpreted as w/ or w/o (BC). We will improve the clarity in our revision.
As the reviewer suggested, we evaluated "w/ guidance, Vanilla" (row 4) that corresponds to MolJO at k=1 (guidance without BC), and "w/o guidance, BC" (row 3) that corresponds to MolCRAFT with BC. Table 3 shows that both components contribute to performance gains. Notably, the combined improvement exceeds the sum of individual improvements, suggesting these components work synergistically.
Q5: Top-of-N Comparison
Thanks for the great suggestion. We will add top-of-N comparisons for all methods in our revision, which broadly shows what the "concentrated space" of desirable drug-like candidates looks like for each generative model. Our method shows the best Success Rate (70.3%), indicating better optimization efficiency.
| Method | Success Rate | Vina Score Avg | Vina Min Avg | Vina Dock Avg | QED | SA | Div |
|---|---|---|---|---|---|---|---|
| AR | 19.1% | -6.71 | -7.12 | -7.81 | 0.64 | 0.7 | 0.6 |
| Pocket2Mol | 40.5% | -5.8 | -7.18 | -8.32 | 0.67 | 0.84 | 0.59 |
| FLAG | 9.6% | 50.37 | 6.27 | -6.57 | 0.74 | 0.78 | 0.71 |
| TargetDiff | 32.6% | -7.06 | -8.1 | -9.31 | 0.64 | 0.65 | 0.67 |
| DecompDiff | 32.1% | -5.78 | -6.73 | -8.07 | 0.61 | 0.74 | 0.61 |
| MolCRAFT | 55.0% | -7.54 | -8.4 | -9.36 | 0.65 | 0.77 | 0.63 |
| IPDiff | 34.6% | -8.15 | -9.36 | -10.65 | 0.6 | 0.62 | 0.69 |
Additional Clarifications
Q6: Tables 1-2 Component
Thank you for highlighting this potential source of confusion. Yes, Tables 1-2 results include backward correction. To clarify, our contribution over TAGMol is two-fold: (1) We derived joint guidance over atom types and coordinates within the BFN framework. (2) We proposed the BC sampling algorithm that further improves optimization performance. As described in Q4, the results in Table 3 demonstrate that both contributions are significant. In our revised manuscript, we will make these distinctions clearer to avoid confusion.
Q7: Backward Correction Explanation & Novelty
We acknowledge that "correcting the past" is potentially confusing terminology. The correction applies to timesteps [n-k, n), while preserving parameters from [0, n-k).
As for its novelty, we develop a more flexible sampling strategy, establishing a sliding window approach that generalizes previous methods and explores a more nuanced control of variance. We appreciate the reviewer connecting this to variance reduction techniques, and we will investigate this connection further in our future work.
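In pseudocode terms, the sliding-window correction can be sketched as follows (a toy illustration; `rederive` is a hypothetical stand-in for the actual Bayesian update chain):

```python
def backward_correct(trajectory, n, k, current_sample, rederive):
    """Keep parameters for steps [0, n-k); recompute steps [n-k, n)
    from the current optimized sample."""
    start = max(0, n - k)
    kept = trajectory[:start]
    corrected = [rederive(current_sample, step) for step in range(start, n)]
    return kept + corrected

# Toy rederivation: tag each corrected step with the current sample.
traj = [("old", i) for i in range(10)]
new_traj = backward_correct(traj, n=10, k=4, current_sample="x*",
                            rederive=lambda s, step: (s, step))
print(new_traj[:6])   # untouched prefix, steps 0-5
print(new_traj[6:])   # window [6, 10) re-derived from x*
```

With k=n the whole trajectory is recomputed (the special case of prior work), while k=1 corrects only the latest step; intermediate k trades off how much accumulated variance is reset each step.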
Q8: Error Bars
Thanks for the advice! We report the error bars as 95% confidence intervals for our main result in Table 1, and will add it to the Appendix.
| | Vina Score | Vina Min | Vina Dock | QED | SA |
|---|---|---|---|---|---|
| AR | 0.066 | 0.049 | 0.082 | 0.004 | 0.003 |
| Pocket2Mol | 0.063 | 0.058 | 0.097 | 0.003 | 0.002 |
| TargetDiff | 0.172 | 0.102 | 0.075 | 0.004 | 0.003 |
| FLAG | 0.778 | 0.525 | 0.142 | 0.003 | 0.002 |
| DecompDiff | 0.060 | 0.048 | 0.073 | 0.004 | 0.003 |
| IPDiff | 0.141 | 0.088 | 0.072 | 0.004 | 0.003 |
| MolCRAFT | 0.122 | 0.070 | 0.097 | 0.004 | 0.003 |
| DecompOpt | 0.415 | 0.210 | 0.528 | 0.011 | 0.006 |
| TAGMol | 0.175 | 0.088 | 0.135 | 0.004 | 0.003 |
| Ours | 0.136 | 0.078 | 0.083 | 0.003 | 0.003 |
Q9: Presentation & Typos
We thank the reviewer for pointing these out, and we will revise our manuscript as requested.
The paper proposes MolJO, a gradient-guided framework for structure-based molecule optimization (SBMO). The key contributions are:
Joint gradient guidance over both continuous (coordinates) and discrete (atom types) modalities via Bayesian Flow Networks, avoiding modality inconsistency.
Backward correction strategy with a sliding window to balance exploration-exploitation.
SE(3)-equivariance preservation through energy function design.
Experiments on CrossDocked2020 show SOTA results. The method also generalizes to multi-objective optimization and drug design tasks.
Questions for Authors
How does the Taylor expansion in Proposition 4.1 handle highly non-convex energy landscapes? Would higher-order terms significantly affect guidance?
Claims and Evidence
1. The experiments lack statistical significance tests (e.g., p-values), making the reported improvements questionable.
Methods and Evaluation Criteria
1. The molecule size bias is not fully addressed. Larger molecules inherently achieve better Vina scores (Fig. 5), but MolJO's superiority on size-controlled subsets (Table 4) is only briefly discussed.
Theoretical Claims
1. The Taylor expansion (Eq. 18) assumes E(θ,t) is locally linear, which may not hold for complex energy functions. The approximation error is unquantified.
2. The SE(3)-equivariance proof assumes the protein's center of mass is zero; how is this ensured for real-world pockets?
Experimental Design and Analysis
1. The reported RMSD ratio is based on non-symmetry-corrected values (Page 21), which may overestimate pose consistency.
2. While MolJO outperforms baselines in strain energy (Table 9), the absolute values (163 kcal/mol) remain higher than those of the reference molecules (114 kcal/mol).
Supplementary Material
All supplementary materials have been reviewed.
Relation to Prior Literature
The work builds on BFNs and gradient-guided diffusion.
Missing Important References
GraphVF [1] combines SE(3)-flows and GNNs for joint coordinate-type optimization.
[1] Sun, F., Zhan, Z., Guo, H., Zhang, M., & Tang, J. (2023). GraphVF: Controllable Protein-Specific 3D Molecule Generation with Variational Flow. arXiv:2304.12825. http://arxiv.org/abs/2304.12825
Other Strengths and Weaknesses
The BFN derivation (Appendix A) is overly condensed; a step-by-step example would improve readability.
Other Comments or Suggestions
Suggestion: Add a schematic diagram of the backward correction process
We sincerely appreciate the reviewer's thorough reading and insightful questions, which have helped us identify areas for improved clarity and presentation. We shall address each point in our responses below as well as our revised manuscript, and we welcome further questions.
Q1: Smoothness of Energy Landscapes
Thanks for raising the insightful question about the limitations of first-order Taylor expansion in our approach. We have followed established guided diffusion methodology [1] which assumes that with sufficiently small step sizes, the changes in the energy landscape between consecutive steps remain modest. We acknowledge that this approach inherently assumes the energy landscape is relatively smooth in the local region of expansion, which may not hold universally for complex energy functions.
As the reviewer correctly pointed out, higher-order terms could potentially provide more accurate gradient estimates, which is an active area of research in guided diffusion models [2]. However, higher-order methods introduce computational overhead and potential numerical instability during sampling, which we will leave for future work. While the empirical results demonstrate the practical utility of our approach despite this approximation, we plan to further develop adaptive guidance schemes that adjust based on estimated local curvature.
[1] Diffusion Models Beat GANs on Image Synthesis.
[2] Inner Classifier-Free Guidance and Its Taylor Expansion for Diffusion Models.
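For concreteness, the first-order approximation discussed above reduces the guided sampling step to a mean shift along the negative energy gradient, as in classifier-guided diffusion. The following is a minimal, model-agnostic sketch; the quadratic energy, variance, and step size are illustrative, not the paper's learned energy function:

```python
import numpy as np

def guided_step(mu, sigma2, grad_energy, scale):
    """First-order (Taylor) guidance: shift the predicted mean against the
    energy gradient, mu' = mu - scale * sigma2 * grad E(mu)."""
    return mu - scale * sigma2 * grad_energy(mu)

# Toy quadratic energy E(x) = 0.5 * ||x - target||^2, so grad E(x) = x - target.
target = np.array([1.0, -2.0])
grad_E = lambda x: x - target

x = np.zeros(2)
for _ in range(50):
    x = guided_step(x, 0.5, grad_E, scale=0.2)
# With small step sizes, repeated guided steps drift toward the energy minimum.
```

The local-linearity assumption enters because each step treats grad E as constant over the step; a sharply curved landscape would make this per-step shift inaccurate, which is exactly the concern raised in the question.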
Q2: BFN Derivation Clarity
We will enhance our presentation in the revision with a more detailed, step-by-step explanation of the Bayesian Flow Network derivation.
Q3: Schematic Diagram of Backward Correction
Thank you for this suggestion. Figure 1D already provides a schematic diagram of the backward correction process, and we will make it clearer in our revision.
Q4: Statistical Significance
We acknowledge the importance of statistical validation. We have conducted pairwise t-tests comparing our guided Backward Correction against both Vanilla and guided SDE. The results show statistically significant improvements (p<0.05), and we will add the full table to our revision.
| Metric | vs. Vanilla | vs. SDE |
|---|---|---|
| Vina Score | 2.63E-13 | 2.55E-19 |
| Vina Min | 3.31E-31 | 6.48E-19 |
| Vina Dock | 2.79E-35 | 7.84E-4 |
| SA | 8.10E-115 | 2.10E-50 |
| QED | 1.98E-26 | 1.82E-12 |
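The p-values above would come from comparisons of paired per-molecule metrics. A minimal sketch of such a paired t-test, using the normal approximation to the t distribution (adequate at this sample size); the data below are fabricated for illustration:

```python
import math
import numpy as np

def paired_ttest_pvalue(a, b):
    """Two-sided paired t-test: t statistic and p-value, with the normal
    approximation to the t distribution (fine for n of ~100 or more)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    t = d.mean() / (d.std(ddof=1) / math.sqrt(len(d)))
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p

# Illustrative only: fake paired Vina scores where "ours" is clearly better
# (lower). Not the paper's data.
rng = np.random.default_rng(1)
ours = rng.normal(-7.5, 1.0, size=100)
vanilla = ours + rng.normal(0.4, 0.5, size=100)
t_stat, p_value = paired_ttest_pvalue(ours, vanilla)
```

Pairing per target pocket, rather than pooling all molecules, is what makes differences of a few tenths of a kcal/mol resolvable at these significance levels.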
Q5: Molecule Size Bias
The reviewer makes an excellent point regarding molecule size bias, which is a crucial issue for SBDD. We apologize for the confusion caused by Table 4. Rather than addressing the size bias directly, the primary goal of our size-controlled experiments was to demonstrate that our guidance improves molecular quality and consistently outperforms baselines across different size ranges.
For Table 4, we will improve our presentation by calculating optimal scores separately for each size (Ref & Large), which will better highlight that our improvements stem from enhanced molecular quality rather than size bias.
Q6: SE(3)-equivariance with Real-world Pockets
The reviewer correctly identifies a limitation for unknown protein pockets, where reference positions are required to clip and obtain the pocket region. We acknowledge this as a limitation of current SBDD methods and certainly worth exploration. For known protein pockets, we simply subtract the centroid to ensure the center of mass is at the origin.
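The centering step mentioned above is a one-line translation applied to the whole complex. A minimal sketch (an unweighted centroid is used here for simplicity; a mass-weighted center would be analogous):

```python
import numpy as np

def center_on_protein(protein_pos, ligand_pos):
    """Translate both protein and ligand by the protein centroid so the
    protein's center of mass sits at the origin -- the convention the
    SE(3)-equivariance argument assumes."""
    com = protein_pos.mean(axis=0)
    return protein_pos - com, ligand_pos - com

prot = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])
lig = np.array([[1.0, 1.0, 1.0]])
prot_c, lig_c = center_on_protein(prot, lig)
```

Because the same shift is applied to both molecules, all relative protein-ligand geometry (and thus any docking-relevant interaction) is unchanged.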
Q7: RMSD Ratio Without Symmetry Correction
Thank you for this important observation. Here we followed the calculation from MolCRAFT for fair comparison with existing methods. This approach calculates symmetry-corrected RMSD between Vina Docked PDBQT and generated molecules when possible, but falls back to non-symmetry-corrected values in cases where symmetry correction cannot be applied. We appreciate the reviewer highlighting this important consideration for accurate assessment of molecular pose consistency, and we will clarify the detail in our revision and investigate the issue further in the future.
Q8: Strain Energy Higher Than Ref
Thank you for this important observation. Although our results are generally within reasonable ranges for computational generation, we agree there is room for improvement. Future work could incorporate additional chemical prior knowledge to further reduce strain energy. We appreciate the reviewer highlighting this point, as it identifies an important direction for continued refinement of our approach.
Q9: Missing Citation to GraphVF
We sincerely thank the reviewer for bringing GraphVF to our attention. This work indeed makes valuable contributions by encoding different modalities in latent space for joint coordinate-type optimization. We agree that the unified space represents a promising direction. We will update our manuscript to include this important reference and add relevant discussion on how it relates to our work.
This study addresses the problem of structure-based molecular optimization. In this setting, the task involves optimizing over discrete objects, which makes it challenging to apply molecule generation methods based on gradient guidance. This paper proposes a method to overcome this challenge. All reviewers agreed on the practical importance of the problem addressed in the paper, and the effectiveness of the proposed method is clearly demonstrated through experiments on real benchmark datasets. During the rebuttal period, the authors provided additional experimental results, which successfully resolved most of the reviewers' concerns.