PaperHub
Overall rating: 4.7/10 (Poster; 3 reviewers; min 3, max 7, std. dev. 1.7)
Individual ratings: 3, 4, 7
Confidence: 3.3
Correctness: 3.0
Contribution: 2.7
Presentation: 3.0
NeurIPS 2024

TurboHopp: Accelerated Molecule Scaffold Hopping with Consistency Models

OpenReview · PDF
Submitted: 2024-05-15 · Updated: 2025-01-21
TL;DR

A fast and efficient E(3)-equivariant scaffold-hopping model utilizing consistency models for rapid generation, additionally powered by RL

Abstract

Keywords
Scaffold Hopping · Consistency Models · Diffusion Models · 3D Structure-Based Drug Design · Reinforcement Learning · Drug Discovery · Generative Models

Reviews and Discussion

Review (Rating: 3)

This paper presents TurboHopp, an accelerated pocket-conditioned 3D scaffold hopping model designed to enhance the efficiency and speed of drug discovery. It addresses the slow processing speeds of 3D-SBDD generative models by offering up to 30 times faster inference speed while maintaining or improving on key metrics like drug-likeness, synthesizability, connectivity, and binding affinity. Additionally, it incorporates reinforcement learning to further optimize molecule designs, demonstrating its potential in various drug discovery scenarios.

Strengths

  • Accelerated generation: TurboHopp's inference speed is 5-30 times faster than that of DDPM-based models, greatly improving the efficiency of drug discovery.
  • Combination with reinforcement learning: By leveraging the fast inference speed of consistency models, TurboHopp applies reinforcement learning to 3D-SBDD-DMs, enabling fine-tuning of the generative model towards specific objectives, such as improving binding affinity or reducing steric clashes, for more refined molecule design.

Weaknesses

The consistency model and reinforcement learning are well-established techniques, each with a robust body of research, and the integration of reinforcement learning into diffusion models has been explored in the literature. In this paper, the authors concatenate these two methodologies for the purpose of scaffold hopping without explicitly detailing any novel strategies or innovations in their implementation. The empirical comparison could also be more thorough, comparing against other recent state-of-the-art drug discovery algorithms.

Questions

  1. Could the authors provide a more detailed analysis of the trade-off between inference speed and the quality of the generated molecules?
  2. How does the novelty exceed 1 in Table 1?
  3. The time reduction is mainly due to the consistency model. How did the authors transfer the consistency model, and what improvements were made after the transfer?
  4. What are the advantages of this model over SBDD models such as TargetDiff [1] and DecompDiff [2]?
  5. In terms of performance comparison, the enhancement provided by reinforcement learning to the model is significant. Can it be understood that surpassing DiffHopp primarily relies on reinforcement learning? Is it reasonable to use evaluation metrics directly for optimization? Can you conduct an ablation study focused solely on the integration of reinforcement learning?

[1] Jiaqi Guan, Wesley Wei Qian, Xingang Peng, Yufeng Su, Jian Peng, and Jianzhu Ma. 3d equivariant diffusion for target-aware molecule generation and affinity prediction. arXiv, 2023.

[2] Jiaqi Guan, Xiangxin Zhou, Yuwei Yang, Yu Bao, Jian Peng, Jianzhu Ma, Qiang Liu, Liang Wang, and Quanquan Gu. Decompdiff: diffusion models with decomposed priors for structure-based drug design. arXiv, 2024.

Limitations

The authors state that they have improved generation efficiency, but they do not compare their approach with methods that generate molecules from scratch.

Author Response

First, we thank the reviewer for the great insights. As stated in your limitations section, we acknowledge that comparison with de novo generative models is important. However, since a direct comparison would be unfair, we had to build repurposed versions of de novo models, which we plan to additionally release. Please refer to the global rebuttal section for further details.

Q1. Regarding inference speed and generation quality (Table 1 main, Table 3,5,6,7,8,9 as additional reference)

A1. To explain the relationship between speed and quality, we extended the previous Table 1 comparing DiffHopp, TurboHopp, and its variations. As shown in Table 1, fewer steps in TurboHopp lead to overall poorer quality compared to versions with more steps. Among the three TurboHopp variants, TurboHopp100 had the best efficiency in terms of both sample quality and generation time. Tables 1 and 3 show that our model is efficient, with metrics comparable to diffusion-based models despite the reduction in generation time. However, in Tables 5, 6, 7, 8, and 9, where we compare geometric properties with existing conditional diffusion models, some geometric properties remain an area for improvement. Training the model to learn bond distributions in future work may resolve this issue.

Q2. Novelty issue

A2. We apologize if we confused the reviewer. Novelty is capped at one, and the digits after the plus-minus sign are the standard deviation.

Q3. Consistency models and improvements

A3. We did not train using consistency distillation but trained a consistency model from scratch using consistency training [3]. We also referred to improved techniques for consistency training and found that MSE loss works better than pseudo-Huber loss and that training without the skip connection in the model is more stable [4]. In addition, we had to change the original consistency model to suit conditional molecule generation: it has to take into account conditions (protein, functional groups, etc.) and be extended to a multimodal, SE(3)-equivariant model. Overall, we believe our framework can be broadly adapted to many existing molecular generative diffusion models and hopefully improve their sampling efficiency.
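The training setup described in the reply (consistency training from scratch, MSE in place of pseudo-Huber) can be sketched generically. The snippet below is an illustrative toy in the unconditional, non-equivariant case, not the authors' actual multimodal SE(3)-equivariant implementation; it also omits the EMA/stop-gradient target branch a real implementation would use.

```python
import numpy as np

def f_theta(model, z, t, sigma_data=0.5):
    # Output scaling in the style of consistency models; the rebuttal
    # reports that dropping the skip connection improved stability, so
    # the boundary condition f(z, t_min) ~ z is left to training here.
    c_out = sigma_data * t / np.sqrt(sigma_data**2 + t**2)
    return c_out * model(z, t)

def consistency_training_loss(model, x0, t_n, t_np1, rng):
    # Consistency *training* (no distillation): perturb the same clean
    # sample x0 to two adjacent noise levels with shared noise, then
    # match the two predictions. MSE is used instead of pseudo-Huber,
    # which the authors report works better in their setting.
    noise = rng.standard_normal(x0.shape)
    z_n = x0 + t_n * noise
    z_np1 = x0 + t_np1 * noise
    diff = f_theta(model, z_np1, t_np1) - f_theta(model, z_n, t_n)
    return float(np.mean(diff**2))
```

The `model`, `sigma_data`, and schedule values here are placeholders; only the overall structure (shared noise, adjacent timesteps, MSE objective) reflects what the reply describes.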

Q4. Advantages over de-novo conditional generative diffusion models(Algorithm 1, Table 3)

A4. Scaffold hopping aims at finding novel scaffolds that connect key functional groups (those interacting with protein residues), while de novo generative models build the whole molecule given a protein pocket. Both are SBDD models, and both aim at finding novel, potent molecules. However, because of the scarcity of 3D binding-complex data, it is realistically hard even for advanced models to learn the vast chemical space of potent molecules. Therefore, by conditioning on functional groups with potential interactions, we reduce the chemical space the model must learn, which leads to more efficient learning and sampling (Figure 1 of the paper). This is further powered by the speed of our consistency model, allowing 1) faster, more efficient generation, which is often requested in the pharmaceutical domain, and 2) faster optimization of certain molecular metrics, which opens possibilities for human-in-the-loop model optimization.

Model structure-wise, a straight comparison between de novo SBDD models and scaffold-hopping models is unfair, since scaffold-hopping models have an extra ligand condition to consider during generation. This extra constraint may be harder for the model to learn, but on the flip side it also means less chemical space to cover. We do, however, compare against inpainting variations of de novo conditional generative diffusion models, which we additionally built for [1] and [2]. Scaffold hopping can be seen as an inpainting task: a model trained to generate de novo molecules treats the scaffold as a missing piece to be filled in, with the known functional groups of the molecule provided as context. In short, as shown in Table 3, we obtain better overall molecular metrics. Also, inpainting variations of SBDD models are too slow to use practically compared to our model. For more information, please refer to the global rebuttal.

Q5. Questions regarding RL performance(Table 1,2)

A5. We agree that reinforcement learning has a significant impact. However, we do not believe that surpassing DiffHopp primarily relies on RL. In particular, by leveraging the consistency model's ability to produce many high-quality predictions in significantly fewer steps, we can use metric-based sampling to achieve competitive results (Table 1). We also showed that including RL provides significant gains over diffusion-based models (Table 2). To counteract reward hacking in standard docking tasks, we followed the methods of [5], using a combination of QED, synthesizability, and docking scores to mitigate this issue (Equation 11). In future research, we plan to design a more robust reward function to tackle reward hacking further. While it is true that we optimize for these metrics to some extent, we found that many metrics excluded from the reward function maintained their quality, hinting that the molecules were not being over-optimized. See the response to reviewer Cq9Y for more information.
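The multi-objective reward the reply describes (combining QED, synthesizability, and docking score so no single metric can be gamed) can be illustrated with a toy weighted sum; the exact form and weights of the paper's Equation 11 are not reproduced here, and the weights below are assumptions.

```python
def composite_reward(qed, sa, docking_score, w_qed=1.0, w_sa=1.0, w_dock=1.0):
    # Docking scores are negative (more negative = stronger predicted
    # binding), so negate them to make every term higher-is-better.
    return w_qed * qed + w_sa * sa + w_dock * (-docking_score)
```

For example, a molecule with QED 0.5, SA 0.7, and a docking score of -8.0 kcal/mol would score 9.2 under unit weights; inflating the docking term alone cannot compensate for collapsing QED or SA.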

[3] Song, Yang, et al. "Consistency models." arXiv preprint arXiv:2303.01469 (2023).

[4] Song, Yang, and Prafulla Dhariwal. "Improved techniques for training consistency models." arXiv preprint arXiv:2310.14189 (2023).

[5] Ghugare, Raj, et al. "Searching for high-value molecules using reinforcement learning and transformers." arXiv preprint arXiv:2310.02902 (2023).

Review (Rating: 4)

This paper proposed a pocket-conditioned 3D molecular scaffold hopping model based on the well-established consistency models. The framework is superior in terms of inference speed. Besides, the authors also proposed a corresponding RL method to fine-tune the model towards generating molecules with desirable properties. The experimental results show the effectiveness of the proposed approach compared with DiffHopp.

Strengths

  • This work first introduced consistency models to molecular scaffold hopping and achieved promising results.
  • The evaluation was done from various perspectives, including connectivity, QED, SA, Vina, etc.
  • Introducing RL for optimizing the generated molecules towards desired properties is useful in practice.

Weaknesses

  • More baselines are needed. There are also some other methods for scaffold hopping, such as [1,2,3], etc. Comparison with these methods is necessary to show the significance of the proposed method in the practice of drug discovery.
  • Some related works are missing. For example, this work introduces reinforcement learning for consistency models to improve the properties of generated molecules. There are related works in the field of diffusion models for molecular science that utilize a similar idea: [4] uses an RL method (e.g., actor-critic) to fine-tune a diffusion model to generate molecules with higher binding affinity, and [5] uses RL-like methods to improve the quality of sampled docking poses in the protein-ligand docking task.
  • Lack of some ablation studies. The effectiveness of some proposed modules is not clear, e.g., the metric-based sampling methods.
  • Though many evaluation metrics are utilized in this work, some are still missing. For example, geometric properties (e.g., bond lengths, bond angles, and torsion angles) need to be checked.

References:

[1] Hu, Chao, Song Li, Chenxing Yang, Jun Chen, Yi Xiong, Guisheng Fan, Hao Liu, and Liang Hong. "ScaffoldGVAE: scaffold generation and hopping of drug molecules via a variational autoencoder based on multi-view graph neural networks." Journal of Cheminformatics 15, no. 1 (2023): 91.

[2] Yu, Yang, Tingyang Xu, Jiawen Li, Yaping Qiu, Yu Rong, Zhen Gong, Xuemin Cheng et al. "A novel scalarized scaffold hopping algorithm with graph-based variational autoencoder for discovery of JAK1 inhibitors." ACS omega 6, no. 35 (2021): 22945-22954.

[3] Zhou, Xiangxin, Xiwei Cheng, Yuwei Yang, Yu Bao, Liang Wang, and Quanquan Gu. "DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization." arXiv preprint arXiv:2403.13829 (2024).

[4] Zhou, Xiangxin, Liang Wang, and Yichi Zhou. "Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process." arXiv preprint arXiv:2403.04154 (2024).

[5] Corso, Gabriele, Arthur Deng, Benjamin Fry, Nicholas Polizzi, Regina Barzilay, and Tommi Jaakkola. "Deep confident steps to new pockets: Strategies for docking generalization." arXiv (2024).

Questions

See the weaknesses.

Limitations

The authors have discussed the limitations.

Author Response

We first thank the reviewer for the constructive feedback regarding additional baselines/metrics. Please refer to the global rebuttal as well as the attached PDF!

Q1. Other scaffold-hopping baselines (Table 3,4)

A1. Thank you for the valuable pointers to baseline models. Regarding the first two models the reviewer mentioned: ScaffoldGVAE [1] and GraphGMVAE [2] are both VAE models built for scaffold hopping. DecompOpt [3] is a recently published diffusion-based model for molecular optimization that includes scaffold-hopping experiments and results. Baselines regarding DecompOpt are in the global rebuttal section.

1) Comparison with VAE-based models (Table 4)

Unfortunately, most VAE-based scaffold-hopping models are SMILES-based, which makes comparison with our model difficult, and the code/data for most of them are missing. The authors of ScaffoldGVAE [1] provide a trained model, but the vocabulary necessary for tokenization, encoding, and decoding is missing, so we could not use the provided checkpoint alone. To properly implement ScaffoldGVAE, we would need to generate a new vocabulary and retrain the model on the 1.9 million ChEMBL compounds mentioned in their paper. Given the limited review time frame, experimenting with multiple baselines requiring such extensive setup and training was challenging. To ensure fairness in training data, we instead trained ScaffoldGVAE on PDBbind, the same dataset used for our model, so that both models are evaluated under comparable conditions. Our approach involved pretraining on the training data followed by target-specific fine-tuning for evaluation on the test set. The results are as follows:

| Method | Validity (↑) | Connectivity (↑) | Diversity (↑) | Novelty (↑) | QED (↑) | SA (↑) | QVina (↓) | Time |
|---|---|---|---|---|---|---|---|---|
| ScaffoldGVAE | 0.489 | 0.894 | 0.584 | 0.702 | 0.577 | 0.703 | -5.270 | - |
| TurboHopp-100_metric | 0.993 | 0.906 | 0.486 | 0.935 | 0.502 | 0.710 | -7.204 | 8.18 |

To define connectivity for 2D molecules, we consider a molecule connected if the generated scaffold can be successfully substituted for the original core and form bonds with the remaining functional groups. Despite the high QED values, most generated scaffolds failed to connect with the functional groups, leading to low validity and connectivity, and most scaffolds had low similarity to the reference scaffold. This is presumably because SMILES-based models do not consider the 3D pocket structure during generation. In addition, most VAE-based scaffold-hopping models require tailoring to specific targets. Our model's performance can be attributed to its use of the 3D structural information of target proteins, which allows generation even for pockets with few known binding ligands.

2) Comparison with other SBDD diffusion models (including Decompopt) (Table 3)

Please refer to the global rebuttal section.

Q2. Related works regarding further research on RL-applied molecular science tasks. (Table 2)

A2. We thank the reviewer for highlighting these works, which we will cite. We note that they differ from ours: [4] proposes a critic function, which we found superfluous for consistency models, and it focuses on a single reward metric for a single target, while we optimize an array of target metrics within a single optimization task to ensure no metric is sacrificed. [5] addresses molecular docking, a different problem from our molecule generation setting. Nevertheless, we agree these are important related works and will include them.

Q3. Metric Based Sampling (Table 1)

A3. Metric-based sampling selects the best result towards the end of the sampling trajectory rather than simply taking the final output. Figure 8 in Appendix D of our paper illustrates the difference between conventional end-of-trajectory sampling and metric-based sampling. We extended Table 1 to ablate the effectiveness of this method, which shows a slight improvement. This is further demonstrated by the enhancements observed in DiffHopp (Table 1: DiffHopp_scored vs. DiffHopp), indicating that it can yield higher-quality samples. However, in DiffHopp it significantly reduces sampling efficiency, as it requires more frequent evaluations of the generated products towards the end (100 sec vs. 440 sec).
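The selection rule described above can be sketched in a few lines; this is a generic illustration assuming a trajectory of candidate outputs and a cheap higher-is-better metric, not the authors' exact implementation.

```python
def metric_based_sample(trajectory, metric, k=5):
    # Evaluate only the last k intermediate predictions of the sampling
    # trajectory (scoring the whole trajectory is what makes the
    # diffusion variant slow) and return the best-scoring candidate
    # instead of blindly taking the final output.
    candidates = trajectory[-k:]
    return max(candidates, key=metric)
```

With a consistency model, each intermediate prediction is already a full denoised sample, which is why this selection is cheap relative to the DDPM case.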

Q4. Geometric property metrics evaluation (Table 5,6,7,8,9)

A4. We compared geometric properties, including bond distributions (Tables 5, 6, 8, 9) and ring distributions (Table 7). The results show that our model has bond-length/angle and atom-atom distance distributions closer to the reference molecules than TargetDiff, but poorer results for bond/torsion angles. In all aspects DecompDiff was outstanding, largely because it learns the distribution of bonds. For ring distributions, our model is capable of generating ring types similar to the reference. In future research, we plan to have our model learn bond properties, which we expect to improve the geometric properties.

Comment

Thanks for your detailed response!

The new experimental results provide a more comprehensive evaluation of the proposed methods.

It seems that the proposed methods show a similar performance compared with DecompOpt on the task of scaffold hopping. And as the tables in the attached PDF show, the geometric properties of the molecules generated by the proposed method are unsatisfactory.

Taking the above into consideration, I keep my current score.

Comment

Thank you for your feedback! We wish to clarify some details concerning our results. Because DecompOpt's code implementation is unavailable, its performance metrics come from the published literature, precluding a direct computational comparison (the scaffold masking method might differ, the number of optimization runs is not reported, and the docking methods may differ). Consequently, the values presented for DecompOpt should be considered reference values rather than direct comparisons. Furthermore, our variant TurboHopp-50RL not only achieved docking scores exceeding those of DecompOpt and the reference, but also demonstrated significantly faster generation than other SOTA models. We kindly ask the reviewer to consider the efficiency of generation in terms of inference time and quality!

| Method | Validity (↑) | Connectivity (↑) | Diversity (↑) | Novelty (↑) | QED (↑) | SA (↑) | QVina (↓) | Time |
|---|---|---|---|---|---|---|---|---|
| TargetDiff_inpainting | 0.927 | 0.826 | 0.841 | 0.914 | 0.424 | 0.661 | -5.896 | 740.33 |
| DecompDiff_inpainting | 0.876 | 0.722 | 0.856 | 0.895 | 0.420 | 0.648 | -6.225 | 1263.72 |
| DecompOpt_inpainting | - | - | - | - | 0.490 | 0.710 | -7.280 | - |
| TurboHopp-100 | 0.990 | 0.853 | 0.484 | 0.936 | 0.488 | 0.702 | -7.051 | 6.17 |
| TurboHopp-100_metric | 0.993 | 0.906 | 0.486 | 0.935 | 0.502 | 0.710 | -7.204 | 8.18 |
| TurboHopp-50RL_metric | 0.997 | 0.951 | 0.800 | 0.952 | 0.524 | 0.674 | -8.798 | 3.51 |
| CrossDocked Test | 1.000 | - | 1.000 | 0.599 | 0.476 | 0.727 | -7.510 | - |
Review (Rating: 7)

Given a protein pocket and a reference ligand, the authors suggest a method to generate different scaffolds to be able to eventually come up with new ligands with similar or even improved properties. Precisely, the authors learn a consistency function which maps noise to a scaffold (created as a 3D-conformation) while being aware of the protein pocket and the functional groups of the reference ligand. This created scaffold together with the functional groups build a new potential ligand. By combining the consistency-based model with RL, the generation process can be biased towards scaffolds which build together with the functional groups ligands with optimized properties, e.g. binding affinity or protein steric clashes. The proposed method is compared with a baseline. Also the impact of adding RL to the approach is evaluated.

Strengths

Originality:

  • (S-O): As far as I know, scaffold hopping with consistency models and combining them with goal-directed RL has not been done before. Therefore applying consistency models to scaffold hopping and combining them with RL to optimize chemical properties is novel.

Quality:

  • (S-Q1): In terms of clarity and writing style the quality is very high (see clarity section)
  • (S-Q2): The results section shows that the proposed method is promising and helpful for scaffold hopping.

Clarity:

  • (S-C1) The paper is written very well. Together with a good structure, a good introduction, and nice figures, this makes the key takeaways very clear.
  • (S-C2) The introduction and the related work section set the stage very well for the proposed method. The authors give a good overview of recent work along with its strengths and weaknesses. Figure 1 highlights nicely the idea of the proposed method and Figure 2 shows its effectiveness.
  • (S-C3) The generation approach is described very well. Figure 3 is also good and helpful. The authors clearly describe which components of their pipeline are learnable.

Significance:

  • (S-S1): The results are significant because the suggested approach outperforms the compared baseline.
  • (S-S2): Error bars are reported.

Weaknesses

Quality and Clarity:

  • (W-QC): The mathematical notation / the formulas sometimes seem a bit cluttered and inconsistent:
    • In formula (8), $f_\theta^{n+1,x}$ is used for $f_\theta(Z_{n+1}, t_{n+1} \mid u)$ without mentioning that one is shorthand for the other.
    • Equation (5) indicates that $F$ has two outputs. The way this equation is written is code-style notation rather than a well-defined mathematical expression, since $x^\prime_t, h^\prime_t$ is, from a mathematical viewpoint, not well defined. This equation style should be avoided.
    • $\sigma_\text{data}$ is not introduced but used in (7).

Significance:

  • (W-S): Because of (Q) the relevance for the RL-based scaffold generation might be limited for real-world scenarios.

Questions

  • (Q): For SMILES-based goal-directed optimization, [1] shows that there is a risk that the generator learns to find blind spots in the reward function and learns to trick the scoring function rather than to optimize real-world properties. Do the authors think this might also be an issue for their RL-based approach?

[1] Renz, Philipp, et al. "On failure modes in molecule generation and optimization." Drug Discovery Today: Technologies 32 (2019): 55-63.

Limitations

The authors describe potential areas for improvement in the conclusion section.

Author Response

We sincerely appreciate the reviewer's positive feedback on our model. Thank you for highlighting the typos and mathematical notations to fix; we will ensure they are corrected in the final draft.

Q1. Concerns regarding reward hacking (Table 2)

A1.

As the reviewer mentions, we agree that generators are prone to learning blind spots in the reward function, which might indeed limit usefulness for real-world properties; SMILES-based RL algorithms have been shown to exploit their state and action spaces to design chemically trivial molecules with exceptionally high docking scores. To counteract reward hacking in standard docking tasks, we follow the methods of [2], using a combination of QED, synthesizability, and docking scores to mitigate this issue (Equation 11). In future research, we plan to design a more robust reward function to tackle reward hacking further. KL regularization towards the original model usually fixes this problem (a common practice in LLM alignment with RLHF) and could be used if other metrics or real-world fabrication showed issues; other methods [3] have also been developed for these issues and would likely extend to consistency models. As shown below (please refer to Table 2 of the PDF in the global rebuttal), TurboHoppRL maintains metrics other than the docking score and also has higher diversity than TurboHopp. Furthermore, TurboHoppRL trained on PDBbind achieved competent generation quality on the CrossDocked test set without training on it, exceeding the reference docking score, which indicates that our model generalizes well to new data. Unlike other RL-for-diffusion works [4] that optimize towards a single protein pocket, we optimize over an array of target metrics, which may be why overfitting does not occur.

| Method | Connectivity (↑) | Diversity (↑) | Novelty (↑) | QED (↑) | SA (↑) | Vina (↓) | Steps | Time |
|---|---|---|---|---|---|---|---|---|
| TurboHopp-100_metric | 0.997 | 0.561 | 1.000 | 0.664 | 0.737 | -8.298 | 100 | 7.14 |
| TurboHoppRL-50_metric | 0.980 | 0.869 | 0.936 | 0.619 | 0.680 | -9.804 | 50 | 3.69 |
| PDBBind Test | 1.000 | - | 1.000 | 0.599 | 0.742 | -8.643 | - | - |
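The KL regularization mentioned in the reply can be sketched as a per-sample reward correction; the function shape and `beta` below are illustrative assumptions, not the paper's formulation.

```python
def kl_regularized_reward(reward, logp_new, logp_ref, beta=0.1):
    # Single-sample Monte Carlo estimate of reward - beta * KL(new||ref):
    # penalize the fine-tuned policy for assigning a molecule much higher
    # log-probability than the pretrained reference model does, which
    # discourages drifting into reward-hacking regions of chemical space.
    return reward - beta * (logp_new - logp_ref)
```

Here `logp_new` and `logp_ref` are log-probabilities of the sampled molecule under the fine-tuned and reference models; the same pattern underlies RLHF-style alignment of LLMs.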

[2] Ghugare, Raj, et al. "Searching for high-value molecules using reinforcement learning and transformers." arXiv preprint arXiv:2310.02902 (2023).

[3] Uehara, Masatoshi, et al. "Fine-tuning of continuous-time diffusion models as entropy-regularized control." arXiv preprint arXiv:2402.15194 (2024).

[4] Zhou, Xiangxin, Liang Wang, and Yichi Zhou. "Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process." arXiv preprint arXiv:2403.04154 (2024).

Comment

Thank you for answering the raised question.

I have read the other reviews and the authors' responses. On the one hand, the other reviewers seemed to raise valid points, e.g., a lack of baselines. On the other hand, the authors added information in this regard during the rebuttal. Assuming that the issues with respect to baselines and related work are solved (I hope reviewer VN4j will comment on this), I'd like to stick to my score because pocket-conditioned 3D molecular scaffold hopping is interesting to the community and the manuscript is of high quality.

Comment

Dear Reviewer Cq9Y,

Thank you for your positive feedback, your support and constructive engagement with our work!

Best regards,

Authors

Author Response (Global Rebuttal)

We appreciate all the valuable feedback from the reviewers. Here we answer a question raised by multiple reviewers about comparing our model with de novo molecule generative diffusion models.

Q. Comparison with other SBDD diffusion models (Table 3)

A: Although there exists a plethora of de novo 3D-SBDD models, direct comparison with a scaffold-hopping model is not straightforward; we have tried to compare as fairly as possible. To expand our baselines, we additionally applied inpainting [1] (refer to Algorithm 1 of the PDF) to recent conditional de novo diffusion models to create variants suitable for scaffold hopping. In inpainting for scaffold hopping, "knowns" are the conditioned functional groups, while "unknowns" are the scaffolds to be generated. We use the same Bemis-Murcko scaffold to determine scaffolds and functional groups. For sampling, we fix the number of atoms to the reference scaffold for all models. For DecompDiff, which additionally uses bond diffusion, we created a bond mask accordingly, and we use reference priors instead of applying AlphaSpace. The inpainting sampling hyperparameters (resampling and jump-length parameters) were chosen by sampling and selecting those with the best validity.
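The masking convention described above can be sketched as one RePaint-style inpainting step; this is a toy coordinates-only illustration with a placeholder denoiser, whereas the actual models also handle atom types and (for DecompDiff) bonds.

```python
import numpy as np

def inpainting_step(x_t, x_known_t, mask, denoise_step):
    # x_t: current noisy coordinates, shape (n_atoms, 3)
    # x_known_t: reference-ligand coordinates noised to the same level t
    # mask: (n_atoms, 1), 1 for "known" functional-group atoms,
    #       0 for "unknown" scaffold atoms to be generated
    x_gen = denoise_step(x_t)  # model proposal for every atom
    # Keep the knowns pinned to the (noised) reference; let the model
    # fill in the scaffold region only.
    return mask * x_known_t + (1.0 - mask) * x_gen
```

Iterating this step (with RePaint's resampling/jump-length schedule) lets a de novo diffusion model act as a scaffold-hopping model without retraining.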

Also, since these models were trained on CrossDocked, we additionally trained our model on CrossDocked for a fair comparison. We follow the same train-test split suggested in DecompDiff, but add a QED minimum filter of 0.3 when constructing the dataset, resulting in a training/validation dataset of 84,057/251 molecules. We use only the alpha-carbon residues of the protein pocket atoms to reduce the computational burden.

Please note that the code for DecompOpt was released only recently (August 2024), and the scaffold-hopping code needed to reproduce its results is missing. DecompOpt uses Vina Score for validation while ours uses QVina2, and the values in the table below are those reported in the paper. "_inpainting" refers to models using the inpainting method; "_metric" indicates inference with metric-based sampling. QVina (kcal/mol) is the estimated binding affinity measured by QVina2.

Below are the results (we omit the standard deviations due to space issues, for the full table, refer to Table 3 of PDF):

| Method | Validity (↑) | Connectivity (↑) | Diversity (↑) | Novelty (↑) | QED (↑) | SA (↑) | QVina (↓) | Time |
|---|---|---|---|---|---|---|---|---|
| TargetDiff_inpainting | 0.927 | 0.826 | 0.841 | 0.914 | 0.424 | 0.661 | -5.896 | 740.33 |
| DecompDiff_inpainting | 0.876 | 0.722 | 0.856 | 0.895 | 0.420 | 0.648 | -6.225 | 1263.72 |
| DecompOpt_inpainting | - | - | - | - | 0.490 | 0.710 | -7.280 | - |
| TurboHopp-100 | 0.990 | 0.853 | 0.484 | 0.936 | 0.488 | 0.702 | -7.051 | 6.17 |
| TurboHopp-100_metric | 0.993 | 0.906 | 0.486 | 0.935 | 0.502 | 0.710 | -7.204 | 8.18 |
| CrossDocked Test | 1.000 | - | 1.000 | 0.599 | 0.476 | 0.727 | -7.510 | - |

Despite having lower diversity than the other diffusion models, our model has much faster generation as well as a relatively high docking score close to DecompOpt, which optimizes molecules over multiple rounds and would therefore likely take even longer than DecompDiff. Note that inpainting in general increases generation time (and our model is still much faster even without inpainting). Consequently, our findings show that a custom scaffold-hopping model outperforms a repurposed de novo model.

[1] Lugmayr, Andreas, et al. "Repaint: Inpainting using denoising diffusion probabilistic models." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.

Comment

To all reviewers: thank you for your time, effort, and interest in our work. We additionally add the results of TurboHopp-50RL trained on CrossDocked.

Through RLCM (RL for consistency models) training, we were able to improve all metrics (validity, connectivity, diversity, novelty, QED, docking score) except synthesizability, exceeding those of the reference set, especially on docking score. Regarding model overfitting, please refer to the novelty/diversity of the generated molecules.

| Method | Validity (↑) | Connectivity (↑) | Diversity (↑) | Novelty (↑) | QED (↑) | SA (↑) | QVina (↓) | Time |
|---|---|---|---|---|---|---|---|---|
| TargetDiff_inpainting | 0.927 | 0.826 | 0.841 | 0.914 | 0.424 | 0.661 | -5.896 | 740.33 |
| DecompDiff_inpainting | 0.876 | 0.722 | 0.856 | 0.895 | 0.420 | 0.648 | -6.225 | 1263.72 |
| DecompOpt_inpainting | - | - | - | - | 0.490 | 0.710 | -7.280 | - |
| TurboHopp-100 | 0.990 | 0.853 | 0.484 | 0.936 | 0.488 | 0.702 | -7.051 | 6.17 |
| TurboHopp-100_metric | 0.993 | 0.906 | 0.486 | 0.935 | 0.502 | 0.710 | -7.204 | 8.18 |
| TurboHopp-50RL_metric | 0.997 | 0.951 | 0.800 | 0.952 | 0.524 | 0.674 | -8.798 | 3.51 |
| CrossDocked Test | 1.000 | - | 1.000 | 0.599 | 0.476 | 0.727 | -7.510 | - |

If you have any further issues to discuss, please leave a comment!

Final Decision

The authors present the application of consistency models and reinforcement learning to the problem of scaffold hopping. The original paper is well written and the overall presentation is good. Reviewers raised concerns, in particular regarding the lack of baselines and proper ablations. During the rebuttal phase these concerns were mostly addressed. In particular, the authors' approach outperforms the current SOTA on this task with much faster inference time. Given the importance of fast inference in drug discovery and lead optimization, the latter contribution on its own warrants significant recognition. I recommend the authors include these baselines and ablations in the revised manuscript and ensure that the reviewers' concerns are properly addressed.