Navigating Chemical Space with Latent Flows
Abstract
Reviews and Discussion
The authors build a general latent flow-based framework that unifies traversal and optimization in the molecular latent space. The flow is trained using energy functions so that the vector field aligns with their gradients, with regularization imposed by an auxiliary classifier that tries to distinguish the distinct flows. Under multiple evaluation settings, ChemFlow outperforms or is generally on par with previous SOTA.
Strengths
- This paper is generally well-written and easy to follow.
- It is a novel contribution to formulate the manipulation and optimization of molecules in latent space as learning the vector fields toward optimal distribution.
Weaknesses
- Typo: L747 "we first verify if the learned variational poster also follows a Gaussian distribution and we find that it does learn so", poster -> posterior
- For Table 1, it risks not fully revealing the optimization ability if only TOP3 results are reported. Please consider adding mean and median values, too.
Questions
- In Table 2 and 3, why did ChemFlow perform best under mild similarity constraint, while suboptimal when the similarity threshold gets higher than 0.4? I think the similarity constraint is important, since usually in practical drug design people would expect to develop new therapeutics based on some drugs whose effects are already known, and the newly designed drug molecules are preferred to be similar so as to keep the effect.
- For Figure 4, why is the predictor deviating so much from ground truth? LogP, QED and SA don't seem very hard to learn as far as I know.
- Please consider elaborating on why "the learned variational poster[ior] also follows a Gaussian distribution" and "a strong correlation between almost all molecular properties and their latent norms" would contribute to the observed result that "a random latent vector taking a random direction will change the molecular property smoothly and monotonically".
Limitations
This paper does not include a discussion on efficiency as compared with ChemSpace.
C1: Typo: L747 "we first verify if the learned variational poster also follows a Gaussian distribution and we find that it does learn so", poster -> posterior.
A: Thanks for pointing it out. It was a typo and we will fix it in the revised manuscript.
C2: In Table 2 and 3, why did ChemFlow perform best under mild similarity constraint, while suboptimal when the similarity threshold gets higher than 0.4? I think the similarity constraint is important, since usually in practical drug design people would expect to develop new therapeutics based on some drugs whose effects are already known, and the newly designed drug molecules are preferred to be similar so as to keep the effect.
A: Thank you for the question. We did not explicitly train our method for similarity-constrained optimization. In future work, we will explicitly encode more constraints to encourage our method to maintain similarity while optimizing molecular properties. In addition, even though the absolute improvement of our method is not optimal when the similarity threshold exceeds 0.4, it achieves the best success rate compared to the baseline methods.
C3: For Figure 4, why is the predictor deviating so much from ground truth? LogP, QED and SA don't seem very hard to learn as far as I know.
A: Thanks for the question. In Figure 4 (original paper), we set up an out-of-distribution scenario in which the ground truth/prediction was evaluated on the ZINC250k dataset (our VAE model was trained on a mix of the MOSES, ChEMBL, and ZINC250k datasets, where ZINC250k was only a small fraction). However, the training set for the surrogate model was 10k molecular structures randomly sampled from the VAE model. We also only sampled 10k data points from ZINC250k, which might not be representative of ZINC250k either. We have uploaded a new Figure 4 (Figure 2 in the uploaded PDF in the general response) showing the comparison on the full ZINC250k dataset (still out-of-distribution, but slightly better).
We have further reported the training/test performance for the surrogate model, see Table 1 and 2 (training MAE and test MAE). To further study the out-of-distribution scenario we set up, we include Figure 3 in the uploaded PDF to show our training/test set for the surrogate model and ZINC250k dataset.
Table 1 Training Error (In-Distribution)
| Metric | plogp | qed | sa | drd2 | jnk3 | gsk3b |
|---|---|---|---|---|---|---|
| MAE | 9.840 | 0.189 | 0.956 | 0.006 | 0.016 | 0.041 |
| RMSE | 12.948 | 0.231 | 1.205 | 0.011 | 0.021 | 0.052 |
Table 2 Test Error (Out-of-Distribution)
| Metric | plogp | qed | sa | drd2 | jnk3 | gsk3b |
|---|---|---|---|---|---|---|
| MAE | 9.976 | 0.341 | 2.030 | 0.010 | 0.017 | 0.039 |
| RMSE | 11.076 | 0.366 | 2.203 | 0.038 | 0.025 | 0.049 |
C4: Please consider elaborating on why "the learned variational poster[ior] also follows a Gaussian distribution" and "a strong correlation between almost all molecular properties and their latent norms" would contribute to the observed result that "a random latent vector taking a random direction will change the molecular property smoothly and monotonically".
A: Thanks for the question. The VAE enforces the variational posterior to be a Gaussian distribution, and the geometry of a high-dimensional Gaussian distribution is spherical, with the mass concentrating on the shell (if it is zero-centered, the norm of a sampled vector concentrates around $\sqrt{d}$, where $d$ is the dimension of the data). In such a latent space, any random direction will eventually take a latent vector to the outer shell of the Gaussian ball with a larger norm, and thereby further increase or decrease the property value.
Thus the observation of the correlation between property values and latent norms is an emergent geometry of the learned latent space. Under this geometry, it is reasonable that any random direction could lead to monotonic and smooth change of the property.
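For intuition, here is a minimal NumPy sketch (illustrative only, not part of the paper) of the concentration-of-measure fact used above: the norm of zero-centered, unit-variance Gaussian samples concentrates around $\sqrt{d}$.

```python
# Illustrative only: norms of high-dimensional Gaussian samples concentrate around sqrt(d).
import numpy as np

for d in (2, 32, 512, 1024):                 # latent dimensions to test
    z = np.random.randn(10_000, d)           # zero-centered, unit-variance Gaussian samples
    norms = np.linalg.norm(z, axis=1)
    print(f"d={d:5d}  mean norm={norms.mean():7.2f}  sqrt(d)={np.sqrt(d):7.2f}  std={norms.std():.2f}")
```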
C5: This paper does not include a discussion on efficiency as compared with ChemSpace, similar to Table 2 in ChemSpace.
A: Thanks for suggesting a discussion of efficiency. Below is a table that summarizes the efficiency of all baselines and our methods. It is benchmarked by training the model described in the unconstrained optimization task. The inference time is the time needed to optimize 100,000 molecules for 1 step using a batch size of 10,000.
Even though our method has a slower training time than ChemSpace, because learning the flows requires training a neural network instead of a linear model (e.g., a linear SVM), the two have similar inference times. This fast inference ensures that our method is also capable of high-throughput molecule optimization and screening for drug discovery.
| Method | Training | Inference/Iter (without oracle) |
|---|---|---|
| ChemSpace | <1 min | 0.01 s |
| Gradient Based | 7 min | 0.03 s |
| Supervised Guidance | 20 min | 0.03 s |
| Unsupervised Guidance | 32 min | 0.03 s |
| Langevin Dynamics | 7 min | 0.03 s |
C6: For Table 1, it risks not fully revealing the optimization ability if only TOP3 results are reported. Please consider adding mean and median values, too.
Thank you for the suggestion. We have included a new table with mean, median, and standard deviation in the general response.
Thank you for the response. My concerns are addressed and I've raised my score to 6.
Designing new functional molecules within the vast chemical space is challenging, which necessitates efficient exploration and understanding of this space. The paper introduces a new framework called ChemFlow, which leverages latent space learned by molecule generative models and navigates it using flows. ChemFlow formulates the problem as a vector field that guides the molecular distribution to regions with desired properties or structure diversity. The paper conducts extensive empirical studies and justifies the effectiveness of the proposed method.
Strengths
- The proposed ChemFlow unifies the previous approaches via the vector field, which is novel and effective for learning a latent space with rich nonlinearity information. This can benefit various downstream tasks, including drug-related properties and protein-ligand binding.
- Extensive experiments have been conducted to provide a good insight into the components of the proposed method. ChemFlow achieves faster empirical convergence and higher success rates, especially using Langevin dynamics.
- The paper is generally well-written, with clear illustrations and tables.
Weaknesses
- Despite the novelty of the proposed method, ChemFlow mainly focuses on small molecules. This might hinder the broader impact of the learned latent space for macromolecular tasks such as proteins.
- ChemFlow employs multiple approaches to learning different latent flows. However, the experiment's results show that different methods have different specialties, and the paper does not discuss the connection between flow learning and downstream tasks.
- The paper mentioned the out-of-distribution generation problem in Appendix D.7 and Sec. 4.2. ChemFlow has encountered such a problem in an unsupervised manner. This could hinder the utility of the learned latent space for scenarios with distribution shifts.
Questions
- Is it possible to visualize the vector field using a tool like t-SNE or UMAP to provide further insight into the entanglement of the molecular properties?
- Could you discuss the connection between flow learning and downstream tasks?
- Could you discuss the extension of ChemFlow to macromolecular tasks like proteins?
Limitations
The paper could provide further insights into the connection between flow learning and downstream tasks. Additionally, the authors could consider discussing the extension of ChemFlow to macromolecular tasks such as proteins, which could give the proposed framework a broader impact.
C1: ChemFlow employs multiple approaches to learning different latent flows. However, the experiment's results show that different methods have different specialties, and the paper does not discuss the connection between flow learning and downstream tasks.
A: Thanks for the question. This is indeed a good point. For the molecular optimization task, one flow may have better properties than another (e.g., Langevin dynamics vs. gradient flow for optimization). However, for latent traversal or the unsupervised setting, we often do not know which flow is better. Nevertheless, we can use the flow as a prior to perform traversal while simultaneously structuring the latent space. As indicated in Table 5 of the Appendix, the different flows act as different inductive biases that improve performance.
C2: The paper mentioned the out-of-distribution generation problem in Appendix D.7 and Sec. 4.2. ChemFlow has encountered such a problem in an unsupervised manner. This could hinder the utility of the learned latent space for scenarios with distribution shifts.
A: We appreciate the reviewer pointing out that out-of-distribution generalization can hinder the utility of the learned latent space. However, out-of-distribution generalization is not a problem specific to our proposed method, but rather a limitation of the underlying generative model and the surrogate model used. We thus leave it for future study.
C3: Is it possible to visualize the vector field using a tool like t-SNE or UMAP to provide further insight into the entanglement of the molecular properties?
A: Thanks for suggesting visualizing the vector field. We provided the t-SNE visualization of the traversal trajectory for each different property using both supervised and unsupervised wave flow in Figure 1 of the uploaded PDF. The plot shows that almost all trajectories grow towards a unique direction in the t-SNE plot. This implies the disentanglement of learned directions and, thus, molecular properties. In addition, the figures display sinusoidal wave-shape trajectories, indicating the flow is following the wave-like dynamics.
In the unsupervised t-SNE plot, the trajectories of some properties overlap, such as plogP and SA. This is because some properties correlate with the same disentangled direction, so their traversals follow the same direction and thus the same trajectories.
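For reference, below is a hedged sketch (not the exact plotting code behind Figure 1 of the uploaded PDF) of how such trajectory visualizations can be produced; `trajectories` is a hypothetical dict mapping property names to arrays of latent vectors visited during traversal.

```python
# Sketch: project latent traversal trajectories to 2D with t-SNE and plot one curve per property.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_trajectories_tsne(trajectories, perplexity=30, seed=0):
    names = list(trajectories)
    stacked = np.concatenate([trajectories[n] for n in names], axis=0)  # (sum_T, latent_dim)
    emb = TSNE(n_components=2, perplexity=perplexity, random_state=seed).fit_transform(stacked)
    offset = 0
    for n in names:
        T = len(trajectories[n])
        seg = emb[offset:offset + T]
        plt.plot(seg[:, 0], seg[:, 1], marker=".", label=n)  # one trajectory per property
        offset += T
    plt.legend()
    plt.title("t-SNE of latent traversal trajectories")
    plt.show()
```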
C4: Despite the novelty of the proposed method, ChemFlow mainly focuses on small molecules. This might hinder the broader impact of the learned latent space for macromolecular tasks like proteins. Could you discuss the extension of ChemFlow to macromolecular tasks like proteins?
Thank you for the suggestion. We have included a discussion of the broader application of our approach in the general response.
Thanks for your responses. They address all my concerns.
We thank the reviewer again for the time spent and we are glad our revisions addressed your concerns. If the reviewer has any further questions or concerns, please don't hesitate to let us know!
The authors propose a new method called ChemFlow, which navigates molecular distributions in chemical space through flow.
Strengths
- The method demonstrates high generality, applicable to various molecular optimization tasks.
- Based on the experimental results presented, the method shows significant optimization of molecular properties.
Weaknesses
- The description of the experimental section lacks detail, such as which software was used to measure the docking scores?
- Table 1 displays properties of several indicators, but showcasing only the top 3 among numerous sampled molecules may lack sufficient persuasiveness. It would be better to include a broader distribution, such as the mean and so on.
Questions
- Why were only two specific targets evaluated in the docking score experiment?
- I would like to get some intuitive understanding: What problems arise if training a molecular property predictor in latent space and updating it directly based on its gradient? How is this issue typically addressed in the proposed method?
Limitations
This work is a preliminary study.
C1: The description of the experimental section lacks detail, such as which software was used to measure the docking scores?
A: Thanks for pointing it out. We will thoroughly revise the experimental section in the revised manuscript to make sure all details are fully explained. Specifically, we used AutoDock [1] to calculate the docking score. We also used RDKit [2] and TDC [3] to calculate other molecular properties and structure similarity.
[1] Morris, G.M. et al. (2009) ‘AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility’, Journal of computational chemistry, 30(16), pp. 2785–2791.
[2] RDKit: Open-source cheminformatics. RDKit. Available at: https://www.rdkit.org.
[3] Huang, K. et al. (2021) ‘Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development’, arXiv [cs.LG]. Available at: http://arxiv.org/abs/2102.09548.
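For completeness, here is a small illustrative snippet (assumed usage, not the paper's exact evaluation script) showing how RDKit and TDC can be called to score molecules; the AutoDock docking step is run externally and is not shown.

```python
# Illustrative scoring with RDKit (QED, logP) and a TDC oracle (DRD2); assumed usage only.
from rdkit import Chem
from rdkit.Chem import QED, Crippen
from tdc import Oracle

drd2_oracle = Oracle(name="DRD2")      # downloads the pretrained TDC scorer on first use

def score(smiles: str) -> dict:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return {}
    return {
        "qed": QED.qed(mol),            # drug-likeness in [0, 1]
        "logp": Crippen.MolLogP(mol),   # Crippen logP (penalized logP further subtracts SA/ring penalties)
        "drd2": drd2_oracle(smiles),    # predicted DRD2 bioactivity
    }

print(score("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```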
C2: Why were only two specific targets evaluated in the docking score experiment?
A: Thanks for the comment. To be consistent with previous literature, we follow the setup in [4] and target the binding sites of two human proteins, ESR1 and ACAA1. The human estrogen receptor (ESR1) is chosen because it is a well-characterized protein with known disease relevance. Human peroxisomal acetyl-CoA acyl transferase 1 (ACAA1) is chosen to demonstrate the model's de novo drug design ability, as ACAA1 has no known binders.
In addition, computing docking scores requires significant computational resources, as calculating the docking score for 10,000 molecules generated by a single method takes approximately 20 hours on one GPU.
[4] Eckmann, P. et al. (2022) ‘LIMO: Latent Inceptionism for Targeted Molecule Generation’, Proceedings of machine learning research, 162, pp. 5777–5792.
C3: I would like to get some intuitive understanding: What problems arise if training a molecular property predictor in latent space and updating it directly based on its gradient? How is this issue typically addressed in the proposed method?
A: Thanks for the comments. Training a property predictor and updating the latent vector directly based on its gradient is exactly the method used in the previous work LIMO [4], as gradient-based optimization can be viewed as a discretization of a gradient flow. In our work, we generalize this method to different types of flows. Gradient-based optimization may suffer from several challenges, such as getting stuck in local minima and poor convergence, especially in a high-dimensional space with noisy gradient guidance.
In our proposed method, these issues can be improved using techniques like Langevin dynamics, which introduces diffusion noise into the gradient updates. Intuitively, this approach helps the model escape local minima by injecting stochasticity into the optimization process, thus promoting better exploration of the latent space.
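A minimal sketch (our illustration, not the paper's implementation) contrasting a plain gradient-ascent update on a latent surrogate with a Langevin-dynamics update that injects Gaussian noise; `predictor` is a hypothetical surrogate mapping latent vectors to a property score.

```python
# Sketch: deterministic gradient step vs. Langevin step in the latent space.
import torch

def gradient_step(z, predictor, lr=0.1):
    z = z.detach().requires_grad_(True)
    score = predictor(z).sum()
    grad, = torch.autograd.grad(score, z)
    return (z + lr * grad).detach()                      # deterministic ascent, can get stuck

def langevin_step(z, predictor, lr=0.1, noise_scale=None):
    z = z.detach().requires_grad_(True)
    score = predictor(z).sum()
    grad, = torch.autograd.grad(score, z)
    if noise_scale is None:
        noise_scale = (2 * lr) ** 0.5                    # standard Langevin noise level
    return (z + lr * grad + noise_scale * torch.randn_like(z)).detach()  # noise aids exploration
```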
C4: Table 1 displays properties of several indicators, but showcasing only the top 3 among numerous sampled molecules may lack sufficient persuasiveness. It would be better to include a broader distribution, such as the mean and so on.
Thank you for pointing it out. We have included a new table with mean, median, and standard deviation in the general response.
Thank you for your clarification and effort. I would like to raise my score to weak accept (6).
This paper presents a novel gradient flow-based method to traverse the latent space of molecular generation models, known as ChemFlow. The authors instantiate their framework with a number of different flows inspired by dynamical systems. They also investigate the use of supervised and unsupervised guidance for the flow methods with the goal of optimising molecular properties of interest. The authors perform many experiments focused on molecular optimisation in order to evaluate their method. Specifically, they investigate unconstrained optimisation, similarity-constrained optimisation and multi-property optimisation.
Strengths
- The authors present a novel formulation for exploring chemical latent spaces and optimising molecular properties, which is an important and relevant task within the pharmaceutical and molecular design domains. They present a few different versions of their method, including an implementation which can use a surrogate property prediction model to guide the flow to optimised chemical space, as well as an unsupervised implementation which aims to maximise structural changes to the molecule.
- The authors perform an extensive evaluation on molecule optimisation related tasks, comparing a number of different instantiations of their framework. They also benchmark against a previously introduced model for latent space traversal and a random traversal strategy.
- ChemFlow methods show very promising results in comparison to baselines, especially when only looking at top performing molecules or applying similarity-constrained optimisation.
- The authors also present a useful analysis of the latent changes under the random traversal strategy and an explanation for why random traversal can work reasonably well.
- The paper is mostly very well written and the evaluations in particular are very clearly presented and easy to follow.
Weaknesses
- I find the methodology section on its own quite unclear since it's not clear how to actually use the objective functions that are outlined. Appendix sections D4 and D5 are helpful but ideally it would be possible to follow the main text on its own. Particularly, I think the text would benefit from showing the full loss function in the methodology and including a short outline of the training and sampling procedures and referring to the appendix.
- The baselines for some tasks are a bit weak, particularly for the unconstrained molecular optimisation. For this task techniques like evolutionary algorithms and reinforcement learning fine-tuning have been proposed and widely used before. It would be very useful to see a comparison of ChemFlow with methods like these, as well as an evaluation of the training and sampling time for each.
Questions
- For the unsupervised guidance, when you match flows with properties, what are the correlations computed between? Does this require you to have an existing dataset of molecule-property pairs?
- For the unsupervised cases, did you experiment with different values of k? Which values of k were used? It seems to me that k might need to be very large in general in order to find a flow which matches with an arbitrary property.
Other suggestions: I assume equation 8 should have … instead of …? Additionally, the ordering of t and z is inconsistent - compare figure 1, the caption for figure 1 and equation 11 with all the other places where it is used.
Limitations
- While the authors evaluate their approach on optimising two properties simultaneously it's unclear how well this would work with a larger number, since their approach relies on simply adding the guidance terms together.
- In this study the authors used a VAE with a fixed input size. It's unclear how easy it would be to apply a similar approach to a model with an arbitrary number of elements in an input or latent sequence, such as chemical language models or many diffusion-based models, which are much more commonly used in practice for molecular generation.
- As far as I can tell the method as it stands requires a latent space optimisation to be done for every sample. This could lead to much longer sampling times than other methods such as RL fine-tuning which allows samples to be generated as normal but from an optimised model.
C1: The baselines for some tasks are a bit weak, particularly for the unconstrained molecular optimisation. It would be very useful to see a comparison of ChemFlow with methods like EA and RL fine-tuning.
A: Thanks for the suggestion. As our method focuses on the latent space of deep generative models, we compared mostly with methods for molecular optimization and traversal in a similar setup. However, we agree with the reviewer that the paper would benefit from additional baselines. We add an evolutionary algorithm-based (EA) approach to optimize molecules in the latent space. The pseudocode is provided as Algorithm 1 in the uploaded PDF in the general response. For a fair comparison, all methods in Table 1 in the uploaded PDF have the same number of oracle calls. The results in Table 1 show that our methods outperform all EA approaches.
It is possible to use reinforcement learning to guide the search in the latent space of molecular generative models, but the main reason to use the latent space of a generative model is to avoid the discrete nature of molecular structures and instead conduct optimization over a continuous space. We believe it would be nontrivial to propose a new reinforcement learning algorithm for this setting and thus leave it as future work.
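For readers without access to the uploaded PDF, here is a hedged sketch of a simple latent-space evolutionary search in the spirit of the EA baseline described above (the actual Algorithm 1 may differ); `decode` and `oracle` are hypothetical functions mapping latents to molecules and molecules to scores.

```python
# Sketch of a (mu + lambda)-style evolutionary search over VAE latent vectors.
import numpy as np

def latent_ea(decode, oracle, dim, pop_size=100, n_gen=50, sigma=0.1, elite_frac=0.2):
    pop = np.random.randn(pop_size, dim)                  # initialize from the Gaussian prior
    n_elite = max(1, int(elite_frac * pop_size))
    for _ in range(n_gen):
        scores = np.array([oracle(decode(z)) for z in pop])
        elite = pop[np.argsort(scores)[-n_elite:]]        # keep the best latents
        parents = elite[np.random.randint(n_elite, size=pop_size - n_elite)]
        children = parents + sigma * np.random.randn(*parents.shape)  # Gaussian mutation
        pop = np.concatenate([elite, children], axis=0)
    scores = np.array([oracle(decode(z)) for z in pop])
    return pop[np.argmax(scores)]                         # best latent found
```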
C2: For the unsupervised guidance, when you match flows with properties, what the correlations are computed between? Does this require you to have an existing dataset of molecule-property pairs?
A: We compute “the correlation between the property and a natural sequence (from 1 to time step t) along the optimization trajectory” (line 177). Specifically, we compute the following measurement:

$$\rho_t = \mathrm{corr}\big((f(x_1), f(x_2), \ldots, f(x_t)),\ (1, 2, \ldots, t)\big),$$

where $f$ is the function that measures the real chemical property of the molecule $x_\tau$ decoded at time step $\tau$.
Even though we do not require access to a dataset of molecule-property pairs, we do rely on minimal supervision to match the direction to the list of properties it may control. In reality, we do this by scoring the molecules using oracle functions.
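An illustrative sketch (assumed implementation detail, not verbatim from the paper) of this matching step: correlate the property values along a traversal trajectory with the time indices; a high absolute correlation suggests that the direction controls that property.

```python
# Sketch: score how monotonically a property changes along a traversal trajectory.
import numpy as np
from scipy.stats import pearsonr

def trajectory_correlation(traj_smiles, property_fn):
    """traj_smiles: molecules decoded at steps 1..T; property_fn: oracle for the property."""
    values = np.array([property_fn(s) for s in traj_smiles])
    steps = np.arange(1, len(values) + 1)
    if values.std() == 0:                      # property unchanged along the trajectory
        return 0.0
    return pearsonr(steps, values)[0]          # high |corr| => direction controls this property
```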
C3: For the unsupervised cases did you experiment with different values of k? Which values of k were used?
A: We observe that as long as the value of k is larger than the actual number of properties, the results are not impacted much. We agree that having an approximate number of properties requires prior knowledge when determining this hyperparameter, but a good practice is to start with a relatively large k and select the active directions after training.
C4: While the authors evaluate their approach on optimising two properties simultaneously it's unclear how well this would work with a larger number, since their approach relies on simply adding the guidance terms together.
A: Thanks for the constructive advice. We were motivated by the disentanglement literature, so we mainly focus on optimizing individual properties. Assuming each property corresponds to an energy $E_i(z)$ whose stationary distribution is a Boltzmann distribution $p_i(z) \propto \exp(-E_i(z))$, the summation of the guidance terms (i.e., energies) corresponds to sampling from the product distribution $p(z) \propto \prod_i p_i(z) \propto \exp(-\sum_i E_i(z))$ (a small illustrative sketch follows the references below). Even though this implicitly assumes independence among the different properties, it is commonly used (also known as a product of experts [1]) in machine learning, e.g., in energy-based models [2]. We will leave how to leverage the correlation between different objectives as future work.
[1] Hinton, G.E., 2002. Training products of experts by minimizing contrastive divergence. Neural computation, 14(8), pp.1771-1800.
[2] Du, Y. and Mordatch, I., 2019. Implicit generation and modeling with energy based models. Advances in Neural Information Processing Systems, 32.
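A minimal sketch under the independence assumption above (our illustration, not the paper's code): summing hypothetical per-property energies $E_i(z)$ and following the gradient of the sum corresponds to guidance toward the product of the individual Boltzmann distributions.

```python
# Sketch: combined guidance from summed per-property energies.
import torch

def combined_guidance(z, energies, weights=None):
    """Gradient of the summed energy, i.e. guidance toward the product distribution."""
    z = z.detach().requires_grad_(True)
    weights = weights or [1.0] * len(energies)
    total = sum(w * E(z).sum() for w, E in zip(weights, energies))  # sum of energies
    grad, = torch.autograd.grad(total, z)
    return -grad   # descend the total energy == ascend the log of the product density
```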
C5: In this study the authors used a VAE with a fixed input size. It's unclear how easy it would be to apply a similar approach to a model with an arbitrary number of elements in an input or latent sequence, such as chemical language models or many diffusion-based models.
A: Thanks for the question. The fixed-length VAE model supports any input whose length is below its maximum limit. It can indeed be less efficient for variable-length inputs such as text; however, it has still been widely used for language and other sequence data. Moreover, although we validate the proposed method on a specific problem, molecular design, and select a fixed-length VAE model as the generative model, the method is not limited to fixed-length input.
As long as the model architecture has a well-defined latent space, e.g. [3] uses the bottleneck of U-Net as the latent space for diffusion models and [4] similarly in the attention head of large language models, it is possible to adopt our method in other networks to discover meaningful properties.
[3] Kwon, M., Jeong, J. and Uh, Y., Diffusion Models Already Have A Semantic Latent Space. In The Eleventh International Conference on Learning Representations.
[4] Li, K., Patel, O., Viégas, F., Pfister, H. and Wattenberg, M., 2024. Inference-time intervention: Eliciting truthful answers from a language model. Advances in Neural Information Processing Systems, 36.
C6: The method as it stands requires a latent space optimisation to be done for every sample. This could lead to much longer sampling times than other methods such as RL fine-tuning which allows samples to be generated as normal but from an optimised model.
We focus on pre-trained generative models, where multiple objectives can be composed or removed at any time. While we admit this necessitates extra sampling time, as the model itself does not directly sample from the desired distribution, RL-based fine-tuning would limit the flexibility of the pre-trained model once it has been tuned for a specific task. The extra inference time is also minimal, as it only requires evaluating a surrogate model (a relatively simple MLP).
C7: I find the methodology section on its own quite unclear since it's not clear how to actually use the objective functions that are outlined. Appendix sections D4 and D5 are helpful but ideally it would be possible to follow the main text on its own. Particularly, I think the text would benefit from showing the full loss function in the methodology and including a short outline of the training and sampling procedures and referring to the appendix.
A: Thanks for the suggestion. We will add the training objectives for the supervised and unsupervised scenarios to Sec 3.1 in the revised manuscript. We will also briefly discuss them and link them to the pseudocode in the appendix.
C8: Inconsistency in Section 3 and Figure 1
A: Thanks for pointing out the mistakes. We fixed the notation in Section 3 and Figure 1 to make everything consistent.
Thank you for your thorough response and for conducting extra experiments. Most of my concerns have been addressed, however, if my understanding of the method is correct, I think the following two limitations still remain:
- I agree that RL fine-tuning methods (eg. REINVENT as a representative example) don't optimise within the latent space but they are attempting to solve the same problem as ChemFlow - sampling molecules which optimise some scoring function. Of course these methods have their own strengths and weaknesses compared to latent space methods but I still think a performance (and possibly evaluation time) comparison would be beneficial here.
- The study doesn't address many scenarios that are likely to be encountered in practice for molecular design, such as optimising many properties simultaneously and using larger generative models such as chemical language models. I believe VAEs are not really used much in practice for molecular generation because single-step generation is too weak.
I would still like to thank the authors for the very interesting ideas presented and I am happy to increase my score to 7.
We appreciate again the reviewers' efforts in providing useful comments that greatly helped us improve the manuscript. Given the limited rebuttal period, we cannot finish the additional experiments, but we will add them to the camera-ready version.
We thank all the reviewers for their valuable feedback, which helps us improve the manuscript. We appreciate the reviewers' common sentiment that our work is novel, general, applicable, and well written. We are also glad that the reviewers note that our work includes extensive experiments showing the effectiveness and significance of the proposed methods.
We will first address the points raised by more than one reviewer in this general response and then provide individual responses to each reviewer.
1. Broader applications of the proposed approach to other tasks.
This paper mainly focuses on the problem of molecular design and optimization. However, we do not foresee any issues in applying our proposed method to other tasks. As long as the generative model architecture has a well-defined latent space, e.g., the attention heads of large language models [1], it is possible to adopt our method in other networks to discover meaningful properties.
For example, our framework can be applied to protein design tasks. Previous work uses a gradient-based method to optimize proteins in the latent space [2]; this is generalized as traversing with the gradient flow in our framework. Diffusion models have demonstrated a powerful ability to generate de novo proteins with desired properties [3]. By defining a latent space for diffusion models, such as using the bottleneck of the U-Net [4], it is also possible to extend our method to diffusion models for de novo protein generation.
[1] Li, K., Patel, O., Viégas, F., Pfister, H. and Wattenberg, M., 2024. Inference-time intervention: Eliciting truthful answers from a language model. Advances in Neural Information Processing Systems, 36.
[2] Castro, E. et al. (2022) ‘Transformer-based protein generation with regularized latent space optimization’, Nature Machine Intelligence, 4(10), pp. 840–851.
[3] Watson, J.L. et al. (2023) ‘De novo design of protein structure and function with RFdiffusion’, Nature, 620(7976), pp. 1089–1100.
[4] Kwon, M., Jeong, J. and Uh, Y., Diffusion Models Already Have A Semantic Latent Space. In The Eleventh International Conference on Learning Representations.
2. Mean/STD/median of the scores in Table 1.
In addition to reporting the top 3 scores, we computed the mean and standard deviation for the top 100 molecules after unconstrained optimization. Each entry in the table below follows the format mean ± std (median). The table shows that our methods have overall the best optimization performance. In addition, HJ exhibits better performance on mean and standard deviation than on top 3, showing that minimizing the kinetic energy is efficient in pushing the distribution to desired properties.
| Model | plogP | QED | ESR1 Docking | ACAA1 Docking |
|---|---|---|---|---|
| Random | 2.345 ± 0.386 (2.259) | 0.903 ± 0.014 (0.902) | -9.127 ± 0.360 (-9.015) | -8.454 ± 0.316 (-8.390) |
| Chemspace | 2.580 ± 0.406 (2.446) | 0.907 ± 0.014 (0.906) | -9.523 ± 0.409 (-9.395) | -8.749 ± 0.356 (-8.640) |
| Gradient Flow | 2.664 ± 0.382 (2.537) | 0.910 ± 0.012 (0.908) | -9.452 ± 0.338 (-9.365) | -8.735 ± 0.337 (-8.650) |
| Wave (spv) | 2.536 ± 0.439 (2.388) | 0.903 ± 0.015 (0.898) | -9.630 ± 0.399 (-9.525) | -8.764 ± 0.344 (-8.650) |
| Wave (unsup) | 1.736 ± 0.401 (1.610) | 0.845 ± 0.014 (0.840) | -9.074 ± 0.329 (-9.000) | -8.813 ± 0.265 (-8.745) |
| HJ (spv) | 2.482 ± 0.397 (2.382) | 0.899 ± 0.017 (0.894) | -9.544 ± 0.322 (-9.460) | -8.792 ± 0.332 (-8.675) |
| HJ (unsup) | 3.405 ± 0.254 (3.377) | 0.911 ± 0.009 (0.909) | -9.132 ± 0.321 (-9.090) | -8.668 ± 0.243 (-8.630) |
| LD | 2.463 ± 0.388 (2.399) | 0.905 ± 0.014 (0.903) | -9.400 ± 0.360 (-9.300) | -8.709 ± 0.372 (-8.585) |
All four reviewers agree to accept the paper (although three of them only weakly), with some reviewers raising their score after the author response. For the camera-ready version, the authors should be certain to include the new experiments and results described in the rebuttal, as well as the additional baselines promised for reviewer kzEx, and a clear presentation of the overall objective pointing to the appropriate appendix material as necessary for the new algorithm block.