PaperHub
Rating: 5.2/10
Decision: Rejected (5 reviewers)
Lowest: 5 · Highest: 6 · Standard deviation: 0.4
Individual ratings: 5, 5, 5, 5, 6
Confidence: 4.2
ICLR 2024

Harmonic Prior Flow Matching for Multi-Ligand Docking and Binding Site Design

Submitted: 2023-09-23 · Updated: 2024-02-11
TL;DR

A joint discrete-continuous flow model for designing binding pockets for small molecules, based on a flow matching approach for protein-ligand docking

Abstract

Keywords
flow matching, generative models, proteins, molecules

Reviews and Discussion

Review
Rating: 5

This paper focuses on biomolecular design problems. Two methods, HarmonicFlow and FlowSite, are proposed. HarmonicFlow leverages flow matching to generate 3D binding poses of multiple ligands and outperforms the previous SOTA method. FlowSite leverages flow matching to design binding sites for small molecules, jointly generating discrete (residue types) and continuous (ligand pose) data.

Strengths

  1. HarmonicFlow applies flow matching to 3D binding pose generation for multiple ligands. The corresponding experimental results are strong, surpassing the previous method, DiffDock. The method is simple yet effective.
  2. Binding site design is a novel task in the machine learning community and is important for some real-world applications. As for methodological novelty, to design the residue types of the binding site, the method generates both discrete and continuous variables within the flow matching framework.
  3. Many reasonable tricks are used to improve performance, such as the harmonic prior and structure self-conditioning.
  4. The comprehensive experiments show the ability of both HarmonicFlow and FlowSite. Almost all experimental details are clearly described. The results demonstrate the effectiveness of the proposed methods.

Weaknesses

There are many other tricks used for training FlowSite and HarmonicFlow, such as many auxiliary losses similar to those in AlphaFold2. But there is no related ablation study.

Questions

Because no side chains are involved in generation, binding affinity cannot be evaluated. Is there any post hoc method to generate side-chains for the designed binding site? If so, from my perspective, evaluating the binding affinity between the generated binding sites and the given ligands would be more rational.

Comment

Thank you for the detailed comments and the clear distillation of several strengths of our method, as well as your two suggestions that we address below!

We are glad to see your appreciation of the novelty of the binding site design task in the ML community and the novelty of our joint flow over discrete and continuous data, considering that there has not been any discrete flow matching work yet.

Weakness:

“There are many other tricks used for training FlowSite and HarmonicFlow, such as many auxiliary losses similar to those in AlphaFold2. But there is no related ablation study.”

  • We agree that our ablations focused on HarmonicFlow and thus added the following additional ablations for both FlowSite and HarmonicFlow that show how all components are relevant for performance:

  • FlowSite binding site design (Table 5):

  1. Dropping the auxiliary side-chain torsion angle loss
  2. Dropping the fake ligand data augmentation
  3. Using equivariant instead of invariant layers throughout the architecture
  4. Dropping the auxiliary refinement loss of the Equivariant Refinement TFN layers
  5. Adding noise to the backbone coordinates
| Method | BLOSUM score | Recovery |
| --- | --- | --- |
| NO SIDE CHAIN TORSION LOSS | 45.9 | 47.7 |
| BACKBONE NOISE | 39.6 | 42.4 |
| NO FAKE LIGAND AUGMENTATION | 45.5 | 46.7 |
| NO REFINEMENT LOSS | 45.7 | 47.6 |
| ONLY EQUIVARIANT LAYERS | 29.8 | 35.3 |
| FLOWSITE discrete loss weight = 0.8 | 46.9 | 47.8 |
| FLOWSITE discrete loss weight = 0.2 | 47.6 | 49.5 |
  • 3D structure generation with Harmonic Flow (Table 4):
  1. Using isotropic Gaussian prior instead of harmonic prior
  2. Dropping the auxiliary refinement loss of the Equivariant Refinement TFN layers
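| Method | %<2 | %<2* | Med. |
| --- | --- | --- | --- |
| GAUSSIAN PRIOR | 17.0 | 29.2 | 3.8 |
| VELOCITY PREDICTION | 11.9 | 28.8 | 3.8 |
| STANDARD TFN LAYERS | 13.7 | 25.4 | 3.6 |
| NO REFINEMENT LOSS | 9.8 | 22.1 | 3.7 |
| NO SELF-CONDITIONING | 14.3 | 29.8 | 3.7 |
| HARMONICFLOW σ = 0 | 18.3 | 31.3 | 3.5 |
| HARMONICFLOW σ = 0.5 | 20.5 | 34.5 | 3.4 |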
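
The first HarmonicFlow ablation swaps the harmonic prior for an isotropic Gaussian. As a rough illustration of the difference, the sketch below samples starting coordinates from a density proportional to exp(-0.5 * sum over bonds ||x_i - x_j||^2), which is one common way to write a harmonic prior over a bond graph. This form is an assumption and is not quoted from the paper. The sketch further assumes an acyclic bond graph, so that bond vectors can be drawn independently; molecules with rings would instead need an eigendecomposition of the graph Laplacian. All function names and the toy graph are hypothetical.

```python
import random


def center(coords):
    """Subtract the centroid so the sample has zero mean (removes the free translation)."""
    n = len(coords)
    mean = [sum(c[d] for c in coords) / n for d in range(3)]
    return [[c[d] - mean[d] for d in range(3)] for c in coords]


def sample_harmonic_prior_tree(parents, rng):
    """Sample coordinates with density proportional to
    exp(-0.5 * sum over bonds ||x_i - x_j||^2) for an ACYCLIC bond graph.

    parents[i] is the parent atom of atom i (None for the root), and parents
    are listed before their children. On a tree, bond vectors are independent
    standard Gaussians, so atoms can be placed one bond at a time.
    """
    coords = []
    for parent in parents:
        if parent is None:
            coords.append([0.0, 0.0, 0.0])
        else:
            coords.append([x + rng.gauss(0.0, 1.0) for x in coords[parent]])
    return center(coords)


def sample_isotropic_gaussian(n_atoms, rng):
    """Baseline prior: every atom is an independent standard Gaussian."""
    return center([[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_atoms)])


def mean_squared_bond_length(coords, parents):
    """Average squared distance between bonded atoms."""
    bonds = [(i, p) for i, p in enumerate(parents) if p is not None]
    total = sum(
        sum((coords[i][d] - coords[p][d]) ** 2 for d in range(3)) for i, p in bonds
    )
    return total / len(bonds)


# Toy 5-atom graph with one branch (hypothetical, not a real ligand).
PARENTS = [None, 0, 1, 1, 3]
```

Under these assumptions, bonded atoms start closer together with the harmonic prior (expected squared bond length 3) than with the unit-variance isotropic Gaussian (expected squared bond length 6), so the flow begins from a sample that already reflects the molecule's connectivity.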

Questions:

“Is there any post hoc method to generate side-chains for the designed binding site?”

A great question! Multiple methods exist for the task of sidechain packing; however, they commonly are not aware of the ligand. In the updated paper, we improved our discussion in section 4.3 on binding affinity evaluation with respect to how, e.g., multiple residue types might be able to bind a particular ligand. We attempted to address this by developing “BLOSUM score,” which takes evolutionary and physicochemical similarity into account. Ultimately, we think a final evaluation has to be biological validation.
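
To make the two sequence metrics concrete, here is a minimal sketch of sequence recovery next to a substitution-matrix-based similarity. The exact definition of the paper's BLOSUM score is not given in this thread, so the mean pairwise substitution score used here is an assumption. The table contains only a handful of BLOSUM62 entries, enough for the toy sequences; in practice the full matrix should be loaded from a trusted source. All function names are hypothetical.

```python
# A few BLOSUM62 entries (the matrix is symmetric, so each pair is stored once).
BLOSUM62_SUBSET = {
    ("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2,
    ("L", "L"): 4, ("I", "I"): 4, ("I", "L"): 2,
    ("K", "K"): 5, ("R", "R"): 5, ("K", "R"): 2,
    ("D", "L"): -4,
}


def substitution_score(a, b):
    """Look up the score for residue pair (a, b) in either order."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def sequence_recovery(predicted, native):
    """Fraction of positions where the predicted residue equals the native one."""
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)


def mean_substitution_score(predicted, native):
    """Average substitution score between predicted and native residues."""
    total = sum(substitution_score(p, n) for p, n in zip(predicted, native))
    return total / len(native)


native = "DLKI"
conservative = "ELRI"  # two chemically similar swaps (D->E, K->R)
radical = "LLKI"       # one dissimilar swap (D->L)

print(sequence_recovery(conservative, native), mean_substitution_score(conservative, native))
print(sequence_recovery(radical, native), mean_substitution_score(radical, native))
```

In this toy case the design with the lower recovery (two conservative swaps) gets the higher similarity score, which is the kind of tolerance for multiple valid residue types that a similarity-aware metric is meant to provide.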


Thank you for the productive discussion so far! We hope that the improvements might warrant raising your score and we are happy to continue the conversation. Please also find other significant improvements in our general response above and have a great week!

Comment

We, the authors, thank you for your initial review and would be very grateful for any further discussion or acknowledgment as to whether any concerns remain after our improvements and responses!

Comment

Thank you for the initial review! Any further discussion before today's deadline would be highly appreciated!

Comment

Thanks for your response and for providing the extensive ablation results. However, the unavailability of side chains for the generated pocket is a weakness, which is also a concern of other reviewers. Due to this limitation, it may be hard to fairly or properly compare FlowSite and HarmonicFlow with some of the baselines. Experiments in [1] show that side chains play an important role in designing protein pockets. In addition, the technical novelty is limited, based on the comments of Reviewers XYXw and k1W9. Considering these two points, I decided to change my rating to 5.

References:

[1] Zhang, Zaixi, et al. "Full-Atom Protein Pocket Design via Iterative Refinement." NeurIPS 2023 Spotlight.

Review
Rating: 5

This paper proposes HarmonicFlow, an ODE flow model for generating pocket residue types and molecule structures.

Strengths

  • This paper studies an interesting problem.
  • The writing is generally clear.

Weaknesses

  • The problem of multi-ligand binding is interesting, but I am not sure how important this task is. The authors briefly discussed this in Sec 2, and more detailed explanations can be helpful.

  • In Sec 1 & 2, the authors highlight that HarmonicFlow is the first DL method to handle multi-ligand docking. One critical question is how fundamentally different multi-ligand docking is from protein-ligand docking (the existing methods). If they are different, then the authors compared HarmonicFlow with DiffDock, but they are solving different problems: one for multi-ligand docking and the other for protein-ligand (both single- and multi-ligand) docking. Why can their numbers be compared to each other? If they are the same in terms of the method, then HarmonicFlow is not the first DL method that can handle multi-ligand docking.

  • Does this vector field u in Eq 1 have any physical meaning?

  • Can the authors help summarize the essential difference between score matching / denoising diffusion with the objective (Eq 1&3) in this paper? I ask because score matching and denoising diffusion can also be treated as ODE flow models. Also, is the proposed HarmonicFlow specific for multi-ligand docking? It currently seems that it could also fit general protein-ligand docking.

  • Many baselines, such as the related works mentioned in Sec 2, are missing in the experiments.

Questions

Please see above.

Comment

Thank you for the constructive questions that we now better address in the updated text and below! We hope the improvements warrant accepting the paper and are happy to make any further changes.

1, First bullet: “The problem of multi-ligand binding is interesting, but I am not sure how important this task is [...] more detailed explanations can be helpful”

  • Multi-ligand binding is indeed an important problem since it is crucial for, e.g., fragment-based drug design, and in our context, it is of interest for enzyme design where we would desire to design a binding site for multiple reactants / starting molecules of a chemical reaction that the enzyme should catalyze.

2, Second bullet: “the authors compared HarmonicFlow with DiffDock, but they are solving different problems: one for multi-ligand docking and the other for protein-ligand (both single- and multi-ligand) docking.”

  • We apologize for the following being unclear in the text and improved our description: The current implementation of DiffDock can actually only do single ligand docking. We compare HarmonicFlow (which can do both single- and multi-ligand docking) with DiffDock’s product space diffusion on a single-ligand benchmark. HarmonicFlow is not specific to only multi-ligand docking. Meanwhile, for our multi-ligand experiments, we compare with the diffusion process of EigenFold in Table 4. We hope the updated manuscript makes this sufficiently clear and welcome any further feedback!

3, Third bullet: “Does this vector field u in Eq 1 have any physical meaning?”

  • An excellent question! In general we want to be careful with such physics interpretations. The direction the vector field points in is where the expected x_1 is given that we currently are at x_t. At times very close to t=1 this can become similar to the direction of the forces on the atoms if our training data is from the Boltzmann distribution of the molecules.
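
The statement that the field points toward the expected x_1 can be written out explicitly. The short derivation below is a sketch that assumes the straight-line conditional path with no added noise (the σ = 0 setting); the notation follows the reply above, and the formulas are a standard identity rather than a quotation from the paper.

```latex
% Assumption: linear conditional path without added noise
x_t = (1 - t)\,x_0 + t\,x_1,
\qquad
u_t(x_t \mid x_0, x_1) = x_1 - x_0 = \frac{x_1 - x_t}{1 - t}

% Marginal field: average the conditional field over all pairs (x_0, x_1)
% whose path passes through x at time t
u_t(x) = \mathbb{E}\left[\, x_1 - x_0 \mid x_t = x \,\right]
       = \frac{\mathbb{E}\left[\, x_1 \mid x_t = x \,\right] - x}{1 - t},
\qquad 0 \le t < 1
```

So at every time t the marginal field is the displacement from the current point to the expected data sample, rescaled by the remaining time 1 - t.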

4.1 Fourth bullet first part: “summarize the essential difference between score matching / denoising diffusion with the objective (Eq 1&3) in this paper?”

  • Another great question in our opinion! Flow matching and denoising diffusion are indeed similar, with flow matching being a generalization of diffusion models. The quantity u_t(x | x_0, x_1) that we regress against in equation 3 is a conditional vector field that points to a data sample (obtained by, e.g., the vector x_1 - x_0). Meanwhile, in diffusion models, we regress against the conditional score, which is a vector pointing toward the steepest ascent in a noised distribution. This way, in flow matching, we learn a drift that we can integrate to sample while in diffusion models, we have a predefined arbitrary drift that we have to combine with the learned score to sample. Please let us know if we can clarify this any further and if any questions remain!
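
The flow matching side of this comparison can be sketched in a few lines. The code below assumes the straight-line path between a prior sample x_0 and a data sample x_1 with no added noise, builds the regression target x_1 - x_0, and samples by Euler integration of a vector field. The function names are hypothetical, and `point_mass_field` is a stand-in for a trained network: it is the exact field for a dataset containing a single point, not the model from the paper.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from the prior sample x0 to the data sample x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def conditional_target(x0, x1):
    """Regression target for flow matching on the straight path: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def squared_error(prediction, target):
    """Per-sample loss between a predicted vector field value and the target."""
    return sum((p - q) ** 2 for p, q in zip(prediction, target))


def euler_sample(vector_field, x0, n_steps):
    """Integrate dx/dt = vector_field(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = vector_field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def point_mass_field(data_point):
    """Exact marginal field when the data distribution is a single point:
    u_t(x) = (data_point - x) / (1 - t). Stands in for a trained network."""
    def field(x, t):
        return [(d - xi) / (1.0 - t) for d, xi in zip(data_point, x)]
    return field


# Training side: one (input, target) pair for the regression.
x_t = interpolate([0.0, 0.0], [2.0, 4.0], 0.25)
target = conditional_target([0.0, 0.0], [2.0, 4.0])

# Sampling side: integrate the (here: exact) field starting from a prior sample.
sample = euler_sample(point_mass_field([2.0, 4.0]), [-1.0, 0.5], n_steps=10)
```

Sampling only needs the learned drift and an ODE integrator; no predefined drift has to be combined with a score, which is the practical difference the reply describes.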

4.2 Fourth bullet second part: “is the proposed HarmonicFlow specific for multi-ligand docking?”

  • HarmonicFlow is not specific to multi-ligand docking and can also be used for single-ligand docking which we hope is now better made clear in the updated text!

5, Fifth bullet: “Many baselines, such as the related works mentioned in Sec 2, are missing in the experiments.”

  • We adapt our related work section and baseline explanation in the experiments to better reflect the following:

  • With respect to HarmonicFlow: the works in section 2 (apart from DiffDock’s diffusion process, which we compare with) all require sidechain positions of the protein as input which is not available in our final application of interest — binding site design — where we aim to predict the side-chain identities.

  • With respect to FlowSite: of the mentioned inverse folding methods, we chose PiFold since it currently has the state-of-the-art recovery rate.


Thank you for the great questions! We hope the improvements that arose from them might warrant raising your score and we are happy to continue the conversation. Please also find other significant improvements in our general response above!

Comment

We, the authors, thank you for your initial review and would be very grateful for any further discussion or acknowledgment as to whether any concerns remain after our improvements and responses!

Comment

Thank you for the initial review! Any further discussion before today's deadline would be highly appreciated!

Comment

I appreciate the authors for the detailed replies to each of my questions.

However, some of the concerns remain:

  1. The architecture proposed here (with flow matching) is not specific to the multi-ligand docking problem.
  2. The algorithm has no knowledge of the physics. I read more literature papers in the past few days, especially the ones cited in the related work section. I think this is the issue of the whole research line: all the existing papers claim that they are handling the docking problem, which is a physical process. But indeed, only statistics modeling of the data distribution is considered.

I also appreciate the authors for the detailed response. Thus, combining all the pros and cons, I would like to keep my score.

Review
Rating: 5

The paper proposes FlowSite, a new generative model of protein pocket residue types and ligand conformations.

Methodological novelties include 1) replacing diffusion with flow matching; 2) using harmonic priors; and 3) a modified model architecture.

Computational results on (multiple) docking and binding site design are analyzed.

Strengths

  1. Good organization.

  2. Improved final performance.

  3. Sufficient ablation studies.

  4. Fake ligand augmentation is an interesting proposal.

Weaknesses

  1. Overall the paper is much of a summarization of known techs: harmonic priors are discussed in EigenFold; self-conditioning is applied in CV/ RFdiffusion; replacing diffusions with flow matching is well studied in CV; and model architecture improvements are somehow trivial.

  2. Three applications are proposed in this paper, while except for docking, multiple docking / binding site generation are not well investigated. The idea of multiple ligand docking itself is interesting, but not well studied. As it is mentioned in the title I'd expect some impressive case studies of multiple docking and some analysis of significance in the result. I would suggest the authors be more focused.

  3. Backbone flexibility is not allowed in the entire picture. All experiments are done on PDBBind which provides accurate holo structures of proteins. This considerably limits the significance of the numerical improvements reported in this paper.

  4. Baselines are weak.

Questions

  1. Are sidechains configurations by any means included as data inputs?

  2. I would expect some analysis of the ODE dynamics, especially with regard to the joint dynamics of residue types and molecular conformations.

Details of Ethics Concerns

A shortened version of this paper was submitted to and accepted by NeurIPS GenBio. Ideas, figures and data are all the same. Check https://openreview.net/forum?id=XgkYO1S2vM .

Comment

We appreciate your time taken for reviewing and are glad you find the fake ligand augmentation interesting and see the ablation studies as a strength.

Regarding the Dual submission ethics concerns:

  • We think there might be a confusion about the GenBio workshop having proceedings. The ICLR 2024 dual submission policy (https://iclr.cc/Conferences/2024/CallForPapers) states, “papers that [...] have been presented at workshops (i.e., venues that do not have publication proceedings) do not violate the policy.” Does this address your Ethics concerns?

Weaknesses:

1, “Overall the paper is much of a summarization of known techs”

  • We argue that there are 3 critical points of novelty that might be overlooked. First, we provide the first flow matching process to jointly generate discrete and continuous data, which we consider a novelty given that there is no work on discrete flow matching yet. Secondly, we think next to technical novelty, the novelty of the application is also important - our work is the first to tackle binding site design with deep learning. Thirdly, ours is (alongside another ICLR submission) the first to investigate flow matching for biomolecular structures, which exhibit (also in diffusion models) different behavior than CV.

2, “I'd expect some impressive case studies of multiple docking [...] I would suggest the authors be more focused”.

  • Thank you for the suggestion! We now included several case studies that were randomly picked from the test set and show how HarmonicFlow produces more realistic bond lengths and bond angles and respects physical constraints better. We acknowledge that our work indeed does not focus on multi ligand docking - we describe the problem and evaluate it since we see it as important for binding site design when the goal is to design binders for multiple molecules, such as reactants in an enzyme. In our updated version, we clarify this focus - we are happy to also change the title if you think that is a better description of the work.

3, “Backbone flexibility is not allowed in the entire picture. All experiments are done on PDBBind, which provides accurate holo-structures of proteins.”

  • We agree that for a full docking pipeline the performance on apo structures is interesting next to the holo performance. In our work, we evaluate docking performance only for the purpose of binding site design as required for our task definition which we chose for its alignment with the state-of-the-art traditional binding site and enzyme design approach of (https://www.nature.com/articles/s41586-023-05696-3) where the starting point is also a fixed backbone structure that is not altered.

4, “Baselines are weak.”

  • We added another new stronger baseline to Table 3 based on the anonymous ICLR 2024 submission DiffDock-Pocket [1]. It operates similar to our ground truth position comparison model, except that the fixed positions are given by a sample from DiffDock-Pocket, a recent state-of-the-art pocket level docking tool. We note that, as we are the first to consider this task, there are no existing baselines for it yet.

| Method | BLOSUM Score | Recovery Percentage |
| --- | --- | --- |
| DiffDock-Pocket Pos | 42.6 | 45.0 |
| FlowSite (ours) | 47.6 | 49.5 |

Questions:

1, Are sidechains configurations by any means included as data inputs?

  • They are not used as inputs; they are only a quantity that we predict as an auxiliary target.

2, “I would expect some analysis of the ODE dynamics”

  • Thank you for the suggestion! We now included a plot (Figure 11) of the evolution of FlowSite's x_1 prediction's RMSD and the evolution of the output entropy of the residue type probabilities. The results show how a more determined structure prediction correlates with a decrease in uncertainty of the residue type prediction.
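
The uncertainty measure mentioned here, the entropy of the predicted residue-type probabilities, is straightforward to compute. The following is a minimal sketch with hypothetical function names; it assumes each designed residue comes with a 20-way probability vector and uses natural-log (nat) units, neither of which is specified in the thread.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one residue-type distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(per_residue_probs):
    """Average entropy over all designed residues at one integration step."""
    return sum(entropy(p) for p in per_residue_probs) / len(per_residue_probs)


uniform = [1.0 / 20] * 20          # maximally uncertain prediction
one_hot = [1.0] + [0.0] * 19       # fully determined prediction

print(entropy(uniform))            # equals math.log(20)
print(entropy(one_hot))
print(mean_entropy([uniform, one_hot]))
```

Tracking `mean_entropy` at each integration step alongside the RMSD of the x_1 prediction would give the kind of joint curve the reply describes for Figure 11.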

Thank you for the constructive conversation! We hope the improvements that arose from your suggestions might warrant raising your score and we are happy to continue the discussion.

Comment

We, the authors, thank you for your initial review and would be very grateful for any further discussion or acknowledgment as to whether any concerns remain after our improvements and responses!

Comment

Thank you for the clarification of ethical concerns and other issues.

  1. the ethical concerns are resolved.

  2. thanks for the ode analysis which basically solves my concerns in the black-box issue about the ODE.

  3. the analysis of binding site design and multiple ligand docking is addressed, but the binding site design part is not well benchmarked against other inverse folding / inpainting baselines beyond PiFold. In fact, I would expect some larger elevation in the recovery rate because most of the sequences are given. The visualization results themselves are also not convincing to display the model's accuracy.

  4. about the concern “overall the paper is much of a summarization of known techs”, the authors' reply still does not resolve my concerns. Indeed, transferring techniques from one domain to another is interesting, but the current results are far from strong enough to persuade a drug designer to trust the model's outputs.

overall I would give this a 5.

Comment

We are glad you found our improvements convincing and that several concerns were resolved!

Part of 3: "In fact, I would expect some larger elevation in the recovery rate because most of the sequences are given."

  • We wonder if there is confusion as to whether FlowSite takes a part of the sequence as input, such as the non-contact residues. This is not the case: neither FlowSite nor any of the baselines have access to any sequence information in the binding site design experiments. If there was indeed confusion and we did not make this clear in the paper, we will explain it better.

Regarding your point 4

  • We are glad that you find the application interesting and wonder if the above clarification makes the results stronger, considering that none of the methods use sequence as input.

Part of 3: Regarding more benchmarks "against other inverse folding / inpainting baselines beyond PiFold"

  • We hope that with the above clarification that none of the methods take the rest of the sequence as input, it is clear that sequence inpainting-based methods would not apply. We chose PiFold as the inverse folding model to compare with since it achieves the best sequence recovery in inverse folding benchmarks, and sequence recovery is also one of our evaluation metrics (next to BLOSUM score).

Part of 3: Regarding the visualization results

  • Are there other visualization results you would prefer, and are you referring to the multiligand docking case studies? We think in these randomly chosen complexes from the test set the accuracy seems remarkably good. We would happily provide any additional visualizations as soon as possible, either in the rebuttal period or afterward.

Thank you for engaging this much in the discussion, and we hope the fact that FlowSite does not take any sequence information (just like the baselines) might make the performance gap to the baselines sufficiently large.

Review
Rating: 5

The problem is to predict the structure of the binding pocket for a set of ligands, given the structure of the protein backbone and the 2D structure of the ligand(s). The authors introduce two flow matching models: HarmonicFlow (predicts the 3D structure of the pocket without residue types) and FlowSite (predicts both the 3D structure of the pocket and residue types).

In the main text, there are three groups of experiments:

Q1. Comparisons of HarmonicFlow with:

  • DiffDock on PDBBind,
  • EigenFold on MOAD.

Q2. Estimation of correctly predicted residues in the binding site on PDBBind and MOAD datasets.

Q3. Ablation studies for Flow matching design choices.

In addition, the authors introduce fake ligand data augmentation and recycling strategy for the flow model.

Strengths

FlowSite shows better quality than the methods to which it was compared in the reported settings.

Weaknesses

  • The paper is rather hard to follow.
  • The design of most experiments is not clear to me (see questions).

Questions

  1. In Q1, the approach is verified on a problem of site-specific docking. Why do you think that the quality of docking tells much about the quality of side-chain prediction (DiffDock can perform this task without this information)? Moreover, for me, it is not fair to compare FlowSite with a blind docking algorithm in a site-specific setting.
  2. In Q2, you compare the ground truth amino acids with the predicted ones. Is it a fair surrogate metric for de novo generation, when for a given set of ligands and backbone coordinates there potentially can be many different answers? An algorithm that can generate pockets with a better binding affinity than affinity of pockets from the dataset, will have a low score.
  3. The output of the flow-matching model should also depend on the sampled initial conditions. In Q2 experiments, do you consider a single run of the noise sampling? Have you tried to sample many different noise vectors and compare the outputs?
  4. You said that it is not feasible to estimate the energy. Why? I suppose that the energy of the designed protein and the affinity of the complexes can be somehow estimated using traditional or machine learning methods. At least, it is possible to use the all-atoms confidence model from DiffDock and compare scores of complexes.
  5. In many experiments, you mentioned the oracle method that has access to the ground truth ligand structure. What exactly is this method?
  6. How much time does it take to train your model on PDBBind and MOAD?

Comment

Thank you for the time taken to review our work and for the excellent questions that helped improve the paper!

We understand how these questions arose - we updated the text to contain the following answers to your questions. We hope this makes the paper easy to follow and the experiment setup sufficiently clear. Please let us know if further changes are required.

1.1 Question: “Why do you think that the quality of docking tells much about the quality of side-chain prediction (DiffDock can perform this task without this information)?”

  • A great question! While DiffDock does not use side-chain atom positions, it does use the residue identities as features, which improves performance as the experiments without residue identities in Table 5 show. We are confident that a good structure prediction aids residue-type prediction since the recovery rate is 51.4% when using the ground truth ligand positions as input compared to 41.8% when using random ligand positions.

1.2 Regarding the concern of comparing with DiffDock’s product space diffusion, which was originally developed for Blind docking.

  • This is an important concern! However, the work “DiffDock-Pocket: Diffusion for Pocket-Level Docking with Sidechain Flexibility” (https://openreview.net/forum?id=1IaoWBqB6K) now shows that the product space diffusion employed in a similar way as in our work is a strong pocket level docking method (although they use side-chain atom positions which are not available in our application). We now cite this paper to clarify this.

2, Regarding using amino acid recovery for known pockets as metrics and that there can be multiple right answers, not only the residues in the ground truth data.

  • This is correct and an important point! The recovery metric has been shown to be a good proxy for downstream performance in the inverse folding literature (https://www.science.org/doi/10.1126/science.add2187 PMPNN, https://arxiv.org/abs/2306.16819 GRADEIF) and we try to improve over it with our BLOSUM metric which allows for multiple solutions by taking evolutionary and physicochemical similarity of residues into account. We now better discuss your concern in the 4.3 Metrics paragraph, and in our conclusion, we acknowledge how, ultimately, biological validation is required to fully evaluate, which is ongoing work.

3, Question “Have you tried to sample many different noise vectors and compare the outputs?”

  • We indeed generate 10 samples per complex and evaluate the average quality. In the updated text, we now also provide the standard deviation over these samples, which is 1.16 Angstrom RMSD for Distance-Pockets on the time split (similar for all settings) and 2.5% for the recovery rate and 1.8 Blosum score on PDBBind.
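
For readers who want to reproduce this kind of summary, the sketch below averages a metric over the samples of each complex and reports the spread across samples. How the authors aggregated across complexes is not stated in the thread, so averaging the per-complex means and per-complex sample standard deviations is an assumption, and the numbers are toy values rather than results from the paper.

```python
import statistics


def summarize_samples(values_per_complex):
    """Return (mean of per-complex means, mean of per-complex sample std devs)."""
    means = [statistics.mean(values) for values in values_per_complex]
    stds = [statistics.stdev(values) for values in values_per_complex]
    return statistics.mean(means), statistics.mean(stds)


# Toy RMSD values (in Angstrom) for two complexes with three samples each.
toy_rmsds = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
]
average_quality, average_spread = summarize_samples(toy_rmsds)
print(average_quality, average_spread)
```

The same helper applies unchanged to per-sample recovery rates or BLOSUM scores.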

4, Estimating binding free energy or using DiffDock’s confidence model as proxy.

  • Our discussion of the following was unclear in the text, and we improved it in the update: DiffDock’s confidence model and common computational binding affinity correlates would require the ligand and side chain atom positions as input, which none of the methods (except ours) produce. Thus, we could only produce a number for FlowSite with no comparison. Further, DiffDock’s confidence model was never trained on negatives, meaning that it cannot distinguish between binders and non-binders, and there is no evidence for it as a correlate for binding affinity.

5, “you mentioned the oracle method that has access to the ground truth ligand structure. What exactly is this method?”

  • Thank you for pointing out our insufficient explanation, which we now improved! This is the same architecture as FlowSite, but instead of using the predicted ligand positions, the ground truth crystal structure ligand positions are used as input and not altered. In practice, this information would not be known during inference.

6, Question: “How much time does it take to train your model on PDBBind and MOAD?”

  • We now included this in the paper: FlowSite took 58.6 hours to train on PDBBind and 115.6 hours on MOAD, both on an RTX A600 GPU.

Thank you for the productive discussion! We hope that the improvements might warrant raising your score and we are happy to continue the conversation and make further changes. Please also find other significant improvements in our general response above!

Comment

We, the authors, thank you for your initial review and would be very grateful for any further discussion or acknowledgment as to whether any concerns remain after our improvements and responses!

Comment

Thank you for the initial review! Any further discussion before today's deadline would be highly appreciated!

Review
Rating: 6

The authors introduce a flow matching-based algorithm for multi-ligand docking and binding site design. The results show that both FLOWSITE and HARMONICFLOW achieve state-of-the-art performance. Notably, FLOWSITE is the first deep-learning method for designing ligand binding pockets.

Strengths

  1. In general, the writing is great. The contribution is clearly presented, and the method is well-introduced.
  2. The evaluation is fair and the improvement is significant.
  3. The limitation of FLOWSITE is well introduced.

Weaknesses

Most of the technology is just an application of the existing methods which prevents the paper from getting higher scores in machine learning conferences.

Questions

How fast are the two methods in terms of inference speed?

Comment

Thank you for the positive review! We are glad you find the improvements significant and appreciate our discussion of FlowSite’s limitations.

Weaknesses:

“Most of the technology is just an application of the existing methods which prevents the paper from getting higher scores in machine learning conferences.”

We think our work provides 3 main pieces of novelty: 1. We provide the first flow matching process to jointly generate discrete and continuous data, which we consider a novelty given that there is no work on discrete flow matching yet. 2. We think next to technical novelty, the novelty of the application is also important - our work is the first to tackle binding site design with deep learning. 3. Ours is (alongside another ICLR submission) the first to investigate flow matching for biomolecular structures.

Questions:

“How fast are the two methods in terms of inference speed?”

An excellent question and we provide these details in the updated manuscript! Averaged over 4350 generated samples, the average runtime of HarmonicFlow is 0.223 seconds per generated structure and that of FlowSite is 0.351 seconds per sequence.


Thank you for the discussion so far! We hope the clarifications and our other significant improvements in our general response might warrant raising your score!

Comment

We, the authors, thank you for your initial review and would be very grateful for any further discussion or acknowledgment as to whether any concerns remain after our improvements and responses!

Comment

Thank you for the additional details; they are valuable. However, I prefer to maintain my original score as they haven't substantially altered my opinion. I acknowledge that there are some technical novelties in the paper but I personally believe they are insufficient to get a score of 8.

Comment

We thank all reviewers for their constructive feedback and the time taken to suggest crucial improvements. Our main paper has significant changes (please see the updated .pdf with changes marked in color) which are summarized below:

Performance Improvements and additional Experiments:

1 Additional baseline (Table 3): Reviewer 2 remarked about the strenght of the baselines. We note that, as we are the first to consider this task, there are no existing baselines for it yet. We added another new stronger baseline to Table 3 based on the anonymous ICLR 2024 submission DiffDock-Pocket [1]. It operates similar to our ground truth position comparison model, except that the fixed positions are given by a sample from DiffDock-Pocket, a recent state-of-the-art pocket level docking tool.

| Method | BLOSUM Score | Recovery Percentage |
| --- | --- | --- |
| DiffDock-Pocket Pos | 42.6 | 45.0 |
| FlowSite (ours) | 47.6 | 49.5 |
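
For readers unfamiliar with the two columns, the following is a minimal sketch of how such metrics could be computed for a designed pocket sequence. It assumes recovery is the percentage of designed residues identical to the native ones and that the similarity score averages substitution-matrix entries over positions. The paper's exact normalization of the BLOSUM score is not stated in this thread, and `TOY_MATRIX` holds illustrative placeholder values rather than a full BLOSUM matrix; all function names are hypothetical.

```python
def recovery(designed: str, native: str) -> float:
    """Percentage of positions where the designed residue equals the native one."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(d == n for d, n in zip(designed, native))
    return 100.0 * matches / len(native)


def mean_similarity(designed: str, native: str, matrix: dict) -> float:
    """Average substitution score per position (assumed scoring scheme)."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    total = 0
    for d, n in zip(designed, native):
        # The matrix is symmetric, so look the pair up in either order.
        total += matrix[(d, n)] if (d, n) in matrix else matrix[(n, d)]
    return total / len(native)


# Illustrative placeholder entries only; a real evaluation would load a
# complete substitution matrix.
TOY_MATRIX = {
    ("A", "A"): 4,
    ("L", "L"): 4,
    ("D", "D"): 6,
    ("E", "E"): 5,
    ("D", "E"): 2,
}

native_pocket = "ALDE"
designed_pocket = "ALEE"
print(recovery(designed_pocket, native_pocket))
print(mean_similarity(designed_pocket, native_pocket, TOY_MATRIX))
```

A similarity-based score rewards a D-to-E substitution more than an unrelated mismatch, which is why it can rank methods differently from plain recovery.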

2 Several Multi-ligand Docking Case Studies (Figures 7, 8, 9, and 10): Reviewer 2 pointed out how these would be interesting - thanks! We study several examples of multi-ligands (see https://anonymous.4open.science/r/case_studies) that were randomly picked from the Binding MOAD test set, showing that the complexes that HARMONICFLOW generates are often more physically plausible than those of EIGENFOLD DIFFUSION. For instance, HARMONICFLOW’s rings have the appropriate shape and planar systems are actually planar which often is not the case for EIGENFOLD DIFFUSION.

3 Additional Ablations (Table 4 and Table 5): Multiple reviewers appreciated our previous ablation studies. However, Reviewer 5 pointed out that they focus on the 3D generative flow framework and do not cover FlowSite's binding site design. We therefore provide several FlowSite ablations, alongside additional HarmonicFlow ablations, which show that all components are relevant for performance:

  • FlowSite binding site design (Table 5):
  1. Dropping the auxiliary side-chain torsion angle loss
  2. Dropping the fake ligand data augmentation
  3. Using equivariant instead of invariant layers throughout the architecture
  4. Dropping the auxiliary refinement loss of the Equivariant Refinement TFN layers
  5. Adding noise to the backbone coordinates

| Method | BLOSUM score | Recovery |
| --- | --- | --- |
| NO SIDE CHAIN TORSION LOSS | 45.9 | 47.7 |
| BACKBONE NOISE | 39.6 | 42.4 |
| NO FAKE LIGAND AUGMENTATION | 45.5 | 46.7 |
| NO REFINEMENT LOSS | 45.7 | 47.6 |
| ONLY EQUIVARIANT LAYERS | 29.8 | 35.3 |
| FLOWSITE, discrete loss weight = 0.8 | 46.9 | 47.8 |
| FLOWSITE, discrete loss weight = 0.2 | 47.6 | 49.5 |

  • 3D structure generation with Harmonic Flow (Table 4):
  1. Using isotropic Gaussian prior instead of harmonic prior
  2. Dropping the auxiliary refinement loss of the Equivariant Refinement TFN layers
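
| Method | %<2 | %<2* | Med. |
| --- | --- | --- | --- |
| GAUSSIAN PRIOR | 17.0 | 29.2 | 3.8 |
| VELOCITY PREDICTION | 11.9 | 28.8 | 3.8 |
| STANDARD TFN LAYERS | 13.7 | 25.4 | 3.6 |
| NO REFINEMENT LOSS | 9.8 | 22.1 | 3.7 |
| NO SELF-CONDITIONING | 14.3 | 29.8 | 3.7 |
| HARMONICFLOW σ = 0 | 18.3 | 31.3 | 3.5 |
| HARMONICFLOW σ = 0.5 | 20.5 | 34.5 | 3.4 |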
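
To make the Gaussian-versus-harmonic prior ablation concrete, here is a minimal NumPy sketch of the two samplers. It assumes the harmonic prior has density proportional to exp(-½ xᵀLx), where L is the Laplacian of the ligand bond graph, so bonded atoms start close together. The function names, the `alpha` scale, and the toy ring-plus-substituent graph are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def gaussian_prior_sample(n_atoms, rng):
    """Isotropic Gaussian prior: every atom is placed independently."""
    return rng.standard_normal((n_atoms, 3))


def harmonic_prior_sample(n_atoms, bonds, rng, alpha=1.0):
    """Sample x with density proportional to exp(-0.5 * x^T L x).

    L is the (scaled) graph Laplacian of the bond graph. `alpha` is an
    assumed bond-stiffness scale, not a value taken from the paper.
    """
    lap = np.zeros((n_atoms, n_atoms))
    for i, j in bonds:
        lap[i, i] += alpha
        lap[j, j] += alpha
        lap[i, j] -= alpha
        lap[j, i] -= alpha
    evals, evecs = np.linalg.eigh(lap)
    # Drop the zero modes (global translation of each connected component).
    keep = evals > 1e-8
    z = rng.standard_normal((int(keep.sum()), 3))
    # The covariance is the pseudo-inverse of L: x = V diag(evals^-1/2) z.
    return evecs[:, keep] @ (z / np.sqrt(evals[keep])[:, None])


rng = np.random.default_rng(0)
# Toy ligand graph: a six-membered ring with one substituent atom.
toy_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
x_harmonic = harmonic_prior_sample(7, toy_bonds, rng)
x_gaussian = gaussian_prior_sample(7, rng)
print(x_harmonic.shape, x_gaussian.shape)
```

Because the translation mode is removed, the harmonic sample of a connected ligand is centered at the origin; in practice it would still need to be placed in the pocket.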

4 Analysis of joint discrete-continuous ODE dynamics (Figure 11): Reviewer 2 made the valuable suggestion of analyzing this. We therefore plot, in Figure 11, the evolution of the RMSD of FlowSite's x_1 prediction and the evolution of the output entropy of the residue type probabilities. The results show that a more determined structure prediction correlates with a decrease in the uncertainty of the residue type prediction.
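
The two quantities tracked in this analysis can be sketched as follows. This is a minimal illustration under stated assumptions: the entropy is taken as the Shannon entropy (in nats) of each residue's predicted type distribution, averaged over pocket residues, and the RMSD is computed without any alignment. The function names are hypothetical and the paper's exact conventions may differ.

```python
import math


def mean_entropy(probs):
    """Average Shannon entropy (nats) over per-residue type distributions."""
    total = 0.0
    for p in probs:
        # Terms with zero probability contribute nothing to the entropy.
        total += -sum(q * math.log(q) for q in p if q > 0.0)
    return total / len(probs)


def rmsd(pred, ref):
    """Root-mean-square deviation between two lists of 3D coordinates."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum((a - b) ** 2 for p, r in zip(pred, ref) for a, b in zip(p, r))
    return math.sqrt(sq / len(ref))


# A fully uncertain residue (uniform over 20 types) versus a decided one.
uniform = [[1.0 / 20] * 20]
one_hot = [[1.0] + [0.0] * 19]
print(mean_entropy(uniform), mean_entropy(one_hot))
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)]))
```

Evaluating both functions at every integration step of the ODE would give the two curves that the plot compares.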

5 Improvements in model quality (Table 3): For FlowSite, both the BLOSUM score and the recovery percentage improved through engineering changes such as SiLU activation functions, layer norm, and combining heterograph messages immediately instead of calculating all updates and summing them at the end.

References

[1] Anonymous. DiffDock-Pocket: Diffusion for pocket-level docking with sidechain flexibility. Submitted to The Twelfth International Conference on Learning Representations, 2023.

AC Meta-Review

The paper presents HarmonicFlow, a method for designing binding pockets for small molecules. The approach performs 3D protein-ligand binding structure generation by relying on a self-conditioned flow matching objective. Further, FlowSite is introduced to concurrently generate protein pocket residue types and the molecule's 3D binding structure.

Why Not a Higher Score

This paper was borderline. In the end, the reviewers concluded that the methodological contributions were insufficient for a machine learning conference. Furthermore, they expressed doubts about the practical relevance of the application. This suggests that the work might be better suited to a bio-oriented journal than to a machine learning conference like ICLR.

Why Not a Lower Score

N/A

Final Decision

Reject