Average rating: 4.7/10 (3 reviewers; min 3, max 6, std. dev. 1.2)

Average confidence: 4.3

Decision: Rejected

ICLR 2024

DiffMaSIF: Score-Based Diffusion Models for Protein Surfaces

Freyr Sverrisson,Mehmet Akdel,Dylan Abramson,Jean Feydy,Alexander Goncearenco,Yusuf Adeshina,Daniel Kovtun,Céline Marquet,Xuejin Zhang,David Baugher,Zachary Carpenter,Luca Naef,Michael M. Bronstein,Bruno Correia

Submitted: 2023-09-23, Updated: 2024-02-11

TL;DR

DiffMaSIF is a score-based diffusion model which advances rigid protein-protein docking with a novel surface-centric molecular representation

Abstract

Keywords

geometric deep learning, protein-protein docking, diffusion model, equivariant network, protein surface

Reviews and Discussion

Review

Rating: 3, Confidence: 4, Date: 2023-10-19

The paper studies the problem of rigid-body protein-protein docking. In this context, given two protein chains, the goal is to predict the bound complex structure without accounting for the transformation of the protein structures. The introduced method, DiffMaSIF, employs a surface-based encoder to predict the binding site. Subsequently, these learned representations are utilized to determine the score function within SO(3) diffusion. The authors demonstrate DiffMaSIF's effectiveness by contrasting it with other deep learning approaches on a newly established benchmark. Additionally, they highlight the method's efficacy on proteins with limited MSA depths.

Strengths

Protein docking is an important problem in protein science. Utilizing surface-based techniques appears suitable for pinpointing the physical complementarity between ligands and binders. Such an approach might circumvent the generalization issues frequently encountered in many deep learning methods due to overfitting.
Experimental results for proteins with low-depth MSAs are encouraging. These findings suggest that surface-based methods are less dependent on co-evolutionary information compared to other techniques.

Weaknesses

The paper omits many important details regarding the proposed method and experiments, which hampers comprehension and replication of the method.
The techniques employed in the paper are primarily borrowed from existing literature, so the work lacks novelty.
The authors introduce a new dataset for evaluating the methods. However, it is unclear why this new dataset based on structural similarity is more appropriate. According to the statement “Equally, sequence similarity is often not a good predictor of structure similarity, with some proteins having high global sequence similarity but structural differences at the interface”, the datasets like DIPS and DB5, split by sequence similarity, present greater generalization challenges.
Many important baselines are missing in the new benchmark. The comparison made in the experiments is not fair.

For details, please refer to the Question section.

Questions

Major points:

The title seems inaccurate. While the paper introduces a diffusion model in SO(3) space, it does not directly work on protein surfaces. The term "protein docking" should be incorporated to avoid misleading readers into interpreting it as a generative model for designing protein surfaces.
The paper is missing a 'Related Work' section that would establish context with existing studies. There is a wealth of related research on traditional protein docking, protein representation learning, and diffusion models for proteins. Furthermore, the discussion in Sec. 2 is brief. It would benefit from a more in-depth comparison of DiffMaSIF's advantages over existing methods like DiffDock-PP, AlphaFold-Multimer, and MaSIF.
The paper omits vital details about constructing the heterograph in the encoder and the joint PPI graph in the decoder — both of which are central to the method.
The design of the cross-attention between ligands and receptors for binding site prediction remains unclear.
In Figure 4A, what criteria are used to determine if two surface nodes bind? Conventionally, a precision-recall curve is plotted instead of presenting precision and recall values across all complexes.
The paper should include comparisons with deep learning benchmarks such as MaSIF and DiffDock-PP in Figure 4, especially since the authors argue that "deep learning methods often exhibit biases stemming from their training data."
Table 1 seems to miss several baselines for rigid-body docking, such as HDOCK[1], ClusPro[2], and PatchDock[3]. To highlight the effectiveness of diffusion models, results from MaSIF should also be included.
Leveraging an oracle to select optimal results evidently overestimates DiffMaSIF's performance. A more appropriate approach would involve training a confidence model to choose the best prediction, similar to DiffDock's strategy.
The comparison with AF2MM in Table 1 seems unfair. How can the results be comparable when the test set for AF2MM differs from the other methods?
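The precision-recall curve requested for Figure 4A can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name and the toy scores and labels are hypothetical, and it assumes that per-node binding-site scores are pooled across all complexes and that scores are distinct (ties would need to be grouped).

```python
def precision_recall_curve(scores, labels):
    """Sweep a threshold over pooled per-node scores, highest score first.

    scores: predicted binding-site probabilities, one per surface node.
    labels: 1 if the node is a true interface node, else 0.
    Returns two lists (precision, recall), one entry per threshold.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_positives = sum(labels)
    true_pos = 0
    false_pos = 0
    precision, recall = [], []
    for _, label in ranked:
        if label:
            true_pos += 1
        else:
            false_pos += 1
        precision.append(true_pos / (true_pos + false_pos))
        recall.append(true_pos / total_positives if total_positives else 0.0)
    return precision, recall


# Hypothetical toy data: five surface nodes pooled from several complexes.
node_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
node_labels = [1, 0, 1, 1, 0]
precision, recall = precision_recall_curve(node_scores, node_labels)
```

Plotting `precision` against `recall` gives one curve per method, which makes methods comparable across all thresholds rather than at a single operating point per complex.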

Minor points:

Figure 1: The design of the figure needs improvement. The three subtitles appear disproportionately large, while the other text elements are notably small. The term "auxiliary task" is positioned at the top, yet its corresponding green box is at the bottom. Additionally, the protein structure within the blue box extends beyond its boundary. Figure 2 has similar issues.
Sec. 3 and 4 would benefit from a switch in order. It's more logical to discuss the method after laying out the basic knowledge and to relocate the dataset discussion to the experiment section.
In the fourth paragraph of Sec. 3, the phrase "separated the representative structures with a few gaps at non-interfacial locations" is ambiguous. Where do these "gaps at non-interfacial locations" come from?
Sec. 4.1: Although it is not difficult to infer that the translation and rotation group defines the transformation on ligands with domain knowledge, there's a need to formally define the docking problem and diffusion process.
Sec. 4.1: The definition of the diffusion process on the rotation group remains unclear. The equation provided describes a distribution within the $IG_{SO(3)}$ space but lacks clarity on how noise increases as time $t$ progresses. A more thorough explanation is needed here, rather than merely replicating an equation from prior literature.
Sec. 4.2: Expand on the denoising score loss, rather than merely referencing its definition in DiffDock-PP.
Figure 4: In panels A and C, labels are displayed on the x-axis. However, in panel B, the label is oddly placed on the y-axis.
Table 1: The significant figures for different methods should be consistent.
Figure 5: (1) The caption indicates, "The size of each data point corresponds to the number of intermolecular contacts in the complex", yet all points appear uniform in size. (2) The criteria used to plot the green dashed lines need clarification.
The citations within the paper exhibit formatting inconsistencies. Most references lack the publication year, and many cite only the arXiv version.
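The point about Sec. 4.1 (how noise on the rotation group grows with time $t$) can be made concrete with a short Python sketch. It assumes the series form of the $IG_{SO(3)}$ angle density commonly used in the docking diffusion literature, with the factor $\exp(-l(l+1)\sigma^2)$ (some references use $\sigma^2/2$ instead), and a geometric schedule $\sigma(t) = \sigma_{\min}^{1-t}\sigma_{\max}^{t}$. The schedule, the values of `sigma_min` and `sigma_max`, and all function names are assumptions for illustration, not details confirmed by the paper.

```python
import math


def igso3_angle_density(omega, sigma, num_terms=100):
    """Truncated series for the IGSO(3) density over the rotation angle omega in (0, pi)."""
    series = 0.0
    for l in range(num_terms):
        series += (
            (2 * l + 1)
            * math.exp(-l * (l + 1) * sigma**2)
            * math.sin((l + 0.5) * omega)
            / math.sin(omega / 2)
        )
    return (1 - math.cos(omega)) / math.pi * series


def sigma_at(t, sigma_min=0.1, sigma_max=1.5):
    """Assumed geometric noise schedule: sigma grows from sigma_min to sigma_max."""
    return sigma_min ** (1 - t) * sigma_max**t


def mass_and_mean_angle(sigma, grid=1000):
    """Midpoint-rule estimates of the total probability and the mean rotation angle."""
    step = math.pi / grid
    omegas = [(i + 0.5) * step for i in range(grid)]
    density = [igso3_angle_density(w, sigma) for w in omegas]
    mass = sum(density) * step
    mean = sum(w * p for w, p in zip(omegas, density)) * step
    return mass, mean


# Evaluate at the start, middle, and end of the forward process.
results = [mass_and_mean_angle(sigma_at(t)) for t in (0.0, 0.5, 1.0)]
```

Under these assumptions the density stays normalized at every $t$, while the mean rotation angle increases with $t$, which is the sense in which the forward process adds rotational noise over time.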

Typos:

Page 3, Sec. 2.1: “as they can the ensemble of bound protein confirmations” -> remove “the” and “of”.
Page 3, Sec. 2.3: “In (Gainza et al. (2023))” -> use \citet instead of \citep for references that appear within sentences.
Page 4, Sec. 3: “We retain the highest resolution among each clusters as test set”, clusters -> cluster.
Page 6, Sec. 4.2, Binding-Site Auxiliary Task: $p(x_{\psi}(t)|x_{\psi}(0)$ , lacks a right parenthesis.
Page 7, Sec. 5.1, Physiological interface prediction: Figure 4B -> Figure 4C

Overall, I believe there is significant potential in exploring more applications of surface-based methods for protein docking tasks. That said, the current version of the paper lacks the details and experimental data needed to be ready for publication. I strongly recommend that the authors reorganize and improve the paper to raise its quality. This would not only solidify its scientific contributions but also make it a more compelling read for the community.

[1] Yan, Yumeng, et al. "The HDOCK server for integrated protein–protein docking." Nature protocols 15.5 (2020): 1829-1852.

[2] Kozakov, Dima, et al. "The ClusPro web server for protein–protein docking." Nature protocols 12.2 (2017): 255-278.

[3] Schneidman-Duhovny, Dina, et al. "PatchDock and SymmDock: servers for rigid and symmetric docking." Nucleic acids research 33.suppl_2 (2005): W363-W367.

Review

Rating: 5, Confidence: 5, Date: 2023-10-31

The paper introduces DiffMaSIF, a unique score-based diffusion model tailored for rigid protein-protein docking, particularly in scenarios lacking co-evolutionary signals. Leveraging a protein molecular surface-based encoder-decoder architecture, DiffMaSIF excels in learning physical complementarity, showcasing superior performance compared to existing deep learning methodologies.

Strengths

The paper presents a surface-based model rooted in the DiffDock-PP diffusion framework, specifically for protein-protein rigid docking. Experimental results affirm its superiority over DiffDock-PP, highlighting its effectiveness.
The proposed surface-based method can avoid the need for evolutionary pretrained sequence embedding.

Weaknesses

The method is constrained to rigid docking scenarios, assuming the availability of holo-like structures. This limitation becomes apparent in real-world applications where only apo-structures are accessible, and the protein surface, particularly the binding site, may exhibit significant flexibility.
The paper lacks an ablation study, leaving questions about the impact of pretrained GearNet embedding and the role of different level representations unanswered.
The technical contribution is somewhat incremental. The work primarily builds upon the DiffDock-PP diffusion framework for docking pose prediction, with the introduction of a surface-based encoder, a concept previously explored in existing literature [1].
The method employs "oracle selection" to choose the best pose based on the highest valued metric, a practice that may not provide a fair comparison against end-to-end prediction models like AFM and EquiDock. While generating multiple poses is feasible, it typically necessitates an additional selection module, as seen in DiffDock, to enable the model to autonomously select the binding pose.
The paper does not provide results or discussion regarding the inference efficiency of the proposed method.
No code or anonymous link is provided to facilitate reproduction of the results, which could be a barrier for other researchers.

[1] Sverrisson, Freyr, et al. "Physics-informed deep neural network for rigid-body protein docking." MLDD workshop of ICLR 2022. 2022.
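The concern about oracle selection can be illustrated with a small Python simulation. Everything in it is synthetic and hypothetical: the RMSD values are random numbers rather than results from any model, the 5 Å success threshold is an assumption, and the "confidence" is a noisy proxy standing in for a trained confidence model of the kind used in DiffDock.

```python
import random


def success_rate(selected_rmsds, threshold=5.0):
    """Fraction of complexes whose selected pose falls under the RMSD threshold."""
    return sum(r < threshold for r in selected_rmsds) / len(selected_rmsds)


def oracle_select(pose_rmsds):
    """Oracle: pick the pose with the lowest RMSD, which requires the ground truth."""
    return min(pose_rmsds)


def confidence_select(pose_rmsds, confidences):
    """Realistic setting: pick the pose the confidence model ranks highest."""
    best = max(range(len(pose_rmsds)), key=lambda i: confidences[i])
    return pose_rmsds[best]


rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
complexes = []
for _ in range(200):
    # 40 sampled poses per complex with synthetic RMSDs in angstroms.
    rmsds = [rng.uniform(1.0, 30.0) for _ in range(40)]
    # Noisy confidence: correlated with pose quality but far from perfect.
    confidences = [-r + rng.gauss(0.0, 10.0) for r in rmsds]
    complexes.append((rmsds, confidences))

oracle_rate = success_rate([oracle_select(r) for r, _ in complexes])
ranked_rate = success_rate([confidence_select(r, c) for r, c in complexes])
first_pose_rate = success_rate([r[0] for r, _ in complexes])
```

By construction `oracle_rate` can never be lower than `ranked_rate` or `first_pose_rate`, so oracle numbers are an upper bound on what a deployable pipeline would achieve; how large the gap is depends on the quality of the selection model.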

Questions

The paper mentions the absence of "co-evolution representation" derived from protein sequences. Could the authors provide a more detailed explanation of this point?
In the binding site prediction experiment, only a random split is compared, which is expected to perform worse than DiffMaSIF. Could additional baselines be included for a more comprehensive evaluation?

Review

Rating: 6, Confidence: 4, Date: 2023-10-31

The authors propose DiffMaSIF, the first surface-based diffusion model for rigid protein-protein docking. The encoder of DiffMaSIF takes surface- and residue-level inputs and extracts surface features, coordinates and normals. The decoder comprises a VNN module and an E3NN module to predict ligand roto-translation scores. Experiments show that DiffMaSIF outperforms previous SOTA DL methods for rigid-body protein-protein docking, in particular on sequentially/structurally novel interfaces.

Strengths

Novelty: This work is the first surface-based diffusion model for rigid protein-protein docking, providing a possible solution for interfaces with little co-evolution information available.
Significance: Significant performance improvement over the previous SOTA is claimed.
Good writing & figures: The methods section is well written (for an audience familiar with the subject), and the figures are straight to the point and easy to understand (despite some confusion in the encoder part, Figure 3).

Weaknesses

Insufficient experiments: To present DiffMaSIF more comprehensively, the runtime, number of parameters, and ablation study results should be reported.

Questions

Why are there no supplementary materials? Is it by mistake or is there no other information you would like to present?
Please explain the data splits more clearly. Currently there are three "as test set" phrases in the data section, which, at first glance, is really confusing. I suppose the first two are in fact "test set candidates" (to be filtered further) and the last one is the final test set. If possible, a supplementary figure explaining the data split pipeline would be very nice.
If AF-Multimer is evaluated on a separate test set, I would suggest adding a supplementary table showing the results of all models on the AF-Multimer-set, just for fair comparison.
Regarding comments in the weakness section, it would be best if you could provide (1) the runtime and # params of all models, (2) ablation study showing the importance of the residue-level features, (3) ablation study showing the rationality of the decoder design. Why is DCGNN necessary?
According to Figure 3, "subsample" happens after VNN. Is this step where the top 512 predicted binding site nodes are used for subsequent layers? If so, then it seems to contradict the description that "The decoder works on the joint PPI graph consisting of the top 512 predicted binding site nodes of both the ligand and the receptor." If not, then what does this step do?

AC Meta-Review

2023-12-07

The paper tackles the important problem of predicting protein-protein complexes and the proposed diffusion-based approach is promising. Unfortunately the authors have not provided any feedback on important concerns raised in the reviews. We strongly encourage the authors to significantly revise their manuscript to provide more details on the approach and its evaluation, improve experimental setup, and better justify the relevance of the proposed dataset.

Why Not a Higher Score

The paper has significant issues, as all reviews indicate. The authors did not respond to the reviewers.

Why Not a Lower Score

N/A

Final Decision: Reject

2024-01-16

Reject