Learning Complete Protein Representation by Dynamically Coupling of Sequence and Structure
Abstract
Reviews and Discussion
This paper considers the problem of learning numerical protein representations using both sequential and structural data, and presents CoupleNet, a method that uses two graph types, one based on the amino acid sequence and the other based on the protein's tertiary structure, to extract protein representations via graph convolutions. Compared with several baselines, it is demonstrated that CoupleNet improves the performance of the learned representations across multiple downstream tasks.
Strengths
Combining sequence and structure data in proteomics is an important and timely problem. The performance gains of the proposed approach against baseline methods seem significant to some extent.
Weaknesses
- My main concern with the paper is the lack of comparisons (both in terms of performance and the novelty of the method) with state-of-the-art protein language models and their structure-aware versions. It would be beneficial to compare the performance of CoupleNet with methods such as ESM-2 [A], ESM-S [B], S-PLM [C], ESM-GearNet [D], or SaProt [E].
[A] Lin, Zeming, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin et al. "Evolutionary-scale prediction of atomic-level protein structure with a language model." Science 379, no. 6637 (2023): 1123-1130.
[B] Zhang, Zuobai, Jiarui Lu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, and Jian Tang. "Structure-informed protein language model." arXiv preprint arXiv:2402.05856 (2024).
[C] Wang, Duolin, Mahdi Pourmirzaei, Usman L. Abbas, Shuai Zeng, Negin Manshour, Farzaneh Esmaili, Biplab Poudel et al. "S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure." bioRxiv (2023): 2023-08.
[D] Zhang, Zuobai, Chuanrui Wang, Minghao Xu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, and Jian Tang. "A systematic study of joint representation learning on protein sequences and structures." arXiv preprint arXiv:2303.06275 (2023).
[E] Su, Jin, Chenchen Han, Yuyang Zhou, Junjie Shan, Xibin Zhou, and Fajie Yuan. "Saprot: Protein language modeling with structure-aware vocabulary." bioRxiv (2023): 2023-10.
- The writing quality of the paper could be substantially improved, in my opinion. Some examples:
- It seems that the graph on lines 114-115 is fully connected based on the definition of , whereas it is not necessarily the case based on the following sections.
- The statement " and are the transformations" on line 127 needs further elaboration.
- What is the domain and range of on line 130?
- The description of a message-passing layer in Eq. (4) is unnecessarily restrictive, since not all message-passing mechanisms use batch normalization or a fully-connected layer. It also does not take edge weights/features into account.
- What is the high-level intuition behind Eqs. (5) and (6)? What does the notation mean? Is it elementwise multiplication?
- What are the input/output dimensions of the FC layer in Eq. (10)? Are the edge features multiplied elementwise by the neighboring node features?
Questions
- On line 133, it is mentioned that the positions can be recovered from the geometric representations. I believe that is not entirely correct, because the recovered positions may not be unique, but they can be uniquely recovered modulo the transformation . Please verify that this is correct.
- Could you please explain how average pooling is done and why the number of residues is reduced by half after every pooling layer? Does this statement mean that every pair of nodes will be combined into one node? How are the pairs determined? How are the edges determined for the combined nodes (since every node now corresponds to two residues with two different positions, instead of a single residue in the original input graph)?
- On lines 227-228 in Section 3.4, it is mentioned that convolutions are performed on edges as well. However, based on the message-passing operation in Eq. (10), it seems that only node-level convolutions are performed, and there are no edge-level convolutions (even though the edge features are used for aggregating features at each layer).
Limitations
The main limitation of this work is the lack of sufficient structural data as compared to sequential data for proteins, which the authors also allude to in Section 5. It would be helpful to comment on possible extensions to the case where, for some proteins, only sequential data is available, and what would happen if, for those proteins, only the sequence graph is considered, whereas for the rest of the proteins, both sequence and structure graphs are used in the CoupleNet pipeline.
Dear Reviewer JwZm,
We are grateful for your thorough review. Your comments are highly valued, and we would like to express our heartfelt gratitude. We do our utmost to address the questions you have raised:
Q1 Comparisons with state-of-the-art protein language models and their structure-aware versions.
A1 Thank you for your valuable feedback! In fact, CoupleNet is not a pre-training model, so comparing it directly with pre-training methods would not be fair. Thus, we combine ESM-2 (650M) [1] with CoupleNet, named ESM-CoupleNet, using the generated ESM embeddings as one part of the graph node features. We compare ESM-CoupleNet with pre-training methods on protein function prediction and EC number prediction, including sequence-based methods (ESM-1b [2], ESM-2), a sequence-function based method (ProtST [3]), and sequence-structure based methods (ESM-S [4], GearNet-ESM [5], SaProt [6]). The comparison results are provided in Table 1 of the one-page rebuttal pdf. From this table, we can see that our proposed model, ESM-CoupleNet, achieves the best results among sequence-based, sequence-structure based, and sequence-function based pre-training methods.
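For reference, per-residue ESM-2 embeddings can be extracted with the fair-esm package roughly as follows; the example sequence and the final concatenation with CoupleNet's own node features are illustrative, not our exact pipeline:

```python
import torch
import esm  # fair-esm package

# Load ESM-2 650M, the checkpoint used for ESM-CoupleNet above.
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # hypothetical example sequence
_, _, tokens = batch_converter([("query", seq)])

with torch.no_grad():
    out = model(tokens, repr_layers=[33])
# Drop the BOS/EOS tokens to get one 1280-dim embedding per residue.
esm_feats = out["representations"][33][0, 1 : len(seq) + 1]  # [L, 1280]

# The ESM embeddings then form one part of the graph node features, e.g.
# node_feats = torch.cat([other_node_feats, esm_feats], dim=-1)
```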
Q2 Writing quality.
A2 Thank you for your suggestions! (1) In the following sections, we give definitions involving the edge , so it is important to ensure the preciseness of the graph definition. (2) As shown in Section 3.1, in the definitions of invariance and equivariance, the transformations represent the actions of the symmetry group (such as rotations and translations). (3) The range of includes the geometric representations of the input 3D graphs, as listed in the paper. (4) We present Eq. 4 because we update this message-passing mechanism in Eq. 10; we will rectify our descriptions. (5) Eq. 5 is the definition of the local coordinate system, first presented in [7] and shown in Figure 7(a) in the appendix. The symbol represents the cross-product operation. (6) As shown in Appendix F, the input/output dimensions of the FC layer in Eq. 10 are 256. We do not update the edge features using the neighboring node features, which differs from GearNet [8].
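For illustration, a node-level message-passing layer of the kind described, with a 256-dimensional FC layer and batch normalization as in Eq. 4, could be sketched as follows. This is schematic rather than the verbatim Eq. 10; in particular, the elementwise gating of messages by edge features is an assumption made for the example:

```python
import torch
import torch.nn as nn
from torch_scatter import scatter_add

class NodeMessagePassing(nn.Module):
    """Node-level message passing: edge features modulate the messages,
    but only node states are updated (the edge features themselves are
    not updated, unlike GearNet's alternating node/edge passes)."""

    def __init__(self, dim=256):
        super().__init__()
        self.fc = nn.Linear(dim, dim)   # 256 -> 256, as in Appendix F
        self.bn = nn.BatchNorm1d(dim)
        self.act = nn.ReLU()

    def forward(self, h, edge_index, edge_feats):
        # h: [N, dim] node features; edge_index: [2, E]; edge_feats: [E, dim]
        src, dst = edge_index
        msg = h[src] * edge_feats  # assumed elementwise gating (illustrative)
        agg = scatter_add(msg, dst, dim=0, dim_size=h.size(0))
        return self.act(self.bn(self.fc(agg))) + h  # residual node update
```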
Q3 On line 133, it is mentioned that the positions can be recovered from the geometric representations.
A3 Thank you for your review! The definitions of complete geometric representations are preliminaries proposed by [9]. The statement is correct in the sense the reviewer describes: because the geometric representations are complete, the structure can be recovered from these features, uniquely up to a global rigid transformation.
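Formally, completeness means that the geometric representation $F$ satisfies

$$F(X) = F(X') \iff \exists\, g \in \mathrm{SE}(3) \ \text{such that} \ X' = g \cdot X,$$

so the coordinates recovered from a complete representation are unique up to a global rigid motion (rotation and translation), which matches the reviewer's reading of line 133.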
Q4 Questions about average pooling.
A4 Thank you for your valuable reviews! (1) As shown in Section 3.4, Appendix F, and Figure 2, we employ complete message-passing and sequence pooling layers to obtain the deeply encoded graph-level representations. Every two message-passing layers are followed by an average pooling layer, and there are eight message-passing layers in the model. The sequence average pooling function performs a customized average pooling on the input tensors based on calculated indices (each residue's sequence position divided by two and floored); it aggregates and summarizes information from the input tensors using scatter operations (torch_scatter.scatter_mean) to produce the output tensors. (2) After one average pooling layer, the number of residues is reduced by half; that is, each pair of consecutive graph nodes is combined into one. We reconstruct the graphs by Eq. 9, expanding the radius threshold after each pooling, which makes the neighbors of center nodes gradually cover more distant nodes and also reduces the computational complexity.
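A minimal sketch of this sequence average pooling (the function name and shapes are illustrative, not our exact implementation):

```python
import torch
from torch_scatter import scatter_mean

def sequence_avg_pool(x, res_idx):
    """Average-pool node features along the sequence.

    x:       [N, d] node features for N residues.
    res_idx: [N] integer sequence positions of the residues.
    """
    # Floor-divide positions by 2 so every two consecutive residues
    # share a pooling index and are averaged into one node.
    pool_idx = res_idx // 2
    pooled = scatter_mean(x, pool_idx, dim=0)  # [ceil(N/2), d]
    return pooled, torch.unique(pool_idx)

# Example: 6 residues with 4-dim features -> 3 pooled nodes.
x = torch.randn(6, 4)
pooled, new_idx = sequence_avg_pool(x, torch.arange(6))
print(pooled.shape)  # torch.Size([3, 4])
```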
Q5 On lines 227-228 in Section 3.4...
A5 Thank you for your comments! In Lines 227-228, we stated that the message-passing mechanism executes only on nodes in CoupleNet (our proposed model), instead of alternately on nodes and edges as in GearNet.
Q6 The lack of sufficient structural data as compared to sequential data for proteins...
A6 Thank you! For proteins where only sequential data is available, we can consider using AlphaFold-predicted structures. This model is developed specifically for modeling protein sequences and structures concurrently. In the era of AlphaFold, modeling protein sequences and structures jointly is a clear trend, using structural information to enhance the model's performance.
Thank you again! In case our answers have adequately addressed your concerns, we would respectfully appreciate your support for the acceptance of our work. Also, please let us know if you have any further questions. We look forward to further discussions!
[1] Lin, Z., et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv, 2022.
[2] Rives, A., et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. PNAS, 2021.
[3] Xu, M., et al. Protst: Multi-modality learning of protein sequences and biomedical texts. ICML. 2023.
[4] Zhang, et al. Structure-informed protein language model. arXiv, 2024.
[5] Zhang, et al. A systematic study of joint representation learning on protein sequences and structures. arXiv, 2023.
[6] Su et al. SaProt: Protein Language Modeling with Structure-aware Vocabulary. ICLR, 2024.
[7] Ingraham, J., et al. Generative models for graph-based protein design. NeurIPS, 2019.
[8] Zhang, et al., Protein Representation Learning by Geometric Structure Pretraining. ICLR. 2023.
[9] Liu, et al. Spherical message passing for 3d molecular graphs. ICLR, 2021.
Dear Reviewer JwZm,
We express our sincere gratitude for your constructive feedback in the initial review. It is our hope that our responses adequately address your concerns. Your expert insights are invaluable to us in our pursuit of elevating the quality of our work. We are fully aware of the demands on your time and deeply appreciate your dedication and expertise throughout this review.
We eagerly anticipate your additional comments and are committed to promptly addressing any further concerns.
Once again, we extend our heartfelt thanks for your time and effort during the author-reviewer discussion period.
Sincerely,
The Authors
Thank you. I have decided to increase my score after reading the rebuttal and the rest of the reviews.
Dear Reviewer JwZm,
Thanks for your comments and response. We are grateful for your help in improving our work.
Best Regards,
The Authors
This work proposes the CoupleNet, a novel framework designed to interlink protein sequences and structures to derive informative protein representations. It integrates multiple levels and scales of features, constructing a dynamic graph to capture both local and global structural geometries. Experimental results demonstrate that CoupleNet outperforms state-of-the-art methods in multiple different tasks, e.g., folding / reaction classification, GO term / EC number prediction.
Strengths
[+] The proposed method is well-introduced and easy to follow.
[+] The improved performance in protein function prediction makes CoupleNet highly valuable for practical applications in biology and medicine.
Weaknesses
- The experiment results do not include error bars and related analysis about multiple trials.
- The proposed method looks quite general. Applying the architecture to more real-world problems, e.g., residue design / engineering, may be helpful. Or, including some discussions about these topics may be useful.
Questions
- For the structure module, is the model robust to perturbations? Or can the model be aware of small changes in the structures? NMR PDB files, which include multiple models, could serve as a test set.
- The structure files are not available for real problems. I wonder what the difference is (in the model output space) between AlphaFold structures or other folded structures and the real structures, especially on multimers or interface data.
Limitations
n/a
Dear Reviewer ANTD,
We are grateful for your thorough review. Your comments are highly valued, and we would like to express our heartfelt gratitude. We do our utmost to address the questions you have raised:
Q1 The experiment results do not include error bars and related analysis about multiple trials.
A1 Thank you for your valuable feedback! We have done multiple trials: the performance is measured as the mean over five different initializations. We report the mean (variance) for the proposed CoupleNet:
| Method | Fold | SuperFamily | Family | Enzyme Reaction | GO-BP | GO-MF | GO-CC | EC |
|---|---|---|---|---|---|---|---|---|
| CoupleNet | 60.6(0.36) | 82.1(0.63) | 99.7(0.04) | 89.0(0.17) | 0.467(0.005) | 0.669(0.002) | 0.494(0.005) | 0.866(0.008) |
Q2 The proposed method looks quite general. Applying the architecture to more real-world problems, e.g., residue design / engineering may be helpful. Or, including some discussions about these topics may be useful.
A2 Thank you for your informative reviews! Protein design is the computational approach of designing amino acid sequences that fold into a predefined protein structure. ESM-IF [1], PiFold [2], and VFN-IF [3] are methods aimed at protein design, a task that differs from protein representation learning for function prediction. Removing the sequence information, we perform protein inverse folding with our method on CATH 4.2; the results are shown in Table 2 of the one-page rebuttal pdf. Our method also achieves nearly the best results on this task.
Q3 For the structure module, is the model robust to perturbations? Or, the model can aware of small changes in the structures? NMR PDB files which includes multiple models can be a test set.
A3 Thank you for your reviews! We have conducted noise analysis experiments; the results are shown in Section 4.2. Proteins in the test set are categorized into four groups based on their similarity to the training set (30%, 40%, 50%, 70%), rather than using the default split rate of 95%. The results indicate that even when there is low similarity between the training and test sets, our model still achieves the highest scores, which demonstrates its robustness. The model can be aware of small changes in the structures, as we have constructed complete structural representations to ensure global completeness; this has been demonstrated in Appendix H. Such global completeness is theoretically guaranteed to incorporate 3D information completely without information loss, while a local view would miss the long-range effects of subtle conformational changes happening distantly.
Q4 The structure files are not available for real problems. I wonder what's the difference (in the model output space) between Alphafold structures or other folded structures and the real structures? Especially, on multiers or interface data.
A4 Thank you for your valuable reviews! The proposed model is a basic framework to model protein sequences and structures; if the real structures are not available, we can use AlphaFold-predicted structures. We have not considered the difference between predicted structures and real structures, as the structures used in our experiments all come from PDB files. The proposed model can also model protein complexes.
Thank you again for all the efforts that helped us improve our manuscript! In case our answers have adequately addressed your concerns, we respectfully thank you for supporting the acceptance of our work; as you know, your support holds great significance for us. Also, please let us know if you have any further questions. We look forward to further discussions!
[1] Hsu, C., et al. Learning inverse folding from millions of predicted structures. ICML, 2022.
[2] Gao, Z., et al. PiFold: Toward effective and efficient protein inverse folding. ICLR, 2022.
[3] Mao, W., et al. De novo protein design using geometric vector field networks. arXiv, 2023.
[4] Zhang et al. Protein Representation Learning by Geometric Structure Pretraining. ICLR, 2023.
Dear Reviewer ANTD,
We express our sincere gratitude for your constructive feedback in the initial review. It is our hope that our responses adequately address your concerns. Your expert insights are invaluable to us in our pursuit of elevating the quality of our work. We are fully aware of the demands on your time and deeply appreciate your dedication and expertise throughout this review.
We eagerly anticipate your additional comments and are committed to promptly addressing any further concerns.
Once again, we extend our heartfelt thanks for your time and effort during the author-reviewer discussion period.
Sincerely,
The Authors
Thank you. I have decided to keep my score after reading the rebuttal and the other reviewers' opinions. Thanks for your response regarding the perturbations and s.t.d. problem.
I think more experiments (e.g., different complexes, NMR structures) could make the work more solid. For now, it sounds like a borderline paper to me.
Dear Reviewer ANTD,
We sincerely appreciate your response and suggestions! We are grateful for your help in improving our work.
Best Regards,
The Authors
The authors tackle the limitation of modeling inter-dependencies between protein sequences and structures. To solve this limitation, this work proposes CoupleNet which dynamically couples protein sequences and structures. Specifically, the authors propose to construct two-type dynamic graphs (sequential graph and radius graph) and execute convolutions on nodes and edges to encode proteins.
Strengths
- The proposed method is memory-efficient as it adopts hierarchical pooling.
- Considering different types of features such as torsion angles, dihedral angles, and planar angles, and conducting the ablation study on them is interesting.
Weaknesses
- The problem being tackled has already been addressed by recent works [1, 2], which weakens the contribution of the proposed work. ESM-GearNet [1] successfully fuses sequential and structural representations, and SaProt [2] develops a structure-aware vocabulary that allows fusing sequential and structural properties in one model. Also, the pre-training strategy of GearNet [3] could solve the problem, as it aims to learn the similarity between the subsequence graph and the structural (radius) graph. I suggest the authors concisely compare against those works and elaborate on the advantages of the proposed method.
- For the experimental results, the related works [1, 2] that model the sequences and structures simultaneously should be compared.
- The ablation study is not related to the core techniques of this paper. Even though the core techniques of this paper are dynamic pooling and constructing sequential and radius graphs, the authors do not consider them for the ablation study, hindering the readers from knowing what component largely contributes to the performance.
- As dynamic pooling is one of the core ideas of this paper, in the main body, the authors should explain what kind of pooling strategy is leveraged, what the effect of the proposed dynamic pooling is, and why it is important to model the two graphs.
[1] Zhang et al., "Enhancing Protein Language Models with Structure-based Encoder and Pre-training", ArXiv, 2023.
[2] Su et al., "SaProt: Protein Language Modeling with Structure-aware Vocabulary", ICLR, 2024.
[3] Zhang et al., "Protein Representation Learning by Geometric Structure Pretraining", ICLR, 2023.
Questions
- Which pooling strategy did you adopt? Please elaborate on the pooling process.
- Could you explain the advantages of this work from the difference with existing methods in Section 3.4?
Limitations
The authors discuss the limitation in Section 5.
Dear Reviewer fLEX,
We are grateful for your thorough review. Your comments are highly valued, and we would like to express our heartfelt gratitude. We do our utmost to address the questions you have raised:
Q1 I suggest the authors concisely compare those works and elaborate on the advantages of the proposed method.
A1 Thank you for your valuable feedback! (1) As shown in Lines 221-229 in the manuscript, there are only two types of graphs in the proposed CoupleNet: the radius graph and the sequential graph. We did not use the k-nearest-neighbor graph, as some nodes would be connected to distant neighbors when forced to have k neighbors. (2) The radius threshold in GearNet is set to a constant, but we change the radius threshold dynamically to learn relationships at different distances. (3) The message-passing mechanism executes only on nodes in CoupleNet, instead of alternately on nodes and edges as in GearNet; CoupleNet performs convolutions on nodes and edges simultaneously, with several pooling layers. (4) ESM-GearNet combines ESM embeddings with GearNet embeddings, and SaProt also uses ESM embeddings; it only uses coordinates at the structure level, but CoupleNet models the coordinates of all backbone atoms completely.
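A minimal sketch of these two graph constructions; the radius r and sequential window w below are placeholders, not our exact hyperparameters:

```python
import torch

def radius_graph(pos, r):
    """Edges between residues whose 3D distance is below r.
    pos: [N, 3] residue coordinates."""
    dist = torch.cdist(pos, pos)                   # [N, N] pairwise distances
    src, dst = torch.where((dist < r) & (dist > 0))
    return torch.stack([src, dst], dim=0)          # [2, E]

def sequential_graph(n, w):
    """Edges between residues within w positions along the sequence."""
    idx = torch.arange(n)
    gap = (idx[None, :] - idx[:, None]).abs()
    src, dst = torch.where((gap <= w) & (gap > 0))
    return torch.stack([src, dst], dim=0)
```

After each pooling layer, the radius r passed to radius_graph is enlarged, so the coarsened graph reaches increasingly distant neighbors, as described in (2).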
Q2 Comparisons with the related works.
A2 Thank you for your informative reviews! In fact, CoupleNet is not a pre-training model, so comparing it directly with pre-training methods would not be fair. Thus, we combine ESM-2 (650M) [1] with CoupleNet, named ESM-CoupleNet, using the generated ESM embeddings as one part of the graph node features. We compare ESM-CoupleNet with pre-training methods on protein function prediction and EC number prediction, including sequence-based methods (ESM-1b [2], ESM-2), a sequence-function based method (ProtST [3]), and sequence-structure based methods (ESM-S [4], GearNet-ESM [5], SaProt [6]). The comparison results are provided in Table 1 of the one-page rebuttal pdf. From this table, we can see that our proposed model, ESM-CoupleNet, achieves the best results among sequence-based, sequence-structure based, and sequence-function based pre-training methods.
Q3 The ablation study is not related to the core techniques of this paper.
A3 Thank you for your reviews! (1) In this paper, a novel two-graph-based approach for modeling sequential and 3D geometric features is proposed, ensuring global completeness of the protein representation, which has been demonstrated in the appendix. We have performed ablations on the sequential and radius graphs by removing different input features, as shown in Table 3 of the paper. (2) CoupleNet performs concurrent convolutions and sequence poolings on nodes and edges; the network consists of these convolution and pooling operations. We removed the pooling operations from the network, and the results are shown in the following table. We can see that without pooling operations, the model's performance degrades significantly.
| Method | GO-BP | GO-MF | GO-CC | EC |
|---|---|---|---|---|
| CoupleNet | 0.467 | 0.669 | 0.494 | 0.866 |
| w/o pooling | 0.362 | 0.535 | 0.420 | 0.748 |
Besides, in Figure 4 of the manuscript, CoupleNet demonstrates its capability to capture long-range relationships; higher accuracies are observed for relatively large proteins with sequence lengths surpassing the mean length. This also illustrates the effectiveness of the dynamic pooling operations.
Q4 Questions about pooling strategy.
A4 Thank you for your valuable reviews! (1) As shown in Section 3.4, Appendix F, and Figure 2, we employ complete message-passing and sequence pooling layers to obtain the deeply encoded graph-level representations. Every two message-passing layers are followed by an average pooling layer, and there are eight message-passing layers in the model. The sequence average pooling function performs a customized average pooling on the input tensors based on calculated indices (each residue's sequence position divided by two and floored); it aggregates and summarizes information from the input tensors using scatter operations (torch_scatter.scatter_mean) to produce the output tensors. (2) After one average pooling layer, the number of residues is reduced by half, and we expand the radius threshold after each pooling, which makes the neighbors of center nodes gradually cover more distant nodes and also reduces the computational complexity. These operations let the model capture features from the local to the global scale.
Q5 Could you explain the advantages of this work from the difference with existing methods in Section 3.4?
A5 Thank you! We have stated the differences in Section 3.4 and will revise it to spell out the additional advantages. For example, our model is more memory-efficient because we adopt dynamic (hierarchical) pooling to reduce the sequence length, and the dynamically changing graphs can better model node-edge relationships. We model the sequential and 3D geometric features jointly, ensuring global completeness in protein representations.
Thank you again! In case our answers have adequately addressed your concerns, we would respectfully appreciate your support for the acceptance of our work. Also, please let us know if you have any further questions. We look forward to further discussions!
[1] Lin, Z., et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv, 2022.
[2] Rives, A., et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. PNAS, 2021.
[3] Xu, M., et al. Protst: Multi-modality learning of protein sequences and biomedical texts. ICML. 2023.
[4] Zhang, et al. Structure-informed protein language model. arXiv, 2024.
[5] Zhang, et al. A systematic study of joint representation learning on protein sequences and structures. arXiv, 2023.
[6] Su et al. SaProt: Protein Language Modeling with Structure-aware Vocabulary. ICLR, 2024.
Dear Reviewer fLEX,
We express our sincere gratitude for your constructive feedback in the initial review. It is our hope that our responses adequately address your concerns. Your expert insights are invaluable to us in our pursuit of elevating the quality of our work. We are fully aware of the demands on your time and deeply appreciate your dedication and expertise throughout this review.
We eagerly anticipate your additional comments and are committed to promptly addressing any further concerns.
Once again, we extend our heartfelt thanks for your time and effort during the author-reviewer discussion period.
Sincerely,
The Authors
Thanks for the authors' thoughtful response. I acknowledge that
- The pooling strategy is effective as shown in the additional ablation study on it.
- The provided additional comparison with other SOTA models demonstrates the effectiveness of the proposed method.
- The proposed method is efficient in terms of memory and time cost.
Especially, my major concern about the comparison with the SOTA models is solved. However, there still are some concerns.
- To emphasize the efficiency of this work, it would be better to compare the computational costs such as FLOPs or memory usage.
- The pooling strategy, where the graphs are pooled by the indices of the sequence, does not seem sound for correctly encoding the structural information.
- The problem being tackled has been addressed by previous works.
Nevertheless, as the proposed method shows the SOTA performance and is efficient, I raise the score to 5.
Thanks for your prompt feedback. Your review is incredibly encouraging, and your valuable suggestions have been instrumental for us. We sincerely appreciate your response, which will help improve our future work.
First and foremost, we would like to express our sincere gratitude for the insightful and constructive feedback provided by the reviewers on our manuscript.
We are particularly thankful for the reviewers' recognition of our study's approach: it is important to combine sequences and structures (Reviewers fLEX, JwZm). We also appreciate their acknowledgment of the experiments we have conducted, which are interesting and outperform other methods, making the work highly valuable for practical applications in biology and medicine (Reviewers fLEX, JwZm, ANTD). Reviewer fLEX acknowledged that the proposed method is memory-efficient as it adopts hierarchical pooling, and Reviewer ANTD noted that our method is well-introduced and easy to follow.
We are grateful for the feedback received, particularly regarding the lack of comparisons with pre-training methods. Indeed, our method is a basic network that models protein sequences and structures, not a pre-training model. To compare with pre-training models fairly, we use ESM-2 [1] embeddings as graph node features and develop ESM-CoupleNet. We compare ESM-CoupleNet with pre-training methods on protein function prediction and EC number prediction, including sequence-based methods (ESM-1b [2], ESM-2), a sequence-function based method (ProtST [3]), and sequence-structure based methods (ESM-S [4], GearNet-ESM [5], SaProt [6]). The comparison results are provided in Table 1 of the one-page rebuttal pdf, from which we can see that ESM-CoupleNet achieves the best results among sequence-based, sequence-structure based, and sequence-function based pre-training methods.
In the one-page rebuttal pdf, we also provide results on protein design, i.e., designing amino acid sequences that fold into a predefined protein structure, a task different from protein representation learning for function prediction. Removing the sequence information, we perform protein inverse folding with our method on CATH 4.2; the results are shown in Table 2 of the one-page rebuttal pdf. These results illustrate the generalization ability of our method.
Once again, we sincerely appreciate the reviewers' feedback and remain committed to continuously improving our research and manuscript based on their valuable insights. Thank you again!
[1] Lin, Z., et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv, 2022.
[2] Rives, A., et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. PNAS, 2021.
[3] Xu, M., et al. Protst: Multi-modality learning of protein sequences and biomedical texts. ICML. 2023.
[4] Zhang, et al. Structure-informed protein language model. arXiv, 2024.
[5] Zhang, et al. A systematic study of joint representation learning on protein sequences and structures. arXiv, 2023.
[6] Su et al. SaProt: Protein Language Modeling with Structure-aware Vocabulary. ICLR, 2024.
The paper proposes CoupleNet, a novel framework to learn protein representations by dynamically coupling protein sequences and structures. It presents an innovative approach to integrating multiple levels and scales of protein features, and demonstrates superior performance compared to state-of-the-art methods, particularly in scenarios with low sequence similarities.
The reviewers pointed out that the core techniques of dynamic pooling and graph construction were not thoroughly explored in the ablation study, and the paper could benefit from a more concise comparison with related works and a clearer explanation of the advantages of the proposed method. Additionally, the writing quality and clarity of the paper could be improved.
Despite some areas for improvement, the paper presents a novel and effective method for protein representation learning, which has the potential to impact the field. The authors have also actively addressed the reviewers' concerns during the rebuttal phase, further strengthening the paper's contributions.