PaperHub

Average rating: 3.4 / 10 (withdrawn; 5 reviewers; min 3, max 5, std 0.8)
Individual ratings: 3, 3, 3, 3, 5
Confidence: 3.8 · Correctness: 2.4 · Contribution: 2.0 · Presentation: 2.4

ICLR 2025

ProteinAdapter: Adapting Pre-trained Large Protein Models for Efficient Protein Representation Learning

OpenReview · PDF
Submitted: 2024-09-27 · Updated: 2024-11-18
TL;DR

The first Mamba-based adapter exploring the parameter efficient fine-tuning of pre-trained large protein models.

Abstract

Keywords
Protein Representation, Structured State Space Models

Reviews and Discussion

Official Review (Rating: 3)

This paper presents ProteinAdapter, a method for efficient protein representation learning. It combines multi-level sequence and geometric structure embeddings from pre-trained large protein models (LPMs) and uses a multi-scale predictor for various tasks. Experiments on over 20 tasks show its superiority over state-of-the-art methods in single-task and multi-task scenarios.

Strengths

  1. I like the topic of studying multi-scale complementarity of protein sequences and structures, and I appreciate that the authors have done a good job of highlighting and explaining the motivations behind this.

  2. The proposed method makes good use of existing protein language models. By combining frozen protein models with a lightweight adapter, it is more compute-efficient than existing state-of-the-art methods and requires fewer parameters for fine-tuning (see the sketch after this list).

  3. The experiments are thorough. The method scales to multiple downstream tasks, as demonstrated by extensive experiments on over 20 tasks covering protein function prediction, structure prediction, and interaction prediction.
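A minimal sketch of the parameter-efficiency argument in point 2, using a generic frozen Transformer encoder as a stand-in for an LPM such as ESM-1b and a hypothetical two-layer adapter; the dimensions and module names are illustrative assumptions, not the authors' implementation:

```python
import torch.nn as nn

# Stand-in for a frozen large protein model (e.g., an ESM-1b-like encoder).
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=1280, nhead=8, batch_first=True),
    num_layers=6,
)
for p in backbone.parameters():
    p.requires_grad = False  # the LPM stays frozen during fine-tuning

# Hypothetical lightweight adapter trained on top of the frozen features.
adapter = nn.Sequential(nn.Linear(1280, 256), nn.GELU(), nn.Linear(256, 1280))

trainable = sum(p.numel() for p in adapter.parameters())
frozen = sum(p.numel() for p in backbone.parameters())
print(f"trainable adapter params: {trainable:,}  frozen backbone params: {frozen:,}")
```

Only the adapter's parameters would enter the optimizer, which is where the reported fine-tuning savings come from.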

Weaknesses

  1. From the results across multiple tasks in Table 4, I did not observe a substantial improvement of ProteinAdapter over ESM-1b, which does not convince me of the effectiveness of the proposed approach.

  2. The two core contributions of the paper are (1) multi-level complementarity (sequence-structure co-modeling) and (2) multi-level modeling, which are not substantial innovations to the field of protein engineering and have already been commonly used and mentioned in many previous works, including ESM-GearNet [1], SaProt-PDB [2], and PromptProtein [3]. I encourage the authors to clarify the essential differences with these efforts.

  3. ESM3 is a state-of-the-art generative multimodal protein model; how does the proposed method compare with it across the various tasks?

  4. The paper focuses on two specific LPMs (ESM-1b and ESM-IF1). It would be beneficial to explore the performance of the method with other LPMs or ESM-2 with different parameter amounts to further demonstrate its generality.

  5. From my point of view, the main contribution of this paper is a Mamba-based embedding-merge module. Granting its effectiveness, is Mamba actually necessary here? Is the gain coming from merging the embeddings at all, or from the proposed merging architecture specifically (see the sketch after the reference list below)?

  6. The performance of ProteinAdapter depends heavily on the quality and suitability of the pre-trained LPMs. If the pre-trained models have biases or limitations in their learned representations, this may affect the performance of ProteinAdapter on certain tasks or protein datasets. Have the authors explored this issue further?

[1] Zuobai Zhang, Minghao Xu, Vijil Chenthamarakshan, Aurelie Lozano, Payel Das, and Jian Tang. Enhancing protein language models with structure-based encoder and pre-training. In International Conference on Learning Representations Machine Learning for Drug Discovery Workshop, 2023.

[2] Jin Su, Chenchen Han, Yuyang Zhou, Junjie Shan, Xibin Zhou, and Fajie Yuan. Saprot: Protein language modeling with structure-aware vocabulary. In The Twelfth International Conference on Learning Representations, 2024.

[3] Z. Wang, Q. Zhang, S.-W. Hu, et al. Multi-level protein structure pre-training via prompt learning. In The Eleventh International Conference on Learning Representations, 2023.
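A hypothetical ablation for point 5 (an assumption, not an experiment from the paper): merge the frozen sequence and structure embeddings with a plain MLP in place of the Mamba block, so any remaining gain can be attributed to merging the embeddings rather than to the Mamba architecture itself. Shapes and names are illustrative:

```python
import torch
import torch.nn as nn

# Dummy per-residue embeddings standing in for frozen ESM-1b (sequence) and
# ESM-IF1 (structure) outputs for a 200-residue protein.
seq_emb = torch.randn(1, 200, 1280)
struct_emb = torch.randn(1, 200, 512)

# Plain MLP merge used as a drop-in replacement for the Mamba fusion block.
mlp_merge = nn.Sequential(
    nn.Linear(1280 + 512, 1024),
    nn.GELU(),
    nn.Linear(1024, 1024),
)
fused = mlp_merge(torch.cat([seq_emb, struct_emb], dim=-1))
print(fused.shape)  # torch.Size([1, 200, 1024])
```

If such a baseline tracked the full model closely, it would suggest the benefit lies mainly in combining the two embedding sources rather than in the specific fusion architecture.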

Questions

See Weaknesses. I am willing to consider the authors' response and revise my score accordingly.

Official Review (Rating: 3)

This paper introduces ProteinAdapter, a Mamba-based approach for protein representation learning. It integrates 3D structure and 1D sequence information through a Mamba-based Pre-alignment Module and multiple Mamba Fusion Blocks, which combine embeddings from pre-trained Protein Language Models and Protein Structure Models. The authors design a Multi-Scale Predictor for downstream tasks based on the characteristics of proteins. The paper validates ProteinAdapter on benchmarks such as PEER and ATOM3D, in both single-task and multi-task learning, against previous works.

Strengths

  • This is an innovative approach that uses Mamba, known for its computational efficiency, as the foundation of the fusion module. By combining frozen pre-trained models with a Mamba-based fusion module, the overall model achieves a high level of efficiency.
  • Results in Table 6 demonstrate that it achieves better performance with fewer trainable parameters and higher training efficiency.
  • The concept of a multi-scale predictor based on the biological properties of proteins is well motivated.

Weaknesses

  • The experimental setup contains a significant omission. According to Section 4.2, if the structure input is replaced with the output of a sequence-embedding layer, the resulting data cannot demonstrate the effectiveness of the fusion module in integrating real structural and sequence embeddings, rendering the experiment invalid. It would be essential to align the data with a 3D dataset such as the Protein Data Bank (PDB), or to generate pseudo-structures with a pre-trained structure prediction model as described in Section 4.2, and to re-evaluate the benchmark for "ProteinAdapter (w/o structure)" to enhance credibility.
  • If the original configuration, which includes both sequence and structural inputs, is maintained, the auxiliary tasks in the multi-task learning framework and the Fold Classification task in Table 2 are structure-related tasks, raising concerns about data leakage. This is evident in Table 5, where the performance gains of the two tasks are not on the same scale.
  • The contributions and innovations of this work are not particularly substantial, as the pattern of integrating 3D and 1D data is not groundbreaking (as shown by the several 3D + 1D models compared within the text).
  • The paper contains several writing issues, such as "bold" being misspelled as "blod" in all figure captions and two arrows missing from the merge section of Figure 3. Additionally, the description of the dataset is quite vague: it does not clearly state the dataset statistics, nor does it fully list the data sources.

Questions

  • Why are not all of the single-task-learning tasks from the PEER benchmark included in Table 1?
  • Regarding efficiency, could the authors provide data on training time and inference time?
Official Review (Rating: 3)

The authors introduce a novel approach to parameter-efficient fine-tuning that aims to leverage the comprehensive knowledge within two Large Protein Models (LPMs), specifically ESM-1b and ESM-IF1, for various protein-related tasks. Their method comprises two primary elements: a multi-level ProteinAdapter based on the Mamba architecture, and a multi-scale predictor built on a secondary Transformer. The researchers assert that this combination effectively addresses two critical issues in protein representation learning: the need for multi-level complementarity and multi-scale integration. According to the paper, the proposed technique outperforms current state-of-the-art methods across more than 20 tasks, in both single-task and multi-task settings, as evidenced by extensive experimental results.

Strengths

The paper presents an innovative approach to utilizing representations from two Large Protein Models through a parameter-efficient fine-tuning technique. While the concept of combining 1D and 3D LPM representations is not entirely new, the authors' Mamba-based adapter appears to be a novel contribution to the field. The comprehensive experimental results, spanning multiple tasks, indicate that the proposed method surpasses the performance of the various baseline techniques mentioned in the study. The combination of computational efficiency and improved performance has the potential to benefit researchers across diverse areas of protein biology, offering a valuable tool for a wide range of applications in the field.

Weaknesses

  • The manuscript lacks sufficient detail regarding the baseline comparisons. Several key questions remain unanswered:
  1. What method was used to obtain baseline figures? Were they cited from existing literature or reproduced through new experiments?
  2. How were the models applied to downstream tasks? Was fine-tuning or linear projection employed?
  3. Were the entire models fine-tuned, or did the authors experiment with fine-tuning only the final few layers?
  • The ablation studies focus solely on GO and EC tasks. The rationale behind selecting these specific tasks is unclear. Are these purely sequence-based tasks or do they involve both sequence and structure? If they are sequence-only, can they be considered truly representative for comprehensive ablation studies?
  • Despite the inclusion of some ablation studies, the specific contribution of the proposed ProteinAdapter to performance improvements remains ambiguous. Critical baseline comparisons are absent:
  1. Fine-tuning the last few layers of ESM-IF1 for 3D and (3+1)D tasks. This baseline is crucial to differentiate between improvements due to the adapter versus those from ESM-IF1 representations alone.
  2. Substituting Mamba with a cross-attention block for integration. This would demonstrate Mamba's advantages more convincingly than the current MLP-based ablation studies (see the sketch after this list).
  • The authors' claim of efficiency compared to ESM-1b at the end of section 4 is only valid if they fine-tuned the entire ESM-1b model. The comparison raises concerns about whether other parameter-efficient fine-tuning techniques or partial model fine-tuning were considered.
  • The authors' use of the term "multi-task scenario" could be misleading. While they train models for individual tasks with an auxiliary secondary structure prediction loss, readers might misinterpret this as suggesting that a single fine-tuned model can perform multiple diverse tasks.
  • For the sake of completeness, the experimental settings should be explicitly stated in the supplementary materials, rather than merely referencing another paper.
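A minimal sketch of the cross-attention baseline suggested in point 2 above (an assumed configuration, not anything from the paper): projected sequence features attend to projected structure features through a standard multi-head cross-attention block in place of the Mamba fusion. Dimensions are illustrative:

```python
import torch
import torch.nn as nn

d_model = 512
seq_emb = torch.randn(1, 200, d_model)     # projected sequence features (queries)
struct_emb = torch.randn(1, 200, d_model)  # projected structure features (keys/values)

# Standard multi-head cross-attention as a drop-in alternative to the Mamba block.
cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=8, batch_first=True)
attended, _ = cross_attn(query=seq_emb, key=struct_emb, value=struct_emb)
fused = nn.LayerNorm(d_model)(attended + seq_emb)  # residual connection + layer norm
print(fused.shape)  # torch.Size([1, 200, 512])
```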

Questions

  • Could you provide more details on your baseline comparisons, including how the numbers were obtained, the method used for downstream tasks, and whether entire models or only specific layers were fine-tuned?
  • Regarding your ablation studies, why were GO and EC tasks specifically chosen, and are these sequence-only tasks or sequence-structure tasks?
  • Have you considered additional baselines such as fine-tuning the last few layers of ESM-IF1 for 3/(3+1)D tasks or substituting Mamba with a cross-attention block integration?
Official Review (Rating: 3)

This paper proposes a new model for protein property prediction: it takes embeddings from a frozen protein language model (sequence) and a frozen inverse folding model (structure), merges them with Mamba layers, and then makes predictions.
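A schematic of that pipeline, written as a sketch under assumed shapes rather than the authors' implementation; the GRU here is only a placeholder for the paper's Mamba fusion layers, and all dimensions and names are assumptions:

```python
import torch
import torch.nn as nn

class ProteinAdapterSketch(nn.Module):
    """Toy pipeline: frozen encoders -> trainable merge -> small prediction head."""
    def __init__(self, d_seq=1280, d_struct=512, d_model=512, n_classes=10):
        super().__init__()
        self.proj_seq = nn.Linear(d_seq, d_model)        # project sequence embeddings
        self.proj_struct = nn.Linear(d_struct, d_model)  # project structure embeddings
        # Placeholder for the Mamba fusion layers (a GRU is used only for illustration).
        self.merge = nn.GRU(2 * d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, seq_emb, struct_emb):
        x = torch.cat([self.proj_seq(seq_emb), self.proj_struct(struct_emb)], dim=-1)
        fused, _ = self.merge(x)              # per-residue fused representation
        return self.head(fused.mean(dim=1))   # mean-pool to a per-protein prediction

model = ProteinAdapterSketch()
out = model(torch.randn(2, 100, 1280), torch.randn(2, 100, 512))
print(out.shape)  # torch.Size([2, 10])
```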

Strengths

  • The empirical results seem to show improvements over, or parity with, the state of the art across a variety of benchmarks.

Weaknesses

  • The methodology feels like gluing together pLMs and inverse folding models like Lego blocks. Several prior works have done similar things without using Mamba (e.g., https://icml-compbio.github.io/2023/papers/WCBICML2023_paper53.pdf). I think the overall idea is sensible, but I do not consider this a significant contribution in terms of methodology.

  • There is no real discussion of new insights or limitations of the method. My main takeaway from the results is that (a) the method performs well empirically, but (b) I do not gain any new insight beyond the fact that the authors got the idea to work as well as, or better than, what currently exists for protein property prediction.

Questions

  • Beyond being novel at the moment, I do not see a real justification for using Mamba/SSMs. To me, their main advantage is that they are very fast at processing very long sequences, which does not seem to matter for the current application. What is the rationale for using Mamba? Could I not simply drop in an LSTM or Transformer as a replacement?
Official Review (Rating: 5)

The paper aims to effectively leverage the knowledge embedded in pre-trained Large Protein Models (LPMs) for downstream tasks. The authors argue that the key to utilizing this knowledge lies in integrating protein information at multiple levels and in multi-scale hierarchical prediction over that information. By implementing these ideas, the authors achieve competitive results across multiple downstream tasks. The approach offers valuable insight into the cost-effective use of existing large protein models.

Strengths

  1. The paper is well-written, with a clear narrative that effectively explains the motivation and details of the proposed methods.
  2. The study involves extensive experiments across over 20 tasks, demonstrating the competitiveness of ProteinAdapter compared to state-of-the-art approaches.

Weaknesses

  • In the introduction, the authors highlight two challenges. However, the explanation would benefit from more detail about how these issues manifest, along with supporting citations. Given that joint sequence-structure pretraining is now common in large protein models, it would be helpful to clarify whether these challenges still persist in that setting.
  • In the related work section, the authors briefly outline the advantages of Mamba, but neither the introduction nor the methods section discusses why Mamba was chosen and how it effectively supports the proposed work; the paper directly incorporates multiple Mamba modules without further explanation.
  • The experimental design could be more transparent. In Tables 1-6, there is no clear explanation of how the baseline methods were implemented, nor whether they were fine-tuned on the same task datasets as the proposed method. This lack of detail makes it difficult to assess the fairness of the comparisons.
  • In Table 1, the performance of ProteinAdapter is quite similar to that of ESM-1b, making it hard to demonstrate a clear improvement. If the gains are not substantial, it might be more practical to use the features from large protein models directly and avoid the complexity of extensive downstream training.
  • In Table 4, the authors state that the multi-task experiment adds an auxiliary task during training but tests only on the main task. Multi-task experiments, however, typically evaluate performance across all tasks to understand the overall impact of the auxiliary task. In addition, since the proposed method shows limited improvement over ESM-1b, it is difficult to claim that ProteinAdapter is significantly differentiated from the LPMs themselves.
  • In Table 5's (3+1)D experiment, the comparison with the SaProt method from Table 2 is omitted, and the reason for this exclusion is not clarified in the paper.
  • In Table 6, the ablation study could be more comprehensive. The multi-scale prediction head should first be compared with a Transformer within the same framework, followed by a comparison with an MLP, to assess its relative advantages. Comparisons with other representations, such as pooling, would further demonstrate the effectiveness of the multi-scale approach.
  • Writing issues: the background discussion in the introduction (lines 30-33) lacks citations, which may make it hard for general readers to appreciate the context and significance of the work; Mamba is mentioned multiple times before it is first cited at line 120, so the citation should appear earlier; and SaProt appears directly in the comparison tables without any description in the main text, so a brief introduction to SaProt would improve clarity.

Questions

  1. The citations for protein sequence models are somewhat outdated, primarily referencing work from 3-4 years ago, while the protein structure models cited are from 2 years ago. It may be beneficial to consider incorporating more recent studies as feature extractors to reflect the current state of the field.
  2. Regarding the multi-scale predictor, the authors mention the presence of various substructures and functional regions in proteins but do not explain how the multi-scale prediction module addresses these aspects or the rationale behind its design. Clarifying these points in the text would provide readers with a better understanding of the module’s effectiveness and underlying principles.

Ethics Review Details

N/A

Withdrawal Notice

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.