ICLR 2025 (withdrawn)
Overall rating: 4.0 / 10 from 4 reviewers (scores 3, 3, 5, 5; min 3, max 5, std 1.0)
Confidence: 4.0 | Correctness: 2.8 | Contribution: 2.0 | Presentation: 2.0

Benchmark on Drug Target Interaction Modeling from a Structure Perspective

Submitted: 2024-09-27 · Updated: 2025-01-06
TL;DR

A benchmark for drug-target interaction modeling from a structure perspective, covering GNN-based and Transformer-based methods


Keywords
drug-target interaction, benchmark, transformer, GNN

Reviews and Discussion

Official Review (Rating: 3)

This paper provides a comprehensive benchmark on Drug-Target Interaction (DTI) methods. Different methods utilizing GNN encoders and Transformer-based encoders for molecular representation are tested. During the benchmark, the paper offers observations and insights regarding model structure selection and featurization methods. Based on these observations, the authors propose a model combination approach that integrates both GNN and Transformer encoders to achieve a better DTI model.

Strengths

  1. As a benchmark paper, the experiments are notably comprehensive and rich. A large number of models are tested on diverse datasets, yielding many observations and insights.
  2. Building on these observations, the authors propose a model combination approach, and it demonstrates improved results.

Weaknesses

  1. The authors separate molecular models into GNN-based models and Transformer-based models, which seems slightly misleading. It would be more appropriate to categorize them by graph-based inputs versus SMILES-based inputs, since graph inputs can also be fed to Transformer models, and SMILES inputs can be used with model types beyond Transformers. Additionally, some models use both SMILES and graph inputs, and these are not addressed in the paper. (A minimal sketch contrasting the two input routes appears after this list.)
  2. The reasoning behind using the same hyperparameters for a fair comparison of different models is unclear. Each model has its own optimal hyperparameters, so identical configurations may not yield the fairest comparison. It would be more meaningful to perform a hyperparameter search for each model and use each model's optimal set.
  3. If a pre-trained protein encoder such as ESM2 is used, then pre-trained molecular encoders should also be considered and benchmarked, since pre-trained encoders are widely known to improve performance on property prediction tasks. If the benchmark's main focus is encoders and representations, pre-trained methods should be discussed more thoroughly; otherwise, the focus should shift to how the molecular and protein embeddings interact after being encoded separately, rather than only emphasizing their individual representations.
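To make point 1 concrete, below is a minimal sketch of the two input routes for the same molecule, using RDKit for the graph route. The character-level tokenizer is a deliberate simplification (real SMILES tokenizers handle multi-character atoms such as 'Cl'), and none of these names come from the paper:

```python
from rdkit import Chem

def smiles_to_graph(smiles: str):
    """Graph route: atoms become nodes, bonds become edges."""
    mol = Chem.MolFromSmiles(smiles)
    # Node features: atomic number per atom (real models use richer features).
    node_feats = [atom.GetAtomicNum() for atom in mol.GetAtoms()]
    # Undirected edges stored as (src, dst) pairs, one per bond direction.
    edges = []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        edges += [(i, j), (j, i)]
    return node_feats, edges

def smiles_to_tokens(smiles: str, vocab: dict):
    """Sequence route: character-level tokenization (a simplification)."""
    return [vocab.setdefault(ch, len(vocab)) for ch in smiles]

vocab = {}
graph = smiles_to_graph("CCO")            # ethanol as a graph
tokens = smiles_to_tokens("CCO", vocab)   # ethanol as a token sequence
```

Either output can in principle feed either architecture family: the graph into a graph Transformer, or the token sequence into a CNN or RNN, which is why input modality and encoder type are orthogonal axes.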

Questions

See Weaknesses.

Official Review (Rating: 3)

This article presents a comprehensive benchmark study of deep learning-based approaches for predicting drug-target interactions (DTIs) and proposes a best-performing combination of models. In the early stages of drug discovery, identifying compounds that interact with specific targets within large chemical databases is time-consuming and costly, so accurately and efficiently predicting DTIs can significantly accelerate drug development and reduce costs. The study focuses on two major deep learning approaches to DTI prediction, Graph Neural Networks (GNNs) and Transformers, comparing the performance of these two model families and evaluating over thirty recently published models across six datasets. The authors provide readers an overview of current deep learning-based DTI prediction methods.

Strengths

  1. This work studies the prediction of drug-target interactions (DTIs), which is of significant importance to the drug discovery process.

  2. The authors utilized six datasets and evaluated over thirty models based on graph neural networks or Transformers, making this a large-scale benchmark of DTI prediction models.

Weaknesses

  1. The article's title states that this study takes a structure perspective. Structure-based approaches to drug-target interaction rely on the three-dimensional structures of chemicals and proteins. However, some of the models use chemical SMILES strings, RDKit descriptors, ECFP, or the amino-acid sequences of proteins, and these cannot be considered a structure perspective.

  2. The background section introduces different graph neural networks (GNN, GCN, GIN, GAT) and Transformer models with self-attention and cross-attention mechanisms. However, in the macroscopic benchmark section, it is unclear why the authors only employed GCN, CNN, the vanilla Transformer, and ESM2 in the tests. Also, based on much published data and our own tests, ESM2 generally encodes protein representations better than a vanilla Transformer; the authors could discuss in more detail why the vanilla Transformer performs better than ESM2 here (see the ESM2 embedding sketch after this list).

  3. Regarding the hyperparameter-unification section: why were these four prediction models chosen for the hyperparameter comparison? Moreover, GraphDTA versus GraphCPI and MRBTA versus TransformerCPI differ in model architecture and in their focus on protein-compound interaction prediction, so is this hyperparameter unification necessary?

  4. Dataset usage: among the six datasets, the article states that the C. elegans dataset is taken from Tsubaki et al., 2018. That dataset was built for protein-compound interaction studies, not drug-target interactions per se. Also, the Davis and KIBA datasets focus only on kinases; even though kinases are among the primary drug targets, these two datasets may not represent drug-target interactions in general.

  5. In the microscopic benchmark section, it would be better to clearly classify which types of GNNs or Transformers each prediction model in Tables 2 and 3 uses. Some are labeled as using GIN, GCN, etc., but labeling all of them would give readers intuitive information when comparing the different models' performance.

  6. Since the models' memory usage and running time are also compared, it would be helpful to show more details of the computing systems, such as hardware, storage, and Python version, rather than just "RTX 3090 GPU".

  7. Regarding the proposed model combos: Figure 5 shows that the model uses a CNN for protein embedding and a Graph Encoder plus a CNN for drug embedding, and the Graph Encoder part also integrates the Transformer model (e.g., its self-attention mechanism). It would be better to show more clearly how this combination works, rather than only presenting an attention formula. It is also unclear how the combos handle classification and regression separately (what are the differences?), since model performance results appear in both Table 2 (regression task) and Table 3 (classification task).
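To ground point 2, the following shows how per-protein ESM2 embeddings are typically extracted with the public fair-esm package. The checkpoint choice (650M variant) is an assumption, since the paper's exact ESM2 variant is not specified in this review:

```python
import torch
import esm  # pip install fair-esm

# Load a pre-trained ESM2 model; the 650M checkpoint is an assumed choice.
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

data = [("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQL")]
_, _, tokens = batch_converter(data)

with torch.no_grad():
    out = model(tokens, repr_layers=[33])

# Per-residue embeddings from the final layer; mean-pool over residues
# (excluding BOS/EOS tokens) for a fixed-size protein representation.
reps = out["representations"][33]
protein_emb = reps[0, 1 : len(data[0][1]) + 1].mean(dim=0)
```

Whether such pooled embeddings or the full per-residue matrix is fed to the interaction module can itself change the ESM2-versus-vanilla-Transformer comparison, which is another detail worth reporting.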

Questions

  1. Explain the reasoning for unifying the hyperparameter configuration, and why those four prediction models were selected. (A sketch of the per-model alternative appears after this list.)

  2. For both the macroscopic and microscopic sections, illustrate more clearly why those particular models were selected for the benchmarking study.

  3. Try one or two more general drug-target interaction datasets with the prediction models, rather than focusing on a single type of target.

  4. Add more details (including model architecture, mechanism, math, etc.) about the DTI prediction combos.
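As a concrete alternative to unified hyperparameters (question 1), a per-model search could look like the following sketch. The parameter names, ranges, and validation stub are hypothetical, not taken from the paper:

```python
import itertools
import random

# Hypothetical per-model search space; names and ranges are illustrative.
search_space = {
    "lr": [1e-4, 5e-4, 1e-3],
    "hidden_dim": [128, 256],
    "dropout": [0.1, 0.3],
}

def validate(model_name: str, config: dict) -> float:
    """Stand-in for training `model_name` with `config` and returning
    validation MSE; replace with a real train/eval loop."""
    return random.random()

def best_config(model_name: str) -> dict:
    """Exhaustive grid search; returns the config with lowest validation score."""
    keys = list(search_space)
    candidates = [dict(zip(keys, vals))
                  for vals in itertools.product(*search_space.values())]
    return min(candidates, key=lambda cfg: validate(model_name, cfg))

# Each model then gets its own optimum instead of one shared configuration:
for name in ["GraphDTA", "TransformerCPI"]:
    print(name, best_config(name))
```

Reporting both the unified-configuration and per-model-optimum results would let readers see how much of the ranking is an artifact of the unification.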

Official Review (Rating: 5)

In this work, a comprehensive survey and benchmark of drug-target interaction modeling from a structure perspective is conducted, integrating tens of GNN-based and Transformer-based structure learning methods. A large number of macroscopic comparisons between these two classes of encoding methods and the associated feature engineering techniques are considered. In addition, comprehensive benchmarks are conducted, with experimental results leading to an optimal design of model combinations.

Strengths

S1. The benchmark integrates an extensive set of models and compares them.

S2. Extensive analysis of network architectures has identified the optimal architecture combination, achieving state-of-the-art performance.

Weaknesses

W1. Lack of insights: In my view, a key feature of a benchmark should be its ability to provide insights. For example, what unique characteristics does the data have for the DTI task, and why? This article, however, does not seem to make a substantial contribution toward advancing the field or pointing out essential directions for future development.

Additionally, the classification approach used in this paper lacks novelty: categorizing methods into CNN-based and transformer-based is straightforward and unoriginal, and past related works have categorized methods this way. Can this field really progress continuously by relying solely on network-architecture improvements and summaries? Are there techniques that could help capture DTI better? What are the underlying reasons for this capture: differences between networks, such as GNNs versus transformers, or specific components within networks that better suit the features of DTI tasks and data flows? These questions are difficult to answer from the paper, which thus presents a large collection of experiments without providing inspiring conclusions.

Finally, I believe the introduction of transformers and GNNs is overly detailed. For readers of a top-tier machine learning conference, such extensive descriptions of these network architectures are unnecessary. Abstracting the DTI task into a machine learning problem is not particularly challenging, so the paper should focus more on analyzing experimental results and providing insights into the field's future development.

W2. Insufficiency of related works: Although this work includes many related methods for comparison, it still misses some recent approaches, such as [1] and [2]. Incorporating these methods into the discussion and comparison would make the experiments more comprehensive.

[1] Nguyen, Ngoc-Quang, Jang, Gwanghoon, Kim, Hajung, Kang, Jaewoo. Perceiver CPI: A nested cross-attention network for compound–protein interaction prediction.

[2] Wu, Lirong, Huang, Yufei, Tan, Cheng, Gao, Zhangyang, Hu, Bozhen, Lin, Haitao, Liu, Zicheng, Li, Stan Z. PSC-CPI: Multi-scale protein sequence-structure contrasting for efficient and generalizable compound-protein interaction prediction.

Questions

I have raised all my doubts in the Weaknesses; please refer to W1 and W2.

Official Review (Rating: 5)

This paper presents a comprehensive benchmark study of drug-target interaction (DTI) prediction methods, focusing specifically on structural modeling approaches. The authors evaluate over 30 different models across GNN-based and Transformer-based architectures on six standard datasets, examining both classification and regression tasks. The work unifies hyperparameter settings, conducts extensive empirical comparisons, and proposes model combinations that achieve competitive performance. The core contributions include: (1) A systematic evaluation framework with unified settings; (2) Detailed analysis of different structural encoding and featurization strategies; (3) Comprehensive empirical comparisons of model effectiveness and efficiency; (4) Novel insights leading to improved model combinations.

Strengths

The technical depth demonstrated throughout the work is exemplary. The authors present a thorough examination of structural encoding strategies, carefully distinguishing between explicit and implicit approaches. Their meticulous investigation of feature engineering methodologies and their subsequent impact on model performance offers valuable insights for the field. Furthermore, the authors' careful consideration of computational efficiency and memory utilization demonstrates a pragmatic understanding of real-world implementation challenges.

Weaknesses

Despite its contributions, several aspects of the study warrant further consideration. The methodology, while comprehensive, raises important questions about potential biases introduced through the hyperparameter unification process. This standardization, while necessary for fair comparison, may inadvertently disadvantage certain model architectures whose optimal performance relies on more specific parameter configurations.

The experimental framework would benefit from additional depth in several areas. The absence of comprehensive cross-validation results for certain model configurations limits our understanding of model stability and generalizability. Moreover, the limited exploration of model behavior in edge cases and the lack of detailed failure mode analysis leave some uncertainty about the robustness of the proposed approaches.

From a technical perspective, the choice of baseline feature sets requires stronger theoretical justification. The model combination strategy, while empirically effective, would benefit from a more rigorous theoretical foundation. Additionally, the memory usage analysis could be enhanced by considering diverse hardware configurations, providing more comprehensive guidance for practical implementation.
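On the memory-analysis point, a minimal sketch of how per-model peak GPU memory could be reported reproducibly follows. This assumes a PyTorch workflow and is not the paper's actual measurement harness:

```python
import torch

def peak_memory_mb(model: torch.nn.Module, batch: torch.Tensor) -> float:
    """Peak GPU memory (MB) for one forward/backward pass.

    Results vary across hardware and CUDA/driver versions, which is why
    reporting the full configuration alongside the numbers matters.
    """
    device = torch.device("cuda")
    model = model.to(device)
    batch = batch.to(device)
    torch.cuda.reset_peak_memory_stats(device)
    out = model(batch)
    out.sum().backward()  # include backward-pass activations in the peak
    return torch.cuda.max_memory_allocated(device) / 1024**2
```

Repeating such a measurement on two or three GPU classes would turn the single-device numbers into the kind of hardware-aware guidance suggested above.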

Questions

Methodological Considerations:

The sensitivity of results to the chosen hyperparameter unification strategy represents a fundamental concern. How do the proposed model combinations handle previously unseen molecular structures, and what criteria guided the selection of train/test splits? These aspects are crucial for understanding the robustness and generalizability of the findings.
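On unseen structures and split criteria, one common reference point is a scaffold split, which tests generalization to molecules whose core scaffolds never appear in training. A minimal sketch, assuming RDKit and not necessarily matching the paper's actual splits:

```python
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, test_frac=0.2):
    """Group molecules by Bemis-Murcko scaffold so that scaffolds in the
    test set never appear in training (stricter than a random split)."""
    groups = defaultdict(list)
    for idx, smi in enumerate(smiles_list):
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(smiles=smi)
        groups[scaffold].append(idx)
    # Fill the training set with the largest scaffold groups first;
    # the remaining (rarer) scaffolds form the test set.
    ordered = sorted(groups.values(), key=len, reverse=True)
    n_train = int((1 - test_frac) * len(smiles_list))
    train, test = [], []
    for group in ordered:
        (train if len(train) + len(group) <= n_train else test).extend(group)
    return train, test

# Toy example: indices of molecules assigned to each side of the split.
train_idx, test_idx = scaffold_split(["CCO", "c1ccccc1O", "c1ccccc1N", "CCN"])
```

Reporting results under both random and scaffold splits would directly answer the unseen-structure question raised here.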

Technical Implementation:

The cross-attention mechanism implemented in the model combinations deserves deeper examination. Specifically, how were the evaluation metrics weighted in determining overall performance, and what steps ensure the reproducibility of these results? The relationship between molecular featurization choices and model robustness also merits further investigation.
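For reference, a generic drug-protein cross-attention block of the kind such model combinations typically use is sketched below in PyTorch. This is an illustrative assumption, not the paper's exact module:

```python
import torch
import torch.nn as nn

class DrugProteinCrossAttention(nn.Module):
    """Protein residues attend to drug atoms (a generic sketch; the
    paper's actual combination module is not specified here)."""
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, protein: torch.Tensor, drug: torch.Tensor):
        # protein: (B, L_res, dim) from a CNN/ESM2 protein encoder
        # drug:    (B, L_atom, dim) from a GNN/Transformer drug encoder
        fused, weights = self.attn(query=protein, key=drug, value=drug)
        return fused, weights  # weights expose residue-atom attention

# Example: one protein of 200 residues attending to one drug of 30 atoms.
layer = DrugProteinCrossAttention()
fused, w = layer(torch.randn(1, 200, 128), torch.randn(1, 30, 128))
```

Inspecting the returned attention weights on held-out pairs is also one inexpensive way to address the reproducibility and robustness questions above.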

Ethics Review Details

None.

Withdrawal Notice

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.