Enhancing Bioactivity Prediction via Spatial Emptiness Representation of Protein-ligand Complex and Union of Multiple Pockets
Summary
Reviews and Discussion
This paper proposes a method called GeoREC to address the bioactivity prediction problem. GeoREC quantifies the spatial emptiness around each atom within a protein-ligand complex, and introduces two key innovations: (1) the Union-Pocket, which aggregates multiple binding pockets to capture global interaction context, and (2) a pairwise loss function, which preserves relative ordering among predictions. Together, these enhancements improve the accuracy and selectivity of affinity prediction.
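To make the idea of a pairwise loss that preserves relative ordering concrete, here is a minimal sketch. It is a generic logistic pairwise ranking term added to a mean-squared-error term; the function names, the logistic form, and the `weight` parameter are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def pairwise_ranking_loss(pred, target):
    """Generic logistic ranking loss over all pairs with different targets.

    Hypothetical illustration, not the loss defined in the paper.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    diff_pred = pred[:, None] - pred[None, :]
    order = np.sign(target[:, None] - target[None, :])
    mask = order != 0  # ignore pairs with identical targets
    if not mask.any():
        return 0.0
    # log(1 + exp(-order * diff_pred)) is small when the pair is ordered correctly
    return float(np.logaddexp(0.0, -order[mask] * diff_pred[mask]).mean())


def combined_loss(pred, target, weight=1.0):
    """Regression (MSE) term plus a weighted pairwise ranking term."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = float(np.mean((pred - target) ** 2))
    return mse + weight * pairwise_ranking_loss(pred, target)
```

Predictions in the correct order receive a smaller ranking term than predictions in reversed order, even when both have the same squared error.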
Strengths and Weaknesses
Strengths:
- The proposed augmentation strategy consistently improves performance across various baselines, including DTIGN, GIGN, and GAT.
- The pair-wise loss complements the regression loss by enhancing the selectivity for specific ligands and mitigating the effects of noise in affinity values.
Weaknesses:
- What impact do different aggregation strategies have on the final performance?
- Why does this method perform well on the SIU dataset, which contains complexes with a single binding pocket and a single ligand?
- How does the method perform on the LBA (Ligand Binding Affinity) task?
- If a ligand can bind to multiple positions in a protein with different conformations, how can multiple complex conformations be modeled? Does the method support using multiple complex conformations as input?
Questions
See Weaknesses.
Limitations
This prediction method relies on the complex conformation structure. If the docking method generates inaccurate complex conformations, it may negatively affect the performance. Moreover, the method lacks analysis and discussion regarding this limitation.
Final Justification
Thank you for the authors’ rebuttal. Most of my concerns have been addressed. My remaining concern is that the authors only compare the improvements against certain methods, without including comparisons to other strong binding affinity baselines. Nevertheless, I am happy to raise my score.
Formatting Issues
NA
Thank you for your comments. We are grateful for your acknowledgment that our method has achieved substantial improvements on multiple benchmarks. Here we would like to respond to the comments.
In response to Weakness 1
As discussed in Eq. 5 of the main text, we aggregate multiple pocket–ligand pose embeddings to improve the accuracy of bioactivity prediction. In our current experiments, all models use mean aggregation unless otherwise specified. The only exception is DTIGN, which by default employs its own attention-based aggregation strategy.
To analyze the effect of different aggregation strategies, we conducted controlled comparisons where both the baseline DTIGN and our enhanced DTIGN were equipped with the same aggregation function—either mean, sum, or attention. As in our other experiments, this comparison ensures that the observed improvements stem from our method rather than differences in aggregation. The results on three datasets (I1, I2, E1) show that our method consistently improves performance across all aggregation strategies, though the extent of improvement varies. In all settings, our enhanced DTIGN outperforms the original DTIGN with the same aggregation function:
- With mean aggregation, our method achieves an average RMSE reduction of 9.62%, along with improvements of 26.43% in Pearson's $r$ and 25.42% in Kendall's $\tau$.
- With sum aggregation, improvements are more pronounced: a 14.26% reduction in RMSE, and increases of 32.44% in $r$ and 46.54% in $\tau$.
- With attention-based aggregation, the enhanced model achieves 12.16% lower RMSE and the largest gains in correlation metrics: 41.54% in $r$ and 52.46% in $\tau$.
These results demonstrate that our method is robust to different aggregation choices, enhancing the baseline model regardless of the specific strategy used.
| Dataset / Aggregation | Method | RMSE(↓) | Pearson(↑) | Tau(↑) |
|---|---|---|---|---|
| DTIGN I1 / Mean | DTIGN | 1.1971 | 0.4431 | 0.3307 |
| | DTIGN Enhanced | 1.0183 | 0.6352 | 0.4356 |
| DTIGN I2 / Mean | DTIGN | 0.7863 | 0.6760 | 0.4340 |
| | DTIGN Enhanced | 0.6781 | 0.7786 | 0.5453 |
| DTIGN E1 / Mean | DTIGN | 0.8817 | 0.4121 | 0.3166 |
| | DTIGN Enhanced | 0.8803 | 0.4976 | 0.3764 |
| Overall / Mean | Avg. imp(%) | 9.62% | 26.43% | 25.43% |
| DTIGN I1 / Sum | DTIGN | 1.1441 | 0.4708 | 0.2947 |
| | DTIGN Enhanced | 0.9781 | 0.6744 | 0.4872 |
| DTIGN I2 / Sum | DTIGN | 0.7466 | 0.7186 | 0.5044 |
| | DTIGN Enhanced | 0.6170 | 0.8164 | 0.5932 |
| DTIGN E1 / Sum | DTIGN | 0.9313 | 0.3782 | 0.2620 |
| | DTIGN Enhanced | 0.8298 | 0.5312 | 0.4105 |
| Overall / Sum | Avg. imp(%) | 14.26% | 32.44% | 46.54% |
| DTIGN I1 / Attn | DTIGN | 1.1977 | 0.3547 | 0.2445 |
| | DTIGN Enhanced | 1.0364 | 0.5895 | 0.4239 |
| DTIGN I2 / Attn | DTIGN | 0.7952 | 0.7128 | 0.4922 |
| | DTIGN Enhanced | 0.6507 | 0.7888 | 0.5454 |
| DTIGN E1 / Attn | DTIGN | 0.9086 | 0.3363 | 0.2139 |
| | DTIGN Enhanced | 0.8645 | 0.4969 | 0.3705 |
| Overall / Attn | Avg. imp(%) | 12.16% | 41.54% | 52.46% |
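The "Avg. imp(%)" rows can be checked by hand. The sketch below assumes that the average improvement is the mean of the per-dataset relative changes (relative reduction for RMSE, relative increase for the correlation metrics); the thread does not state the formula, but this assumption appears to reproduce the reported mean-aggregation figures. The function name is illustrative.

```python
def avg_improvement(baseline, enhanced, lower_is_better):
    """Mean of per-dataset relative changes, in percent.

    Assumes positive baseline values, which holds for the rows used here.
    """
    changes = []
    for b, e in zip(baseline, enhanced):
        changes.append((b - e) / b if lower_is_better else (e - b) / b)
    return 100.0 * sum(changes) / len(changes)


# Mean-aggregation rows (I1, I2, E1) copied from the table above.
rmse_imp = avg_improvement([1.1971, 0.7863, 0.8817], [1.0183, 0.6781, 0.8803], True)
pearson_imp = avg_improvement([0.4431, 0.6760, 0.4121], [0.6352, 0.7786, 0.4976], False)
tau_imp = avg_improvement([0.3307, 0.4340, 0.3166], [0.4356, 0.5453, 0.3764], False)

print(f"RMSE: {rmse_imp:.2f}%  Pearson: {pearson_imp:.2f}%  Tau: {tau_imp:.2f}%")
```

Because the average is taken over relative changes, a dataset with a weak baseline (such as I1) contributes more to the headline number than one where the baseline is already strong.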
In response to Weakness 2
As shown in Figure 2c (Page 4) of [1], the SIU dataset contains a total of 249,631 pocket-ligand pairs across 9,544 protein pockets, indicating an average of approximately 26 ligands per pocket.
Although the SIU paper [1] states in the "Pocket definition" section (Page 19) that "For each PDB ID, a single pocket was extracted, defined as the region centered on the co-crystal ligand within a 15 Å radius", the datasets they released via their Hugging Face repository (linked from their GitHub page, which is referenced in the paper's abstract) contain only local pockets surrounding the individual ligand poses. In other words, the single pocket they used for docking was not fully utilized when constructing the pocket–ligand graphs in their released dataset. In our implementation, for each protein conformation, we use the predefined global pocket (the one docked by all of its associated ligands) as the Union-Pocket. Combined with GeoREC and our pairwise contrastive loss, this lets our method leverage richer spatial and geometric context. As a result, it also performs well on the SIU dataset.
In response to Weakness 3
Ligand binding affinity is typically quantified by the dissociation constant (Kd), which is included in the SIU dataset. To evaluate our method’s performance on this task, we analyzed the Kd and Ki prediction results across both SIU0.9 and SIU0.6 subsets. From the results summarized below, we observe that out of the 48 performance metrics reported across all models and datasets, only 5 metrics do not show improvement with our method. All five are correlation coefficients (Pearson and Spearman) on the SIU0.6 / Kd dataset, which corresponds to the LBA task where the training and test proteins share less than 60% sequence identity. This suggests that while our method is tailored to improve bioactivity prediction, it may be less effective at enhancing correlation-based metrics for ligand binding affinity on highly dissimilar proteins. Nevertheless, our method consistently improves RMSE and MAE on these more challenging cases, and achieves clear improvements across all four LBA metrics (RMSE, MAE, Pearson, Spearman) on the SIU0.9 datasets, where proteins in the training and test sets are more similar.
These results demonstrate that our approach effectively enhances bioactivity modeling and LBA prediction on proteins with moderate to high similarity, and still offers tangible benefits—particularly in reducing absolute error—even on dissimilar protein targets.
| Dataset / Label | Method | RMSE(↓) | MAE(↓) | Pearson(↑) | Spearman(↑) |
|---|---|---|---|---|---|
| SIU0.9 / Kd | Unimol | 1.364 | 1.141 | -0.033 | -0.082 |
| | DTIGN | 1.839 | 1.490 | -0.001 | -0.042 |
| | DTIGN enhanced | 1.304 | 1.060 | 0.321 | 0.326 |
| | GIGN | 1.708 | 1.367 | 0.070 | 0.038 |
| | GIGN enhanced | 1.455 | 1.139 | 0.296 | 0.261 |
| | GAT | 1.545 | 1.240 | 0.092 | 0.082 |
| | GAT enhanced | 1.473 | 1.166 | 0.261 | 0.254 |
| SIU0.9 / Ki | Unimol | 1.235 | 1.017 | 0.485 | 0.452 |
| | DTIGN | 1.607 | 1.276 | 0.360 | 0.329 |
| | DTIGN enhanced | 1.296 | 1.054 | 0.485 | 0.441 |
| | GIGN | 1.597 | 1.337 | 0.223 | 0.167 |
| | GIGN enhanced | 1.487 | 1.240 | 0.371 | 0.338 |
| | GAT | 1.706 | 1.386 | 0.301 | 0.262 |
| | GAT enhanced | 1.625 | 1.339 | 0.316 | 0.294 |
| SIU0.6 / Kd | Unimol | 1.389 | 1.192 | -0.149 | -0.206 |
| | GIGN | 1.371 | 1.115 | 0.265 | 0.281 |
| | GIGN enhanced | 1.326 | 1.078 | 0.280 | 0.227 |
| | DTIGN | 1.349 | 1.079 | 0.329 | 0.329 |
| | DTIGN enhanced | 1.332 | 1.069 | 0.327 | 0.248 |
| | GAT | 1.521 | 1.285 | 0.233 | 0.157 |
| | GAT enhanced | 1.424 | 1.182 | 0.107 | 0.133 |
| SIU0.6 / Ki | Unimol | 1.255 | 1.034 | 0.472 | 0.452 |
| | GIGN | 1.789 | 1.503 | 0.225 | 0.230 |
| | GIGN enhanced | 1.404 | 1.165 | 0.498 | 0.463 |
| | DTIGN | 1.993 | 1.653 | 0.123 | 0.091 |
| | DTIGN enhanced | 1.321 | 1.079 | 0.472 | 0.452 |
| | GAT | 1.976 | 1.690 | -0.060 | -0.096 |
| | GAT enhanced | 1.694 | 1.381 | 0.303 | 0.263 |
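The count of improved metrics quoted above (48 metrics, 5 without improvement) can be reproduced directly from the table. The sketch below copies the baseline and enhanced rows for the three enhanced models and counts strict improvements; the data layout and function name are illustrative choices, and Unimol is excluded because it has no enhanced counterpart.

```python
# (baseline, enhanced) rows from the SIU table above.
# Column order: RMSE, MAE, Pearson, Spearman.
RESULTS = {
    ("SIU0.9/Kd", "DTIGN"): ((1.839, 1.490, -0.001, -0.042), (1.304, 1.060, 0.321, 0.326)),
    ("SIU0.9/Kd", "GIGN"): ((1.708, 1.367, 0.070, 0.038), (1.455, 1.139, 0.296, 0.261)),
    ("SIU0.9/Kd", "GAT"): ((1.545, 1.240, 0.092, 0.082), (1.473, 1.166, 0.261, 0.254)),
    ("SIU0.9/Ki", "DTIGN"): ((1.607, 1.276, 0.360, 0.329), (1.296, 1.054, 0.485, 0.441)),
    ("SIU0.9/Ki", "GIGN"): ((1.597, 1.337, 0.223, 0.167), (1.487, 1.240, 0.371, 0.338)),
    ("SIU0.9/Ki", "GAT"): ((1.706, 1.386, 0.301, 0.262), (1.625, 1.339, 0.316, 0.294)),
    ("SIU0.6/Kd", "GIGN"): ((1.371, 1.115, 0.265, 0.281), (1.326, 1.078, 0.280, 0.227)),
    ("SIU0.6/Kd", "DTIGN"): ((1.349, 1.079, 0.329, 0.329), (1.332, 1.069, 0.327, 0.248)),
    ("SIU0.6/Kd", "GAT"): ((1.521, 1.285, 0.233, 0.157), (1.424, 1.182, 0.107, 0.133)),
    ("SIU0.6/Ki", "GIGN"): ((1.789, 1.503, 0.225, 0.230), (1.404, 1.165, 0.498, 0.463)),
    ("SIU0.6/Ki", "DTIGN"): ((1.993, 1.653, 0.123, 0.091), (1.321, 1.079, 0.472, 0.452)),
    ("SIU0.6/Ki", "GAT"): ((1.976, 1.690, -0.060, -0.096), (1.694, 1.381, 0.303, 0.263)),
}
METRICS = ("RMSE", "MAE", "Pearson", "Spearman")
LOWER_IS_BETTER = (True, True, False, False)


def count_improved(results):
    """Return (number improved, total, list of non-improved entries)."""
    improved, total, not_improved = 0, 0, []
    for (dataset, model), (base, enh) in results.items():
        for name, lower, b, e in zip(METRICS, LOWER_IS_BETTER, base, enh):
            total += 1
            if (e < b) if lower else (e > b):
                improved += 1
            else:
                not_improved.append((dataset, model, name))
    return improved, total, not_improved


improved, total, not_improved = count_improved(RESULTS)
print(improved, total, not_improved)
```

Listing the non-improved entries rather than only counting them makes it easy to confirm that they all fall in the SIU0.6 / Kd split.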
In response to Weakness 4
In our current implementation, multiple complex conformations are explicitly modeled by processing each conformation independently through the graph neural network to obtain individual graph-level embeddings. These embeddings are then aggregated—using either mean or attention-based pooling—before being passed to the bioactivity prediction network (typically a multi-layer perceptron). In this way, our method does support multiple complex conformations as input while leveraging shared network weights to process each conformation consistently.
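The aggregation step described above can be sketched as follows. This is a minimal NumPy illustration, assuming each conformation has already been encoded into a fixed-size graph-level embedding by a shared network; the function names and the dot-product attention with a `query` vector are assumptions, not the authors' implementation.

```python
import numpy as np


def mean_pool(embeddings):
    """Average the per-conformation embeddings (shape: [n_conformations, dim])."""
    return np.asarray(embeddings, dtype=float).mean(axis=0)


def attention_pool(embeddings, query):
    """Softmax-weighted average of embeddings using a (hypothetical) query vector."""
    emb = np.asarray(embeddings, dtype=float)
    scores = emb @ np.asarray(query, dtype=float)
    scores = scores - scores.max()  # for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()
    return weights @ emb, weights


# Two conformations of the same complex, each with a 2-dimensional embedding.
conformation_embeddings = [[1.0, 0.0], [3.0, 4.0]]
pooled_mean = mean_pool(conformation_embeddings)
pooled_attn, attn_weights = attention_pool(conformation_embeddings, query=[0.0, 0.0])
```

With an all-zero query every conformation gets the same weight, so attention pooling reduces to mean pooling; a learned query would instead let the model favor some poses over others.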
In response to Limitations
We fully acknowledge that inaccurate complex conformations generated by docking methods can negatively affect model performance. To clarify this limitation, we will add the following note to the appendix:
Limitations of Docking-Based Structures:
Docking methods can generate inaccurate complex conformations, which may impair the performance of models trained on docked structures. Users should be aware of this potential issue and are encouraged to use biologically plausible docking poses obtained from well-established software tools. In the DTIGN dataset, ligand poses were generated using AutoDock Vina [2], and the top-ranked poses were selected for model training (as stated in the “Docking method” section on Page 2 and the footnote of Table 1 in [3]). In the SIU dataset, docking was performed using AutoDock Vina [2], Glide [4], and GOLD [5], with a voting strategy applied to select representative poses (as described in the “Structural data construction via multi-software docking” section on Page 5 of [1]).
References
[1] Redefining the task of Bioactivity Prediction, ICLR, 2025
[2] AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading, Journal of Computational Chemistry, 2010
[3] Advancing bioactivity prediction through molecular docking and self-attention, JBHI, 2024
[4] Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy, Journal of Medicinal Chemistry, 2004
[5] Improved protein–ligand docking using GOLD, Proteins: Structure, Function, and Bioinformatics, 2003
Dear Reviewer b5PK,
Thank you once again for your valuable comments on our submission. As the discussion phase is approaching its end, we would like to kindly confirm whether we have sufficiently addressed all of your concerns. Should there be any remaining questions or areas requiring further clarification, please do not hesitate to let us know. If you are satisfied with our responses, we would greatly appreciate your consideration in adjusting the evaluation scores accordingly. We sincerely look forward to your feedback.
Thanks for the rebuttal. I still have the following concerns:
Regarding Weakness 4: The aggregation of global embeddings from different conformations appears too simplistic and could be trivially applied to any baseline method. It seems disconnected from the core methodology proposed in the paper.
Regarding Weakness 3: The LBA benchmark I referred to is from [1], not the alternative metric mentioned in the SIU.
[1] ATOM3D: Tasks On Molecules in Three Dimensions
Thank you for your comment. Here is our response.
Weakness 4: We believe there is a misunderstanding. 1) We do not claim the aggregation as a contribution of this paper; it is just a component of the networks involved in the evaluation. 2) We also implemented it in the baselines. 3) We have conducted ablation experiments showing that our contributions remain valid regardless of the specific aggregation method used.
Weakness 3: We acknowledge that ligand binding affinity (LBA) prediction is an important task for pre-screening of non-binders. However, binders do not necessarily have bioactivities. Binding to non-functional sites or occupying non-critical spaces produces inactive binders. That's why we proposed the concepts of Union-Pocket and GeoREC: to further improve the drug hit rate by accurately modeling the bioactivity upon multiple possible conformations.
The SIU dataset was published this year at ICLR [2]. The authors directly compared the SIU dataset with PDBbind, which is the dataset used in ATOM3D [1] for LBA evaluation. Here are the reasons why we chose SIU instead of PDBbind in our paper.
- SIU is a much larger-scale benchmark. As introduced in [2] (page 1, line 9-11): "... by introducing the SIU dataset-a million-scale Structural small molecule-protein Interaction dataset for Unbiased bioactivity prediction task, which is 50 times larger than the widely used PDBbind.".
Table 1. Comparison of PDBbind and SIU datasets from Table 7 in [2]
| Dataset | Pocket-molecule pairs | Avg. molecules per pocket | Unique pockets | Unique molecules |
|---|---|---|---|---|
| PDBbind | 19,443 | 1 | 19,443 | 19,443 |
| SIU | 1,312,827 | 137.6 | 9,544 | 214,686 |
- Experimental results support that SIU is more suitable for model training than PDBbind. According to [2] (page 3, paragraph 4): "We compare the experimental results of training several classical baseline models on PDBbind and SIU. Two key findings highlight the outperformance of SIU over PDBbind ...".
- PDBbind has data issues. As analyzed in [2] (page 2, paragraph 4): "From a data perspective, the constructed training data is not sufficient for developing a robust bioactivity predictor. Although previous works have utilized different training data, they are all derived from PDBbind, which contains only about 20,000 small molecule-protein target pairs. More importantly, for each protein target, these datasets typically feature only a single small-molecule ligand. This introduces bias into the training data...".
- Bioactivity prediction is a more challenging task due to diverse biological metrics. As mentioned in [2] (page 1, paragraph 1): "In this context, 'bioactivity' encompasses the diverse biological effects resulting from small molecule-protein interactions, including binding responses-commonly quantified by the dissociation constant (Kd) and the inhibition constant (Ki)...".
Since SIU is the latest and most comprehensive dataset for the bioactivity prediction task, it is reasonable for us to adopt SIU to validate our proposed methods. It is also a more robust benchmark for LBA. As summarized in our earlier response, our methods achieve improvements in 43 out of 48 reported metrics across all models on the [Kd/Ki]-[0.9/0.6] datasets.
Additional Experiment. To further address your concern, and in light of the limited hours remaining before the deadline, we conducted supplementary experiments to validate our method on a redocked PDBbind v2020 dataset (Kd only). For each PDB entry, the top three docking poses were used. The models were trained on the general subset (4,466 PDB entries, excluding all refined-set samples) and evaluated on the refined set (2,783 PDB entries). As a baseline, Vina scores were converted to pKd [3] and averaged over the top three poses. The results, presented in Table 2, show that our enhanced methods consistently outperform the baselines across all metrics, demonstrating their generalizability on the PDBbind dataset.
Table 2. Performance on the LBA task on PDBbind v2020 (Kd)
| Method | RMSE | Pearson r | Kendall's tau | Avg. Imp (%) |
|---|---|---|---|---|
| AutoDock Vina | 2.488 | 0.261 | 0.269 | |
| GIGN | 1.428 | 0.216 | 0.105 | |
| DTIGN | 1.144 | 0.499 | 0.316 | |
| GIGN enhanced | 1.334 | 0.421 | 0.276 | 87.79% |
| DTIGN enhanced | 1.045 | 0.608 | 0.356 | 14.34% |
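For the AutoDock Vina baseline, docking scores have to be mapped to pKd before they can be compared with the learned models. The exact conversion taken from [3] is not given in this thread; the sketch below uses the standard thermodynamic relation ΔG = RT ln Kd, treating the Vina score as ΔG in kcal/mol at an assumed temperature of 298.15 K, and then averages over the top poses. The function names and the temperature are assumptions.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant in kcal/(mol*K)


def vina_score_to_pkd(score_kcal_per_mol, temperature_k=298.15):
    """Convert a binding free energy estimate (kcal/mol) to pKd = -log10(Kd).

    Uses delta_G = R * T * ln(Kd); this may differ from the conversion in [3].
    """
    return -score_kcal_per_mol / (R_KCAL * temperature_k * math.log(10))


def mean_pkd(top_scores, temperature_k=298.15):
    """Average the converted pKd over the top-ranked docking poses."""
    values = [vina_score_to_pkd(s, temperature_k) for s in top_scores]
    return sum(values) / len(values)


# Hypothetical Vina scores for the top three poses of one complex.
example_pkd = mean_pkd([-8.1, -7.9, -7.6])
```

Under this relation, roughly 1.36 kcal/mol of binding free energy corresponds to one pKd unit at room temperature.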
We hope our responses have resolved all remaining concerns. As the final deadline is approaching, we would deeply appreciate your consideration of a further score adjustment. Thank you again for your time and feedback.
References
[1] ATOM3D: Tasks On Molecules in Three Dimensions, NeurIPS, 2021
[2] Redefining the task of Bioactivity Prediction, ICLR, 2025
[3] A new paradigm for applying deep learning to protein–ligand interaction prediction, BIB, 2024
This paper improves bioactivity prediction by considering spatial emptiness, union pockets, and a pairwise loss. The spatial emptiness aims to identify pocket and ligand atoms without sufficient contact. Union pockets combine the pockets of different ligands to generate a more comprehensive view. The pairwise loss provides a more precise supervision signal for bioactivity prediction. With these techniques, the authors have achieved substantial improvements on multiple benchmarks.
Strengths and Weaknesses
Strengths:
- Pairwise loss and union pockets have incorporated important inductive bias into the bioactivity prediction task.
- The paper has achieved substantial improvement on multiple benchmarks and has conducted solid ablation tests.
Weaknesses:
- The authors did not compare their method with some strong baselines, for example EHIGN, which was published in TPAMI.
- More experiments or explanation should be provided to support the claim that GeoREC can model spatial emptiness.
Questions
- Add more baseline methods: There are a lot of strong baseline models for bioactivity prediction, and it’s better to make a more comprehensive comparison. For example, physics-based methods like MM-PB/GBSA, ABFEP or deep learning methods like EHIGN.
- The implementation of GeoREC: For neighbors in each cone, are they distinguished by their directions? Is this direction information incorporated into Egeo? If not, what is the difference between GeoREC and neighborhood-based edge construction?
- Why can GeoREC model spatial emptiness but other methods cannot?
Limitations
Yes
Final Justification
This paper proposes a novel approach to bioactivity prediction through the use of spatial emptiness modeling (GeoREC), union pockets, and pairwise loss. These innovations aim to improve how protein–ligand interactions are represented and learned, and the method demonstrates strong performance across multiple benchmarks.
After reviewing the rebuttal and follow-up discussion, I find that the authors have meaningfully addressed my concerns:
Stronger Baselines: The authors have significantly expanded the evaluation by adding three high-quality baseline methods: EHIGN, SIGN, and MBP. These additions improve the completeness of the comparison and demonstrate consistent performance gains across a wide range of settings, with the proposed method outperforming baselines in the majority of cases.
Spatial Emptiness via GeoREC: The explanation of how GeoREC captures spatial emptiness through a cone-based geometric edge construction is both physically intuitive and technically sound. The notion of using the minimum distance in each cone as a proxy for unoccupied volume is clever and offers a novel inductive bias in molecular modeling.
Directionality and Angular Information: While I initially had concerns about the lack of explicit angular encoding, the authors clarified that their model does not aim to explicitly reconstruct angles, but rather relies on sufficient geometric and topological information already present in the graph structure. Their ablation studies show that adding explicit angle features does not improve performance and can even harm it, likely due to redundancy or added complexity. This clarification partially resolves my concern.
Overall, the authors have convincingly demonstrated the methodological soundness and empirical strength of their approach. The work introduces new ideas and inductive biases that are likely to benefit future research in protein–ligand modeling and bioactivity prediction. While some architectural and interpretability aspects could still benefit from further investigation, the paper is well-motivated, technically solid, and significantly strengthened by the rebuttal. I maintain my positive assessment and raise my rating to 5. Nevertheless, the scalability to deeper and larger neural networks like EquiFormer still needs further investigation.
Formatting Issues
No
Thank you for your kind recognition of our work and for providing valuable suggestions. Our responses are as follows.
In response to Weakness 1
We have expanded our experiments to include additional competitive baselines: the explainable heterogeneous interaction graph neural network (EHIGN) [1], the structure-aware interactive graph neural network (SIGN) [2], and the multi-task bioassay pre-training (MBP) approach [3]. Results (attached below) show that the proposed method consistently improves performance over these models across most metrics (39/43) on a representative benchmark subset.
| Dataset / Label | Method | RMSE(↓) | Pearson(↑) | Tau(↑) |
|---|---|---|---|---|
| DTIGN I1 / IC50 | EHIGN | 1.215 | 0.146 | 0.052 |
| | EHIGN Enhanced | 1.239 | 0.193 | 0.108 |
| | SIGN | 0.911 | 0.681 | 0.492 |
| | SIGN Enhanced | 0.909 | 0.712 | 0.539 |
| | MBP | 1.280 | 0.148 | 0.036 |
| | MBP Enhanced | 1.152 | 0.466 | 0.325 |
| DTIGN I2 / IC50 | EHIGN | 1.089 | 0.042 | 0.011 |
| | EHIGN Enhanced | 0.999 | 0.053 | 0.045 |
| | SIGN | 0.582 | 0.848 | 0.632 |
| | SIGN Enhanced | 0.569 | 0.848 | 0.641 |
| | MBP | 1.071 | 0.073 | 0.081 |
| | MBP Enhanced | 0.977 | 0.449 | 0.258 |
| DTIGN E1 / EC50 | EHIGN | 0.994 | -0.050 | -0.026 |
| | EHIGN Enhanced | 1.006 | 0.010 | -0.009 |
| | SIGN | 0.887 | 0.464 | 0.393 |
| | SIGN Enhanced | 0.861 | 0.506 | 0.446 |
| | MBP | 0.972 | 0.028 | 0.005 |
| | MBP Enhanced | 0.913 | 0.334 | 0.241 |

| Dataset / Label | Method | RMSE(↓) | MAE(↓) | Pearson(↑) | Spearman(↑) |
|---|---|---|---|---|---|
| SIU 0.6 / Kd | EHIGN | 1.404 | 1.180 | -0.015 | 0.020 |
| | EHIGN Enhanced | 1.325 | 1.110 | 0.159 | 0.213 |
| | MBP | 1.693 | 1.418 | -0.195 | -0.189 |
| | MBP Enhanced | 1.406 | 1.194 | 0.082 | 0.041 |
| SIU 0.6 / Ki | EHIGN | 1.450 | 1.222 | 0.118 | 0.106 |
| | EHIGN Enhanced | 1.394 | 1.164 | 0.256 | 0.169 |
| | MBP | 1.748 | 1.483 | 0.304 | 0.253 |
| | MBP Enhanced | 1.699 | 1.434 | 0.317 | 0.225 |
In response to Weakness 2
GeoREC captures spatial emptiness implicitly via geometric edge construction. For each atom, the surrounding space is partitioned into cones of equal solid angle. Each cone connects the central atom to its nearest neighbor, and the distance to the nearest atom in each cone provides a lower bound on unoccupied volume in that direction.
Consider:
- A surrounding sphere of radius $R$ is partitioned into $K$ cones of equal solid angle.
- Volume per cone: $V_{\text{cone}} = \frac{4\pi R^3}{3K}$
- If the nearest atom in a cone lies at distance $d \le R$, the empty volume in that direction is at least: $V_{\text{empty}} \ge \frac{4\pi d^3}{3K}$
Thus, longer distances imply more emptiness, and this grows with $d^3$. GeoREC encodes the distance to the nearest atom in each cone, providing a compact, learnable, and physically meaningful proxy for local spatial emptiness.
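The cone argument can be illustrated with a small NumPy sketch. It is not the GeoREC implementation: it assumes six direction cells aligned with the ±x, ±y, ±z axes (which cover equal solid angles by symmetry), assigns each neighbor to the best-aligned axis, and reports the nearest-neighbor distance and the resulting lower bound on empty volume per cell. The number of cones, the axis choice, and the function names are all assumptions.

```python
import numpy as np

# Hypothetical choice: 6 cone axes along +/-x, +/-y, +/-z. By symmetry each
# direction cell covers an equal solid angle of 4*pi/6.
CONE_AXES = np.array([
    [1, 0, 0], [-1, 0, 0],
    [0, 1, 0], [0, -1, 0],
    [0, 0, 1], [0, 0, -1],
], dtype=float)


def nearest_per_cone(center, neighbors, radius):
    """Distance to the nearest neighbor in each cone, capped at `radius`.

    Cones with no neighbor inside the sphere keep the value `radius`.
    """
    center = np.asarray(center, dtype=float)
    offsets = np.asarray(neighbors, dtype=float).reshape(-1, 3) - center
    dists = np.linalg.norm(offsets, axis=1)
    keep = (dists > 0) & (dists <= radius)
    offsets, dists = offsets[keep], dists[keep]
    nearest = np.full(len(CONE_AXES), float(radius))
    if len(dists) == 0:
        return nearest
    # Assign each neighbor to the cone whose axis best matches its direction.
    cone_idx = np.argmax((offsets / dists[:, None]) @ CONE_AXES.T, axis=1)
    np.minimum.at(nearest, cone_idx, dists)
    return nearest


def empty_volume_lower_bound(nearest):
    """Per-cone lower bound 4*pi*d^3 / (3*K) on the unoccupied volume."""
    k = len(nearest)
    return 4.0 * np.pi * nearest ** 3 / (3.0 * k)


neighbors = [(2.0, 0.0, 0.0), (1.0, 0.1, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, -10.0)]
nearest = nearest_per_cone((0.0, 0.0, 0.0), neighbors, radius=5.0)
volumes = empty_volume_lower_bound(nearest)
```

In this toy case the +x cell is crowded (nearest atom about 1 Å away), while the cells with no neighbor inside the sphere report the full radius and therefore the largest empty-volume bound.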
In response to Question 1
Our proposed method is a general and plug-and-play module that can be seamlessly integrated into various GNN-based models for bioactivity prediction. However, it is not directly applicable to physics-based approaches such as MM-PB/GBSA or ABFEP, as these rely on fundamentally different simulation principles. In response to your suggestion of benchmarking against stronger baselines, we have further incorporated three representative deep learning models—EHIGN [1], SIGN [2], and MBP [3]—into our experimental evaluation. We evaluated them on both datasets used in our study. As shown in the table above, our method consistently improves the performance across these baselines, further demonstrating its effectiveness and generalizability.
In response to Question 2
In the current implementation of GeoREC, cones are not explicitly distinguished by their directions. However, the relative spatial orientation is largely captured by the dense network of intra- and inter-molecular edge lengths—including chemical bonds, protein–ligand interface edges, and the shortest atom distances sampled within each cone—which together encode the geometric relationships between atoms.
We also experimented with explicitly embedding directional angle ranges into the geometric edge features, but this reduced performance, as shown in our ablation results. Our hypothesis is that these explicit directional features introduce additional complexity, making it harder for the model to reconcile them with the directional cues already embedded in the graph through edge lengths. Moreover, since the union pocket structure is fixed and its graph contains sufficient edges to form triangles with common sides and vertices, angular information can be inferred from these edge-based geometric relationships. This makes explicit angle features not only redundant, but potentially disruptive to the model's ability to learn coherent spatial patterns.
As a result, GeoREC differs from conventional neighborhood-based edge construction by enforcing directional locality through conical constraints, while still relying on implicit directionality captured via edge geometry and connectivity in the 3D graph—without requiring explicit angular encodings.
| Method | Metric | I1 | I2 | E1 | Avg. imp (%) |
|---|---|---|---|---|---|
| DTIGN | RMSE(↓) | 1.1977 | 0.7952 | 0.9086 | - |
| | Pearson r(↑) | 0.3547 | 0.7128 | 0.3363 | - |
| | Tau(↑) | 0.2445 | 0.4922 | 0.2139 | - |
| DTIGN enhanced | RMSE(↓) | 1.0364 | 0.6507 | 0.8645 | 12.16 |
| | Pearson r(↑) | 0.5895 | 0.7888 | 0.4969 | 41.54 |
| | Tau(↑) | 0.4239 | 0.5454 | 0.3705 | 52.46 |
| DTIGN enhanced w/ angle ranges | RMSE(↓) | 1.1532 | 0.8363 | 0.9131 | -0.65 |
| | Pearson r(↑) | 0.4903 | 0.6280 | 0.3509 | 10.22 |
| | Tau(↑) | 0.3305 | 0.4015 | 0.2258 | 7.44 |
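The statement that angular information can be inferred from edge lengths when edges form triangles follows from the law of cosines. The sketch below shows only this geometric fact; it says nothing about whether a network actually learns to exploit it, and the function name is illustrative.

```python
import math


def angle_from_edge_lengths(a, b, c):
    """Angle in radians between sides a and b of a triangle whose third side is c."""
    cos_gamma = (a * a + b * b - c * c) / (2.0 * a * b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_gamma)))


# A 3-4-5 triangle has a right angle between the sides of length 3 and 4.
right_angle = angle_from_edge_lengths(3.0, 4.0, 5.0)
```

Given three atoms whose pairwise distances all appear as edge features, each interior angle is therefore determined by the distances alone.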
In response to Question 3
GeoREC is specifically designed to capture spatial emptiness by introducing geometric edges that connect a central atom to its nearest neighbor in each predefined directional cone. These edges encode the minimum distance required to reach another atom in that direction, thereby providing a lower bound on the unoccupied volume. This is a direct and localized way to represent spatial sparseness—something that standard graph construction methods do not account for.
In contrast, other methods typically rely on local neighborhoods constructed via chemical bonds or distance thresholds, and their receptive field grows only with the number of GNN layers. As a result, the model may completely miss distant atoms that lie just outside the local neighborhood but are crucial for estimating spatial emptiness in a particular direction. GeoREC addresses this limitation by ensuring that each directional cone is always represented, regardless of the connectivity of the initial graph.
Additionally, since the union pocket graph is fixed, it contains sufficient edges to form triangles with common sides and vertices, allowing the model to recover angular relationships implicitly through edge lengths and connectivity. This removes the need to explicitly encode angle features. Our ablation study shows that angle features degrade model performance, likely due to redundancy and additional complexity.
Taken together, GeoREC provides an efficient, geometry-aware design that captures directional unoccupied space, which is otherwise inaccessible to conventional methods relying solely on bonded or proximity-based edges.
References
[1] Interaction-based inductive bias in graph neural networks: enhancing protein-ligand binding affinity predictions from 3D structures, TPAMI, 2024
[2] Structure-aware Interactive Graph Neural Networks for the Prediction of Protein-Ligand Binding Affinity, KDD, 2021
[3] Multi-task bioassay pre-training for protein-ligand binding affinity prediction, BIB, 2024
Thank you to the authors for the thoughtful and thorough rebuttal. I appreciate the substantial effort put into addressing the feedback, particularly in conducting new experiments with stronger baselines such as EHIGN, SIGN, and MBP. These additions considerably strengthen the empirical evaluation and improve the completeness of the benchmarking.
Strengths of the Rebuttal: The inclusion of EHIGN, SIGN, and MBP in the evaluation is much appreciated. These results demonstrate the competitiveness and generalizability of the proposed method across a wider range of models and datasets. Also, as for ABFEP and MM-PB/GBSA, I just mean that you should add them as baselines but not apply your method to these baselines.
Clarification of Spatial Emptiness Modeling: The geometric construction approach using directional cones provides an interesting and compact proxy for spatial emptiness. The explanation of how this setup captures unoccupied volume is helpful and aligns well with the paper’s motivation.
Remaining Concerns: While I appreciate the attempt to encode directionality via edge lengths and intra/inter-molecular connectivity, the explanation remains somewhat ambiguous. Specifically, the claim that angular information is implicitly recoverable through edge-based triangle formations deserves a justification. It is mathematically possible, but hard to achieve by networks without any supervision signaling.
Conclusion: Overall, the rebuttal strengthens the paper. The added experiments are particularly valuable. However, I still find the explanation of directional encoding and its relation to performance not fully convincing, so this concern is not entirely resolved. I will be maintaining my current score.
Thank you for the thoughtful comment. We apologize for the confusion caused by our wording. We did not mean that the network explicitly recovers the angular relationships. We just meant that it can extract this angular information and use it for prediction. Our experiment in the previous response demonstrated that encoding the angular features in the edges does not improve the prediction performance. In other words, the network is able to extract this information from other features. We apologize again for the confusion.
That makes sense to me. Thank you!
Thank you very much for your constructive comments. We believe your concerns have been fully addressed. We would greatly appreciate it if you could kindly consider increasing the score.
Yeah. I am raising my rating.
Thank you for raising the score and for your efforts in helping us improve the paper. We truly appreciate your thoughtful feedback and the time you dedicated to reviewing our work.
This paper proposes a method to improve bioactivity prediction by combining three components: a novel geometric representation (GeoREC) designed to model the "spatial emptiness" within a binding site; a unified "Union-Pocket" that provides a consistent global context for the protein; and a pairwise-enhanced loss function to better preserve the relative ranking of bioactivities. Experiments are performed on the DTIGN and SIU datasets to validate performance.
优缺点分析
Strengths
- The paper is generally well-written and clearly structured.
- The central motivation of modeling "spatial emptiness" is conceptually interesting.
Weaknesses
- The claim of novelty for the pairwise loss function is overstated. Ranking-based or pairwise losses are well-established in computational chemistry and drug discovery, particularly for binding affinity prediction and virtual screening tasks. [1][2]
- The connection between the proposed GeoREC method and the concept of "spatial emptiness" is not sufficiently justified. The paper argues that GeoREC captures emptiness by connecting a central atom to the nearest neighboring atoms within predefined spatial cones. However, it is not immediately clear how these new edges, which connect existing atoms, directly represent or quantify the volume of "unoccupied space". A more rigorous explanation, perhaps with visualizations or a stronger biophysical argument, is needed to convince the reader that this geometric construction is a valid and effective proxy for spatial emptiness.
- The experimental validation raises concerns about the generalizability and practical utility of the method. The primary DTIGN dataset consists of only 8 protein targets. This is a very small number and may not be representative of the vast chemical and structural space of protein targets.
- The set of baseline models is somewhat limited. While including DTIGN and GIGN is appropriate, several more recent and powerful state-of-the-art geometric deep learning models for protein-ligand interaction are missing, such as SIGN [3], GET [4], and MBP [2].
[1]. A bioactivity foundation model using pairwise meta-learning, Nature Machine Intelligence, 2024
[2]. Multi-task bioassay pre-training for protein-ligand binding affinity prediction, Briefings in Bioinformatics, 2024
[3]. Structure-aware Interactive Graph Neural Networks for the Prediction of Protein-Ligand Binding Affinity, KDD, 2021
[4]. Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning, ICML, 2024
Questions
Suggestions/Questions:
- The central motivation of the paper rests on the importance of "spatial emptiness" for predicting bioactivity. However, the intuition behind this needs further development. Could you please provide a more detailed explanation?
- Please add more powerful baselines, such as SIGN, GET, and MBP.
- What is the main difference between the pairwise loss in this work and those in the previous work listed above?
- It would be better to detail the advantages of predicting affinity based on docked conformations rather than co-crystal structures.
I'm willing to raise my score if these suggestions/questions are addressed.
Limitations
Yes
Final Justification
My concerns have been solved.
Formatting Issues
None
Thank you for your interest in our work and your insightful suggestions. Below are our responses to your questions.
In response to Weakness 1
We appreciate the point and acknowledge that several related works have adopted ranking or pairwise loss. In our work, the pairwise loss complements the primary objective by explicitly modeling relative differences between samples. A detailed comparison with [1] and [2] is provided in the response to the corresponding question below.
While the core idea of employing pairwise loss is similar, the problem settings differ. We have reviewed related literature and cited several works in our original submission, including [3] and [4]. We will additionally cite these newly mentioned references and revise the manuscript to clarify our contribution more precisely.
In response to Weakness 2
GeoREC captures spatial emptiness implicitly via geometric edge construction. For each atom, the surrounding space is partitioned into cones of equal solid angle. Each cone connects the central atom to its nearest neighbor, and the distance to the nearest atom in each cone provides a lower bound on unoccupied volume in that direction.
Consider:
- A surrounding sphere of radius $R$ is partitioned into $K$ cones.
- Volume per cone: $V_{\text{cone}} = \frac{4\pi R^3}{3K}$
- If the nearest atom in a cone lies at distance $d \le R$, the empty volume in that direction is at least: $V_{\text{empty}} \ge \frac{4\pi d^3}{3K}$
Thus, longer distances imply more emptiness, and this grows with $d^3$. GeoREC encodes the distance to the nearest atom in each cone, providing a compact and physically meaningful proxy for local spatial emptiness.
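The bound above can be sketched numerically. This is a minimal illustration, not the GeoREC implementation: as a stand-in for the cone partition it assumes 8 equal-solid-angle sectors given by the coordinate octants, and the function names, the octant choice, and the `radius` parameter are all assumptions made for this example.

```python
import math

def octant_index(dx: float, dy: float, dz: float) -> int:
    """Map a direction to one of 8 equal-solid-angle sectors (coordinate octants).
    Assumption: octants stand in for the paper's cones."""
    return int(dx >= 0) * 4 + int(dy >= 0) * 2 + int(dz >= 0)

def emptiness_lower_bounds(center, neighbors, radius, n_sectors=8):
    """For each sector around `center`, return a lower bound on the empty volume:
    the sector volume up to the nearest neighbor, (4*pi*d^3) / (3*n_sectors).
    Sectors with no neighbor inside `radius` are empty up to `radius`."""
    nearest = [radius] * n_sectors
    for p in neighbors:
        dx, dy, dz = (p[i] - center[i] for i in range(3))
        d = math.sqrt(dx * dx + dy * dy + dz * dz)
        if d == 0.0 or d > radius:
            continue  # skip the center itself and atoms outside the sphere
        k = octant_index(dx, dy, dz)
        nearest[k] = min(nearest[k], d)
    return [4.0 * math.pi * d ** 3 / (3.0 * n_sectors) for d in nearest]

# Toy example: two atoms in the (+,+,+) octant, one in the (-,+,+) octant.
bounds = emptiness_lower_bounds(
    center=(0.0, 0.0, 0.0),
    neighbors=[(1.0, 1.0, 1.0), (2.0, 2.0, 2.0), (-3.0, 0.5, 0.5)],
    radius=5.0,
)
```

In this toy case the sector containing the closest atom gets the smallest bound, and sectors with no atom within the radius get the full sector volume, which matches the intuition that longer nearest-neighbor distances indicate more unoccupied space.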
In response to Weakness 3
The two datasets used in this study, DTIGN and SIU, represent distinct scenarios in bioactivity learning. DTIGN includes 8 protein targets, with diverse and sufficient numbers of bioactivity assays (see paragraph 2, page 2, and Table 1 in [5]). In this case, models were trained independently for each target to evaluate target-specific performance. In contrast, the SIU dataset includes 1,720 protein targets (Figure 2, page 4 in [6]) and is designed to assess model generalizability across a wide chemical and structural space, where a single model is trained jointly on all targets. We have now completed a comprehensive set of experiments on the SIU dataset. The results show that our method consistently improves performance across multiple metrics, demonstrating that it enhances baseline models even in large-scale, diverse bioactivity prediction tasks.
| Metrics | RMSE | MAE | Pearson r | Spearman r |
|---|---|---|---|---|
| Baseline avg | 1.667 | 1.368 | 0.180 | 0.152 |
| Enhanced avg | 1.429 | 1.163 | 0.336 | 0.308 |

| Dataset/label | Method | RMSE(↓) | MAE(↓) | Pearson(↑) | Spearman(↑) |
|---|---|---|---|---|---|
| SIU0.9/Kd | Unimol | 1.364 | 1.141 | -0.033 | -0.082 |
| | DTIGN | 1.839 | 1.490 | -0.001 | -0.042 |
| | DTIGN enhanced | 1.304 | 1.060 | 0.321 | 0.326 |
| | GIGN | 1.708 | 1.367 | 0.070 | 0.038 |
| | GIGN enhanced | 1.455 | 1.139 | 0.296 | 0.261 |
| | GAT | 1.545 | 1.240 | 0.092 | 0.082 |
| | GAT enhanced | 1.473 | 1.166 | 0.261 | 0.254 |
| SIU0.9/Ki | Unimol | 1.235 | 1.017 | 0.485 | 0.452 |
| | DTIGN | 1.607 | 1.276 | 0.360 | 0.329 |
| | DTIGN enhanced | 1.296 | 1.054 | 0.485 | 0.441 |
| | GIGN | 1.597 | 1.337 | 0.223 | 0.167 |
| | GIGN enhanced | 1.487 | 1.240 | 0.371 | 0.338 |
| | GAT | 1.706 | 1.386 | 0.301 | 0.262 |
| | GAT enhanced | 1.625 | 1.339 | 0.316 | 0.294 |
| SIU0.6/Kd | Unimol | 1.389 | 1.192 | -0.149 | -0.206 |
| | GIGN | 1.371 | 1.115 | 0.265 | 0.281 |
| | GIGN enhanced | 1.326 | 1.078 | 0.280 | 0.227 |
| | DTIGN | 1.349 | 1.079 | 0.329 | 0.329 |
| | DTIGN enhanced | 1.332 | 1.069 | 0.327 | 0.248 |
| | GAT | 1.521 | 1.285 | 0.233 | 0.157 |
| | GAT enhanced | 1.424 | 1.182 | 0.107 | 0.133 |
| SIU0.6/Ki | Unimol | 1.255 | 1.034 | 0.472 | 0.452 |
| | GIGN | 1.789 | 1.503 | 0.225 | 0.230 |
| | GIGN enhanced | 1.404 | 1.165 | 0.498 | 0.463 |
| | DTIGN | 1.993 | 1.653 | 0.123 | 0.091 |
| | DTIGN enhanced | 1.321 | 1.079 | 0.472 | 0.452 |
| | GAT | 1.976 | 1.690 | -0.060 | -0.096 |
| | GAT enhanced | 1.694 | 1.381 | 0.303 | 0.263 |
In response to Weakness 4
We have expanded our experiments to include additional competitive baselines: EHIGN [7], SIGN [8], and MBP [2]. Results (attached below) show that the proposed method consistently improves performance over these models across most metrics (39/43) on a representative benchmark subset.
| Dataset / Label | Method | RMSE(↓) | Pearson(↑) | Tau(↑) |
|---|---|---|---|---|
| DTIGN I1 / IC50 | EHIGN | 1.215 | 0.146 | 0.052 |
| | EHIGN Enhanced | 1.239 | 0.193 | 0.108 |
| | SIGN | 0.911 | 0.681 | 0.492 |
| | SIGN Enhanced | 0.909 | 0.712 | 0.539 |
| | MBP | 1.280 | 0.148 | 0.036 |
| | MBP Enhanced | 1.152 | 0.466 | 0.325 |
| DTIGN I2 / IC50 | EHIGN | 1.089 | 0.042 | 0.011 |
| | EHIGN Enhanced | 0.999 | 0.053 | 0.045 |
| | SIGN | 0.582 | 0.848 | 0.632 |
| | SIGN Enhanced | 0.569 | 0.848 | 0.641 |
| | MBP | 1.071 | 0.073 | 0.081 |
| | MBP Enhanced | 0.977 | 0.449 | 0.258 |
| DTIGN E1 / EC50 | EHIGN | 0.994 | -0.050 | -0.026 |
| | EHIGN Enhanced | 1.006 | 0.010 | -0.009 |
| | SIGN | 0.887 | 0.464 | 0.393 |
| | SIGN Enhanced | 0.861 | 0.506 | 0.446 |
| | MBP | 0.972 | 0.028 | 0.005 |
| | MBP Enhanced | 0.913 | 0.334 | 0.241 |

| Dataset / Label | Method | RMSE(↓) | MAE(↓) | Pearson(↑) | Spearman(↑) |
|---|---|---|---|---|---|
| SIU 0.6 / Kd | EHIGN | 1.404 | 1.180 | -0.015 | 0.020 |
| | EHIGN Enhanced | 1.325 | 1.110 | 0.159 | 0.213 |
| | MBP | 1.693 | 1.418 | -0.195 | -0.189 |
| | MBP Enhanced | 1.406 | 1.194 | 0.082 | 0.041 |
| SIU 0.6 / Ki | EHIGN | 1.450 | 1.222 | 0.118 | 0.106 |
| | EHIGN Enhanced | 1.394 | 1.164 | 0.256 | 0.169 |
| | MBP | 1.748 | 1.483 | 0.304 | 0.253 |
| | MBP Enhanced | 1.699 | 1.434 | 0.317 | 0.225 |
In response to Question 1
The motivation for using spatial emptiness comes from the blocking mechanism: ligands that fully occupy key regions of a binding site can outcompete endogenous substrates and more effectively modulate protein function. Bioactivity depends not only on where atoms are present, but also on where critical voids remain. Incomplete occupancy may leave space for substrate binding, reducing inhibition. By modeling spatial emptiness, we capture how well a ligand blocks substrate access—offering a meaningful proxy for bioactivity.
In response to Question 2
We have added more powerful baselines, including SIGN, MBP, and EHIGN. See the tables above.
In response to Question 3
Here are more details on the differences between our loss and those in the two other works.
[1] This work uses binary cross-entropy (BCE) as its ranking loss. We summarize the difference between their ranking loss and our pairwise loss in the table below:
| Aspect | Loss in [1] | Loss in our work |
|---|---|---|
| Objective | Learns relative order (which is larger/smaller) | Learns relative difference (how much larger/smaller) |
| Loss type | Binary classification | Regression |
| Noise Robustness | More robust to absolute value noise | Focuses on precise difference alignment |
[2] In this work, the pairwise loss enables meta-learning for few-shot adaptation. E.g., ligand bioactivity data from the same assay are inherently comparable, allowing the model to learn relative differences even with limited data. This loss also guides the model to refine predictions on new assays using only a few samples. In our work, the pairwise loss is a supplementary term in a hybrid loss for a regression task. Here is a table for the summary:
| Aspect | Loss in [2] | Loss in our work |
|---|---|---|
| Objective | Enable few-shot adaptation via meta-learning | Improve static prediction via relative order |
| Calculation scope | Task-specific subsets | All samples in a batch |
| Typical Use Case | Low-data, cross-assay adaptation | General bioactivity prediction |
| Loss Integration | Core to meta-learning loops | Hybrid loss with MSE |
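The distinctions in the two tables above can be made concrete with a small sketch. It contrasts a BCE-style ranking loss (order only) with a pairwise-difference regression term added to MSE (a hybrid loss). The exact formulation and weighting used in the paper are not reproduced here; the function names and the `weight` parameter are assumptions for illustration.

```python
import math

def mse(pred, target):
    """Mean squared error over a batch."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def pairwise_difference_loss(pred, target):
    """Regression on relative differences: for every pair (i, j), penalize
    the squared gap between (pred_i - pred_j) and (target_i - target_j)."""
    n, total, count = len(pred), 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += ((pred[i] - pred[j]) - (target[i] - target[j])) ** 2
            count += 1
    return total / count if count else 0.0

def bce_ranking_loss(pred, target):
    """Binary classification on relative order: sigmoid(pred_i - pred_j)
    should be close to 1 when target_i > target_j, and close to 0 otherwise.
    Ties in the target are skipped."""
    n, total, count = len(pred), 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            if target[i] == target[j]:
                continue
            sign = 1.0 if target[i] > target[j] else -1.0
            # BCE with a sigmoid reduces to log(1 + exp(-sign * margin)).
            total += math.log1p(math.exp(-sign * (pred[i] - pred[j])))
            count += 1
    return total / count if count else 0.0

def hybrid_loss(pred, target, weight=1.0):
    """Assumed hybrid form: absolute error (MSE) plus a weighted pairwise term."""
    return mse(pred, target) + weight * pairwise_difference_loss(pred, target)
```

With predictions that are uniformly shifted from the labels, for example `pred = [1, 2, 3]` and `target = [2, 3, 4]`, the pairwise-difference term is zero while the MSE term is not, which illustrates why the two terms are complementary in a hybrid objective.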
In response to Question 4
Predicting affinity from docked conformations offers key advantages over using co-crystal structures. Co-crystal structures are scarce and costly to obtain, especially for novel targets, whereas docking provides scalable access to approximate protein–ligand complexes. Most of our training data lack co-crystal structures, so using docked conformations enables broader applicability. Moreover, deep learning can be designed to denoise and learn meaningful patterns, making our approach practical and robust for real-world tasks like virtual screening.
References
[1] A bioactivity foundation model using pairwise meta-learning, Nature Machine Intelligence, 2024
[2] Multi-task bioassay pre-training for protein-ligand binding affinity prediction, Briefings in Bioinformatics, 2024
[3] Gradient Aligned Regression via Pairwise Losses, arXiv preprint, 2024
[4] Pairwise difference regression: a machine learning meta-algorithm for improved prediction and uncertainty quantification in chemical search, JCIM, 2021
[5] Advancing bioactivity prediction through molecular docking and self-attention, JBHI, 2024
[6] Redefining the task of Bioactivity Prediction, ICLR, 2025
[7] Interaction-based inductive bias in graph neural networks: enhancing protein-ligand binding affinity predictions from 3D structures, TPAMI, 2024
[8] Structure-aware Interactive Graph Neural Networks for the Prediction of Protein-Ligand Binding Affinity, KDD, 2021
[9] Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning, ICML, 2024
Thank you for the detailed response. Some of my questions have been resolved. I'll raise the score to 3.
The remaining concern is the advantage of predicting affinity based on docked conformations (Question 4).
Indeed, it is common practice for traditional docking methods, such as Vina and smina, to predict affinity on docked conformations.
For me, performing experiments on the virtual screening task (also mentioned in the authors' response) and demonstrating GeoREC's capability on this task would be far more convincing and would improve the practical value of this work.
Ligand binding affinity (LBA), typically quantified by the dissociation constant ($K_d$), is included in the SIU dataset and serves as a relevant benchmark for evaluating virtual screening performance on docked conformations.
To assess our method in this context, we analyzed prediction performance on the SIU0.9 and SIU0.6 subsets, which are large-scale and structurally diverse datasets (as summarized in Table 1 below). They share the same test set but differ in training data: as the subset names indicate, the sequence-identity threshold between training proteins and the test set is 0.9 for SIU0.9 and 0.6 for SIU0.6.
Our results show consistent performance gains:
- RMSE improved by 11.13%
- MAE improved by 11.55%
- Pearson correlation improved by 135.48%
- Spearman correlation improved by 151.83%
averaged across models and datasets, as summarized in Table 2 below.
As shown in Table 3 below, out of 24 reported metrics (RMSE, MAE, Pearson, Spearman across all models and both subsets), 19 showed improvements with our method.
Additionally, we evaluated the Vina scoring function on the SIU test set (same for both SIU0.9 and SIU0.6). For each protein-ligand pair, multiple docked conformations were scored using AutoDock Vina. These scores were converted to predicted dissociation constants using the following standard transformation:
$$K_d = \exp\left(\frac{\Delta G}{RT}\right)$$
where $\Delta G$ is the binding free energy reported by Vina (in kcal/mol), $R$ is the gas constant ($1.987 \times 10^{-3}$ kcal/mol·K), and $T$ is the temperature (assumed to be the standard state condition, i.e., 298 K [1]). For consistency, we used $-\log_{10} K_d$ as the prediction target and averaged the values across all conformations for each pair.
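For reference, the conversion can be sketched as follows, assuming the standard thermodynamic relation $\Delta G = RT \ln K_d$ and a negative base-10 logarithm as the reported quantity. The function names are hypothetical, and the example scores are made-up values, not results from the SIU test set.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 298.0    # assumed standard-state temperature

def vina_score_to_kd(delta_g: float) -> float:
    """Convert a binding free energy (kcal/mol) to a dissociation constant (M)
    via delta_g = R * T * ln(Kd)."""
    return math.exp(delta_g / (R_KCAL * T_KELVIN))

def vina_score_to_pkd(delta_g: float) -> float:
    """Return -log10(Kd) for a binding free energy in kcal/mol."""
    return -delta_g / (R_KCAL * T_KELVIN * math.log(10.0))

def mean_pkd_over_conformations(scores):
    """Average the per-conformation -log10(Kd) values for one protein-ligand pair."""
    return sum(vina_score_to_pkd(s) for s in scores) / len(scores)

# Hypothetical Vina scores (kcal/mol) for three docked poses of one pair.
example_scores = [-8.0, -7.5, -7.0]
example_pkd = mean_pkd_over_conformations(example_scores)
```

Because the transformation is linear in the free energy, averaging the converted values over poses is equivalent to converting the average score.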
We found that the models enhanced by our method significantly outperform the Vina baseline across all error and correlation metrics. These findings reinforce the practical value of our approach in affinity prediction and virtual screening workflows, demonstrating superior predictive performance even when relying solely on docked poses.
Furthermore, we would like to emphasize that, unlike traditional docking methods which score only a single binding conformation, our approach predicts experimental affinity by integrating information from multiple plausible docking conformations. This multi-conformation strategy aligns with practices adopted in recent datasets and better reflects real-world scenarios where various binding poses collectively influence the observed affinity.
We sincerely appreciate your reconsideration and the score increase. We hope that the additional analyses and clarifications have addressed all your concerns. We would be truly grateful if you would consider a more positive reassessment beyond borderline reject.
Table 1. Statistics of the SIU0.9 and SIU0.6 ($K_d$) datasets
| Dataset | # Unique protein | # Unique ligand | # Protein-ligand complex | # Ligand conformations |
|---|---|---|---|---|
| SIU0.6 / $K_d$ | 1216 | 1860 | 17509 | 70674 |
| SIU0.9 / $K_d$ | 4211 | 5201 | 54570 | 216602 |
Table 2. Summary of Average Improvements
| | RMSE(↓) | MAE(↓) | Pearson(↑) | Spearman(↑) |
|---|---|---|---|---|
| Baseline avg for $K_d$ | 1.554 | 1.272 | 0.097 | 0.085 |
| Enhanced avg for $K_d$ | 1.381 | 1.125 | 0.229 | 0.213 |
| Percentage improvement | 11.13% | 11.55% | 135.48% | 151.83% |
Table 3. Per-Model Performance on SIU0.9 and SIU0.6 ($K_d$)
| Dataset/Label | Method | RMSE(↓) | MAE(↓) | Pearson(↑) | Spearman(↑) |
|---|---|---|---|---|---|
| SIU0.9 / $K_d$ | Vina | 2.018 | 1.564 | 0.120 | 0.127 |
| | Unimol | 1.364 | 1.141 | -0.033 | -0.082 |
| | DTIGN | 1.839 | 1.490 | -0.001 | -0.042 |
| | DTIGN enhanced | 1.304 | 1.060 | 0.321 | 0.326 |
| | GIGN | 1.708 | 1.367 | 0.070 | 0.038 |
| | GIGN enhanced | 1.455 | 1.139 | 0.296 | 0.261 |
| | GAT | 1.545 | 1.240 | 0.092 | 0.082 |
| | GAT enhanced | 1.473 | 1.166 | 0.261 | 0.254 |
| SIU0.6 / $K_d$ | Vina | 2.018 | 1.564 | 0.120 | 0.127 |
| | Unimol | 1.389 | 1.192 | -0.149 | -0.206 |
| | GIGN | 1.371 | 1.115 | 0.265 | 0.281 |
| | GIGN enhanced | 1.326 | 1.078 | 0.280 | 0.227 |
| | DTIGN | 1.349 | 1.079 | 0.329 | 0.329 |
| | DTIGN enhanced | 1.332 | 1.069 | 0.327 | 0.248 |
| | GAT | 1.521 | 1.285 | 0.233 | 0.157 |
| | GAT enhanced | 1.424 | 1.182 | 0.107 | 0.133 |
References
[1] Receptor binding thermodynamics as a tool for linking drug efficacy and affinity, Il Farmaco, 1998
Thank you for the rebuttal. I will raise my score to 4.
Thank you for raising the score and for your efforts in helping us improve the paper. We truly appreciate your thoughtful feedback and the time you dedicated to reviewing our work.
This paper proposes a new approach to structure-based bioactivity prediction by incorporating spatial information beyond traditional binding interactions. The authors introduce GeoREC, a method that models the unoccupied space around atoms in a protein-ligand complex, and Union-Pocket, which combines multiple pockets from the same protein to provide a consistent structural context across ligands. A hybrid loss function is also used to improve the preservation of bioactivity rankings. The method shows improved performance over some baselines on the DTIGN and SIU datasets, and an ablation study confirms that each component contributes to the overall gain.
Strengths and Weaknesses
Strength
- The model introduces three well-defined components—GeoREC, Union-Pocket, and hybrid loss—and each is shown through ablation to provide a consistent improvement.
- The paper is generally well-structured and clearly presented.
- This research focuses on a core issue in the field of computational drug discovery: accurately predicting bioactivity based on protein-ligand complex structures.
Weakness
- While the method performs well on certain SIU subsets, the results for Ki 0.9, Ki 0.6, and Kd 0.6 in the appendix are clearly below state-of-the-art levels. This raises concerns about the model's generalizability across different label types and dataset settings.
- The paper includes only a limited set of baselines. The lack of comparison with more diverse or competitive methods makes it difficult to judge how strong the proposed model is in a broader context.
- The abstract implies that stronger binding directly translates to stronger bioactivity. This is scientifically inaccurate. While binding affinity is a necessary component of bioactivity, it is not sufficient—many ligands bind tightly without eliciting a biological response, or even elicit off-target effects.
- The figures, particularly Figure 1, are not visually effective. Fonts are too small, and the graphical elements are not intuitive. Given that the method is fundamentally geometric, better visualizations are essential.
- There are some writing and LaTeX formatting issues. For example, terms like IC₅₀ and EC₅₀ are incorrectly written in italics or math mode. Similarly, the Pearson correlation coefficient should be denoted as lowercase r, not uppercase R.
Questions
- Union-Pocket appears effective on datasets like DTIGN and SIU, where multiple pockets per protein are available. However, how does the method apply to standard benchmarks like PDBbind [1], where proteins typically have only one known binding site? Can the authors comment on the generalizability of their approach in such single-pocket settings?
- What specific criteria were used to define individual pockets before forming Union-Pocket? The effectiveness of Union-Pocket depends on how the individual pockets are initially defined. Could the authors clarify how pockets are constructed during data preprocessing—for example, what criteria or radius are used? Since these pocket definitions may rely on heuristic rules or docking-based priors, to what extent does the Union-Pocket strategy introduce new information, rather than simply aggregating existing sampling bias?
- While the three proposed components—GeoREC, Union-Pocket, and hybrid loss—are each shown to improve performance, the conceptual motivation for combining them is not clearly articulated. What is the biological or modeling rationale for combining GeoREC, Union-Pocket, and the pairwise loss in a single framework? Were they motivated together, or added independently for empirical gain?
[1] Wang, Renxiao, et al. "The PDBbind database: Collection of binding affinities for protein− ligand complexes with known three-dimensional structures." Journal of medicinal chemistry 47.12 (2004): 2977-2980.
Limitations
Yes. The authors acknowledge certain limitations in Appendix A.9, specifically the potential increase in computational cost due to the use of Union-Pocket.
Final Justification
After considering the rebuttal, additional experiments, and clarifications, I am increasing my score to 4. The paper introduces a meaningful geometric perspective for structure-based bioactivity prediction (GeoREC and Union-Pocket) and a hybrid loss formulation, and the expanded evaluation now includes more competitive baselines, showing consistent improvements across most metrics. The authors clarified the applicability to single-pocket benchmarks, defined pocket construction explicitly (5 Å criterion), and revised the abstract to avoid conflating binding affinity with bioactivity, addressing several key concerns. However, some issues remain partially resolved: certain SIU subsets still show modest gains, suggesting sensitivity to label types; Union-Pocket depends on docking-derived priors, with no sensitivity analysis; the conceptual motivation for combining all components is still somewhat empirical; and generalizability to datasets like PDBbind is argued but not demonstrated experimentally. Overall, the work presents a technically solid and novel approach with improved clarity and broader evaluation after rebuttal, but residual concerns about robustness, scalability, and empirical breadth prevent a higher recommendation, leading to a borderline accept (score 4).
Formatting Issues
N/A
Thank you for your comments. We sincerely appreciate your recognition of the significance of our work and of the quality of the writing. Our responses to the comments follow.
In response to Weakness 1
We have completed comprehensive experiments across all SIU subsets for both Kd and Ki tasks, evaluating three baseline models (GAT, GIGN, DTIGN) and their enhanced versions. Our results demonstrate that the enhanced models outperform their respective baselines in 43 out of 48 dataset-metric combinations and achieve state-of-the-art performance on 10 out of 16 dataset-level metrics. We also note that the SIU benchmark, introduced in [1], is a rigorous and up-to-date standard for evaluating ligand binding affinity. We believe these stable and competitive results across varied settings demonstrate the robustness and generalizability of our method.
| Dataset/label | Method | RMSE(↓) | MAE(↓) | Pearson(↑) | Spearman(↑) |
|---|---|---|---|---|---|
| SIU0.9/Kd | Unimol | 1.364 | 1.141 | -0.033 | -0.082 |
| | DTIGN | 1.839 | 1.490 | -0.001 | -0.042 |
| | DTIGN enhanced | 1.304 | 1.060 | 0.321 | 0.326 |
| | GIGN | 1.708 | 1.367 | 0.070 | 0.038 |
| | GIGN enhanced | 1.455 | 1.139 | 0.296 | 0.261 |
| | GAT | 1.545 | 1.240 | 0.092 | 0.082 |
| | GAT enhanced | 1.473 | 1.166 | 0.261 | 0.254 |
| SIU0.9/Ki | Unimol | 1.235 | 1.017 | 0.485 | 0.452 |
| | DTIGN | 1.607 | 1.276 | 0.360 | 0.329 |
| | DTIGN enhanced | 1.296 | 1.054 | 0.485 | 0.441 |
| | GIGN | 1.597 | 1.337 | 0.223 | 0.167 |
| | GIGN enhanced | 1.487 | 1.240 | 0.371 | 0.338 |
| | GAT | 1.706 | 1.386 | 0.301 | 0.262 |
| | GAT enhanced | 1.625 | 1.339 | 0.316 | 0.294 |
| SIU0.6/Kd | Unimol | 1.389 | 1.192 | -0.149 | -0.206 |
| | GIGN | 1.371 | 1.115 | 0.265 | 0.281 |
| | GIGN enhanced | 1.326 | 1.078 | 0.280 | 0.227 |
| | DTIGN | 1.349 | 1.079 | 0.329 | 0.329 |
| | DTIGN enhanced | 1.332 | 1.069 | 0.327 | 0.248 |
| | GAT | 1.521 | 1.285 | 0.233 | 0.157 |
| | GAT enhanced | 1.424 | 1.182 | 0.107 | 0.133 |
| SIU0.6/Ki | Unimol | 1.255 | 1.034 | 0.472 | 0.452 |
| | GIGN | 1.789 | 1.503 | 0.225 | 0.230 |
| | GIGN enhanced | 1.404 | 1.165 | 0.498 | 0.463 |
| | DTIGN | 1.993 | 1.653 | 0.123 | 0.091 |
| | DTIGN enhanced | 1.321 | 1.079 | 0.472 | 0.452 |
| | GAT | 1.976 | 1.690 | -0.060 | -0.096 |
| | GAT enhanced | 1.694 | 1.381 | 0.303 | 0.263 |
In response to Weakness 2
We have expanded our experiments to include additional competitive baselines: the explainable heterogeneous interaction graph neural network (EHIGN) [2], the structure-aware interactive graph neural network (SIGN) [3], and the multi-task bioassay pre-training (MBP) approach [4]. Results (attached below) show that the proposed method consistently improves performance over these models across most metrics (39/43) on a representative benchmark subset.
Together with the existing GAT, GIGN, and DTIGN, these additional methods form a diverse and competitive baseline set that includes heterogeneous, attention-based, and multi-task learning approaches. This broader comparison provides a more comprehensive assessment of the proposed model's effectiveness.
| Dataset / Label | Method | RMSE(↓) | Pearson(↑) | Tau(↑) |
|---|---|---|---|---|
| DTIGN I1 / IC50 | EHIGN | 1.215 | 0.146 | 0.052 |
| | EHIGN Enhanced | 1.239 | 0.193 | 0.108 |
| | SIGN | 0.911 | 0.681 | 0.492 |
| | SIGN Enhanced | 0.909 | 0.712 | 0.539 |
| | MBP | 1.280 | 0.148 | 0.036 |
| | MBP Enhanced | 1.152 | 0.466 | 0.325 |
| DTIGN I2 / IC50 | EHIGN | 1.089 | 0.042 | 0.011 |
| | EHIGN Enhanced | 0.999 | 0.053 | 0.045 |
| | SIGN | 0.582 | 0.848 | 0.632 |
| | SIGN Enhanced | 0.569 | 0.848 | 0.641 |
| | MBP | 1.071 | 0.073 | 0.081 |
| | MBP Enhanced | 0.977 | 0.449 | 0.258 |
| DTIGN E1 / EC50 | EHIGN | 0.994 | -0.050 | -0.026 |
| | EHIGN Enhanced | 1.006 | 0.010 | -0.009 |
| | SIGN | 0.887 | 0.464 | 0.393 |
| | SIGN Enhanced | 0.861 | 0.506 | 0.446 |
| | MBP | 0.972 | 0.028 | 0.005 |
| | MBP Enhanced | 0.913 | 0.334 | 0.241 |

| Dataset / Label | Method | RMSE(↓) | MAE(↓) | Pearson(↑) | Spearman(↑) |
|---|---|---|---|---|---|
| SIU 0.6 / Kd | EHIGN | 1.404 | 1.180 | -0.015 | 0.020 |
| | EHIGN Enhanced | 1.325 | 1.110 | 0.159 | 0.213 |
| | MBP | 1.693 | 1.418 | -0.195 | -0.189 |
| | MBP Enhanced | 1.406 | 1.194 | 0.082 | 0.041 |
| SIU 0.6 / Ki | EHIGN | 1.450 | 1.222 | 0.118 | 0.106 |
| | EHIGN Enhanced | 1.394 | 1.164 | 0.256 | 0.169 |
| | MBP | 1.748 | 1.483 | 0.304 | 0.253 |
| | MBP Enhanced | 1.699 | 1.434 | 0.317 | 0.225 |
In response to Weakness 3
We fully agree that strong binding affinity alone does not guarantee bioactivity. While binding is a prerequisite for bioactivity, it is not sufficient on its own—many tightly binding ligands fail to produce the desired biological effect. We will revise the abstract as follows to more accurately reflect this distinction.
Abstract: Predicting the bioactivity of candidate ligands remains a central challenge in drug discovery. Ligands and endogenous substrates often compete for the same binding sites on target proteins, and the extent to which a ligand can modulate protein function depends not only on its binding but also on how effectively it occupies the relevant pocket. However, most existing methods focus narrowly on local interactions within protein–ligand complexes and neglect spatial emptiness—the unoccupied regions within the binding site that may permit endogenous molecules to engage or interfere. Such unfilled space can diminish the ligand’s functional impact, regardless of binding affinity. In this paper, ... (The rest remains unchanged.)
In response to Weakness 4
Due to submission constraints, we are unable to upload revised figures here. However, we will improve the figures in the final version by increasing font sizes and making the graphical elements more intuitive to better convey the geometric nature of our method.
In response to Weakness 5
We will correct the formatting issues in the final version by using the proper LaTeX conventions, ensuring that terms like IC₅₀ and EC₅₀ are written in non-italic text and that the Pearson correlation coefficient is correctly denoted as lowercase $r$.
In response to Question 1
The concept of Union-Pocket is indeed applicable to single-pocket settings like those in PDBbind. In our method, we consider not just the local pocket surrounding a specific ligand pose, but the union binding region estimated by all ligand poses docked around known active sites. This includes all possible functional active sites and their surrounding environments that could contribute to bioactivity upon ligand binding. In single-pocket benchmarks, if the pocket is defined broadly enough—e.g., by enlarging the docking box—it effectively serves the same role as a Union-Pocket. Thus, our method remains generalizable and effective in such settings.
In response to Question 2
Individual pockets are defined as the set of protein residues with at least one atom located within 5 Å of any ligand atom [5-7]. This definition follows a commonly used proximity-based criterion that reflects potential physical interactions.
The Union-Pocket strategy introduces two key types of new information beyond simply aggregating existing docking poses. First, instead of using one large docking box that covers the entire protein surface, we define multiple smaller docking boxes around each cluster of functional sites. This approach forces ligands to explore and generate binding poses specifically within these localized regions, reducing sampling bias and ensuring more thorough and uniform coverage of all potential binding sites. Second, by taking the union of all such individual pockets, the resulting Union-Pocket–ligand graph captures the global spatial context of the ligand, including its relative location and the overall geometry of empty regions around potential binding sites. This holistic view of binding topography is not available in any single pose or local pocket, thereby providing richer structural information for bioactivity prediction.
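The pocket definition and the union step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' preprocessing code: atoms are plain coordinate tuples, residues are identified by hypothetical string IDs, and the function names are invented for this example.

```python
import math

def pocket_residues(protein_atoms, ligand_atoms, cutoff=5.0):
    """Return the IDs of residues with at least one atom within `cutoff`
    angstroms of any ligand atom.
    protein_atoms: list of (residue_id, (x, y, z)) tuples
    ligand_atoms:  list of (x, y, z) tuples for one docked pose
    """
    pocket = set()
    for res_id, atom_xyz in protein_atoms:
        if res_id in pocket:
            continue  # residue already selected via another of its atoms
        if any(math.dist(atom_xyz, lig_xyz) <= cutoff for lig_xyz in ligand_atoms):
            pocket.add(res_id)
    return pocket

def union_pocket(protein_atoms, ligand_poses, cutoff=5.0):
    """Union of the per-pose pockets over all docked poses of all ligands."""
    union = set()
    for pose in ligand_poses:
        union |= pocket_residues(protein_atoms, pose, cutoff)
    return union

# Toy protein with three residues and two poses docked at different sites.
toy_protein = [
    ("A:10", (0.0, 0.0, 0.0)),
    ("A:10", (1.0, 0.0, 0.0)),
    ("A:11", (10.0, 0.0, 0.0)),
    ("A:12", (20.0, 0.0, 0.0)),
]
pose_site_1 = [(3.0, 0.0, 0.0)]
pose_site_2 = [(14.0, 0.0, 0.0)]
toy_union = union_pocket(toy_protein, [pose_site_1, pose_site_2])
```

In the toy case each pose selects a different residue, and the union contains both, so every ligand is represented against the same protein context regardless of where its own pose lies.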
In response to Question 3
The proposed GeoREC and Union-Pocket were motivated together because the Union-Pocket provides the global geometry of protein-ligand interaction graphs. The pairwise loss is an add-on to let the model perceive both the relative error (direction-based) and the absolute error (distance-based) between its predictions and labels.
References
[1] Redefining the task of Bioactivity Prediction, ICLR, 2025
[2] Interaction-based inductive bias in graph neural networks: enhancing protein-ligand binding affinity predictions from 3D structures, TPAMI, 2024
[3] Structure-aware Interactive Graph Neural Networks for the Prediction of Protein-Ligand Binding Affinity, KDD, 2021
[4] Multi-task bioassay pre-training for protein-ligand binding affinity prediction, BIB, 2024
[5] Geometric interaction graph neural network for predicting protein–ligand binding affinities from 3d structures (GIGN), Journal of physical chemistry letters, 2023
[6] Advancing bioactivity prediction through molecular docking and self-attention, JBHI, 2024
[7] SiteFerret: beyond simple pocket identification in proteins, JCTC, 2023
I thank the authors for their detailed rebuttal and additional experiments, which address several of my earlier concerns. The inclusion of more competitive baselines strengthens the evaluation and provides a clearer view of the model’s relative performance. Clarifications on the applicability of Union-Pocket to single-pocket datasets and the 5 Å pocket definition help explain the generalizability and rationale of the method. The revised abstract more accurately reflects the relationship between binding affinity and bioactivity. However, some limitations remain: certain SIU subsets still show only modest improvements. Overall, the rebuttal improves the clarity and robustness of the work, and I am raising my score to 4.
Thank you very much for your appreciation and for raising the score to 4. We agree that the proposed method shows only modest improvements in some comparisons, which we attribute to the inherent complexity of biological systems. Overall, however, the method demonstrates substantial improvements. To illustrate this point, we calculated the average percentage improvement of each metric over the GNN-based methods on the Kd (0.6/0.9) datasets, on the Ki (0.6/0.9) datasets, and across all datasets. The results are summarized in the tables below and show that the average improvements achieved by our method are substantial.
| | RMSE(↓) | MAE(↓) | Pearson(↑) | Spearman(↑) |
|---|---|---|---|---|
| Baseline avg for Kd | 1.554 | 1.272 | 0.097 | 0.085 |
| Enhanced avg for Kd | 1.381 | 1.125 | 0.229 | 0.213 |
| Percentage of improvement | 11.13% | 11.55% | 135.48% | 151.83% |

| | RMSE(↓) | MAE(↓) | Pearson(↑) | Spearman(↑) |
|---|---|---|---|---|
| Baseline avg for Ki | 1.733 | 1.444 | 0.199 | 0.168 |
| Enhanced avg for Ki | 1.490 | 1.232 | 0.377 | 0.331 |
| Percentage of improvement | 14.02% | 14.66% | 89.34% | 97.22% |

| | RMSE(↓) | MAE(↓) | Pearson(↑) | Spearman(↑) |
|---|---|---|---|---|
| Baseline avg for all datasets | 1.643 | 1.358 | 0.148 | 0.126 |
| Enhanced avg for all datasets | 1.435 | 1.178 | 0.303 | 0.272 |
| Percentage of improvement | 12.65% | 13.20% | 104.47% | 115.52% |
Thank you for the additional clarifications and detailed performance analysis. While the results are encouraging, my evaluation is based on overall performance rather than the Kd (0.6/0.9) and Ki (0.6/0.9) datasets specifically, so I will maintain my score of 4.
Thank you for your clarification and for the time you dedicated to reviewing our work. We appreciate your thoughtful feedback and the opportunity to address your comments, which helped us improve the paper.
This paper presents a method for structure-based bioactivity prediction that integrates spatial details extending beyond conventional binding interactions. The authors develop GeoREC, a technique that represents the unoccupied space surrounding atoms in a protein-ligand complex, along with Union-Pocket, a strategy for merging multiple pockets from a single protein to ensure uniform structural context across different ligands. A hybrid loss function is employed to better maintain bioactivity rankings. The approach yields enhanced results compared to certain baseline models on the DTIGN and SIU datasets, and an ablation study verifies that every element contributes to the overall improvement.
The authors' compelling rebuttal successfully addressed most of the reviewers' concerns. Please incorporate the reviewers' comments into the final version.