PaperHub

Average rating: 5.5 / 10 (4 reviewers) · Decision: Rejected
Individual ratings: 6, 5, 5, 6 (min 5, max 6, std 0.5)
Average confidence: 3.8
ICLR 2024

Bio-RFX: Refining Biomedical Extraction via Advanced Relation Classification and Structural Constraints

OpenReview · PDF
Submitted: 2023-09-23 · Updated: 2024-02-11
TL;DR

We propose a novel biomedical entity and relation extraction method, Bio-RFX, by classifying fine-grained relations at sentence level and exploiting the strong structural constraints for relation triplets in the textual corpus.

Abstract

Keywords
Named Entity Recognition · Relation Extraction · Biomedical Literature

Reviews and Discussion

Official Review
Rating: 6

This paper proposes a novel method for entity relation extraction from text, where relation types are detected first, and then entities (or arguments) are detected later. For entities, the entity candidates and their number for each relation type are detected using two types of question-answering framework, and the entities are extracted by filtering the entity candidates considering the number of entities. The approach shows the best performance among compared methods for the named entity recognition and relation extraction tasks on three biomedical data sets in the full data and low resource settings, except for relation extraction on one data set in the full data set setting. Ablation studies show the usefulness of the structure constraints and number prediction.

Strengths

  • The approach to first detect relation types and then entities is novel
  • The paper is well-written and easy to follow. Figure 2 is helpful to grasp the overall framework.
  • The results on three datasets show high performance in both full data and low resource settings, and the ablation study shows the usefulness of the proposed enhancements.

Weaknesses

  • The approach could in principle be applied to other domains, but it is presented as a model specific to the biomedical domain, which limits its scope.
  • The comparison with existing state-of-the-art entity relation models is limited. The authors presented several approaches like OneRel and SPN in the related work section, and there are several SOTA models for entity relation tasks as follows, but the comparison is not performed.
    • Pere-Lluís Huguet Cabot and Roberto Navigli. 2021. REBEL: Relation Extraction By End-to-end Language generation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2370–2381, Punta Cana, Dominican Republic. Association for Computational Linguistics.
    • Chenguang Wang, Xiao Liu, Zui Chen, Haoyun Hong, Jie Tang, and Dawn Song. 2022. DeepStruct: Pretraining of Language Models for Structure Prediction. In Findings of the Association for Computational Linguistics: ACL 2022, pages 803–823, Dublin, Ireland. Association for Computational Linguistics.
    • Deming Ye, Yankai Lin, Peng Li, and Maosong Sun. 2022. Packed Levitated Marker for Entity and Relation Extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4904–4917, Dublin, Ireland. Association for Computational Linguistics.

Questions

  • Please see the weaknesses above.
  • How is the model specific to the biomedical domain?
  • Appendix A shows several comparisons of prompting using domain resources, but the prompts used in practice are not clear. For the activator relation type, the authors use the prompt including activate (not activator), so is question generation done manually? They also say, "Note that it is a relatively simple approach" in explaining questions, but do they use any other complicated approach in practice? Or is this simple approach the best?
Comment

We appreciate your constructive feedback, which has been instrumental in enhancing our paper. We have addressed all the points raised and have detailed our responses below.

Weaknesses

The approach could in principle be applied to other domains, but it is presented as a model specific to the biomedical domain, which limits its scope.

This approach is designed to tackle specific issues in biomedical texts. The relation-first design is motivated by the high prevalence of ambiguous terms in biomedical literature: the predicted relation serves as a hint for the entity types. The structural constraints are informed by domain-specific knowledge. For instance, the relation activator can only occur between a chemical and a gene. By contrast, a relation such as located_in can occur between sports_team and city, company and country, city and country, among others, thus offering only a very weak constraint. We will explore the potential of applying our approach to other domains in future research.
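To make the structural-constraint idea concrete, here is a minimal illustrative sketch of such a type check; the constraint table, type labels, and function names below are hypothetical examples rather than our actual implementation.

```python
# Illustrative sketch only: the constraint table and names are hypothetical,
# not the actual Bio-RFX implementation.
from typing import Dict, Set, Tuple

# Map each relation type to the (subject type, object type) pairs it allows.
RELATION_CONSTRAINTS: Dict[str, Set[Tuple[str, str]]] = {
    # Strong domain constraint: activator only links a chemical to a gene.
    "activator": {("CHEMICAL", "GENE")},
    # Weak constraint: located_in admits many type pairs, so it filters little.
    "located_in": {("sports_team", "city"), ("company", "country"), ("city", "country")},
}

def satisfies_constraint(relation: str, subj_type: str, obj_type: str) -> bool:
    """Return True if a candidate triplet's entity types are allowed for this relation."""
    allowed = RELATION_CONSTRAINTS.get(relation)
    return allowed is None or (subj_type, obj_type) in allowed

print(satisfies_constraint("activator", "CHEMICAL", "GENE"))  # True
print(satisfies_constraint("activator", "GENE", "GENE"))      # False: triplet rejected
```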

The comparison with existing state-of-the-art entity relation models is limited. The authors presented several approaches like OneRel and SPN in the related work section, and there are several SOTA models for entity relation tasks as follows, but the comparison is not performed.

Inspired by your constructive comments, we have updated the related work section in the revised version of the paper. We will further conduct comprehensive research into these SOTA models, extending them into biomedical fields and performing in-depth comparisons and discussions.

Comment

Questions

Please see the weaknesses above.

Kindly refer to our reply to Weakness 1 and Weakness 2.

How is the model specific to the biomedical domain?

Please review our response to Weakness 1.

Appendix A shows several comparisons of prompting using domain resources, but the prompts used in practice are not clear. For the activator relation type, the authors use the prompt including activate (not activator), so is question generation done manually? They also say, "Note that it is a relatively simple approach" in explaining questions, but do they use any other complicated approach in practice? Or is this simple approach the best?

In practice, we exclusively use the questions and refrain from using any prompts derived from domain resources.

All the questions are generated from templates, except for entity detection (Section 3.2.1) in the relation extraction task, where they are written manually. The primary reason is that the relation types in different datasets are morphologically and semantically diverse, and manual generation is more accurate in terms of syntax and semantics. In future work, we plan to investigate techniques for automatically generating these questions with an appropriate toolkit. For the other components of our model, the questions come from templates. For example, in number prediction, the template is "How many $\tau_{e_1}, \tau_{e_2}, \dots, \tau_{e_N}$ are there in the sentence with relation $\tau_r$?", where $\tau_{e_1}, \tau_{e_2}, \dots, \tau_{e_N}$ are all the entity types that satisfy the structural constraint of relation $\tau_r$.
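To illustrate the template mechanism, here is a short sketch of how such a question could be instantiated; the function and argument names are ours, for illustration only.

```python
# Minimal sketch of template-based question generation; names are illustrative,
# not the actual Bio-RFX code.
from typing import Sequence

def number_prediction_question(entity_types: Sequence[str], relation_type: str) -> str:
    """Instantiate the number-prediction template described above."""
    return (f"How many {', '.join(entity_types)} are there in the sentence "
            f"with relation {relation_type}?")

# Entity types satisfying the structural constraint of the relation "activator".
print(number_prediction_question(["chemicals", "genes"], "activator"))
# -> "How many chemicals, genes are there in the sentence with relation activator?"
```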

In Appendix A we also mentioned two prompting techniques: term definitions and UMLS markers. Both of them have a negative influence on the model's performance.

Term Definition. We enrich the question with definitions from the Free Medical Dictionary (https://medical-dictionary.thefreedictionary.com/), denoted Bio-RFX (+definition). In NER, the question is followed by the definitions of all the entity types that appear in it; similarly, for RE, it is followed by the definition of the relation type present. The experimental results are displayed below. The rigid definitions introduce noise into the data distribution, thereby increasing the complexity of modeling sentence representations.

| Dataset | Task | Bio-RFX | Bio-RFX (+definition) |
|---|---|---|---|
| DrugProt | NER | 91.87 | 90.79 |
| DrugProt | RE | 71.22 | 56.79 |
| DrugProt(500) | NER | 89.67 | 88.67 |
| DrugProt(500) | RE | 57.58 | 52.50 |
| DrugProt(200) | NER | 87.43 | 89.35 |
| DrugProt(200) | RE | 52.03 | 56.73 |
| DrugVar | NER | 84.12 | 83.86 |
| DrugVar | RE | 69.28 | 69.70 |
| DrugVar(500) | NER | 81.23 | 79.90 |
| DrugVar(500) | RE | 63.90 | 63.22 |
| DrugVar(200) | NER | 74.90 | 71.19 |
| DrugVar(200) | RE | 54.01 | 48.68 |
| Bacteria Biotope | NER | 75.90 | 75.14 |
| Bacteria Biotope | RE | 43.38 | 44.65 |

UMLS Marker. As shown in Appendix A, incorporating UMLS markers leads to a performance drop, due to the discrepancies between the entity types of UMLS MetaMap and those of the datasets, the low accuracy of MetaMap matching, and the fact that the markers ignore the relation types in the sentence. In the following example, the term "of" is erroneously identified as the gene OF (TAF1 wt Allele) because of its ambiguity, which hampers the overall performance. Moreover, accessing MetaMap via its web API is extremely time-consuming, posing a challenge to processing speed.

... <CHEMICAL> isoprenaline </CHEMICAL> - induced maximal relaxation ( E ( max ) ) <GENE> of </GENE> <CHEMICAL> methacholine </CHEMICAL> - contracted preparations in a concentration dependent fashion ...

Comment

Thank you for the response and update. The response is not enough to raise my score, so I will keep it.

Official Review
Rating: 5

This paper studies the problem of information extraction (named entity recognition and relation extraction) in the biomedical domain. The authors propose to predict the relation type in the sentence and then extract the relevant entities in a question-answering manner. Finally, the pruning algorithm is used with an entity number predictor to filter the final predicted entities. The proposed method is evaluated on three biomedical datasets: Bacteria Biotope, DrugProt, and DrugVar, and results show that the proposed method outperforms several baselines, especially under the low-resource scenario.

Strengths

  • The authors propose an interesting paradigm for relation extraction: predict relation first and then extract entities; extract entities in a QA manner.
  • The authors report strong results of the proposed method on several biomedical datasets.

Weaknesses

  • The choice of baselines seems arbitrary; I suggest linking Section 2.1 and Section 4.2.1 to make the reason for ‘why cannot use relation-first baseline’ more explicit.
  • It isn't easy to gain insights into the main strengths of the proposed method. See question A

Questions

Question A: it is unclear what the ablated variant "- Structure" is. Suggest linking Section 4.5 and the four components in Section 3. For example, does the "- Number" variant contain only components 1, 2, and 4? Also, it would be nice to see a variant containing only the first two components.

Comment

We appreciate your insightful suggestions and apologize for any lack of clarity in our paper. We have addressed all the issues, and the detailed responses are as follows.

Weaknesses

The choice of baselines seems arbitrary; I suggest linking Section 2.1 and Section 4.2.1 to make the reason for "why cannot use relation-first baseline" more explicit.

In Section 2.1, we introduce two relation-first methods: PRGC and RERE, both of which are not suitable baselines for our task.

PRGC only performs the relation classification task, which means that it uses ground truth entities as the input and predicts the relations between entity pairs. In contrast, our method extracts both entities and relation triplets.

RERE extracts subjects and objects from the text, ignoring entities that do not appear in any relation triplet, while our approach recognizes all the entities in the text. Besides, RERE does not differentiate between multiple mentions of the same entity within a sentence. However, biomedical literature is rife with complicated clauses and ambiguous terms, which necessitates the precise identification of each unique mention. Thus, the benchmark is designed to distinguish each mention during metric calculation, and RERE has to be excluded from the baselines.

It isn't easy to gain insights into the main strengths of the proposed method. See question A.

Please refer to our response to question A.

Questions

It is unclear what the ablated variant "- Structure" is. Suggest linking Section 4.5 and the four components in Section 3. For example, does the "- Number" variant contain only components 1, 2, and 4? Also, it would be nice to see a variant containing only the first two components.

Referring to Section 3, our framework contains four key components:

  1. Relation Classifier
  2. Entity Span Detector
  3. Entity Number Predictor
  4. Pruning Algorithm

Apart from the four key components mentioned above, we take advantage of the structural constraints brought by domain knowledge to obtain more accurate relation triplets.

In addition to the brief explanation on page 9 of our revised paper, we present a more detailed discussion here. In order to measure the significance of the structural constraints, we performed the ablated variant "- Structure" by removing the structural constraints informed by domain-specific knowledge from the model. Without the structural constraints, the model ends up extracting less accurate relation triplets. For example, the relation activator can only occur between a chemical and a gene. The ablated variant without the structural constraints may extract relation triplets containing the relation activator and two genes.

In order to test the validity of the entity number predictor, we performed the ablated variant "- Number" containing only components 1, 2, and 4, for which we use the average number of entities in a sentence as the threshold for the pruning algorithm during inference.
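For illustration, count-based pruning of this kind can be sketched as keeping the top-k scored candidate spans, where k is either the predicted entity count (full model) or the dataset-level average (the "- Number" variant). The code below is a toy version under those assumptions, not our actual implementation.

```python
# Toy sketch of count-based pruning; the span representation and names are assumed.
from typing import List, Tuple

def prune_spans(spans: List[Tuple[int, int]], scores: List[float], k: int) -> List[Tuple[int, int]]:
    """Keep only the k highest-scoring candidate entity spans."""
    ranked = sorted(zip(spans, scores), key=lambda pair: pair[1], reverse=True)
    return [span for span, _ in ranked[:k]]

candidates = [(0, 2), (5, 7), (9, 9)]   # (start, end) token indices
scores = [0.91, 0.34, 0.78]             # span detector confidences
print(prune_spans(candidates, scores, k=2))  # [(0, 2), (9, 9)]
```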

We have performed an ablated variant containing only the first two components, i.e. the relation classifier and the entity span detector. While the micro F1 score for NER on the DrugProt dataset increased by 1.35%, the micro F1 score for RE on the same dataset decreased dramatically by 4.58%. This demonstrates that although the ablated variant may recall slightly more entities, it will extract a lot more false relation triplets due to the existence of perplexing entities, thus harming the precision.

Comment

I have read the author's response and other reviews. My concern regarding the choice of baselines has not been addressed.

Comment

Thank you for your prompt feedback. As outlined in our previous response under Weakness 1, a fair comparison between Bio-RFX and relation-first baselines is not feasible. We appreciate your understanding and look forward to further discussions on this matter.

Official Review
Rating: 5

The paper introduced a novel biomedical entity and relation extraction method that deploys structural constraints for relation triplets to constrain the hypothesis space. It reported on extensive evaluations on three datasets and in a case study to provide convincing evidence of performance gains obtained using the introduced method. It supplemented these performance evaluations by conducting an ablation study as well.

Strengths

The paper introduced a novel biomedical entity and relation extraction method that deploys structural constraints for relation triplets to constrain the hypothesis space. Biomedical applications of this kind are of substantial societal importance.

It reported on extensive evaluations on three datasets and in a case study to provide convincing evidence of performance gains obtained using the introduced method. It supplemented these performance evaluations by conducting an ablation study as well.

The paper was very carefully written. It was clear and convincing.

Weaknesses

I was unable to find information about statistical analysis (e.g., statistical significance tests or confidence intervals).

Automatic entity and relation extraction is a trending topic in research on natural language processing, knowledge graphs, and machine/deep learning. Hence, a more convincing case for novel contributions made in this paper could be made. For example, I could not find a single paper on contrastive representation learning included in the paper, although, e.g., triplet loss and contrastive representation learning are very closely related (see, e.g., Le-Khac, P. H., Healy, G., & Smeaton, A. F. (2020). Contrastive representation learning: A framework and review. IEEE Access, 8, 193907-193934).

The reference list of the paper could be improved, and the math should be punctuated.

Questions

How were the performance gains evaluated as significant? Were they both statistically and practically significant?

What makes the contributions new/novel compared to prior work?

Comment

Thank you for the constructive comments. They are very helpful for improving our paper. We understand your concerns and would like to address them as follows.

Weaknesses

I was unable to find information about statistical analysis (e.g., statistical significance tests or confidence intervals).

Thank you for your constructive suggestions. We are conducting more experiments regarding statistical tests according to your new comment and will update the results before November 22.

Automatic entity and relation extraction is a trending topic in research on natural language processing, knowledge graphs, and machine/deep learning. Hence, a more convincing case for novel contributions made in this paper could be made. For example, I could not find a single paper on contrastive representation learning included in the paper, although, e.g., triplet loss and contrastive representation learning are very closely related (see, e.g., Le-Khac, P. H., Healy, G., & Smeaton, A. F. (2020). Contrastive representation learning: A framework and review. IEEE Access, 8, 193907-193934).

Thank you for your comments, which have broadened our perspective. We will conduct in-depth research and discuss it in detail in our subsequent work.

The reference list of the paper could be improved, and the math should be punctuated.

Thank you for your detailed and helpful suggestions. We have carefully polished the paper and fixed the writing issues in the revised manuscript.

Questions

How were the performance gains evaluated as significant? Were they both statistically and practically significant?

Please see our response to Weakness 1.

What makes the contributions new/novel compared to prior work?

Compared with other methods, our approach introduces novelty in the following aspects. Firstly, we take advantage of the structural constraints brought by domain knowledge to obtain more accurate relation triplets. This approach mitigates the issue of extracting false positive triplets from the texts when enumerating all the entity pairs. Secondly, in order to address domain-specific issues, such as nested or overlapping biomedical terms, we implemented the text-NMS algorithm to improve the specificity of extraction. Thirdly, we generate a question query with respect to the relation type and targeted entity type, providing an intuitive way of jointly modeling the connection between entity and relation.
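As a rough illustration of the text-NMS idea mentioned above (greedy suppression of lower-scored overlapping or nested spans), consider the sketch below; the overlap measure, threshold, and names are assumptions rather than our exact algorithm.

```python
# Rough sketch of non-maximum suppression over text spans; details are assumed.
from typing import List, Tuple

def span_iou(a: Tuple[int, int], b: Tuple[int, int]) -> float:
    """Token-level intersection-over-union of two inclusive (start, end) spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union if union else 0.0

def text_nms(spans: List[Tuple[int, int]], scores: List[float], iou_threshold: float = 0.5):
    """Greedily keep the best-scored span and drop candidates that overlap it too much."""
    order = sorted(range(len(spans)), key=lambda i: scores[i], reverse=True)
    kept: List[Tuple[int, int]] = []
    for i in order:
        if all(span_iou(spans[i], k) < iou_threshold for k in kept):
            kept.append(spans[i])
    return kept

# A nested mention is suppressed by the longer, higher-scoring span.
print(text_nms([(3, 6), (4, 5)], [0.9, 0.6]))  # [(3, 6)]
```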

Comment

It would have been helpful to communicate the results of those additional statistical tests in the rebuttal already. Because these results - or even methodological details on how the tests would be conducted - were not included, I am not in a position to change my review for the better. Based on the other reviewers' concerns, I have now revised the review from marginally above to marginally below the acceptance threshold.

Comment

Thank you for your prompt response. We have designed a statistical analysis and performed experiments on the DrugVar dataset according to your constructive suggestions, which we address as follows.

  1. We choose 5 seeds randomly.

  2. We train Bio-RFX and all the baseline models with each seed and record the corresponding performances.

  3. We perform one-tailed paired t-tests between Bio-RFX and each baseline model with significance level $\alpha = 0.05$ on the results. For each baseline model:

    1. We compute the difference in performance between Bio-RFX and the baseline model, obtaining 5 difference measures $d_i\ (i = 1, 2, \dots, 5)$.

    2. We compute the $t$ statistic under the null hypothesis that Bio-RFX and the compared baseline have equal performance:

      $$t = \frac{\bar{d} - 0}{s / \sqrt{5}} = \frac{\sqrt{5}\,\bar{d}}{\sqrt{\frac{1}{4}\sum_{i=1}^{5}(d_i - \bar{d})^2}},$$

      where $\bar{d}$ and $s$ are the sample mean and standard deviation of the difference measures, respectively.

    3. We compute the p-value and compare it to the significance level $\alpha = 0.05$. If the p-value is smaller than 0.05, or equivalently the $t$ statistic is larger than 2.132, we reject the null hypothesis.

The $t$ statistics and p-values between Bio-RFX and the baseline models are shown in the following table. We can observe that all the p-values are below $\alpha = 0.05$ (and all the $t$ statistics are above 2.132), rejecting the null hypothesis and demonstrating that Bio-RFX significantly outperforms all the compared baselines. We hope these results may alleviate your concerns. We have updated the experimental results; please refer to our latest response.

| Run | Bio-RFX NER | Bio-RFX RE | PURE NER | PURE RE | KECI NER | KECI RE | TPLinker-plus NER | TPLinker-plus RE | SpanBioER NER | SpanBioER RE |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 82.55 | 68.65 | 80.24 | 63.40 | 74.28 | 63.34 | 79.54 | 63.14 | 82.18 | 69.39 |
| 2 | 82.73 | 69.37 | 81.18 | 66.42 | 74.25 | 63.40 | 80.13 | 61.61 | 81.90 | 68.39 |
| 3 | 83.06 | 71.57 | 80.52 | 65.53 | 74.98 | 60.00 | 78.88 | 61.60 | 81.75 | 68.01 |
| 4 | 83.85 | 70.45 | 80.58 | 64.90 | 74.30 | 63.00 | 78.76 | 61.16 | 81.44 | 67.48 |
| 5 | 83.63 | 70.30 | 80.42 | 66.03 | 74.96 | 65.08 | 82.03 | 67.33 | 81.84 | 67.76 |
| t | - | - | 8.13 | 8.78 | 33.76 | 5.99 | 5.40 | 5.52 | 3.76 | 2.39 |
| p | - | - | 0.0006 | 0.0005 | 0.0000 | 0.0020 | 0.0028 | 0.0026 | 0.0099 | 0.0375 |
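For readers who want to verify the arithmetic, the following short sketch recomputes the Bio-RFX vs. PURE NER statistic from the per-run DrugVar scores in the table above; only the variable names are ours.

```python
# Recompute the one-tailed paired t statistic for Bio-RFX vs. PURE (DrugVar, NER).
import math

bio_rfx = [82.55, 82.73, 83.06, 83.85, 83.63]
pure    = [80.24, 81.18, 80.52, 80.58, 80.42]

d = [a - b for a, b in zip(bio_rfx, pure)]                       # per-run differences
d_bar = sum(d) / len(d)                                          # sample mean
s = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (len(d) - 1))   # sample standard deviation
t = d_bar / (s / math.sqrt(len(d)))
print(round(t, 2))  # ~8.13, matching the value reported in the table
```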
Comment

We have further conducted significance tests on the Bacteria Biotope dataset and present our experimental results here. They show that Bio-RFX significantly outperforms the baselines on this dataset too. Due to our limited computing resources and time constraints, we may not be able to conduct the significance tests for the full DrugProt dataset before Nov. 22, as it is a much bigger dataset; instead, we will see if we can perform the experiments for the DrugProt dataset in the low-resource setting (i.e., with much smaller training data) by the due date. In future versions, we will carry out more comprehensive significance tests and analyses. We appreciate your understanding and patience.

| Run | Bio-RFX NER | Bio-RFX RE | PURE NER | PURE RE | KECI NER | KECI RE | TPLinker-plus NER | TPLinker-plus RE | SpanBioER NER | SpanBioER RE |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 75.94 | 45.83 | 66.59 | 36.47 | 66.20 | 36.36 | 69.25 | 37.49 | 74.78 | 44.68 |
| 2 | 76.65 | 42.55 | 66.59 | 36.18 | 62.02 | 32.14 | 69.73 | 44.58 | 74.94 | 42.32 |
| 3 | 75.84 | 43.77 | 65.80 | 37.99 | 63.83 | 31.03 | 69.15 | 41.55 | 75.18 | 41.90 |
| 4 | 76.25 | 46.19 | 66.04 | 36.75 | 59.57 | 38.71 | 70.88 | 38.35 | 75.65 | 43.42 |
| 5 | 76.29 | 45.39 | 67.38 | 37.24 | 68.15 | 34.48 | 66.77 | 40.33 | 74.64 | 42.92 |
| t | - | - | 38.84 | 10.37 | 7.86 | 11.81 | 10.38 | 2.23 | 4.92 | 3.69 |
| p | - | - | 0.0000 | 0.0002 | 0.0007 | 0.0001 | 0.0002 | 0.0448 | 0.0040 | 0.0105 |
Comment

We have conducted significance analysis for the DrugProt dataset in the low-resource setting (i.e., with a much smaller training set of size 200). The results are shown below. We would like to highlight that even when training resources are limited, our method is still able to demonstrate significant performance improvements over most of the baselines. We appreciate your time and consideration, and we look forward to any further comments or suggestions you may have to enhance our work.

| Run | Bio-RFX NER | Bio-RFX RE | PURE NER | PURE RE | KECI NER | KECI RE | TPLinker-plus NER | TPLinker-plus RE | SpanBioER NER | SpanBioER RE |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 88.12 | 55.63 | 83.90 | 55.74 | 71.05 | 38.58 | 78.11 | 32.80 | 82.24 | 41.31 |
| 2 | 88.55 | 53.97 | 83.40 | 51.41 | 68.57 | 38.43 | 79.79 | 26.52 | 81.87 | 39.65 |
| 3 | 88.97 | 57.04 | 84.29 | 52.89 | 72.76 | 38.59 | 84.70 | 25.00 | 82.17 | 42.92 |
| 4 | 88.93 | 55.59 | 84.13 | 54.62 | 74.64 | 35.58 | 82.75 | 26.29 | 82.03 | 42.06 |
| 5 | 90.49 | 58.78 | 84.09 | 58.23 | 71.07 | 44.17 | 82.68 | 30.23 | 82.38 | 42.01 |
| t | - | - | 13.69 | 2.11 | 16.61 | 17.61 | 7.37 | 18.62 | 19.22 | 26.16 |
| p | - | - | 0.0001 | 0.0512 | 0.0000 | 0.0000 | 0.0009 | 0.0000 | 0.0000 | 0.0000 |
Official Review
Rating: 6

The paper tackles the issue of Named Entity Recognition and Biomedical Relation Extraction. In particular, it proposes a framework that starts with identifying relations first, and uses that information to constrain the space for entity extraction. The paper shows that the method surpasses the state of the art (SOTA) for NER and RE tasks on multiple biomedical datasets across a variety of scenarios, including low-resource settings.

Strengths

  • Quality: Approach is grounded in biological knowledge of constraining the space for entity extractions to types of relationship
  • Originality: The paper combines various ML concepts in an interesting and creative way
  • Results: Results show improvement over other SOTA models

Weaknesses

The concepts introduced in the paper are not novel. However, the paper does a nice job of combining them in a creative way to obtain improvements over SOTA.

Questions

  • Looking at the results in Table 1: What is the hypothesis for KECI performing better on the RE task in DrugProt?
Comment

Thank you for the valuable and detailed comments on the experimental results, which are very helpful for improving our paper.

Questions

Looking at the results in Table 1: What is the hypothesis for KECI performing better on the RE task in DrugProt?

In addition to the brief explanation on the last line of page 7, we present a more detailed discussion here. The result in Table 1 can be attributed to two key factors: the integration of a background Knowledge Graph (KG) within KECI, and the large number of training samples in the DrugProt dataset.

KECI first constructs an initial span-graph from the text and then uses an entity linker to form a knowledge graph containing relevant background biomedical knowledge for the entity mentions in the text. To make the final predictions, KECI fuses the initial span graph and the knowledge graph into a more refined graph using an attention mechanism. The intricate network of interconnections among entities within the background KG equips KECI with a deeper understanding of sentence structures, thereby facilitating the formation of complex triplets. On the other hand, in order to lessen the demand for computational resources, we match all the possible subjects and objects to generate triplets, as stated in Section 3.2.3.

Another contributing factor is the larger size of the DrugProt training set compared to other datasets, as detailed in Table 5 in Appendix C. This allows KECI to effectively strike a balance between utilizing training data and incorporating prior knowledge. However, when training data is limited, it will lead to overfitting to the background KG. For additional supporting experimental results and discussions, please refer to Section 4.4.

Comment

We would like to express our gratitude to all the reviewers for the time and effort they spent reviewing our paper. We value the insightful and constructive comments, which have been instrumental in improving our work. We have carefully addressed all the comments in our individual responses. We kindly invite further engagement to resolve any confusion and to receive feedback on our rebuttal.

In the following, we list the changes in the revised manuscript.

  • Supplemented Appendix A with experimental results of incorporating domain knowledge into prompts.
  • Added significance analysis in Appendix E.
  • Punctuated mathematical formulas in Section 3, as suggested by Reviewer FKtM.
  • Elucidated the setting of the ablation study in Section 4.5.
  • Added missing related work in Section 2, in response to Reviewer FKtM and Reviewer apnY.
AC Meta-Review

This work considered biomedical named entity recognition and relation extraction via a staged approach which sequentially predicts relation types, then entity identification (followed by pruning). The intuition is to bake the constraints implicit in relation extraction into the method.

The approach is relatively straightforward (a strength!), and seems to offer gains on the biomedical datasets considered. However, the baselines considered are weak (as pointed out by Wvyh and apnY). In particular, the authors compare against methods only from 2021 and before; this is a long time in NLP years. Indeed, as pointed out by apnY, generative methods for this task have since emerged as SOTA (e.g., Cabot and Navigli [EMNLP 2021], Wang et al. [ACL 2022], Ye et al. [ACL 2022], Wadwha et al. [ACL 2023]). Without comparison to at least one such generative baseline, it is difficult to appreciate the contribution here. The authors addressed this in their response only by elaborating the related work section, but provided no additional empirical results.

Why Not a Higher Score

The main limitation here is the lack of modern baselines; given that this is a very applied and therefore empirical paper (the primary contribution being a simple, intuitive technique for biomedical RE), this is critical in my view.

Why Not a Lower Score

N/A

Final Decision

Reject