Volume 39, Issue 2 , Pages 127-136, February 2007
Mining of relations between proteins over biomedical scientific literature using a deep-linguistic approach☆☆☆
Summary
Objective
The amount of new discoveries (as published in the scientific literature) in the biomedical area is growing at an exponential rate. This growth makes it very difficult to filter the most relevant results, and thus the extraction of the core information becomes very expensive. Therefore, there is a growing interest in text processing approaches that can deliver selected information from scientific publications, which can limit the amount of human intervention normally needed to gather those results.
Materials and methods
This paper presents and evaluates an approach aimed at automating the process of extracting functional relations (e.g. interactions between genes and proteins) from scientific literature in the biomedical domain. The approach, using a novel dependency-based parser, is based on a complete syntactic analysis of the corpus.
Results
We have implemented a state-of-the-art text mining system for biomedical literature, based on a deep-linguistic, full-parsing approach. The results are validated on two different corpora: the manually annotated genomics information access (GENIA) corpus and the automatically annotated arabidopsis thaliana circadian rhythms (ATCR) corpus.
Conclusion
We show how a deep-linguistic approach (contrary to common belief) can be used in a real world text mining application, offering high-precision relation extraction, while at the same time retaining a sufficient recall.
Keywords: Information extraction, Text mining, Dependency parsing, Biomedical literature, Protein interactions
To access this article, please choose from the options below
☆ A preliminary version of the system described in this paper has been presented in [1]. Recent results obtained after the submission of this paper are described in [2]. All URLs mentioned in this paper have been accessed and verified on 10 May 2006.
☆☆ The tools and resources used for the work described in this paper are freely available for research purposes. The DepGENIA corpus can be downloaded from the OntoGene web site (http://www.ontogene.org/). The Pro3Gres parser and the OntoGene text mining system can be obtained by contacting the authors of this paper.
PII: S0933-3657(06)00137-0
doi:10.1016/j.artmed.2006.08.005
© 2006 Published by Elsevier Inc.
Volume 39, Issue 2 , Pages 127-136, February 2007
