Artificial Intelligence in Medicine
Volume 39, Issue 2, Pages 127-136, February 2007

Mining of relations between proteins over biomedical scientific literature using a deep-linguistic approach☆☆

  • Fabio Rinaldi (corresponding author; Tel.: +41 44 635 6724; Fax: +41 44 635 6809)
    • Institute of Computational Linguistics, University of Zurich, Binzmühlestrasse 14, CH-8050 Zürich, Switzerland
  • Gerold Schneider
    • Institute of Computational Linguistics, University of Zurich, Binzmühlestrasse 14, CH-8050 Zürich, Switzerland
  • Kaarel Kaljurand
    • Institute of Computational Linguistics, University of Zurich, Binzmühlestrasse 14, CH-8050 Zürich, Switzerland
  • Michael Hess
    • Institute of Computational Linguistics, University of Zurich, Binzmühlestrasse 14, CH-8050 Zürich, Switzerland
  • Christos Andronis
    • Biovista, 34 Rodopoleos Str., Ellinikon, GR-16777 Athens, Greece
  • Ourania Konstandi
    • Biovista, 34 Rodopoleos Str., Ellinikon, GR-16777 Athens, Greece
  • Andreas Persidis
    • Biovista, 34 Rodopoleos Str., Ellinikon, GR-16777 Athens, Greece

Received 16 January 2006; received in revised form 27 August 2006; accepted 28 August 2006.

Summary 

Objective

The number of new discoveries published in the biomedical scientific literature is growing at an exponential rate. This growth makes it very difficult to filter out the most relevant results, so extracting the core information becomes very expensive. There is therefore growing interest in text processing approaches that can deliver selected information from scientific publications and thereby limit the human intervention normally needed to gather those results.

Materials and methods

This paper presents and evaluates an approach aimed at automating the process of extracting functional relations (e.g. interactions between genes and proteins) from scientific literature in the biomedical domain. The approach, using a novel dependency-based parser, is based on a complete syntactic analysis of the corpus.
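As a rough illustration of what relation extraction over a dependency analysis can look like, the following minimal sketch collects (agent, verb, target) triples from a hand-written set of dependency edges. It is not the parser or the system described in this paper: the `Dep` type, the `subj`/`obj` labels, the `extract_relations` function, and the example sentence are all assumptions made for this sketch.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    """One dependency edge: head word, relation label, dependent word (hypothetical format)."""
    head: str
    rel: str
    dependent: str


def extract_relations(deps, entities, trigger_verbs):
    """Return (agent, verb, target) triples for trigger verbs whose
    subject and object are both known entities."""
    subjects = {}
    objects = {}
    for d in deps:
        # Only edges from a trigger verb to a recognised entity are of interest.
        if d.head in trigger_verbs and d.dependent in entities:
            if d.rel == "subj":
                subjects.setdefault(d.head, []).append(d.dependent)
            elif d.rel == "obj":
                objects.setdefault(d.head, []).append(d.dependent)
    # Pair every subject with every object of the same verb.
    return [
        (s, verb, o)
        for verb in subjects
        for s in subjects[verb]
        for o in objects.get(verb, [])
    ]


# Hand-written (not parser-produced) edges for "NF-kappaB activates IL-2".
deps = [
    Dep("activates", "subj", "NF-kappaB"),
    Dep("activates", "obj", "IL-2"),
]
relations = extract_relations(deps, {"NF-kappaB", "IL-2"}, {"activates"})
print(relations)
```

A full syntactic analysis would supply many more edge types (passives, prepositional attachments, coordination); the sketch only covers the simplest active subject-verb-object pattern.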

Results

We have implemented a state-of-the-art text mining system for biomedical literature, based on a deep-linguistic, full-parsing approach. The results are validated on two different corpora: the manually annotated genomics information access (GENIA) corpus and the automatically annotated Arabidopsis thaliana circadian rhythms (ATCR) corpus.

Conclusion

We show how a deep-linguistic approach (contrary to common belief) can be used in a real-world text mining application, offering high-precision relation extraction while retaining sufficient recall.
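For readers unfamiliar with the two measures, the sketch below shows how precision and recall are conventionally computed for a set of extracted relations against a gold standard. The function name and the toy relation triples are hypothetical and are not results from this paper.

```python
def precision_recall(predicted, gold):
    """Precision = correct / predicted; recall = correct / gold."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data (invented for illustration): every extracted relation is correct,
# but only half of the gold-standard relations are found.
gold = {
    ("A", "binds", "B"),
    ("C", "activates", "D"),
    ("E", "inhibits", "F"),
    ("G", "binds", "H"),
}
predicted = {("A", "binds", "B"), ("C", "activates", "D")}
precision, recall = precision_recall(predicted, gold)
print(precision, recall)
```

In this toy case precision is 1.0 and recall is 0.5, the kind of trade-off the conclusion refers to: few wrong extractions at the cost of missing some true relations.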

Keywords: Information extraction, Text mining, Dependency parsing, Biomedical literature, Protein interactions


☆ A preliminary version of the system described in this paper was presented in [1]. Recent results obtained after the submission of this paper are described in [2]. All URLs mentioned in this paper were accessed and verified on 10 May 2006.

☆☆ The tools and resources used for the work described in this paper are freely available for research purposes. The DepGENIA corpus can be downloaded from the OntoGene web site (http://www.ontogene.org/). The Pro3Gres parser and the OntoGene text mining system can be obtained by contacting the authors of this paper.

PII: S0933-3657(06)00137-0

doi:10.1016/j.artmed.2006.08.005
