Artificial Intelligence in Medicine
Volume 39, Issue 2, Pages 127-136, February 2007

Mining of relations between proteins over biomedical scientific literature using a deep-linguistic approach

  • Fabio Rinaldi

      Affiliations

    • Institute of Computational Linguistics, University of Zurich, Binzmühlestrasse 14, CH-8050 Zürich, Switzerland
    • Corresponding author. Tel.: +41 44 635 6724; Fax: +41 44 635 6809.
  • Gerold Schneider

      Affiliations

    • Institute of Computational Linguistics, University of Zurich, Binzmühlestrasse 14, CH-8050 Zürich, Switzerland
  • Kaarel Kaljurand

      Affiliations

    • Institute of Computational Linguistics, University of Zurich, Binzmühlestrasse 14, CH-8050 Zürich, Switzerland
  • Michael Hess

      Affiliations

    • Institute of Computational Linguistics, University of Zurich, Binzmühlestrasse 14, CH-8050 Zürich, Switzerland
  • Christos Andronis

      Affiliations

    • Biovista, 34 Rodopoleos Str., Ellinikon, GR-16777 Athens, Greece
  • Ourania Konstandi

      Affiliations

    • Biovista, 34 Rodopoleos Str., Ellinikon, GR-16777 Athens, Greece
  • Andreas Persidis

      Affiliations

    • Biovista, 34 Rodopoleos Str., Ellinikon, GR-16777 Athens, Greece

Received 16 January 2006, Revised 27 August 2006, Accepted 28 August 2006.

References 

  1. Rinaldi F, Schneider G, Kaljurand K, Hess M, Andronis C, Persidis A, et al. Relation mining over a corpus of scientific literature. In: Proceedings of the 10th Conference on Artificial Intelligence in Medicine, LNAI 3581. Aberdeen, Scotland: Springer Verlag; 2005;p. 550–559
  2. Rinaldi F, Schneider G, Kaljurand K, Hess M, Romacker M. An environment for relation mining over richly annotated corpora: the case of GENIA. In: Proceedings of the second international symposium on semantic mining in biomedicine. 2006;p. 68–75
  3. Daraselia N, Egorov S, Yazhuk A, Novichkova S, Yuryev A, Mazo I. Extracting human protein interactions from MEDLINE using a full-sentence parser. Bioinformatics. 2004;20(5):604–611
  4. Schibler U. The daily rhythms of genes, cells and organs. Biological clocks and circadian timing in cells. EMBO Rep. 2005;6:9–13
  5. Barak S, Tobin E, Andronis C, Sugano S, Green R. All in good time: the Arabidopsis Circadian Clock. Trends Plant Sci. 2000;5(12):517–522
  6. Kim J, Ohta T, Tateisi Y, Tsujii J. GENIA corpus—a semantically annotated corpus for bio-textmining. Bioinformatics. 2003;19(1):180–182
  7. Kaljurand K, Rinaldi F, Schneider G. Prolog-based query interface to syntactic dependencies extracted from biomedical literature. Technical report IFI-2006.04. University of Zurich; 2006.
  8. Rinaldi F, Dowdall J, Hess M, Ellman M, Zarri GP, Persidis A, et al. Multilayer Annotations in PARMENIDES. In: Handschuh S, Koivunen M, Dieng R, Staab S, editors. Proceedings of the K-CAP2003 workshop on knowledge markup and semantic annotation. 2003;p. 33–40
  9. Reynar JC, Ratnaparkhi A. A maximum entropy approach to identifying sentence boundaries. In: Proceedings of the fifth conference on applied natural language processing. Washington, DC: University of Pennsylvania; 1997;p. 16–19
  10. Marcus M, Santorini B, Marcinkiewicz M. Building a large annotated corpus of English: the Penn treebank. Comput Linguist. 1993;19:313–330
  11. Ratnaparkhi A. A maximum entropy part-of-speech tagger. In: Brill E, Church K, editors. Proceedings of the empirical methods in natural language processing conference. University of Pennsylvania; 1996;p. 133–142
  12. Minnen G, Carroll J, Pearce D. Applied morphological processing of English. Nat Lang Eng. 2001;7(3):207–223
  13. Mikheev A. Automatic rule induction for unknown word guessing. Comput Linguist. 1997;23(3):405–423
  14. Schneider G. Extracting and using trace-free functional dependencies from the Penn treebank to reduce parsing complexity. In: Nivre J, Hinrichs E, editors. Proceedings of the second workshop on treebanks and linguistic theories (TLT 2003), vol. 9 of Mathematical Modelling in Physics, Engineering and Cognitive Science. 2003;p. 153–164
  15. Schneider G, Rinaldi F, Dowdall J. Fast, deep-linguistic statistical minimalist dependency parsing. In: Kruijff G, Duchier D, editors. Proceedings of the COLING-2004 workshop on recent advances in dependency grammars. 2004;p. 33–40
  16. Collins M. Head-driven statistical models for natural language parsing. PhD Thesis. Philadelphia, USA: University of Pennsylvania; 1999.
  17. Younger DH. Recognition and parsing of context-free languages in time n³. Inform Contr. 1967;10:189–208
  18. Cussens J, Nédellec C, editors. Proceedings of the workshop on learning language in logic (LLL05). 2005. http://www.cs.york.ac.uk/aig/lll/lll05/ (accessed: May 10, 2006)
  19. Carroll J, Minnen G, Briscoe E. Parser evaluation: using a grammatical relation annotation scheme. In: Abeillé A, editor. Treebanks: building and using parsed corpora. Dordrecht: Kluwer; 2003;p. 299–316
  20. Lin D. Dependency-based evaluation of MINIPAR. In: Proceedings of the workshop on the evaluation of parsing systems. 1998.
  21. Preiss J. Using grammatical relations to compare parsers. In: Proceedings of the EACL ’03. 2003;p. 291–296
  22. Yakushiji A, Tateisi Y, Miyao Y. Event extraction from biomedical papers using a full parser. In: Proceedings of Pacific symposium on biocomputing. River Edge, NJ: World Scientific Publishing; 2001;p. 408–419
  23. Miyao Y, Ninomiya T, Tsujii J. Corpus-oriented grammar development for acquiring a head-driven phrase structure grammar from the Penn treebank. In: Proceedings of IJCNLP-04. 2004;p. 684–693
  24. Friedman C, Kra P, Yu H, Krauthammer M, Rzhetsky A. GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics. 2001;17(1):74–82
  25. Gaizauskas R, Demetriou G, Artymiuk PJ, Willett P. Protein structures and information extraction from biological texts: the PASTA system. Bioinformatics. 2003;19:135–143
  26. Novichkova S, Egorov S, Daraselia N. MedScan, a natural language processing engine for MEDLINE abstracts. Bioinformatics. 2003;19(13):1699–1706
  27. Katrenko S, Roos MS, Adriaans M. Learning biological interactions from medline abstracts. In: Cussens J, Nédellec C, editors. Proceedings of the workshop on learning language in logic (LLL05). 2005;p. 53–58
  28. Greenwood MA, Stevenson M, Guo Y, Harkema H, Roberts A. Automatically acquiring a linguistically motivated genic interaction extraction system. In: Cussens J, Nédellec C, editors. Proceedings of the workshop on learning language in logic (LLL05). 2005;p. 46–52

☆ A preliminary version of the system described in this paper was presented in [1]. Recent results obtained after the submission of this paper are described in [2]. All URLs mentioned in this paper were accessed and verified on 10 May 2006.

☆☆ The tools and resources used for the work described in this paper are freely available for research purposes. The DepGENIA corpus can be downloaded from the OntoGene web site (http://www.ontogene.org/). The Pro3Gres parser and the OntoGene text mining system can be obtained by contacting the authors of this paper.

PII: S0933-3657(06)00137-0

doi: 10.1016/j.artmed.2006.08.005
