Artificial Intelligence in Medicine
Volume 41, Issue 3, Pages 209-222, November 2007

Semi-supervised learning of the hidden vector state model for extracting protein–protein interactions

  • Deyu Zhou (corresponding author. Tel.: +65 67906609; fax: +65 63162780)
    • School of Computer Engineering, Nanyang Technological University, Block N4, Nanyang Avenue, Singapore 639798, Singapore
  • Yulan He
    • Informatics Research Centre, The University of Reading, Whiteknights, Reading, Berkshire RG6 6BX, UK
  • Chee Keong Kwoh
    • School of Computer Engineering, Nanyang Technological University, Block N4, Nanyang Avenue, Singapore 639798, Singapore

Received 15 December 2006, Revised 18 June 2007, Accepted 6 July 2007.

PII: S0933-3657(07)00087-5

doi: 10.1016/j.artmed.2007.07.004
