Artificial Intelligence in Medicine
Volume 41, Issue 2, Pages 87-104, October 2007

Integrative mining of traditional Chinese medicine literature and MEDLINE for functional gene networks

  • Xuezhong Zhou

      Affiliations

    • China Academy of Chinese Medical Sciences, Beijing 100700, China
    • Guanganmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053, China
    • Corresponding author at: China Academy of Chinese Medical Sciences, Beijing 100700, China. Tel.: +86 10 88001446; fax: +86 10 63131371.
  • Baoyan Liu

      Affiliations

    • China Academy of Chinese Medical Sciences, Beijing 100700, China
  • Zhaohui Wu

      Affiliations

    • College of Computer Science, Zhejiang University, Hangzhou 310027, China
  • Yi Feng

      Affiliations

    • College of Computer Science, Zhejiang University, Hangzhou 310027, China

Received 1 December 2006, Revised 24 July 2007, Accepted 24 July 2007.

PII: S0933-3657(07)00094-2

doi: 10.1016/j.artmed.2007.07.007
