Artificial Intelligence in Medicine
Volume 41, Issue 3, Pages 197–207, November 2007

Ensemble methods for classification of patients for personalized medicine with high-dimensional data

  • Hojin Moon (corresponding author; Tel.: +1 501 366 4712), Department of Mathematics and Statistics, California State University, Long Beach, 1250 Bellflower Blvd., Long Beach, CA 90840, USA
  • Hongshik Ahn, Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794-3600, USA
  • Ralph L. Kodell, Department of Biostatistics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
  • Songjoon Baek, Division of Biometry and Risk Assessment, National Center for Toxicological Research, FDA, Jefferson, AR 72079, USA
  • Chien-Ju Lin, Division of Biometry and Risk Assessment, National Center for Toxicological Research, FDA, Jefferson, AR 72079, USA
  • James J. Chen, Division of Biometry and Risk Assessment, National Center for Toxicological Research, FDA, Jefferson, AR 72079, USA

Received 21 November 2006; revised 18 June 2007; accepted 6 July 2007.

PII: S0933-3657(07)00086-3

doi: 10.1016/j.artmed.2007.07.003
