Artificial Intelligence in Medicine
Volume 41, Issue 3, Pages 197-207, November 2007

Ensemble methods for classification of patients for personalized medicine with high-dimensional data

  • Hojin Moon

      Affiliations

    • Department of Mathematics and Statistics, California State University-Long Beach, 1250 Bellflower Blvd., Long Beach, CA 90840, USA
    • Corresponding author. Tel.: +1 501 366 4712.
  • Hongshik Ahn

      Affiliations

    • Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794-3600, USA
  • Ralph L. Kodell

      Affiliations

    • Department of Biostatistics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
  • Songjoon Baek

      Affiliations

    • Division of Biometry and Risk Assessment, National Center for Toxicological Research, FDA, Jefferson, AR 72079, USA
  • Chien-Ju Lin

      Affiliations

    • Division of Biometry and Risk Assessment, National Center for Toxicological Research, FDA, Jefferson, AR 72079, USA
  • James J. Chen

      Affiliations

    • Division of Biometry and Risk Assessment, National Center for Toxicological Research, FDA, Jefferson, AR 72079, USA

Received 21 November 2006; received in revised form 18 June 2007; accepted 6 July 2007.

Summary 

Objective

Personalized medicine is defined by the use of genomic signatures of patients in a target population to assign more effective therapies, improve diagnosis, and enable earlier interventions that might prevent or delay disease. The objective is to find a novel classification algorithm that can predict response to therapy and so help individualize the clinical assignment of treatment.

Methods and materials

Classification algorithms must be highly accurate to support optimal treatment of each patient. Typically, there are numerous genomic and clinical variables measured on a relatively small number of patients, which makes it difficult for most traditional classification algorithms to avoid over-fitting the data. We developed a robust classification algorithm for high-dimensional data based on ensembles of classifiers built from the optimal number of random partitions of the feature space. The software is available on request from the authors.
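The general idea of an ensemble built from a random partition of the feature space, combined by majority voting, can be illustrated with a minimal sketch. This is not the authors' software: the nearest-centroid base learner, the function names, and the fixed `n_subsets` parameter are all assumptions made for illustration, and the sketch does not attempt the selection of an optimal number of partitions described in the summary.

```python
import random
from collections import Counter


def random_partition(n_features, n_subsets, rng):
    """Shuffle the feature indices and deal them into disjoint subsets."""
    idx = list(range(n_features))
    rng.shuffle(idx)
    return [idx[i::n_subsets] for i in range(n_subsets)]


def fit_centroids(X, y, features):
    """Hypothetical base learner: per-class mean over the given features."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def predict_centroid(centroids, features, row):
    """Assign the class whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((row[f] - c[j]) ** 2 for j, f in enumerate(features))
    return min(sorted(centroids), key=lambda label: dist(centroids[label]))


def fit_ensemble(X, y, n_subsets, seed=0):
    """Fit one base classifier per subset of a single random partition.

    n_subsets is taken as given here (an assumption of this sketch).
    """
    rng = random.Random(seed)
    parts = random_partition(len(X[0]), n_subsets, rng)
    return [(features, fit_centroids(X, y, features)) for features in parts]


def predict_ensemble(ensemble, row):
    """Combine the base classifiers' predictions by majority vote."""
    votes = Counter(predict_centroid(c, f, row) for f, c in ensemble)
    return votes.most_common(1)[0][0]
```

Calling `fit_ensemble(X, y, n_subsets=3)` on a list of feature rows `X` with labels `y`, then `predict_ensemble(ensemble, new_row)`, returns the majority class. With two classes, an odd `n_subsets` avoids tied votes. Because each base classifier sees only a fraction of the features, each one fits a lower-dimensional problem, which is one plausible way such a design could limit over-fitting when variables far outnumber patients.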

Results

The proposed algorithm is applied to genomic data sets on lymphoma patients and lung cancer patients to distinguish disease subtypes for optimal treatment, and to genomic data on breast cancer patients to identify those most likely to benefit from adjuvant chemotherapy after surgery. The proposed algorithm consistently ranks highly compared with the other classification algorithms.

Conclusion

The statistical classification method for individualized treatment of diseases developed in this study is expected to play a critical role in developing safer and more effective therapies that replace one-size-fits-all drugs with treatments that focus on specific patient needs.

Keywords: Class prediction, Cross-validation, Ensembles, Majority voting, Risk profiling

PII: S0933-3657(07)00086-3

doi:10.1016/j.artmed.2007.07.003
