Artificial Intelligence in Medicine
Volume 45, Issue 2, Pages 151-162, February 2009

Dataset complexity in gene expression based cancer classification using ensembles of k-nearest neighbors

  • Oleg Okun (corresponding author; Tel.: +358 8 5532898; fax: +358 8 5532612)
    University of Oulu, Department of Electrical and Information Engineering, P.O. Box 4500, Oulu 90014, Finland

  • Helen Priisalu
    Tallinn University of Technology, Institute of Cybernetics, Akadeemia Tee 21, Tallinn 12618, Estonia

Received 12 November 2007; received in revised form 5 August 2008; accepted 6 August 2008.

Summary 

Objective

We explore the link between dataset complexity, which determines how difficult a dataset is to classify, and classification performance, measured by the low-variance, low-bias bolstered resubstitution error of k-nearest neighbor classifiers.
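The error measure named above can be illustrated with a short Monte Carlo sketch. Bolstered resubstitution replaces each training point by a kernel centred on it and counts the kernel mass that the classifier assigns to the wrong class; here that mass is approximated by sampling. This is a minimal sketch, not the authors' implementation: the spherical Gaussian kernel with a single fixed width `sigma`, the function names `knn_predict` and `bolstered_resub_error`, and all defaults are illustrative assumptions (in the bolstering method as originally proposed, kernel widths are estimated from the data rather than fixed by hand).

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Plain k-NN: Euclidean distance, majority vote among the k nearest."""
    # Ties in distance are broken by label order because tuples are sorted.
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def bolstered_resub_error(X, y, k=3, sigma=0.5, n_samples=50, seed=0):
    """Monte Carlo estimate of bolstered resubstitution error for k-NN.

    Each training point x_i is replaced by a spherical Gaussian N(x_i, sigma^2 I).
    The classifier is trained on all of X; the estimate is the fraction of
    kernel samples whose predicted label differs from y_i.
    ASSUMPTION: one fixed sigma for every point (a simplification).
    """
    rng = random.Random(seed)
    wrong = 0
    for xi, yi in zip(X, y):
        for _ in range(n_samples):
            z = [v + rng.gauss(0.0, sigma) for v in xi]  # draw from the kernel
            if knn_predict(X, y, z, k) != yi:
                wrong += 1
    return wrong / (len(X) * n_samples)
```

With well-separated classes and a small `sigma` the estimate is 0.0; as the classes overlap, more kernel samples fall on the other side of the k-NN decision boundary and the estimate grows. Unlike plain resubstitution, a point can be "misclassified" here even when the classifier labels the point itself correctly.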

Methods and material

Gene expression-based cancer classification serves as the task in this study. Six gene expression datasets covering different types of cancer constitute the test data.

Results

Through extensive simulation coupled with the copula method for analyzing association in bivariate data, we show that dataset complexity and bolstered resubstitution error are statistically dependent. On this basis, we propose a new scheme for generating ensembles of classifiers that selects low-complexity feature subsets for the ensemble members; according to the dependence found, such subsets yield accurate members.
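The association analysis above relies on copulas. One standard copula-based summary of dependence is Kendall's tau, which for continuous variables depends only on their copula C, via τ = 4∫∫ C(u, v) dC(u, v) − 1. The sketch below computes the empirical tau-a for paired (complexity, error) values. It illustrates rank-based dependence only; it is not the copula procedure used in the paper, and the function name and example values are hypothetical.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Empirical Kendall's tau-a: (concordant - discordant) / number of pairs.

    Pairs tied in either coordinate count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs
```

For example, `kendall_tau([0.2, 0.4, 0.5, 0.7, 0.9], [0.05, 0.08, 0.12, 0.15, 0.30])` returns `1.0`, because every pair of (hypothetical) feature subsets is concordant: the more complex subset always has the higher error. Because tau uses only ranks, it is unchanged by any strictly increasing transformation of either variable, which is the property that makes it a function of the copula alone.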

Conclusion

Experiments with six gene expression datasets demonstrate that our ensemble generating scheme, based on the dependence between dataset complexity and classification error, is superior both to the single best classifier in the ensemble and to the traditional ensemble construction scheme that ignores dataset complexity.
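The complexity-aware ensemble described above can be sketched as follows: draw random feature subsets, score each with a complexity measure, keep the least complex ones, and let k-NN members built on those subsets vote. The abstract does not name the complexity measure, so this sketch assumes, purely for illustration, a score derived from the maximum per-feature Fisher discriminant ratio (one of the complexity measures of Ho and Basu; a large ratio means an easier, less complex problem). Two classes, random subset sampling, the mapping 1 / (1 + F1), majority voting, and every function name and default are hypothetical choices.

```python
import math
import random
from collections import Counter
from statistics import mean, pvariance


def fisher_ratio(X, y, features):
    """Maximum per-feature Fisher discriminant ratio over the given features."""
    lo, hi = sorted(set(y))  # ASSUMPTION: exactly two class labels
    a = [row for row, lab in zip(X, y) if lab == lo]
    b = [row for row, lab in zip(X, y) if lab == hi]
    best = 0.0
    for f in features:
        va, vb = [r[f] for r in a], [r[f] for r in b]
        num = (mean(va) - mean(vb)) ** 2
        den = pvariance(va) + pvariance(vb)
        ratio = num / den if den > 0 else (math.inf if num > 0 else 0.0)
        best = max(best, ratio)
    return best


def complexity(X, y, features):
    """HYPOTHETICAL score in (0, 1]: a larger Fisher ratio means lower complexity."""
    return 1.0 / (1.0 + fisher_ratio(X, y, features))


def knn_predict(X, y, x, features, k=3):
    """k-NN majority vote using only the features in the given subset."""
    def proj(v):
        return [v[f] for f in features]
    dists = sorted((math.dist(proj(r), proj(x)), lab) for r, lab in zip(X, y))
    return Counter(lab for _, lab in dists[:k]).most_common(1)[0][0]


def build_ensemble(X, y, n_candidates=50, subset_size=2, n_members=5, seed=0):
    """Sample random feature subsets and keep the n_members least complex ones."""
    rng = random.Random(seed)
    n_features = len(X[0])
    pool = {tuple(sorted(rng.sample(range(n_features), subset_size)))
            for _ in range(n_candidates)}
    # Ties in complexity are broken by the subset itself, for determinism.
    ranked = sorted(pool, key=lambda s: (complexity(X, y, s), s))
    return ranked[:n_members]


def ensemble_predict(X, y, members, x, k=3):
    """Majority vote over k-NN members, one member per selected feature subset."""
    votes = Counter(knn_predict(X, y, x, m, k) for m in members)
    return votes.most_common(1)[0][0]
```

In this sketch, a traditional ensemble that ignores dataset complexity would correspond to keeping `n_members` random subsets without ranking them by `complexity`; the only change in the complexity-aware variant is the sort before truncation.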

Keywords: Pattern recognition, Gene expression, Cancer classification, k-nearest neighbors, Ensemble of classifiers

MSC: 68T10


PII: S0933-3657(08)00111-5

doi:10.1016/j.artmed.2008.08.004