Artificial Intelligence in Medicine
Volume 48, Issue 2, Pages 91-98, February 2010

Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps

  • Rui Xu
    Applied Computational Intelligence Laboratory, Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO 65409-0249, USA
    Corresponding author. Tel.: +1 573 341 6811; fax: +1 573 341 4532.
  • Steven Damelin
    Department of Mathematical Sciences, Georgia Southern University, Statesboro, GA 30460-8093, USA
  • Boaz Nadler
    Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel
  • Donald C. Wunsch II
    Applied Computational Intelligence Laboratory, Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO 65409-0249, USA

Received 14 August 2008; received in revised form 24 June 2009; accepted 30 June 2009.

Abstract 

Objective

The importance of gene expression data in cancer diagnosis and treatment has become widely recognized by cancer researchers in recent years. However, a major challenge in the computational analysis of such data is the curse of dimensionality, caused by the overwhelming number of variables measured (genes) relative to the small number of samples. Here, we use a two-step method to reduce the dimensionality of gene expression data and thereby address this problem.

Methods

First, we extract a subset of genes based on statistical characteristics of their expression levels. Then, to reduce the dimensionality further, we apply diffusion maps, which interpret the eigenfunctions of Markov matrices as a system of coordinates on the original data set, to obtain an efficient representation of the data's geometric structure. Finally, fuzzy ART, a neural network clustering method based on adaptive resonance theory, is applied to the resulting data to generate clusters of cancer samples.
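The abstract names the three stages without giving their details, so the following is only a minimal NumPy sketch of such a pipeline under stated assumptions, not the authors' implementation. Genes are filtered by variance (a hypothetical stand-in for the unspecified statistical criteria); the diffusion map uses a Gaussian kernel with a user-chosen scale `eps` and row normalization into a Markov matrix; fuzzy ART uses complement coding, the standard choice function and vigilance test, and fast learning (`beta=1`). All function names and parameter defaults are illustrative.

```python
import numpy as np


def filter_genes_by_variance(X, n_genes):
    """Keep the n_genes columns (genes) of X with the largest variance across samples.

    Hypothetical criterion: the paper's actual statistical filters may differ.
    """
    order = np.argsort(X.var(axis=0))[::-1]
    return X[:, order[:n_genes]]


def diffusion_map(X, eps, n_coords, t=1):
    """Embed the rows of X with eigenvectors of a Gaussian-kernel Markov matrix."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / eps)            # symmetric affinity matrix
    d = W.sum(axis=1)                      # degrees; P = D^{-1} W is row-stochastic
    A = W / np.sqrt(np.outer(d, d))        # symmetric conjugate D^{-1/2} W D^{-1/2}
    eigvals, V = np.linalg.eigh(A)         # same eigenvalues as P, ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]
    psi = V / np.sqrt(d)[:, None]          # right eigenvectors of P
    # Drop the trivial constant eigenvector (eigenvalue 1); scale by lambda^t.
    return psi[:, 1:n_coords + 1] * eigvals[1:n_coords + 1] ** t


def minmax_scale(Y):
    """Rescale each column to [0, 1], as fuzzy ART expects."""
    lo = Y.min(axis=0)
    span = Y.max(axis=0) - lo
    span[span == 0] = 1.0                  # leave constant columns at 0
    return (Y - lo) / span


def fuzzy_art(X, rho=0.75, alpha=0.001, beta=1.0, n_epochs=1):
    """Cluster the rows of X (values in [0, 1]); returns one category label per row."""
    I = np.hstack([X, 1.0 - X])            # complement coding
    weights = []                           # one weight vector per committed category
    labels = np.empty(len(I), dtype=int)
    for _ in range(n_epochs):
        for n, x in enumerate(I):
            # Choice function T_j = |x ^ w_j| / (alpha + |w_j|); fuzzy AND is element-wise min
            scores = [np.minimum(x, w).sum() / (alpha + w.sum()) for w in weights]
            for j in np.argsort(scores)[::-1]:
                match = np.minimum(x, weights[j]).sum() / x.sum()
                if match >= rho:           # vigilance test passed: resonance
                    weights[j] = beta * np.minimum(x, weights[j]) + (1 - beta) * weights[j]
                    labels[n] = j
                    break
            else:                          # no category passed: commit a new one
                weights.append(x.copy())
                labels[n] = len(weights) - 1
    return labels


def cluster_samples(X, n_genes, eps, n_coords, rho=0.75):
    """Hypothetical pipeline: filter genes, embed with diffusion maps, cluster with fuzzy ART."""
    Y = diffusion_map(filter_genes_by_variance(X, n_genes), eps, n_coords)
    return fuzzy_art(minmax_scale(Y), rho=rho)
```

Here `X` is a samples × genes matrix, and a call such as `cluster_samples(X, n_genes=200, eps=1.0, n_coords=3, rho=0.75)` returns one cluster index per sample. These values are placeholders: `eps` and the vigilance `rho` would need tuning, and a higher `rho` makes fuzzy ART produce more, tighter clusters.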

Results

Experimental results on the small round blue-cell tumor data set show that, compared with other widely used clustering algorithms such as hierarchical clustering and K-means, our proposed method can effectively identify different cancer types and generate high-quality clusters of cancer samples.

Conclusion

The proposed feature selection methods and diffusion maps can extract useful information from multidimensional gene expression data and prove effective at addressing the problem of high dimensionality inherent in gene expression data analysis.

Keywords: Clustering, Diffusion maps, Feature filtering, Fuzzy ART, Gene expression profiles


PII: S0933-3657(09)00100-6

doi:10.1016/j.artmed.2009.06.001
