Artificial Intelligence in Medicine
Volume 48, Issue 2, Pages 75-82, February 2010

A GMM-IG framework for selecting genes as expression panel biomarkers

  • Mingyi Wang

      Affiliations

    • School of Informatics, Indiana University, 535 W. Michigan Street, Indianapolis, IN 46202, USA
  • Jake Y. Chen

      Affiliations

    • School of Informatics, Indiana University, 535 W. Michigan Street, Indianapolis, IN 46202, USA
    • Department of Computer and Information Science, Purdue University School of Science, Indianapolis, IN 46202, USA
    • Indiana Center for Systems Biology and Personalized Medicine, 719 N. Indiana Ave, WK Suite #190, Indianapolis, IN 46202, USA
    • Corresponding author at: Department of Computer and Information Science, Purdue University School of Science, Indianapolis, IN 46202, USA. Tel.: +1 317 278 7604; fax: +1 317 278 9201.

Received 28 August 2008; received in revised form 29 June 2009; accepted 2 July 2009.

Abstract 

Objective

The small sample sizes of functional genomics experiments have made it necessary to integrate DNA microarray data from different sources. However, experimental noise and the biases of different microarray platforms make integrated data analysis challenging. In this work, we propose an integrative computational framework to identify candidate biomarker genes from publicly available functional genomics studies.

Methods

We developed a new framework, Gaussian Mixture Modeling-Coupled Information Gain (GMM-IG). In this framework, we first apply a two-component Gaussian mixture model (GMM) to estimate the conditional probability distributions of gene expression data for two different types of samples, for example, normal versus cancer. An expectation-maximization algorithm is then used to estimate the maximum likelihood parameters of the mixture of two Gaussians in the feature space and to determine the underlying expression levels of genes. Gene expression results from different studies are discretized based on the GMM estimates and then unified. Significantly differentially expressed genes are filtered and assessed with information gain (IG) measures.
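The steps described above can be illustrated with a minimal sketch for a single gene. It is not the authors' implementation: the function names (`fit_two_gaussians`, `discretize`, `information_gain`), the one-dimensional per-gene treatment, the EM initialization, and the toy expression values are all assumptions made for illustration, and the filtering step for significantly differentially expressed genes is omitted.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_two_gaussians(values, iters=50):
    """EM for a two-component 1-D Gaussian mixture (hypothetical helper)."""
    n = len(values)
    grand_mean = sum(values) / n
    spread = sum((v - grand_mean) ** 2 for v in values) / n
    means = [min(values), max(values)]      # assumed initialization
    variances = [max(spread, 1e-6)] * 2
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability of each component for each sample
        resp = []
        for v in values:
            p = [weights[k] * normal_pdf(v, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means, and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)   # floor to avoid a degenerate component
    return weights, means, variances

def discretize(values):
    """Map each value to 1 (high-expression component) or 0 (low)."""
    weights, means, variances = fit_two_gaussians(values)
    high = 0 if means[0] > means[1] else 1
    def log_score(x, k):
        return (math.log(weights[k]) - 0.5 * math.log(variances[k])
                - (x - means[k]) ** 2 / (2 * variances[k]))
    return [1 if log_score(v, high) > log_score(v, 1 - high) else 0 for v in values]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(bits, labels):
    """Reduction in label entropy after splitting on the discretized gene."""
    gain = entropy(labels)
    for b in set(bits):
        subset = [y for x, y in zip(bits, labels) if x == b]
        gain -= len(subset) / len(bits) * entropy(subset)
    return gain

# Toy data: one gene measured in two studies on different intensity scales.
study_a = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
labels_a = [0, 0, 0, 0, 1, 1, 1, 1]          # 0 = normal, 1 = cancer
study_b = [100.0, 110.0, 95.0, 400.0, 420.0, 390.0]
labels_b = [0, 0, 0, 1, 1, 1]

# Discretize within each study, then pool the binary states across studies.
pooled_bits = discretize(study_a) + discretize(study_b)
pooled_labels = labels_a + labels_b
print(information_gain(pooled_bits, pooled_labels))
```

Discretizing within each study before pooling is what lets measurements on incompatible scales be combined in this sketch: only the low or high state of the gene, not its raw intensity, enters the information gain calculation.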

Results

DNA microarray experimental data for lung cancers from three different prior studies were processed using the new GMM-IG method. Target gene markers for a gene expression panel were selected and compared with those obtained from several conventional computational biomarker data analysis methods. GMM-IG showed consistently high accuracy in several classification assessments. Statistical validations also showed high reproducibility of the gene selection results. Our study shows that the GMM-IG framework can overcome the poor reliability of single-study DNA microarray experiments while maintaining high accuracy by combining true signals from multiple studies.

Conclusions

We present a conceptually simple framework that enables reliable integration of true differential gene expression signals from multiple microarray experiments. This novel computational method has been shown to generate interesting biomarker panels for lung cancer studies. It is promising as a general strategy for future panel biomarker development, especially for applications that require integrating experimental results generated at different research centers or with different technology platforms.

Keywords: Gene selection, Data integration, Microarray data, Lung cancer, Gaussian mixture model, Information gain

PII: S0933-3657(09)00097-9

doi:10.1016/j.artmed.2009.07.006
