Artificial Intelligence in Medicine
Volume 41, Issue 2 , Pages 105-115, October 2007

Gene Ontology analysis in multiple gene clusters under multiple hypothesis testing framework

  • Sheng Zhong

      Affiliations

    • Department of Bioengineering, University of Illinois at Urbana Champaign, United States
    • Department of Statistics, University of Illinois at Urbana Champaign, United States
    • Department of Computer Science, University of Illinois at Urbana Champaign, United States
    • Institute of Genomic Biology, University of Illinois at Urbana Champaign, United States
    • Corresponding author at: 3215 Digital Computer Lab, MC-278 1304 W. Springfield Avenue, Urbana, IL 61801, United States. Tel.: +1 217 265 6589; fax: +1 217 265 0246.
  • Dan Xie

      Affiliations

    • Department of Bioengineering, University of Illinois at Urbana Champaign, United States

Received 2 December 2006; received in revised form 2 August 2007; accepted 3 August 2007.

Summary 

Objective

Gene Ontology (GO) has become a routine resource for functional analysis of gene lists. Although a number of tools identify enriched GO terms in one or two gene lists, two technical challenges remain: first, how to handle multiple hypothesis testing when the tests are heavily correlated; second, how to identify GO terms that are enriched in one gene cluster relative to multiple other gene clusters. We provide a statistical procedure that treats both problems rigorously, and a software tool for applying GO to the analysis of gene clusters.

Methods

We previously introduced a statistical procedure that handles hypothesis testing in a two-group comparison scenario. In this paper we extend the two-group comparison procedure into a general procedure that enables the analysis of any number of gene lists or clusters. The new procedure identifies GO terms enriched in any gene cluster while controlling for multiple hypothesis testing. It is implemented in a user-friendly analysis tool, GoSurfer. The current version of GoSurfer takes one or several gene lists as input and identifies the GO terms that are enriched in any of the input gene lists. GoSurfer estimates a conservative false discovery rate (FDR) for every GO term. The FDR estimation procedure in GoSurfer has two advantages: it does not rely on an independence assumption, and it does not assume that all hypotheses are true null hypotheses (the complete null). Thus GoSurfer's FDR estimates are mildly conservative rather than overly conservative.
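
To make the idea of an FDR estimate for correlated enrichment tests concrete, the following is a minimal Python sketch, not GoSurfer's actual procedure, which this summary does not specify. It assumes a one-sided hypergeometric test for each (cluster, GO term) pair and a simple permutation estimate of the FDR obtained by shuffling cluster labels while keeping the GO annotations fixed, so that the correlation among GO terms is preserved. Unlike the procedure described above, this simple estimator does rely on the complete null, so it should be read as a more conservative stand-in. All function names, gene identifiers and term identifiers are hypothetical, and the clusters are assumed to be disjoint.

```python
import random
from math import comb


def hypergeom_pvalue(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment_pvalues(clusters, annotations, universe):
    """One enrichment p-value per (cluster, term) pair."""
    N = len(universe)
    pvals = {}
    for cname, genes in clusters.items():
        for term, annotated in annotations.items():
            k = len(genes & annotated)
            pvals[(cname, term)] = hypergeom_pvalue(k, len(annotated), len(genes), N)
    return pvals


def permutation_fdr(clusters, annotations, n_perm=200, seed=0):
    """Permutation FDR estimate; assumes disjoint clusters (illustrative only)."""
    rng = random.Random(seed)
    universe = sorted(set().union(*clusters.values()))
    uni_set = set(universe)
    # Restrict annotations to genes that appear in some cluster.
    annotations = {t: g & uni_set for t, g in annotations.items()}
    observed = enrichment_pvalues(clusters, annotations, uni_set)

    names = list(clusters)
    sizes = [len(clusters[c]) for c in names]
    null_pvals = []
    for _ in range(n_perm):
        # Shuffle genes across clusters, keeping cluster sizes and annotations.
        shuffled = universe[:]
        rng.shuffle(shuffled)
        permuted, start = {}, 0
        for name, size in zip(names, sizes):
            permuted[name] = set(shuffled[start:start + size])
            start += size
        null_pvals.extend(enrichment_pvalues(permuted, annotations, uni_set).values())

    fdr = {}
    for key, p in observed.items():
        # Expected number of null calls per permutation at threshold p ...
        expected_false = sum(q <= p for q in null_pvals) / n_perm
        # ... divided by the number of calls actually made at that threshold.
        n_called = sum(q <= p for q in observed.values())
        fdr[key] = min(1.0, expected_false / n_called)
    return observed, fdr


# Toy data: 30 hypothetical genes in three disjoint clusters of 10.
genes = [f"g{i}" for i in range(30)]
clusters = {"up": set(genes[:10]), "down": set(genes[10:20]), "other": set(genes[20:])}
annotations = {
    "GO:A": set(genes[:8]),                             # concentrated in "up"
    "GO:B": {"g0", "g5", "g10", "g15", "g20", "g25"},   # spread evenly
}
observed, fdr = permutation_fdr(clusters, annotations)
for key in sorted(observed):
    print(key, f"p={observed[key]:.3g}", f"FDR={fdr[key]:.3g}")
```

Permuting cluster labels rather than treating each test separately is one common way to avoid an independence assumption, because every permutation re-evaluates all GO terms on the same shuffled gene assignment.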

Results

We implemented the new procedure for GO analysis of multiple gene clusters in the GoSurfer software. We provide three examples of using GoSurfer to analyze time course gene expression data sets on the differentiation of embryonic stem cells. In the example that analyzes multiple gene clusters, we first used a typical clustering algorithm and identified five gene clusters, representing up-regulation, down-regulation and other patterns in the differentiation time course. Taking all five gene clusters as input data, GoSurfer reports “cell adhesion” and “muscle contraction” as significant GO terms for the up-regulated cluster and “amino acids metabolism” as a significant GO term for the down-regulated cluster. It also reports a number of GO terms related to RNA processing and RNA transport as significant for a cluster that is up-regulated at both early and late time points. This may suggest that genes for RNA processing and genes for RNA transport are coregulated in the differentiation process of embryonic stem cells.

Conclusion

The GoSurfer software analyzes multiple gene clusters and identifies GO terms that are enriched in any gene cluster. GoSurfer is available at: www.gosurfer.org.

Keywords: Gene Ontology, Gene cluster, Multiple hypothesis testing, False discovery rate, GoSurfer

PII: S0933-3657(07)00100-5

doi:10.1016/j.artmed.2007.08.002
