Artificial Intelligence in Medicine
Volume 35, Issue 1, Pages 107-119, September 2005

Computational modeling of oligonucleotide positional densities for human promoter prediction

Department of Computer Science, S16 #06-02, 3 Science Drive 2, National University of Singapore, Singapore 117543, Singapore

Received 30 October 2004; received in revised form 31 January 2005; accepted 22 February 2005.

Summary 

Objective:

The promoter region of a gene controls transcription initiation, which is the most important step in gene regulation. In silico detection of promoter regions in genomic sequences has a number of applications in gene discovery and in understanding the regulation of gene expression. However, computational prediction of eukaryotic Pol II promoters has remained a difficult task. This paper introduces a novel statistical technique for detecting promoter regions in long genomic sequences.

Method:

A number of existing techniques analyze the occurrence frequencies of oligonucleotides in promoter sequences as compared to other genomic regions. In contrast, the present work studies the positional densities of oligonucleotides in promoter sequences. The analysis does not require any non-promoter sequence dataset or any model of the background oligonucleotide content of the genome. The statistical model learnt from a dataset of promoter sequences automatically recognizes a number of transcription factor binding sites simultaneously with their occurrence positions relative to the transcription start site. Based on this model, a continuous naïve Bayes classifier is developed for the detection of human promoters and transcription start sites in genomic sequences.
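
As a rough illustration of the positional-density idea described above, and not the authors' published model, the following Python sketch fits one Gaussian over the positions of each oligonucleotide in a set of aligned promoter sequences and then scores a candidate window by summing log densities. The single-Gaussian density, the function names (fit_positional_densities, log_score), and the min_sd floor are all assumptions made for this example; the abstract does not state the actual density form used. Only promoter sequences are needed for training, in keeping with the text.

```python
import math
from collections import defaultdict

def kmer_positions(seq, k):
    """Yield (kmer, position) pairs; position is the 0-based offset in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k], i

def fit_positional_densities(promoters, k, min_sd=1.0):
    """Fit one Gaussian over positions per k-mer.

    Assumption: sequences are aligned so that equal offsets correspond to
    equal distances from the transcription start site. The single-Gaussian
    form and the min_sd floor are illustrative choices, not from the paper.
    """
    positions = defaultdict(list)
    for seq in promoters:
        for kmer, pos in kmer_positions(seq, k):
            positions[kmer].append(pos)
    model = {}
    for kmer, xs in positions.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        model[kmer] = (mean, max(math.sqrt(var), min_sd))
    return model

def log_score(window, model, k):
    """Naive Bayes style score: k-mers are treated as independent, so the
    log densities of their observed positions are summed. K-mers never seen
    in training are skipped."""
    total = 0.0
    for kmer, pos in kmer_positions(window, k):
        if kmer in model:
            mean, sd = model[kmer]
            total += -0.5 * ((pos - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
    return total

# Toy data (hypothetical): a TATA-like 4-mer at offset 4 in every sequence.
toy_promoters = ["GGCCTATAGGCC", "CCGGTATACCGG", "GCGCTATAGCGC"]
toy_model = fit_positional_densities(toy_promoters, k=4)
```

Sliding such a score along a long genomic sequence and reporting high-scoring windows would give a crude promoter detector. A window in which a learnt oligonucleotide sits at its usual offset scores higher than one in which the same oligonucleotide is displaced.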

Results:

The present study extends the scope of statistical models in general promoter modeling and prediction. Promoter sequence features learnt by the model correlate well with known biological facts. Results of human transcription start site prediction compare favorably with existing second-generation promoter prediction tools.

Keywords: Promoter modeling, Bayesian networks, Regulatory region prediction

Availability: Binary executable of the promoter prediction model, named BayesProm, is available at: http://www.comp.nus.edu.sg/~bioinfo/BayesProm (accessed: 1 May 2005).

PII: S0933-3657(05)00055-2

doi:10.1016/j.artmed.2005.02.005
