Artificial Intelligence in Medicine
Volume 35, Issue 1 , Pages 9-18, September 2005

Finding the biologically optimal alignment of multiple sequences

Institute for Chemical Research, Kyoto University, Gokasho, Uji 611-0011, Japan

Received 8 November 2004; received in revised form 27 December 2004; accepted 12 January 2005.

Summary 

Objective:

Deterministic annealing, which is derived from statistical physics, is a method for obtaining the global optimum in parameter space. During the annealing process, starting from high temperatures which are then lowered, deterministic annealing deterministically find the (global) optimum at each temperature. Thus, deterministic annealing is expected to be more computationally efficient than stochastic sampling strategies to obtain the global optimum. We propose to apply the deterministic annealing technique to the problem of efficiently finding the biologically optimal alignment of multiple sequences.

Methods and material:

We take a strategy based on probabilistic models for aligning multiple sequences. That is, we train a probabilistic model using given training sequences and obtain their alignment by parsing, i.e. searching for the most likely parse of each sequence and gaps using the trained parameters of the model. In this scenario, we propose a new stochastic model, which is simple enough to be suited to multiple sequence alignment and, unlike existing stochastic models, say a profile hidden Markov model (HMM), allows us to use similarity scores between symbols (or a symbol and a gap). We further present a learning algorithm for our simple model by combining deterministic annealing with an expectation–maximization (EM) algorithm. We emphasize that our approach is time-efficient, even if the training is done through an annealing process.

Results:

In our experiments, we used actual protein sequences whose three-dimensional (3D) structures are determined and which are all aligned based on their 3D structures. We compared the results obtained by our approach with those by other existing approaches. Experimental results clearly showed that our approach gave the best performance, in terms of the similarity to the structurally determined alignment, among the approaches tested. Experimental results further indicated that our approach was ten times more efficient in terms of actual computation time than a competing method.

Keywords: Multiple sequence alignment, Multiple columns model, Similarity scores, Deterministic annealing, Maximum entropy

To access this article, please choose from the options below

Login to an existing account or Register a new account.

  • Purchase this article for 31.50 USD (You must login/register to purchase this article)

    Online access for 24 hours. The PDF version can be downloaded as your permanent record.

  • Subscribe to this title

    Get unlimited online access to this article and all other articles in this title 24/7 for one year.

  • Claim access now

    For current subscribers with Society Membership or Account Number.

  • Visit SciVerse ScienceDirect to see if you have access via your institution.
 

PII: S0933-3657(05)00062-X

doi:10.1016/j.artmed.2005.01.007

Artificial Intelligence in Medicine
Volume 35, Issue 1 , Pages 9-18, September 2005