Coding of amino acids by texture descriptors
Abstract
Objective
In this paper we propose a new feature extractor for peptide/protein classification based on the calculation of texture descriptors. Representing a peptide/protein with a matrix descriptor, instead of a vector, allows the peptide/protein to be treated as an image and texture descriptors to be used to represent it.
Methods and materials
A matrix descriptor, which is a square matrix whose dimension equals the length of the peptide/protein, is obtained from a partial ordering of the amino acids of the peptide/protein according to their values of a given physicochemical property. Each matrix descriptor is treated as a texture image, and several texture descriptors are used to obtain a compact representation that is scale invariant (i.e. independent of the length of the peptide/protein). The texture descriptors tested in this work are local binary patterns (LBP), the discrete cosine transform (DCT) and Daubechies wavelets.
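The idea of turning a sequence into a length-independent texture feature can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact matrix construction, so the sketch assumes that entry (i, j) records whether residue i ranks above, equal to, or below residue j for the chosen property. The Kyte-Doolittle hydropathy scale, the function names `matrix_descriptor` and `lbp_histogram`, and the basic 8-neighbour LBP variant are likewise illustrative assumptions.

```python
import numpy as np

# Kyte-Doolittle hydropathy, used here only as an example property
# (assumption: the abstract does not say which properties were used).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def matrix_descriptor(sequence, scale=HYDROPATHY):
    """Return an n x n matrix for a sequence of length n.

    Assumed encoding: entry (i, j) is +1, 0 or -1 depending on whether
    residue i has a larger, equal or smaller property value than residue j.
    """
    values = np.array([scale[aa] for aa in sequence], dtype=float)
    return np.sign(values[:, None] - values[None, :])


def lbp_histogram(image):
    """Normalised 256-bin histogram of basic 8-neighbour LBP codes.

    Only interior pixels are coded, so the image must be at least 3 x 3.
    The histogram length is fixed, whatever the image size.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3 x 3")
    center = img[1:-1, 1:-1]
    # Neighbour offsets (row, column), visited clockwise from top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


# Two peptides of different length give feature vectors of the same size.
short_features = lbp_histogram(matrix_descriptor("ACDEFGHIK"))
long_features = lbp_histogram(matrix_descriptor("ACDEFGHIKLMNPQRSTVWY"))
```

Because the histogram is normalised and always has 256 bins, the feature vector does not depend on the sequence length, which is the sense in which such a representation can be fed to a fixed-input classifier such as a support vector machine.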
Results and conclusion
The experimental section reports several tests, aimed at supporting our ideas, performed on the following datasets: a vaccine dataset for the prediction of peptides that bind human leukocyte antigens, a human immunodeficiency virus (HIV-1) protease cleavage site prediction dataset, and a membrane protein type dataset.
The experimental results confirm the usefulness of the novel descriptors: the performance obtained by our system on the three difficult datasets is quite high, indicating that the proposed method is a feasible approach for extracting information from peptides and proteins. The performance obtained by each of the three texture descriptors calculated from the matrix-based representation, and coupled to a support vector machine classifier, is lower than the performance obtained by other vector-based descriptors based on physicochemical properties proposed in the literature. However, the new descriptors carry different information, and our tests show that the texture descriptors and the vector-based descriptors can be combined to improve the overall performance of the system.
In particular, the proposed approach improves on the state-of-the-art results in two of the three tested problems (the HIV-1 protease cleavage site prediction dataset and the membrane protein type dataset).
Keywords: Protein classification, Peptide classification, Vaccine development, Local binary patterns, Discrete cosine transform, Support vector machine
PII: S0933-3657(09)00137-7
doi:10.1016/j.artmed.2009.10.001
© 2009 Elsevier B.V. All rights reserved.
