Identification of fundamental building blocks in protein sequences using statistical association measures

Deborah Weisser, Judith Klein-Seetharaman

Research output: Contribution to conference, Paper, peer-review

9 Scopus citations

Abstract

Protein sequence data is abundant, yet derivation of structural features from sequence alone is generally restricted to prediction of domain architecture, secondary structure elements and motifs. Precise feature boundaries cannot be determined reliably, and it is unknown to what extent these features constitute fundamental building blocks of protein sequences, a question with particular relevance to protein folding. Here we propose a statistical approach using mutual information, a measure of association, to predict feature boundaries. In this approach, proteins are viewed as strings of adjacent, non-overlapping features, where each feature is a subsequence of the protein, and the union of the features is the entire protein. Mutual information values are measured between nearby amino acids along sequences, and low values are indicators for feature boundaries. These boundaries are then predicted using a flexible partitioning algorithm. The algorithms presented in this paper were tested on the GPCR protein family and subfamilies. A comparison with segment boundaries implied indirectly from secondary structure prediction and expert knowledge demonstrates that the algorithm can be used to statistically predict feature positions in protein sequences generically, without assumptions about the feature type to be detected. The data used and the algorithms presented in this paper are available at flan.blm.cs.cmu.edu.
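The idea of scoring the association between nearby amino acids and cutting where it is low can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the estimator or the "flexible partitioning algorithm", so this sketch assumes pointwise mutual information between directly adjacent residues and substitutes a plain fixed threshold for the partitioning step. The function names (`pointwise_mi_table`, `gap_scores`, `split_at_low_scores`) and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def pointwise_mi_table(sequences):
    """Estimate pointwise mutual information (in bits) for every adjacent
    residue pair observed in a collection of sequences (assumed estimator)."""
    pair_counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
    total = sum(pair_counts.values())

    # Marginal counts of the first and second residue of each pair.
    left, right = Counter(), Counter()
    for (a, b), n in pair_counts.items():
        left[a] += n
        right[b] += n

    # log2( p(a,b) / (p(a) * p(b)) ) rewritten in terms of raw counts.
    return {
        (a, b): math.log2(n * total / (left[a] * right[b]))
        for (a, b), n in pair_counts.items()
    }


def gap_scores(seq, pmi, unseen=float("-inf")):
    """Score each gap between neighbouring residues of one sequence.
    Pairs never seen in the training data get the `unseen` score."""
    return [pmi.get((a, b), unseen) for a, b in zip(seq, seq[1:])]


def split_at_low_scores(seq, scores, threshold=0.0):
    """Cut the sequence at every gap whose score is below `threshold`.
    This fixed-threshold rule is a stand-in, not the paper's partitioning."""
    features, start = [], 0
    for i, score in enumerate(scores):
        if score < threshold:
            features.append(seq[start:i + 1])
            start = i + 1
    features.append(seq[start:])
    return features


# Toy data, not real protein sequences.
training = ["AAAABBBB", "AAAABBBB"]
table = pointwise_mi_table(training)
parts = split_at_low_scores("AAAABBBB", gap_scores("AAAABBBB", table))
```

On this toy input the only gap with a negative score is the one between the A run and the B run, so `parts` is `["AAAA", "BBBB"]`. The resulting features are adjacent, non-overlapping, and concatenate back to the full sequence, matching the view of a protein described in the abstract.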

Original language: English (US)
Pages: 154-161
Number of pages: 8
State: Published - 2004
Externally published: Yes
Event: Applied Computing 2004 - Proceedings of the 2004 ACM Symposium on Applied Computing - Nicosia, Cyprus
Duration: Mar 14 2004 to Mar 17 2004

Conference

Conference: Applied Computing 2004 - Proceedings of the 2004 ACM Symposium on Applied Computing
Country/Territory: Cyprus
City: Nicosia
Period: 3/14/04 to 3/17/04

Keywords

  • Feature prediction
  • G-protein coupled receptors
  • Mutual information
  • Rhodopsin

ASJC Scopus subject areas

  • Software
