A new formulation of protein evolutionary models that account for structural constraints

Andrew J. Bordner, Hans Mittelmann

Research output: Contribution to journalArticle

7 Citations (Scopus)

Abstract

Despite the importance of a thermodynamically stable structure with a conserved fold for protein function, almost all evolutionary models neglect site-site correlations that arise from physical interactions between neighboring amino acid sites. This is mainly due to the difficulty in formulating a computationally tractable model since rate matrices can no longer be used. Here, we introduce a general framework, based on factor graphs, for constructing probabilistic models of protein evolution with site interdependence. Conveniently, efficient approximate inference algorithms, such as Belief Propagation, can be used to calculate likelihoods for these models. We fit an amino acid substitution model of this type that accounts for both solvent accessibility and site-site correlations. Comparisons of the new model with rate matrix models and alternative structure-dependent models demonstrate that it better fits the sequence data. We also examine evolution within a family of homohexameric enzymes and find that site-site correlations between most contacting subunits contribute to a higher likelihood. In addition, we show that the new substitution model has a similar mathematical form to the one introduced in Rodrigue et al. (Rodrigue N, Lartillot N, Bryant D, Philippe H. 2005. Site interdependence attributed to tertiary structure in amino acid sequence evolution. Gene 347:207-217), although with different parameter interpretations and values. We also perform a statistical analysis of the effects of amino acids at neighboring sites on substitution probabilities and find a significant perturbation of most probabilities, further supporting the significant role of site-site interactions in protein evolution and motivating the development of new evolutionary models similar to the one described here. Finally, we discuss possible extensions and applications of the new substitution model.

Original languageEnglish (US)
Pages (from-to)736-749
Number of pages14
JournalMolecular Biology and Evolution
Volume31
Issue number3
DOIs
StatePublished - 2014

Fingerprint

Structural Models
protein
Amino Acids
Proteins
proteins
Statistical Models
Amino Acid Substitution
substitution
Amino Acid Sequence
amino acid
Enzymes
Genes
probabilistic models
amino acids
matrix
amino acid substitution
accessibility
statistical analysis
amino acid sequences
perturbation

Keywords

  • approximate inference algorithms
  • factor graphs
  • protein evolution
  • protein structure
  • site-site correlations
  • substitution models

ASJC Scopus subject areas

  • Genetics
  • Molecular Biology
  • Ecology, Evolution, Behavior and Systematics

Cite this

A new formulation of protein evolutionary models that account for structural constraints. / Bordner, Andrew J.; Mittelmann, Hans.

In: Molecular Biology and Evolution, Vol. 31, No. 3, 2014, p. 736-749.

Research output: Contribution to journalArticle

@article{69075816757148af9ff9770602d1448e,
title = "A new formulation of protein evolutionary models that account for structural constraints",
abstract = "Despite the importance of a thermodynamically stable structure with a conserved fold for protein function, almost all evolutionary models neglect site-site correlations that arise from physical interactions between neighboring amino acid sites. This is mainly due to the difficulty in formulating a computationally tractable model since rate matrices can no longer be used. Here, we introduce a general framework, based on factor graphs, for constructing probabilistic models of protein evolution with site interdependence. Conveniently, efficient approximate inference algorithms, such as Belief Propagation, can be used to calculate likelihoods for these models. We fit an amino acid substitution model of this type that accounts for both solvent accessibility and site-site correlations. Comparisons of the new model with rate matrix models and alternative structure-dependent models demonstrate that it better fits the sequence data. We also examine evolution within a family of homohexameric enzymes and find that site-site correlations between most contacting subunits contribute to a higher likelihood. In addition, we show that the new substitution model has a similar mathematical form to the one introduced in Rodrigue et al. (Rodrigue N, Lartillot N, Bryant D, Philippe H. 2005. Site interdependence attributed to tertiary structure in amino acid sequence evolution. Gene 347:207-217), although with different parameter interpretations and values. We also perform a statistical analysis of the effects of amino acids at neighboring sites on substitution probabilities and find a significant perturbation of most probabilities, further supporting the significant role of site-site interactions in protein evolution and motivating the development of new evolutionary models similar to the one described here. Finally, we discuss possible extensions and applications of the new substitution model.",
keywords = "approximate inference algorithms, factor graphs, protein evolution, protein structure, site-site correlations, substitution models",
author = "Bordner, {Andrew J.} and Hans Mittelmann",
year = "2014",
doi = "10.1093/molbev/mst240",
language = "English (US)",
volume = "31",
pages = "736--749",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "3",

}

TY - JOUR

T1 - A new formulation of protein evolutionary models that account for structural constraints

AU - Bordner, Andrew J.

AU - Mittelmann, Hans

PY - 2014

Y1 - 2014

N2 - Despite the importance of a thermodynamically stable structure with a conserved fold for protein function, almost all evolutionary models neglect site-site correlations that arise from physical interactions between neighboring amino acid sites. This is mainly due to the difficulty in formulating a computationally tractable model since rate matrices can no longer be used. Here, we introduce a general framework, based on factor graphs, for constructing probabilistic models of protein evolution with site interdependence. Conveniently, efficient approximate inference algorithms, such as Belief Propagation, can be used to calculate likelihoods for these models. We fit an amino acid substitution model of this type that accounts for both solvent accessibility and site-site correlations. Comparisons of the new model with rate matrix models and alternative structure-dependent models demonstrate that it better fits the sequence data. We also examine evolution within a family of homohexameric enzymes and find that site-site correlations between most contacting subunits contribute to a higher likelihood. In addition, we show that the new substitution model has a similar mathematical form to the one introduced in Rodrigue et al. (Rodrigue N, Lartillot N, Bryant D, Philippe H. 2005. Site interdependence attributed to tertiary structure in amino acid sequence evolution. Gene 347:207-217), although with different parameter interpretations and values. We also perform a statistical analysis of the effects of amino acids at neighboring sites on substitution probabilities and find a significant perturbation of most probabilities, further supporting the significant role of site-site interactions in protein evolution and motivating the development of new evolutionary models similar to the one described here. Finally, we discuss possible extensions and applications of the new substitution model.

AB - Despite the importance of a thermodynamically stable structure with a conserved fold for protein function, almost all evolutionary models neglect site-site correlations that arise from physical interactions between neighboring amino acid sites. This is mainly due to the difficulty in formulating a computationally tractable model since rate matrices can no longer be used. Here, we introduce a general framework, based on factor graphs, for constructing probabilistic models of protein evolution with site interdependence. Conveniently, efficient approximate inference algorithms, such as Belief Propagation, can be used to calculate likelihoods for these models. We fit an amino acid substitution model of this type that accounts for both solvent accessibility and site-site correlations. Comparisons of the new model with rate matrix models and alternative structure-dependent models demonstrate that it better fits the sequence data. We also examine evolution within a family of homohexameric enzymes and find that site-site correlations between most contacting subunits contribute to a higher likelihood. In addition, we show that the new substitution model has a similar mathematical form to the one introduced in Rodrigue et al. (Rodrigue N, Lartillot N, Bryant D, Philippe H. 2005. Site interdependence attributed to tertiary structure in amino acid sequence evolution. Gene 347:207-217), although with different parameter interpretations and values. We also perform a statistical analysis of the effects of amino acids at neighboring sites on substitution probabilities and find a significant perturbation of most probabilities, further supporting the significant role of site-site interactions in protein evolution and motivating the development of new evolutionary models similar to the one described here. Finally, we discuss possible extensions and applications of the new substitution model.

KW - approximate inference algorithms

KW - factor graphs

KW - protein evolution

KW - protein structure

KW - site-site correlations

KW - substitution models

UR - http://www.scopus.com/inward/record.url?scp=84895752277&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84895752277&partnerID=8YFLogxK

U2 - 10.1093/molbev/mst240

DO - 10.1093/molbev/mst240

M3 - Article

C2 - 24307688

AN - SCOPUS:84895752277

VL - 31

SP - 736

EP - 749

JO - Molecular Biology and Evolution

JF - Molecular Biology and Evolution

SN - 0737-4038

IS - 3

ER -