Coevolving residues inform protein dynamics profiles and disease susceptibility of nSNVs

Brandon M. Butler, I. Can Kazan, Avishek Kumar, Sefika Ozkan

Research output: Contribution to journal (Article)

Abstract

The conformational dynamics of proteins is rarely used in methodologies used to predict the impact of genetic mutations due to the paucity of three-dimensional protein structures as compared to the vast number of available sequences. Until now a three-dimensional (3D) structure has been required to predict the conformational dynamics of a protein. We introduce an approach that estimates the conformational dynamics of a protein, without relying on structural information. This de novo approach utilizes coevolving residues identified from a multiple sequence alignment (MSA) using Potts models. These coevolving residues are used as contacts in a Gaussian network model (GNM) to obtain protein dynamics. B-factors calculated using sequence-based GNM (Seq-GNM) are in agreement with crystallographic B-factors as well as theoretical B-factors from the original GNM that utilizes the 3D structure. Moreover, we demonstrate the ability of the calculated B-factors from the Seq-GNM approach to discriminate genomic variants according to their phenotypes for a wide range of proteins. These results suggest that protein dynamics can be approximated based on sequence information alone, making it possible to assess the phenotypes of nSNVs in cases where a 3D structure is unknown. We hope this work will promote the use of dynamics information in genetic disease prediction at scale by circumventing the need for 3D structures.
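The step the abstract describes, using a set of residue contacts as the network of a Gaussian network model and reading B-factors off it, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Seq-GNM implementation: the function name `gnm_relative_bfactors` and the example contact list are hypothetical, contacts are treated as unweighted springs, consecutive residues are assumed to be connected so the network stays in one piece, and the output is in relative units (the physical prefactor is omitted).

```python
import numpy as np


def gnm_relative_bfactors(n_residues, contacts):
    """Relative GNM B-factors for a chain of n_residues given (i, j) contact pairs.

    Hypothetical helper: contacts could come from a structure or, as in the
    abstract, from coevolving residue pairs; here they are plain index pairs.
    """
    # Assumption: neighbouring residues along the chain are always in contact.
    pairs = {(i, i + 1) for i in range(n_residues - 1)}
    # Add the supplied contacts as sorted index pairs, ignoring self-contacts.
    pairs |= {(min(i, j), max(i, j)) for i, j in contacts if i != j}

    # Kirchhoff (connectivity) matrix: -1 for each contact, degree on the diagonal.
    kirchhoff = np.zeros((n_residues, n_residues))
    for i, j in pairs:
        kirchhoff[i, j] = kirchhoff[j, i] = -1.0
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))

    # Eigen-decompose and drop the single zero mode of the connected network;
    # the remaining modes give the diagonal of the pseudo-inverse.
    eigenvalues, eigenvectors = np.linalg.eigh(kirchhoff)
    fluctuations = (eigenvectors[:, 1:] ** 2 / eigenvalues[1:]).sum(axis=1)

    # Mean-square fluctuations are proportional to B-factors (relative units).
    return fluctuations


# Usage with a made-up contact list for a 10-residue chain.
example_contacts = [(0, 5), (2, 8), (3, 9)]
profile = gnm_relative_bfactors(10, example_contacts)
print(np.round(profile, 3))
```

In this sketch, residues with more contacts are more constrained and receive lower values, which is the sense in which a contact map alone yields a flexibility profile. Comparing such a profile with experimental B-factors would typically be done by correlation, since the absolute scale is not fixed here.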

Original language: English (US)
Article number: e1006626
Journal: PLoS Computational Biology
Volume: 14
Issue number: 11
DOI: 10.1371/journal.pcbi.1006626
State: Published - Nov 1 2018

Fingerprint

Disease Susceptibility
Susceptibility
disease resistance
Proteins
Protein
protein
Gaussian Model
Network Model
proteins
Phenotype
phenotype
Predict
Three-dimensional
Potts model
Multiple Sequence Alignment
Potts Model
genetic disorders
sequence alignment
Protein Structure
protein structure

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Modeling and Simulation
  • Ecology
  • Molecular Biology
  • Genetics
  • Cellular and Molecular Neuroscience
  • Computational Theory and Mathematics

Cite this

Coevolving residues inform protein dynamics profiles and disease susceptibility of nSNVs. / Butler, Brandon M.; Kazan, I. Can; Kumar, Avishek; Ozkan, Sefika.

In: PLoS Computational Biology, Vol. 14, No. 11, e1006626, 01.11.2018.


@article{418308a9c27b4f5abc7036a5babb4a82,
title = "Coevolving residues inform protein dynamics profiles and disease susceptibility of nSNVs",
abstract = "The conformational dynamics of proteins is rarely used in methodologies used to predict the impact of genetic mutations due to the paucity of three-dimensional protein structures as compared to the vast number of available sequences. Until now a three-dimensional (3D) structure has been required to predict the conformational dynamics of a protein. We introduce an approach that estimates the conformational dynamics of a protein, without relying on structural information. This de novo approach utilizes coevolving residues identified from a multiple sequence alignment (MSA) using Potts models. These coevolving residues are used as contacts in a Gaussian network model (GNM) to obtain protein dynamics. B-factors calculated using sequence-based GNM (Seq-GNM) are in agreement with crystallographic B-factors as well as theoretical B-factors from the original GNM that utilizes the 3D structure. Moreover, we demonstrate the ability of the calculated B-factors from the Seq-GNM approach to discriminate genomic variants according to their phenotypes for a wide range of proteins. These results suggest that protein dynamics can be approximated based on sequence information alone, making it possible to assess the phenotypes of nSNVs in cases where a 3D structure is unknown. We hope this work will promote the use of dynamics information in genetic disease prediction at scale by circumventing the need for 3D structures.",
author = "Butler, {Brandon M.} and Kazan, {I. Can} and Kumar, Avishek and Ozkan, Sefika",
year = "2018",
month = "11",
day = "1",
doi = "10.1371/journal.pcbi.1006626",
language = "English (US)",
volume = "14",
journal = "PLoS Computational Biology",
issn = "1553-734X",
publisher = "Public Library of Science",
number = "11",
}

TY - JOUR
T1 - Coevolving residues inform protein dynamics profiles and disease susceptibility of nSNVs
AU - Butler, Brandon M.
AU - Kazan, I. Can
AU - Kumar, Avishek
AU - Ozkan, Sefika
PY - 2018/11/1
Y1 - 2018/11/1
N2 - The conformational dynamics of proteins is rarely used in methodologies used to predict the impact of genetic mutations due to the paucity of three-dimensional protein structures as compared to the vast number of available sequences. Until now a three-dimensional (3D) structure has been required to predict the conformational dynamics of a protein. We introduce an approach that estimates the conformational dynamics of a protein, without relying on structural information. This de novo approach utilizes coevolving residues identified from a multiple sequence alignment (MSA) using Potts models. These coevolving residues are used as contacts in a Gaussian network model (GNM) to obtain protein dynamics. B-factors calculated using sequence-based GNM (Seq-GNM) are in agreement with crystallographic B-factors as well as theoretical B-factors from the original GNM that utilizes the 3D structure. Moreover, we demonstrate the ability of the calculated B-factors from the Seq-GNM approach to discriminate genomic variants according to their phenotypes for a wide range of proteins. These results suggest that protein dynamics can be approximated based on sequence information alone, making it possible to assess the phenotypes of nSNVs in cases where a 3D structure is unknown. We hope this work will promote the use of dynamics information in genetic disease prediction at scale by circumventing the need for 3D structures.

AB - The conformational dynamics of proteins is rarely used in methodologies used to predict the impact of genetic mutations due to the paucity of three-dimensional protein structures as compared to the vast number of available sequences. Until now a three-dimensional (3D) structure has been required to predict the conformational dynamics of a protein. We introduce an approach that estimates the conformational dynamics of a protein, without relying on structural information. This de novo approach utilizes coevolving residues identified from a multiple sequence alignment (MSA) using Potts models. These coevolving residues are used as contacts in a Gaussian network model (GNM) to obtain protein dynamics. B-factors calculated using sequence-based GNM (Seq-GNM) are in agreement with crystallographic B-factors as well as theoretical B-factors from the original GNM that utilizes the 3D structure. Moreover, we demonstrate the ability of the calculated B-factors from the Seq-GNM approach to discriminate genomic variants according to their phenotypes for a wide range of proteins. These results suggest that protein dynamics can be approximated based on sequence information alone, making it possible to assess the phenotypes of nSNVs in cases where a 3D structure is unknown. We hope this work will promote the use of dynamics information in genetic disease prediction at scale by circumventing the need for 3D structures.

UR - http://www.scopus.com/inward/record.url?scp=85058093196&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058093196&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1006626
DO - 10.1371/journal.pcbi.1006626
M3 - Article
VL - 14
JO - PLoS Computational Biology
JF - PLoS Computational Biology
SN - 1553-734X
IS - 11
M1 - e1006626
ER -