Performance of computational tools in evaluating the functional impact of laboratory-induced amino acid mutations

Vanessa E. Gray, Kimberly R. Kukurba, Sudhir Kumar

Research output: Contribution to journal › Article

37 Citations (Scopus)

Abstract

Summary: Site-directed mutagenesis is frequently used by scientists to investigate the functional impact of amino acid mutations in the laboratory. Over 10 000 such laboratory-induced mutations have been reported in the UniProt database along with the outcomes of functional assays. Here, we explore the performance of state-of-the-art computational tools (Condel, PolyPhen-2 and SIFT) in correctly annotating the function-altering potential of 10 913 laboratory-induced mutations from 2372 proteins. We find that computational tools are very successful in diagnosing laboratory-induced mutations that elicit significant functional change in the laboratory (up to 92% accuracy). But, these tools consistently fail in correctly annotating laboratory-induced mutations that show no functional impact in the laboratory assays. Therefore, the overall accuracy of computational tools for laboratory-induced mutations is much lower than that observed for the naturally occurring human variants. We tested and rejected the possibilities that the preponderance of changes to alanine and the presence of multiple base-pair mutations in the laboratory were the reasons for the observed discordance between the performance of computational tools for natural and laboratory mutations. Instead, we discover that the laboratory-induced mutations occur predominately at the highly conserved positions in proteins, where the computational tools have the lowest accuracy of correct prediction for variants that do not impact function (neutral). Therefore, the comparisons of experimental-profiling results with those from computational predictions need to be sensitive to the evolutionary conservation of the positions harboring the amino acid change.
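The gap the abstract describes between per-class and overall accuracy can be illustrated with a minimal sketch. It assumes each mutation carries one experimental label and one predicted label, either "altering" or "neutral". The function name `class_accuracies`, the label strings, and the toy counts are hypothetical and are not taken from the paper or from any of the tools it evaluates.

```python
from collections import Counter


def class_accuracies(records):
    """Summarise agreement between experimental and predicted labels.

    `records` is an iterable of (experimental, predicted) pairs, where each
    label is the string "altering" or "neutral" (labels assumed for this sketch).
    """
    counts = Counter(records)
    tp = counts[("altering", "altering")]  # altering, predicted altering
    fn = counts[("altering", "neutral")]   # altering, predicted neutral
    tn = counts[("neutral", "neutral")]    # neutral, predicted neutral
    fp = counts[("neutral", "altering")]   # neutral, predicted altering
    total = tp + fn + tn + fp
    nan = float("nan")
    return {
        # Fraction of function-altering mutations predicted correctly.
        "altering_accuracy": tp / (tp + fn) if (tp + fn) else nan,
        # Fraction of neutral mutations predicted correctly.
        "neutral_accuracy": tn / (tn + fp) if (tn + fp) else nan,
        # Fraction of all mutations predicted correctly.
        "overall_accuracy": (tp + tn) / total if total else nan,
    }


# Toy data only: 10 altering and 10 neutral mutations, invented for illustration.
toy_records = (
    [("altering", "altering")] * 9
    + [("altering", "neutral")] * 1
    + [("neutral", "neutral")] * 3
    + [("neutral", "altering")] * 7
)
result = class_accuracies(toy_records)
```

With these invented counts, the altering class is predicted well while the neutral class is not, so the overall figure falls between the two. Stratifying the same calculation by position conservation would only require grouping the records before calling the function.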

Original language: English (US)
Article number: bts336
Pages (from-to): 2093-2096
Number of pages: 4
Journal: Bioinformatics
Volume: 28
Issue number: 16
DOI: 10.1093/bioinformatics/bts336
State: Published - Aug 2012

Fingerprint

Amino Acids
Mutation
Site-Directed Mutagenesis
Assays
Proteins
Prediction
Potential Function
Profiling
Mutagenesis
Conservation
Lowest
Base Pairing
Alanine
Databases

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine(all)

Cite this

Performance of computational tools in evaluating the functional impact of laboratory-induced amino acid mutations. / Gray, Vanessa E.; Kukurba, Kimberly R.; Kumar, Sudhir.

In: Bioinformatics, Vol. 28, No. 16, bts336, 08.2012, p. 2093-2096.

Gray, Vanessa E. ; Kukurba, Kimberly R. ; Kumar, Sudhir. / Performance of computational tools in evaluating the functional impact of laboratory-induced amino acid mutations. In: Bioinformatics. 2012 ; Vol. 28, No. 16. pp. 2093-2096.
@article{e4b623cd557c4303937bc08c5ad3051e,
title = "Performance of computational tools in evaluating the functional impact of laboratory-induced amino acid mutations",
abstract = "Summary: Site-directed mutagenesis is frequently used by scientists to investigate the functional impact of amino acid mutations in the laboratory. Over 10 000 such laboratory-induced mutations have been reported in the UniProt database along with the outcomes of functional assays. Here, we explore the performance of state-of-the-art computational tools (Condel, PolyPhen-2 and SIFT) in correctly annotating the function-altering potential of 10 913 laboratory-induced mutations from 2372 proteins. We find that computational tools are very successful in diagnosing laboratory-induced mutations that elicit significant functional change in the laboratory (up to 92{\%} accuracy). But, these tools consistently fail in correctly annotating laboratory-induced mutations that show no functional impact in the laboratory assays. Therefore, the overall accuracy of computational tools for laboratory-induced mutations is much lower than that observed for the naturally occurring human variants. We tested and rejected the possibilities that the preponderance of changes to alanine and the presence of multiple base-pair mutations in the laboratory were the reasons for the observed discordance between the performance of computational tools for natural and laboratory mutations. Instead, we discover that the laboratory-induced mutations occur predominately at the highly conserved positions in proteins, where the computational tools have the lowest accuracy of correct prediction for variants that do not impact function (neutral). Therefore, the comparisons of experimental-profiling results with those from computational predictions need to be sensitive to the evolutionary conservation of the positions harboring the amino acid change.",
author = "Gray, {Vanessa E.} and Kukurba, {Kimberly R.} and Kumar, Sudhir",
year = "2012",
month = aug,
doi = "10.1093/bioinformatics/bts336",
language = "English (US)",
volume = "28",
pages = "2093--2096",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "16",
}

TY - JOUR
T1 - Performance of computational tools in evaluating the functional impact of laboratory-induced amino acid mutations
AU - Gray, Vanessa E.
AU - Kukurba, Kimberly R.
AU - Kumar, Sudhir
PY - 2012/8
Y1 - 2012/8
AB - Summary: Site-directed mutagenesis is frequently used by scientists to investigate the functional impact of amino acid mutations in the laboratory. Over 10 000 such laboratory-induced mutations have been reported in the UniProt database along with the outcomes of functional assays. Here, we explore the performance of state-of-the-art computational tools (Condel, PolyPhen-2 and SIFT) in correctly annotating the function-altering potential of 10 913 laboratory-induced mutations from 2372 proteins. We find that computational tools are very successful in diagnosing laboratory-induced mutations that elicit significant functional change in the laboratory (up to 92% accuracy). But, these tools consistently fail in correctly annotating laboratory-induced mutations that show no functional impact in the laboratory assays. Therefore, the overall accuracy of computational tools for laboratory-induced mutations is much lower than that observed for the naturally occurring human variants. We tested and rejected the possibilities that the preponderance of changes to alanine and the presence of multiple base-pair mutations in the laboratory were the reasons for the observed discordance between the performance of computational tools for natural and laboratory mutations. Instead, we discover that the laboratory-induced mutations occur predominately at the highly conserved positions in proteins, where the computational tools have the lowest accuracy of correct prediction for variants that do not impact function (neutral). Therefore, the comparisons of experimental-profiling results with those from computational predictions need to be sensitive to the evolutionary conservation of the positions harboring the amino acid change.

UR - http://www.scopus.com/inward/record.url?scp=84865064881&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84865064881&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bts336
DO - 10.1093/bioinformatics/bts336
M3 - Article
C2 - 22685075
AN - SCOPUS:84865064881
VL - 28
SP - 2093
EP - 2096
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - 16
M1 - bts336
ER -