Evolutionary balancing is critical for correctly forecasting disease-associated amino acid variants

Li Liu, Sudhir Kumar

Research output: Contribution to journal, Article

11 Scopus citations

Abstract

Computational predictions have become indispensable for evaluating the disease-related impact of nonsynonymous single-nucleotide variants discovered in exome sequencing. Many such methods have their roots in molecular evolution, as they use information derived from multiple sequence alignments. We show that the performance of current methods (e.g., PolyPhen-2 and SIFT) is improved significantly by optimizing their statistical models on evolutionarily balanced training data, where equal numbers of positive and negative controls within each evolutionary conservation class are used. Evolutionary balancing significantly reduces the false-positive rates for variants observed at highly conserved sites and false-negative rates for variants observed at fast evolving sites. Use of these improved methods enables more accurate forecasting when concordant diagnosis from multiple methods is regarded as a more reliable indicator of the prediction. Applied to a large exome variation data set, we find that the current methods produce concordant predictions for less than half of the population variants. These advances are implemented in a web resource for use in practical applications (www.mypeg.info, last accessed March 13, 2013).
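The balancing idea described in the abstract (equal numbers of positive and negative controls within each evolutionary conservation class) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `balance_by_conservation`, the tuple layout, the class labels `"high"` and `"low"`, and the choice to reach equal counts by randomly downsampling the larger group are all assumptions made for illustration. The abstract does not say how the equal counts were obtained.

```python
import random
from collections import defaultdict


def balance_by_conservation(variants, seed=0):
    """Return a subset with equal disease/neutral counts per conservation class.

    `variants` is a list of (variant_id, conservation_class, is_disease)
    tuples. This layout is hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)

    # Group variants by conservation class, then by label.
    groups = defaultdict(lambda: {True: [], False: []})
    for variant in variants:
        _, cons_class, is_disease = variant
        groups[cons_class][bool(is_disease)].append(variant)

    balanced = []
    for cons_class in sorted(groups):
        positives = groups[cons_class][True]
        negatives = groups[cons_class][False]
        # Assumption: downsample the larger group to the size of the smaller.
        n = min(len(positives), len(negatives))
        balanced.extend(rng.sample(positives, n))
        balanced.extend(rng.sample(negatives, n))
    return balanced


# Toy data: disease variants dominate the highly conserved class,
# neutral variants dominate the fast-evolving class.
example = [
    ("v1", "high", True), ("v2", "high", True), ("v3", "high", True),
    ("v4", "high", False),
    ("v5", "low", True),
    ("v6", "low", False), ("v7", "low", False),
]
balanced = balance_by_conservation(example)
```

In the toy data, the unbalanced set would let a model learn "highly conserved means disease" from class frequencies alone. After balancing, each conservation class contributes the same number of positive and negative controls, so conservation by itself no longer predicts the label.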

Original language: English (US)
Pages (from-to): 1252-1257
Number of pages: 6
Journal: Molecular Biology and Evolution
Volume: 30
Issue number: 6
State: Published - Jun 2013

Keywords

  • computational prediction
  • evolutionary medicine
  • nonsynonymous single nucleotide variant

ASJC Scopus subject areas

  • Genetics
  • Molecular Biology
  • Ecology, Evolution, Behavior and Systematics
