Evolutionary balancing is critical for correctly forecasting disease-associated amino acid variants

Li Liu, Sudhir Kumar

Research output: Contribution to journal, Article, peer-review

15 Scopus citations


Computational predictions have become indispensable for evaluating the disease-related impact of nonsynonymous single-nucleotide variants discovered in exome sequencing. Many such methods have their roots in molecular evolution because they use information derived from multiple sequence alignments. We show that the performance of current methods (e.g., PolyPhen-2 and SIFT) improves significantly when their statistical models are optimized on evolutionarily balanced training data, in which equal numbers of positive and negative controls are used within each evolutionary conservation class. Evolutionary balancing significantly reduces false-positive rates for variants observed at highly conserved sites and false-negative rates for variants observed at fast-evolving sites. Using these improved methods enables more accurate forecasting when a concordant diagnosis from multiple methods is regarded as a more reliable indicator of the prediction. Applying the current methods to a large exome variation data set, we find that they produce concordant predictions for fewer than half of the population variants. These advances are implemented in a web resource for use in practical applications (www.mypeg.info, last accessed March 13, 2013).
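The balancing idea in the abstract, equal numbers of positive and negative controls within each conservation class, can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not the authors' implementation: the `Variant` record, the named conservation classes ("conserved", "fast"), the random subsampling used to discard surplus records, and the helpers `evolutionary_balance` and `concordant_fraction` are all hypothetical. The abstract does not specify how conservation classes are defined or how excess controls are removed. The concordance helper assumes a variant counts as concordant only when every method gives the same call.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical training record for one amino acid variant."""
    variant_id: str
    conservation_class: str  # assumed discrete bin of evolutionary conservation
    is_disease: bool         # True = positive control, False = negative control


def evolutionary_balance(variants, seed=0):
    """Return a subset with equal positives and negatives in each conservation class.

    Assumption: the larger group in each class is randomly subsampled down to
    the size of the smaller one; a class missing either group contributes nothing.
    """
    rng = random.Random(seed)
    by_class = defaultdict(lambda: ([], []))  # class -> (positives, negatives)
    for v in variants:
        positives, negatives = by_class[v.conservation_class]
        (positives if v.is_disease else negatives).append(v)

    balanced = []
    for cls in sorted(by_class):
        positives, negatives = by_class[cls]
        n = min(len(positives), len(negatives))
        balanced.extend(rng.sample(positives, n))
        balanced.extend(rng.sample(negatives, n))
    return balanced


def concordant_fraction(predictions):
    """Fraction of variants on which all methods make the same call.

    `predictions` maps variant_id -> {method_name: is_damaging}.
    """
    if not predictions:
        return 0.0
    concordant = sum(1 for calls in predictions.values() if len(set(calls.values())) == 1)
    return concordant / len(predictions)


# Usage: the conserved class has 3 positives and 1 negative, the fast class
# has 1 positive and 2 negatives, so balancing keeps 1 + 1 from each class.
training = [
    Variant("v1", "conserved", True),
    Variant("v2", "conserved", True),
    Variant("v3", "conserved", True),
    Variant("v4", "conserved", False),
    Variant("v5", "fast", True),
    Variant("v6", "fast", False),
    Variant("v7", "fast", False),
]
balanced = evolutionary_balance(training)
calls = {
    "a": {"PolyPhen-2": True, "SIFT": True},
    "b": {"PolyPhen-2": True, "SIFT": False},
}
fraction = concordant_fraction(calls)
```

Without balancing, a model trained on the data above would see mostly disease variants at conserved sites and mostly neutral variants at fast-evolving sites, so it could learn conservation as a proxy for the label; equalizing the classes removes that shortcut.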

Original language: English (US)
Pages (from-to): 1252-1257
Number of pages: 6
Journal: Molecular Biology and Evolution
Issue number: 6
State: Published - Jun 2013

Keywords

  • computational prediction
  • evolutionary medicine
  • nonsynonymous single nucleotide variant

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics
