Controlling the false-positive rate in multilocus genome scans for selection

Kevin R. Thornton, Jeffrey Jensen

Research output: Contribution to journal (Article)

107 Citations (Scopus)

Abstract

Rapid typing of genetic variation at many regions of the genome is an efficient way to survey variability in natural populations in an effort to identify segments of the genome that have experienced recent natural selection. Following such a genome scan, individual regions may be chosen for further sequencing and a more detailed analysis of patterns of variability, often to perform a parametric test for selection and to estimate the strength of a recent selective sweep. We show here that not accounting for the ascertainment of loci in such analyses leads to false inference of natural selection when the true model is selective neutrality, because the procedure of choosing unusual loci (in comparison to the rest of the genome-scan data) selects regions of the genome with genealogies similar to those expected under models of recent directional selection. We describe a simple and efficient correction for this ascertainment bias, which restores the false-positive rate to near-nominal levels. For the parameters considered here, we find that obtaining a test with the expected distribution of P-values depends on accurately accounting both for ascertainment of regions and for demography. Finally, we use simulations to explore the utility of relying on outlier loci to detect recent selective sweeps. We find that measures of diversity and of population differentiation are more effective than summaries of the site-frequency spectrum and that sequencing larger regions (2.5 kbp) in genome-scan studies leads to more power to detect recent selective sweeps.
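
The ascertainment effect described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' procedure: it assumes each locus yields one summary statistic drawn from a standard normal under neutrality, and that the most negative locus in a scan is the one picked for follow-up. The function names and parameter values are hypothetical. It compares a test calibrated on a randomly chosen locus with one calibrated on the minimum of a whole scan.

```python
import random


def simulate_scan(n_loci, rng):
    """One neutral genome scan: a toy summary statistic per locus.

    Assumption: statistics are i.i.d. standard normal under neutrality.
    """
    return [rng.gauss(0.0, 1.0) for _ in range(n_loci)]


def false_positive_rates(n_loci=20, n_scans=2000, n_null=2000,
                         alpha=0.05, seed=1):
    """Return (naive, ascertainment-aware) false-positive rates.

    Every scan is neutral, so any rejection is a false positive.
    """
    rng = random.Random(seed)

    # Null distribution that ignores ascertainment: one random locus.
    naive_null = sorted(rng.gauss(0.0, 1.0) for _ in range(n_null))

    # Null distribution that conditions on ascertainment: the most
    # extreme (minimum) statistic among n_loci neutral loci.
    asc_null = sorted(min(simulate_scan(n_loci, rng)) for _ in range(n_null))

    # Lower-tail critical values at level alpha.
    cutoff = int(alpha * n_null)
    naive_crit = naive_null[cutoff]
    asc_crit = asc_null[cutoff]

    naive_hits = 0
    asc_hits = 0
    for _ in range(n_scans):
        outlier = min(simulate_scan(n_loci, rng))  # the ascertained locus
        naive_hits += outlier < naive_crit
        asc_hits += outlier < asc_crit

    return naive_hits / n_scans, asc_hits / n_scans


if __name__ == "__main__":
    naive, corrected = false_positive_rates()
    print(f"naive false-positive rate:     {naive:.3f}")
    print(f"corrected false-positive rate: {corrected:.3f}")
```

Under these assumptions the naive rate should sit near 1 - 0.95^20 ≈ 0.64, while the ascertainment-aware rate should stay close to the nominal 0.05. A realistic analysis would replace the normal draws with coalescent simulations under an appropriate demographic model, which the abstract identifies as equally important.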

Original language: English (US)
Pages (from-to): 737-750
Number of pages: 14
Journal: Genetics
Volume: 175
Issue number: 2
DOI: 10.1534/genetics.106.064642
State: Published - Feb 2007
Externally published: Yes

Fingerprint

Genome
Genetic Selection
Genealogy and Heraldry
Population
Demography

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

Controlling the false-positive rate in multilocus genome scans for selection. / Thornton, Kevin R.; Jensen, Jeffrey.

In: Genetics, Vol. 175, No. 2, 02.2007, p. 737-750.


@article{84ed81b7117e4723800f2cff2ccfa0be,
title = "Controlling the false-positive rate in multilocus genome scans for selection",
abstract = "Rapid typing of genetic variation at many regions of the genome is an efficient way to survey variability in natural populations in an effort to identify segments of the genome that have experienced recent natural selection. Following such a genome scan, individual regions may be chosen for further sequencing and a more detailed analysis of patterns of variability, often to perform a parametric test for selection and to estimate the strength of a recent selective sweep. We show here that not accounting for the ascertainment of loci in such analyses leads to false inference of natural selection when the true model is selective neutrality, because the procedure of choosing unusual loci (in comparison to the rest of the genome-scan data) selects regions of the genome with genealogies similar to those expected under models of recent directional selection. We describe a simple and efficient correction for this ascertainment bias, which restores the false-positive rate to near-nominal levels. For the parameters considered here, we find that obtaining a test with the expected distribution of P-values depends on accurately accounting both for ascertainment of regions and for demography. Finally, we use simulations to explore the utility of relying on outlier loci to detect recent selective sweeps. We find that measures of diversity and of population differentiation are more effective than summaries of the site-frequency spectrum and that sequencing larger regions (2.5 kbp) in genome-scan studies leads to more power to detect recent selective sweeps.",
author = "Thornton, {Kevin R.} and Jensen, Jeffrey",
year = "2007",
month = feb,
doi = "10.1534/genetics.106.064642",
language = "English (US)",
volume = "175",
pages = "737--750",
journal = "Genetics",
issn = "0016-6731",
publisher = "Genetics Society of America",
number = "2",

}

TY - JOUR

T1 - Controlling the false-positive rate in multilocus genome scans for selection

AU - Thornton, Kevin R.

AU - Jensen, Jeffrey

PY - 2007/2

Y1 - 2007/2

N2 - Rapid typing of genetic variation at many regions of the genome is an efficient way to survey variability in natural populations in an effort to identify segments of the genome that have experienced recent natural selection. Following such a genome scan, individual regions may be chosen for further sequencing and a more detailed analysis of patterns of variability, often to perform a parametric test for selection and to estimate the strength of a recent selective sweep. We show here that not accounting for the ascertainment of loci in such analyses leads to false inference of natural selection when the true model is selective neutrality, because the procedure of choosing unusual loci (in comparison to the rest of the genome-scan data) selects regions of the genome with genealogies similar to those expected under models of recent directional selection. We describe a simple and efficient correction for this ascertainment bias, which restores the false-positive rate to near-nominal levels. For the parameters considered here, we find that obtaining a test with the expected distribution of P-values depends on accurately accounting both for ascertainment of regions and for demography. Finally, we use simulations to explore the utility of relying on outlier loci to detect recent selective sweeps. We find that measures of diversity and of population differentiation are more effective than summaries of the site-frequency spectrum and that sequencing larger regions (2.5 kbp) in genome-scan studies leads to more power to detect recent selective sweeps.

AB - Rapid typing of genetic variation at many regions of the genome is an efficient way to survey variability in natural populations in an effort to identify segments of the genome that have experienced recent natural selection. Following such a genome scan, individual regions may be chosen for further sequencing and a more detailed analysis of patterns of variability, often to perform a parametric test for selection and to estimate the strength of a recent selective sweep. We show here that not accounting for the ascertainment of loci in such analyses leads to false inference of natural selection when the true model is selective neutrality, because the procedure of choosing unusual loci (in comparison to the rest of the genome-scan data) selects regions of the genome with genealogies similar to those expected under models of recent directional selection. We describe a simple and efficient correction for this ascertainment bias, which restores the false-positive rate to near-nominal levels. For the parameters considered here, we find that obtaining a test with the expected distribution of P-values depends on accurately accounting both for ascertainment of regions and for demography. Finally, we use simulations to explore the utility of relying on outlier loci to detect recent selective sweeps. We find that measures of diversity and of population differentiation are more effective than summaries of the site-frequency spectrum and that sequencing larger regions (2.5 kbp) in genome-scan studies leads to more power to detect recent selective sweeps.

UR - http://www.scopus.com/inward/record.url?scp=34247891656&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34247891656&partnerID=8YFLogxK

U2 - 10.1534/genetics.106.064642

DO - 10.1534/genetics.106.064642

M3 - Article

C2 - 17110489

AN - SCOPUS:34247891656

VL - 175

SP - 737

EP - 750

JO - Genetics

JF - Genetics

SN - 0016-6731

IS - 2

ER -