Controlling the false-positive rate in multilocus genome scans for selection

Kevin R. Thornton; Jeffrey D. Jensen

doi:10.1534/genetics.106.064642

Research output: Contribution to journal › Article › peer-review

133 Scopus citations

Abstract

Rapid typing of genetic variation at many regions of the genome is an efficient way to survey variability in natural populations in an effort to identify segments of the genome that have experienced recent natural selection. Following such a genome scan, individual regions may be chosen for further sequencing and a more detailed analysis of patterns of variability, often to perform a parametric test for selection and to estimate the strength of a recent selective sweep. We show here that not accounting for the ascertainment of loci in such analyses leads to false inference of natural selection when the true model is selective neutrality, because the procedure of choosing unusual loci (in comparison to the rest of the genome-scan data) selects regions of the genome with genealogies similar to those expected under models of recent directional selection. We describe a simple and efficient correction for this ascertainment bias, which restores the false-positive rate to near-nominal levels. For the parameters considered here, we find that obtaining a test with the expected distribution of P-values depends on accurately accounting both for ascertainment of regions and for demography. Finally, we use simulations to explore the utility of relying on outlier loci to detect recent selective sweeps. We find that measures of diversity and of population differentiation are more effective than summaries of the site-frequency spectrum and that sequencing larger regions (2.5 kbp) in genome-scan studies leads to more power to detect recent selective sweeps.
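
The ascertainment effect described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' procedure: it assumes each locus contributes one independent standard-normal summary statistic under neutrality (a stand-in for a real diversity or frequency-spectrum statistic), that the most negative locus in the scan is the one followed up, and that the follow-up test reuses that same statistic. All function names and parameter values are hypothetical. The naive P-value treats the chosen locus as if it were a random locus; the corrected P-value uses the null distribution of the minimum over all scanned loci.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed single-locus null distribution (toy stand-in)


def scan_min(n_loci, rng):
    """Simulate a neutral genome scan and return the most extreme (lowest) statistic."""
    return min(rng.gauss(0.0, 1.0) for _ in range(n_loci))


def naive_p(x):
    """P-value that ignores ascertainment: treats x as a draw from a random locus."""
    return STD_NORMAL.cdf(x)


def corrected_p(x, n_loci):
    """P-value under the null of 'minimum of n_loci independent neutral loci'."""
    return 1.0 - (1.0 - STD_NORMAL.cdf(x)) ** n_loci


def false_positive_rates(n_loci=50, n_reps=2000, alpha=0.05, seed=1):
    """Return (naive, corrected) rejection rates when every locus is neutral."""
    rng = random.Random(seed)
    naive_hits = 0
    corrected_hits = 0
    for _ in range(n_reps):
        x = scan_min(n_loci, rng)
        naive_hits += naive_p(x) <= alpha
        corrected_hits += corrected_p(x, n_loci) <= alpha
    return naive_hits / n_reps, corrected_hits / n_reps


if __name__ == "__main__":
    naive, corrected = false_positive_rates()
    print(f"naive false-positive rate:     {naive:.3f}")
    print(f"corrected false-positive rate: {corrected:.3f}")
```

Under these assumptions the naive test rejects in roughly 1 - 0.95^50 (about 92%) of purely neutral scans, while the corrected test stays near the nominal 5%. A realistic analysis would replace the normal draws with coalescent simulations that also model demography and the follow-up resequencing step, which the abstract identifies as necessary for well-calibrated P-values.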

Original language	English (US)
Pages (from-to)	737-750
Number of pages	14
Journal	Genetics
Volume	175
Issue number	2
DOIs	https://doi.org/10.1534/genetics.106.064642
State	Published - Feb 2007
Externally published	Yes

ASJC Scopus subject areas

Genetics

Cite this

@article{84ed81b7117e4723800f2cff2ccfa0be,
  title = "Controlling the false-positive rate in multilocus genome scans for selection",
  abstract = "Rapid typing of genetic variation at many regions of the genome is an efficient way to survey variability in natural populations in an effort to identify segments of the genome that have experienced recent natural selection. Following such a genome scan, individual regions may be chosen for further sequencing and a more detailed analysis of patterns of variability, often to perform a parametric test for selection and to estimate the strength of a recent selective sweep. We show here that not accounting for the ascertainment of loci in such analyses leads to false inference of natural selection when the true model is selective neutrality, because the procedure of choosing unusual loci (in comparison to the rest of the genome-scan data) selects regions of the genome with genealogies similar to those expected under models of recent directional selection. We describe a simple and efficient correction for this ascertainment bias, which restores the false-positive rate to near-nominal levels. For the parameters considered here, we find that obtaining a test with the expected distribution of P-values depends on accurately accounting both for ascertainment of regions and for demography. Finally, we use simulations to explore the utility of relying on outlier loci to detect recent selective sweeps. We find that measures of diversity and of population differentiation are more effective than summaries of the site-frequency spectrum and that sequencing larger regions (2.5 kbp) in genome-scan studies leads to more power to detect recent selective sweeps.",
  author = "Thornton, {Kevin R.} and Jensen, {Jeffrey D.}",
  year = "2007",
  month = feb,
  doi = "10.1534/genetics.106.064642",
  language = "English (US)",
  volume = "175",
  pages = "737--750",
  journal = "Genetics",
  issn = "0016-6731",
  publisher = "Genetics Society of America",
  number = "2",
}

TY - JOUR
T1 - Controlling the false-positive rate in multilocus genome scans for selection
AU - Thornton, Kevin R.
AU - Jensen, Jeffrey D.
PY - 2007/2
Y1 - 2007/2
N2 - Rapid typing of genetic variation at many regions of the genome is an efficient way to survey variability in natural populations in an effort to identify segments of the genome that have experienced recent natural selection. Following such a genome scan, individual regions may be chosen for further sequencing and a more detailed analysis of patterns of variability, often to perform a parametric test for selection and to estimate the strength of a recent selective sweep. We show here that not accounting for the ascertainment of loci in such analyses leads to false inference of natural selection when the true model is selective neutrality, because the procedure of choosing unusual loci (in comparison to the rest of the genome-scan data) selects regions of the genome with genealogies similar to those expected under models of recent directional selection. We describe a simple and efficient correction for this ascertainment bias, which restores the false-positive rate to near-nominal levels. For the parameters considered here, we find that obtaining a test with the expected distribution of P-values depends on accurately accounting both for ascertainment of regions and for demography. Finally, we use simulations to explore the utility of relying on outlier loci to detect recent selective sweeps. We find that measures of diversity and of population differentiation are more effective than summaries of the site-frequency spectrum and that sequencing larger regions (2.5 kbp) in genome-scan studies leads to more power to detect recent selective sweeps.
AB - Rapid typing of genetic variation at many regions of the genome is an efficient way to survey variability in natural populations in an effort to identify segments of the genome that have experienced recent natural selection. Following such a genome scan, individual regions may be chosen for further sequencing and a more detailed analysis of patterns of variability, often to perform a parametric test for selection and to estimate the strength of a recent selective sweep. We show here that not accounting for the ascertainment of loci in such analyses leads to false inference of natural selection when the true model is selective neutrality, because the procedure of choosing unusual loci (in comparison to the rest of the genome-scan data) selects regions of the genome with genealogies similar to those expected under models of recent directional selection. We describe a simple and efficient correction for this ascertainment bias, which restores the false-positive rate to near-nominal levels. For the parameters considered here, we find that obtaining a test with the expected distribution of P-values depends on accurately accounting both for ascertainment of regions and for demography. Finally, we use simulations to explore the utility of relying on outlier loci to detect recent selective sweeps. We find that measures of diversity and of population differentiation are more effective than summaries of the site-frequency spectrum and that sequencing larger regions (2.5 kbp) in genome-scan studies leads to more power to detect recent selective sweeps.
UR - http://www.scopus.com/inward/record.url?scp=34247891656&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34247891656&partnerID=8YFLogxK
U2 - 10.1534/genetics.106.064642
DO - 10.1534/genetics.106.064642
M3 - Article
C2 - 17110489
AN - SCOPUS:34247891656
SN - 0016-6731
VL - 175
SP - 737
EP - 750
JO - Genetics
JF - Genetics
IS - 2
ER -
