Inferring selection in partially sequenced regions

Jeffrey D. Jensen; Kevin R. Thornton; Charles F. Aquadro

doi:10.1093/molbev/msm273

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

A common approach for identifying loci influenced by positive selection involves scanning large portions of the genome for regions that are inconsistent with the neutral equilibrium model or represent outliers relative to the empirical distribution of some aspect of the data. Once identified, partial sequence is generated spanning this more localized region in order to quantify the site-frequency spectrum and evaluate the data with tests of neutrality and selection. This method is widely used as partial sequencing is less expensive with regard to both time and money. Here, we demonstrate that this approach can lead to biased maximum likelihood estimates of selection parameters and reduced rejection rates, with some parameter combinations resulting in clearly misleading results. Most significantly, for a commonly used sample size in Drosophila population genetics (i.e., n = 12), the estimate of the target of selection has a large mean square error and the strength of selection is severely underestimated when the true selected site has not been sampled. We propose sequencing approaches that are much more likely to accurately localize the target and estimate the strength of selection. Additionally, we examine the performance of a commonly used test of selection under a variety of recurrent and single sweep models.

Original language	English (US)
Pages (from-to)	438-446
Number of pages	9
Journal	Molecular Biology and Evolution
Volume	25
Issue number	2
DOIs	https://doi.org/10.1093/molbev/msm273
State	Published - Feb 2008
Externally published	Yes

Keywords

Composite likelihood
Natural selection
Recurrent selection
Selective sweeps

ASJC Scopus subject areas

Ecology, Evolution, Behavior and Systematics
Molecular Biology
Genetics

Cite this

@article{dc74ac889998478594bf41edcdfffe00,

title = "Inferring selection in partially sequenced regions",

abstract = "A common approach for identifying loci influenced by positive selection involves scanning large portions of the genome for regions that are inconsistent with the neutral equilibrium model or represent outliers relative to the empirical distribution of some aspect of the data. Once identified, partial sequence is generated spanning this more localized region in order to quantify the site-frequency spectrum and evaluate the data with tests of neutrality and selection. This method is widely used as partial sequencing is less expensive with regard to both time and money. Here, we demonstrate that this approach can lead to biased maximum likelihood estimates of selection parameters and reduced rejection rates, with some parameter combinations resulting in clearly misleading results. Most significantly, for a commonly used sample size in Drosophila population genetics (i.e., n = 12), the estimate of the target of selection has a large mean square error and the strength of selection is severely underestimated when the true selected site has not been sampled. We propose sequencing approaches that are much more likely to accurately localize the target and estimate the strength of selection. Additionally, we examine the performance of a commonly used test of selection under a variety of recurrent and single sweep models.",

keywords = "Composite likelihood, Natural selection, Recurrent selection, Selective sweeps",

author = "Jensen, {Jeffrey D.} and Thornton, {Kevin R.} and Aquadro, {Charles F.}",

year = "2008",

month = feb,

doi = "10.1093/molbev/msm273",

language = "English (US)",

volume = "25",

pages = "438--446",

journal = "Molecular Biology and Evolution",

issn = "0737-4038",

publisher = "Oxford University Press",

number = "2",

}

TY - JOUR

T1 - Inferring selection in partially sequenced regions

AU - Jensen, Jeffrey D.

AU - Thornton, Kevin R.

AU - Aquadro, Charles F.

PY - 2008/2

Y1 - 2008/2

N2 - A common approach for identifying loci influenced by positive selection involves scanning large portions of the genome for regions that are inconsistent with the neutral equilibrium model or represent outliers relative to the empirical distribution of some aspect of the data. Once identified, partial sequence is generated spanning this more localized region in order to quantify the site-frequency spectrum and evaluate the data with tests of neutrality and selection. This method is widely used as partial sequencing is less expensive with regard to both time and money. Here, we demonstrate that this approach can lead to biased maximum likelihood estimates of selection parameters and reduced rejection rates, with some parameter combinations resulting in clearly misleading results. Most significantly, for a commonly used sample size in Drosophila population genetics (i.e., n = 12), the estimate of the target of selection has a large mean square error and the strength of selection is severely underestimated when the true selected site has not been sampled. We propose sequencing approaches that are much more likely to accurately localize the target and estimate the strength of selection. Additionally, we examine the performance of a commonly used test of selection under a variety of recurrent and single sweep models.

KW - Composite likelihood

KW - Natural selection

KW - Recurrent selection

KW - Selective sweeps

UR - http://www.scopus.com/inward/record.url?scp=38949155535&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38949155535&partnerID=8YFLogxK

U2 - 10.1093/molbev/msm273

DO - 10.1093/molbev/msm273

M3 - Article

C2 - 18165259

AN - SCOPUS:38949155535

SN - 0737-4038

VL - 25

SP - 438

EP - 446

JO - Molecular Biology and Evolution

JF - Molecular Biology and Evolution

IS - 2

ER -
