Inferring selection in partially sequenced regions

Jeffrey Jensen, Kevin R. Thornton, Charles F. Aquadro

Research output: Contribution to journal › Article

11 Citations (Scopus)

Abstract

A common approach for identifying loci influenced by positive selection involves scanning large portions of the genome for regions that are inconsistent with the neutral equilibrium model or represent outliers relative to the empirical distribution of some aspect of the data. Once identified, partial sequence is generated spanning this more localized region in order to quantify the site-frequency spectrum and evaluate the data with tests of neutrality and selection. This method is widely used as partial sequencing is less expensive with regard to both time and money. Here, we demonstrate that this approach can lead to biased maximum likelihood estimates of selection parameters and reduced rejection rates, with some parameter combinations resulting in clearly misleading results. Most significantly, for a commonly used sample size in Drosophila population genetics (i.e., n = 12), the estimate of the target of selection has a large mean square error and the strength of selection is severely underestimated when the true selected site has not been sampled. We propose sequencing approaches that are much more likely to accurately localize the target and estimate the strength of selection. Additionally, we examine the performance of a commonly used test of selection under a variety of recurrent and single sweep models.
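For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of how a site-frequency spectrum can be tabulated from a sample and summarized with a neutrality statistic. It is an illustration under stated assumptions, not the authors' method: the abstract does not name its tests, and Tajima's D is used here only as a familiar frequency-spectrum statistic. The function names and the toy 0/1 haplotype encoding (1 = derived allele) are hypothetical.

```python
import math


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: sfs[i] = number of sites whose derived allele (coded 1)
    is carried by exactly i of the n sampled sequences."""
    n = len(haplotypes)
    sfs = [0] * (n + 1)
    for column in zip(*haplotypes):  # iterate over sites
        sfs[sum(column)] += 1
    return sfs


def tajimas_d(sfs):
    """Tajima's D (Tajima 1989) computed from an unfolded SFS.
    Used here only as an example statistic (an assumption of this sketch)."""
    n = len(sfs) - 1
    seg_sites = sum(sfs[1:n])  # sites polymorphic in the sample
    if seg_sites == 0:
        return 0.0
    # Mean number of pairwise differences between sequences.
    pairs = n * (n - 1) / 2
    pi = sum(sfs[i] * i * (n - i) for i in range(1, n)) / pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Toy sample with n = 12 sequences and 3 sites (made-up data):
# derived-allele counts of 1, 6, and 11 across the three sites.
n = 12
toy = [
    [1 if k < 1 else 0, 1 if k < 6 else 0, 1 if k < 11 else 0]
    for k in range(n)
]
toy_sfs = site_frequency_spectrum(toy)

# An excess of rare variants gives a negative D; an excess of
# intermediate-frequency variants gives a positive D.
rare_only = [0, 5] + [0] * 11
intermediate_only = [0] * 6 + [5] + [0] * 6
d_rare = tajimas_d(rare_only)
d_intermediate = tajimas_d(intermediate_only)
```

A negative value of the statistic reflects an excess of low-frequency variants, one pattern that can follow a selective sweep, although demographic history can produce it as well.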

Original language: English (US)
Pages (from-to): 438-446
Number of pages: 9
Journal: Molecular Biology and Evolution
Volume: 25
Issue number: 2
DOI: 10.1093/molbev/msm273
State: Published - Feb 2008
Externally published: Yes

Fingerprint

Likelihood Functions
Population Genetics
Sample Size
Drosophila
Genome
Mean square error
Maximum likelihood
population genetics
Genes
testing
Scanning
loci
genome
sampling
outlier
methodology

Keywords

  • Composite likelihood
  • Natural selection
  • Recurrent selection
  • Selective sweeps

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (miscellaneous)
  • Ecology, Evolution, Behavior and Systematics
  • Biochemistry, Genetics and Molecular Biology (all)
  • Biochemistry
  • Genetics
  • Molecular Biology
  • Genetics (clinical)

Cite this

Inferring selection in partially sequenced regions. / Jensen, Jeffrey; Thornton, Kevin R.; Aquadro, Charles F.

In: Molecular Biology and Evolution, Vol. 25, No. 2, 02.2008, p. 438-446.

@article{dc74ac889998478594bf41edcdfffe00,
title = "Inferring selection in partially sequenced regions",
abstract = "A common approach for identifying loci influenced by positive selection involves scanning large portions of the genome for regions that are inconsistent with the neutral equilibrium model or represent outliers relative to the empirical distribution of some aspect of the data. Once identified, partial sequence is generated spanning this more localized region in order to quantify the site-frequency spectrum and evaluate the data with tests of neutrality and selection. This method is widely used as partial sequencing is less expensive with regard to both time and money. Here, we demonstrate that this approach can lead to biased maximum likelihood estimates of selection parameters and reduced rejection rates, with some parameter combinations resulting in clearly misleading results. Most significantly, for a commonly used sample size in Drosophila population genetics (i.e., n = 12), the estimate of the target of selection has a large mean square error and the strength of selection is severely underestimated when the true selected site has not been sampled. We propose sequencing approaches that are much more likely to accurately localize the target and estimate the strength of selection. Additionally, we examine the performance of a commonly used test of selection under a variety of recurrent and single sweep models.",
keywords = "Composite likelihood, Natural selection, Recurrent selection, Selective sweeps",
author = "Jensen, Jeffrey and Thornton, {Kevin R.} and Aquadro, {Charles F.}",
year = "2008",
month = "2",
doi = "10.1093/molbev/msm273",
language = "English (US)",
volume = "25",
pages = "438--446",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "2",
}

TY - JOUR
T1 - Inferring selection in partially sequenced regions
AU - Jensen, Jeffrey
AU - Thornton, Kevin R.
AU - Aquadro, Charles F.
PY - 2008/2
Y1 - 2008/2
AB - A common approach for identifying loci influenced by positive selection involves scanning large portions of the genome for regions that are inconsistent with the neutral equilibrium model or represent outliers relative to the empirical distribution of some aspect of the data. Once identified, partial sequence is generated spanning this more localized region in order to quantify the site-frequency spectrum and evaluate the data with tests of neutrality and selection. This method is widely used as partial sequencing is less expensive with regard to both time and money. Here, we demonstrate that this approach can lead to biased maximum likelihood estimates of selection parameters and reduced rejection rates, with some parameter combinations resulting in clearly misleading results. Most significantly, for a commonly used sample size in Drosophila population genetics (i.e., n = 12), the estimate of the target of selection has a large mean square error and the strength of selection is severely underestimated when the true selected site has not been sampled. We propose sequencing approaches that are much more likely to accurately localize the target and estimate the strength of selection. Additionally, we examine the performance of a commonly used test of selection under a variety of recurrent and single sweep models.
KW - Composite likelihood
KW - Natural selection
KW - Recurrent selection
KW - Selective sweeps
UR - http://www.scopus.com/inward/record.url?scp=38949155535&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38949155535&partnerID=8YFLogxK
U2 - 10.1093/molbev/msm273
DO - 10.1093/molbev/msm273
M3 - Article
C2 - 18165259
AN - SCOPUS:38949155535
VL - 25
SP - 438
EP - 446
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
SN - 0737-4038
IS - 2
ER -