Estimation of allele frequencies from high-coverage genome-sequencing projects

Research output: Contribution to journal › Article

78 Citations (Scopus)

Abstract

A new generation of high-throughput sequencing strategies will soon lead to the acquisition of high-coverage genomic profiles of hundreds to thousands of individuals within species, generating unprecedented levels of information on the frequencies of nucleotides segregating at individual sites. However, because these new technologies are error prone and yield uneven coverage of alleles in diploid individuals, they also introduce the need for novel methods for analyzing the raw read data. A maximum-likelihood method for the estimation of allele frequencies is developed, eliminating both the need to arbitrarily discard individuals with low coverage and the requirement for an extrinsic measure of the sequence error rate. The resultant estimates are nearly unbiased with asymptotically minimal sampling variance, thereby defining the limits to our ability to estimate population-genetic parameters and providing a logical basis for the optimal design of population-genomic surveys.
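The abstract describes the estimator only in words. As a rough illustration of the general idea (jointly estimating an allele frequency and a sequencing error rate by maximum likelihood from raw read counts, while keeping low-coverage individuals), the Python sketch below fits a deliberately simplified two-allele model by expectation-maximization. It is not the paper's method: the two-allele error model, the Hardy-Weinberg genotype priors, the EM fitting procedure, the `(depth, reads_showing_A)` data layout, and the helper names `em_allele_freq` and `simulate` are all hypothetical choices made for this example.

```python
import math
import random
from typing import List, Tuple

# Hypothetical data layout: one (depth, reads_showing_A) pair per diploid individual.
Reads = List[Tuple[int, int]]
_FLOOR = 1e-9  # keeps p and eps away from 0/1 so every log below stays finite


def em_allele_freq(data: Reads, p: float = 0.5, eps: float = 0.01,
                   tol: float = 1e-8, max_iter: int = 500) -> Tuple[float, float, float]:
    """Jointly estimate the frequency p of allele A and the per-read error rate eps by EM.

    Simplified two-allele model (an assumption, not the published model):
    Hardy-Weinberg genotype priors; a read from a homozygote shows the other
    allele with probability eps; a read from a heterozygote shows A with
    probability 0.5. Binomial coefficients are omitted because they do not
    depend on (p, eps). `data` must be non-empty.
    Returns (p, eps, log-likelihood evaluated in the last E-step).
    """
    p = min(max(p, _FLOOR), 1.0 - _FLOOR)
    eps = min(max(eps, _FLOOR), 0.5 - _FLOOR)
    prev_ll = ll = -math.inf
    for _ in range(max_iter):
        priors = (p * p, 2.0 * p * (1.0 - p), (1.0 - p) ** 2)  # AA, Aa, aa
        # (P(read shows A), P(read shows the other allele)) for AA, Aa, aa
        read_probs = ((1.0 - eps, eps), (0.5, 0.5), (eps, 1.0 - eps))
        ll = 0.0
        a_alleles = err_reads = hom_reads = 0.0
        for depth, k in data:
            logs = [math.log(pr) + k * math.log(pa) + (depth - k) * math.log(pb)
                    for pr, (pa, pb) in zip(priors, read_probs)]
            top = max(logs)  # log-sum-exp for numerical stability
            log_total = top + math.log(sum(math.exp(x - top) for x in logs))
            ll += log_total
            # E-step: posterior probabilities of genotypes AA, Aa, aa.
            w_AA, w_Aa, w_aa = (math.exp(x - log_total) for x in logs)
            a_alleles += 2.0 * w_AA + w_Aa
            err_reads += w_AA * (depth - k) + w_aa * k
            hom_reads += (w_AA + w_aa) * depth
        # M-step: closed-form maximizers of the expected complete-data log-likelihood.
        p = min(max(a_alleles / (2.0 * len(data)), _FLOOR), 1.0 - _FLOOR)
        if hom_reads > 0:
            eps = min(max(err_reads / hom_reads, _FLOOR), 0.5 - _FLOOR)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return p, eps, ll


def simulate(n_ind: int, p: float, eps: float, max_depth: int,
             rng: random.Random) -> Reads:
    """Simulate read counts under the same toy model, with uneven depth 1..max_depth."""
    data = []
    for _ in range(n_ind):
        n_a = (rng.random() < p) + (rng.random() < p)  # copies of A under HWE
        prob_a = (eps, 0.5, 1.0 - eps)[n_a]            # aa, Aa, AA
        depth = rng.randint(1, max_depth)
        k = sum(rng.random() < prob_a for _ in range(depth))
        data.append((depth, k))
    return data


if __name__ == "__main__":
    rng = random.Random(1)
    data = simulate(1000, p=0.3, eps=0.01, max_depth=20, rng=rng)
    p_hat, eps_hat, ll = em_allele_freq(data)
    print(f"p_hat={p_hat:.3f} eps_hat={eps_hat:.4f} logL={ll:.1f}")
```

In this toy model a heterozygote shows either allele with probability 0.5 whatever the error rate, so only (probable) homozygotes inform the error-rate update, and individuals with a single read still contribute to the likelihood instead of being discarded.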

Original language: English (US)
Pages (from-to): 295-301
Number of pages: 7
Journal: Genetics
Volume: 182
Issue number: 1
DOIs: 10.1534/genetics.109.100479
State: Published - May 1 2009
Externally published: Yes

Fingerprint

Gene Frequency
Genome
Metagenomics
Population Genetics
Diploidy
Nucleotides
Alleles
Technology
Surveys and Questionnaires

ASJC Scopus subject areas

  • Genetics

Cite this

Estimation of allele frequencies from high-coverage genome-sequencing projects. / Lynch, Michael.

In: Genetics, Vol. 182, No. 1, 01.05.2009, p. 295-301.


@article{b4263ecba54c4b3294c2ebd07b104fea,
title = "Estimation of allele frequencies from high-coverage genome-sequencing projects",
abstract = "A new generation of high-throughput sequencing strategies will soon lead to the acquisition of high-coverage genomic profiles of hundreds to thousands of individuals within species, generating unprecedented levels of information on the frequencies of nucleotides segregating at individual sites. However, because these new technologies are error prone and yield uneven coverage of alleles in diploid individuals, they also introduce the need for novel methods for analyzing the raw read data. A maximum-likelihood method for the estimation of allele frequencies is developed, eliminating both the need to arbitrarily discard individuals with low coverage and the requirement for an extrinsic measure of the sequence error rate. The resultant estimates are nearly unbiased with asymptotically minimal sampling variance, thereby defining the limits to our ability to estimate population-genetic parameters and providing a logical basis for the optimal design of population-genomic surveys.",
author = "Michael Lynch",
year = "2009",
month = may,
day = "1",
doi = "10.1534/genetics.109.100479",
language = "English (US)",
volume = "182",
pages = "295--301",
journal = "Genetics",
issn = "0016-6731",
publisher = "Genetics Society of America",
number = "1",

}

TY  - JOUR
T1  - Estimation of allele frequencies from high-coverage genome-sequencing projects
AU  - Lynch, Michael
PY  - 2009/5/1
Y1  - 2009/5/1
N2  - A new generation of high-throughput sequencing strategies will soon lead to the acquisition of high-coverage genomic profiles of hundreds to thousands of individuals within species, generating unprecedented levels of information on the frequencies of nucleotides segregating at individual sites. However, because these new technologies are error prone and yield uneven coverage of alleles in diploid individuals, they also introduce the need for novel methods for analyzing the raw read data. A maximum-likelihood method for the estimation of allele frequencies is developed, eliminating both the need to arbitrarily discard individuals with low coverage and the requirement for an extrinsic measure of the sequence error rate. The resultant estimates are nearly unbiased with asymptotically minimal sampling variance, thereby defining the limits to our ability to estimate population-genetic parameters and providing a logical basis for the optimal design of population-genomic surveys.
AB  - A new generation of high-throughput sequencing strategies will soon lead to the acquisition of high-coverage genomic profiles of hundreds to thousands of individuals within species, generating unprecedented levels of information on the frequencies of nucleotides segregating at individual sites. However, because these new technologies are error prone and yield uneven coverage of alleles in diploid individuals, they also introduce the need for novel methods for analyzing the raw read data. A maximum-likelihood method for the estimation of allele frequencies is developed, eliminating both the need to arbitrarily discard individuals with low coverage and the requirement for an extrinsic measure of the sequence error rate. The resultant estimates are nearly unbiased with asymptotically minimal sampling variance, thereby defining the limits to our ability to estimate population-genetic parameters and providing a logical basis for the optimal design of population-genomic surveys.
UR  - http://www.scopus.com/inward/record.url?scp=67849128665&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=67849128665&partnerID=8YFLogxK
U2  - 10.1534/genetics.109.100479
DO  - 10.1534/genetics.109.100479
M3  - Article
C2  - 19293142
AN  - SCOPUS:67849128665
VL  - 182
SP  - 295
EP  - 301
JO  - Genetics
JF  - Genetics
SN  - 0016-6731
IS  - 1
ER  -