From next-generation resequencing reads to a high-quality variant data set

Susanne Pfeifer

doi:10.1038/hdy.2016.102

Research output: Contribution to journal › Review article › peer-review

48 Scopus citations

Abstract

Sequencing has revolutionized biology by permitting the analysis of genomic variation at an unprecedented resolution. High-throughput sequencing is fast and inexpensive, making it accessible for a wide range of research topics. However, the produced data contain subtle but complex types of errors, biases and uncertainties that impose several statistical and computational challenges to the reliable detection of variants. To tap the full potential of high-throughput sequencing, a thorough understanding of the data produced as well as the available methodologies is required. Here, I review several commonly used methods for generating and processing next-generation resequencing data, discuss the influence of errors and biases together with their resulting implications for downstream analyses and provide general guidelines and recommendations for producing high-quality single-nucleotide polymorphism data sets from raw reads by highlighting several sophisticated reference-based methods representing the current state of the art.

Original language	English (US)
Pages (from-to)	111-124
Number of pages	14
Journal	Heredity
Volume	118
Issue number	2
DOIs	https://doi.org/10.1038/hdy.2016.102
State	Published - Feb 1 2017

ASJC Scopus subject areas

Genetics
Genetics (clinical)

Cite this

@article{d2553559e9cb4a818927f7acfbeed294,

title = "From next-generation resequencing reads to a high-quality variant data set",

abstract = "Sequencing has revolutionized biology by permitting the analysis of genomic variation at an unprecedented resolution. High-throughput sequencing is fast and inexpensive, making it accessible for a wide range of research topics. However, the produced data contain subtle but complex types of errors, biases and uncertainties that impose several statistical and computational challenges to the reliable detection of variants. To tap the full potential of high-throughput sequencing, a thorough understanding of the data produced as well as the available methodologies is required. Here, I review several commonly used methods for generating and processing next-generation resequencing data, discuss the influence of errors and biases together with their resulting implications for downstream analyses and provide general guidelines and recommendations for producing high-quality single-nucleotide polymorphism data sets from raw reads by highlighting several sophisticated reference-based methods representing the current state of the art.",

author = "Susanne Pfeifer",

note = "Publisher Copyright: {\textcopyright} 2017 Macmillan Publishers Limited, part of Springer Nature.",

year = "2017",

month = feb,

day = "1",

doi = "10.1038/hdy.2016.102",

language = "English (US)",

volume = "118",

pages = "111--124",

journal = "Heredity",

issn = "0018-067X",

publisher = "Nature Publishing Group",

number = "2",

}

TY - JOUR

T1 - From next-generation resequencing reads to a high-quality variant data set

AU - Pfeifer, Susanne

PY - 2017/2/1

Y1 - 2017/2/1

N2 - Sequencing has revolutionized biology by permitting the analysis of genomic variation at an unprecedented resolution. High-throughput sequencing is fast and inexpensive, making it accessible for a wide range of research topics. However, the produced data contain subtle but complex types of errors, biases and uncertainties that impose several statistical and computational challenges to the reliable detection of variants. To tap the full potential of high-throughput sequencing, a thorough understanding of the data produced as well as the available methodologies is required. Here, I review several commonly used methods for generating and processing next-generation resequencing data, discuss the influence of errors and biases together with their resulting implications for downstream analyses and provide general guidelines and recommendations for producing high-quality single-nucleotide polymorphism data sets from raw reads by highlighting several sophisticated reference-based methods representing the current state of the art.

AB - Sequencing has revolutionized biology by permitting the analysis of genomic variation at an unprecedented resolution. High-throughput sequencing is fast and inexpensive, making it accessible for a wide range of research topics. However, the produced data contain subtle but complex types of errors, biases and uncertainties that impose several statistical and computational challenges to the reliable detection of variants. To tap the full potential of high-throughput sequencing, a thorough understanding of the data produced as well as the available methodologies is required. Here, I review several commonly used methods for generating and processing next-generation resequencing data, discuss the influence of errors and biases together with their resulting implications for downstream analyses and provide general guidelines and recommendations for producing high-quality single-nucleotide polymorphism data sets from raw reads by highlighting several sophisticated reference-based methods representing the current state of the art.

UR - http://www.scopus.com/inward/record.url?scp=84991585337&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84991585337&partnerID=8YFLogxK

U2 - 10.1038/hdy.2016.102

DO - 10.1038/hdy.2016.102

M3 - Review article

C2 - 27759079

AN - SCOPUS:84991585337

SN - 0018-067X

VL - 118

SP - 111

EP - 124

JO - Heredity

JF - Heredity

IS - 2

ER -
