Performance evaluation of six popular short-read simulators

Mark Milhaven, Susanne P. Pfeifer

Research output: Contribution to journal, Article, peer-review

Abstract

High-throughput sequencing data enables the comprehensive study of genomes and the variation therein. Essential for the interpretation of this genomic data is a thorough understanding of the computational methods used for processing and analysis. Whereas “gold-standard” empirical datasets exist for this purpose in humans, synthetic (i.e., simulated) sequencing data can offer important insights into the capabilities and limitations of computational pipelines for any arbitrary species and/or study design—yet, the ability of read simulator software to emulate genomic characteristics of empirical datasets remains poorly understood. We here compare the performance of six popular short-read simulators—ART, DWGSIM, InSilicoSeq, Mason, NEAT, and wgsim—and discuss important considerations for selecting suitable models for benchmarking.

Original language: English (US)
Journal: Heredity
State: Accepted/In press - 2022
Externally published: Yes

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)
