Swapping Metagenomics Preprocessing Pipeline Components Offers Speed and Sensitivity Increases

George Armstrong, Cameron Martino, Justin Morris, Behnam Khaleghi, Jaeyoung Kang, Jeff DeReus, Qiyun Zhu, Daniel Roush, Daniel McDonald, Antonio Gonzalez, Justin P. Shaffer, Carolina Carpenter, Mehrbod Estaki, Stephen Wandro, Sean Eilert, Ameen Akel, Justin Eno, Ken Curewitz, Austin D. Swafford, Niema Moshiri, Tajana Rosing, Rob Knight

Research output: Contribution to journal › Article › peer-review

Increasing data volumes on high-throughput sequencing instruments such as the NovaSeq 6000 create long computational bottlenecks for common metagenomics preprocessing tasks, such as adaptor and primer trimming and host removal. Here, we test whether faster, recently developed computational tools (Fastp and Minimap2) can replace widely used choices (Atropos and Bowtie2) for these tasks, obtaining dramatic accelerations with additional sensitivity and minimal loss of specificity. Furthermore, the taxonomic tables resulting from downstream processing are biologically comparable. However, we demonstrate that for taxonomic assignment, Bowtie2's specificity is still required. We suggest that periodic reevaluation of pipeline components, together with improvements to standardized APIs for chaining them together, will greatly enhance the efficiency of common bioinformatics tasks while also facilitating the incorporation of further optimized steps running on GPUs, FPGAs, or other architectures. We also note that a detailed exploration of available algorithms and pipeline components is an important step to take before optimizing less efficient algorithms on advanced or nonstandard hardware.
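The trade-off described above, gaining sensitivity in host removal at a small cost in specificity, comes down to two ratios over a confusion matrix. The sketch below is a minimal illustration, assuming reads with known host/non-host labels (for example, from a simulated sample) and a set of read IDs that an aligner such as Bowtie2 or Minimap2 flagged as host. The function and field names are hypothetical and are not taken from the paper's code; the authors' exact evaluation procedure may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class FilterMetrics:
    sensitivity: float  # share of true host reads that the filter flagged
    specificity: float  # share of true non-host reads that the filter kept


def host_filter_metrics(
    all_reads: Iterable[str],
    true_host: Set[str],
    flagged_host: Set[str],
) -> FilterMetrics:
    """Score a host-removal step against known read labels (hypothetical helper).

    `true_host` holds ground-truth host read IDs; `flagged_host` holds the IDs
    the aligner mapped to the host reference and would therefore remove.
    """
    tp = fp = tn = fn = 0
    for read_id in all_reads:
        is_host = read_id in true_host
        flagged = read_id in flagged_host
        if is_host and flagged:
            tp += 1  # host read correctly removed
        elif is_host:
            fn += 1  # host read missed, left in the microbial data
        elif flagged:
            fp += 1  # microbial read wrongly discarded
        else:
            tn += 1  # microbial read correctly kept
    # Return NaN when a class is absent rather than dividing by zero.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return FilterMetrics(sensitivity, specificity)


if __name__ == "__main__":
    reads = [f"r{i}" for i in range(1, 11)]
    metrics = host_filter_metrics(reads, {"r1", "r2", "r3", "r4"}, {"r1", "r2", "r3", "r5"})
    print(metrics.sensitivity, round(metrics.specificity, 3))  # 0.75 0.833
```

In this toy example, a more aggressive aligner would raise sensitivity by catching `r4`, but every extra microbial read it flags (like `r5`) lowers specificity, which is the balance the abstract weighs when comparing Minimap2 with Bowtie2.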

Original language: English (US)
Issue number: 2
State: Published - Apr 2022

Keywords

  • alignment
  • host filtering
  • metagenomics

ASJC Scopus subject areas

  • Microbiology
  • Physiology
  • Biochemistry
  • Ecology, Evolution, Behavior and Systematics
  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computer Science Applications
