A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms

Sing Hoi Sze, Jonathan J. Parrott, Aaron M. Tarone

Research output: Contribution to journal › Article › peer-review

Abstract

BACKGROUND: While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing amount of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies.

RESULTS: We develop a divide-and-conquer strategy that allows these algorithms to be utilized, by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high-quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves accuracy comparable to or higher than that of memory-efficient algorithms that can be used to process a large amount of RNA-Seq data, and accuracy comparable to or lower than that of memory-intensive algorithms that can only be used to construct small assemblies.

CONCLUSIONS: Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.
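
The strategy summarized in the abstract (subdivide the reads, assemble each library independently, then merge by keeping a subset of transcripts) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how libraries are formed or how transcript quality is judged, so the fixed-size split, the `score` function (transcript length by default), and the substring-based redundancy filter below are all assumptions made for illustration, and `assemble` stands in for any existing assembler supplied by the caller.

```python
from typing import Callable, Iterable, List, Sequence


def split_into_libraries(reads: Sequence[str], library_size: int) -> List[List[str]]:
    """Divide step: cut the read set into consecutive chunks (assumed fixed size)."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return [list(reads[i:i + library_size]) for i in range(0, len(reads), library_size)]


def merge_assemblies(
    assemblies: Iterable[Iterable[str]],
    score: Callable[[str], float] = len,
) -> List[str]:
    """Merge step (hypothetical criterion): visit transcripts from highest to
    lowest score and skip any transcript contained in one already kept."""
    unique = {t for assembly in assemblies for t in assembly}
    # Ties are broken alphabetically so the result is deterministic.
    candidates = sorted(unique, key=lambda t: (-score(t), t))
    kept: List[str] = []
    for transcript in candidates:
        if not any(transcript in other for other in kept):
            kept.append(transcript)
    return kept


def divide_and_conquer_assemble(
    reads: Sequence[str],
    library_size: int,
    assemble: Callable[[List[str]], List[str]],
    score: Callable[[str], float] = len,
) -> List[str]:
    """Assemble each library independently, then merge the small assemblies."""
    libraries = split_into_libraries(reads, library_size)
    assemblies = [assemble(library) for library in libraries]
    return merge_assemblies(assemblies, score)


if __name__ == "__main__":
    # Toy stand-in for a real assembler: it simply returns its reads as "transcripts".
    def toy_assembler(library: List[str]) -> List[str]:
        return list(library)

    print(divide_and_conquer_assemble(["ACGT", "CG", "TTTT", "GG"], 2, toy_assembler))
```

Because each call to `assemble` only sees one small library, peak memory is bounded by the largest library rather than by the full data set, which is the point of the divide step; the merge step is where any real quality criterion would replace the placeholder `score`.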

Original language: English (US)
Article number: 895
Pages (from-to): 895
Number of pages: 1
Journal: BMC Genomics
Volume: 18
State: Published - Dec 6 2017
Externally published: Yes

Keywords

  • Divide-and-conquer
  • RNA-Seq
  • de novo transcriptome assembly

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
