A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms

Sing Hoi Sze; Jonathan J. Parrott; Aaron M. Tarone

doi:10.1186/s12864-017-4270-9

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

BACKGROUND: While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing number of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble large amounts of RNA-Seq data are available, many other assembly algorithms are very memory-intensive and can only be used to construct small assemblies.

RESULTS: We develop a divide-and-conquer strategy that allows these memory-intensive algorithms to be used by subdividing a large RNA-Seq data set into small libraries. Each library is assembled independently by an existing algorithm, and a merging algorithm then combines these assemblies by picking a subset of high-quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves accuracy comparable to or higher than that of memory-efficient algorithms that can process large amounts of RNA-Seq data, and comparable to or lower than that of memory-intensive algorithms that can only be used to construct small assemblies.

CONCLUSIONS: Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be used to construct large assemblies.
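The abstract describes the strategy only at a high level: split the data into small libraries, assemble each library independently with an existing assembler, then merge by keeping a subset of high-quality transcripts. The Python sketch below illustrates that split/assemble/merge pipeline under explicit assumptions. Everything in it is hypothetical and is not the authors' method: the fixed-size chunking of reads, the read-support quality score, the containment-based redundancy filter used for merging, and the toy greedy overlap assembler that stands in for a real assembler.

```python
from typing import Callable, Dict, Iterable, List, Sequence

# Hypothetical interface: an assembler maps one library of reads to transcripts.
Assembler = Callable[[Sequence[str]], List[str]]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over A/C/G/T."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_into_libraries(reads: Sequence[str], library_size: int) -> List[List[str]]:
    """Divide step (assumed): consecutive chunks of at most library_size reads."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return [list(reads[i:i + library_size]) for i in range(0, len(reads), library_size)]


def toy_greedy_assemble(reads: Sequence[str], min_overlap: int = 3) -> List[str]:
    """Toy stand-in for a real assembler: repeatedly join the two contigs with the
    longest suffix-prefix overlap (>= min_overlap) until no such pair remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                # Try the longest proper overlap first; stop at the first match.
                for olen in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
                    if a.endswith(b[:olen]):
                        if olen > best_len:
                            best_len, best_i, best_j = olen, i, j
                        break
        if best_i < 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


def support(transcript: str, reads: Iterable[str]) -> int:
    """Hypothetical quality score: reads found in the transcript on either strand."""
    rc = reverse_complement(transcript)
    return sum(1 for r in reads if r in transcript or r in rc)


def merge_assemblies(assemblies: List[List[str]], reads: Sequence[str],
                     min_support: int = 1) -> List[str]:
    """Merge step (assumed heuristic): rank candidates by (support, length) and keep
    one only if it is supported and not contained, on either strand, in a
    transcript that was already kept."""
    candidates = {t for assembly in assemblies for t in assembly}
    scores: Dict[str, int] = {t: support(t, reads) for t in candidates}
    ranked = sorted(candidates, key=lambda t: (scores[t], len(t), t), reverse=True)
    kept: List[str] = []
    for t in ranked:
        if scores[t] < min_support:
            continue  # poorly supported candidate
        rc = reverse_complement(t)
        if any(t in k or rc in k for k in kept):
            continue  # redundant with a higher-ranked transcript
        kept.append(t)
    return kept


def divide_and_conquer(reads: Sequence[str], library_size: int,
                       assemble: Assembler, min_support: int = 1) -> List[str]:
    """Split the reads, assemble each library independently, then merge."""
    libraries = split_into_libraries(reads, library_size)
    assemblies = [assemble(lib) for lib in libraries]
    return merge_assemblies(assemblies, reads, min_support)


if __name__ == "__main__":
    r1, r2, r3 = "ATGGCGTA", "CGTACGTT", "ACGTTAGC"  # overlapping reads of one transcript
    reads = [r1, r2, r3, r1, r2, r3, r1, r2]
    print(divide_and_conquer(reads, library_size=3, assemble=toy_greedy_assemble))
```

Running the example prints ['ATGGCGTACGTTAGC']: two of the three libraries assemble the full toy transcript, and the third yields a fragment that the merge step discards because it is contained in a higher-ranked transcript. Ranking by support before length works here because a transcript that contains another is always supported by at least the same reads. In this sketch only the assembler calls are restricted to a single library; the support score still scans every read, which a large-scale implementation would likely need to avoid.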

Original language	English (US)
Article number	895
Journal	BMC Genomics
Volume	18
DOIs	https://doi.org/10.1186/s12864-017-4270-9
State	Published - Dec 6 2017
Externally published	Yes

Keywords

Divide-and-conquer
RNA-Seq
de novo transcriptome assembly

ASJC Scopus subject areas

Biotechnology
Genetics

Cite this

@article{cca30c6384fc4296b203051827e406ee,

title = "A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms",

abstract = "BACKGROUND: While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing amount of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. RESULTS: We develop a divide-and-conquer strategy that allows these algorithms to be utilized, by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves comparable or increased accuracy as memory-efficient algorithms that can be used to process a large amount of RNA-Seq data, and comparable or decreased accuracy as memory-intensive algorithms that can only be used to construct small assemblies. CONCLUSIONS: Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.",

keywords = "Divide-and-conquer, RNA-Seq, de novo transcriptome assembly",

author = "Sze, {Sing Hoi} and Parrott, {Jonathan J.} and Tarone, {Aaron M.}",

year = "2017",

month = dec,

day = "6",

doi = "10.1186/s12864-017-4270-9",

language = "English (US)",

volume = "18",

pages = "895",

journal = "BMC Genomics",

issn = "1471-2164",

publisher = "BioMed Central",

}

TY - JOUR

T1 - A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms

AU - Sze, Sing Hoi

AU - Parrott, Jonathan J.

AU - Tarone, Aaron M.

PY - 2017/12/6

Y1 - 2017/12/6

AB - BACKGROUND: While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing amount of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. RESULTS: We develop a divide-and-conquer strategy that allows these algorithms to be utilized, by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves comparable or increased accuracy as memory-efficient algorithms that can be used to process a large amount of RNA-Seq data, and comparable or decreased accuracy as memory-intensive algorithms that can only be used to construct small assemblies. CONCLUSIONS: Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.

KW - Divide-and-conquer

KW - RNA-Seq

KW - de novo transcriptome assembly

UR - http://www.scopus.com/inward/record.url?scp=85049905183&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049905183&partnerID=8YFLogxK

U2 - 10.1186/s12864-017-4270-9

DO - 10.1186/s12864-017-4270-9

M3 - Article

C2 - 29244008

AN - SCOPUS:85049905183

SN - 1471-2164

VL - 18

SP - 895

JO - BMC Genomics

JF - BMC Genomics

M1 - 895

ER -
