Detection of structural variants involving repetitive regions in the reference genome

Heewook Lee, Ellen Popodi, Patricia L. Foster, Haixu Tang

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

Next-generation sequencing techniques are now commonly used to characterize structural variations (SVs) in population genomics and elucidate their associations with phenotypes. Many of the computational tools developed for detecting structural variations work by mapping paired-end reads to a reference genome and identifying the discordant read-pairs whose mapped loci in the reference genome deviate from the expected insert size and orientation. However, repetitive regions in the reference genome represent a major challenge in SV detection, because the paired-end reads from these regions may be mapped to multiple loci in the reference genome, resulting in spuriously discordant read-pairs. To address this issue, we have developed an algorithmic approach for read mapping and SV detection based on the framework of A-Bruijn graphs. Instead of mapping reads to a linear sequence of the reference genome, we propose to map reads onto the A-Bruijn graph constructed from the reference genome in which all instances of the same repeat are collapsed into a single edge. As a result, any given read, either from repetitive regions or not, will be mapped to a unique location in the A-Bruijn graph, and each discordant read-pair in the A-Bruijn graph indicates a potentially true SV event. We also developed a simple clustering algorithm to derive valid clusters of these discordant read-pairs, each supporting a different SV event. Finally, we demonstrate the performance of this approach, compared to existing approaches, by identifying transposition events of insertion sequence (IS) elements, a class of simple mobile genetic elements (MGEs), in E. coli by using simulated and real paired-end sequence data acquired from E. coli mutation accumulation lines.
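
The discordant read-pair test and the clustering step mentioned in the abstract can be illustrated with a minimal sketch. It works on plain linear coordinates and is not the authors' A-Bruijn graph method. The names `ReadPair`, `is_discordant`, and `cluster_discordant`, and the values for `mean_insert`, `tolerance`, `max_gap`, and `min_support`, are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """One mapped paired-end read (hypothetical, simplified representation)."""
    left_pos: int       # mapped coordinate of the first mate
    right_pos: int      # mapped coordinate of the second mate
    left_strand: str    # "+" or "-"
    right_strand: str   # "+" or "-"


def is_discordant(pair, mean_insert=500, tolerance=150):
    """Flag a pair whose span or orientation deviates from the expectation.

    Assumes a forward/reverse ("+", "-") library; the insert-size
    parameters are illustrative, not taken from the paper.
    """
    span = pair.right_pos - pair.left_pos
    proper_orientation = pair.left_strand == "+" and pair.right_strand == "-"
    return (not proper_orientation) or abs(span - mean_insert) > tolerance


def cluster_discordant(pairs, max_gap=200, min_support=2):
    """Greedily group discordant pairs whose left coordinates lie close together.

    A cluster is kept only if at least `min_support` pairs support it.
    """
    discordant = sorted(
        (p for p in pairs if is_discordant(p)), key=lambda p: p.left_pos
    )
    clusters, current = [], []
    for p in discordant:
        # Start a new cluster when the gap to the previous pair is too large.
        if current and p.left_pos - current[-1].left_pos > max_gap:
            clusters.append(current)
            current = []
        current.append(p)
    if current:
        clusters.append(current)
    return [c for c in clusters if len(c) >= min_support]


pairs = [
    ReadPair(100, 600, "+", "-"),     # concordant: span 500, proper orientation
    ReadPair(1000, 5000, "+", "-"),   # discordant: span far too large
    ReadPair(1050, 5100, "+", "-"),   # discordant: supports the same event
    ReadPair(9000, 9400, "+", "+"),   # discordant: wrong orientation, unsupported
]
clusters = cluster_discordant(pairs)
```

In this sketch a read from a repeat could still map to several linear loci and produce spurious discordant pairs, which is the problem the abstract says the A-Bruijn graph mapping is designed to avoid.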

Original language: English (US)
Pages (from-to): 219-233
Number of pages: 15
Journal: Journal of Computational Biology
Volume: 21
Issue number: 3
State: Published - Mar 1 2014
Externally published: Yes

Keywords

  • algorithms
  • alignment
  • combinatorics
  • genomic rearrangements
  • genomics
  • graph theory
  • sequence analysis

ASJC Scopus subject areas

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics
