Abstract

Current state-of-the-art approaches for biological sequence querying and alignment require preprocessing and are not robust to repetitions in the sequence. In addition, these approaches offer little support for efficiently querying subsequences, a process that is essential for tracking localized database matches. We propose a query-based alignment method for biological sequences that first maps sequences to time-domain waveforms and then processes the waveforms for alignment in the time-frequency plane. The mapping uses waveforms, such as Gaussian functions, with unique sequence representations in the time-frequency plane. The proposed alignment method employs a robust querying algorithm that utilizes a time-frequency signal expansion whose basis function is matched to the basic waveform in the mapped sequences. The resulting WAVEQuery approach was demonstrated for both deoxyribonucleic acid (DNA) and protein sequences using the matching pursuit decomposition as the signal basis expansion. We specifically evaluated the alignment localization of WAVEQuery over repetitive database segments, and we demonstrated its operation in real time without preprocessing. We also demonstrated that WAVEQuery significantly outperformed the BLAST sequence alignment method on DNA sequence queries with repetitive segments. A generalized version of the WAVEQuery approach with the metaplectic transform is also described for protein sequence structure prediction.
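The sequence-to-waveform idea in the abstract can be illustrated with a minimal sketch. It is not the WAVEQuery implementation: the frequency assigned to each nucleotide, the symbol length, the Gaussian width, and the function names are all assumptions made for illustration, and a plain matched-filter correlation stands in for the matching pursuit decomposition used in the paper.

```python
import numpy as np

# Hypothetical frequency assignment (cycles per sample); not taken from the paper.
FREQS = {"A": 0.05, "C": 0.15, "G": 0.25, "T": 0.35}
SYMBOL_LEN = 32  # samples per nucleotide (assumed)


def symbol_waveform(base):
    """Return a unit-energy Gaussian-windowed sinusoid for one nucleotide."""
    n = np.arange(SYMBOL_LEN)
    center = (SYMBOL_LEN - 1) / 2
    sigma = SYMBOL_LEN / 6  # assumed width: window spans about +/- 3 sigma
    envelope = np.exp(-0.5 * ((n - center) / sigma) ** 2)
    atom = envelope * np.cos(2 * np.pi * FREQS[base] * (n - center))
    return atom / np.linalg.norm(atom)


def sequence_to_waveform(seq):
    """Concatenate one waveform per nucleotide into a time-domain signal."""
    return np.concatenate([symbol_waveform(b) for b in seq])


def locate_query(database, query):
    """Correlate the query waveform against the database waveform.

    Returns the best-matching start position (in nucleotides) and the
    normalized score at every symbol-aligned offset (1.0 = exact match).
    """
    db_wave = sequence_to_waveform(database)
    q_wave = sequence_to_waveform(query)
    corr = np.correlate(db_wave, q_wave, mode="valid")
    scores = corr[::SYMBOL_LEN] / len(query)  # keep symbol-aligned lags only
    return int(np.argmax(scores)), scores


position, scores = locate_query("ACGTTGCAAGGCTA", "GCAAG")
print(position, np.round(scores, 3))
```

Because each nucleotide waveform has unit energy, an exact match scores 1.0 and any mismatched alignment scores lower, so the peak localizes the query within the database without building an index first.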

Original language: English (US)
Article number: 5776708
Pages (from-to): 4210-4224
Number of pages: 15
Journal: IEEE Transactions on Signal Processing
Volume: 59
Issue number: 9
State: Published - Sep 1 2011

Keywords

  • Chirp signals
  • Gaussian signal
  • matched filter
  • matching pursuit decomposition
  • querying
  • sequence alignment
  • time-frequency analysis

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
