TY - JOUR
T1 - Waveform mapping and time-frequency processing of DNA and protein sequences
AU - Ravichandran, Lakshminarayan
AU - Papandreou-Suppappola, Antonia
AU - Spanias, Andreas
AU - Lacroix, Zoé
AU - Legendre, Christophe
N1 - Funding Information: Manuscript received August 31, 2010; revised February 11, 2011; accepted May 02, 2011. Date of publication May 27, 2011; date of current version August 10, 2011. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Cédric Richard. This work was supported in part by the NSF under Grants IIS 0431174, IIS 0551444, IIS 0612273, IIS 0738906, IIS 0832551, and CNS 0849980.
PY - 2011/9
Y1 - 2011/9
N2 - Current state-of-the-art approaches for biological sequence querying and alignment require preprocessing and lack robustness to repetitions in the sequence. In addition, these approaches do not provide much support for efficiently querying subsequences, a process that is essential for tracking localized database matches. We propose a query-based alignment method for biological sequences that first maps sequences to time-domain waveforms before processing the waveforms for alignment in the time-frequency plane. The mapping uses waveforms, such as Gaussian functions, with unique sequence representations in the time-frequency plane. The proposed alignment method employs a robust querying algorithm that utilizes a time-frequency signal expansion whose basis function is matched to the basic waveform in the mapped sequences. The resulting WAVEQuery approach was demonstrated for both deoxyribonucleic acid (DNA) and protein sequences using the matching pursuit decomposition as the signal basis expansion. We specifically evaluated the alignment localization of WAVEQuery over repetitive database segments, and we demonstrated its operation in real-time without preprocessing. We also demonstrated that WAVEQuery significantly outperformed the biological sequence alignment method BLAST for queries with repetitive segments for DNA sequences. A generalized version of the WAVEQuery approach with the metaplectic transform is also described for protein sequence structure prediction.
AB - Current state-of-the-art approaches for biological sequence querying and alignment require preprocessing and lack robustness to repetitions in the sequence. In addition, these approaches do not provide much support for efficiently querying subsequences, a process that is essential for tracking localized database matches. We propose a query-based alignment method for biological sequences that first maps sequences to time-domain waveforms before processing the waveforms for alignment in the time-frequency plane. The mapping uses waveforms, such as Gaussian functions, with unique sequence representations in the time-frequency plane. The proposed alignment method employs a robust querying algorithm that utilizes a time-frequency signal expansion whose basis function is matched to the basic waveform in the mapped sequences. The resulting WAVEQuery approach was demonstrated for both deoxyribonucleic acid (DNA) and protein sequences using the matching pursuit decomposition as the signal basis expansion. We specifically evaluated the alignment localization of WAVEQuery over repetitive database segments, and we demonstrated its operation in real-time without preprocessing. We also demonstrated that WAVEQuery significantly outperformed the biological sequence alignment method BLAST for queries with repetitive segments for DNA sequences. A generalized version of the WAVEQuery approach with the metaplectic transform is also described for protein sequence structure prediction.
KW - Chirp signals
KW - Gaussian signal
KW - matched filter
KW - matching pursuit decomposition
KW - querying
KW - sequence alignment
KW - time-frequency analysis
UR - http://www.scopus.com/inward/record.url?scp=80051751263&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80051751263&partnerID=8YFLogxK
U2 - 10.1109/TSP.2011.2157915
DO - 10.1109/TSP.2011.2157915
M3 - Article
AN - SCOPUS:80051751263
SN - 1053-587X
VL - 59
SP - 4210
EP - 4224
JO - IEEE Transactions on Signal Processing
JF - IEEE Transactions on Signal Processing
IS - 9
M1 - 5776708
ER -