Homology search with fragmented nucleic acid sequence patterns

Axel Mosig, Julian Chen, Peter F. Stadler

Research output: Chapter in Book/Report/Conference proceedingConference contribution

17 Citations (Scopus)

Abstract

The comprehensive annotation of non-coding RNAs in newly sequenced genomes is still a largely unsolved problem because many functional RNAs exhibit not only poorly conserved sequences but also large variability in structure. In many cases, such as Y RNAs, vault RNAs, or telomerase RNAs, sequences differ by large insertions or deletions and have only a few small sequence patterns in common. Here we present fragrep2, a purely sequence-based approach to detect such patterns in complete genomes. A fragrep2 pattern consists of an ordered list of position-specific weight matrices (PWMs) describing short, approximately conserved sequence elements, that are separated by intervals of non-conserved regions of bounded length. The program uses a fractional programming approach to align the PWMs to genomic DNA in order to allow for a bounded number of insertions and deletions in the patterns. These patterns are then combined to significant combinations of PWMs. At this step, a subset of PWMs may be deleted, i.e., have no match in the current region of the genome. The program furthermore estimates p- and E-values for the matches. We apply fragrep2 to homology searches for RNase MRP, unveiling two previously unidentified matches as well as reproducing the results of two previous surveys. Furthermore, we complement the picture of vertebrate vault RNAs, a class of ncRNAs that has not received much attention so far.

Original languageEnglish (US)
Title of host publicationLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages335-345
Number of pages11
Volume4645 LNBI
StatePublished - 2007
Event7th International Workshop on Algorithms in Bioinformatics, WABI 2007 - PhiIadelphia, PA, United States
Duration: Sep 8 2007Sep 9 2007

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume4645 LNBI
ISSN (Print)03029743
ISSN (Electronic)16113349

Other

Other7th International Workshop on Algorithms in Bioinformatics, WABI 2007
CountryUnited States
CityPhiIadelphia, PA
Period9/8/079/9/07

Fingerprint

Position-Specific Scoring Matrices
Nucleic acid sequences
RNA
Density (specific gravity)
Nucleic Acids
Homology
Conserved Sequence
Genes
Genome
Deletion
Insertion
Untranslated RNA
Ribonucleases
Fractional Programming
Vertebrates
Genomics
Annotation
Complement
DNA
Interval

ASJC Scopus subject areas

  • Computer Science(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Theoretical Computer Science

Cite this

Mosig, A., Chen, J., & Stadler, P. F. (2007). Homology search with fragmented nucleic acid sequence patterns. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4645 LNBI, pp. 335-345). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4645 LNBI).

Homology search with fragmented nucleic acid sequence patterns. / Mosig, Axel; Chen, Julian; Stadler, Peter F.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 4645 LNBI 2007. p. 335-345 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4645 LNBI).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Mosig, A, Chen, J & Stadler, PF 2007, Homology search with fragmented nucleic acid sequence patterns. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 4645 LNBI, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4645 LNBI, pp. 335-345, 7th International Workshop on Algorithms in Bioinformatics, WABI 2007, PhiIadelphia, PA, United States, 9/8/07.
Mosig A, Chen J, Stadler PF. Homology search with fragmented nucleic acid sequence patterns. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 4645 LNBI. 2007. p. 335-345. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
Mosig, Axel ; Chen, Julian ; Stadler, Peter F. / Homology search with fragmented nucleic acid sequence patterns. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 4645 LNBI 2007. pp. 335-345 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{48eec3751aca4497ae28cdeb8a35d1d5,
title = "Homology search with fragmented nucleic acid sequence patterns",
abstract = "The comprehensive annotation of non-coding RNAs in newly sequenced genomes is still a largely unsolved problem because many functional RNAs exhibit not only poorly conserved sequences but also large variability in structure. In many cases, such as Y RNAs, vault RNAs, or telomerase RNAs, sequences differ by large insertions or deletions and have only a few small sequence patterns in common. Here we present fragrep2, a purely sequence-based approach to detect such patterns in complete genomes. A fragrep2 pattern consists of an ordered list of position-specific weight matrices (PWMs) describing short, approximately conserved sequence elements, that are separated by intervals of non-conserved regions of bounded length. The program uses a fractional programming approach to align the PWMs to genomic DNA in order to allow for a bounded number of insertions and deletions in the patterns. These patterns are then combined to significant combinations of PWMs. At this step, a subset of PWMs may be deleted, i.e., have no match in the current region of the genome. The program furthermore estimates p- and E-values for the matches. We apply fragrep2 to homology searches for RNase MRP, unveiling two previously unidentified matches as well as reproducing the results of two previous surveys. Furthermore, we complement the picture of vertebrate vault RNAs, a class of ncRNAs that has not received much attention so far.",
author = "Axel Mosig and Julian Chen and Stadler, {Peter F.}",
year = "2007",
language = "English (US)",
isbn = "9783540741251",
volume = "4645 LNBI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "335--345",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Homology search with fragmented nucleic acid sequence patterns

AU - Mosig, Axel

AU - Chen, Julian

AU - Stadler, Peter F.

PY - 2007

Y1 - 2007

N2 - The comprehensive annotation of non-coding RNAs in newly sequenced genomes is still a largely unsolved problem because many functional RNAs exhibit not only poorly conserved sequences but also large variability in structure. In many cases, such as Y RNAs, vault RNAs, or telomerase RNAs, sequences differ by large insertions or deletions and have only a few small sequence patterns in common. Here we present fragrep2, a purely sequence-based approach to detect such patterns in complete genomes. A fragrep2 pattern consists of an ordered list of position-specific weight matrices (PWMs) describing short, approximately conserved sequence elements, that are separated by intervals of non-conserved regions of bounded length. The program uses a fractional programming approach to align the PWMs to genomic DNA in order to allow for a bounded number of insertions and deletions in the patterns. These patterns are then combined to significant combinations of PWMs. At this step, a subset of PWMs may be deleted, i.e., have no match in the current region of the genome. The program furthermore estimates p- and E-values for the matches. We apply fragrep2 to homology searches for RNase MRP, unveiling two previously unidentified matches as well as reproducing the results of two previous surveys. Furthermore, we complement the picture of vertebrate vault RNAs, a class of ncRNAs that has not received much attention so far.

AB - The comprehensive annotation of non-coding RNAs in newly sequenced genomes is still a largely unsolved problem because many functional RNAs exhibit not only poorly conserved sequences but also large variability in structure. In many cases, such as Y RNAs, vault RNAs, or telomerase RNAs, sequences differ by large insertions or deletions and have only a few small sequence patterns in common. Here we present fragrep2, a purely sequence-based approach to detect such patterns in complete genomes. A fragrep2 pattern consists of an ordered list of position-specific weight matrices (PWMs) describing short, approximately conserved sequence elements, that are separated by intervals of non-conserved regions of bounded length. The program uses a fractional programming approach to align the PWMs to genomic DNA in order to allow for a bounded number of insertions and deletions in the patterns. These patterns are then combined to significant combinations of PWMs. At this step, a subset of PWMs may be deleted, i.e., have no match in the current region of the genome. The program furthermore estimates p- and E-values for the matches. We apply fragrep2 to homology searches for RNase MRP, unveiling two previously unidentified matches as well as reproducing the results of two previous surveys. Furthermore, we complement the picture of vertebrate vault RNAs, a class of ncRNAs that has not received much attention so far.

UR - http://www.scopus.com/inward/record.url?scp=37249001791&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=37249001791&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:37249001791

SN - 9783540741251

VL - 4645 LNBI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 335

EP - 345

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -