CLAST: Clustering Biological Sequences

Vicente Molieri, Lina Karam, Zoé Lacroix

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Clustering sequences is important in a variety of applications, including development of nonredundant databases, function prediction, and identifying patterns of gene expression. Currently, clustering methods rely on a prealignment as supplementary information to guide the construction of clusters. This chapter introduces a novel algorithm to cluster nucleotide and peptide sequences. The algorithm is a no-reference approach that utilizes only the sequences as input. We also introduce a novel metric that is used to describe the relationship between biological sequences, and serves as the distance measurement for clustering. Results are presented for real biological sequences, comparing the proposed algorithm to other similar tools available.

Original language: English (US)
Title of host publication: Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology: Algorithms and Software Tools
Publisher: Elsevier Inc.
Pages: 203-220
Number of pages: 18
ISBN (Print): 9780128026465, 9780128025086
DOIs: https://doi.org/10.1016/B978-0-12-802508-6.00010-7
State: Published - Aug 7 2015

Keywords

  • Biological sequences
  • Clustering
  • Databases
  • Graph cuts
  • Hashing
  • Nucleotide
  • Peptide

ASJC Scopus subject areas

  • Computer Science (all)


  • Cite this

    Molieri, V., Karam, L., & Lacroix, Z. (2015). CLAST: Clustering Biological Sequences. In Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology: Algorithms and Software Tools (pp. 203-220). Elsevier Inc. https://doi.org/10.1016/B978-0-12-802508-6.00010-7