CLAST: Clustering Biological Sequences

Vicente Molieri, Lina Karam, Zoé Lacroix

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Abstract

Clustering sequences is important in a variety of applications, including development of nonredundant databases, function prediction, and identifying patterns of gene expression. Currently, clustering methods rely on a prealignment as supplementary information to guide the construction of clusters. This chapter introduces a novel algorithm to cluster nucleotide and peptide sequences. The algorithm is a no-reference approach that utilizes only the sequences as input. We also introduce a novel metric that is used to describe the relationship between biological sequences, and serves as the distance measurement for clustering. Results are presented for real biological sequences, comparing the proposed algorithm to other similar tools available.

Original language: English (US)
Title of host publication: Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology: Algorithms and Software Tools
Publisher: Elsevier Inc.
Pages: 203-220
Number of pages: 18
ISBN (Print): 9780128026465, 9780128025086
DOIs: https://doi.org/10.1016/B978-0-12-802508-6.00010-7
State: Published - Aug 7 2015

Fingerprint

Distance measurement
Nucleotides
Gene expression
Peptides

Keywords

  • Biological sequences
  • Clustering
  • Databases
  • Graph cuts
  • Hashing
  • Nucleotide
  • Peptide

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Molieri, V., Karam, L., & Lacroix, Z. (2015). CLAST: Clustering Biological Sequences. In Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology: Algorithms and Software Tools (pp. 203-220). Elsevier Inc. https://doi.org/10.1016/B978-0-12-802508-6.00010-7

@inbook{366fadd058d94539a0d47376db4c5e59,
title = "CLAST: Clustering Biological Sequences",
abstract = "Clustering sequences is important in a variety of applications, including development of nonredundant databases, function prediction, and identifying patterns of gene expression. Currently, clustering methods rely on a prealignment as supplementary information to guide the construction of clusters. This chapter introduces a novel algorithm to cluster nucleotide and peptide sequences. The algorithm is a no-reference approach that utilizes only the sequences as input. We also introduce a novel metric that is used to describe the relationship between biological sequences, and serves as the distance measurement for clustering. Results are presented for real biological sequences, comparing the proposed algorithm to other similar tools available.",
keywords = "Biological sequences, Clustering, Databases, Graph cuts, Hashing, Nucleotide, Peptide",
author = "Vicente Molieri and Lina Karam and Zo{\'e} Lacroix",
year = "2015",
month = "8",
day = "7",
doi = "10.1016/B978-0-12-802508-6.00010-7",
language = "English (US)",
isbn = "9780128026465",
pages = "203--220",
booktitle = "Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology: Algorithms and Software Tools",
publisher = "Elsevier Inc.",

}

TY  - CHAP
T1  - CLAST
T2  - Clustering Biological Sequences
AU  - Molieri, Vicente
AU  - Karam, Lina
AU  - Lacroix, Zoé
PY  - 2015/8/7
Y1  - 2015/8/7
N2  - Clustering sequences is important in a variety of applications, including development of nonredundant databases, function prediction, and identifying patterns of gene expression. Currently, clustering methods rely on a prealignment as supplementary information to guide the construction of clusters. This chapter introduces a novel algorithm to cluster nucleotide and peptide sequences. The algorithm is a no-reference approach that utilizes only the sequences as input. We also introduce a novel metric that is used to describe the relationship between biological sequences, and serves as the distance measurement for clustering. Results are presented for real biological sequences, comparing the proposed algorithm to other similar tools available.
KW  - Biological sequences
KW  - Clustering
KW  - Databases
KW  - Graph cuts
KW  - Hashing
KW  - Nucleotide
KW  - Peptide
UR  - http://www.scopus.com/inward/record.url?scp=84944559449&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84944559449&partnerID=8YFLogxK
U2  - 10.1016/B978-0-12-802508-6.00010-7
DO  - 10.1016/B978-0-12-802508-6.00010-7
M3  - Chapter
AN  - SCOPUS:84944559449
SN  - 9780128026465
SN  - 9780128025086
SP  - 203
EP  - 220
BT  - Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology: Algorithms and Software Tools
PB  - Elsevier Inc.
ER  -