Database similarity join for metric spaces

Yasin Silva, Spencer S. Pearson, Jason A. Cheney

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

Similarity Joins are recognized as among the most useful data processing and analysis operations. They retrieve all pairs of data objects whose distance is smaller than a predefined threshold ε. While several standalone implementations have been proposed, very little work has addressed the implementation of Similarity Join as a physical database operator. In this paper, we focus on the study, design, and implementation of a Similarity Join database operator (DBSimJoin) for any dataset that lies in a metric space. We describe the changes required in each query engine module to implement DBSimJoin and provide details of our implementation in PostgreSQL. An extensive performance evaluation shows that DBSimJoin significantly outperforms alternative approaches.
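The result a similarity join must produce can be pinned down with a naive nested-loop sketch. It is generic over the distance function, so it works for any metric space. This is not the DBSimJoin algorithm or its PostgreSQL operator. The helper names `similarity_join`, `euclidean` and `levenshtein`, the sample data, and the two example metrics are illustrative assumptions. Following the abstract, the threshold test is strict (distance smaller than ε).

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def similarity_join(
    left: Sequence[T],
    right: Sequence[T],
    dist: Callable[[T, T], float],
    eps: float,
) -> List[Tuple[T, T]]:
    """Hypothetical helper: return every pair (a, b), with a from `left`
    and b from `right`, such that dist(a, b) < eps.

    Naive nested-loop baseline that evaluates all |left| * |right| pairs.
    It only needs a distance function, so any metric can be plugged in.
    """
    result = []
    for a in left:
        for b in right:
            if dist(a, b) < eps:  # strict threshold: "smaller than eps"
                result.append((a, b))
    return result


def euclidean(p: Tuple[float, ...], q: Tuple[float, ...]) -> float:
    # math.dist (Python 3.8+) returns the Euclidean distance between two points.
    return math.dist(p, q)


def levenshtein(s: str, t: str) -> int:
    # Dynamic-programming edit distance, which is a metric on strings.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    r = [(0.0, 0.0), (5.0, 5.0)]
    s = [(0.5, 0.0), (5.0, 6.0), (10.0, 10.0)]
    print(similarity_join(r, s, euclidean, eps=1.5))
    # [((0.0, 0.0), (0.5, 0.0)), ((5.0, 5.0), (5.0, 6.0))]

    words = similarity_join(["kitten", "sitting"],
                            ["sitten", "mitten", "apple"],
                            levenshtein, eps=2)
    print(words)  # [('kitten', 'sitten'), ('kitten', 'mitten')]
```

The baseline computes the distance for every pair, so its cost grows with the product of the two input sizes. It is meant only to define the expected output, not as an efficient evaluation strategy.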

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 266-279
Number of pages: 14
Volume: 8199 LNCS
DOI: https://doi.org/10.1007/978-3-642-41062-8_27
State: Published - 2013
Event: 6th International Conference on Similarity Search and Applications, SISAP 2013 - A Coruna, Spain
Duration: Oct 2 2013 to Oct 4 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8199 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th International Conference on Similarity Search and Applications, SISAP 2013
Country: Spain
City: A Coruna
Period: 10/2/13 to 10/4/13

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Silva, Y., Pearson, S. S., & Cheney, J. A. (2013). Database similarity join for metric spaces. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8199 LNCS, pp. 266-279). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8199 LNCS). https://doi.org/10.1007/978-3-642-41062-8_27

@inproceedings{0fc4a9af3850494ea8de782c2e48bcd3,
title = "Database similarity join for metric spaces",
author = "Yasin Silva and Pearson, {Spencer S.} and Cheney, {Jason A.}",
year = "2013",
doi = "10.1007/978-3-642-41062-8_27",
language = "English (US)",
isbn = "9783642410611",
volume = "8199 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "266--279",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Database similarity join for metric spaces

AU - Silva, Yasin

AU - Pearson, Spencer S.

AU - Cheney, Jason A.

PY - 2013

Y1 - 2013

AB - Similarity Joins are recognized among the most useful data processing and analysis operations. They retrieve all data pairs whose distances are smaller than a predefined threshold ε. While several standalone implementations have been proposed, very little work has addressed the implementation of Similarity Join as a physical database operator. In this paper, we focus on the study, design and implementation of a Similarity Join database operator for any dataset that lies in a metric space (DBSimJoin). We describe the changes in each query engine module to implement DBSimJoin and provide details of our implementation in PostgreSQL. The extensive performance evaluation shows that DBSimJoin significantly outperforms alternative approaches.

UR - http://www.scopus.com/inward/record.url?scp=84886384961&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84886384961&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-41062-8_27

DO - 10.1007/978-3-642-41062-8_27

M3 - Conference contribution

AN - SCOPUS:84886384961

SN - 9783642410611

VL - 8199 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 266

EP - 279

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -