Database similarity join for metric spaces

Yasin Silva, Spencer S. Pearson, Jason A. Cheney

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

11 Scopus citations

Abstract

Similarity Joins are recognized as among the most useful data processing and analysis operations. They retrieve all data pairs whose distances are smaller than a predefined threshold ε. While several standalone implementations have been proposed, very little work has addressed the implementation of Similarity Join as a physical database operator. In this paper, we focus on the study, design, and implementation of a Similarity Join database operator (DBSimJoin) for any dataset that lies in a metric space. We describe the changes needed in each query engine module to implement DBSimJoin and provide details of our implementation in PostgreSQL. An extensive performance evaluation shows that DBSimJoin significantly outperforms alternative approaches.
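
As a minimal sketch of the operation the abstract defines (all pairs whose distance is smaller than ε), the following naive nested-loop join may help. It is not DBSimJoin and not the paper's algorithm: the function name `similarity_join`, the sample points, and the choice of Euclidean distance as the metric are assumptions made purely for illustration.

```python
from itertools import product
from math import dist  # Euclidean distance, used here as one example of a metric


def similarity_join(r, s, eps, distance=dist):
    """Return every pair (a, b), a in r and b in s, with distance(a, b) < eps.

    Naive O(|r| * |s|) baseline; any function satisfying the metric axioms
    can be passed as `distance`.
    """
    return [(a, b) for a, b in product(r, s) if distance(a, b) < eps]


# Hypothetical sample data: two small sets of 2-D points.
r = [(0.0, 0.0), (5.0, 5.0)]
s = [(0.5, 0.0), (5.0, 5.4), (9.0, 9.0)]

pairs = similarity_join(r, s, eps=1.0)
print(pairs)
```

With these sample points only two pairs fall within ε = 1.0; the point (9.0, 9.0) has no partner. The quadratic cost of this baseline is what dedicated similarity join algorithms aim to avoid.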

Original language: English (US)
Title of host publication: Similarity Search and Applications - 6th International Conference, SISAP 2013, Proceedings
Pages: 266-279
Number of pages: 14
State: Published - Oct 30 2013
Event: 6th International Conference on Similarity Search and Applications, SISAP 2013 - A Coruna, Spain
Duration: Oct 2 2013 – Oct 4 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8199 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th International Conference on Similarity Search and Applications, SISAP 2013
Country/Territory: Spain
City: A Coruna
Period: 10/2/13 – 10/4/13

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
