Comparing MapReduce-based k-NN similarity joins on Hadoop for high-dimensional data

Přemysl Čech, Jakub Maroušek, Jakub Lokoč, Yasin Silva, Jeremy Starks

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

9 Scopus citations

Abstract

Similarity joins are a useful operator for data mining, data analysis, and data exploration applications. With the exponential growth of data to be analyzed, distributed approaches like MapReduce are required. So far, state-of-the-art similarity join approaches based on MapReduce have mainly focused on processing vector data with fewer than one hundred dimensions. In this paper, we revisit and investigate the performance of different MapReduce-based approximate k-NN similarity join approaches on Apache Hadoop for large volumes of high-dimensional vector data.

Original language: English (US)
Title of host publication: Advanced Data Mining and Applications - 13th International Conference, ADMA 2017, Proceedings
Editors: Wen-Chih Peng, Wei Emma Zhang, Gao Cong, Aixin Sun, Chengliang Li
Publisher: Springer Verlag
Pages: 63-75
Number of pages: 13
ISBN (Print): 9783319691787
State: Published - 2017
Event: 13th International Conference on Advanced Data Mining and Applications, ADMA 2017 - Singapore, Singapore
Duration: Nov 5 2017 → Nov 6 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10604 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 13th International Conference on Advanced Data Mining and Applications, ADMA 2017
Country/Territory: Singapore
City: Singapore
Period: 11/5/17 → 11/6/17

Keywords

  • Approximate similarity join
  • HTTPS data
  • Hadoop
  • K-NN
  • MapReduce

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
