RanKloud: A scalable ranked query processing framework on Hadoop

Kasim Candan, Parth Nagarkar, Mithila Nagendra, Renwei Yu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

9 Scopus citations

Abstract

The popularity of batch-oriented cluster architectures such as Hadoop is on the rise. These batch-based systems achieve high degrees of scalability by carefully allocating resources and by exploiting opportunities to parallelize basic processing tasks. However, they are known to fall short in certain application domains, such as large-scale media analysis. In these applications, the utility of a given data element plays a vital role in a particular analysis task, and this utility most often depends on the way the data is collected or interpreted. Existing batch data processing frameworks, on the other hand, do not consider data utility when allocating resources, and hence fail to optimize for ranked (top-k) query processing, in which the user is interested in obtaining a relatively small subset of the best result instances. A naïve implementation of these operations on an existing system would need to enumerate more candidates than needed before it can filter out the k best results. We note that such waste can be avoided by utility-aware task partitioning and resource allocation strategies that prune unpromising objects from consideration. In this demonstration, we introduce RanKloud, an efficient and scalable utility-aware parallel processing system built for the analysis of large media datasets. RanKloud extends Hadoop's MapReduce paradigm to support ranked query operations such as k-nearest neighbor and k-closest pair search, skylines, skyline joins, and top-k join processing.
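The waste described above, and the effect of pruning, can be illustrated with a minimal sketch. This is not RanKloud's implementation: it simulates MapReduce in plain Python, assumes every object carries a single precomputed utility score (higher is better), and uses the standard local top-k (combiner-style) pruning, in which each mapper forwards only its own k best candidates. The names `naive_top_k`, `pruned_top_k` and `SAMPLE_PARTITIONS` are hypothetical.

```python
import heapq
from typing import List, Tuple

# Hypothetical record: (object id, utility score); higher score is better.
Item = Tuple[str, float]

# Hypothetical data: each inner list stands for one mapper's input split.
SAMPLE_PARTITIONS: List[List[Item]] = [
    [("a", 0.9), ("b", 0.1), ("c", 0.4), ("d", 0.3)],
    [("e", 0.8), ("f", 0.2), ("g", 0.7), ("h", 0.05)],
    [("i", 0.6), ("j", 0.15), ("k", 0.5), ("l", 0.35)],
]


def naive_top_k(partitions: List[List[Item]], k: int) -> Tuple[List[Item], int]:
    """Every 'mapper' emits all of its candidates; the 'reducer' keeps the k best.

    Returns the top-k items and the number of candidates shipped to the reducer.
    """
    emitted = [item for part in partitions for item in part]
    top = heapq.nlargest(k, emitted, key=lambda it: it[1])
    return top, len(emitted)


def pruned_top_k(partitions: List[List[Item]], k: int) -> Tuple[List[Item], int]:
    """Each 'mapper' emits only its local top-k before the shuffle.

    An item with at least k better items in its own partition cannot be in
    the global top-k (ties aside), so dropping it early loses no answers.
    """
    emitted: List[Item] = []
    for part in partitions:
        emitted.extend(heapq.nlargest(k, part, key=lambda it: it[1]))
    top = heapq.nlargest(k, emitted, key=lambda it: it[1])
    return top, len(emitted)


if __name__ == "__main__":
    print(naive_top_k(SAMPLE_PARTITIONS, 2))   # ([('a', 0.9), ('e', 0.8)], 12)
    print(pruned_top_k(SAMPLE_PARTITIONS, 2))  # ([('a', 0.9), ('e', 0.8)], 6)
```

Both variants return the same k results, but the pruned one ships at most k candidates per partition to the reducer (6 instead of 12 for k = 2 in the sample). This simple bound applies only to top-k selection over a single score; operations such as top-k joins, k-closest pairs or skylines would need more elaborate pruning conditions.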

Original language: English (US)
Title of host publication: Advances in Database Technology - EDBT 2011
Subtitle of host publication: 14th International Conference on Extending Database Technology, Proceedings
Pages: 574-577
Number of pages: 4
State: Published, Apr 18 2011
Event: 14th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2011, Uppsala, Sweden
Duration: Mar 22 2011 to Mar 24 2011

Publication series

Name: ACM International Conference Proceeding Series

Keywords

  • KNN
  • MapReduce
  • Parallel processing
  • Skyline
  • Top-K

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications