Data-utility sensitive query processing on server clusters to support scalable data analysis services

Renwei Yu, Mithila Nagendra, Parth Nagarkar, Kasim Candan, Jong Wook Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

The observation that a significant class of data processing and analysis applications can be expressed in terms of a small set of primitives that are easy to parallelize has resulted in the increasing popularity of batch-oriented, highly parallelizable cluster frameworks to support data analysis services. These frameworks, however, are known to have shortcomings for certain application domains. For example, in many data analysis applications, the utility of a given data element to the particular analysis task depends on the way the data is collected (e.g., its precision) or interpreted. However, since existing batch-oriented data processing frameworks do not consider variations in data utility, they are not able to focus on the best results. Even if the user is interested in obtaining only a relatively small subset of the best result instances, these systems often need to enumerate entire result sets, including low-utility results. RanKloud is an efficient and scalable utility-aware parallel processing system for ranked query processing over large data sets. In this paper, we focus on the uSplit data partitioning and work-allocation strategies of RanKloud for processing top-k join queries to support data analysis services. In particular, we describe how uSplit adaptively samples data from "upstream" operators to help allocate resources in a work-balanced and wasted-work-avoiding manner for top-k join processing. Experimental results show that the proposed sampling, data partitioning, and join processing strategies enable uSplit to return top-k results with high confidence and low overhead (up to ∼9× faster than alternative schemes on 10 servers).

Original language: English (US)
Title of host publication: New Frontiers in Information and Software as Services
Subtitle of host publication: Service and Application Design Challenges in the Cloud
Publisher: Springer Verlag
Pages: 155-184
Number of pages: 30
ISBN (Print): 9783642192937
DOIs: https://doi.org/10.1007/978-3-642-19294-4_7
State: Published - Jan 1 2011

Publication series

Name: Lecture Notes in Business Information Processing
Volume: 74 LNBIP
ISSN (Print): 1865-1348

ASJC Scopus subject areas

  • Management Information Systems
  • Control and Systems Engineering
  • Business and International Management
  • Information Systems
  • Modeling and Simulation
  • Information Systems and Management

  • Cite this

    Yu, R., Nagendra, M., Nagarkar, P., Candan, K., & Kim, J. W. (2011). Data-utility sensitive query processing on server clusters to support scalable data analysis services. In New Frontiers in Information and Software as Services: Service and Application Design Challenges in the Cloud (pp. 155-184). (Lecture Notes in Business Information Processing; Vol. 74 LNBIP). Springer Verlag. https://doi.org/10.1007/978-3-642-19294-4_7