Data-Quality Aware Middleware for Scalable Data Analysis

Project: Research project

Project Details

Description

Today, data is produced in massive quantities. The applications that drive this massive data influx span a large spectrum, from business applications to web and social networks. This application diversity is matched by the need for processing and efficient analysis of data with diverse characteristics, including quality of capture and variable precision in representation. Thus, for the next generation of massive data analysis middleware to have transformative impact, the fundamental principles that govern its design must include data and operator imprecision and the relevance of data to a particular analysis task. Recently, there has been an increasing number of attempts (many of them building on the MapReduce framework) focused on providing data parallelism in support of large data processing applications, including business and web intelligence. While these frameworks have produced impressive results in these contexts, they do not address the key requirements of a larger class of data analysis applications that involve data and operator imprecision.
The primary technical contribution of the proposed work is to research and develop a scalable rank- and quality-aware data processing middleware, called RanKloud, to support massive data processing and decision-making applications, where (a) the qualities of the data elements and of the operations on this data are variable, (b) the data matching process is inherently imprecise, and (c) analysis operations can trade off quality against time by picking alternative processing strategies or by regulating the data/features they choose to operate on. In particular, the project will investigate the description of quality- and rank-aware data analysis workflows of adaptable data processing primitives, and the run-time adaptation of analysis workflows based on data and processing characteristics discovered at run time.
The implementation of RanKloud, based on these novel techniques and algorithms, will extend the Hadoop architecture (an open-source implementation of MapReduce). Efficient, large-scale analysis over data with variable quality will enable a new class of applications impacting web intelligence, business intelligence, and scientific applications. The usefulness of the extended, quality-aware MapReduce model and the efficiency of the RanKloud architecture for large-scale data analysis will be tested through a novel context-aware web information-flow analysis application, which is not properly supported by today's massive data analysis paradigms.
Status: Finished
Effective start/end date: 8/1/09 to 7/31/12

Funding

  • INDUSTRY: Domestic Company: $150,000.00
