BayesWipe: A multimodal system for data cleaning and consistent query answering on structured bigdata

Sushovan De, Yuheng Hu, Yi Chen, Subbarao Kambhampati

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

Recent efforts in data cleaning of structured data have focused exclusively on problems like data deduplication, record matching, and data standardization; none of these focus on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum cost repair of tuples that violate static constraints like CFDs (which have to be provided by domain experts, or learned from a clean sample of the database). In this paper, we provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned from the noisy database directly. We thus avoid the necessity for a domain expert or clean master data. We also show how to efficiently perform consistent query answering using this model over a dirty database, in case write permissions to the database are unavailable. We evaluate our methods over both synthetic and real data.

Original language: English (US)
Title of host publication: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
Editors: Wo Chang, Jun Huan, Nick Cercone, Saumyadipta Pyne, Vasant Honavar, Jimmy Lin, Xiaohua Tony Hu, Charu Aggarwal, Bamshad Mobasher, Jian Pei, Raghunath Nambiar
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 15-24
Number of pages: 10
ISBN (Electronic): 9781479956654
DOIs: https://doi.org/10.1109/BigData.2014.7004207
State: Published - Jan 7 2015
Event: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014 - Washington, United States
Duration: Oct 27 2014 - Oct 30 2014

Publication series

Name: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014

Other

Other: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014
Country: United States
City: Washington
Period: 10/27/14 - 10/30/14

Keywords

  • Data cleaning
  • Databases
  • Query rewriting
  • Uncertainty
  • Web databases

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems

Cite this

De, S., Hu, Y., Chen, Y., & Kambhampati, S. (2015). BayesWipe: A multimodal system for data cleaning and consistent query answering on structured bigdata. In W. Chang, J. Huan, N. Cercone, S. Pyne, V. Honavar, J. Lin, X. T. Hu, C. Aggarwal, B. Mobasher, J. Pei, & R. Nambiar (Eds.), Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014 (pp. 15-24). [7004207] (Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2014.7004207