Probabilistic models to reconcile complex data from inaccurate data sources

Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, Paolo Papotti

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

44 Scopus citations

Abstract

Several techniques have been developed to extract and integrate data from web sources. However, web data are inherently imprecise and uncertain. This paper addresses the issue of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model to compute a probability distribution for the extracted values, and the accuracy of the sources. Our model considers the presence of sources that copy their contents from other sources, and manages the misleading consensus produced by copiers. We extend the models previously proposed in the literature by working on several attributes at a time to better leverage all the available evidence. We also report the results of several experiments on both synthetic and real-life data to show the effectiveness of the proposed approach.
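
The general idea of jointly estimating value probabilities and source accuracies can be illustrated with a small sketch. This is not the model proposed in the paper: it is a simplified, single-attribute, accuracy-weighted voting loop that ignores copier detection and multi-attribute evidence. The function name `reconcile` and the parameters `init_accuracy` and `n_false` (an assumed number of wrong alternatives per object) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def reconcile(claims, n_iter=20, init_accuracy=0.8, n_false=10):
    """Illustrative sketch only, not the authors' algorithm.

    claims: dict mapping (source, obj) -> claimed value.
    Returns (probs, accuracy), where probs[obj][value] is an estimated
    probability and accuracy[source] is an estimated source accuracy.
    """
    sources = {s for s, _ in claims}
    accuracy = {s: init_accuracy for s in sources}

    # Group the claims by the object they describe.
    by_object = defaultdict(list)
    for (s, o), v in claims.items():
        by_object[o].append((s, v))

    probs = {}
    for _ in range(n_iter):
        # Step 1: score each candidate value given the current accuracies.
        # Assumption: a wrong source picks uniformly among n_false wrong values.
        for o, votes in by_object.items():
            score = {}
            for v in {claimed for _, claimed in votes}:
                p = 1.0
                for s, claimed in votes:
                    a = accuracy[s]
                    p *= a if claimed == v else (1.0 - a) / n_false
                score[v] = p
            total = sum(score.values())
            probs[o] = {v: p / total for v, p in score.items()}

        # Step 2: a source's accuracy is the mean probability of its claims,
        # clamped away from 0 and 1 to keep the products above non-zero.
        for s in sources:
            ps = [probs[o][v] for (s2, o), v in claims.items() if s2 == s]
            accuracy[s] = min(max(sum(ps) / len(ps), 0.01), 0.99)

    return probs, accuracy


# Toy input: sources A and B agree on every object, source C disagrees.
claims = {
    ("A", "o1"): "x", ("B", "o1"): "x", ("C", "o1"): "y",
    ("A", "o2"): "x", ("B", "o2"): "x", ("C", "o2"): "y",
    ("A", "o3"): "x", ("B", "o3"): "x", ("C", "o3"): "y",
}
probs, accuracy = reconcile(claims)
```

On this toy input the two agreeing sources end up with a higher estimated accuracy than the dissenting one. Note that this is exactly the kind of consensus a copier could produce misleadingly, which is the situation the paper's model is designed to handle and this sketch is not.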

Original language: English (US)
Title of host publication: Advanced Information Systems Engineering - 22nd International Conference, CAiSE 2010, Proceedings
Pages: 83-97
Number of pages: 15
DOI: https://doi.org/10.1007/978-3-642-13094-6_8
State: Published - Dec 1 2010
Event: 22nd International Conference on Advanced Information Systems Engineering, CAiSE 2010 - Hammamet, Tunisia
Duration: Jun 7 2010 to Jun 9 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6051 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 22nd International Conference on Advanced Information Systems Engineering, CAiSE 2010
Country: Tunisia
City: Hammamet
Period: 6/7/10 to 6/9/10

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Blanco, L., Crescenzi, V., Merialdo, P., & Papotti, P. (2010). Probabilistic models to reconcile complex data from inaccurate data sources. In Advanced Information Systems Engineering - 22nd International Conference, CAiSE 2010, Proceedings (pp. 83-97). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6051 LNCS). https://doi.org/10.1007/978-3-642-13094-6_8