Probabilistic reconciliation of records from inaccurate web sources (extended abstract)

Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, Paolo Papotti

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Web data are inherently imprecise and uncertain. This paper addresses the problem of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model that computes a probability distribution over the extracted values and estimates the accuracy of each source. The model accounts for sources that copy their contents from other sources, and it counteracts the misleading consensus that such copiers produce. We extend models previously proposed in the literature by considering several attributes at a time, which better leverages all the available evidence of copying.
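As a rough illustration of the kind of computation the abstract describes, the Python sketch below alternates between estimating value probabilities from source accuracies and re-estimating source accuracies from value probabilities until they stop changing. It is a generic fixpoint scheme of the sort commonly used for truth discovery, not the model of this paper. The function `reconcile`, the parameter `n_false` (an assumed uniform number of false values per object), and the per-source `independence` weights (assumed to be known in advance, with 0 meaning a pure copier) are all hypothetical. The sketch neither detects copiers nor exploits multi-attribute evidence of copying.

```python
import math
from collections import defaultdict


def reconcile(claims, n_false=10, init_acc=0.8, independence=None,
              max_iters=50, tol=1e-6):
    """Hypothetical fixpoint sketch of probabilistic reconciliation.

    claims: {source: {object: value}}, the value each source reports per object.
    n_false: assumed number of false values per object (uniform prior).
    independence: assumed {source: weight in [0, 1]}; 1 = independent,
        0 = pure copier whose votes are ignored. Copiers are NOT detected here.
    Returns ({object: {value: probability}}, {source: accuracy}).
    """
    independence = independence or {}
    claims = {s: vals for s, vals in claims.items() if vals}  # drop empty sources
    acc = {s: init_acc for s in claims}
    probs = {}
    for _ in range(max_iters):
        # Step 1: score each candidate value by the accuracy-based votes it receives.
        scores = defaultdict(lambda: defaultdict(float))
        for s, vals in claims.items():
            a = min(max(acc[s], 1e-6), 1 - 1e-6)  # clamp so the log stays finite
            vote = independence.get(s, 1.0) * math.log(n_false * a / (1 - a))
            for obj, value in vals.items():
                scores[obj][value] += vote
        # Step 2: turn scores into a probability distribution per object (softmax).
        probs = {}
        for obj, sc in scores.items():
            top = max(sc.values())
            weights = {v: math.exp(c - top) for v, c in sc.items()}
            total = sum(weights.values())
            probs[obj] = {v: w / total for v, w in weights.items()}
        # Step 3: a source's accuracy is the mean probability of the values it reports.
        new_acc = {s: sum(probs[o][v] for o, v in vals.items()) / len(vals)
                   for s, vals in claims.items()}
        delta = max((abs(new_acc[s] - acc[s]) for s in claims), default=0.0)
        acc = new_acc
        if delta < tol:
            break
    return probs, acc


if __name__ == "__main__":
    # Toy data: s3 copies s2, so value "B" gets a misleading second vote.
    claims = {"s1": {"o1": "A"}, "s2": {"o1": "B"}, "s3": {"o1": "B"}}
    print(reconcile(claims)[0])                            # copier inflates "B"
    print(reconcile(claims, independence={"s3": 0.0})[0])  # copier discounted
```

In the toy data, treating s3 as independent lets its copied vote push "B" close to probability 1, while giving s3 weight 0 leaves only the two independent claims, which then tie. Deciding how much to discount each source is exactly the copy-detection problem the paper targets and this sketch assumes away.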

Original language: English (US)
Title of host publication: SEBD 2010 - Proceedings of the 18th Italian Symposium on Advanced Database Systems
Publisher: Esculapio Editore
Pages: 390-397
Number of pages: 8
ISBN (Print): 9788874883691
State: Published - Jan 1 2010
Event: 18th Italian Symposium on Advanced Database Systems, SEBD 2010 - Rimini, Italy
Duration: Jun 20 2010 to Jun 23 2010

Publication series

Name: SEBD 2010 - Proceedings of the 18th Italian Symposium on Advanced Database Systems

Other

Conference: 18th Italian Symposium on Advanced Database Systems, SEBD 2010
Country/Territory: Italy
City: Rimini
Period: 6/20/10 to 6/23/10

ASJC Scopus subject areas

  • Software

