Probabilistic reconciliation of records from inaccurate web sources (extended abstract)

Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, Paolo Papotti

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Web data are inherently imprecise and uncertain. This paper addresses the problem of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model that computes a probability distribution over the extracted values and estimates the accuracy of the sources. Our model accounts for sources that copy their contents from other sources, and manages the misleading consensus produced by copiers. We extend the models previously proposed in the literature by considering several attributes at a time, to better leverage all the available evidence of copying.
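The abstract only outlines the model, so the following is a rough, hypothetical illustration of the general idea and not the authors' formulation. It alternates between two estimates, in the style of accuracy-based voting from the truth-discovery literature: a distribution over the values claimed for each (object, attribute) item, where a source with accuracy A votes with weight ln(n * A / (1 - A)) for an assumed number n of false values per domain, and each source's accuracy as the mean probability of the values it provides. It adds a crude copier heuristic that pools all attributes at once: two sources that share many probably-false values are suspected of copying, and the vote of the less accurate one is discounted. The `claims` layout, the function names, and the parameters `n_false`, `threshold`, and `discount` are all assumptions made for this sketch.

```python
import math
from collections import defaultdict


def value_probabilities(claims, accuracy, weight, n_false):
    """Distribution over the values claimed for each (object, attribute) item."""
    scores = defaultdict(lambda: defaultdict(float))
    for src, items in claims.items():
        a = accuracy[src]
        # Vote of one source: ln(n * A / (1 - A)), scaled by its copy weight.
        vote = weight[src] * math.log(n_false * a / (1.0 - a))
        for item, value in items.items():
            scores[item][value] += vote
    probs = {}
    for item, by_value in scores.items():
        top = max(by_value.values())  # subtract the max for numerical stability
        exps = {v: math.exp(c - top) for v, c in by_value.items()}
        total = sum(exps.values())
        # Simplification: normalize over the observed values only.
        probs[item] = {v: e / total for v, e in exps.items()}
    return probs


def source_accuracy(claims, probs, eps=1e-3):
    """Accuracy of a source = mean probability of the values it provides."""
    acc = {}
    for src, items in claims.items():
        mean = sum(probs[item][v] for item, v in items.items()) / len(items)
        acc[src] = min(max(mean, eps), 1.0 - eps)  # keep the log-odds finite
    return acc


def copy_weights(claims, probs, accuracy, threshold=0.3, discount=0.1):
    """Hypothetical copier heuristic that pools evidence over all attributes."""
    suspected = set()
    sources = sorted(claims)
    for i, s in enumerate(sources):
        for t in sources[i + 1:]:
            common = claims[s].keys() & claims[t].keys()
            if not common:
                continue
            # Items on which both sources give the same, probably false, value.
            shared_false = sum(
                1 for item in common
                if claims[s][item] == claims[t][item]
                and probs[item][claims[s][item]] < 0.5
            )
            if shared_false / len(common) > threshold:
                # Guess that the less accurate source (ties broken by name) copies.
                suspected.add(min((s, t), key=lambda x: (accuracy[x], x)))
    return {src: discount if src in suspected else 1.0 for src in claims}


def reconcile(claims, n_false=10, iterations=20, init_accuracy=0.8):
    """claims[source][(obj, attr)] = value; returns (probs, accuracy, weight)."""
    claims = {s: items for s, items in claims.items() if items}
    accuracy = {s: init_accuracy for s in claims}
    weight = {s: 1.0 for s in claims}
    for _ in range(iterations):
        probs = value_probabilities(claims, accuracy, weight, n_false)
        accuracy = source_accuracy(claims, probs)
        weight = copy_weights(claims, probs, accuracy)
    probs = value_probabilities(claims, accuracy, weight, n_false)
    return probs, accuracy, weight
```

Because copying is judged on the items two sources have in common across all attributes, a shared error on one attribute and a shared error on another reinforce each other. The fixed `threshold` and `discount` are placeholders; a probabilistic model like the one described in the abstract would weigh this evidence instead of applying a hard cutoff.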

Original language: English (US)
Title of host publication: SEBD 2010 - Proceedings of the 18th Italian Symposium on Advanced Database Systems
Publisher: Esculapio Editore
Pages: 390-397
Number of pages: 8
ISBN (Print): 9788874883691
State: Published - 2010
Externally published: Yes
Event: 18th Italian Symposium on Advanced Database Systems, SEBD 2010 - Rimini, Italy
Duration: Jun 20 2010 to Jun 23 2010

Fingerprint

Copying
Probability distributions
Statistical Models
Uncertainty

ASJC Scopus subject areas

  • Software

Cite this

Blanco, L., Crescenzi, V., Merialdo, P., & Papotti, P. (2010). Probabilistic reconciliation of records from inaccurate web sources (extended abstract). In SEBD 2010 - Proceedings of the 18th Italian Symposium on Advanced Database Systems (pp. 390-397). Esculapio Editore.

@inproceedings{962dcb2413a34394b598054565c0dc74,
title = "Probabilistic reconciliation of records from inaccurate web sources (extended abstract)",
abstract = "Web data are inherently imprecise and uncertain. This paper addresses the issue of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model to compute a probability distribution for the extracted values, and the accuracy of the sources. Our model considers the presence of sources that copy their contents from other sources, and manages the misleading consensus produced by copiers. We extend the models previously proposed in the literature by working on several attributes at a time to better leverage all the available evidence of copying.",
author = "Lorenzo Blanco and Valter Crescenzi and Paolo Merialdo and Paolo Papotti",
year = "2010",
language = "English (US)",
isbn = "9788874883691",
pages = "390--397",
booktitle = "SEBD 2010 - Proceedings of the 18th Italian Symposium on Advanced Database Systems",
publisher = "Esculapio Editore",

}

TY  - GEN
T1  - Probabilistic reconciliation of records from inaccurate web sources (extended abstract)
AU  - Blanco, Lorenzo
AU  - Crescenzi, Valter
AU  - Merialdo, Paolo
AU  - Papotti, Paolo
PY  - 2010
Y1  - 2010
AB  - Web data are inherently imprecise and uncertain. This paper addresses the issue of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model to compute a probability distribution for the extracted values, and the accuracy of the sources. Our model considers the presence of sources that copy their contents from other sources, and manages the misleading consensus produced by copiers. We extend the models previously proposed in the literature by working on several attributes at a time to better leverage all the available evidence of copying.
UR  - http://www.scopus.com/inward/record.url?scp=84890948926&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84890948926&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:84890948926
SN  - 9788874883691
SP  - 390
EP  - 397
BT  - SEBD 2010 - Proceedings of the 18th Italian Symposium on Advanced Database Systems
PB  - Esculapio Editore
ER  -