Probabilistic models to reconcile complex data from inaccurate data sources

Lorenzo Blanco; Valter Crescenzi; Paolo Merialdo; Paolo Papotti

doi:10.1007/978-3-642-13094-6_8

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

47 Scopus citations

Abstract

Several techniques have been developed to extract and integrate data from web sources. However, web data are inherently imprecise and uncertain. This paper addresses the issue of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model to compute a probability distribution over the extracted values and to estimate the accuracy of the sources. Our model considers the presence of sources that copy their contents from other sources, and manages the misleading consensus produced by copiers. We extend the models previously proposed in the literature by working on several attributes at a time to better leverage all the available evidence. We also report the results of several experiments on both synthetic and real-life data to show the effectiveness of the proposed approach.
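The abstract describes jointly computing a probability for each extracted value and an accuracy for each source, while discounting the consensus produced by copiers. The Python sketch below illustrates that general idea under simplifying assumptions; it is not the authors' model. It handles one attribute at a time (the paper's multi-attribute extension is not reproduced), scores each value with a log-odds vote ln(n·A/(1−A)) under an assumed uniform number of false values per object (the hypothetical parameter `n_false`), and takes copy relations as a given input (the hypothetical parameter `copiers`) rather than detecting them. The function name `reconcile` and the sample data are illustrative only.

```python
import math
from collections import defaultdict


def reconcile(claims, n_false=10, copiers=None, iterations=20, init_acc=0.8):
    """Jointly estimate value probabilities and source accuracies (sketch).

    claims:   {source: {object: claimed_value}} for a single attribute
    n_false:  assumed number of false values per object (uniform errors)
    copiers:  hypothetical input {copier: (original, copy_probability)};
              copy detection itself is not sketched here
    """
    copiers = copiers or {}
    accuracy = {s: init_acc for s in claims}
    objects = {o for vals in claims.values() for o in vals}
    probs = {}
    for _ in range(iterations):
        # Step 1: score each candidate value with accuracy-weighted votes.
        for o in objects:
            score = defaultdict(float)
            for s, vals in claims.items():
                if o not in vals:
                    continue
                v = vals[o]
                a = min(max(accuracy[s], 1e-6), 1 - 1e-6)  # keep the log finite
                weight = 1.0
                if s in copiers:
                    original, c = copiers[s]
                    # A value shared with the original is probably copied, so
                    # only its independent fraction (1 - c) counts as evidence.
                    if claims.get(original, {}).get(o) == v:
                        weight = 1.0 - c
                score[v] += weight * math.log(n_false * a / (1 - a))
            # Softmax over the observed values (unobserved values are ignored).
            top = max(score.values())
            z = sum(math.exp(x - top) for x in score.values())
            probs[o] = {v: math.exp(x - top) / z for v, x in score.items()}
        # Step 2: a source's accuracy is the mean probability of its values.
        for s, vals in claims.items():
            if vals:
                accuracy[s] = sum(probs[o][v] for o, v in vals.items()) / len(vals)
    return probs, accuracy


# Illustrative data: s4 and s5 are declared copiers of s3.
claims = {
    "s1": {"capital_it": "Rome", "capital_tn": "Tunis"},
    "s2": {"capital_it": "Rome", "capital_tn": "Tunis"},
    "s3": {"capital_it": "Milan", "capital_tn": "Sfax"},
    "s4": {"capital_it": "Milan", "capital_tn": "Sfax"},
    "s5": {"capital_it": "Milan", "capital_tn": "Sfax"},
}
probs, accuracy = reconcile(
    claims, copiers={"s4": ("s3", 0.9), "s5": ("s3", 0.9)}
)
for obj, dist in sorted(probs.items()):
    print(obj, max(dist, key=dist.get))
```

With the copy relations supplied, the two independent sources reporting Rome and Tunis outweigh the three sources reporting Milan and Sfax, two of which are declared copiers of s3; calling `reconcile(claims)` without `copiers` lets the copied majority win instead. Restricting the normalization to observed values is a further simplification of this sketch.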

Original language	English (US)
Title of host publication	Advanced Information Systems Engineering - 22nd International Conference, CAiSE 2010, Proceedings
Pages	83-97
Number of pages	15
DOIs	https://doi.org/10.1007/978-3-642-13094-6_8
State	Published - 2010
Externally published	Yes
Event	22nd International Conference on Advanced Information Systems Engineering, CAiSE 2010 - Hammamet, Tunisia Duration: Jun 7 2010 → Jun 9 2010

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	6051 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Other

Conference	22nd International Conference on Advanced Information Systems Engineering, CAiSE 2010
Country/Territory	Tunisia
City	Hammamet
Period	Jun 7 2010 → Jun 9 2010

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Cite this

Blanco, L., Crescenzi, V., Merialdo, P., & Papotti, P. (2010). Probabilistic models to reconcile complex data from inaccurate data sources. In Advanced Information Systems Engineering - 22nd International Conference, CAiSE 2010, Proceedings (pp. 83-97). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6051 LNCS). https://doi.org/10.1007/978-3-642-13094-6_8

Probabilistic models to reconcile complex data from inaccurate data sources. / Blanco, Lorenzo; Crescenzi, Valter; Merialdo, Paolo et al.
Advanced Information Systems Engineering - 22nd International Conference, CAiSE 2010, Proceedings. 2010. p. 83-97 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6051 LNCS).

Blanco, L, Crescenzi, V, Merialdo, P & Papotti, P 2010, Probabilistic models to reconcile complex data from inaccurate data sources. in Advanced Information Systems Engineering - 22nd International Conference, CAiSE 2010, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6051 LNCS, pp. 83-97, 22nd International Conference on Advanced Information Systems Engineering, CAiSE 2010, Hammamet, Tunisia, 6/7/10. https://doi.org/10.1007/978-3-642-13094-6_8

Blanco L, Crescenzi V, Merialdo P, Papotti P. Probabilistic models to reconcile complex data from inaccurate data sources. In Advanced Information Systems Engineering - 22nd International Conference, CAiSE 2010, Proceedings. 2010. p. 83-97. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-642-13094-6_8

Blanco, Lorenzo ; Crescenzi, Valter ; Merialdo, Paolo et al. / Probabilistic models to reconcile complex data from inaccurate data sources. Advanced Information Systems Engineering - 22nd International Conference, CAiSE 2010, Proceedings. 2010. pp. 83-97 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{1f8728163c2c4f37912ad079d5592fa8,
title = "Probabilistic models to reconcile complex data from inaccurate data sources",
abstract = "Several techniques have been developed to extract and integrate data from web sources. However, web data are inherently imprecise and uncertain. This paper addresses the issue of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model to compute a probability distribution for the extracted values, and the accuracy of the sources. Our model considers the presence of sources that copy their contents from other sources, and manages the misleading consensus produced by copiers. We extend the models previously proposed in the literature by working on several attributes at a time to better leverage all the available evidence. We also report the results of several experiments on both synthetic and real-life data to show the effectiveness of the proposed approach.",
author = "Lorenzo Blanco and Valter Crescenzi and Paolo Merialdo and Paolo Papotti",
year = "2010",
doi = "10.1007/978-3-642-13094-6_8",
language = "English (US)",
isbn = "3642130933",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "83--97",
booktitle = "Advanced Information Systems Engineering - 22nd International Conference, CAiSE 2010, Proceedings",
note = "22nd International Conference on Advanced Information Systems Engineering, CAiSE 2010 ; Conference date: 07-06-2010 Through 09-06-2010",
}

TY  - GEN
T1  - Probabilistic models to reconcile complex data from inaccurate data sources
AU  - Blanco, Lorenzo
AU  - Crescenzi, Valter
AU  - Merialdo, Paolo
AU  - Papotti, Paolo
PY  - 2010
Y1  - 2010
N2  - Several techniques have been developed to extract and integrate data from web sources. However, web data are inherently imprecise and uncertain. This paper addresses the issue of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model to compute a probability distribution for the extracted values, and the accuracy of the sources. Our model considers the presence of sources that copy their contents from other sources, and manages the misleading consensus produced by copiers. We extend the models previously proposed in the literature by working on several attributes at a time to better leverage all the available evidence. We also report the results of several experiments on both synthetic and real-life data to show the effectiveness of the proposed approach.
AB  - Several techniques have been developed to extract and integrate data from web sources. However, web data are inherently imprecise and uncertain. This paper addresses the issue of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model to compute a probability distribution for the extracted values, and the accuracy of the sources. Our model considers the presence of sources that copy their contents from other sources, and manages the misleading consensus produced by copiers. We extend the models previously proposed in the literature by working on several attributes at a time to better leverage all the available evidence. We also report the results of several experiments on both synthetic and real-life data to show the effectiveness of the proposed approach.
UR  - http://www.scopus.com/inward/record.url?scp=79955068748&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79955068748&partnerID=8YFLogxK
U2  - 10.1007/978-3-642-13094-6_8
DO  - 10.1007/978-3-642-13094-6_8
M3  - Conference contribution
AN  - SCOPUS:79955068748
SN  - 3642130933
SN  - 9783642130939
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 83
EP  - 97
BT  - Advanced Information Systems Engineering - 22nd International Conference, CAiSE 2010, Proceedings
T2  - 22nd International Conference on Advanced Information Systems Engineering, CAiSE 2010
Y2  - 7 June 2010 through 9 June 2010
ER  - 