Automatically building probabilistic databases from the web

Lorenzo Blanco; Mirko Bronzi; Valter Crescenzi; Paolo Merialdo; Paolo Papotti

doi:10.1145/1963192.1963285

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

A relevant number of web sites publish structured data about recognizable concepts (such as stock quotes, movies, restaurants, etc.). There is a great chance to create applications that rely on a huge amount of data taken from the Web. We present an automatic and domain-independent system that performs all the steps required to benefit from these data: it discovers data-intensive web sites containing information about an entity of interest, extracts and integrates the published data, and finally performs a probabilistic analysis to characterize the impreciseness of the data and the accuracy of the sources. The results of the processing can be used to populate a probabilistic database.
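The final step named in the abstract, turning conflicting values from several sources into probabilities, can be illustrated with a minimal sketch. This is not the authors' method: it assumes a simple independent-source voting model, and the function name `value_probabilities`, the parameter `n_false`, the site names and the accuracy figures are all hypothetical.

```python
def value_probabilities(observations, accuracy, n_false=10):
    """Estimate a probability for each reported value of one attribute.

    observations: dict mapping source name -> value it publishes.
    accuracy:     dict mapping source name -> assumed probability (strictly
                  between 0 and 1) that the source reports the true value.
    n_false:      assumed number of distinct wrong values a mistaken source
                  could pick from, uniformly at random (hypothetical knob).

    Assumptions of this sketch: sources err independently, the prior over
    candidate values is uniform, and the true value is one of those observed.
    """
    candidates = set(observations.values())
    scores = {}
    for candidate in candidates:
        likelihood = 1.0
        for source, reported in observations.items():
            a = accuracy[source]
            # A source agrees with the truth with probability a; otherwise it
            # publishes one specific wrong value with probability (1 - a) / n_false.
            likelihood *= a if reported == candidate else (1.0 - a) / n_false
        scores[candidate] = likelihood
    total = sum(scores.values())
    # Normalize the likelihoods into a distribution over the candidates.
    return {value: score / total for value, score in scores.items()}


# Hypothetical example: three sites disagree on a stock's closing price.
observations = {"siteA": 10.5, "siteB": 10.5, "siteC": 11.0}
accuracy = {"siteA": 0.8, "siteB": 0.8, "siteC": 0.8}
probs = value_probabilities(observations, accuracy)

# Each (value, probability) pair could populate one alternative of a
# tuple in a probabilistic database.
for value, p in sorted(probs.items()):
    print(f"price={value} p={p:.4f}")
```

In this sketch the source accuracies are given as inputs; a fuller treatment would estimate them from the data as well, which is beyond what the abstract specifies.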

Original language	English (US)
Title of host publication	Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011
Pages	185-188
Number of pages	4
DOIs	https://doi.org/10.1145/1963192.1963285
State	Published - 2011
Externally published	Yes
Event	20th International Conference Companion on World Wide Web, WWW 2011 - Hyderabad, India
Duration	Mar 28 2011 → Apr 1 2011

Publication series

Name	Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011

Other

Other	20th International Conference Companion on World Wide Web, WWW 2011
Country/Territory	India
City	Hyderabad
Period	3/28/11 → 4/1/11

Keywords

data integration
probabilistic data
web data extraction

ASJC Scopus subject areas

Computer Networks and Communications
Information Systems

Access to Document

10.1145/1963192.1963285

Cite this

Blanco, L., Bronzi, M., Crescenzi, V., Merialdo, P., & Papotti, P. (2011). Automatically building probabilistic databases from the web. In Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011 (pp. 185-188). (Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011). https://doi.org/10.1145/1963192.1963285

@inproceedings{5ba23c2d612c402ebf9703dc3c3f3223,
title = "Automatically building probabilistic databases from the web",
abstract = "A relevant number of web sites publish structured data about recognizable concepts (such as stock quotes, movies, restaurants, etc.). There is a great chance to create applications that rely on a huge amount of data taken from the Web. We present an automatic and domain-independent system that performs all the steps required to benefit from these data: it discovers data-intensive web sites containing information about an entity of interest, extracts and integrates the published data, and finally performs a probabilistic analysis to characterize the impreciseness of the data and the accuracy of the sources. The results of the processing can be used to populate a probabilistic database.",
keywords = "data integration, probabilistic data, web data extraction",
author = "Lorenzo Blanco and Mirko Bronzi and Valter Crescenzi and Paolo Merialdo and Paolo Papotti",
year = "2011",
doi = "10.1145/1963192.1963285",
language = "English (US)",
isbn = "9781450305181",
series = "Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011",
pages = "185--188",
booktitle = "Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011",
note = "20th International Conference Companion on World Wide Web, WWW 2011 ; Conference date: 28-03-2011 Through 01-04-2011",
}

TY - GEN
T1 - Automatically building probabilistic databases from the web
AU - Blanco, Lorenzo
AU - Bronzi, Mirko
AU - Crescenzi, Valter
AU - Merialdo, Paolo
AU - Papotti, Paolo
PY - 2011
Y1 - 2011
AB - A relevant number of web sites publish structured data about recognizable concepts (such as stock quotes, movies, restaurants, etc.). There is a great chance to create applications that rely on a huge amount of data taken from the Web. We present an automatic and domain-independent system that performs all the steps required to benefit from these data: it discovers data-intensive web sites containing information about an entity of interest, extracts and integrates the published data, and finally performs a probabilistic analysis to characterize the impreciseness of the data and the accuracy of the sources. The results of the processing can be used to populate a probabilistic database.
KW - data integration
KW - probabilistic data
KW - web data extraction
UR - http://www.scopus.com/inward/record.url?scp=79955147285&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79955147285&partnerID=8YFLogxK
U2 - 10.1145/1963192.1963285
DO - 10.1145/1963192.1963285
M3 - Conference contribution
AN - SCOPUS:79955147285
SN - 9781450305181
T3 - Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011
SP - 185
EP - 188
BT - Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011
T2 - 20th International Conference Companion on World Wide Web, WWW 2011
Y2 - 28 March 2011 through 1 April 2011
ER -
