Characterizing the uncertainty of web data: Models and experiences

Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, Paolo Papotti

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

2 Scopus citations

Abstract

An increasing number of web sites offer structured information about recognizable concepts that are relevant to many application domains, such as finance, sports, and commercial products. However, web data is inherently imprecise and uncertain, and different web sources can provide conflicting values. Characterizing the uncertainty of web data is an important issue, and several models have recently been proposed in the literature. The paper illustrates state-of-the-art Bayesian models for evaluating the quality of data extracted from the Web and reports the results of an extensive application of the models to real-life web data. Our experimental results show that for some applications even simple approaches can provide effective results, while sophisticated solutions are needed to obtain a more precise characterization of the uncertainty.

Original language: English (US)
Title of host publication: WebQuality 2011 - Proceedings of the Joint WICOW/AIRWeb Workshop on Web Quality
Pages: 1-8
Number of pages: 8
State: Published - 2011
Externally published: Yes
Event: Joint WICOW/AIRWeb Workshop on Web Quality, WebQuality 2011, Held in Conjunction with the 20th International World Wide Web Conference, WWW 2011 - Hyderabad, India
Duration: Mar 28 2011 to Mar 28 2011

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: Joint WICOW/AIRWeb Workshop on Web Quality, WebQuality 2011, Held in Conjunction with the 20th International World Wide Web Conference, WWW 2011
Country/Territory: India
City: Hyderabad
Period: 3/28/11 to 3/28/11

Keywords

  • data reconciliation
  • probabilistic data
  • web data extraction

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
