TY - GEN
T1 - Characterizing the uncertainty of web data
T2 - Joint WICOW/AIRWeb Workshop on Web Quality, WebQuality 2011, Held in Conjunction with the 20th International World Wide Web Conference, WWW 2011
AU - Blanco, Lorenzo
AU - Crescenzi, Valter
AU - Merialdo, Paolo
AU - Papotti, Paolo
PY - 2011
Y1 - 2011
N2 - An increasing number of web sites offer structured information about recognizable concepts, relevant to many application domains, such as finance, sport, and commercial products. However, web data is inherently imprecise and uncertain, and conflicting values can be provided by different web sources. Characterizing the uncertainty of web data represents an important issue and several models have been recently proposed in the literature. The paper illustrates state-of-the-art Bayesian models to evaluate the quality of data extracted from the Web and reports the results of an extensive application of the models on real-life web data. Our experimental results show that for some applications even simple approaches can provide effective results, while sophisticated solutions are needed to obtain a more precise characterization of the uncertainty.
AB - An increasing number of web sites offer structured information about recognizable concepts, relevant to many application domains, such as finance, sport, and commercial products. However, web data is inherently imprecise and uncertain, and conflicting values can be provided by different web sources. Characterizing the uncertainty of web data represents an important issue and several models have been recently proposed in the literature. The paper illustrates state-of-the-art Bayesian models to evaluate the quality of data extracted from the Web and reports the results of an extensive application of the models on real-life web data. Our experimental results show that for some applications even simple approaches can provide effective results, while sophisticated solutions are needed to obtain a more precise characterization of the uncertainty.
KW - data reconciliation
KW - probabilistic data
KW - web data extraction
UR - http://www.scopus.com/inward/record.url?scp=79955057032&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79955057032&partnerID=8YFLogxK
U2 - 10.1145/1964114.1964116
DO - 10.1145/1964114.1964116
M3 - Conference contribution
AN - SCOPUS:79955057032
SN - 9781450307062
T3 - ACM International Conference Proceeding Series
SP - 1
EP - 8
BT - WebQuality 2011 - Proceedings of the Joint WICOW/AIRWeb Workshop on Web Quality
Y2 - 28 March 2011 through 28 March 2011
ER -