Wrapper generation for overlapping Web sources

Mirko Bronzi, Valter Crescenzi, Paolo Merialdo, Paolo Papotti

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Scopus citations

Abstract

Exploiting the huge amount of data available on the Web involves the generation of wrappers to extract data from webpages. We argue that existing approaches for web data extraction from data-intensive websites miss the opportunities related to the presence of redundant information on the Web. We propose an innovative approach that aims at pushing further the level of automation of existing wrapper generation systems by leveraging the redundancy of data on the Web. An experimental evaluation of the proposed solution shows a relevant improvement for the precision of the extracted data, without a significant loss in the recall.

Original language: English (US)
Title of host publication: Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011
Pages: 32-35
Number of pages: 4
DOIs: https://doi.org/10.1109/WI-IAT.2011.160
State: Published - Nov 7 2011
Event: 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011 - Lyon, France
Duration: Aug 22 2011 to Aug 27 2011

Publication series

Name: Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011
Volume: 1

Other

Other: 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011
Country: France
City: Lyon
Period: 8/22/11 to 8/27/11

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

Bronzi, M., Crescenzi, V., Merialdo, P., & Papotti, P. (2011). Wrapper generation for overlapping Web sources. In Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011 (pp. 32-35). [6040492] (Proceedings - 2011 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2011; Vol. 1). https://doi.org/10.1109/WI-IAT.2011.160