Relational model based annotation of the web data

Fatih Gelgi, Srinivas Vadrevu, Hasan Davulcu

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, we present a fast and scalable Bayesian model for improving weakly annotated data, which is typically generated by a (semi-)automated information extraction (IE) system from Web documents. Weakly annotated data suffers from incorrect ontological role assignments. Our experimental evaluations with the TAP and a collection of 20,000 home pages from university, shopping, and sports Web sites indicate that the model described here can improve the accuracy of role assignments from 40% to 85% for template-driven sites and from 68% to 87% for non-template-driven sites.

Original language: English (US)
Pages (from-to): 124-129
Number of pages: 6
Journal: Advances in Soft Computing
Volume: 43
State: Published - 2007

Keywords

  • Bayesian models
  • Classification
  • Information extraction
  • Weakly annotated data

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computational Mechanics
  • Computer Science Applications
