Abstract
In this paper, we present a fast and scalable Bayesian model for improving weakly annotated data, which is typically generated by a (semi-)automated information extraction (IE) system from Web documents. Weakly annotated data suffers from incorrect ontological role assignments. Our experimental evaluations with the TAP and a collection of 20,000 home pages from university, shopping, and sports Web sites indicate that the model described here can improve the accuracy of role assignments from 40% to 85% for template-driven sites and from 68% to 87% for non-template-driven sites.
Original language | English (US) |
---|---|
Pages (from-to) | 124-129 |
Number of pages | 6 |
Journal | Advances in Soft Computing |
Volume | 43 |
State | Published - 2007 |
Keywords
- Bayesian models
- Classification
- Information extraction
- Weakly annotated data
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Computational Mechanics
- Computer Science Applications