Baum-Welch style EM approach on simple Bayesian models for Web data annotation

Fatih Gelgi, Hasan Davulcu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

1 Scopus citation

Abstract

In this paper, we focus on weakly annotated data (WAD), which is typically generated from Web documents by a (semi-)automated information extraction system. The extracted information is only partly accurate, and its accuracy can be improved by statistical models capable of contextual reasoning, such as Bayesian models. Our contribution is an EM algorithm that operates on simple Bayesian models to re-annotate WAD. EM estimates the parameters, i.e., the prior and conditional probabilities, by iterating the Bayesian model over the given Web data. In the expectation step, a Bayesian classifier is trained from the current annotations; in the maximization step, the roles of all labels are re-annotated to find the annotation that best fits the current model, and the probabilities are then re-adjusted from the new annotations. Our experiments show that EM increases Web data annotation accuracy by up to 8%. Our EM approach follows the Baum-Welch methodology.
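The re-annotation loop described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each label in the WAD is represented by a hypothetical list of context features (e.g. `colon_after`, `bold`), uses a naive Bayes classifier with add-one smoothing as the "simple Bayesian model", and re-annotates each label with its single most probable role (a hard assignment) rather than accumulating the expected (soft) counts used in Baum-Welch. The function names `train`, `reannotate` and `em_reannotate`, the role names and the toy data are all illustrative.

```python
from collections import Counter, defaultdict
import math

# Hypothetical representation (not from the paper): each item is the list of
# context features of one extracted label; roles[i] is its current role
# annotation, e.g. "attribute" or "value".

def train(items, roles, role_set, alpha=1.0):
    """Estimate log P(role) and log P(feature | role) from the current
    annotations, with add-alpha (Laplace when alpha=1) smoothing."""
    vocab = {f for feats in items for f in feats}
    role_counts = Counter(roles)
    feat_counts = defaultdict(Counter)
    for feats, role in zip(items, roles):
        feat_counts[role].update(feats)
    n = len(items)
    log_prior = {r: math.log((role_counts[r] + alpha) / (n + alpha * len(role_set)))
                 for r in role_set}
    log_cond = {}
    for r in role_set:
        denom = sum(feat_counts[r].values()) + alpha * len(vocab)
        log_cond[r] = {f: math.log((feat_counts[r][f] + alpha) / denom) for f in vocab}
    return log_prior, log_cond

def reannotate(items, role_set, log_prior, log_cond):
    """Give every item the role with the highest naive Bayes score."""
    new_roles = []
    for feats in items:
        scores = {r: log_prior[r] + sum(log_cond[r][f] for f in feats)
                  for r in role_set}
        new_roles.append(max(scores, key=scores.get))
    return new_roles

def em_reannotate(items, roles, role_set, max_iter=20):
    """Alternate training and re-annotation until the roles stop changing."""
    roles = list(roles)
    for _ in range(max_iter):
        log_prior, log_cond = train(items, roles, role_set)
        new_roles = reannotate(items, role_set, log_prior, log_cond)
        if new_roles == roles:  # fixed point: annotations are stable
            break
        roles = new_roles
    return roles

# Toy WAD: six consistently annotated labels plus a seventh whose weak
# annotation says "value" although its context looks like an attribute.
role_set = ["attribute", "value"]
items = [["colon_after", "bold"]] * 3 + [["numeric", "plain"]] * 3 + [["colon_after", "bold"]]
weak_roles = ["attribute"] * 3 + ["value"] * 4
result = em_reannotate(items, weak_roles, role_set)
print(result)
# ['attribute', 'attribute', 'attribute', 'value', 'value', 'value', 'attribute']
```

On this toy data, the first pass moves the seventh label to `attribute` because its features match the other attributes, and the second pass reproduces the same annotation, so the loop stops. The `max_iter` bound keeps the loop finite in case hard re-annotation does not settle.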

Original language: English (US)
Title of host publication: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007
Pages: 736–742
Number of pages: 7
State: Published, 2007
Event: IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007, Silicon Valley, CA, United States
Duration: Nov 2 2007 to Nov 5 2007

Publication series

Name: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007

Other

Other: IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007
Country/Territory: United States
City: Silicon Valley, CA
Period: 11/2/07 to 11/5/07

Keywords

  • Baum-Welch
  • Bayesian models
  • Expectation-maximization
  • Weakly annotated data

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
