Baum-Welch style em approach on simple Bayesian models for Web data annotation

Fatih Gelgi, Hasan Davulcu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

This paper focuses on weakly annotated data (WAD), which is typically generated from Web documents by a (semi-)automated information extraction system. The extracted information has a certain level of accuracy, which can be surpassed by using statistical models that are capable of contextual reasoning, such as Bayesian models. Our contribution is an EM algorithm that operates on simple Bayesian models to re-annotate WAD. EM estimates the parameters, i.e., the prior and conditional probabilities, by iterating the Bayesian model over the given Web data. In the expectation step, a Bayesian classifier is trained from the current annotations. In the maximization step, the roles of all the labels are re-annotated to find the annotation that best fits the current model, and the probabilities are then re-adjusted from the new annotations. Our experiments show that EM increases Web data annotation accuracy by up to 8%. Our EM approach follows the Baum-Welch methodology.
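The loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a naive Bayes classifier with Laplace smoothing, a hard (winner-takes-all) re-annotation, and a hypothetical representation in which each item is a list of context features paired with a role. The names `train`, `log_score`, and `reannotate` are illustrative only, and the step names in the comments follow the abstract's wording.

```python
import math
from collections import Counter, defaultdict


def train(items, roles, alpha=1.0):
    """Estimate counts for priors P(role) and conditionals P(feature | role)."""
    role_counts = Counter(roles)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, role in zip(items, roles):
        feat_counts[role].update(feats)
        vocab.update(feats)
    return role_counts, feat_counts, vocab, alpha


def log_score(model, feats, role, all_roles):
    """Log of P(role) * prod P(feature | role), with Laplace smoothing."""
    role_counts, feat_counts, vocab, alpha = model
    n = sum(role_counts.values())
    score = math.log((role_counts[role] + alpha) / (n + alpha * len(all_roles)))
    total = sum(feat_counts[role].values()) + alpha * len(vocab)
    for f in feats:
        score += math.log((feat_counts[role][f] + alpha) / total)
    return score


def reannotate(items, initial_roles, max_iter=20):
    """Alternate training and re-annotation until the annotations stop changing."""
    all_roles = sorted(set(initial_roles))
    roles = list(initial_roles)
    for _ in range(max_iter):
        # "Expectation" step in the abstract's wording: train from current annotations.
        model = train(items, roles)
        # "Maximization" step in the abstract's wording: pick the best-fitting role per item.
        new_roles = [
            max(all_roles, key=lambda r: log_score(model, feats, r, all_roles))
            for feats in items
        ]
        if new_roles == roles:
            break
        roles = new_roles
    return roles


# Toy data (made up for illustration): the last item is mislabeled as "size".
items = [
    ["red", "blue"], ["red", "green"], ["blue", "green"],
    ["small", "large"], ["small", "medium"], ["red", "blue"],
]
noisy = ["color", "color", "color", "size", "size", "size"]
print(reannotate(items, noisy))
```

On this toy input the mislabeled last item is pulled toward the role whose other members share its features. The paper's actual models, features, and update rules may differ from this sketch.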

Original language: English (US)
Title of host publication: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007
Pages: 736-742
Number of pages: 7
DOIs: https://doi.org/10.1109/WI.2007.4427182
State: Published - 2007
Event: IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007 - Silicon Valley, CA, United States
Duration: Nov 2 2007 → Nov 5 2007

Other

Other: IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007
Country: United States
City: Silicon Valley, CA
Period: 11/2/07 → 11/5/07

Fingerprint

Labels
Classifiers
Experiments
Statistical Models

Keywords

  • Baum-Welch
  • Bayesian models
  • Expectation-maximization
  • Weakly annotated data

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications

Cite this

Gelgi, F., & Davulcu, H. (2007). Baum-Welch style em approach on simple Bayesian models for Web data annotation. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007 (pp. 736-742). [4427182] https://doi.org/10.1109/WI.2007.4427182

@inproceedings{6b5ef24884d543b6adb109c66d1df52d,
title = "Baum-Welch style em approach on simple Bayesian models for Web data annotation",
keywords = "Baum-Welch, Bayesian models, Expectation-maximization, Weakly annotated data",
author = "Fatih Gelgi and Hasan Davulcu",
year = "2007",
doi = "10.1109/WI.2007.4427182",
language = "English (US)",
isbn = "0769530265",
pages = "736--742",
booktitle = "Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007",

}

TY - GEN

T1 - Baum-Welch style em approach on simple Bayesian models for Web data annotation

AU - Gelgi, Fatih

AU - Davulcu, Hasan

PY - 2007

Y1 - 2007

KW - Baum-Welch

KW - Bayesian models

KW - Expectation-maximization

KW - Weakly annotated data

UR - http://www.scopus.com/inward/record.url?scp=48349147565&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=48349147565&partnerID=8YFLogxK

U2 - 10.1109/WI.2007.4427182

DO - 10.1109/WI.2007.4427182

M3 - Conference contribution

AN - SCOPUS:48349147565

SN - 0769530265

SN - 9780769530260

SP - 736

EP - 742

BT - Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007

ER -