TY - GEN
T1 - Baum-Welch style EM approach on simple Bayesian models for Web data annotation
AU - Gelgi, Fatih
AU - Davulcu, Hasan
PY - 2007
Y1 - 2007
AB - This paper focuses on weakly annotated data (WAD), which is typically generated by a (semi-)automated information extraction system from Web documents. The extracted information has a certain level of accuracy, which can be surpassed by using statistical models capable of contextual reasoning, such as Bayesian models. Our contribution is an EM algorithm that operates on simple Bayesian models to re-annotate WAD. EM estimates the parameters, i.e., the prior and conditional probabilities, by iterating a Bayesian model on the given Web data. In the expectation step, a Bayesian classifier is trained from the current annotations. In the maximization step, the roles of all the labels are re-annotated to find the best-fitting annotation under the current model, and the probabilities are then re-adjusted from the new annotations. Our experiments show that EM increases Web data annotation accuracy by up to 8%. We use the Baum-Welch methodology in our EM approach.
KW - Baum-Welch
KW - Bayesian models
KW - Expectation-maximization
KW - Weakly annotated data
UR - http://www.scopus.com/inward/record.url?scp=48349147565&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=48349147565&partnerID=8YFLogxK
U2 - 10.1109/WI.2007.4427182
DO - 10.1109/WI.2007.4427182
M3 - Conference contribution
AN - SCOPUS:48349147565
SN - 0769530265
SN - 9780769530260
T3 - Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007
SP - 736
EP - 742
BT - Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007
T2 - IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007
Y2 - 2 November 2007 through 5 November 2007
ER -