Relational model based annotation of the web data

Fatih Gelgi, Srinivas Vadrevu, Hasan Davulcu

Research output: Contribution to journal › Article

Abstract

In this paper, we present a fast and scalable Bayesian model for improving weakly annotated data, which is typically generated by a (semi-)automated information extraction (IE) system from Web documents. Weakly annotated data suffers from incorrect ontological role assignments. Our experimental evaluations with the TAP and a collection of 20,000 home pages from university, shopping and sports Web sites indicate that the model described here can improve the accuracy of role assignments from 40% to 85% for template-driven sites and from 68% to 87% for non-template-driven sites.

Original language: English (US)
Pages (from-to): 124-129
Number of pages: 6
Journal: Advances in Intelligent Systems and Computing
Volume: 43
DOI: 10.1007/978-3-540-72575-6_20
State: Published - 2007

Keywords

  • Bayesian models
  • Classification
  • Information extraction
  • Weakly annotated data

ASJC Scopus subject areas

  • Computational Mechanics
  • Computer Science Applications
  • Computer Science (miscellaneous)

Cite this

Relational model based annotation of the web data. / Gelgi, Fatih; Vadrevu, Srinivas; Davulcu, Hasan.

In: Advances in Intelligent Systems and Computing, Vol. 43, 2007, p. 124-129.

@article{3491445bec714479a770da3eb9a04483,
title = "Relational model based annotation of the web data",
abstract = "In this paper, we present a fast and scalable Bayesian model for improving weakly annotated data - which is typically generated by a (semi) automated information extraction (IE) system from Web documents. Weakly annotated data suffers from incorrect ontological role assignments. Our experimental evaluations with the TAP and a collection of 20,000 home pages from university, shopping and sports Web sites, indicate that the model described here can improve the accuracy of role assignments from 40{\%} to 85{\%} for template driven sites, from 68{\%} to 87{\%} for non-template driven sites.",
keywords = "Bayesian models, Classification, Information extraction, Weakly annotated data",
author = "Fatih Gelgi and Srinivas Vadrevu and Hasan Davulcu",
year = "2007",
doi = "10.1007/978-3-540-72575-6_20",
language = "English (US)",
volume = "43",
pages = "124--129",
journal = "Advances in Soft Computing",
issn = "1867-5662",
publisher = "Springer Verlag",
}

TY - JOUR

T1 - Relational model based annotation of the web data

AU - Gelgi, Fatih

AU - Vadrevu, Srinivas

AU - Davulcu, Hasan

PY - 2007

Y1 - 2007

N2 - In this paper, we present a fast and scalable Bayesian model for improving weakly annotated data - which is typically generated by a (semi) automated information extraction (IE) system from Web documents. Weakly annotated data suffers from incorrect ontological role assignments. Our experimental evaluations with the TAP and a collection of 20,000 home pages from university, shopping and sports Web sites, indicate that the model described here can improve the accuracy of role assignments from 40% to 85% for template driven sites, from 68% to 87% for non-template driven sites.

AB - In this paper, we present a fast and scalable Bayesian model for improving weakly annotated data - which is typically generated by a (semi) automated information extraction (IE) system from Web documents. Weakly annotated data suffers from incorrect ontological role assignments. Our experimental evaluations with the TAP and a collection of 20,000 home pages from university, shopping and sports Web sites, indicate that the model described here can improve the accuracy of role assignments from 40% to 85% for template driven sites, from 68% to 87% for non-template driven sites.

KW - Bayesian models

KW - Classification

KW - Information extraction

KW - Weakly annotated data

UR - http://www.scopus.com/inward/record.url?scp=58149270808&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=58149270808&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-72575-6_20

DO - 10.1007/978-3-540-72575-6_20

M3 - Article

AN - SCOPUS:58149270808

VL - 43

SP - 124

EP - 129

JO - Advances in Soft Computing

JF - Advances in Soft Computing

SN - 1867-5662

ER -