OntoMiner: automated metadata and instance mining from news websites

Hasan Davulcu; Srinivas Vadrevu; Saravanakumar Nagarajan

doi:10.1504/IJWGS.2005.008320

Computer Science and Engineering

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

RDF/XML has been widely recognised as the standard for annotating online web documents and for transforming the HTML web into the so-called Semantic Web. In order to enable widespread usability of the Semantic Web, there is a need to bootstrap large, rich and up-to-date domain ontologies that organise the most relevant concepts, their relationships and instances. In this paper, we present automated techniques for bootstrapping and populating specialised domain ontologies by organising and mining a set of relevant overlapping websites. We develop algorithms that detect and utilise HTML regularities in the web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. We also report experimental evaluation for the news, travel and shopping domains to demonstrate the efficacy of our algorithms.

Original language	English (US)
Pages (from-to)	196-221
Number of pages	26
Journal	International Journal of Web and Grid Services
Volume	1
Issue number	2
DOIs	https://doi.org/10.1504/IJWGS.2005.008320
State	Published - 2005

Keywords

automation
instance ontology
metadata
mining
news
semantic
web

ASJC Scopus subject areas

Software
Computer Networks and Communications

Access to Document

10.1504/IJWGS.2005.008320

Cite this

@article{96037fefead94240a84738990c6bb381,
  title = "OntoMiner: automated metadata and instance mining from news websites",
  abstract = "RDF/XML has been widely recognised as the standard for annotating online web documents and for transforming the HTML web into the so-called Semantic Web. In order to enable widespread usability of the Semantic Web, there is a need to bootstrap large, rich and up-to-date domain ontologies that organise the most relevant concepts, their relationships and instances. In this paper, we present automated techniques for bootstrapping and populating specialised domain ontologies by organising and mining a set of relevant overlapping websites. We develop algorithms that detect and utilise HTML regularities in the web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. We also report experimental evaluation for the news, travel and shopping domains to demonstrate the efficacy of our algorithms.",
  keywords = "automation, instance ontology, metadata, mining, news, semantic, web",
  author = "Hasan Davulcu and Srinivas Vadrevu and Saravanakumar Nagarajan",
  year = "2005",
  doi = "10.1504/IJWGS.2005.008320",
  language = "English (US)",
  volume = "1",
  pages = "196--221",
  journal = "International Journal of Web and Grid Services",
  issn = "1741-1106",
  publisher = "Inderscience Enterprises Ltd",
  number = "2",
}

TY  - JOUR
T1  - OntoMiner
T2  - automated metadata and instance mining from news websites
AU  - Davulcu, Hasan
AU  - Vadrevu, Srinivas
AU  - Nagarajan, Saravanakumar
PY  - 2005
Y1  - 2005
N2  - RDF/XML has been widely recognised as the standard for annotating online web documents and for transforming the HTML web into the so-called Semantic Web. In order to enable widespread usability of the Semantic Web, there is a need to bootstrap large, rich and up-to-date domain ontologies that organise the most relevant concepts, their relationships and instances. In this paper, we present automated techniques for bootstrapping and populating specialised domain ontologies by organising and mining a set of relevant overlapping websites. We develop algorithms that detect and utilise HTML regularities in the web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. We also report experimental evaluation for the news, travel and shopping domains to demonstrate the efficacy of our algorithms.
AB  - RDF/XML has been widely recognised as the standard for annotating online web documents and for transforming the HTML web into the so-called Semantic Web. In order to enable widespread usability of the Semantic Web, there is a need to bootstrap large, rich and up-to-date domain ontologies that organise the most relevant concepts, their relationships and instances. In this paper, we present automated techniques for bootstrapping and populating specialised domain ontologies by organising and mining a set of relevant overlapping websites. We develop algorithms that detect and utilise HTML regularities in the web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. We also report experimental evaluation for the news, travel and shopping domains to demonstrate the efficacy of our algorithms.
KW  - automation
KW  - instance ontology
KW  - metadata
KW  - mining
KW  - news
KW  - semantic
KW  - web
UR  - http://www.scopus.com/inward/record.url?scp=33745281538&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33745281538&partnerID=8YFLogxK
U2  - 10.1504/IJWGS.2005.008320
DO  - 10.1504/IJWGS.2005.008320
M3  - Article
AN  - SCOPUS:33745281538
SN  - 1741-1106
VL  - 1
SP  - 196
EP  - 221
JO  - International Journal of Web and Grid Services
JF  - International Journal of Web and Grid Services
IS  - 2
ER  -
