OntoMiner: Bootstrapping ontologies from overlapping domain specific web sites

Hasan Davulcu, Srinivas Vadrevu, Saravanakumar Nagarajan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Citations (Scopus)

Abstract

In this paper, we present automated techniques for bootstrapping and populating specialized domain ontologies by organizing and mining a set of relevant overlapping Web sites provided by the user. We develop algorithms that detect and utilize HTML regularities in the Web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. Experimental evaluation for the News, Travel, and Shopping domains indicates that our algorithms can bootstrap and populate domain-specific ontologies with high precision and recall.
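The abstract does not spell out the tree-mining step, so the following is only a minimal sketch of one way the idea of "overlapping" sites could be exploited, not the authors' algorithm. It assumes each site has already been reduced to a nested dictionary of labels (a hypothetical stand-in for the XML hierarchy), and it keeps parent-child label pairs that recur in at least `min_support` sites. The function names, the input shape, and the threshold are all illustrative assumptions.

```python
from collections import Counter


def edges(tree, parent=None):
    """Yield (parent, child) label pairs from a nested dict of labels."""
    for label, children in tree.items():
        key = label.strip().lower()  # normalize labels before comparing sites
        if parent is not None:
            yield (parent, key)
        yield from edges(children, key)


def frequent_edges(site_trees, min_support=2):
    """Return parent-child pairs that occur in at least min_support sites."""
    counts = Counter()
    for tree in site_trees:
        counts.update(set(edges(tree)))  # count each pair once per site
    return sorted(pair for pair, n in counts.items() if n >= min_support)


# Hypothetical label hierarchies for three overlapping news sites.
sites = [
    {"News": {"Sports": {"Tennis": {}}, "Business": {}}},
    {"News": {"Sports": {"Golf": {}}, "Business": {}, "Weather": {}}},
    {"News": {"Business": {"Markets": {}}}},
]
result = frequent_edges(sites)
```

Pairs seen on only one site (for example a single site's "Weather" section) fall below the threshold, which is the intuition behind mining several overlapping sites rather than one.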

Original language: English (US)
Title of host publication: Thirteenth International World Wide Web Conference Proceedings, WWW2004
Pages: 1232-1233
Number of pages: 2
State: Published - 2004
Event: Thirteenth International World Wide Web Conference Proceedings, WWW2004 - New York, NY, United States
Duration: May 17 2004 to May 22 2004

Other

Other: Thirteenth International World Wide Web Conference Proceedings, WWW2004
Country: United States
City: New York, NY
Period: 5/17/04 to 5/22/04

Keywords

  • Data Mining
  • Ontology
  • Semantic Web
  • Web Mining

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Davulcu, H., Vadrevu, S., & Nagarajan, S. (2004). OntoMiner: Bootstrapping ontologies from overlapping domain specific web sites. In Thirteenth International World Wide Web Conference Proceedings, WWW2004 (pp. 1232-1233).

@inproceedings{d1d2722afff54ee892efc56c1366643c,
title = "OntoMiner: Bootstrapping ontologies from overlapping domain specific web sites",
abstract = "In this paper, we present automated techniques for bootstrapping and populating specialized domain ontologies by organizing and mining a set of relevant overlapping Web sites provided by the user. We develop algorithms that detect and utilize HTML regularities in the Web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. Experimental evaluation for the News, Travel, and Shopping domains indicates that our algorithms can bootstrap and populate domain-specific ontologies with high precision and recall.",
keywords = "Data Mining, Ontology, Semantic Web, Web Mining",
author = "Hasan Davulcu and Srinivas Vadrevu and Saravanakumar Nagarajan",
year = "2004",
language = "English (US)",
isbn = "158113844X",
pages = "1232--1233",
booktitle = "Thirteenth International World Wide Web Conference Proceedings, WWW2004",

}

TY  - GEN
T1  - OntoMiner
T2  - Bootstrapping ontologies from overlapping domain specific web sites
AU  - Davulcu, Hasan
AU  - Vadrevu, Srinivas
AU  - Nagarajan, Saravanakumar
PY  - 2004
Y1  - 2004
AB  - In this paper, we present automated techniques for bootstrapping and populating specialized domain ontologies by organizing and mining a set of relevant overlapping Web sites provided by the user. We develop algorithms that detect and utilize HTML regularities in the Web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. Experimental evaluation for the News, Travel, and Shopping domains indicates that our algorithms can bootstrap and populate domain-specific ontologies with high precision and recall.
KW  - Data Mining
KW  - Ontology
KW  - Semantic Web
KW  - Web Mining
UR  - http://www.scopus.com/inward/record.url?scp=19944376511&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=19944376511&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:19944376511
SN  - 158113844X
SP  - 1232
EP  - 1233
BT  - Thirteenth International World Wide Web Conference Proceedings, WWW2004
ER  -