OntoMiner: Bootstrapping ontologies from overlapping domain specific web sites

Hasan Davulcu; Srinivas Vadrevu; Saravanakumar Nagarajan

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

In this paper, we present automated techniques for bootstrapping and populating specialized domain ontologies by organizing and mining a set of relevant overlapping Web sites provided by the user. We develop algorithms that detect and utilize HTML regularities in the Web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. Experimental evaluation for the News, Travel, and Shopping domains indicates that our algorithms can bootstrap and populate domain-specific ontologies with high precision and recall.
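The tree-mining step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes each site has already been converted to a hierarchical XML tree, and it uses a hypothetical `node` element with a `label` attribute, invented sample data, and a simple support threshold to keep parent-child relationships that recur across sites.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical input: three overlapping sites, each already converted to XML.
# The "node" tag and "label" attribute are assumptions made for this sketch.
SITES = [
    '<node label="News"><node label="Sports"><node label="Tennis"/></node>'
    '<node label="Business"/></node>',
    '<node label="News"><node label="Sports"><node label="Golf"/></node>'
    '<node label="Business"/><node label="Weather"/></node>',
    '<node label="News"><node label="Sports"/><node label="Opinion"/></node>',
]


def parent_child_pairs(xml_text):
    """Return the set of (parent label, child label) edges in one site's tree."""
    root = ET.fromstring(xml_text)
    pairs = set()
    for parent in root.iter("node"):          # the root and every descendant
        for child in parent.findall("node"):  # direct children only
            pairs.add((parent.get("label"), child.get("label")))
    return pairs


def frequent_edges(sites, min_support):
    """Keep edges that occur in at least min_support of the site trees."""
    counts = Counter()
    for xml_text in sites:
        # Each site contributes an edge at most once, so counts are site counts.
        counts.update(parent_child_pairs(xml_text))
    return {edge: n for edge, n in counts.items() if n >= min_support}


edges = frequent_edges(SITES, min_support=2)
for (parent, child), support in sorted(edges.items()):
    print(f"{parent} -> {child} (seen in {support} sites)")
```

Counting each edge once per site means a relationship is kept only when several independent sites agree on it, which is one simple way to separate shared domain concepts from site-specific ones.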

Original language	English (US)
Title of host publication	Thirteenth International World Wide Web Conference Proceedings, WWW2004
Pages	1232-1233
Number of pages	2
State	Published - 2004
Event	Thirteenth International World Wide Web Conference Proceedings, WWW2004 - New York, NY, United States
Duration	May 17 2004 → May 22 2004

Publication series

Name	Thirteenth International World Wide Web Conference Proceedings, WWW2004

Other

Other	Thirteenth International World Wide Web Conference Proceedings, WWW2004
Country/Territory	United States
City	New York, NY
Period	5/17/04 → 5/22/04

Keywords

Data Mining
Ontology
Semantic Web
Web Mining

ASJC Scopus subject areas

General Engineering

Cite this

OntoMiner: Bootstrapping ontologies from overlapping domain specific web sites. / Davulcu, Hasan; Vadrevu, Srinivas; Nagarajan, Saravanakumar.
Thirteenth International World Wide Web Conference Proceedings, WWW2004. 2004. p. 1232-1233 (Thirteenth International World Wide Web Conference Proceedings, WWW2004).

Davulcu, H, Vadrevu, S & Nagarajan, S 2004, OntoMiner: Bootstrapping ontologies from overlapping domain specific web sites. in Thirteenth International World Wide Web Conference Proceedings, WWW2004. Thirteenth International World Wide Web Conference Proceedings, WWW2004, pp. 1232-1233, Thirteenth International World Wide Web Conference Proceedings, WWW2004, New York, NY, United States, 5/17/04.

@inproceedings{d1d2722afff54ee892efc56c1366643c,
  title = "OntoMiner: Bootstrapping ontologies from overlapping domain specific web sites",
  abstract = "In this paper, we present automated techniques for bootstrapping and populating specialized domain ontologies by organizing and mining a set of relevant overlapping Web sites provided by the user. We develop algorithms that detect and utilize HTML regularities in the Web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. Experimental evaluation for the News, Travel, and Shopping domains indicates that our algorithms can bootstrap and populate domain-specific ontologies with high precision and recall.",
  keywords = "Data Mining, Ontology, Semantic Web, Web Mining",
  author = "Hasan Davulcu and Srinivas Vadrevu and Saravanakumar Nagarajan",
  year = "2004",
  language = "English (US)",
  isbn = "158113844X",
  series = "Thirteenth International World Wide Web Conference Proceedings, WWW2004",
  pages = "1232--1233",
  booktitle = "Thirteenth International World Wide Web Conference Proceedings, WWW2004",
  note = "Thirteenth International World Wide Web Conference Proceedings, WWW2004 ; Conference date: 17-05-2004 Through 22-05-2004",
}

TY - GEN
T1 - OntoMiner
T2 - Thirteenth International World Wide Web Conference Proceedings, WWW2004
AU - Davulcu, Hasan
AU - Vadrevu, Srinivas
AU - Nagarajan, Saravanakumar
PY - 2004
Y1 - 2004
N2 - In this paper, we present automated techniques for bootstrapping and populating specialized domain ontologies by organizing and mining a set of relevant overlapping Web sites provided by the user. We develop algorithms that detect and utilize HTML regularities in the Web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. Experimental evaluation for the News, Travel, and Shopping domains indicates that our algorithms can bootstrap and populate domain-specific ontologies with high precision and recall.
AB - In this paper, we present automated techniques for bootstrapping and populating specialized domain ontologies by organizing and mining a set of relevant overlapping Web sites provided by the user. We develop algorithms that detect and utilize HTML regularities in the Web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. Experimental evaluation for the News, Travel, and Shopping domains indicates that our algorithms can bootstrap and populate domain-specific ontologies with high precision and recall.
KW - Data Mining
KW - Ontology
KW - Semantic Web
KW - Web Mining
UR - http://www.scopus.com/inward/record.url?scp=19944376511&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=19944376511&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:19944376511
SN - 158113844X
T3 - Thirteenth International World Wide Web Conference Proceedings, WWW2004
SP - 1232
EP - 1233
BT - Thirteenth International World Wide Web Conference Proceedings, WWW2004
Y2 - 17 May 2004 through 22 May 2004
ER -
