Onto miner: Bootstrapping ontologies from overlapping domain specific web sites

Hasan Davulcu, Srinivas Vadrevu, Saravanakumar Nagarajan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

In this paper, we present automated techniques for bootstrapping and populating specialized domain ontologies by organizing and mining a set of relevant overlapping Web sites provided by the user. We develop algorithms that detect and utilize HTML regularities in the Web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. Experimental evaluation for the News, Travel, and Shopping domains indicates that our algorithms can bootstrap and populate domain specific ontologies with high precision and recall.
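The pipeline described in the abstract (HTML regularities turned into hierarchical XML, then mined for concepts shared across sites) can be illustrated with a small sketch. This is not the authors' algorithm: it assumes, purely for illustration, that each site's hierarchy is already expressed as nested `<ul>`/`<li>` lists, and it treats a label as a candidate concept when it appears on at least `min_sites` sites. The names `ListToXml`, `html_to_tree`, `shared_concepts`, and `taxonomy_edges` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ListToXml(HTMLParser):
    """Turn nested <ul>/<li> markup into a tree of <node label="..."> elements."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("site")
        self.stack = [self.root]  # chain of currently open nodes

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # Each list item becomes a child of the innermost open item.
            node = ET.SubElement(self.stack[-1], "node", label="")
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag == "li" and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # The first non-empty text inside an item is taken as its label.
        if text and len(self.stack) > 1 and not self.stack[-1].get("label"):
            self.stack[-1].set("label", text)


def html_to_tree(html):
    """Parse one site's HTML fragment into an XML element tree."""
    parser = ListToXml()
    parser.feed(html)
    parser.close()
    return parser.root


def shared_concepts(trees, min_sites=2):
    """Labels (lowercased) that occur on at least min_sites sites."""
    counts = Counter()
    for tree in trees:
        # A set, so each site contributes at most one vote per label.
        counts.update({n.get("label").lower() for n in tree.iter("node")})
    return sorted(label for label, c in counts.items() if c >= min_sites)


def taxonomy_edges(trees, concepts):
    """Parent/child pairs in which both labels are shared concepts."""
    edges = set()
    for tree in trees:
        for parent in tree.iter("node"):
            for child in parent.findall("node"):
                p, c = parent.get("label").lower(), child.get("label").lower()
                if p in concepts and c in concepts:
                    edges.add((p, c))
    return sorted(edges)


# Two made-up News-style sites with overlapping category lists.
site_a = "<ul><li>Sports<ul><li>Tennis</li><li>Golf</li></ul></li><li>Business</li></ul>"
site_b = "<ul><li>Sports<ul><li>Tennis</li></ul></li><li>Weather</li></ul>"
trees = [html_to_tree(site_a), html_to_tree(site_b)]
concepts = shared_concepts(trees)
edges = taxonomy_edges(trees, concepts)
print(concepts, edges)
```

Counting sites rather than raw occurrences is a deliberate choice in this sketch: a label repeated many times on one site gets no more weight than a label used once, so only overlap between sites promotes it to a candidate concept.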

Original language: English (US)
Title of host publication: Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004
Publisher: Association for Computing Machinery, Inc
Pages: 500-501
Number of pages: 2
ISBN (Electronic): 1581139128, 9781581139129
DOI: https://doi.org/10.1145/1013367.1013545
State: Published - May 19 2004
Event: 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004 - New York, United States
Duration: May 19 2004 to May 21 2004

Keywords

  • Data mining
  • Ontology
  • Semantic web
  • Web mining

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software

Cite this

Davulcu, H., Vadrevu, S., & Nagarajan, S. (2004). Onto miner: Bootstrapping ontologies from overlapping domain specific web sites. In Proceedings of the 13th International World Wide Web Conference on Alternate Track, Papers and Posters, WWW Alt. 2004 (pp. 500-501). Association for Computing Machinery, Inc. https://doi.org/10.1145/1013367.1013545
