Semantic partitioning of web pages

Srinivas Vadrevu, Fatih Gelgi, Hasan Davulcu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

9 Scopus citations

Abstract

In this paper we describe the semantic partitioner algorithm, which uses the structural and presentation regularities of Web pages to automatically transform them into hierarchical content structures. These content structures enable us to automatically annotate labels in the Web pages with their semantic roles, thus yielding meta-data and instance information for the Web pages. Experimental results with the TAP knowledge base and computer science department Web sites, comprising 16,861 Web pages, indicate that our algorithm is able to gather meta-data accurately from various types of Web pages. The algorithm is able to achieve this performance without any domain-specific engineering requirement.

Original language: English (US)
Title of host publication: Web Information Systems Engineering, WISE 2005 - 6th International Conference on Web Information Systems Engineering, Proceedings
Pages: 107-118
Number of pages: 12
State: Published - 2005
Event: 6th International Conference on Web Information Systems Engineering, WISE 2005 - New York, NY, United States
Duration: Nov 20 2005 → Nov 22 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3806 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th International Conference on Web Information Systems Engineering, WISE 2005
Country/Territory: United States
City: New York, NY
Period: 11/20/05 → 11/22/05

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
