Reasoning for Web document associations and its applications in site map construction

K. Selçuk Candan, Wen Syan Li

Research output: Contribution to journal › Article › peer-review

17 Scopus citations

Abstract

Recently, there has been interest in using associations between Web pages to provide users with pages relevant to what they are currently viewing. We believe that, to enable intelligent decisions, we need to answer the question "for a given set of pages, why are they associated?". We present a framework for reasoning about Web document associations. We start from the observation that the reasons for Web page associations are implicitly embedded in the content of the pages as well as in the links connecting them. The association reasoning scheme we propose is based on a random walk algorithm. This algorithm takes both link structure and content into consideration and allows users to specify a focus. We then show how the proposed algorithm, combined with a logical domain identification technique, can be used for Web site summarization and Web site map construction to help users navigate complex corporate sites. We see that, to achieve this goal, it is essential to recover the Web authors' intentions and superimpose them with the users' retrieval contexts when summarizing Web sites. Therefore, we present a framework that uses logical neighborhoods, entry pages, and associations of entry pages to create context-sensitive summaries and maps of complex Web sites.
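To make the idea of a focused random walk over both links and content more concrete, the following is a minimal sketch and not the authors' algorithm, whose details the abstract does not give. It assumes a generic random walk with restart computed by power iteration. The function name `focused_random_walk`, the `relevance` scores (standing in for content match against a user-specified focus), the `restart` probability, and the toy graph are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def focused_random_walk(
    links: Dict[str, List[str]],
    relevance: Dict[str, float],
    seeds: List[str],
    restart: float = 0.15,
    iterations: int = 100,
) -> Dict[str, float]:
    """Random walk with restart to the seed pages (illustrative only).

    Each step follows an outgoing link with probability proportional to
    the relevance of the target page (the assumed content signal), or
    jumps back to a seed page with probability `restart`.
    """
    seed_set = set(seeds)
    pages = sorted(set(links) | {t for ts in links.values() for t in ts} | seed_set)
    # Restart distribution: uniform over the seed pages.
    restart_dist = {p: (1.0 / len(seed_set) if p in seed_set else 0.0) for p in pages}
    score = dict(restart_dist)

    for _ in range(iterations):
        nxt = {p: restart * restart_dist[p] for p in pages}
        for p in pages:
            targets = links.get(p, [])
            weights = [relevance.get(t, 0.0) for t in targets]
            total = sum(weights)
            if total == 0.0:
                # No outgoing link to a relevant page: send the mass back to the seeds.
                for q in pages:
                    nxt[q] += (1.0 - restart) * score[p] * restart_dist[q]
                continue
            for t, w in zip(targets, weights):
                nxt[t] += (1.0 - restart) * score[p] * w / total
        score = nxt
    return score


# Toy site: pages A and B are the ones whose association we want explained.
# Both link to C and D; C matches the assumed focus much better than D.
links = {"A": ["C", "D"], "B": ["C", "D"], "C": ["A", "B"], "D": ["A", "B"]}
relevance = {"A": 1.0, "B": 1.0, "C": 0.9, "D": 0.1}
scores = focused_random_walk(links, relevance, seeds=["A", "B"])

for page, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {value:.3f}")
```

In this sketch, pages that lie on well-connected, focus-relevant paths between the seed pages accumulate higher scores, which is one plausible way to rank candidate "reasons" for an association. Treating links as directed and weighting by target relevance are simplifying assumptions; the published method may differ on both points.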

Original language: English (US)
Pages (from-to): 121-150
Number of pages: 30
Journal: Data and Knowledge Engineering
Volume: 43
Issue number: 2
State: Published - Nov 2002
Externally published: Yes

Keywords

  • Connectivity
  • Link analysis
  • Random walk
  • Reasoning about associations
  • Topic distillation
  • WWW

ASJC Scopus subject areas

  • Information Systems and Management
