Query relaxation by structure and semantics for retrieval of logical Web documents

Wen Syan Li, Kasim Candan, Quoc Vu, Divyakant Agrawal

Research output: Contribution to journal > Article > peer-review

23 Scopus citations

Abstract

Since the Web encourages hypertext and hypermedia document authoring (e.g., HTML or XML), Web authors tend to create documents that are composed of multiple pages connected with hyperlinks. A Web document may be authored in multiple ways, such as 1) all information in one physical page, or 2) a main page with the related information in separate linked pages. Existing Web search engines, however, return only physical pages containing keywords. In this paper, we introduce the concept of an information unit, which can be viewed as a logical Web document that consists of multiple physical pages and is treated as one atomic retrieval unit. We present an algorithm to efficiently retrieve information units. Our algorithm can also perform progressive query processing. These functionalities are essential for information retrieval on the Web and in large XML databases. We also present experimental results on synthetic graphs and real Web data.
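
To make the idea of an information unit concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes an undirected link graph stored as an adjacency dictionary and a hypothetical page-to-keywords mapping. It uses a simple star-shaped heuristic: for each candidate root page, connect the root to the nearest page containing each query keyword, and keep the smallest resulting set of pages. All names (`find_information_unit`, `graph`, `page_keywords`) are illustrative assumptions.

```python
from collections import deque


def bfs_parents(graph, root):
    """Breadth-first search; returns {page: parent} for pages reachable from root."""
    parents = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for neighbor in graph.get(page, ()):
            if neighbor not in parents:
                parents[neighbor] = page
                queue.append(neighbor)
    return parents


def path_to_root(parents, page):
    """Pages on the BFS-tree path from page back to the root, inclusive."""
    path = []
    while page is not None:
        path.append(page)
        page = parents[page]
    return path


def find_information_unit(graph, page_keywords, query):
    """Heuristic sketch: smallest star-shaped set of linked pages that
    together contain every keyword in query. Returns None if no such set
    is found. This is an illustration, not an optimal or official method."""
    best = None
    for root in sorted(graph):
        parents = bfs_parents(graph, root)
        unit = set()
        covered = True
        for keyword in query:
            candidates = [p for p in parents
                          if keyword in page_keywords.get(p, set())]
            if not candidates:
                covered = False
                break
            # Nearest page (fewest link hops from root) containing the keyword.
            nearest = min(candidates,
                          key=lambda p: (len(path_to_root(parents, p)), p))
            unit.update(path_to_root(parents, nearest))
        if covered and (best is None or len(unit) < len(best)):
            best = unit
    return best


# Hypothetical site: a main page linking to two sub-pages, plus an isolated page.
graph = {
    "index": ["people", "projects"],
    "people": ["index"],
    "projects": ["index"],
    "news": [],
}
page_keywords = {
    "index": {"lab"},
    "people": {"candan"},
    "projects": {"retrieval"},
}
unit = find_information_unit(graph, page_keywords, ["candan", "retrieval"])
```

In this toy graph no single physical page contains both keywords, so a page-level search would return nothing, while the sketch returns the three linked pages as one logical document. Finding a truly minimal connected set for many keywords is a harder graph problem, which is why this sketch settles for a heuristic.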

Original language: English (US)
Pages (from-to): 768-791
Number of pages: 24
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 14
Issue number: 4
State: Published - Jul 2002

Keywords

  • Link structures
  • Progressive processing
  • Query relaxation
  • Web proximity search

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
