Abstract
Since WWW encourages hypertext and hypermedia document authoring (e.g., HTML or XML), Web authors tend to create documents that are composed of multiple pages connected with hyperlinks or frames. A Web document may be authored in multiple ways, such as (1) all information in one physical page, or (2) a main page and the related information in separate linked pages. Existing Web search engines, however, return only physical pages. In this paper, we introduce and describe the use of the concept of information unit, which can be viewed as a logical Web document consisting of multiple physical pages as one atomic retrieval unit. We present an algorithm to efficiently retrieve information units. Our algorithm can perform progressive query processing over a Web index by considering both document semantic similarity and link structures. Experimental results on synthetic graphs and real Web data show the effectiveness and usefulness of the proposed information unit retrieval technique.
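To make the notion of an information unit concrete, the following is a minimal illustrative sketch, not the algorithm presented in the paper. It assumes a two-keyword query, an unweighted link graph treated as undirected, and a hypothetical `max_hops` bound; the names `information_units`, `links`, and `page_terms` are invented for this example. A single page containing both keywords counts as a unit of cost 0, and a chain of linked pages that jointly cover the keywords counts as a unit whose cost is its number of links.

```python
from collections import deque
from itertools import product


def shortest_path(links, start, goal):
    """Breadth-first search; returns the list of pages from start to goal, or None."""
    if start == goal:
        return [start]
    parent = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt not in parent:
                parent[nxt] = page
                if nxt == goal:
                    # Walk the parent pointers back to the start page.
                    path = [nxt]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(nxt)
    return None


def information_units(links, page_terms, kw1, kw2, max_hops=2):
    """Return (page set, link count) pairs that jointly contain kw1 and kw2.

    Assumption for this sketch: links are followed in both directions and
    every link has the same cost.
    """
    undirected = {}
    for src, dsts in links.items():
        undirected.setdefault(src, set())
        for dst in dsts:
            undirected[src].add(dst)
            undirected.setdefault(dst, set()).add(src)

    has_kw1 = [p for p, terms in page_terms.items() if kw1 in terms]
    has_kw2 = [p for p, terms in page_terms.items() if kw2 in terms]

    units = {}
    for a, b in product(has_kw1, has_kw2):
        path = shortest_path(undirected, a, b)
        if path is not None and len(path) - 1 <= max_hops:
            key, cost = frozenset(path), len(path) - 1
            if key not in units or cost < units[key]:
                units[key] = cost
    # Cheapest units first; ties broken by page names for a stable order.
    return sorted(units.items(), key=lambda item: (item[1], sorted(item[0])))
```

With a hypothetical site where `home` mentions "alice" and links to `pubs`, which mentions "publications", the pair `{home, pubs}` is returned as one unit even though neither physical page matches the full query on its own.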
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 10th International Conference on World Wide Web, WWW 2001 |
Publisher | Association for Computing Machinery, Inc |
Pages | 230-244 |
Number of pages | 15 |
ISBN (Print) | 1581133480, 9781581133486 |
State | Published - Apr 1 2001 |
Externally published | Yes |
Event | 10th International Conference on World Wide Web, WWW 2001 - Hong Kong, Hong Kong; Duration: May 1 2001 → May 5 2001 |
Other
Other | 10th International Conference on World Wide Web, WWW 2001 |
---|---|
Country/Territory | Hong Kong |
City | Hong Kong |
Period | 5/1/01 → 5/5/01 |
Keywords
- Link structures
- Progressive processing
- Query relaxation
- Web proximity search
ASJC Scopus subject areas
- Computer Networks and Communications