Extracting relevant snippets from web documents through language model based text segmentation

Qing Li, Kasim Candan, Yan Qi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

Extracting a query-oriented snippet (or passage) and highlighting the relevant information in a long document can help reduce the result navigation cost of end users. While the traditional approach of highlighting matching keywords helps when the search is keyword oriented, finding appropriate snippets to represent matches to more complex queries requires novel techniques that can help succinctly characterize the relevance of various parts of a document to the given query. In this paper, we present a language-model-based method for accurately detecting the most relevant passages of a given document. Unlike previous work in passage retrieval, which focuses on searching relevance nodes for filtering of preoccupied passages, we focus on query-informed segmentation for snippet extraction. The algorithms presented in this paper are currently being deployed in OASIS, a system to help reduce the navigational load of blind users in accessing Web-based digital libraries.

Original language: English (US)
Title of host publication: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007
Pages: 287-290
Number of pages: 4
DOIs: https://doi.org/10.1109/WI.2007.56
State: Published - Dec 1 2007
Event: IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007 - Silicon Valley, CA, United States
Duration: Nov 2 2007 → Nov 5 2007

Publication series

Name: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007

Other

Other: IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007
Country: United States
City: Silicon Valley, CA
Period: 11/2/07 → 11/5/07

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications

Cite this

Li, Q., Candan, K., & Qi, Y. (2007). Extracting relevant snippets from web documents through language model based text segmentation. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007 (pp. 287-290). [4427103] (Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007). https://doi.org/10.1109/WI.2007.56