Extracting relevant snippets from web documents through language model based text segmentation

Qing Li, Kasim Candan, Yan Qi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Citations (Scopus)

Abstract

Extracting a query-oriented snippet (or passage) and highlighting the relevant information in a long document can help reduce the result navigation cost of end users. While the traditional approach of highlighting matching keywords helps when the search is keyword oriented, finding appropriate snippets to represent matches to more complex queries requires novel techniques that can help characterize the relevance of various parts of a document to the given query, succinctly. In this paper, we present a language model based method for accurately detecting the most relevant passages of a given document. Unlike previous works in passage retrieval which focus on searching relevance nodes for filtering of preoccupied passages, we focus on query-informed segmentation for snippet extraction. The algorithms presented in this paper are currently being deployed in OASIS, a system to help reduce the navigational load of blind users in accessing Web-based digital libraries.
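The abstract does not spell out the model or the segmentation procedure, so the following is only a minimal sketch of the general idea of scoring passages with a language model. It assumes a standard query-likelihood score with Jelinek-Mercer smoothing over fixed-size sentence windows, which is a simplification and not the paper's query-informed segmentation. The names `tokenize`, `query_log_likelihood`, and `best_snippet`, the smoothing weight `lam`, and the window size are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def query_log_likelihood(query_tokens, segment_tokens, doc_counts, doc_len, lam=0.5):
    """Log-probability of the query under a segment language model.

    The segment model is interpolated with the whole-document model
    (Jelinek-Mercer smoothing); lam is an assumed weight, not a value
    taken from the paper.
    """
    seg_counts = Counter(segment_tokens)
    seg_len = len(segment_tokens)
    score = 0.0
    for term in query_tokens:
        p_seg = seg_counts[term] / seg_len if seg_len else 0.0
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p = lam * p_seg + (1 - lam) * p_doc
        if p == 0.0:
            p = 1e-9  # floor for query terms that never occur in the document
        score += math.log(p)
    return score


def best_snippet(document, query, window=2):
    """Return the window of consecutive sentences with the highest query likelihood."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    doc_tokens = tokenize(document)
    doc_counts = Counter(doc_tokens)
    query_tokens = tokenize(query)
    best = None
    for i in range(max(1, len(sentences) - window + 1)):
        segment = " ".join(sentences[i:i + window])
        score = query_log_likelihood(
            query_tokens, tokenize(segment), doc_counts, len(doc_tokens)
        )
        if best is None or score > best[0]:
            best = (score, segment)
    return best[1] if best else ""
```

A fixed window keeps the sketch short. A query-informed approach such as the one the abstract describes would instead choose segment boundaries based on the query, which this example does not attempt.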

Original language: English (US)
Title of host publication: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007
Pages: 287-290
Number of pages: 4
DOI: https://doi.org/10.1109/WI.2007.56
State: Published - 2007
Event: IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007 - Silicon Valley, CA, United States
Duration: Nov 2 2007 to Nov 5 2007

Other

Other: IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007
Country: United States
City: Silicon Valley, CA
Period: 11/2/07 to 11/5/07

Fingerprint

Digital libraries
Navigation
Costs

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications

Cite this

Li, Q., Candan, K., & Qi, Y. (2007). Extracting relevant snippets from web documents through language model based text segmentation. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007 (pp. 287-290). [4427103] https://doi.org/10.1109/WI.2007.56

@inproceedings{4c85ab16b7cf4363938ee14a34756343,
title = "Extracting relevant snippets from web documents through language model based text segmentation",
abstract = "Extracting a query-oriented snippet (or passage) and highlighting the relevant information in a long document can help reduce the result navigation cost of end users. While the traditional approach of highlighting matching keywords helps when the search is keyword oriented, finding appropriate snippets to represent matches to more complex queries requires novel techniques that can help characterize the relevance of various parts of a document to the given query, succinctly. In this paper, we present a language model based method for accurately detecting the most relevant passages of a given document. Unlike previous works in passage retrieval which focus on searching relevance nodes for filtering of preoccupied passages, we focus on query-informed segmentation for snippet extraction. The algorithms presented in this paper are currently being deployed in OASIS, a system to help reduce the navigational load of blind users in accessing Web-based digital libraries.",
author = "Qing Li and Kasim Candan and Yan Qi",
year = "2007",
doi = "10.1109/WI.2007.56",
language = "English (US)",
isbn = "0769530265",
pages = "287--290",
booktitle = "Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007",

}

TY - GEN

T1 - Extracting relevant snippets from web documents through language model based text segmentation

AU - Li, Qing

AU - Candan, Kasim

AU - Qi, Yan

PY - 2007

Y1 - 2007

N2 - Extracting a query-oriented snippet (or passage) and highlighting the relevant information in a long document can help reduce the result navigation cost of end users. While the traditional approach of highlighting matching keywords helps when the search is keyword oriented, finding appropriate snippets to represent matches to more complex queries requires novel techniques that can help characterize the relevance of various parts of a document to the given query, succinctly. In this paper, we present a language model based method for accurately detecting the most relevant passages of a given document. Unlike previous works in passage retrieval which focus on searching relevance nodes for filtering of preoccupied passages, we focus on query-informed segmentation for snippet extraction. The algorithms presented in this paper are currently being deployed in OASIS, a system to help reduce the navigational load of blind users in accessing Web-based digital libraries.

AB - Extracting a query-oriented snippet (or passage) and highlighting the relevant information in a long document can help reduce the result navigation cost of end users. While the traditional approach of highlighting matching keywords helps when the search is keyword oriented, finding appropriate snippets to represent matches to more complex queries requires novel techniques that can help characterize the relevance of various parts of a document to the given query, succinctly. In this paper, we present a language model based method for accurately detecting the most relevant passages of a given document. Unlike previous works in passage retrieval which focus on searching relevance nodes for filtering of preoccupied passages, we focus on query-informed segmentation for snippet extraction. The algorithms presented in this paper are currently being deployed in OASIS, a system to help reduce the navigational load of blind users in accessing Web-based digital libraries.

UR - http://www.scopus.com/inward/record.url?scp=48349103195&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=48349103195&partnerID=8YFLogxK

U2 - 10.1109/WI.2007.56

DO - 10.1109/WI.2007.56

M3 - Conference contribution

AN - SCOPUS:48349103195

SN - 0769530265

SN - 9780769530260

SP - 287

EP - 290

BT - Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007

ER -