Extracting relevant snippets from web documents through language model based text segmentation

Qing Li; Kasim Candan; Yan Qi

doi:10.1109/WI.2007.56

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

Extracting a query-oriented snippet (or passage) and highlighting the relevant information in a long document can help reduce the result navigation cost for end users. While the traditional approach of highlighting matching keywords helps when the search is keyword oriented, finding appropriate snippets to represent matches to more complex queries requires novel techniques that can succinctly characterize the relevance of various parts of a document to the given query. In this paper, we present a language model based method for accurately detecting the most relevant passages of a given document. Unlike previous works in passage retrieval, which focus on searching relevance nodes for filtering of preoccupied passages, we focus on query-informed segmentation for snippet extraction. The algorithms presented in this paper are currently being deployed in OASIS, a system to help reduce the navigational load of blind users in accessing Web-based digital libraries.
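
The abstract does not give the details of the authors' method, so the following is only a rough illustration of the general idea of scoring passages against a query with a language model. It is a minimal sketch, not the paper's algorithm: it assumes a unigram model with Jelinek-Mercer smoothing and fixed-size sentence windows, whereas the paper proposes query-informed segmentation. The function names (`tokenize`, `query_log_likelihood`, `best_snippet`) and the parameters `lam` and `window_size` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [t for t in tokens if t]


def query_log_likelihood(query_tokens, window_tokens, doc_counts, doc_len, lam=0.5):
    """log P(query | window) for a unigram model smoothed with the whole document.

    Assumption: Jelinek-Mercer smoothing with weight `lam` on the window model.
    """
    window_counts = Counter(window_tokens)
    window_len = len(window_tokens)
    score = 0.0
    for term in query_tokens:
        p_window = window_counts[term] / window_len
        p_doc = doc_counts[term] / doc_len
        p = lam * p_window + (1.0 - lam) * p_doc
        # Floor the probability so terms absent from the document avoid log(0).
        score += math.log(max(p, 1e-12))
    return score


def best_snippet(document, query, window_size=3):
    """Return (text, score) of the best run of `window_size` consecutive sentences.

    Assumption: sentences are split naively on '.', and windows have a fixed size.
    """
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    doc_tokens = tokenize(document)
    query_tokens = tokenize(query)
    if not doc_tokens or not query_tokens:
        return "", -math.inf
    doc_counts = Counter(doc_tokens)
    best_text, best_score = "", -math.inf
    for start in range(max(1, len(sentences) - window_size + 1)):
        window = sentences[start:start + window_size]
        window_tokens = tokenize(" ".join(window))
        if not window_tokens:
            continue
        score = query_log_likelihood(query_tokens, window_tokens, doc_counts, len(doc_tokens))
        if score > best_score:
            best_text, best_score = ". ".join(window) + ".", score
    return best_text, best_score
```

Calling `best_snippet(document, query, window_size=1)` returns the single sentence under which the query is most likely, together with its log-likelihood. A query-informed segmenter would instead let the window boundaries depend on the query rather than fixing their size in advance.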

Original language	English (US)
Title of host publication	Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007
Pages	287-290
Number of pages	4
DOIs	https://doi.org/10.1109/WI.2007.56
State	Published - 2007
Event	IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007 - Silicon Valley, CA, United States
Duration	Nov 2 2007 → Nov 5 2007

Publication series

Name	Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007

Other

Other	IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007
Country/Territory	United States
City	Silicon Valley, CA
Period	11/2/07 → 11/5/07

ASJC Scopus subject areas

Artificial Intelligence
Computer Networks and Communications

Access to Document

10.1109/WI.2007.56

Cite this

Li, Q., Candan, K., & Qi, Y. (2007). Extracting relevant snippets from web documents through language model based text segmentation. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007 (pp. 287-290). Article 4427103 (Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007). https://doi.org/10.1109/WI.2007.56

Extracting relevant snippets from web documents through language model based text segmentation. / Li, Qing; Candan, Kasim; Qi, Yan.
Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007. 2007. p. 287-290 4427103 (Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007).

Li, Q, Candan, K & Qi, Y 2007, Extracting relevant snippets from web documents through language model based text segmentation. in Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007., 4427103, Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007, pp. 287-290, IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007, Silicon Valley, CA, United States, 11/2/07. https://doi.org/10.1109/WI.2007.56

@inproceedings{4c85ab16b7cf4363938ee14a34756343,

title = "Extracting relevant snippets from web documents through language model based text segmentation",

abstract = "Extracting a query-oriented snippet (or passage) and highlighting the relevant information in a long document can help reduce the result navigation cost for end users. While the traditional approach of highlighting matching keywords helps when the search is keyword oriented, finding appropriate snippets to represent matches to more complex queries requires novel techniques that can succinctly characterize the relevance of various parts of a document to the given query. In this paper, we present a language model based method for accurately detecting the most relevant passages of a given document. Unlike previous works in passage retrieval, which focus on searching relevance nodes for filtering of preoccupied passages, we focus on query-informed segmentation for snippet extraction. The algorithms presented in this paper are currently being deployed in OASIS, a system to help reduce the navigational load of blind users in accessing Web-based digital libraries.",

author = "Qing Li and Kasim Candan and Yan Qi",

year = "2007",

doi = "10.1109/WI.2007.56",

language = "English (US)",

isbn = "0769530265",

series = "Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007",

pages = "287--290",

booktitle = "Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007",

note = "IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007 ; Conference date: 02-11-2007 Through 05-11-2007",

}

TY - GEN

T1 - Extracting relevant snippets from web documents through language model based text segmentation

AU - Li, Qing

AU - Candan, Kasim

AU - Qi, Yan

PY - 2007

Y1 - 2007

N2 - Extracting a query-oriented snippet (or passage) and highlighting the relevant information in a long document can help reduce the result navigation cost for end users. While the traditional approach of highlighting matching keywords helps when the search is keyword oriented, finding appropriate snippets to represent matches to more complex queries requires novel techniques that can succinctly characterize the relevance of various parts of a document to the given query. In this paper, we present a language model based method for accurately detecting the most relevant passages of a given document. Unlike previous works in passage retrieval, which focus on searching relevance nodes for filtering of preoccupied passages, we focus on query-informed segmentation for snippet extraction. The algorithms presented in this paper are currently being deployed in OASIS, a system to help reduce the navigational load of blind users in accessing Web-based digital libraries.

AB - Extracting a query-oriented snippet (or passage) and highlighting the relevant information in a long document can help reduce the result navigation cost for end users. While the traditional approach of highlighting matching keywords helps when the search is keyword oriented, finding appropriate snippets to represent matches to more complex queries requires novel techniques that can succinctly characterize the relevance of various parts of a document to the given query. In this paper, we present a language model based method for accurately detecting the most relevant passages of a given document. Unlike previous works in passage retrieval, which focus on searching relevance nodes for filtering of preoccupied passages, we focus on query-informed segmentation for snippet extraction. The algorithms presented in this paper are currently being deployed in OASIS, a system to help reduce the navigational load of blind users in accessing Web-based digital libraries.

UR - http://www.scopus.com/inward/record.url?scp=48349103195&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=48349103195&partnerID=8YFLogxK

U2 - 10.1109/WI.2007.56

DO - 10.1109/WI.2007.56

M3 - Conference contribution

AN - SCOPUS:48349103195

SN - 0769530265

SN - 9780769530260

T3 - Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007

SP - 287

EP - 290

BT - Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007

T2 - IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007

Y2 - 2 November 2007 through 5 November 2007

ER -
