Skip-and-prune: Cosine-based top-k query processing for efficient context-sensitive document retrieval

Jong Wook Kim; Kasim Candan

doi:10.1145/1559845.1559859

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Scopus citations

Abstract

Keyword search and ranked retrieval have together emerged as popular data access paradigms for various kinds of data, from web pages to XML and relational databases. A user can submit keywords without knowing much (sometimes nothing) about the complex structure underlying a data collection, yet the system can identify, rank, and return a set of relevant matches by exploiting statistics about the distribution and structure of the data. Keyword-based data models are also suitable for capturing a user's search context in terms of weights associated with the keywords in the query. Given a search context, the data in the database can also be re-interpreted for semantically correct retrieval. This option, however, is often ignored because the cost of naively re-assessing the content in the database tends to be prohibitive. In this paper, we first argue that top-k query processing can help tackle this challenge by efficiently re-assessing only the relevant parts of the database. A roadblock in this process, however, is that most efficient implementations of top-k query processing assume that the scoring function is monotonic, whereas the cosine-based scoring function needed for re-interpretation of content based on user context is not. We therefore develop an efficient top-k query processing algorithm, skip-and-prune (SnP), which can process top-k queries under cosine-based non-monotonic scoring functions. We compare the proposed algorithm against alternative implementations of context-aware retrieval, including naive top-k, accumulator-based inverted files, and full-scan. The experimental results show that naive top-k, while fast, is not an effective solution due to the non-monotonicity of the underlying scoring function. The proposed technique, SnP, however, matches the precision of accumulator-based inverted files and full-scan, yet is orders of magnitude faster than both.
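
The non-monotonicity mentioned in the abstract can be illustrated with a small sketch. This is not the SnP algorithm and is not taken from the paper; the vectors and the helper name `cosine` are assumptions chosen only to show that a document whose term weights are all at least as large as another's can still receive a lower cosine score, which is the property that breaks the pruning assumption of monotonic top-k methods.

```python
import math


def cosine(query, doc):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(q * d for q, d in zip(query, doc))
    norm_q = math.sqrt(sum(q * q for q in query))
    norm_d = math.sqrt(sum(d * d for d in doc))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # convention for empty vectors (an assumption)
    return dot / (norm_q * norm_d)


# Hypothetical two-keyword query with equal context weights.
query = [1.0, 1.0]

# doc_b has a weight >= doc_a on every keyword ...
doc_a = [1.0, 1.0]
doc_b = [3.0, 1.0]
dominates = all(b >= a for a, b in zip(doc_a, doc_b))

# ... yet its cosine score is lower, because the normalization
# penalizes the imbalance relative to the query direction.
score_a = cosine(query, doc_a)
score_b = cosine(query, doc_b)

print(f"doc_b dominates doc_a component-wise: {dominates}")
print(f"cosine(query, doc_a) = {score_a:.3f}")
print(f"cosine(query, doc_b) = {score_b:.3f}")
```

A monotonic scoring function would require the score of `doc_b` to be at least that of `doc_a`. Under cosine scoring it is not, so a threshold computed from per-keyword sorted lists is no longer a safe upper bound for unseen documents.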

Original language	English (US)
Title of host publication	SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems
Pages	115-126
Number of pages	12
DOIs	https://doi.org/10.1145/1559845.1559859
State	Published - 2009
Event	International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09 - Providence, RI, United States
Duration	Jun 29 2009 → Jul 2 2009

Publication series

Name	SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems

Conference

Conference	International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09
Country/Territory	United States
City	Providence, RI
Period	6/29/09 → 7/2/09

Keywords

Ranking
Top-K

ASJC Scopus subject areas

Software

Access to Document

10.1145/1559845.1559859

Cite this

Kim, J. W., & Candan, K. (2009). Skip-and-prune: Cosine-based top-k query processing for efficient context-sensitive document retrieval. In SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems (pp. 115-126). (SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems). https://doi.org/10.1145/1559845.1559859

@inproceedings{590e97c84d6046a9a1e3c64e525f5c8f,

title = "Skip-and-prune: Cosine-based top-k query processing for efficient context-sensitive document retrieval",

keywords = "Ranking, Top-K",

author = "Kim, {Jong Wook} and Kasim Candan",

year = "2009",

doi = "10.1145/1559845.1559859",

language = "English (US)",

isbn = "9781605585543",

series = "SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems",

pages = "115--126",

booktitle = "SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems",

note = "International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09 ; Conference date: 29-06-2009 Through 02-07-2009",

}

TY - GEN

T1 - Skip-and-prune

T2 - International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09

AU - Kim, Jong Wook

AU - Candan, Kasim

PY - 2009

Y1 - 2009

KW - Ranking

KW - Top-K

UR - http://www.scopus.com/inward/record.url?scp=70849092408&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70849092408&partnerID=8YFLogxK

U2 - 10.1145/1559845.1559859

DO - 10.1145/1559845.1559859

M3 - Conference contribution

AN - SCOPUS:70849092408

SN - 9781605585543

T3 - SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems

SP - 115

EP - 126

BT - SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems

Y2 - 29 June 2009 through 2 July 2009

ER -
