TY - GEN
T1 - Skip-and-prune
T2 - International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09
AU - Kim, Jong Wook
AU - Candan, Kasim
PY - 2009
Y1 - 2009
N2 - Keyword search and ranked retrieval have together emerged as popular data access paradigms for various kinds of data, from web pages to XML and relational databases. A user can submit keywords without knowing much (sometimes nothing) about the complex structure underlying a data collection, yet the system can identify, rank, and return a set of relevant matches by exploiting statistics about the distribution and structure of the data. Keyword-based data models are also suitable for capturing a user's search context in terms of weights associated with the keywords in the query. Given a search context, the data in the database can also be re-interpreted for semantically correct retrieval. This option, however, is often ignored because the cost of naively re-assessing the content in the database tends to be prohibitive. In this paper, we first argue that top-k query processing can help tackle this challenge by efficiently re-assessing only the relevant parts of the database. A roadblock in this process, however, is that most efficient implementations of top-k query processing assume that the scoring function is monotonic, whereas the cosine-based scoring function needed for re-interpretation of content based on user context is not. We then develop an efficient top-k query processing algorithm, skip-and-prune (SnP), which is able to process top-k queries under cosine-based non-monotonic scoring functions. We compare the proposed algorithm against alternative implementations of context-aware retrieval, including naive top-k, accumulator-based inverted files, and full-scan. The experimental results show that naive top-k, while fast, is not an effective solution due to the non-monotonicity of the underlying scoring function. SnP, in contrast, matches the precision of accumulator-based inverted files and full-scan, yet is orders of magnitude faster than both.
KW - Ranking
KW - Top-K
UR - http://www.scopus.com/inward/record.url?scp=70849092408&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70849092408&partnerID=8YFLogxK
U2 - 10.1145/1559845.1559859
DO - 10.1145/1559845.1559859
M3 - Conference contribution
AN - SCOPUS:70849092408
SN - 9781605585543
T3 - SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems
SP - 115
EP - 126
BT - SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems
Y2 - 29 June 2009 through 2 July 2009
ER -