TY - GEN
T1 - Twig2Stack
T2 - 32nd International Conference on Very Large Data Bases, VLDB 2006
AU - Chen, Songting
AU - Li, Hua Gang
AU - Tatemura, Junichi
AU - Hsiung, Wang Pin
AU - Agrawal, Divyakant
AU - Candan, K. Selçuk
PY - 2006
Y1 - 2006
N2 - Tree pattern matching is one of the most fundamental tasks for XML query processing. Holistic twig query processing techniques [4, 16] have been developed to minimize the intermediate results, namely, those root-to-leaf path matches that are not in the final twig results. However, useless path matches cannot be completely avoided, especially when there is a parent-child relationship in the twig query. Furthermore, existing approaches do not consider the fact that in practice, in order to process XPath or XQuery statements, a more powerful form of twig queries, namely, Generalized-Tree-Pattern (GTP) [8] queries, is required. Most existing work on processing GTP queries calls for costly post-processing to eliminate redundant data and/or group the matching results. In this paper, we first propose a novel hierarchical stack encoding scheme to compactly represent the twig results. We introduce Twig2Stack, a bottom-up algorithm for processing twig queries based on this encoding scheme. Then we show how to efficiently enumerate the query results from the encodings for a given GTP query. To our knowledge, this is the first GTP matching solution that avoids any post path-join, sort, duplicate elimination and grouping operations. Extensive performance studies on various data sets and queries show that the proposed Twig2Stack algorithm not only has better twig query processing performance than state-of-the-art algorithms, but is also capable of efficiently processing the more complex GTP queries.
AB - Tree pattern matching is one of the most fundamental tasks for XML query processing. Holistic twig query processing techniques [4, 16] have been developed to minimize the intermediate results, namely, those root-to-leaf path matches that are not in the final twig results. However, useless path matches cannot be completely avoided, especially when there is a parent-child relationship in the twig query. Furthermore, existing approaches do not consider the fact that in practice, in order to process XPath or XQuery statements, a more powerful form of twig queries, namely, Generalized-Tree-Pattern (GTP) [8] queries, is required. Most existing work on processing GTP queries calls for costly post-processing to eliminate redundant data and/or group the matching results. In this paper, we first propose a novel hierarchical stack encoding scheme to compactly represent the twig results. We introduce Twig2Stack, a bottom-up algorithm for processing twig queries based on this encoding scheme. Then we show how to efficiently enumerate the query results from the encodings for a given GTP query. To our knowledge, this is the first GTP matching solution that avoids any post path-join, sort, duplicate elimination and grouping operations. Extensive performance studies on various data sets and queries show that the proposed Twig2Stack algorithm not only has better twig query processing performance than state-of-the-art algorithms, but is also capable of efficiently processing the more complex GTP queries.
UR - http://www.scopus.com/inward/record.url?scp=38049042201&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38049042201&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:38049042201
SN - 1595933859
SN - 9781595933850
T3 - VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases
SP - 283
EP - 294
BT - VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases
PB - Association for Computing Machinery
Y2 - 12 September 2006 through 15 September 2006
ER -