TY - GEN
T1 - HCS
T2 - 17th International Conference on Extending Database Technology, EDBT 2014
AU - Nagarkar, Parth
AU - Candan, Kasim
N1 - Funding Information: This work is supported by NSF grant 116394 "RanKloud: Data Partitioning and Resource Allocation Strategies for Scalable Multimedia and Social Media Analysis".
PY - 2014
Y1 - 2014
N2 - When data are large and query processing workloads consist of data selection and aggregation operations (as in online analytical processing), column-oriented data stores are generally the preferred choice of data organization, because they enable effective data compression, leading to significantly reduced IO. Most column-store architectures leverage bitmap indices, which themselves can be compressed, for answering queries over data columns. Column domains (e.g., geographical data, categorical data, biological taxonomies, organizational data) are hierarchical in nature, and it may be more advantageous to create hierarchical bitmap indices that can help answer queries over different sub-ranges of the domain. However, given a query workload, it is critical to choose the appropriate subset of bitmap indices from the given hierarchy. Thus, in this paper, we introduce the cut-selection problem, which aims to help identify a subset (cut) of the nodes of the domain hierarchy, with the appropriate bitmap indices. We discuss inclusive, exclusive, and hybrid strategies for cut-selection and show that the hybrid strategy can be efficiently computed and returns optimal (in terms of IO) results in cases where there are no memory constraints. We also show that when there is a memory availability constraint, the cut-selection problem becomes difficult and, thus, present efficient cut-selection strategies that return close to optimal results, especially in situations where the memory limitations are very strict (i.e., the data and the hierarchy are much larger than the available memory). Experimental results confirm the efficiency and effectiveness of the proposed cut-selection algorithms.
AB - When data are large and query processing workloads consist of data selection and aggregation operations (as in online analytical processing), column-oriented data stores are generally the preferred choice of data organization, because they enable effective data compression, leading to significantly reduced IO. Most column-store architectures leverage bitmap indices, which themselves can be compressed, for answering queries over data columns. Column domains (e.g., geographical data, categorical data, biological taxonomies, organizational data) are hierarchical in nature, and it may be more advantageous to create hierarchical bitmap indices that can help answer queries over different sub-ranges of the domain. However, given a query workload, it is critical to choose the appropriate subset of bitmap indices from the given hierarchy. Thus, in this paper, we introduce the cut-selection problem, which aims to help identify a subset (cut) of the nodes of the domain hierarchy, with the appropriate bitmap indices. We discuss inclusive, exclusive, and hybrid strategies for cut-selection and show that the hybrid strategy can be efficiently computed and returns optimal (in terms of IO) results in cases where there are no memory constraints. We also show that when there is a memory availability constraint, the cut-selection problem becomes difficult and, thus, present efficient cut-selection strategies that return close to optimal results, especially in situations where the memory limitations are very strict (i.e., the data and the hierarchy are much larger than the available memory). Experimental results confirm the efficiency and effectiveness of the proposed cut-selection algorithms.
UR - http://www.scopus.com/inward/record.url?scp=85014330538&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85014330538&partnerID=8YFLogxK
U2 - 10.5441/002/edbt.2014.26
DO - 10.5441/002/edbt.2014.26
M3 - Conference contribution
AN - SCOPUS:85014330538
T3 - Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings
SP - 271
EP - 282
BT - Advances in Database Technology - EDBT 2014
A2 - Leroy, Vincent
A2 - Christophides, Vassilis
A2 - Idreos, Stratos
A2 - Kementsietsidis, Anastasios
A2 - Garofalakis, Minos
A2 - Amer-Yahia, Sihem
PB - OpenProceedings.org, University of Konstanz, University Library
Y2 - 24 March 2014 through 28 March 2014
ER -