Abstract

When data are large and query processing workloads consist of data selection and aggregation operations (as in online analytical processing), column-oriented data stores are generally the preferred choice of data organization because they enable effective data compression, which significantly reduces IO. Most column-store architectures leverage bitmap indices, which can themselves be compressed, to answer queries over data columns. Column domains (e.g., geographical data, categorical data, biological taxonomies, organizational data) are hierarchical in nature, and it may be more advantageous to create hierarchical bitmap indices that can help answer queries over different sub-ranges of the domain. However, given a query workload, it is critical to choose the appropriate subset of bitmap indices from the given hierarchy. In this paper, we therefore introduce the cut-selection problem, which aims to identify a subset (cut) of the nodes of the domain hierarchy, with the appropriate bitmap indices. We discuss inclusive, exclusive, and hybrid strategies for cut-selection and show that the hybrid strategy can be computed efficiently and returns optimal results (in terms of IO) when there are no memory constraints. We also show that the cut-selection problem becomes difficult under a memory availability constraint, and we therefore present efficient cut-selection strategies that return close-to-optimal results, especially when the memory limitations are very strict (i.e., the data and the hierarchy are much larger than the available memory). Experimental results confirm the efficiency and effectiveness of the proposed cut-selection algorithms.
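
As a rough illustration of the idea of hierarchical bitmap indices, the following minimal sketch builds bitmaps for a toy column and answers the same query in two ways. The abstract does not define the inclusive and exclusive strategies, so the reading used here is an assumption: "inclusive" ORs together the bitmaps of the requested values, and "exclusive" starts from an ancestor's bitmap and clears the rows of the unwanted values. The data, the hierarchy, and all function names are hypothetical and are not taken from the paper.

```python
from functools import reduce

# Hypothetical toy column: one value per row, drawn from four leaf values.
rows = ["Phoenix", "Tucson", "Athens", "Phoenix", "Patras", "Tucson"]

# Hypothetical two-level domain hierarchy: internal node -> leaf children.
hierarchy = {"Arizona": ["Phoenix", "Tucson"], "Greece": ["Athens", "Patras"]}


def leaf_bitmap(value):
    """Bitmap stored as a Python int: bit i is set when row i holds `value`."""
    return sum(1 << i for i, v in enumerate(rows) if v == value)


def node_bitmap(node):
    """An internal node's bitmap is the OR of its children's bitmaps."""
    return reduce(lambda a, b: a | b, (leaf_bitmap(c) for c in hierarchy[node]), 0)


def answer_inclusive(leaves):
    """Assumed 'inclusive' reading: OR the bitmaps of the requested leaves."""
    return reduce(lambda a, b: a | b, (leaf_bitmap(v) for v in leaves), 0)


def answer_exclusive(node, excluded):
    """Assumed 'exclusive' reading: take an ancestor's bitmap, then clear
    the rows that belong to the leaves the query does not want."""
    result = node_bitmap(node)
    for v in excluded:
        result &= ~leaf_bitmap(v)
    return result


def matching_rows(bitmap):
    """Row positions whose bit is set in `bitmap`."""
    return [i for i in range(len(rows)) if (bitmap >> i) & 1]


if __name__ == "__main__":
    # Both strategies select the rows holding "Phoenix".
    print(matching_rows(answer_inclusive(["Phoenix"])))
    print(matching_rows(answer_exclusive("Arizona", ["Tucson"])))
```

Both calls select the same rows but read different bitmaps, which is why the choice of which nodes to index (the cut) affects how much must be read for a given workload. The sketch ignores bitmap compression, IO cost, and memory limits, all of which the paper's cut-selection strategies take into account.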

Original language: English (US)
Title of host publication: Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings
Publisher: OpenProceedings.org, University of Konstanz, University Library
Pages: 271-282
Number of pages: 12
ISBN (Electronic): 9783893180653
DOI: https://doi.org/10.5441/002/edbt.2014.26
State: Published - 2014
Event: 17th International Conference on Extending Database Technology, EDBT 2014 - Athens, Greece
Duration: Mar 24 2014 to Mar 28 2014

Other

Other: 17th International Conference on Extending Database Technology, EDBT 2014
Country: Greece
City: Athens
Period: 3/24/14 to 3/28/14

Fingerprint

Query processing
Data storage equipment
Data compression
Taxonomies
Agglomeration
Availability
Processing
Experiments

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Software

Cite this

Nagarkar, P., & Candan, K. (2014). HCS: Hierarchical cut selection for efficiently processing queries on data columns using hierarchical bitmap indices. In Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings (pp. 271-282). OpenProceedings.org, University of Konstanz, University Library. https://doi.org/10.5441/002/edbt.2014.26

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

@inproceedings{8e9ffad9cf7a4a9d8fc1bacf15a548a3,
title = "HCS: Hierarchical cut selection for efficiently processing queries on data columns using hierarchical bitmap indices",
author = "Parth Nagarkar and Kasim Candan",
year = "2014",
doi = "10.5441/002/edbt.2014.26",
language = "English (US)",
pages = "271--282",
booktitle = "Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings",
publisher = "OpenProceedings.org, University of Konstanz, University Library",

}

TY - GEN

T1 - HCS

T2 - Hierarchical cut selection for efficiently processing queries on data columns using hierarchical bitmap indices

AU - Nagarkar, Parth

AU - Candan, Kasim

PY - 2014

Y1 - 2014

UR - http://www.scopus.com/inward/record.url?scp=85014330538&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85014330538&partnerID=8YFLogxK

U2 - 10.5441/002/edbt.2014.26

DO - 10.5441/002/edbt.2014.26

M3 - Conference contribution

SP - 271

EP - 282

BT - Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings

PB - OpenProceedings.org, University of Konstanz, University Library

ER -