Hippo in action: Scalable indexing of a billion New York City taxi trips and beyond

Jia Yu, Raha Moraffah, Mohamed Elsayed

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

The paper demonstrates Hippo, a lightweight database indexing scheme that significantly reduces storage and maintenance overhead without compromising much on query execution performance. Hippo stores disk page ranges instead of pointers to individual tuples in the indexed table, which reduces the storage space occupied by the index. It maintains simplified histograms that represent the data distribution and adopts a page grouping technique that groups contiguous pages into page ranges based on the similarity of their index key attribute distributions. When a query is issued, Hippo uses the page ranges and histogram-based page summaries to identify the pages whose tuples are guaranteed not to satisfy the query predicates, and then inspects only the remaining pages. We demonstrate Hippo using a billion NYC taxi trip records.
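The filtering idea in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation: the names `RangeEntry`, `bucket_of`, `build_index`, and `search` are hypothetical, pages are modeled as plain lists of key values, and the grouping rule used here (close a page range once it would touch more than `max_buckets` histogram buckets) is an assumed stand-in for the similarity-based grouping the abstract mentions.

```python
import bisect
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass
class RangeEntry:
    """One index entry: a contiguous page range plus the buckets it touches."""
    first_page: int
    last_page: int      # inclusive
    buckets: Set[int]   # histogram buckets hit by at least one tuple in the range


def bucket_of(value: float, bounds: Sequence[float]) -> int:
    """Return the histogram bucket for value; bounds are ascending bucket edges."""
    return bisect.bisect_right(bounds, value)


def build_index(pages: Sequence[Sequence[float]],
                bounds: Sequence[float],
                max_buckets: int) -> List[RangeEntry]:
    """Group contiguous pages into ranges (assumed rule, see lead-in)."""
    index: List[RangeEntry] = []
    start, current = 0, set()
    for page_no, page in enumerate(pages):
        page_buckets = {bucket_of(v, bounds) for v in page}
        # Close the running range if adding this page would make it too broad.
        if current and len(current | page_buckets) > max_buckets:
            index.append(RangeEntry(start, page_no - 1, current))
            start, current = page_no, set()
        current = current | page_buckets
    if current:
        index.append(RangeEntry(start, len(pages) - 1, current))
    return index


def search(pages: Sequence[Sequence[float]],
           index: Sequence[RangeEntry],
           bounds: Sequence[float],
           lo: float, hi: float) -> Tuple[List[float], int]:
    """Answer lo <= key <= hi; return (matching keys, number of pages inspected)."""
    wanted = set(range(bucket_of(lo, bounds), bucket_of(hi, bounds) + 1))
    hits: List[float] = []
    inspected = 0
    for entry in index:
        # No shared bucket means no tuple in this range can satisfy the predicate.
        if entry.buckets.isdisjoint(wanted):
            continue
        for page_no in range(entry.first_page, entry.last_page + 1):
            inspected += 1
            hits.extend(v for v in pages[page_no] if lo <= v <= hi)
    return hits, inspected
```

The index holds one small entry per page range rather than one pointer per tuple, which is where the storage saving described above comes from. The trade-off is that every page in a surviving range must still be read and filtered, so the bucket summaries only rule pages out; they never confirm a match.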

Original language: English (US)
Title of host publication: Proceedings - 2017 IEEE 33rd International Conference on Data Engineering, ICDE 2017
Publisher: IEEE Computer Society
Pages: 1413-1414
Number of pages: 2
ISBN (Electronic): 9781509065431
DOI: https://doi.org/10.1109/ICDE.2017.201
State: Published - May 16 2017
Event: 33rd IEEE International Conference on Data Engineering, ICDE 2017 - San Diego, United States
Duration: Apr 19 2017 to Apr 22 2017

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Cite this

Yu, J., Moraffah, R., & Elsayed, M. (2017). Hippo in action: Scalable indexing of a billion New York City taxi trips and beyond. In Proceedings - 2017 IEEE 33rd International Conference on Data Engineering, ICDE 2017 (pp. 1413-1414). [7930097] IEEE Computer Society. https://doi.org/10.1109/ICDE.2017.201
