Geospatial data management in Apache Spark: A tutorial

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The volume of spatial data increases at a staggering rate. This tutorial comprehensively studies how existing works extend Apache Spark to support massive-scale spatial data. During this 1.5-hour tutorial, we first provide background on the characteristics of spatial data and the history of distributed data management systems. A follow-up section presents the common approaches practitioners use to extend Spark and introduces the vital components of a generic spatial data management system. The third, fourth and fifth sections then discuss ongoing efforts and experience in spatio-temporal data, spatial data analytics and streaming spatial data, respectively. The sixth part concludes the tutorial, helping the audience grasp the overall content, and points out future research directions.

Original language: English (US)
Title of host publication: Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019
Publisher: IEEE Computer Society
Pages: 2060-2063
Number of pages: 4
ISBN (Electronic): 9781538674741
DOIs: https://doi.org/10.1109/ICDE.2019.00239
State: Published - Apr 1 2019
Event: 35th IEEE International Conference on Data Engineering, ICDE 2019 - Macau, China
Duration: Apr 8 2019 to Apr 11 2019

Publication series

Name: Proceedings - International Conference on Data Engineering
Volume: 2019-April
ISSN (Print): 1084-4627

Conference

Conference: 35th IEEE International Conference on Data Engineering, ICDE 2019
Country: China
City: Macau
Period: 4/8/19 to 4/11/19

Keywords

  • Apache Spark
  • Distributed computing
  • Geospatial data

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Cite this

Yu, J., & Elsayed, M. (2019). Geospatial data management in apache spark: A tutorial. In Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019 (pp. 2060-2063). [8731372] (Proceedings - International Conference on Data Engineering; Vol. 2019-April). IEEE Computer Society. https://doi.org/10.1109/ICDE.2019.00239

@inproceedings{3199f73246594999bfe0235da4619553,
title = "Geospatial data management in apache spark: A tutorial",
keywords = "Apache spark, Distributed computing, Geospatial data",
author = "Jia Yu and Mohamed Elsayed",
year = "2019",
month = "4",
day = "1",
doi = "10.1109/ICDE.2019.00239",
language = "English (US)",
series = "Proceedings - International Conference on Data Engineering",
publisher = "IEEE Computer Society",
pages = "2060--2063",
booktitle = "Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019",

}

TY - GEN

T1 - Geospatial data management in apache spark

T2 - A tutorial

AU - Yu, Jia

AU - Elsayed, Mohamed

PY - 2019/4/1

Y1 - 2019/4/1

KW - Apache spark

KW - Distributed computing

KW - Geospatial data

UR - http://www.scopus.com/inward/record.url?scp=85067923448&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85067923448&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2019.00239

DO - 10.1109/ICDE.2019.00239

M3 - Conference contribution

AN - SCOPUS:85067923448

T3 - Proceedings - International Conference on Data Engineering

SP - 2060

EP - 2063

BT - Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019

PB - IEEE Computer Society

ER -