A demonstration of GeoSpark: A cluster computing framework for processing big spatial data

Jia Yu, Jinxuan Wu, Mohamed Elsayed

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Citations (Scopus)

Abstract

This paper demonstrates GEOSPARK, a cluster computing framework for developing and processing large-scale spatial data analytics programs. GEOSPARK consists of three main layers: the Apache Spark Layer, the Spatial RDD Layer, and the Spatial Query Processing Layer. The Apache Spark Layer provides basic Apache Spark functionality in the form of regular RDD operations. The Spatial RDD Layer consists of three novel Spatial Resilient Distributed Datasets (SRDDs), which extend the regular Apache Spark RDD to support geometrical and spatial objects with data partitioning and indexing. The Spatial Query Processing Layer executes spatial queries (e.g., Spatial Join) on SRDDs. The dynamic status of SRDDs and spatial operations is visualized by the GEOSPARK monitoring map interface. We demonstrate GEOSPARK using three spatial analytics applications (spatial aggregation, autocorrelation, and co-location) to show how users can easily define their spatial analytics tasks and efficiently process them on large-scale spatial data at interactive performance.
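The combination of data partitioning and a spatial join described in the abstract can be illustrated with a small, single-machine sketch. This is not GeoSpark's API and makes no claim about how SRDDs are implemented: it is plain Python, and every name in it (`partition_points`, `cells_overlapping`, `spatial_join`, the `cell_size` parameter) is hypothetical. It assumes a uniform grid as the partitioning scheme and a point-in-rectangle predicate as the join condition.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Cell = Tuple[int, int]


def cell_of(x: float, y: float, cell_size: float) -> Cell:
    """Map a coordinate to the grid cell (partition) that holds it."""
    return (int(x // cell_size), int(y // cell_size))


def partition_points(points: List[Point], cell_size: float) -> Dict[Cell, List[Point]]:
    """Group points by grid cell; each cell stands in for one data partition."""
    grid: Dict[Cell, List[Point]] = defaultdict(list)
    for p in points:
        grid[cell_of(p[0], p[1], cell_size)].append(p)
    return grid


def cells_overlapping(rect: Rect, cell_size: float) -> List[Cell]:
    """List every grid cell that the rectangle touches."""
    min_cx, min_cy = cell_of(rect[0], rect[1], cell_size)
    max_cx, max_cy = cell_of(rect[2], rect[3], cell_size)
    return [(cx, cy)
            for cx in range(min_cx, max_cx + 1)
            for cy in range(min_cy, max_cy + 1)]


def contains(rect: Rect, p: Point) -> bool:
    """Point-in-rectangle test, boundaries included."""
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]


def spatial_join(rects: List[Rect], points: List[Point],
                 cell_size: float = 1.0) -> Dict[Rect, List[Point]]:
    """For each rectangle, return the points it contains.

    Only the partitions a rectangle overlaps are scanned, which is the
    reason for partitioning the points first.
    """
    grid = partition_points(points, cell_size)
    result: Dict[Rect, List[Point]] = {}
    for rect in rects:
        matches = [p
                   for cell in cells_overlapping(rect, cell_size)
                   for p in grid.get(cell, [])
                   if contains(rect, p)]
        result[rect] = sorted(matches)
    return result


if __name__ == "__main__":
    points = [(0.5, 0.5), (1.5, 1.5), (2.5, 0.5), (3.2, 3.9)]
    rects = [(0.0, 0.0, 2.0, 2.0), (2.0, 0.0, 4.0, 1.0)]
    for rect, matched in spatial_join(rects, points).items():
        # The length of each match list is a simple spatial aggregation (a count).
        print(rect, len(matched), matched)
```

In a cluster setting the grid cells would be distributed across workers rather than held in one dictionary; the sketch only shows why pruning by partition reduces the number of candidate pairs the join has to test.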

Original language: English (US)
Title of host publication: 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1410-1413
Number of pages: 4
ISBN (Electronic): 9781509020195
DOI: https://doi.org/10.1109/ICDE.2016.7498357
State: Published - Jun 22 2016
Event: 32nd IEEE International Conference on Data Engineering, ICDE 2016 - Helsinki, Finland
Duration: May 16 2016 to May 20 2016

Other

Other: 32nd IEEE International Conference on Data Engineering, ICDE 2016
Country: Finland
City: Helsinki
Period: 5/16/16 to 5/20/16

Fingerprint

Cluster computing
Electric sparks
Demonstrations
Query processing
Processing
Autocorrelation
Agglomeration
Monitoring

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Computer Graphics and Computer-Aided Design
  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management

Cite this

Yu, J., Wu, J., & Elsayed, M. (2016). A demonstration of GeoSpark: A cluster computing framework for processing big spatial data. In 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016 (pp. 1410-1413). [7498357] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDE.2016.7498357

A demonstration of GeoSpark : A cluster computing framework for processing big spatial data. / Yu, Jia; Wu, Jinxuan; Elsayed, Mohamed.

2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016. Institute of Electrical and Electronics Engineers Inc., 2016. p. 1410-1413 7498357.

Yu, J, Wu, J & Elsayed, M 2016, A demonstration of GeoSpark: A cluster computing framework for processing big spatial data. in 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016., 7498357, Institute of Electrical and Electronics Engineers Inc., pp. 1410-1413, 32nd IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, Finland, 5/16/16. https://doi.org/10.1109/ICDE.2016.7498357
Yu J, Wu J, Elsayed M. A demonstration of GeoSpark: A cluster computing framework for processing big spatial data. In 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016. Institute of Electrical and Electronics Engineers Inc. 2016. p. 1410-1413. 7498357 https://doi.org/10.1109/ICDE.2016.7498357
Yu, Jia ; Wu, Jinxuan ; Elsayed, Mohamed. / A demonstration of GeoSpark : A cluster computing framework for processing big spatial data. 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016. Institute of Electrical and Electronics Engineers Inc., 2016. pp. 1410-1413
@inproceedings{d6b2ddc72f7b4a24b41711288622291b,
title = "A demonstration of GeoSpark: A cluster computing framework for processing big spatial data",
abstract = "This paper demonstrates GEOSPARK, a cluster computing framework for developing and processing large-scale spatial data analytics programs. GEOSPARK consists of three main layers: the Apache Spark Layer, the Spatial RDD Layer, and the Spatial Query Processing Layer. The Apache Spark Layer provides basic Apache Spark functionality in the form of regular RDD operations. The Spatial RDD Layer consists of three novel Spatial Resilient Distributed Datasets (SRDDs), which extend the regular Apache Spark RDD to support geometrical and spatial objects with data partitioning and indexing. The Spatial Query Processing Layer executes spatial queries (e.g., Spatial Join) on SRDDs. The dynamic status of SRDDs and spatial operations is visualized by the GEOSPARK monitoring map interface. We demonstrate GEOSPARK using three spatial analytics applications (spatial aggregation, autocorrelation, and co-location) to show how users can easily define their spatial analytics tasks and efficiently process them on large-scale spatial data at interactive performance.",
author = "Jia Yu and Jinxuan Wu and Mohamed Elsayed",
year = "2016",
month = "6",
day = "22",
doi = "10.1109/ICDE.2016.7498357",
language = "English (US)",
pages = "1410--1413",
booktitle = "2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - A demonstration of GeoSpark

T2 - A cluster computing framework for processing big spatial data

AU - Yu, Jia

AU - Wu, Jinxuan

AU - Elsayed, Mohamed

PY - 2016/6/22

Y1 - 2016/6/22

AB - This paper demonstrates GEOSPARK, a cluster computing framework for developing and processing large-scale spatial data analytics programs. GEOSPARK consists of three main layers: the Apache Spark Layer, the Spatial RDD Layer, and the Spatial Query Processing Layer. The Apache Spark Layer provides basic Apache Spark functionality in the form of regular RDD operations. The Spatial RDD Layer consists of three novel Spatial Resilient Distributed Datasets (SRDDs), which extend the regular Apache Spark RDD to support geometrical and spatial objects with data partitioning and indexing. The Spatial Query Processing Layer executes spatial queries (e.g., Spatial Join) on SRDDs. The dynamic status of SRDDs and spatial operations is visualized by the GEOSPARK monitoring map interface. We demonstrate GEOSPARK using three spatial analytics applications (spatial aggregation, autocorrelation, and co-location) to show how users can easily define their spatial analytics tasks and efficiently process them on large-scale spatial data at interactive performance.

UR - http://www.scopus.com/inward/record.url?scp=84980328112&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84980328112&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2016.7498357

DO - 10.1109/ICDE.2016.7498357

M3 - Conference contribution

AN - SCOPUS:84980328112

SP - 1410

EP - 1413

BT - 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -