GeoSparkViz: A scalable geospatial data visualization framework in the Apache Spark ecosystem

Jia Yu, Zongsi Zhang, Mohamed Elsayed

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Data visualization allows users to summarize, analyze and reason about data. A map visualization tool first loads the designated geospatial data, processes the data and then applies the map visualization effect. Guaranteeing detailed and accurate geospatial map visualization (e.g., at multiple zoom levels) requires extremely high-resolution maps. Classic solutions suffer from limited computation resources and hence take a tremendous amount of time to generate maps for large-scale geospatial data. This paper presents GeoSparkViz, a large-scale geospatial map visualization framework. GeoSparkViz extends a cluster computing system (Apache Spark in our case) to provide native support for general cartographic design. The proposed system seamlessly integrates with GeoSpark, a Spark-based spatial data management system. It provides the data scientist with a holistic system that allows her to perform both data management and visualization on spatial data, and it reduces the overhead of loading the intermediate spatial data generated during the data management phase into the designated map visualization tool. GeoSparkViz also proposes a map tile data partitioning method that balances the map visualization workload among all nodes in the cluster. Extensive experiments show that GeoSparkViz can generate a high-resolution (i.e., gigapixel) heatmap of 1.7 billion OpenStreetMap objects and of 1.3 billion NYC taxi trips in approximately 4 and 5 minutes, respectively, on a four-node commodity cluster.
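
The abstract describes turning geospatial records into pixels of a very large image and splitting that image into map tiles so that cluster nodes share the rendering work. The single-machine Python sketch below illustrates that general idea only; it is not GeoSparkViz's API or its partitioning method. The function names (`to_pixel`, `heatmap_by_tile`, `assign_tiles`), the uniform tile grid, and the greedy longest-processing-time (LPT) assignment of tiles to workers are hypothetical choices made for illustration. It stops at per-pixel counts; a rendered heatmap would typically also apply a smoothing kernel and a color map.

```python
import heapq
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical types for this sketch: a point is (x, y) in map units; a pixel
# and a tile are (column, row) integer pairs, with row 0 at min_y.
Point = Tuple[float, float]
Pixel = Tuple[int, int]
Tile = Tuple[int, int]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def to_pixel(p: Point, bbox: BBox, width: int, height: int) -> Pixel:
    """Map a point inside bbox to a pixel of a width x height image."""
    min_x, min_y, max_x, max_y = bbox
    px = int((p[0] - min_x) / (max_x - min_x) * width)
    py = int((p[1] - min_y) / (max_y - min_y) * height)
    # A point lying exactly on the max edge falls into the last pixel.
    return min(px, width - 1), min(py, height - 1)


def heatmap_by_tile(points: Iterable[Point], bbox: BBox, width: int, height: int,
                    tiles_x: int, tiles_y: int) -> Dict[Tile, Counter]:
    """Count points per pixel, grouped by the uniform map tile that owns the pixel."""
    min_x, min_y, max_x, max_y = bbox
    tile_w = -(-width // tiles_x)   # ceiling division so tiles cover the image
    tile_h = -(-height // tiles_y)
    tiles: Dict[Tile, Counter] = defaultdict(Counter)
    for x, y in points:
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue  # ignore points outside the rendered extent
        col, row = to_pixel((x, y), bbox, width, height)
        tiles[(col // tile_w, row // tile_h)][(col, row)] += 1
    return dict(tiles)


def assign_tiles(tile_loads: Dict[Tile, int], workers: int) -> List[List[Tile]]:
    """Greedy LPT: hand each tile, heaviest first, to the least-loaded worker."""
    heap = [(0, w) for w in range(workers)]  # (current load, worker id)
    buckets: List[List[Tile]] = [[] for _ in range(workers)]
    for tile, load in sorted(tile_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, w = heapq.heappop(heap)
        buckets[w].append(tile)
        heapq.heappush(heap, (total + load, w))
    return buckets
```

For example, with the points `(0.1, 0.1)` (twice), `(0.9, 0.9)` and `(0.6, 0.2)`, a 4 x 4 image over the unit square and a 2 x 2 tile grid, tile `(0, 0)` holds the two coincident points in pixel `(0, 0)`. Running `assign_tiles` on the per-tile counts with two workers puts that tile alone on one worker and the two single-point tiles on the other, so each worker carries a load of 2. Balancing on point counts rather than on tile area is the design choice that matters when data are skewed, which is the problem the paper's partitioning method targets.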

Original language: English (US)
Title of host publication: Scientific and Statistical Database Management - 30th International Conference, SSDBM 2018, Proceedings
Editors: Michael Bohlen, Johann Gamper, Peer Kroger, Dimitris Sacharidis
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450365055
DOI: https://doi.org/10.1145/3221269.3223040
State: Published - Jul 9 2018
Event: 30th International Conference on Scientific and Statistical Database Management, SSDBM 2018 - Bolzano-Bozen, Italy
Duration: Jul 9 2018 to Jul 11 2018

Other

Other: 30th International Conference on Scientific and Statistical Database Management, SSDBM 2018
Country: Italy
City: Bolzano-Bozen
Period: 7/9/18 to 7/11/18

Keywords

  • Big spatial data
  • Distributed computation
  • Spatial visualization

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Yu, J., Zhang, Z., & Elsayed, M. (2018). GeoSparkViz: A scalable geospatial data visualization framework in the Apache Spark ecosystem. In M. Bohlen, J. Gamper, P. Kroger, & D. Sacharidis (Eds.), Scientific and Statistical Database Management - 30th International Conference, SSDBM 2018, Proceedings. Association for Computing Machinery. https://doi.org/10.1145/3221269.3223040

GeoSparkViz: A scalable geospatial data visualization framework in the Apache Spark ecosystem. / Yu, Jia; Zhang, Zongsi; Elsayed, Mohamed.

Scientific and Statistical Database Management - 30th International Conference, SSDBM 2018, Proceedings. ed. / Michael Bohlen; Johann Gamper; Peer Kroger; Dimitris Sacharidis. Association for Computing Machinery, 2018.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Yu, J, Zhang, Z & Elsayed, M 2018, GeoSparkViz: A scalable geospatial data visualization framework in the Apache Spark ecosystem. in M Bohlen, J Gamper, P Kroger & D Sacharidis (eds), Scientific and Statistical Database Management - 30th International Conference, SSDBM 2018, Proceedings. Association for Computing Machinery, 30th International Conference on Scientific and Statistical Database Management, SSDBM 2018, Bolzano-Bozen, Italy, 7/9/18. https://doi.org/10.1145/3221269.3223040
Yu J, Zhang Z, Elsayed M. GeoSparkViz: A scalable geospatial data visualization framework in the Apache Spark ecosystem. In Bohlen M, Gamper J, Kroger P, Sacharidis D, editors, Scientific and Statistical Database Management - 30th International Conference, SSDBM 2018, Proceedings. Association for Computing Machinery. 2018. https://doi.org/10.1145/3221269.3223040
Yu, Jia ; Zhang, Zongsi ; Elsayed, Mohamed. / GeoSparkViz: A scalable geospatial data visualization framework in the Apache Spark ecosystem. Scientific and Statistical Database Management - 30th International Conference, SSDBM 2018, Proceedings. editor / Michael Bohlen ; Johann Gamper ; Peer Kroger ; Dimitris Sacharidis. Association for Computing Machinery, 2018.
@inproceedings{32bad4b4608b4515955797c764417bb1,
title = "{GeoSparkViz}: A scalable geospatial data visualization framework in the {Apache Spark} ecosystem",
abstract = "Data visualization allows users to summarize, analyze and reason about data. A map visualization tool first loads the designated geospatial data, processes the data and then applies the map visualization effect. Guaranteeing detailed and accurate geospatial map visualization (e.g., at multiple zoom levels) requires extremely high-resolution maps. Classic solutions suffer from limited computation resources and hence take a tremendous amount of time to generate maps for large-scale geospatial data. This paper presents GeoSparkViz, a large-scale geospatial map visualization framework. GeoSparkViz extends a cluster computing system (Apache Spark in our case) to provide native support for general cartographic design. The proposed system seamlessly integrates with GeoSpark, a Spark-based spatial data management system. It provides the data scientist with a holistic system that allows her to perform both data management and visualization on spatial data, and it reduces the overhead of loading the intermediate spatial data generated during the data management phase into the designated map visualization tool. GeoSparkViz also proposes a map tile data partitioning method that balances the map visualization workload among all nodes in the cluster. Extensive experiments show that GeoSparkViz can generate a high-resolution (i.e., gigapixel) heatmap of 1.7 billion OpenStreetMap objects and of 1.3 billion NYC taxi trips in approximately 4 and 5 minutes, respectively, on a four-node commodity cluster.",
keywords = "Big spatial data, Distributed computation, Spatial visualization",
author = "Jia Yu and Zongsi Zhang and Mohamed Elsayed",
year = "2018",
month = "7",
day = "9",
doi = "10.1145/3221269.3223040",
language = "English (US)",
editor = "Michael Bohlen and Johann Gamper and Peer Kroger and Dimitris Sacharidis",
booktitle = "Scientific and Statistical Database Management - 30th International Conference, SSDBM 2018, Proceedings",
publisher = "Association for Computing Machinery",

}

TY  - GEN
T1  - GeoSparkViz: A scalable geospatial data visualization framework in the Apache Spark ecosystem
AU  - Yu, Jia
AU  - Zhang, Zongsi
AU  - Elsayed, Mohamed
PY  - 2018/07/09
Y1  - 2018/07/09
N2  - Data visualization allows users to summarize, analyze and reason about data. A map visualization tool first loads the designated geospatial data, processes the data and then applies the map visualization effect. Guaranteeing detailed and accurate geospatial map visualization (e.g., at multiple zoom levels) requires extremely high-resolution maps. Classic solutions suffer from limited computation resources and hence take a tremendous amount of time to generate maps for large-scale geospatial data. This paper presents GeoSparkViz, a large-scale geospatial map visualization framework. GeoSparkViz extends a cluster computing system (Apache Spark in our case) to provide native support for general cartographic design. The proposed system seamlessly integrates with GeoSpark, a Spark-based spatial data management system. It provides the data scientist with a holistic system that allows her to perform both data management and visualization on spatial data, and it reduces the overhead of loading the intermediate spatial data generated during the data management phase into the designated map visualization tool. GeoSparkViz also proposes a map tile data partitioning method that balances the map visualization workload among all nodes in the cluster. Extensive experiments show that GeoSparkViz can generate a high-resolution (i.e., gigapixel) heatmap of 1.7 billion OpenStreetMap objects and of 1.3 billion NYC taxi trips in approximately 4 and 5 minutes, respectively, on a four-node commodity cluster.
AB  - Data visualization allows users to summarize, analyze and reason about data. A map visualization tool first loads the designated geospatial data, processes the data and then applies the map visualization effect. Guaranteeing detailed and accurate geospatial map visualization (e.g., at multiple zoom levels) requires extremely high-resolution maps. Classic solutions suffer from limited computation resources and hence take a tremendous amount of time to generate maps for large-scale geospatial data. This paper presents GeoSparkViz, a large-scale geospatial map visualization framework. GeoSparkViz extends a cluster computing system (Apache Spark in our case) to provide native support for general cartographic design. The proposed system seamlessly integrates with GeoSpark, a Spark-based spatial data management system. It provides the data scientist with a holistic system that allows her to perform both data management and visualization on spatial data, and it reduces the overhead of loading the intermediate spatial data generated during the data management phase into the designated map visualization tool. GeoSparkViz also proposes a map tile data partitioning method that balances the map visualization workload among all nodes in the cluster. Extensive experiments show that GeoSparkViz can generate a high-resolution (i.e., gigapixel) heatmap of 1.7 billion OpenStreetMap objects and of 1.3 billion NYC taxi trips in approximately 4 and 5 minutes, respectively, on a four-node commodity cluster.
KW  - Big spatial data
KW  - Distributed computation
KW  - Spatial visualization
UR  - http://www.scopus.com/inward/record.url?scp=85054936441&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85054936441&partnerID=8YFLogxK
U2  - 10.1145/3221269.3223040
DO  - 10.1145/3221269.3223040
M3  - Conference contribution
AN  - SCOPUS:85054936441
BT  - Scientific and Statistical Database Management - 30th International Conference, SSDBM 2018, Proceedings
A2  - Bohlen, Michael
A2  - Gamper, Johann
A2  - Kroger, Peer
A2  - Sacharidis, Dimitris
PB  - Association for Computing Machinery
ER  - 