GeoSparkViz: A scalable geospatial data visualization framework in the Apache Spark ecosystem

Jia Yu; Zongsi Zhang; Mohamed Elsayed

doi:10.1145/3221269.3223040

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Scopus citations

Abstract

Data visualization allows users to summarize, analyze, and reason about data. A map visualization tool first loads the designated geospatial data, processes the data, and then applies the map visualization effect. Guaranteeing detailed and accurate geospatial map visualization (e.g., at multiple zoom levels) requires extremely high-resolution maps. Classic solutions suffer from limited computation resources and hence take a tremendous amount of time to generate maps for large-scale geospatial data. This paper presents GeoSparkViz, a large-scale geospatial map visualization framework. GeoSparkViz extends a cluster computing system (Apache Spark in our case) to provide native support for general cartographic design. The proposed system seamlessly integrates with GeoSpark, a Spark-based spatial data management system. It provides the data scientist with a holistic system that allows her to perform both data management and visualization on spatial data, and it reduces the overhead of loading the intermediate spatial data generated during the data management phase into the designated map visualization tool. GeoSparkViz also proposes a map tile data partitioning method that balances the map visualization workload among all nodes in the cluster. Extensive experiments show that, on a four-node commodity cluster, GeoSparkViz can generate a high-resolution (i.e., gigapixel) heatmap of 1.7 billion OpenStreetMap objects in approximately 4 minutes and of 1.3 billion NYC taxi trips in approximately 5 minutes.
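The abstract does not spell out how the map tile partitioning works. As a rough illustration only, the Python sketch below shows the general idea behind tile-based heatmap rendering: each point is projected to a pixel, pixel counts (the raw heatmap intensity) are grouped by the map tile that owns the pixel, and tiles are then spread across workers so that each worker gets a similar amount of work. Everything here is a hypothetical assumption, not GeoSparkViz's API or its actual partitioning algorithm: the equirectangular projection, the uniform square tiles, the point-count load metric, the function names (`to_pixel`, `tile_id`, `pixel_counts_by_tile`, `assign_tiles`), and the greedy longest-processing-time assignment, which is a swapped-in stand-in for the paper's load-balancing method.

```python
import heapq
from collections import Counter, defaultdict


def to_pixel(lon, lat, width, height):
    """Map (lon, lat) in degrees to an integer pixel (x, y) on a width x height
    image using an equirectangular projection (an illustrative assumption)."""
    x = int((lon + 180.0) / 360.0 * width)
    y = int((90.0 - lat) / 180.0 * height)
    # Clamp so points on the image border stay inside the pixel grid.
    return max(0, min(x, width - 1)), max(0, min(y, height - 1))


def tile_id(x, y, tile_size, tiles_per_row):
    """Row-major ID of the square tile that owns pixel (x, y)."""
    return (y // tile_size) * tiles_per_row + (x // tile_size)


def pixel_counts_by_tile(points, width, height, tile_size):
    """Count points per pixel (raw heatmap intensity), grouped by tile,
    so that each tile can later be rendered independently."""
    tiles_per_row = (width + tile_size - 1) // tile_size
    per_tile = defaultdict(Counter)
    for lon, lat in points:
        x, y = to_pixel(lon, lat, width, height)
        per_tile[tile_id(x, y, tile_size, tiles_per_row)][(x, y)] += 1
    return per_tile


def assign_tiles(per_tile, num_workers):
    """Hypothetical load balancing: visit tiles from heaviest to lightest
    (load = number of points) and give each to the least-loaded worker."""
    heap = [(0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {}
    ordered = sorted(per_tile.items(), key=lambda kv: (-sum(kv[1].values()), kv[0]))
    for tid, counts in ordered:
        load, worker = heapq.heappop(heap)
        assignment[tid] = worker
        heapq.heappush(heap, (load + sum(counts.values()), worker))
    return assignment


if __name__ == "__main__":
    points = [(-73.98, 40.75), (-73.97, 40.76), (2.35, 48.86), (139.69, 35.69)]
    per_tile = pixel_counts_by_tile(points, width=1024, height=512, tile_size=256)
    print({tid: dict(c) for tid, c in sorted(per_tile.items())})
    print(assign_tiles(per_tile, num_workers=2))
```

With these four sample points on a small 1024 x 512 image, the two New York points land in the same pixel of tile 1, and the greedy step leaves each of the two workers with a load of 2. A real gigapixel map would have far more tiles, and skewed data (dense cities, sparse oceans) is exactly the case where a data-aware partitioning, rather than uniform tiles, matters for load balance.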

Original language	English (US)
Title of host publication	Scientific and Statistical Database Management - 30th International Conference, SSDBM 2018, Proceedings
Editors	Michael Bohlen, Johann Gamper, Peer Kroger, Dimitris Sacharidis
Publisher	Association for Computing Machinery
ISBN (Electronic)	9781450365055
DOIs	https://doi.org/10.1145/3221269.3223040
State	Published - Jul 9 2018
Event	30th International Conference on Scientific and Statistical Database Management, SSDBM 2018 - Bolzano-Bozen, Italy
Duration	Jul 9 2018 → Jul 11 2018

Publication series

Name	ACM International Conference Proceeding Series

Other

Other	30th International Conference on Scientific and Statistical Database Management, SSDBM 2018
Country/Territory	Italy
City	Bolzano-Bozen
Period	7/9/18 → 7/11/18

Keywords

Big spatial data
Distributed computation
Spatial visualization

ASJC Scopus subject areas

Software
Human-Computer Interaction
Computer Vision and Pattern Recognition
Computer Networks and Communications

Cite this

Yu, J., Zhang, Z., & Elsayed, M. (2018). GeoSparkViz: A scalable geospatial data visualization framework in the Apache Spark ecosystem. In M. Bohlen, J. Gamper, P. Kroger, & D. Sacharidis (Eds.), Scientific and Statistical Database Management - 30th International Conference, SSDBM 2018, Proceedings (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3221269.3223040

@inproceedings{32bad4b4608b4515955797c764417bb1,
  title = "{GeoSparkViz}: A scalable geospatial data visualization framework in the {Apache Spark} ecosystem",
  keywords = "Big spatial data, Distributed computation, Spatial visualization",
  author = "Jia Yu and Zongsi Zhang and Mohamed Elsayed",
  note = "Funding Information: This work is supported in part by the National Science Foundation (NSF) under Grant 1654861, the Salt River Project Agricultural Improvement and Power District (SRP), and the DOD-ARMY Training and Doctrine Command (TRADOC). Publisher Copyright: {\textcopyright} 2018 Association for Computing Machinery.; 30th International Conference on Scientific and Statistical Database Management, SSDBM 2018 ; Conference date: 09-07-2018 Through 11-07-2018",
  year = "2018",
  month = jul,
  day = "9",
  doi = "10.1145/3221269.3223040",
  language = "English (US)",
  series = "ACM International Conference Proceeding Series",
  publisher = "Association for Computing Machinery",
  editor = "Michael Bohlen and Johann Gamper and Peer Kroger and Dimitris Sacharidis",
  booktitle = "Scientific and Statistical Database Management - 30th International Conference, SSDBM 2018, Proceedings",
}

TY  - GEN
T1  - GeoSparkViz: A scalable geospatial data visualization framework in the Apache Spark ecosystem
T2  - 30th International Conference on Scientific and Statistical Database Management, SSDBM 2018
AU  - Yu, Jia
AU  - Zhang, Zongsi
AU  - Elsayed, Mohamed
N1  - Funding Information: This work is supported in part by the National Science Foundation (NSF) under Grant 1654861, the Salt River Project Agricultural Improvement and Power District (SRP), and the DOD-ARMY Training and Doctrine Command (TRADOC). Publisher Copyright: © 2018 Association for Computing Machinery.
PY  - 2018/7/9
Y1  - 2018/7/9
KW  - Big spatial data
KW  - Distributed computation
KW  - Spatial visualization
UR  - http://www.scopus.com/inward/record.url?scp=85054936441&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85054936441&partnerID=8YFLogxK
U2  - 10.1145/3221269.3223040
DO  - 10.1145/3221269.3223040
M3  - Conference contribution
AN  - SCOPUS:85054936441
T3  - ACM International Conference Proceeding Series
BT  - Scientific and Statistical Database Management - 30th International Conference, SSDBM 2018, Proceedings
A2  - Bohlen, Michael
A2  - Gamper, Johann
A2  - Kroger, Peer
A2  - Sacharidis, Dimitris
PB  - Association for Computing Machinery
Y2  - 9 July 2018 through 11 July 2018
ER  - 
