Constructing gazetteers from volunteered Big Geo-Data based on Hadoop

Song Gao, Linna Li, WenWen Li, Krzysztof Janowicz, Yue Zhang

Research output: Contribution to journal (article)

54 Citations (Scopus)

Abstract

Traditional gazetteers are built and maintained by authoritative mapping agencies. In the age of Big Data, it is possible to construct gazetteers in a data-driven approach by mining rich volunteered geographic information (VGI) from the Web. In this research, we build a scalable distributed platform and a high-performance geoprocessing workflow based on the Hadoop ecosystem to harvest crowd-sourced gazetteer entries. Using experiments based on geotagged datasets in Flickr, we find that the MapReduce-based workflow running on the spatially enabled Hadoop cluster can reduce the processing time compared with traditional desktop-based operations by an order of magnitude. We demonstrate how to use such a novel spatial-computing infrastructure to facilitate gazetteer research. In addition, we introduce a provenance-based trust model for quality assurance. This work offers new insights on enriching future gazetteers with the use of Hadoop clusters, and makes contributions in connecting GIS to the cloud computing environment for the next frontier of Big Geo-Data analytics.
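The abstract does not spell out the workflow, so the following is only a minimal, single-machine sketch of the general MapReduce pattern it refers to, not the authors' implementation. It assumes each geotagged photo is a dictionary with hypothetical `tags`, `lat`, and `lon` fields, and it assumes that a candidate gazetteer entry can be summarized as a tag's photo count plus the centroid of its coordinates. On a real Hadoop cluster, the grouping done here by `run_job` would be handled by the framework's shuffle phase.

```python
from collections import defaultdict


def map_photo(photo):
    """Map step: emit one (tag, (lat, lon)) pair per tag on a geotagged photo."""
    for tag in photo["tags"]:
        # Lower-casing is an assumed normalization so spelling variants group together.
        yield tag.lower(), (photo["lat"], photo["lon"])


def reduce_tag(tag, coords):
    """Reduce step: summarize one tag as a photo count and a coordinate centroid."""
    n = len(coords)
    lat = sum(c[0] for c in coords) / n
    lon = sum(c[1] for c in coords) / n
    return {"name": tag, "count": n, "centroid": (lat, lon)}


def run_job(photos):
    """Local stand-in for the shuffle: group mapped values by key, then reduce each group."""
    groups = defaultdict(list)
    for photo in photos:
        for key, value in map_photo(photo):
            groups[key].append(value)
    return {tag: reduce_tag(tag, coords) for tag, coords in groups.items()}


# Made-up sample records, for illustration only (not data from the study).
photos = [
    {"tags": ["Golden Gate"], "lat": 37.0, "lon": -122.0},
    {"tags": ["golden gate", "bridge"], "lat": 38.0, "lon": -123.0},
]
entries = run_job(photos)
```

Because each tag is reduced independently of the others, the reduce calls can be spread across cluster nodes, which is the property that makes this pattern attractive for large geotagged collections.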

Original language: English (US)
Journal: Computers, Environment and Urban Systems
DOI: 10.1016/j.compenvurbsys.2014.02.004
State: Accepted/In press - 2014

Keywords

  • Big Geo-Data
  • CyberGIS
  • Gazetteers
  • Hadoop
  • Scalable geoprocessing workflow
  • Volunteered geographic information

ASJC Scopus subject areas

  • Geography, Planning and Development
  • Ecological Modeling
  • Environmental Science (all)

Cite this

Constructing gazetteers from volunteered Big Geo-Data based on Hadoop. / Gao, Song; Li, Linna; Li, WenWen; Janowicz, Krzysztof; Zhang, Yue.

In: Computers, Environment and Urban Systems, 2014.


@article{5d385bc16e7f4bf2bab4355a8110e5e1,
title = "Constructing gazetteers from volunteered Big Geo-Data based on Hadoop",
abstract = "Traditional gazetteers are built and maintained by authoritative mapping agencies. In the age of Big Data, it is possible to construct gazetteers in a data-driven approach by mining rich volunteered geographic information (VGI) from the Web. In this research, we build a scalable distributed platform and a high-performance geoprocessing workflow based on the Hadoop ecosystem to harvest crowd-sourced gazetteer entries. Using experiments based on geotagged datasets in Flickr, we find that the MapReduce-based workflow running on the spatially enabled Hadoop cluster can reduce the processing time compared with traditional desktop-based operations by an order of magnitude. We demonstrate how to use such a novel spatial-computing infrastructure to facilitate gazetteer research. In addition, we introduce a provenance-based trust model for quality assurance. This work offers new insights on enriching future gazetteers with the use of Hadoop clusters, and makes contributions in connecting GIS to the cloud computing environment for the next frontier of Big Geo-Data analytics.",
keywords = "Big Geo-Data, CyberGIS, Gazetteers, Hadoop, Scalable geoprocessing workflow, Volunteered geographic information",
author = "Song Gao and Linna Li and WenWen Li and Krzysztof Janowicz and Yue Zhang",
year = "2014",
doi = "10.1016/j.compenvurbsys.2014.02.004",
language = "English (US)",
journal = "Computers, Environment and Urban Systems",
issn = "0198-9715",
publisher = "Elsevier Limited"
}

TY - JOUR
T1 - Constructing gazetteers from volunteered Big Geo-Data based on Hadoop
AU - Gao, Song
AU - Li, Linna
AU - Li, WenWen
AU - Janowicz, Krzysztof
AU - Zhang, Yue
PY - 2014
Y1 - 2014

AB - Traditional gazetteers are built and maintained by authoritative mapping agencies. In the age of Big Data, it is possible to construct gazetteers in a data-driven approach by mining rich volunteered geographic information (VGI) from the Web. In this research, we build a scalable distributed platform and a high-performance geoprocessing workflow based on the Hadoop ecosystem to harvest crowd-sourced gazetteer entries. Using experiments based on geotagged datasets in Flickr, we find that the MapReduce-based workflow running on the spatially enabled Hadoop cluster can reduce the processing time compared with traditional desktop-based operations by an order of magnitude. We demonstrate how to use such a novel spatial-computing infrastructure to facilitate gazetteer research. In addition, we introduce a provenance-based trust model for quality assurance. This work offers new insights on enriching future gazetteers with the use of Hadoop clusters, and makes contributions in connecting GIS to the cloud computing environment for the next frontier of Big Geo-Data analytics.

KW - Big Geo-Data
KW - CyberGIS
KW - Gazetteers
KW - Hadoop
KW - Scalable geoprocessing workflow
KW - Volunteered geographic information
UR - http://www.scopus.com/inward/record.url?scp=84895597100&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84895597100&partnerID=8YFLogxK
U2 - 10.1016/j.compenvurbsys.2014.02.004
DO - 10.1016/j.compenvurbsys.2014.02.004
M3 - Article
JO - Computers, Environment and Urban Systems
JF - Computers, Environment and Urban Systems
SN - 0198-9715
ER -