TY - JOUR
T1 - Lowering the Barriers for Accessing Distributed Geospatial Big Data to Advance Spatial Data Science
T2 - The PolarHub Solution
AU - Li, WenWen
N1 - Funding Information:
This article draws on work supported in part by the following awards: PLR-1349259; BCS-1455349; and PLR-1504432 from the National Science Foundation.
Publisher Copyright:
© 2018 by American Association of Geographers.
PY - 2018/5/4
Y1 - 2018/5/4
N2 - Data is the crux of science. The widespread availability of big data today is of particular importance for fostering new forms of geospatial innovation. This article reports a state-of-the-art solution that addresses a key cyberinfrastructure research problem: providing ready access to big, distributed geospatial data resources on the Web. I first formulate this data access problem and introduce its indispensable elements, including identifying the cyberlocation, space and time coverage, theme, and quality of the data set. I then propose strategies to tackle each data access issue and make the data more discoverable and usable for geospatial data users and decision makers. Among these strategies is large-scale Web crawling as a key technique to support automatic collection of online geospatial data that are highly distributed, intrinsically heterogeneous, and known to be dynamic. To better understand the content and scientific meanings of the data, methods including space–time filtering, ontology-based thematic classification, and service quality evaluation are incorporated. To serve a broad scientific user community, these techniques are integrated into an operational data crawling system, PolarHub, which is also an important cyberinfrastructure building block to support effective data discovery. A series of experiments was conducted to demonstrate the outstanding performance of the PolarHub system. This work contributes to building the theoretical and methodological foundation for data-driven geography and the emerging spatial data science.
AB - Data is the crux of science. The widespread availability of big data today is of particular importance for fostering new forms of geospatial innovation. This article reports a state-of-the-art solution that addresses a key cyberinfrastructure research problem: providing ready access to big, distributed geospatial data resources on the Web. I first formulate this data access problem and introduce its indispensable elements, including identifying the cyberlocation, space and time coverage, theme, and quality of the data set. I then propose strategies to tackle each data access issue and make the data more discoverable and usable for geospatial data users and decision makers. Among these strategies is large-scale Web crawling as a key technique to support automatic collection of online geospatial data that are highly distributed, intrinsically heterogeneous, and known to be dynamic. To better understand the content and scientific meanings of the data, methods including space–time filtering, ontology-based thematic classification, and service quality evaluation are incorporated. To serve a broad scientific user community, these techniques are integrated into an operational data crawling system, PolarHub, which is also an important cyberinfrastructure building block to support effective data discovery. A series of experiments was conducted to demonstrate the outstanding performance of the PolarHub system. This work contributes to building the theoretical and methodological foundation for data-driven geography and the emerging spatial data science.
KW - Web crawling
KW - cyberinfrastructure
KW - geospatial big data
KW - semantic classification
KW - spatial data science
UR - http://www.scopus.com/inward/record.url?scp=85033713493&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85033713493&partnerID=8YFLogxK
U2 - 10.1080/24694452.2017.1373625
DO - 10.1080/24694452.2017.1373625
M3 - Article
AN - SCOPUS:85033713493
SN - 2469-4452
VL - 108
SP - 773
EP - 793
JO - Annals of the American Association of Geographers
JF - Annals of the American Association of Geographers
IS - 3
ER -