7 Citations (Scopus)

Abstract

The advancement of geospatial interoperability research has fostered the proliferation of geospatial resources that are shared and made publicly available on the Web. However, their increasing availability has made identifying the web signature of voluminous geospatial resources a major challenge. In this paper, we introduce a new cyberinfrastructure platform, PolarHub, that conducts large-scale web crawling to discover distributed geospatial data and service resources efficiently and effectively. PolarHub is built upon a service-oriented architecture (SOA) and adopts a Data Access Object (DAO)-based software design pattern to ensure the extensibility of the software system. The proposed meta-search-based seed selection and pattern-matching-based crawling strategy facilitates rapid resource identification and discovery by constraining the search scope on the Web. In addition, PolarHub introduces an advanced asynchronous communication strategy, which combines client-pull and server-push to ensure high efficiency of the crawling system. These design features make PolarHub a high-performance, scalable, sustainable, collaborative, and interactive platform for active geospatial data discovery. Because of OGC's widespread adoption, OGC-compliant web services are the primary search target of PolarHub. Currently, the PolarHub system is up and running and serves various scientific communities that demand geospatial data. We consider PolarHub a significant contribution to the fields of information retrieval and geospatial interoperability.
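The pattern-matching idea mentioned in the abstract can be illustrated with a small sketch. This is not PolarHub's actual implementation, which the abstract does not describe in detail. It assumes that a crawler recognizes candidate OGC endpoints by the standard `service` and `request=GetCapabilities` query parameters in a URL. The function names, the set of service types, and the example URLs are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Assumed set of OGC service types to look for (illustrative, not exhaustive).
OGC_SERVICE_TYPES = {"WMS", "WFS", "WCS", "WPS", "CSW", "SOS"}


def ogc_service_type(url):
    """Return the OGC service type if the URL looks like a
    GetCapabilities request, otherwise None."""
    query = parse_qs(urlparse(url).query)
    # OGC key names are case-insensitive, so normalize them to lower case.
    params = {key.lower(): values[0] for key, values in query.items()}
    service = params.get("service", "").upper()
    request = params.get("request", "").lower()
    if service in OGC_SERVICE_TYPES and request == "getcapabilities":
        return service
    return None


def filter_candidates(urls):
    """Group crawled URLs by detected OGC service type, dropping the rest."""
    found = {}
    for url in urls:
        service = ogc_service_type(url)
        if service is not None:
            found.setdefault(service, []).append(url)
    return found


if __name__ == "__main__":
    crawled = [
        "http://example.org/ows?SERVICE=WMS&REQUEST=GetCapabilities",
        "http://example.org/about.html",
        "http://example.org/wfs?service=wfs&request=getcapabilities",
    ]
    for service, urls in filter_candidates(crawled).items():
        print(service, urls)
```

Filtering on URL patterns before fetching a page is one way to constrain the search scope, since most crawled links can be discarded without a network request.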

Original language: English (US)
Pages (from-to): 195-207
Number of pages: 13
Journal: Computers, Environment and Urban Systems
Volume: 59
DOI: 10.1016/j.compenvurbsys.2016.07.004
State: Published - Sep 1 2016

Keywords

  • Big data access
  • Cyberinfrastructure
  • Geospatial interoperability
  • PolarHub
  • Scalability

ASJC Scopus subject areas

  • Ecological Modeling
  • Environmental Science(all)
  • Geography, Planning and Development
  • Urban Studies

Cite this

PolarHub: A large-scale web crawling engine for OGC service discovery in cyberinfrastructure. / Li, WenWen; Wang, Sizhe; Bhatia, Vidit.

In: Computers, Environment and Urban Systems, Vol. 59, 01.09.2016, p. 195-207.

Research output: Contribution to journal, Article

@article{35133b68e7d849a980380bf8de928d89,
title = "PolarHub: A large-scale web crawling engine for OGC service discovery in cyberinfrastructure",
abstract = "The advancement of geospatial interoperability research has fostered the proliferation of geospatial resources that are shared and made publicly available on the Web. However, their increasing availability has made identifying the web signature of voluminous geospatial resources a major challenge. In this paper, we introduce a new cyberinfrastructure platform, PolarHub, that conducts large-scale web crawling to discover distributed geospatial data and service resources efficiently and effectively. PolarHub is built upon a service-oriented architecture (SOA) and adopts a Data Access Object (DAO)-based software design pattern to ensure the extensibility of the software system. The proposed meta-search-based seed selection and pattern-matching-based crawling strategy facilitates rapid resource identification and discovery by constraining the search scope on the Web. In addition, PolarHub introduces an advanced asynchronous communication strategy, which combines client-pull and server-push to ensure high efficiency of the crawling system. These design features make PolarHub a high-performance, scalable, sustainable, collaborative, and interactive platform for active geospatial data discovery. Because of OGC's widespread adoption, OGC-compliant web services are the primary search target of PolarHub. Currently, the PolarHub system is up and running and serves various scientific communities that demand geospatial data. We consider PolarHub a significant contribution to the fields of information retrieval and geospatial interoperability.",
keywords = "Big data access, Cyberinfrastructure, Geospatial interoperability, PolarHub, Scalability",
author = "WenWen Li and Sizhe Wang and Vidit Bhatia",
year = "2016",
month = "9",
day = "1",
doi = "10.1016/j.compenvurbsys.2016.07.004",
language = "English (US)",
volume = "59",
pages = "195--207",
journal = "Computers, Environment and Urban Systems",
issn = "0198-9715",
publisher = "Elsevier Limited",
}

TY - JOUR

T1 - PolarHub

T2 - A large-scale web crawling engine for OGC service discovery in cyberinfrastructure

AU - Li, WenWen

AU - Wang, Sizhe

AU - Bhatia, Vidit

PY - 2016/9/1

Y1 - 2016/9/1

AB - The advancement of geospatial interoperability research has fostered the proliferation of geospatial resources that are shared and made publicly available on the Web. However, their increasing availability has made identifying the web signature of voluminous geospatial resources a major challenge. In this paper, we introduce a new cyberinfrastructure platform, PolarHub, that conducts large-scale web crawling to discover distributed geospatial data and service resources efficiently and effectively. PolarHub is built upon a service-oriented architecture (SOA) and adopts a Data Access Object (DAO)-based software design pattern to ensure the extensibility of the software system. The proposed meta-search-based seed selection and pattern-matching-based crawling strategy facilitates rapid resource identification and discovery by constraining the search scope on the Web. In addition, PolarHub introduces an advanced asynchronous communication strategy, which combines client-pull and server-push to ensure high efficiency of the crawling system. These design features make PolarHub a high-performance, scalable, sustainable, collaborative, and interactive platform for active geospatial data discovery. Because of OGC's widespread adoption, OGC-compliant web services are the primary search target of PolarHub. Currently, the PolarHub system is up and running and serves various scientific communities that demand geospatial data. We consider PolarHub a significant contribution to the fields of information retrieval and geospatial interoperability.

KW - Big data access

KW - Cyberinfrastructure

KW - Geospatial interoperability

KW - PolarHub

KW - Scalability

UR - http://www.scopus.com/inward/record.url?scp=84978758102&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84978758102&partnerID=8YFLogxK

U2 - 10.1016/j.compenvurbsys.2016.07.004

DO - 10.1016/j.compenvurbsys.2016.07.004

M3 - Article

AN - SCOPUS:84978758102

VL - 59

SP - 195

EP - 207

JO - Computers, Environment and Urban Systems

JF - Computers, Environment and Urban Systems

SN - 0198-9715

ER -