An active crawler for discovering geospatial Web services and their distribution pattern - A case study of OGC Web Map Service

Wenwen Li, Chaowei Yang, Chongjun Yang

Research output: Contribution to journal (Article)

Abstract

The increasing popularity of geospatial interoperability standards has led to a growing number of geospatial Web services (GWSs), such as Web Map Services (WMSs), becoming publicly available on the Internet. However, finding these services quickly and precisely remains a challenge. Traditional methods collect services through centralized registries, where services are registered manually and the metadata of registered services cannot be updated in a timely manner. This paper addresses these challenges by developing an effective crawler that discovers and updates services by (1) proposing an accumulated term frequency (ATF)-based conditional probability model for prioritized crawling, (2) utilizing a concurrent multi-threading technique, and (3) adopting an automatic mechanism to update the metadata of identified services. Experiments show that the proposed crawler performs well in both crawling efficiency and the coverage and liveliness of its results. In addition, an interesting finding regarding the distribution pattern of WMSs is discussed. We expect this research to contribute to automatic GWS discovery over the large-scale and dynamic World Wide Web and to the promotion of operational, interoperable, distributed geospatial services.
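The abstract names an ATF-based conditional probability model for prioritized crawling but does not define it. The sketch below is one plausible, hypothetical reading, not the authors' model: a page's ATF is taken to be the total count of a few WMS indicator terms in its text, the probability that a page leads to a WMS is estimated per ATF bucket from observed crawl outcomes (with Laplace smoothing), and that estimate is used as the priority in a best-first crawl frontier. The term list, `bucket_size`, and all class and function names are assumptions for illustration; concurrency and metadata updating are omitted.

```python
import heapq
import re
from collections import defaultdict

# Hypothetical indicator terms; the paper's actual vocabulary is not stated in the abstract.
WMS_TERMS = frozenset({"wms", "getcapabilities", "ogc", "map", "service"})


def accumulated_term_frequency(text: str, terms=WMS_TERMS) -> int:
    """Assumed ATF: total occurrences of indicator terms in the text (case-insensitive)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for token in tokens if token in terms)


class ATFModel:
    """Hypothetical estimate of P(page leads to a WMS | ATF bucket) from crawl history."""

    def __init__(self, bucket_size: int = 5):
        self.bucket_size = bucket_size
        self.seen = defaultdict(int)  # pages observed per ATF bucket
        self.hits = defaultdict(int)  # pages in that bucket that led to a WMS

    def _bucket(self, atf: int) -> int:
        return atf // self.bucket_size

    def observe(self, atf: int, led_to_wms: bool) -> None:
        b = self._bucket(atf)
        self.seen[b] += 1
        if led_to_wms:
            self.hits[b] += 1

    def probability(self, atf: int) -> float:
        b = self._bucket(atf)
        # Laplace smoothing: an unseen bucket gets 0.5 instead of 0 or a division error.
        return (self.hits[b] + 1) / (self.seen[b] + 2)


class Frontier:
    """Best-first crawl frontier: the URL with the highest priority is popped first."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker keeps insertion order among equal priorities
        self._queued = set()

    def push(self, url: str, priority: float) -> None:
        if url in self._queued:  # enqueue each URL at most once
            return
        self._queued.add(url)
        heapq.heappush(self._heap, (-priority, self._counter, url))
        self._counter += 1

    def pop(self):
        neg_priority, _, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self) -> int:
        return len(self._heap)
```

In use, the crawler would call `observe` after each page is processed, then push newly found links with `model.probability(atf)` as their priority, so that regions of the Web that have already yielded services are explored first.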

Original language: English (US)
Pages (from-to): 1127-1147
Number of pages: 21
Journal: International Journal of Geographical Information Science
Volume: 24
Issue number: 8
Status: Published, Aug 1, 2010
Externally published: Yes

Keywords

  • Accumulated term frequency (ATF)
  • Clumped distribution
  • Conditional probability
  • Crawler
  • Geospatial Web service (GWS)
  • Web Map Service (WMS)

ASJC Scopus subject areas

  • Information Systems
  • Geography, Planning and Development
  • Library and Information Sciences