Transform Geospatial Knowledge Discovery Through an Interoperable Data Fusion Framework in the OGC Testbed 10

Project: Research project

Project Details


ASU Participation in OGC OWS-10 Initiative
1 Overview
The 21st century is the era of Big Spatial Data. The large number of Earth-observing satellites and the widespread availability of sensor networks have made an unprecedented amount of georeferenced information available. It has been reported that more data were created in the last three years than in the preceding 40,000 years (TeraData Inc.). The leading Earth-observing agency, NASA, has collected billions of gigabytes of data through its more than 30 satellite missions and more than 70 onboard sensors, and this volume keeps growing by about 5 terabytes a day. The USGS has deployed over 6000 real-time stream gauges to support river forecasts by the NOAA National Weather Service. A twin-engine Boeing jet generates over 200 terabytes of transportation-related data during a single five-hour cross-country flight from Phoenix to Washington, D.C.
These rapidly growing data not only help scientific research deepen our understanding of the planet but are also changing everyday life. Thanks to readily available location-aware communications and recent advances in social media, billions of people, serving as citizen sensors, can now create vast volumes of geo-tagged data on many aspects of human life, from conversations about the mundane chores of daily life and fluctuations of emotion to discussions of major scientific breakthroughs and debates on critical political decisions. These data have significantly enriched the sources of geographic information and help people better understand the world beyond their geographic horizon.
Among citizen-sensed data, Volunteered Geographic Information (VGI) has gradually taken the lead as the most voluminous source of geographic data and offers immense potential to radically change mapping. For example, there were 20 million geographic features in the Wikimapia database at the time of writing, more than in many of the world's largest gazetteers. In addition to features with explicit locational information stored in geodatabases, places are also mentioned and discussed in social media, blogs, news forums and similar outlets, yet many of the places referenced in this way do not appear in official gazetteers. This type of unstructured geographic information is rich and abundant, with great potential to benefit GIScience research and spatial decision-making.
At the same time, Big Data brings big challenges. First, big geospatial data, created by different government agencies, institutions and citizen sensors, are widely dispersed across cyberspace. Second, and more importantly, these data are typically produced using local standards for formats, metadata structure and so on, which results in a high degree of heterogeneity and hampers geospatial interoperability. This problem is especially serious for VGI data, whose structure tends to be more arbitrary. Third, uncertainty is ubiquitous in geospatial data in any real-world application. The same geospatial feature may be positioned differently in different datasets, and such discrepancies and positioning errors hinder the integration of disparate data resources for further spatial analysis.
Therefore, to make better use of the massive amount of geospatial data for intelligent geospatial decision-making and knowledge discovery, it has become a critical task for the GIScience community to (1) tackle these problems to increase the accessibility, interoperability and accuracy of distributed geospatial data, and (2) characterize the need for new technologies, standards, policies and institutions to enable the location-aware society. The Open Geospatial Consortium (OGC) is one such institution: an organization of 480 universities, government agencies and companies that develops open standards to support interoperable solutions that 'geo-enable' the Web, location-based services and mainstream IT. As a non-profit organization, OGC has recently released its call for proposals for the OGC Testbed 10 interoperability initiative, which aims to advance OGC's open framework and enhance interoperability across the geospatial community. This call involves the rapid development, testing, validation and adoption of open, community-based standards specifications. A focus topic in the Cross-Community Interoperability thread of this year's call is to fuse distributed, heterogeneous geospatial data from the official NGA (National Geospatial-Intelligence Agency) and USGS (United States Geological Survey) databases with VGI sources. This is a key research topic and matches the expertise of our team, composed of Dr. Wenwen Li (PI; ASU), a leading researcher in geospatial cyberinfrastructure, geospatial web services and semantic interoperability; Dr. Linna Li (Co-PI; Cal State), an expert in geospatial data conflation and VGI; and Dr. Yaxing Wei (Co-PI), a senior geospatial scientist at Oak Ridge National Lab.
Figure 1 shows the proposed interoperable data fusion framework, adapted from Annex B Figure 1-10. Green boxes with bold text are the components (VGI component, WFS for VGI and Conflation WPS) we propose to develop during OWS-10. Light pink boxes are pools of service modules or service chains. Modules in light yellow are existing databases or data sources, such as data from the VGI site Flickr or the NGA WFS. White components are other fundable deliverables that could potentially work with our proposed modules toward a bigger picture of geospatial data sharing, integration and discovery.
We present a data scenario as follows. First, user-contributed geospatial feature sets are retrieved from VGI sources through remote API calls. Two main VGI sites, the photo-sharing website Flickr and the open-content collaborative mapping site Wikimapia, are selected because of their richness in geographic objects and their large number of contributors. Because georeferenced tweets are only a very small part (~2%) of all Twitter data, Twitter will be used only as an auxiliary source, providing sentiment about a geographic object through analysis of geo-enabled tweets. After data are harvested from both VGI sites, the geographic objects will first be preprocessed to remove duplicates. For geographic data from Flickr, where only point features are available, the point features will be composited into line or polygon features by grouping similar semantic tags. Feature combination and integration are discussed in detail in Section 2.1.1, the VGI component. The products of the VGI component (point, line and polygon features) will then be fed into the next component in the workflow, which publishes the available features through OGC WFS to make these data interoperable with other feature data, such as the NGA WFS and the USGS WFS. Because development of the WFS for VGI component depends heavily on the progress of the VGI component, we will adopt a waterfall development model. As the geographic features are progressively processed by the VGI component, the features that the WFS services provide will gradually be made available.
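The preprocessing and point-composition steps of this scenario could be sketched as follows. This is only an illustration under stated assumptions, not the method of Section 2.1.1: the record layout (`tag`, `lon`, `lat`), the function names, exact-tag grouping in place of semantic similarity, and the use of a convex hull to form the outline are all hypothetical choices, and longitude/latitude are treated as planar coordinates for simplicity.

```python
from collections import defaultdict


def dedupe(photos):
    """Drop records that repeat the same (tag, lon, lat) triple."""
    seen, unique = set(), []
    for p in photos:
        key = (p["tag"], round(p["lon"], 5), round(p["lat"], 5))
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique


def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def compose_features(photos):
    """Group de-duplicated points by tag and derive one feature per tag.

    Assumption: an exact tag match stands in for semantic-tag similarity.
    """
    groups = defaultdict(list)
    for p in dedupe(photos):
        groups[p["tag"]].append((p["lon"], p["lat"]))
    features = {}
    for tag, pts in groups.items():
        hull = convex_hull(pts)
        # 1 vertex -> point, 2 -> line, 3 or more -> polygon outline
        kind = {1: "Point", 2: "LineString"}.get(len(hull), "Polygon")
        features[tag] = {"kind": kind, "vertices": hull}
    return features
```

A convex hull overestimates concave shapes, so a production component would likely need a tighter outline and a real tag-similarity measure; the sketch only shows how tagged points can be turned into features that a WFS could serve.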
Note that data from Flickr and Wikimapia will be processed and deployed as separate WFSs rather than as one WFS instance, even though both WFSs may contain the same features. Identifying the same or similar features in different datasets is the job of our third proposed component, the Conflation WPS. This service will operate not only on the official feature sets, such as the NGA WFS and the USGS WFS, but also on the two VGI WFSs. The two VGI data sources are conflated at this stage, rather than earlier within the VGI component, to ensure that each dataset is first made interoperable as a WFS, so that the conflation service only needs to communicate via WFS/GML rather than the original raw formats of the VGI data.
The entire processing chain, from establishing the VGI component, to deploying OGC WFS for VGI data, to conflating official and VGI data sources, will yield multiple geospatial data products, such as a VGI-enriched global gazetteer and an accuracy-enhanced geospatial feature store. Users will be able to search for features of interest, such as place names, through a front-end user interface. This OGC-service-oriented design and implementation of essential data fusion modules has great potential to foster the standardization of VGI data services, which are rapidly growing and increasingly valuable but far from interoperable. The conflation process we introduce will ensure higher data quality and contribute substantially to the seamless fusion of disparate geospatial datasets. Moreover, making the conflation module OGC-compliant promotes widespread sharing and reuse of the tool and reduces duplicated effort in software development.
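To make the idea of identifying the same or similar features across datasets concrete, a minimal matching sketch is given below. The proposal does not specify the conflation algorithm, so everything here is an assumption for illustration: the feature layout (`id`, `name`, `lon`, `lat`), the function names, the combination of great-circle distance with string similarity of names, and the two thresholds. The WPS wrapper and the WFS/GML parsing are omitted.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_features(official, vgi, max_dist_m=200.0, min_name_sim=0.8):
    """Pair each official feature with its nearest plausible VGI feature.

    Both thresholds are hypothetical and would need tuning per dataset.
    Returns tuples of (official_id, vgi_id, distance_m, name_similarity).
    """
    matches = []
    for o in official:
        best = None
        for v in vgi:
            d = haversine_m(o["lon"], o["lat"], v["lon"], v["lat"])
            s = name_similarity(o["name"], v["name"])
            if d <= max_dist_m and s >= min_name_sim:
                if best is None or d < best[1]:
                    best = (v, d, s)
        if best is not None:
            matches.append((o["id"], best[0]["id"], best[1], best[2]))
    return matches
```

The nested loop is quadratic in the number of features; at gazetteer scale a spatial index would be needed to limit the candidate pairs, and line or polygon features would require geometry-aware distance measures rather than point-to-point distance.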
Effective start/end date: 10/15/13 to 8/31/14


  • Open Geospatial Consortium: $17,500.00

