Enhancing Open Search in OGC CSW Catalogues to Improve Accessibility of Distributed Geospatial Data

The Open Geospatial Consortium (OGC) Catalogue Service for the Web (CSW) (OGC 07-006r1, 2007) is a specification developed to improve the accessibility of distributed geospatial resources. The CSW standard is also the legacy of Douglas Nebert, a pioneer of geospatial interoperability. A CSW-compliant catalog service supports the publication, indexing, searching, and harvesting of many types of OGC services, as well as other geospatial resources, from a variety of data providers. A catalogue can be regarded as a database that stores the metadata of registered resources (both data and services). To standardize how data semantics, including classes, operations, and relationships, are represented in the database, OGC approved ebRIM as the base information model.

Since its release in 2007, the OGC CSW has been widely adopted in the geospatial community. Implementations of CSW catalogs include the ESRI Geoportal, Compusult's WES portal, the GEOSS CSR, and the catalog application GeoNetwork. These solutions have been widely applied to support governmental-level catalog projects, such as the US GEOSS Clearinghouse, the Dutch National Georegistry, SwissTopo geocat.ch, ecoMundus (Network for Environmental Information and Data), New Zealand's catalogue of publicly funded geospatial data, the South African Environmental Observation Network, and many others. The CSW specification defines a standard interface for geospatial information discovery and enables cross-catalogue communication.

Although these cataloguing services are popular, their search mechanism is still based on full-text search using Apache Lucene (http://lucene.apache.org). This technique has difficulty identifying relevant datasets when an inexact search keyword is used. Although some studies (Fox et al. 2009; Bowers et al. 2004; Droegemeier et al. 2005; Movva et al.
2008) have been conducted to integrate thesauri to improve and refine the search process, overall search performance has not improved appreciably (Li et al. 2012b), because these solutions rely heavily on the logical representation in the thesauri/ontologies. When the knowledge in an ontology is not comprehensive enough to cover the queries users are interested in, search performance suffers.

To overcome these problems, in OGC Testbed 11 we will investigate state-of-the-art semantic analysis and machine learning techniques that mine metadata records to identify linkages between concepts and terms, in order to further improve the data search component of CSW. Specifically, we plan to:

1. Extend the search interface and the widely adopted ebXML Registry Information Model (ebRIM) (OASIS, 2005) of the current OGC CSW (OpenGIS Catalogue Service for Web) specification to enable the management and discovery of geospatial data by leveraging their semantic interconnections.
2. Research and apply a machine-learning approach to extract concepts from structured text-based documents (e.g. ISO 19115 metadata documents) and build linkages among them.

The PI, Wenwen Li, is an active participant in OGC Testbed 10; she has completed a CCI thread on a geospatial data conflation WPS in the cloud and has also implemented a WFS for NGA DNC data. Li has worked on geospatial interoperability for several years and is an expert in CSW implementation. Of particular note, Li led a team that developed a CSW catalog which won an international competition; this solution was eventually selected as the supporting technology for building the GEOSS (Global Earth Observation System of Systems) clearinghouse in 2010. Li is also an expert in geospatial (semantic) search; she has developed several novel solutions (Li et al. 2008a,b; Li et al. 2009; Li et al. 2010; Li et al. 2011; Li et al. 2012a,b; Li et al.
2014) that use domain ontologies, semantic analysis, and data mining techniques to improve the performance of geospatial search in the context of geospatial catalogs. Li has also led the research and development of a number of cyberinfrastructure projects, including the GEOSS clearinghouse, the USGS Arctic Spatial Data Infrastructure (Li et al. 2011), and the NSF Polar Cyberinfrastructure (Li et al. 2014). In all of these projects, a CSW catalog is an essential component that supports the discovery of distributed geospatial data and processing services (OGC WMS, WFS, WCS, WPS, etc.).
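
To make the second objective above more concrete (building linkages among concepts mined from metadata), the following is a minimal illustrative sketch and not the method this proposal will develop. It assumes that keyword sets have already been extracted from metadata records; the sample records, the function names, and the choice of a Jaccard co-occurrence score are all hypothetical and stand in for whatever semantic analysis or machine-learning technique is eventually adopted. The point it illustrates is that terms which co-occur across records can be used to expand an inexact query before it reaches a full-text index.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical keyword sets, standing in for terms extracted from
# ISO 19115 metadata records (assumption: extraction is already done).
RECORDS = [
    {"sea ice", "arctic", "cryosphere"},
    {"sea ice", "arctic", "albedo"},
    {"permafrost", "arctic", "cryosphere"},
    {"land cover", "vegetation"},
]


def build_links(records):
    """Count records per term and co-occurrences per unordered term pair."""
    term_count = defaultdict(int)
    pair_count = defaultdict(int)
    for terms in records:
        for term in terms:
            term_count[term] += 1
        # Sorting gives each pair one canonical (a, b) key.
        for a, b in combinations(sorted(terms), 2):
            pair_count[(a, b)] += 1
    return term_count, pair_count


def related_terms(term, term_count, pair_count, min_score=0.5):
    """Rank terms linked to `term` by Jaccard similarity of their record sets."""
    scores = {}
    for (a, b), both in pair_count.items():
        if term not in (a, b):
            continue
        other = b if a == term else a
        union = term_count[a] + term_count[b] - both
        score = both / union
        if score >= min_score:
            scores[other] = score
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def expand_query(term, records, min_score=0.5):
    """Return the query term followed by its linked terms."""
    term_count, pair_count = build_links(records)
    linked = related_terms(term, term_count, pair_count, min_score)
    return [term] + [t for t, _ in linked]


if __name__ == "__main__":
    print(expand_query("sea ice", RECORDS))
```

With these sample records, a query for "sea ice" would also be matched against "arctic" and "albedo", while "cryosphere" falls below the hypothetical 0.5 threshold. A purely statistical score like this needs no hand-built thesaurus, which is the limitation of ontology-based approaches noted earlier, but it is only as good as the metadata it is computed from.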
