GEM: An Efficient Entity Matching Framework for Geospatial Data

Setu Shah, Vamsi Meduri, Mohamed Sarwat

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Identifying various mentions of the same real-world location is known as spatial entity matching (EM). GEM is an end-to-end geospatial EM framework that matches polygon geometry entities in addition to point geometry entities. Blocking, feature vector creation, and classification are the core steps of our system. GEM comprises an efficient and lightweight blocking technique, GeoPrune, that uses the geohash encoding mechanism. We re-purpose the spatial proximity operators from Apache Sedona to create semantically rich spatial feature vectors. The classification step in GEM is a pluggable component, which consumes a unique feature vector and determines whether the geolocations match or not. We conduct experiments with three classifiers on multiple large-scale geospatial datasets consisting of both spatial and relational attributes. GEM achieves an F-measure of 1.0 for a point × point dataset with 176k total pairs, which is 42% higher than a state-of-the-art spatial EM baseline. It achieves F-measures of 0.966 and 0.993 for the point × polygon dataset with 302M total pairs and the polygon × polygon dataset with 16M total pairs, respectively.
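
As a rough illustration of how geohash encoding can drive a blocking step, the sketch below encodes points with the standard geohash algorithm and keeps only cross-dataset pairs that fall in the same geohash cell. It is not the GeoPrune algorithm from the paper, whose details are not given in this abstract: the function names `geohash_encode` and `candidate_pairs`, the `(id, lat, lon)` record layout, and the default precision of 6 are all assumptions made for the example.

```python
from collections import defaultdict

# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash string.

    Bits alternate between longitude and latitude, starting with longitude;
    every 5 bits become one base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def candidate_pairs(left, right, precision=6):
    """Return (left_id, right_id) pairs whose points share a geohash cell.

    Hypothetical blocking step: `left` and `right` are iterables of
    (id, lat, lon) tuples. Pairs in different cells are pruned.
    """
    buckets = defaultdict(list)
    for rid, lat, lon in right:
        buckets[geohash_encode(lat, lon, precision)].append(rid)
    pairs = []
    for lid, lat, lon in left:
        for rid in buckets.get(geohash_encode(lat, lon, precision), []):
            pairs.append((lid, rid))
    return pairs
```

A simple same-cell rule like this one misses true matches that sit on opposite sides of a cell boundary, so a practical blocker would typically also probe neighboring cells or use a coarser precision; how GEM handles this is not stated here.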

Original language: English (US)
Title of host publication: 29th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL 2021
Editors: Xiaofeng Meng, Fusheng Wang, Chang-Tien Lu, Yan Huang, Shashi Shekhar, Xing Xie
Publisher: Association for Computing Machinery
Pages: 346-349
Number of pages: 4
ISBN (Electronic): 9781450386647
DOIs
State: Published - Nov 2 2021
Event: 29th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL 2021 - Virtual, Online, China
Duration: Nov 2 2021 → Nov 5 2021

Publication series

Name: GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems

Conference

Conference: 29th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL 2021
Country/Territory: China
City: Virtual, Online
Period: 11/2/21 → 11/5/21

Keywords

  • Apache Sedona
  • geohash
  • spatial blocking
  • spatial entity matching

ASJC Scopus subject areas

  • Earth-Surface Processes
  • Computer Science Applications
  • Modeling and Simulation
  • Computer Graphics and Computer-Aided Design
  • Information Systems
