1 Citation (Scopus)

Abstract

Geographically Weighted Regression (GWR) is a widely used tool for exploring spatial heterogeneity of processes over geographic space. GWR computes location-specific parameter estimates, which makes its calibration process computationally intensive. The maximum number of data points that can be handled by current open-source GWR software is approximately 15,000 observations on a standard desktop. In the era of big data, this places a severe limitation on the use of GWR. To overcome this limitation, we propose a highly scalable, open-source FastGWR implementation based on Python and the Message Passing Interface (MPI) that scales to the order of millions of observations. FastGWR optimizes memory usage along with parallelization to boost performance significantly. To illustrate the performance of FastGWR, a hedonic house price model is calibrated on approximately 1.3 million single-family residential properties from a Zillow dataset for the city of Los Angeles, which is the first effort to apply GWR to a dataset of this size. The results show that FastGWR scales linearly as the number of cores within the High-Performance Computing (HPC) environment increases. It also outperforms currently available open-sourced GWR software packages with drastic speed reductions–up to thousands of times faster–on a standard desktop.

Original languageEnglish (US)
JournalInternational Journal of Geographical Information Science
DOIs
StateAccepted/In press - Jan 1 2018

Fingerprint

software
regression
Message passing
Software packages
Calibration
calibration
Data storage equipment
performance
Big data
family
city
speed
price
parameter

Keywords

  • Geographically Weighted Regression
  • GWR
  • parallel computing
  • spatial analysis
  • statistical software

ASJC Scopus subject areas

  • Information Systems
  • Geography, Planning and Development
  • Library and Information Sciences

Cite this

@article{b636935bafc44f66962bf2fde4e97e9b,
title = "Fast Geographically Weighted Regression (FastGWR): a scalable algorithm to investigate spatial process heterogeneity in millions of observations",
abstract = "Geographically Weighted Regression (GWR) is a widely used tool for exploring spatial heterogeneity of processes over geographic space. GWR computes location-specific parameter estimates, which makes its calibration process computationally intensive. The maximum number of data points that can be handled by current open-source GWR software is approximately 15,000 observations on a standard desktop. In the era of big data, this places a severe limitation on the use of GWR. To overcome this limitation, we propose a highly scalable, open-source FastGWR implementation based on Python and the Message Passing Interface (MPI) that scales to the order of millions of observations. FastGWR optimizes memory usage along with parallelization to boost performance significantly. To illustrate the performance of FastGWR, a hedonic house price model is calibrated on approximately 1.3 million single-family residential properties from a Zillow dataset for the city of Los Angeles, which is the first effort to apply GWR to a dataset of this size. The results show that FastGWR scales linearly as the number of cores within the High-Performance Computing (HPC) environment increases. It also outperforms currently available open-sourced GWR software packages with drastic speed reductions–up to thousands of times faster–on a standard desktop.",
keywords = "Geographically Weighted Regression, GWR, parallel computing, spatial analysis, statistical software",
author = "Ziqi Li and Stewart Fotheringham and WenWen Li and Taylor Oshan",
year = "2018",
month = "1",
day = "1",
doi = "10.1080/13658816.2018.1521523",
language = "English (US)",
journal = "International Journal of Geographical Information Science",
issn = "1365-8816",
publisher = "Taylor and Francis Ltd.",

}

TY - JOUR

T1 - Fast Geographically Weighted Regression (FastGWR)

T2 - a scalable algorithm to investigate spatial process heterogeneity in millions of observations

AU - Li, Ziqi

AU - Fotheringham, Stewart

AU - Li, WenWen

AU - Oshan, Taylor

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Geographically Weighted Regression (GWR) is a widely used tool for exploring spatial heterogeneity of processes over geographic space. GWR computes location-specific parameter estimates, which makes its calibration process computationally intensive. The maximum number of data points that can be handled by current open-source GWR software is approximately 15,000 observations on a standard desktop. In the era of big data, this places a severe limitation on the use of GWR. To overcome this limitation, we propose a highly scalable, open-source FastGWR implementation based on Python and the Message Passing Interface (MPI) that scales to the order of millions of observations. FastGWR optimizes memory usage along with parallelization to boost performance significantly. To illustrate the performance of FastGWR, a hedonic house price model is calibrated on approximately 1.3 million single-family residential properties from a Zillow dataset for the city of Los Angeles, which is the first effort to apply GWR to a dataset of this size. The results show that FastGWR scales linearly as the number of cores within the High-Performance Computing (HPC) environment increases. It also outperforms currently available open-sourced GWR software packages with drastic speed reductions–up to thousands of times faster–on a standard desktop.

AB - Geographically Weighted Regression (GWR) is a widely used tool for exploring spatial heterogeneity of processes over geographic space. GWR computes location-specific parameter estimates, which makes its calibration process computationally intensive. The maximum number of data points that can be handled by current open-source GWR software is approximately 15,000 observations on a standard desktop. In the era of big data, this places a severe limitation on the use of GWR. To overcome this limitation, we propose a highly scalable, open-source FastGWR implementation based on Python and the Message Passing Interface (MPI) that scales to the order of millions of observations. FastGWR optimizes memory usage along with parallelization to boost performance significantly. To illustrate the performance of FastGWR, a hedonic house price model is calibrated on approximately 1.3 million single-family residential properties from a Zillow dataset for the city of Los Angeles, which is the first effort to apply GWR to a dataset of this size. The results show that FastGWR scales linearly as the number of cores within the High-Performance Computing (HPC) environment increases. It also outperforms currently available open-sourced GWR software packages with drastic speed reductions–up to thousands of times faster–on a standard desktop.

KW - Geographically Weighted Regression

KW - GWR

KW - parallel computing

KW - spatial analysis

KW - statistical software

UR - http://www.scopus.com/inward/record.url?scp=85054525344&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054525344&partnerID=8YFLogxK

U2 - 10.1080/13658816.2018.1521523

DO - 10.1080/13658816.2018.1521523

M3 - Article

JO - International Journal of Geographical Information Science

JF - International Journal of Geographical Information Science

SN - 1365-8816

ER -