An evaluation of sampling and full enumeration strategies for Fisher Jenks classification in big data settings

Sergio J. Rey, Philip Stephens, Jason Laura

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Large data contexts present a number of challenges to optimal choropleth map classifiers. Application of optimal classifiers to a sample of the attribute space is one proposed solution. The properties of alternative sampling-based classification methods are examined through a series of Monte Carlo simulations. The impacts of spatial autocorrelation, number of desired classes, and form of sampling are shown to have significant impacts on the accuracy of map classifications. Tradeoffs between improved speed of the sampling approaches and loss of accuracy are also considered. The results suggest the possibility of guiding the choice of classification scheme as a function of the properties of large data sets.
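The idea of applying an optimal classifier to a sample rather than to the full attribute vector can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the Fisher Jenks objective is the within-class sum of squared deviations about class means, uses simple random sampling without replacement, and forces the top break to the full-data maximum so that every observation is classified. The function names `fisher_jenks_breaks` and `sampled_breaks` and the `fraction` and `seed` parameters are hypothetical.

```python
import random


def fisher_jenks_breaks(values, k):
    """Upper bounds of k contiguous classes minimising the within-class
    sum of squared deviations (assumed objective), via an O(k * n^2)
    dynamic program over the sorted values."""
    x = sorted(values)
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums give the squared deviation of any slice x[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        # Cost of putting x[i:j] (j > i) into a single class.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting x[:j] into c classes.
    # cut[c][j]: start index of the last class in that best split.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + ssd(i, j)
                if cand < cost[c][j]:
                    cost[c][j] = cand
                    cut[c][j] = i

    # Walk the stored cut points back from the end of the array.
    breaks = []
    j = n
    for c in range(k, 0, -1):
        breaks.append(x[j - 1])
        j = cut[c][j]
    return breaks[::-1]


def sampled_breaks(values, k, fraction, seed=0):
    """Classify a simple random sample instead of the full data set."""
    rng = random.Random(seed)
    size = max(k, int(len(values) * fraction))
    sample = rng.sample(list(values), size)
    breaks = fisher_jenks_breaks(sample, k)
    # Assumption: widen the top class so it covers the full-data maximum.
    breaks[-1] = max(values)
    return breaks


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(0.0, 1.0) for _ in range(400)]
    print("full enumeration:", fisher_jenks_breaks(data, 5))
    print("10% sample:      ", sampled_breaks(data, 5, 0.10))
```

Because the dynamic program is quadratic in the number of observations, a 10% sample reduces the work by roughly a factor of 100; the price is that the sampled breaks may differ from the full-enumeration breaks, which is the speed and accuracy tradeoff the abstract describes.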

Original language: English (US)
Journal: Transactions in GIS
DOIs: 10.1111/tgis.12236
State: Accepted/In press - 2016

Fingerprint

sampling
autocorrelation
simulation
evaluation
loss
attribute
method
speed

ASJC Scopus subject areas

  • Earth and Planetary Sciences(all)

Cite this

An evaluation of sampling and full enumeration strategies for Fisher Jenks classification in big data settings. / Rey, Sergio J.; Stephens, Philip; Laura, Jason.

In: Transactions in GIS, 2016.

@article{69c5af6b8dfe4ebf9ca5fd02e311937c,
title = "An evaluation of sampling and full enumeration strategies for Fisher Jenks classification in big data settings",
abstract = "Large data contexts present a number of challenges to optimal choropleth map classifiers. Application of optimal classifiers to a sample of the attribute space is one proposed solution. The properties of alternative sampling-based classification methods are examined through a series of Monte Carlo simulations. The impacts of spatial autocorrelation, number of desired classes, and form of sampling are shown to have significant impacts on the accuracy of map classifications. Tradeoffs between improved speed of the sampling approaches and loss of accuracy are also considered. The results suggest the possibility of guiding the choice of classification scheme as a function of the properties of large data sets.",
author = "Rey, Sergio J. and Stephens, Philip and Laura, Jason",
year = "2016",
doi = "10.1111/tgis.12236",
language = "English (US)",
journal = "Transactions in GIS",
issn = "1361-1682",
publisher = "Wiley-Blackwell",
}

TY  - JOUR
T1  - An evaluation of sampling and full enumeration strategies for Fisher Jenks classification in big data settings
AU  - Rey, Sergio J.
AU  - Stephens, Philip
AU  - Laura, Jason
PY  - 2016
Y1  - 2016
N2  - Large data contexts present a number of challenges to optimal choropleth map classifiers. Application of optimal classifiers to a sample of the attribute space is one proposed solution. The properties of alternative sampling-based classification methods are examined through a series of Monte Carlo simulations. The impacts of spatial autocorrelation, number of desired classes, and form of sampling are shown to have significant impacts on the accuracy of map classifications. Tradeoffs between improved speed of the sampling approaches and loss of accuracy are also considered. The results suggest the possibility of guiding the choice of classification scheme as a function of the properties of large data sets.
AB  - Large data contexts present a number of challenges to optimal choropleth map classifiers. Application of optimal classifiers to a sample of the attribute space is one proposed solution. The properties of alternative sampling-based classification methods are examined through a series of Monte Carlo simulations. The impacts of spatial autocorrelation, number of desired classes, and form of sampling are shown to have significant impacts on the accuracy of map classifications. Tradeoffs between improved speed of the sampling approaches and loss of accuracy are also considered. The results suggest the possibility of guiding the choice of classification scheme as a function of the properties of large data sets.
UR  - http://www.scopus.com/inward/record.url?scp=84994885339&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84994885339&partnerID=8YFLogxK
U2  - 10.1111/tgis.12236
DO  - 10.1111/tgis.12236
M3  - Article
JO  - Transactions in GIS
JF  - Transactions in GIS
SN  - 1361-1682
ER  -