Tree species abundance predictions in a tropical agricultural landscape with a supervised classification model and imbalanced data

Sarah J. Graves, Gregory P. Asner, Roberta E. Martin, Christopher B. Anderson, Matthew S. Colgan, Leila Kalantari, Stephanie A. Bohlman

Research output: Contribution to journal > Article

31 Citations (Scopus)

Abstract

Mapping species through classification of imaging spectroscopy data is facilitating research to understand tree species distributions at increasingly greater spatial scales. Classification requires a dataset of field observations matched to the image, which will often reflect natural species distributions, resulting in an imbalanced dataset with many samples for common species and few samples for less common species. Despite the high prevalence of imbalanced datasets in multiclass species predictions, the effect on species prediction accuracy and landscape species abundance has not yet been quantified. First, we trained and assessed the accuracy of a support vector machine (SVM) model with a highly imbalanced dataset of 20 tropical species and one mixed-species class of 24 species identified in a hyperspectral image mosaic (350-2500 nm) of Panamanian farmland and secondary forest fragments. The model, with an overall accuracy of 62% ± 2.3% and F-score of 59% ± 2.7%, was applied to the full image mosaic (23,000 ha at a 2-m resolution) to produce a species prediction map, which suggested that this tropical agricultural landscape is more diverse than what has been presented in field-based studies. Second, we quantified the effect of class imbalance on model accuracy. Model assessment showed a trend where species with more samples were consistently over-predicted while species with fewer samples were under-predicted. Standardizing sample size reduced model accuracy, but also reduced the level of species over- and under-prediction. This study advances operational species mapping of diverse tropical landscapes by detailing the effect of imbalanced data on classification accuracy and providing estimates of tree species abundance in an agricultural landscape. Species maps using data and methods presented here can be used in landscape analyses of species distributions to understand human or environmental effects, in addition to focusing conservation efforts in areas with high tree cover and diversity.
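
The two ideas the abstract turns on, standardizing sample size across classes and measuring how strongly each class is over- or under-predicted, can be illustrated with a minimal sketch. This is not the authors' code: the function names `standardize_sample_size` and `prediction_ratio`, the random-downsampling strategy, and the ratio of predicted to reference counts as a bias measure are all assumptions made for illustration, and the toy labels are invented.

```python
import random
from collections import Counter, defaultdict


def standardize_sample_size(samples, labels, n_per_class=None, seed=0):
    """Randomly downsample every class to the same number of samples.

    Hypothetical helper: if n_per_class is None, the size of the
    rarest class is used as the common sample size.
    """
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    if n_per_class is None:
        n_per_class = min(len(v) for v in by_class.values())
    rng = random.Random(seed)
    out_x, out_y = [], []
    for y in sorted(by_class):
        # Never request more samples than the class actually has.
        chosen = rng.sample(by_class[y], min(n_per_class, len(by_class[y])))
        out_x.extend(chosen)
        out_y.extend([y] * len(chosen))
    return out_x, out_y


def prediction_ratio(y_true, y_pred):
    """Predicted count divided by reference count for each class.

    Values above 1 indicate over-prediction, values below 1
    indicate under-prediction (an assumed, simple bias measure).
    """
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    return {c: pred_counts.get(c, 0) / n for c, n in true_counts.items()}


# Toy example: a common class "A" (8 samples) and a rare class "B" (2 samples).
labels = ["A"] * 8 + ["B"] * 2
samples = list(range(10))

balanced_x, balanced_y = standardize_sample_size(samples, labels)
print(Counter(balanced_y))  # two samples of each class remain

# A classifier that favors the common class predicts "A" nine times.
y_pred = ["A"] * 9 + ["B"]
print(prediction_ratio(labels, y_pred))  # {'A': 1.125, 'B': 0.5}
```

In this toy case the common class has a ratio above 1 and the rare class a ratio below 1, which is the pattern the abstract reports for the imbalanced model. Downsampling to the rarest class discards data, which is consistent with the reported drop in accuracy when sample size was standardized.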

Original language: English (US)
Article number: 161
Journal: Remote Sensing
Volume: 8
Issue number: 2
DOI: 10.3390/rs8020161
State: Published - Jan 1 2016
Externally published: Yes

Keywords

  • Agriculture
  • Class imbalance
  • Imaging spectroscopy
  • Operational species mapping
  • Support vector machine
  • Tropics

ASJC Scopus subject areas

  • Earth and Planetary Sciences(all)

Cite this

Tree species abundance predictions in a tropical agricultural landscape with a supervised classification model and imbalanced data. / Graves, Sarah J.; Asner, Gregory P.; Martin, Roberta E.; Anderson, Christopher B.; Colgan, Matthew S.; Kalantari, Leila; Bohlman, Stephanie A.

In: Remote Sensing, Vol. 8, No. 2, 161, 01.01.2016.

@article{fb1d2f906cbc4a0f84e6a8b3e1d97388,
title = "Tree species abundance predictions in a tropical agricultural landscape with a supervised classification model and imbalanced data",
keywords = "Agriculture, Class imbalance, Imaging spectroscopy, Operational species mapping, Support vector machine, Tropics",
author = "Graves, {Sarah J.} and Asner, {Gregory P.} and Martin, {Roberta E.} and Anderson, {Christopher B.} and Colgan, {Matthew S.} and Leila Kalantari and Bohlman, {Stephanie A.}",
year = "2016",
month = "1",
day = "1",
doi = "10.3390/rs8020161",
language = "English (US)",
volume = "8",
journal = "Remote Sensing",
issn = "2072-4292",
publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
number = "2",

}

TY - JOUR

T1 - Tree species abundance predictions in a tropical agricultural landscape with a supervised classification model and imbalanced data

AU - Graves, Sarah J.

AU - Asner, Gregory P.

AU - Martin, Roberta E.

AU - Anderson, Christopher B.

AU - Colgan, Matthew S.

AU - Kalantari, Leila

AU - Bohlman, Stephanie A.

PY - 2016/1/1

Y1 - 2016/1/1

KW - Agriculture

KW - Class imbalance

KW - Imaging spectroscopy

KW - Operational species mapping

KW - Support vector machine

KW - Tropics

UR - http://www.scopus.com/inward/record.url?scp=84962601050&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84962601050&partnerID=8YFLogxK

U2 - 10.3390/rs8020161

DO - 10.3390/rs8020161

M3 - Article

AN - SCOPUS:84962601050

VL - 8

JO - Remote Sensing

JF - Remote Sensing

SN - 2072-4292

IS - 2

M1 - 161

ER -