A tale of two "Forests": Random Forest machine learning aids tropical Forest carbon mapping

Joseph Mascaro; Gregory P. Asner; David E. Knapp; Ty Kennedy-Bowdoin; Roberta E. Martin; Christopher Anderson; Mark Higgins; K. Dana Chadwick

doi:10.1371/journal.pone.0085993

Research output: Contribution to journal › Article › peer-review

120 Scopus citations

Abstract

Accurate and spatially explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reducing Emissions from Deforestation and Forest Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, one with and one without spatial contextual modeling; in the former, x and y position were included directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally compiled by the algorithm (called "out-of-bag" validation), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best-performing run of Random Forest and explained 59% of the variation in LiDAR-based carbon estimates within the validation area, compared to 37% for stratification and 43% for Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha^-1 when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation.
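The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not the authors' code: the helper names (`rmse`, `r_squared`, `relative_change`) are hypothetical, and it assumes that "explained variation" means the usual 1 - SS_res / SS_tot computed on held-out samples. Under that reading, the "60% improvement" appears to be measured against stratification (37% to 59%), not against Random Forest without spatial context (43% to 59%); the abstract does not state the baseline explicitly.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Fraction of variation explained: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def relative_change(baseline, new):
    """Change from baseline to new, as a fraction of the baseline."""
    return (new - baseline) / baseline


# Percentages and RMSE values as quoted in the abstract.
gain_vs_stratification = relative_change(37, 59)  # about 0.595, i.e. roughly 60%
gain_vs_plain_rf = relative_change(43, 59)        # about 0.372
rmse_change = relative_change(33, 26)             # about -0.212, a ~21% reduction

print(f"Explained variation vs. stratification: {gain_vs_stratification:+.1%}")
print(f"Explained variation vs. RF without x, y: {gain_vs_plain_rf:+.1%}")
print(f"RMSE change (33 -> 26 Mg C/ha): {rmse_change:+.1%}")
```

In a held-out evaluation such as the one described, `rmse` and `r_squared` would be applied to the LiDAR-based carbon estimates in the validation area and the corresponding model predictions, not to out-of-bag samples.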

Original language	English (US)
Article number	e85993
Journal	PLOS ONE
Volume	9
Issue number	1
DOIs	https://doi.org/10.1371/journal.pone.0085993
State	Published - Jan 28 2014
Externally published	Yes

ASJC Scopus subject areas

General Biochemistry, Genetics and Molecular Biology
General Agricultural and Biological Sciences
General

Access to Document

10.1371/journal.pone.0085993

Cite this

@article{ea2d5d819de94467a6dd91d030eec040,

title = "A tale of two {"}Forests{"}: Random Forest machine learning aids tropical Forest carbon mapping",

abstract = "Accurate and spatially explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reducing Emissions from Deforestation and Forest Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, one with and one without spatial contextual modeling; in the former, x and y position were included directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally compiled by the algorithm (called {"}out-of-bag{"} validation), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best-performing run of Random Forest and explained 59% of the variation in LiDAR-based carbon estimates within the validation area, compared to 37% for stratification and 43% for Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha$^{-1}$ when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation.",

author = "Joseph Mascaro and Asner, {Gregory P.} and Knapp, {David E.} and Ty Kennedy-Bowdoin and Martin, {Roberta E.} and Christopher Anderson and Mark Higgins and Chadwick, {K. Dana}",

year = "2014",

month = jan,

day = "28",

doi = "10.1371/journal.pone.0085993",

language = "English (US)",

volume = "9",

journal = "PLOS ONE",

issn = "1932-6203",

publisher = "Public Library of Science",

number = "1",

}

TY - JOUR

T1 - A tale of two "Forests"

T2 - Random Forest machine learning aids tropical Forest carbon mapping

AU - Mascaro, Joseph

AU - Asner, Gregory P.

AU - Knapp, David E.

AU - Kennedy-Bowdoin, Ty

AU - Martin, Roberta E.

AU - Anderson, Christopher

AU - Higgins, Mark

AU - Chadwick, K. Dana

PY - 2014/1/28

Y1 - 2014/1/28

AB - Accurate and spatially explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reducing Emissions from Deforestation and Forest Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, one with and one without spatial contextual modeling; in the former, x and y position were included directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally compiled by the algorithm (called "out-of-bag" validation), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best-performing run of Random Forest and explained 59% of the variation in LiDAR-based carbon estimates within the validation area, compared to 37% for stratification and 43% for Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha^-1 when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation.

UR - http://www.scopus.com/inward/record.url?scp=84900311439&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84900311439&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0085993

DO - 10.1371/journal.pone.0085993

M3 - Article

C2 - 24489686

AN - SCOPUS:84900311439

SN - 1932-6203

VL - 9

JO - PLOS ONE

JF - PLOS ONE

IS - 1

M1 - e85993

ER -
