TY - JOUR
T1 - Improved prediction of tree species richness and interpretability of environmental drivers using a machine learning approach
AU - Brugere, Lian
AU - Kwon, Youngsang
AU - Frazier, Amy E.
AU - Kedron, Peter
N1 - Funding Information:
L. Brugere was supported by the Department of Earth Sciences at the University of Memphis (UofM). The authors appreciate the generous use of the High-Performance Computing Center and the computer lab at the Center for Applied Earth Science and Engineering Research of the UofM. A. Frazier is supported by United States National Science Foundation grant #2225076.
Publisher Copyright:
© 2023 Elsevier B.V.
PY - 2023/7/1
Y1 - 2023/7/1
AB - Biodiversity is in decline globally, and predicting species diversity is critically important if current trends are to be reversed. Tree species richness (TSR) has long been a key measure of biodiversity, but considerable uncertainties exist in current models, particularly given the classic statistical assumptions and poor ecological interpretability of machine learning outcomes. Here, we test several ecologically interpretable machine learning approaches to predict TSR and interpret the driving environmental factors in the continental United States. We develop two artificial neural networks (ANN) and one random forest (RF) model to predict TSR using Forest Inventory and Analysis data and 20 environmental covariates and compare them to a classic generalized linear model (GLM). Models were evaluated on an independent, unseen testing dataset using R², Mean Absolute Error (MAE), and residual spatial autocorrelation analysis. An Interpretable Machine Learning approach, SHapley Additive exPlanations (SHAP), was adopted to explain the major environmental factors driving TSR. Compared to a baseline GLM (R² = 0.7; MAE = 4.7), the ANN and RF models achieved R² greater than 0.9 and MAE < 3.1. Additionally, the ANN and RF models produced less spatially clustered TSR residuals than the GLM. SHAP analysis suggested that TSR is best predicted by Aridity Index, Forest Area, Altitude, Mean Precipitation of the Driest Quarter, and Mean Annual Temperature. SHAP further revealed non-linear relationships between environmental covariates and TSR, as well as complex interactions that were not revealed by the GLM. The study highlights the need to conserve forest areas and to reduce precipitation-related physiological stress on tree species in arid regions with low forest cover. The machine learning approach used here is transferable to studies of biodiversity for other organisms or to prediction of TSR under future climatic scenarios.
KW - Deep learning
KW - FIA
KW - Generalized linear model
KW - Neural networks
KW - Random forest
KW - Tree species richness modeling
UR - http://www.scopus.com/inward/record.url?scp=85153101798&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85153101798&partnerID=8YFLogxK
U2 - 10.1016/j.foreco.2023.120972
DO - 10.1016/j.foreco.2023.120972
M3 - Article
AN - SCOPUS:85153101798
SN - 0378-1127
VL - 539
JO - Forest Ecology and Management
JF - Forest Ecology and Management
M1 - 120972
ER -