TY - JOUR
T1 - Regional ensemble modeling reduces uncertainty for digital soil mapping
AU - Brungard, Colby
AU - Nauman, Travis
AU - Duniway, Mike
AU - Veblen, Kari
AU - Nehring, Kyle
AU - White, David
AU - Salley, Shawn
AU - Anchang, Julius
N1 - Funding Information:
Contact the corresponding author, Dr. Colby Brungard, who is responsible for coordinating the release of any data associated with this product, if you would like access to the resulting spatial predictions. This work was supported by the Utah Division of Wildlife Resources grant # 160832, the Bureau of Land Management, US Geological Survey Ecosystems Mission Area, and the USDA National Institute of Food and Agriculture. USDA: Mention of a trade name, proprietary product, or vendor is for information only and does not guarantee or warrant the product by the US Government and does not imply its approval to the exclusion of other products or vendors that may also be suitable. The USDA is an equal opportunity provider and employer. USGS: Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Publisher Copyright:
© 2021
PY - 2021/9/1
Y1 - 2021/9/1
N2 - Recent country and continental-scale digital soil mapping efforts have used a single model to predict soil properties across large regions. However, different ecophysiographic regions within large-extent areas are likely to have different soil-landscape relationships so models built specifically for these regions may more accurately capture these relationships relative to a ‘global’ model. We ask the question: Is a single ‘global’ model sufficient or are regionally-specific models useful for accurate digital soil mapping? We test this question by modeling soil depth classes across the 432,000 km² upper Colorado River Basin in the Western USA using a single global model, multiple ecophysiographic models, and ensembles of the ecophysiographic models. Effective soil depth class observations (n = 12,194) were derived from multiple soil databases. Fifty-seven environmental covariates were derived from a 30 m digital elevation model, climate data, satellite imagery, and aeroradiometric data. Three independent land classifications were used to stratify the area. Two expert-derived land classifications, USDA Major Land Resource Areas (MLRA) and US-EPA Level III ecoregions, divided the study area into multiple ecophysiographic regions based on vegetation and broad-scale physiographic differences. The third land classification divided the study area into broad landforms. Soil depth observations were split into separate training (n = 10,470) and validation (n = 1,724) datasets. First, a ‘global’ random forest model was used to model soil depth classes using all training observations and covariates. ‘Global’ denotes a model built with all training data across the extent of the area, not a model at world extent. Second, the land classifications were used to subset the observations into ecophysiographic sub-datasets and random forest models were refit for each region. Models fit by ecophysiographic region are referred to as regional models. Third, predictions from each regional model were fused into regional-ensemble models. Accuracy, Brier scores, and Shannon's entropy were used to compare model accuracy and uncertainty. Regional ecophysiographic models were also compared to models built for geographic areas that were defined solely to be approximately equal in area. Training dataset density and the imbalance ratio were investigated to determine if data characteristics influenced regional accuracy/uncertainty metrics. Accuracy for the global model using the validation set was 62.8%. Regional model accuracies ranged between 56.1% and 75.0%. We found: 1) useful inter-regional differences in global model accuracy were revealed when the global model was validated by region, 2) no consistent relationship between training observation density and accuracy/uncertainty metrics, 3) no meaningful differences in accuracy and uncertainty metrics between physiographic and geographic regions, 4) ensembles of regionally-specific models were approximately as accurate as global models, and 5) both region-specific models and ensembles of regional models were less uncertain than the global model. Overall, we recommend the use of soil depth class predictions made from MLRA regional ensemble models because this prediction had higher accuracy than the ecoregion ensemble model prediction, but lower uncertainty than both the global model and the landform ensemble model predictions. We answer our question: Ensembles of regionally-specific models are approximately as accurate as global models, but result in less uncertainty.
KW - Ecoregions
KW - Landforms
KW - Machine learning
KW - Major land resource areas
KW - Regionalization
KW - Soil survey
UR - http://www.scopus.com/inward/record.url?scp=85103429618&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103429618&partnerID=8YFLogxK
U2 - 10.1016/j.geoderma.2021.114998
DO - 10.1016/j.geoderma.2021.114998
M3 - Article
AN - SCOPUS:85103429618
VL - 397
JO - Geoderma
JF - Geoderma
SN - 0016-7061
M1 - 114998
ER -