Important Features for Complex Systems with Transient Effects

Project: Research project

Project Details

Description

Complex systems generate rich, large data sets with dozens to hundreds (or even thousands) of variables, and decision-makers are often challenged to learn from such information. These complex systems include dynamic, coupled natural, engineered, and human systems. The complexity that results from high-dimensional data creates a bottleneck, both conceptually for understanding the system and technically because the important variables are difficult to distinguish in prediction models. Furthermore, actionable responses require a focus on the important variables. Consequently, feature (variable) selection is becoming critical.

The objective here is to confront the challenging problem of high-dimensional, redundant, dirty data with mixed categorical and numerical predictors (and responses), nonlinear models, interactions, effects of different magnitudes and scales, transient effects, and so forth. The plan is also to leverage widely disseminated computational capabilities for a modern, comprehensive approach. This focus on variables is a powerful starting point for transforming data into knowledge. The ubiquity of sensors, nonlinearities, dirty data, and so forth demands that a flexible, autonomous, initial learning method be available. Furthermore, an actionable analysis depends on an organized summary of the most important inputs.

Preliminary work has demonstrated the feasibility of the methods. Here we propose to organize, refine, and extend these preliminary results. Plans are to improve the feature selection method under development through a study of important details, to extend the method to fuse data with transient effects that may be temporal, spatial, or both, and to obtain more detailed information as well as organize the results. We use a hybrid ensemble strategy for feature selection that combines serial and parallel ensembles of decision trees.
We estimate variable importance using a parallel ensemble of trees, with the split weights re-estimated on hold-out samples to obtain a more accurate and unbiased estimate of variable importance in each tree. We compare variable importance against artificially constructed noise variables using a formal statistical test, and we iteratively remove the effect of identified important variables to allow detection of less important variables. We propose to improve these steps and to add data fusion and customized learning for transient effects. The now widely disseminated computational resources are to be exploited. The goal is to focus on important variables for interpretation and actionable analyses in interdisciplinary collaborations.
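The estimation and testing steps described above can be illustrated with a minimal, standard-library Python sketch. It is not the project's implementation and rests on several assumptions: a numeric response with numeric predictors only; depth-limited regression trees grown on bootstrap samples, with the out-of-bag rows serving as the hold-out sample on which each split's weight is re-estimated; one permuted copy of every input as the artificial noise (contrast) variables; a one-sided paired t-test of each variable's importance against the strongest contrast in each replicate as the formal test; and a bagged (parallel) ensemble, rather than the serial ensemble the project describes, for removing the effect of already-selected variables. All names and constants (`grow`, `forest`, `contrast_test`, `select_features`, `REPS`, `T_CRIT`, tree depths and counts) are hypothetical.

```python
import math
import random
import statistics

REPS = 10       # contrast replicates per round (assumed value)
T_CRIT = 1.833  # one-sided 5% critical value of Student's t with REPS - 1 = 9 df


def sse(vals):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    m = sum(vals) / len(vals) if vals else 0.0
    return sum((v - m) ** 2 for v in vals)


def grow(X, y, inb, oob, feats, depth, imp, rng, mtry):
    """Grow a regression tree on in-bag rows; return a leaf mean or (f, t, left, right)."""
    ys = [y[i] for i in inb]
    n, tot, tot2 = len(ys), sum(ys), sum(v * v for v in ys)
    if depth == 0 or n < 10:
        return tot / n  # leaf: mean response
    best = None  # (in-bag SSE reduction, feature, threshold)
    for f in rng.sample(feats, min(mtry, len(feats))):
        order = sorted(inb, key=lambda i: X[i][f])
        s = s2 = 0.0
        for k in range(n - 1):
            s += y[order[k]]
            s2 += y[order[k]] ** 2
            a, b = X[order[k]][f], X[order[k + 1]][f]
            if a == b:
                continue  # no threshold separates tied values
            gain = ((tot2 - tot * tot / n) - (s2 - s * s / (k + 1))
                    - ((tot2 - s2) - (tot - s) ** 2 / (n - k - 1)))
            if best is None or gain > best[0]:
                best = (gain, f, (a + b) / 2)
    if best is None:
        return tot / n
    _, f, t = best
    oob_l, oob_r = [i for i in oob if X[i][f] <= t], [i for i in oob if X[i][f] > t]
    inb_l, inb_r = [i for i in inb if X[i][f] <= t], [i for i in inb if X[i][f] > t]
    # Hold-out re-estimation: the split was chosen on in-bag rows, but the
    # importance credited to f is its SSE reduction on the out-of-bag rows.
    imp[f] += sse([y[i] for i in oob]) - sse([y[i] for i in oob_l]) - sse([y[i] for i in oob_r])
    return (f, t, grow(X, y, inb_l, oob_l, feats, depth - 1, imp, rng, mtry),
            grow(X, y, inb_r, oob_r, feats, depth - 1, imp, rng, mtry))


def predict(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def forest(X, y, feats, n_trees, rng, depth=3, mtry=None):
    """Parallel (bagged) ensemble; returns trees and summed hold-out importances."""
    n, imp, trees = len(y), [0.0] * len(X[0]), []
    mtry = mtry or max(1, len(feats) // 3)
    for _ in range(n_trees):
        inb = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(inb))  # rows left out of the bootstrap sample
        trees.append(grow(X, y, inb, oob, feats, depth, imp, rng, mtry))
    return trees, imp


def t_stat(ds):
    mean, sd = statistics.mean(ds), statistics.stdev(ds)
    return mean / (sd / math.sqrt(len(ds))) if sd > 0 else (math.inf if mean > 0 else -math.inf)


def contrast_test(X, y, candidates, rng, n_trees=20):
    """One-sided paired t-test of each candidate's importance minus the best contrast's."""
    d = len(X[0])
    diffs = {j: [] for j in candidates}
    for _ in range(REPS):
        # Contrasts: a permuted copy of every column keeps its marginal
        # distribution but breaks any relationship with y.
        cols = [rng.sample([row[j] for row in X], len(X)) for j in range(d)]
        Xa = [row + [c[i] for c in cols] for i, row in enumerate(X)]
        _, imp = forest(Xa, y, list(range(2 * d)), n_trees, rng)
        for j in candidates:
            diffs[j].append(imp[j] - max(imp[d:]))
    return [j for j, ds in diffs.items() if t_stat(ds) > T_CRIT]


def select_features(X, y, seed=0, max_rounds=3):
    """Alternate contrast tests with removal of the selected variables' effect."""
    rng, d = random.Random(seed), len(X[0])
    selected, resid = [], list(y)
    for _ in range(max_rounds):
        candidates = [j for j in range(d) if j not in selected]
        new = contrast_test(X, resid, candidates, rng) if candidates else []
        if not new:
            break
        selected += new
        # Remove selected effects (assumption: bagged, not serial, ensemble).
        trees, _ = forest(X, y, selected, 30, rng, depth=4, mtry=len(selected))
        resid = [yi - sum(predict(tr, row) for tr in trees) / len(trees) for row, yi in zip(X, y)]
    return sorted(selected)
```

For example, with five uniform predictors and a response that depends only on `x0` (linearly) and `x1` (through a step at 0.5), `select_features(X, y)` is expected to flag variables 0 and 1 and leave the three pure-noise columns out. Comparing against the strongest contrast in each replicate, rather than a fixed cutoff, lets the noise level adapt to the data and the ensemble settings.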
Status: Finished
Effective start/end date: 3/16/09 to 3/15/12

Funding

  • DOD-NAVY: Office of Naval Research (ONR): $452,366.00
