Abstract
Public health surveillance is a special case of the general problem of monitoring counts (or rates) of events for changes. Modern data complement event counts with many additional measurements (geographic, demographic, and others) that form high-dimensional covariates. This raises an important challenge: detecting a change that occurs only within an initially unspecified region defined by these covariates. Current methods for handling covariate information are limited to low-dimensional data. The approach presented in this article transforms the problem into one of supervised learning, so that an appropriate learner and signal criteria can then be defined. A feature selection algorithm identifies the covariates that contribute to a model (either individually or through interactions), and this result is used to generate a signal based on formal statistical inference. A measure of statistical significance is also included to control false alarms. Plots are used to isolate change locations in covariate space. Results on a variety of simulated examples are provided.
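The transformation to supervised learning can be illustrated with a minimal sketch. It is not the article's algorithm: the abstract does not specify the learner or the test, so this sketch substitutes a single-split decision stump for the feature selection step and a label-permutation test for the significance measure. The function names, the simulated data, and the changed region (covariate index 2 above 0.7) are all assumptions made for illustration. Baseline events are labeled 0 and recent events 1; if no change occurred, no covariate should separate the two classes better than chance.

```python
import random


def gini(pos, total):
    """Gini impurity of a node holding `pos` recent events out of `total`."""
    if total == 0:
        return 0.0
    p = pos / total
    return 2.0 * p * (1.0 - p)


def best_stump(X, y):
    """Best single-covariate split: returns (gain, covariate index, threshold)."""
    n, total_pos = len(y), sum(y)
    parent = gini(total_pos, n)
    best = (0.0, None, None)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_pos = 0
        for k in range(1, n):
            left_pos += y[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between tied values
            child = (k * gini(left_pos, k)
                     + (n - k) * gini(total_pos - left_pos, n - k)) / n
            if parent - child > best[0]:
                best = (parent - child, j, (lo + hi) / 2.0)
    return best


def permutation_p_value(X, y, n_perm=99, seed=0):
    """Compare the observed best gain with gains under shuffled class labels."""
    rng = random.Random(seed)
    observed = best_stump(X, y)
    labels = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if best_stump(X, labels)[0] >= observed[0]:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical data: five uniform covariates per event.
rng = random.Random(1)
baseline = [[rng.random() for _ in range(5)] for _ in range(200)]
recent = [[rng.random() for _ in range(5)] for _ in range(100)]
for _ in range(100):
    row = [rng.random() for _ in range(5)]
    row[2] = 0.7 + 0.3 * rng.random()  # excess events where covariate 2 > 0.7
    recent.append(row)

X = baseline + recent
y = [0] * len(baseline) + [1] * len(recent)  # 0 = baseline, 1 = recent
(gain, covariate, threshold), p_value = permutation_p_value(X, y)
print(covariate, round(threshold, 2), p_value)
```

A small p-value would trigger a signal, and the selected covariate and threshold point to where in covariate space the change sits. A stump can only find changes aligned with one covariate; the tree ensembles named in the keywords would be needed to capture changes that arise through interactions.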
Original language | English (US) |
---|---|
Pages (from-to) | 770-789 |
Number of pages | 20 |
Journal | IIE Transactions (Institute of Industrial Engineers) |
Volume | 46 |
Issue number | 8 |
State | Published - Aug 3 2014 |
Keywords
- Epidemiology
- data mining
- decision trees
- ensembles
- feature selection
ASJC Scopus subject areas
- Industrial and Manufacturing Engineering