Public health surveillance with ensemble-based supervised learning

Saylisse Dávila; George Runger; Eugene Tuv

doi:10.1080/0740817X.2014.894806

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Public health surveillance is a special case of the general problem that monitors counts (or rates) of events for changes. Modern data complements event counts with many additional measurements (such as geographic, demographic, and others) that comprise high-dimensional covariates. This leads to an important challenge to detect a change that only occurs within a region, initially unspecified, defined by these covariates. Current methods used to handle covariate information are limited to low-dimensional data. The approach presented in this article transforms the problem to supervised learning, so that an appropriate learner and signal criteria can then be defined. A feature selection algorithm is used to identify covariates that contribute to a model (either individually or through interactions) and this is used to generate a signal based on formal statistical inference. A measure of statistical significance is also included to control false alarms. Graphical plots are used to isolate change locations in covariate space. Results on a variety of simulated examples are provided.
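The abstract does not name the learner, the feature selection algorithm, or the inference procedure, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes bagged one-split decision stumps as the ensemble, out-of-bag accuracy as the test statistic, and a label-permutation test as the significance measure. All function names, parameters, and the simulated data are hypothetical. Baseline records are labeled 0 and monitoring-period records are labeled 1; if the ensemble separates the two classes better than chance, that is treated as a signal, and the covariates chosen by the stumps hint at where the change lies.

```python
import random

def fit_stump(X, y, n_thresholds=7):
    """Return (errors, feature, threshold, label_above) for the best one-split rule."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(row[j] for row in X)
        for k in range(1, n_thresholds + 1):
            t = values[k * n // (n_thresholds + 1)]  # candidate split at an empirical quantile
            # Errors when predicting class 1 above the threshold and class 0 at or below it.
            err = sum((row[j] > t) != (label == 1) for row, label in zip(X, y))
            for label_above, e in ((1, err), (0, n - err)):
                if best is None or e < best[0]:
                    best = (e, j, t, label_above)
    return best

def oob_accuracy(X, y, rng, n_stumps=11):
    """Bag stumps on bootstrap samples and score each on its out-of-bag rows."""
    n = len(y)
    correct = total = 0
    picks = [0] * len(X[0])  # how often each covariate is chosen as the split
    for _ in range(n_stumps):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        _, j, t, label_above = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        picks[j] += 1
        for i in range(n):
            if i not in in_bag:
                pred = label_above if X[i][j] > t else 1 - label_above
                correct += pred == y[i]
                total += 1
    return correct / total, picks

def change_signal(baseline, current, n_perm=39, seed=0):
    """Label baseline as 0 and current as 1, then test separability by permuting labels."""
    rng = random.Random(seed)
    X = baseline + current
    y = [0] * len(baseline) + [1] * len(current)
    observed, picks = oob_accuracy(X, y, rng)
    exceed = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        exceed += oob_accuracy(X, shuffled, rng)[0] >= observed
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, p_value, picks

# Simulated example (hypothetical): three covariates on [0, 1]; in the monitoring
# period, events are over-represented in the region where covariate 0 exceeds 0.7.
rng = random.Random(1)
baseline = [[rng.random() for _ in range(3)] for _ in range(150)]
current = []
for _ in range(150):
    row = [rng.random() for _ in range(3)]
    if rng.random() < 0.7:
        row[0] = 0.7 + 0.3 * rng.random()
    current.append(row)

observed, p_value, picks = change_signal(baseline, current)
print("OOB accuracy:", observed, "p-value:", p_value, "split counts:", picks)
```

A small p-value indicates that the monitoring-period records differ from baseline somewhere in covariate space, and the split counts point to the covariates involved. A fuller treatment would use deeper trees to capture interactions among covariates, which single-split stumps cannot represent.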

Original language	English (US)
Pages (from-to)	770-789
Number of pages	20
Journal	IIE Transactions (Institute of Industrial Engineers)
Volume	46
Issue number	8
DOIs	https://doi.org/10.1080/0740817X.2014.894806
State	Published - Aug 3 2014

Keywords

Epidemiology
data mining
decision trees
ensembles
feature selection

ASJC Scopus subject areas

Industrial and Manufacturing Engineering

Cite this

@article{0395e47218f247e68001829ce165aee7,
title = "Public health surveillance with ensemble-based supervised learning",
abstract = "Public health surveillance is a special case of the general problem that monitors counts (or rates) of events for changes. Modern data complements event counts with many additional measurements (such as geographic, demographic, and others) that comprise high-dimensional covariates. This leads to an important challenge to detect a change that only occurs within a region, initially unspecified, defined by these covariates. Current methods used to handle covariate information are limited to low-dimensional data. The approach presented in this article transforms the problem to supervised learning, so that an appropriate learner and signal criteria can then be defined. A feature selection algorithm is used to identify covariates that contribute to a model (either individually or through interactions) and this is used to generate a signal based on formal statistical inference. A measure of statistical significance is also included to control false alarms. Graphical plots are used to isolate change locations in covariate space. Results on a variety of simulated examples are provided.",
keywords = "Epidemiology, data mining, decision trees, ensembles, feature selection",
author = "Saylisse D{\'a}vila and George Runger and Eugene Tuv",
note = "Funding Information: This material is based upon work supported by the National Science Foundation under Grant 0743160 and the Office of Naval Research under grant N000140910656.",
year = "2014",
month = aug,
day = "3",
doi = "10.1080/0740817X.2014.894806",
language = "English (US)",
volume = "46",
pages = "770--789",
journal = "IIE Transactions (Institute of Industrial Engineers)",
issn = "0740-817X",
publisher = "Taylor and Francis Ltd.",
number = "8",
}

TY - JOUR
T1 - Public health surveillance with ensemble-based supervised learning
AU - Dávila, Saylisse
AU - Runger, George
AU - Tuv, Eugene
N1 - Funding Information: This material is based upon work supported by the National Science Foundation under Grant 0743160 and the Office of Naval Research under grant N000140910656.
PY - 2014/8/3
Y1 - 2014/8/3
AB - Public health surveillance is a special case of the general problem that monitors counts (or rates) of events for changes. Modern data complements event counts with many additional measurements (such as geographic, demographic, and others) that comprise high-dimensional covariates. This leads to an important challenge to detect a change that only occurs within a region, initially unspecified, defined by these covariates. Current methods used to handle covariate information are limited to low-dimensional data. The approach presented in this article transforms the problem to supervised learning, so that an appropriate learner and signal criteria can then be defined. A feature selection algorithm is used to identify covariates that contribute to a model (either individually or through interactions) and this is used to generate a signal based on formal statistical inference. A measure of statistical significance is also included to control false alarms. Graphical plots are used to isolate change locations in covariate space. Results on a variety of simulated examples are provided.
KW - Epidemiology
KW - data mining
KW - decision trees
KW - ensembles
KW - feature selection
UR - http://www.scopus.com/inward/record.url?scp=84899848105&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84899848105&partnerID=8YFLogxK
U2 - 10.1080/0740817X.2014.894806
DO - 10.1080/0740817X.2014.894806
M3 - Article
AN - SCOPUS:84899848105
SN - 0740-817X
VL - 46
SP - 770
EP - 789
JO - IIE Transactions (Institute of Industrial Engineers)
JF - IIE Transactions (Institute of Industrial Engineers)
IS - 8
ER -
