Abstract

A common goal of most public health surveillance programs is to detect disease outbreaks before they become a threat to the public. In this work, we propose a novel and computationally feasible approach to this problem. By tackling public health surveillance with a supervised learner that can handle high-dimensional, mixed-type data, and even missing values, we developed a method that can accurately detect changes in disease incidence rates, even in high dimensions. We use probability estimates from random forests to develop an alternative signal criterion that can detect when there is a concentration of disease incidences within a particular geographic region and/or subpopulation that is unlikely to have occurred by chance. A series of simulated experiments suggests this method is able to accurately detect the presence of disease clusters, on average, 88% of the time. Simulated results also suggest a feasible combination of the method's parameters that can significantly reduce the computational complexity of the method to an average system time of 1.9 minutes (s = 0.48 minutes) for a data set containing 1,000 incidences running on an Intel Core i5 processor.
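The idea of turning ensemble probability estimates into a signal criterion can be illustrated with a small sketch. This is not the authors' implementation: it assumes the current cases are contrasted against a baseline reference sample, and it swaps in a tiny standard-library bagged ensemble of randomized trees for a full random forest. All function names, the synthetic data, and the 0.7 threshold are hypothetical choices made for illustration only.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def build_tree(rows, depth, rng, min_leaf=5):
    """Grow one randomized tree; rows is a list of (features, label)."""
    labels = [y for _, y in rows]
    if depth == 0 or len(rows) < 2 * min_leaf or len(set(labels)) == 1:
        return {"p": sum(labels) / len(labels)}
    best = None
    for _ in range(10):  # evaluate a few random candidate splits
        j = rng.randrange(len(rows[0][0]))
        t = rng.choice(rows)[0][j]
        left = [r for r in rows if r[0][j] <= t]
        right = [r for r in rows if r[0][j] > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        score = (len(left) * gini([y for _, y in left])
                 + len(right) * gini([y for _, y in right])) / len(rows)
        if best is None or score < best[0]:
            best = (score, j, t, left, right)
    if best is None:
        return {"p": sum(labels) / len(labels)}
    _, j, t, left, right = best
    return {"j": j, "t": t,
            "lo": build_tree(left, depth - 1, rng, min_leaf),
            "hi": build_tree(right, depth - 1, rng, min_leaf)}

def tree_prob(tree, x):
    """Fraction of class-1 training rows in the leaf that x falls into."""
    while "p" not in tree:
        tree = tree["lo"] if x[tree["j"]] <= tree["t"] else tree["hi"]
    return tree["p"]

def fit_ensemble(rows, n_trees, depth, rng):
    """Bagging: each tree is grown on a bootstrap resample of the rows."""
    return [build_tree([rng.choice(rows) for _ in rows], depth, rng)
            for _ in range(n_trees)]

def ensemble_prob(trees, x):
    """Average the per-tree estimates of P(current case | x)."""
    return sum(tree_prob(t, x) for t in trees) / len(trees)

rng = random.Random(0)
# Assumed synthetic data: label 0 = baseline reference, label 1 = current cases.
baseline = [((rng.random(), rng.random()), 0) for _ in range(500)]
background = [((rng.random(), rng.random()), 1) for _ in range(400)]
cluster = [((rng.uniform(0.7, 0.9), rng.uniform(0.7, 0.9)), 1) for _ in range(100)]

trees = fit_ensemble(baseline + background + cluster, n_trees=50, depth=4, rng=rng)

p_in = ensemble_prob(trees, (0.8, 0.8))   # inside the injected cluster
p_out = ensemble_prob(trees, (0.2, 0.2))  # far from the injected cluster

THRESHOLD = 0.7  # hypothetical signal threshold, not taken from the paper
flagged = [x for x, y in background + cluster
           if ensemble_prob(trees, x) > THRESHOLD]
print(round(p_in, 3), round(p_out, 3), len(flagged))
```

Regions where current cases are over-represented relative to the baseline receive higher probability estimates, so thresholding those estimates yields candidate cluster members. How the threshold should be calibrated against chance variation is left open in this sketch.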

Original language: English (US)
Title of host publication: IIE Annual Conference and Expo 2013
Publisher: Institute of Industrial Engineers
Pages: 2551-2560
Number of pages: 10
State: Published - 2013
Event: IIE Annual Conference and Expo 2013 - San Juan, Puerto Rico
Duration: May 18 2013 to May 22 2013

Other

Other: IIE Annual Conference and Expo 2013
Country: Puerto Rico
City: San Juan
Period: 5/18/13 to 5/22/13

Fingerprint

Public health
Computational complexity
Experiments

ASJC Scopus subject areas

  • Industrial and Manufacturing Engineering

Cite this

Dávila, S., Runger, G., Tuv, E., & Pacheco, P. (2013). High-dimensional disease outbreak detection using tree-based ensembles. In IIE Annual Conference and Expo 2013 (pp. 2551-2560). Institute of Industrial Engineers.

High-dimensional disease outbreak detection using tree-based ensembles. / Dávila, Saylisse; Runger, George; Tuv, Eugene; Pacheco, Paola.

IIE Annual Conference and Expo 2013. Institute of Industrial Engineers, 2013. p. 2551-2560.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Dávila, S, Runger, G, Tuv, E & Pacheco, P 2013, High-dimensional disease outbreak detection using tree-based ensembles. in IIE Annual Conference and Expo 2013. Institute of Industrial Engineers, pp. 2551-2560, IIE Annual Conference and Expo 2013, San Juan, Puerto Rico, 5/18/13.
Dávila S, Runger G, Tuv E, Pacheco P. High-dimensional disease outbreak detection using tree-based ensembles. In IIE Annual Conference and Expo 2013. Institute of Industrial Engineers. 2013. p. 2551-2560
Dávila, Saylisse ; Runger, George ; Tuv, Eugene ; Pacheco, Paola. / High-dimensional disease outbreak detection using tree-based ensembles. IIE Annual Conference and Expo 2013. Institute of Industrial Engineers, 2013. pp. 2551-2560
@inproceedings{7d8cf6730d7b491c8ef0da04039b5207,
title = "High-dimensional disease outbreak detection using tree-based ensembles",
abstract = "A common goal of most public health surveillance programs is to detect disease outbreaks before they become a threat to the public. In this work, we propose a novel and computationally feasible approach to this problem. By tackling public health surveillance with a supervised learner that can handle high-dimensional, mixed-type data, and even missing values, we developed a method that can accurately detect changes in disease incidence rates, even in high dimensions. We use probability estimates from random forests to develop an alternative signal criterion that can detect when there is a concentration of disease incidences within a particular geographic region and/or subpopulation that is unlikely to have occurred by chance. A series of simulated experiments suggests this method is able to accurately detect the presence of disease clusters, on average, 88{\%} of the time. Simulated results also suggest a feasible combination of the method's parameters that can significantly reduce the computational complexity of the method to an average system time of 1.9 minutes (s = 0.48 minutes) for a data set containing 1,000 incidences running on an Intel Core i5 processor.",
author = "Saylisse D{\'a}vila and George Runger and Eugene Tuv and Paola Pacheco",
year = "2013",
language = "English (US)",
pages = "2551--2560",
booktitle = "IIE Annual Conference and Expo 2013",
publisher = "Institute of Industrial Engineers",

}

TY - GEN

T1 - High-dimensional disease outbreak detection using tree-based ensembles

AU - Dávila, Saylisse

AU - Runger, George

AU - Tuv, Eugene

AU - Pacheco, Paola

PY - 2013

Y1 - 2013

N2 - A common goal of most public health surveillance programs is to detect disease outbreaks before they become a threat to the public. In this work, we propose a novel and computationally feasible approach to this problem. By tackling public health surveillance with a supervised learner that can handle high-dimensional, mixed-type data, and even missing values, we developed a method that can accurately detect changes in disease incidence rates, even in high dimensions. We use probability estimates from random forests to develop an alternative signal criterion that can detect when there is a concentration of disease incidences within a particular geographic region and/or subpopulation that is unlikely to have occurred by chance. A series of simulated experiments suggests this method is able to accurately detect the presence of disease clusters, on average, 88% of the time. Simulated results also suggest a feasible combination of the method's parameters that can significantly reduce the computational complexity of the method to an average system time of 1.9 minutes (s = 0.48 minutes) for a data set containing 1,000 incidences running on an Intel Core i5 processor.

UR - http://www.scopus.com/inward/record.url?scp=84900334917&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84900334917&partnerID=8YFLogxK

M3 - Conference contribution

SP - 2551

EP - 2560

BT - IIE Annual Conference and Expo 2013

PB - Institute of Industrial Engineers

ER -