Abstract

A common goal of most public health surveillance programs is to detect disease outbreaks before they become a threat to the public. In this work, we propose a novel and computationally feasible approach to this problem. By tackling public health surveillance with a supervised learner that can handle high-dimensional, mixed-type data and even missing values, we developed a method that can accurately detect changes in disease incidence rates, even in high dimensions. We use probability estimates from random forests to develop an alternative signal criterion that detects when a concentration of disease incidences within a particular geographic region and/or subpopulation is unlikely to have occurred by chance. A series of simulated experiments suggests that the method accurately detects the presence of disease clusters 88% of the time, on average. The simulation results also suggest a feasible combination of the method's parameters that significantly reduces its computational cost, to an average system time of 1.9 minutes (s = 0.48 minutes) for a data set containing 1,000 incidences, running on an Intel Core i5 processor.
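The abstract does not spell out the signal criterion, so the following is only an illustrative sketch of the general idea: given one probability score per case, flag a region when its average score is higher than chance would plausibly produce. It swaps in a plain permutation test and is not the authors' method. The function name `permutation_signal`, its parameters, and the toy scores are all hypothetical; in the paper's setting the scores would presumably be random forest probability estimates, but here they are ordinary numbers.

```python
import random
from statistics import mean


def permutation_signal(probs, regions, target, n_perm=2000, alpha=0.05, seed=0):
    """Permutation check for one region (hypothetical helper, not from the paper).

    probs   -- one probability score per case (assumed to come from a classifier)
    regions -- region or subpopulation label for each case
    target  -- the label being tested
    Returns (observed mean score, permutation p-value, signal flag).
    """
    rng = random.Random(seed)
    in_region = [r == target for r in regions]
    k = sum(in_region)
    if k == 0:
        raise ValueError("no cases in target region")

    observed = mean(p for p, flag in zip(probs, in_region) if flag)

    # Null hypothesis: labels are unrelated to the scores, so any k cases
    # are equally likely to make up the region.
    exceed = 0
    for _ in range(n_perm):
        if mean(rng.sample(probs, k)) >= observed:
            exceed += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value, p_value <= alpha


if __name__ == "__main__":
    # Toy data: region "A" holds all of the high scores.
    probs = [0.9] * 10 + [0.2] * 30
    regions = ["A"] * 10 + ["B"] * 30
    print(permutation_signal(probs, regions, "A"))
    print(permutation_signal(probs, regions, "B"))
```

In this toy setup region "A" is flagged and region "B" is not. The threshold `alpha` and the number of permutations `n_perm` are arbitrary choices made for the sketch, not values taken from the paper.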

Original language: English (US)
Title of host publication: IIE Annual Conference and Expo 2013
Publisher: Institute of Industrial Engineers
Pages: 2551-2560
Number of pages: 10
State: Published - 2013
Event: IIE Annual Conference and Expo 2013 - San Juan, Puerto Rico
Duration: May 18 2013 to May 22 2013

Other

Other: IIE Annual Conference and Expo 2013
Country: Puerto Rico
City: San Juan
Period: 5/18/13 to 5/22/13

ASJC Scopus subject areas

  • Industrial and Manufacturing Engineering

Title: High-dimensional disease outbreak detection using tree-based ensembles
