Regional influenza prediction with sampling twitter data and PDE model

Yufang Wang, Kuai Xu, Yun Kang, Haiyan Wang, Feng Wang, Adrian Avram

Research output: Contribution to journal › Article

Abstract

The large volume of geotagged Twitter streaming data on flu epidemics gives researchers an opportunity to explore, model, and predict trends in flu cases in a timely manner. However, the explosive growth of social media data makes data sampling a natural choice. In this paper, we develop a method for influenza prediction based on real-time tweet data from social media; the method supports real-time prediction and is applicable to sampled data. Specifically, we first simulate the sampling process of flu tweets, and then develop a partial differential equation (PDE) model to characterize and predict the aggregated flu tweet volumes. Our PDE model incorporates the effects of flu spreading, flu recovery, and active human interventions for reducing flu. Our extensive simulation results show that this PDE model can almost eliminate the data reduction effects of the sampling process: it requires less historical data but achieves stronger prediction results, with a relative accuracy of over 90% on the 1% sampled data. Even for more aggressive sampling ratios such as 0.1% and 0.01%, our model still achieves relative accuracies of 85% and 83%, respectively. These promising results highlight the ability of our mechanistic PDE model to predict temporal–spatial patterns of flu trends even from small samples of Twitter data.
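The sampling step and the accuracy figures quoted in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not say how tweets are sampled or how relative accuracy is defined, so the sketch assumes independent (Bernoulli) sampling of each tweet and takes relative accuracy to be 1 - |predicted - observed| / observed. The function names and the daily counts are hypothetical.

```python
import random


def sample_counts(daily_counts, ratio, seed=0):
    """Thin each day's tweets, keeping every tweet independently with
    probability `ratio` (an assumed sampling scheme, not the paper's)."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(n) if rng.random() < ratio)
            for n in daily_counts]


def relative_accuracy(predicted, observed):
    """Assumed definition: 1 - |predicted - observed| / observed, floored at 0."""
    if observed <= 0:
        raise ValueError("observed must be positive")
    return max(0.0, 1.0 - abs(predicted - observed) / observed)


# Hypothetical daily flu-tweet counts for one region (made-up numbers).
full_counts = [12000, 15000, 21000, 18000]

# Simulate the 1% sampling ratio mentioned in the abstract.
sampled = sample_counts(full_counts, ratio=0.01)

# Score a hypothetical prediction against a hypothetical observed volume.
score = relative_accuracy(predicted=90, observed=100)
```

Under these assumptions, a model would be fitted to `sampled` instead of `full_counts`, and its predictions scored with `relative_accuracy` against the sampled volumes it is meant to forecast.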

Original language: English (US)
Article number: 678
Journal: International Journal of Environmental Research and Public Health
Volume: 17
Issue number: 3
State: Published - Feb 1 2020

Keywords

  • Flu prediction
  • PDE model
  • Sampling tweets data

ASJC Scopus subject areas

  • Public Health, Environmental and Occupational Health
  • Health, Toxicology and Mutagenesis