Abstract
Recent widely publicized data breaches have exposed the personal information of hundreds of millions of people. Some reports point to alarming increases in both the size and frequency of data breaches, spurring institutions around the world to address what appears to be a worsening situation. But is the problem actually growing worse? In this article, we study a popular public dataset and develop Bayesian Generalized Linear Models to investigate trends in data breaches. Analysis of the model shows that neither the size nor the frequency of data breaches has increased over the past decade. We find that the increases that have attracted attention can be explained by the heavy-tailed statistical distributions underlying the dataset. Specifically, we find that the size of data breaches is well modeled by the log-normal family of distributions and that the daily frequency of breaches is described by a negative binomial distribution. These distributions may provide clues to the generative mechanisms that are responsible for the breaches. Additionally, our model predicts the likelihood of breaches of a particular size in the future. For example, we find that between 15 September 2015 and 16 September 2016 there is only a 53.6% chance of a breach of 10 million records or more in the USA. Regardless of any trend, data breaches are costly, and we combine the model with two different cost models to project that in the next 3 years breaches could cost up to $179 billion.
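As a rough illustration of how the two distributions named in the abstract can be combined into a prediction, the sketch below computes the probability of at least one breach above a given size within a time window. It is not the article's Bayesian model: it assumes a stationary process, independent days, breach sizes independent of daily counts, and fixed point values for every parameter. The function names and the values of `MU`, `SIGMA`, `R`, and `P` are hypothetical placeholders, not estimates from the article, so the printed numbers should not be compared with the 53.6% figure.

```python
import math


def lognormal_survival(size, mu, sigma):
    """P(S > size) when ln(S) is Normal(mu, sigma^2)."""
    z = (math.log(size) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


def prob_no_large_breach_one_day(q, r, p):
    """P(no breach exceeds the threshold on one day).

    The daily count N is negative binomial with pmf
    C(k + r - 1, k) * p**r * (1 - p)**k, and each breach independently
    exceeds the threshold with probability q. The result is E[(1 - q)**N],
    i.e. the probability generating function of N evaluated at 1 - q.
    """
    return (p / (1.0 - (1.0 - p) * (1.0 - q))) ** r


def prob_large_breach(size, days, mu, sigma, r, p):
    """P(at least one breach larger than `size` within `days` days)."""
    q = lognormal_survival(size, mu, sigma)
    return 1.0 - prob_no_large_breach_one_day(q, r, p) ** days


# Hypothetical placeholder parameters (assumptions, NOT fitted values).
MU, SIGMA = 8.0, 3.0  # log-normal location and scale of breach size
R, P = 2.0, 0.7       # negative binomial shape and success probability

if __name__ == "__main__":
    for threshold in (1e5, 1e6, 1e7):
        prob = prob_large_breach(threshold, 365, MU, SIGMA, R, P)
        print(f"P(breach > {threshold:.0e} records in 365 days) = {prob:.3f}")
```

Using the probability generating function avoids summing over all possible daily counts. A fuller treatment would propagate posterior uncertainty in the parameters instead of fixing them at single values.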
Original language | English (US) |
---|---|
Pages (from-to) | 3-14 |
Number of pages | 12 |
Journal | Journal of Cybersecurity |
Volume | 2 |
Issue number | 1 |
State | Published - Dec 1 2016 |
Externally published | Yes |
Keywords
- Bayesian linear model
- Data breaches
- Heavy tails
- Log-normal
- Negative binomial
ASJC Scopus subject areas
- Software
- Computer Science (miscellaneous)
- Social Psychology
- Information Systems
- Safety, Risk, Reliability and Quality
- Safety Research
- Hardware and Architecture
- Political Science and International Relations
- Computer Science Applications
- Computer Networks and Communications
- Law