Abstract
Consider a data publishing setting for a dataset composed of both private and non-private features. The publisher uses an empirical distribution, estimated from n i.i.d. samples, to design a privacy mechanism which is then applied to fresh samples. In this paper, we study the discrepancy between the privacy-utility guarantees for the empirical distribution, used to design the privacy mechanism, and those for the true distribution, experienced by the privacy mechanism in practice. We first show that, for any privacy mechanism, these discrepancies vanish at rate O(1/√n) with high probability. These bounds follow from our main technical results regarding the Lipschitz continuity of the considered information leakage measures. We then prove that the optimal privacy mechanisms for the empirical distribution approach the corresponding mechanisms for the true distribution as the sample size n increases, thereby establishing the statistical consistency of the optimal privacy mechanisms. Finally, we introduce and study uniform privacy mechanisms which, by construction, provide privacy to all distributions within a neighborhood of the estimated distribution and thereby guarantee privacy for the true distribution with high probability.
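The O(1/√n) discrepancy can be illustrated numerically. The sketch below (a hypothetical example, not code from the paper) takes mutual information as the information leakage measure, draws n i.i.d. samples from an assumed true joint distribution of a private feature X and a non-private feature Y, and compares the leakage computed from the empirical distribution with that of the true distribution as n grows.

```python
import numpy as np

def mutual_information(pxy):
    """Mutual information I(X;Y) in nats for a joint pmf matrix pxy."""
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X (column vector)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y (row vector)
    mask = pxy > 0                        # skip zero-probability cells
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

rng = np.random.default_rng(0)

# Hypothetical true joint distribution over a binary private X
# and a ternary non-private Y (chosen only for illustration).
p_true = np.array([[0.20, 0.15, 0.15],
                   [0.10, 0.25, 0.15]])
i_true = mutual_information(p_true)

# Empirical distribution from n i.i.d. samples; the gap
# |I(p_emp) - I(p_true)| shrinks on the order of 1/sqrt(n).
for n in (100, 10_000):
    counts = rng.multinomial(n, p_true.ravel()).reshape(p_true.shape)
    p_emp = counts / n
    gap = abs(mutual_information(p_emp) - i_true)
    print(f"n = {n:6d}   |I_emp - I_true| = {gap:.4f}")
```

The same experiment can be repeated with other Lipschitz-continuous leakage measures; the continuity results in the paper are what turn the convergence of p_emp to p_true into convergence of the privacy-utility guarantees.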
| Original language | English (US) |
|---|---|
| Article number | 8825803 |
| Pages (from-to) | 1949-1978 |
| Number of pages | 30 |
| Journal | IEEE Transactions on Information Theory |
| Volume | 66 |
| Issue number | 4 |
| DOIs | |
| State | Published - Apr 2020 |
Keywords
- Robustness
- information leakage measures
- large deviations
- privacy-utility trade-off
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Library and Information Sciences