TY - JOUR
T1 - On the robustness of information-theoretic privacy measures and mechanisms
AU - Diaz, Mario
AU - Wang, Hao
AU - Calmon, Flavio P.
AU - Sankar, Lalitha
N1 - Funding Information:
Manuscript received November 14, 2018; revised June 14, 2019; accepted August 17, 2019. Date of publication September 5, 2019; date of current version March 17, 2020. This work was supported in part by the National Science Foundation under Grants CCF-1845852, CCF-1350914, CIF-1815361, and CIF-1901243, and in part by a seed grant toward a Center for Data Privacy from Arizona State University. This paper was presented at the 2018 IEEE International Symposium on Information Theory (ISIT) [1].
Publisher Copyright:
© 2020 IEEE.
PY - 2020/4
Y1 - 2020/4
N2 - Consider a data publishing setting for a dataset composed of both private and non-private features. The publisher uses an empirical distribution, estimated from n i.i.d. samples, to design a privacy mechanism that is then applied to fresh samples. In this paper, we study the discrepancy between the privacy-utility guarantees for the empirical distribution, used to design the privacy mechanism, and those for the true distribution, experienced by the privacy mechanism in practice. We first show that, for any privacy mechanism, these discrepancies vanish at rate O(1/√n) with high probability. These bounds follow from our main technical results regarding the Lipschitz continuity of the considered information leakage measures. We then prove that the optimal privacy mechanisms for the empirical distribution approach the corresponding mechanisms for the true distribution as the sample size n increases, thereby establishing the statistical consistency of the optimal privacy mechanisms. Finally, we introduce and study uniform privacy mechanisms which, by construction, provide privacy to all distributions within a neighborhood of the estimated distribution and thereby guarantee privacy for the true distribution with high probability.
AB - Consider a data publishing setting for a dataset composed of both private and non-private features. The publisher uses an empirical distribution, estimated from n i.i.d. samples, to design a privacy mechanism that is then applied to fresh samples. In this paper, we study the discrepancy between the privacy-utility guarantees for the empirical distribution, used to design the privacy mechanism, and those for the true distribution, experienced by the privacy mechanism in practice. We first show that, for any privacy mechanism, these discrepancies vanish at rate O(1/√n) with high probability. These bounds follow from our main technical results regarding the Lipschitz continuity of the considered information leakage measures. We then prove that the optimal privacy mechanisms for the empirical distribution approach the corresponding mechanisms for the true distribution as the sample size n increases, thereby establishing the statistical consistency of the optimal privacy mechanisms. Finally, we introduce and study uniform privacy mechanisms which, by construction, provide privacy to all distributions within a neighborhood of the estimated distribution and thereby guarantee privacy for the true distribution with high probability.
KW - Robustness
KW - information leakage measures
KW - large deviations
KW - privacy-utility trade-off
UR - http://www.scopus.com/inward/record.url?scp=85082175902&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85082175902&partnerID=8YFLogxK
U2 - 10.1109/TIT.2019.2939472
DO - 10.1109/TIT.2019.2939472
M3 - Article
AN - SCOPUS:85082175902
VL - 66
SP - 1949
EP - 1978
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
SN - 0018-9448
IS - 4
M1 - 8825803
ER -