TY - GEN
T1 - Deep reinforcement learning-based text anonymization against private-attribute inference
AU - Mosallanezhad, Ahmadreza
AU - Beigi, Ghazaleh
AU - Liu, Huan
N1 - Funding Information:
The authors would like to thank Jundong Li for his help throughout the paper. This material is based upon the work supported, in part, by NSF 1614576, ARO W911NF-15-1-0328 and ONR N00014-17-1-2605.
Publisher Copyright:
© 2019 Association for Computational Linguistics
PY - 2019
Y1 - 2019
N2 - User-generated textual data is rich in content and has been used in many user behavioral modeling tasks. However, it could also leak user private-attribute information that they may not want to disclose such as age and location. User's privacy concerns mandate data publishers to protect privacy. One effective way is to anonymize the textual data. In this paper, we study the problem of textual data anonymization and propose a novel Reinforcement Learning-based Text Anonymizor, RLTA, which addresses the problem of private-attribute leakage while preserving the utility of textual data. Our approach first extracts a latent representation of the original text w.r.t. a given task, then leverages deep reinforcement learning to automatically learn an optimal strategy for manipulating text representations w.r.t. the received privacy and utility feedback. Experiments show the effectiveness of this approach in terms of preserving both privacy and utility.
AB - User-generated textual data is rich in content and has been used in many user behavioral modeling tasks. However, it could also leak user private-attribute information that they may not want to disclose such as age and location. User's privacy concerns mandate data publishers to protect privacy. One effective way is to anonymize the textual data. In this paper, we study the problem of textual data anonymization and propose a novel Reinforcement Learning-based Text Anonymizor, RLTA, which addresses the problem of private-attribute leakage while preserving the utility of textual data. Our approach first extracts a latent representation of the original text w.r.t. a given task, then leverages deep reinforcement learning to automatically learn an optimal strategy for manipulating text representations w.r.t. the received privacy and utility feedback. Experiments show the effectiveness of this approach in terms of preserving both privacy and utility.
UR - http://www.scopus.com/inward/record.url?scp=85084292760&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084292760&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85084292760
T3 - EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference
SP - 2360
EP - 2369
BT - EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference
PB - Association for Computational Linguistics
T2 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019
Y2 - 3 November 2019 through 7 November 2019
ER -