Deep reinforcement learning-based text anonymization against private-attribute inference

Ahmadreza Mosallanezhad; Ghazaleh Beigi; Huan Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

User-generated textual data is rich in content and has been used in many user behavioral modeling tasks. However, it can also leak private attributes, such as age and location, that users may not want to disclose. Users' privacy concerns require data publishers to protect privacy, and one effective way to do so is to anonymize the textual data. In this paper, we study the problem of textual data anonymization and propose a novel Reinforcement Learning-based Text Anonymizor, RLTA, which addresses the problem of private-attribute leakage while preserving the utility of textual data. Our approach first extracts a latent representation of the original text w.r.t. a given task, then leverages deep reinforcement learning to automatically learn an optimal strategy for manipulating text representations w.r.t. the received privacy and utility feedback. Experiments show the effectiveness of this approach in terms of preserving both privacy and utility.
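The idea of manipulating a text representation under privacy and utility feedback can be illustrated with a toy sketch. This is not the RLTA implementation: a simple greedy search stands in for the deep reinforcement learning agent, and the names `combined_reward`, `greedy_anonymize`, `toy_utility`, `toy_leakage`, the masking action, and the weight `alpha` are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def combined_reward(utility: float, leakage: float, alpha: float = 0.5) -> float:
    """Reward that rises with utility and falls with private-attribute leakage.

    Both scores are assumed to lie in [0, 1]; alpha is an assumed trade-off weight.
    """
    return alpha * utility - (1.0 - alpha) * leakage


def mask(vec: Vector, index: int) -> Vector:
    """Action: return a copy of the representation with one dimension zeroed."""
    out = list(vec)
    out[index] = 0.0
    return out


def greedy_anonymize(
    vec: Vector,
    utility_fn: Callable[[Vector], float],
    leakage_fn: Callable[[Vector], float],
    steps: int = 3,
    alpha: float = 0.5,
) -> Tuple[Vector, float]:
    """Greedy stand-in for a learned policy: repeatedly apply the masking
    action that most improves the combined reward, stopping when none helps."""
    current = list(vec)
    best = combined_reward(utility_fn(current), leakage_fn(current), alpha)
    for _ in range(steps):
        candidates = [mask(current, i) for i, v in enumerate(current) if v != 0.0]
        if not candidates:
            break
        scored = [
            (combined_reward(utility_fn(c), leakage_fn(c), alpha), c)
            for c in candidates
        ]
        top_reward, top_vec = max(scored, key=lambda pair: pair[0])
        if top_reward <= best:
            break
        best, current = top_reward, top_vec
    return current, best


# Toy feedback signals (assumptions): dimension 0 carries the task signal,
# dimension 2 carries the private attribute.
def toy_utility(vec: Vector) -> float:
    return min(abs(vec[0]), 1.0)


def toy_leakage(vec: Vector) -> float:
    return min(abs(vec[2]), 1.0)


anonymized, score = greedy_anonymize([0.9, 0.1, 0.8], toy_utility, toy_leakage)
print(anonymized, score)
```

In this toy setting the search zeroes only the dimension that drives the leakage score and leaves the task-relevant dimension intact. A learned policy would replace the exhaustive per-step comparison with action selection trained from the same kind of reward.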

Original language	English (US)
Title of host publication	EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference
Publisher	Association for Computational Linguistics
Pages	2360-2369
Number of pages	10
ISBN (Electronic)	9781950737901
State	Published - 2019
Event	2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019 - Hong Kong, China
Duration	Nov 3 2019 → Nov 7 2019

Publication series

Name	EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference

Conference

Conference	2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019
Country/Territory	China
City	Hong Kong
Period	11/3/19 → 11/7/19

ASJC Scopus subject areas

Computational Theory and Mathematics
Computer Science Applications
Information Systems

Cite this

Mosallanezhad, A., Beigi, G., & Liu, H. (2019). Deep reinforcement learning-based text anonymization against private-attribute inference. In EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 2360-2369). (EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference). Association for Computational Linguistics.

Deep reinforcement learning-based text anonymization against private-attribute inference. / Mosallanezhad, Ahmadreza; Beigi, Ghazaleh; Liu, Huan.
EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference. Association for Computational Linguistics, 2019. p. 2360-2369 (EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference).

Mosallanezhad, A, Beigi, G & Liu, H 2019, Deep reinforcement learning-based text anonymization against private-attribute inference. in EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference. EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference, Association for Computational Linguistics, pp. 2360-2369, 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, 11/3/19.

Mosallanezhad A, Beigi G, Liu H. Deep reinforcement learning-based text anonymization against private-attribute inference. In EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference. Association for Computational Linguistics. 2019. p. 2360-2369. (EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference).

Mosallanezhad, Ahmadreza ; Beigi, Ghazaleh ; Liu, Huan. / Deep reinforcement learning-based text anonymization against private-attribute inference. EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference. Association for Computational Linguistics, 2019. pp. 2360-2369 (EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference).

@inproceedings{562d3382d07043b0add1dba64564215b,
  title = "Deep reinforcement learning-based text anonymization against private-attribute inference",
  abstract = "User-generated textual data is rich in content and has been used in many user behavioral modeling tasks. However, it could also leak user private-attribute information that they may not want to disclose such as age and location. User's privacy concerns mandate data publishers to protect privacy. One effective way is to anonymize the textual data. In this paper, we study the problem of textual data anonymization and propose a novel Reinforcement Learning-based Text Anonymizor, RLTA, which addresses the problem of private-attribute leakage while preserving the utility of textual data. Our approach first extracts a latent representation of the original text w.r.t. a given task, then leverages deep reinforcement learning to automatically learn an optimal strategy for manipulating text representations w.r.t. the received privacy and utility feedback. Experiments show the effectiveness of this approach in terms of preserving both privacy and utility.",
  author = "Ahmadreza Mosallanezhad and Ghazaleh Beigi and Huan Liu",
  note = "Funding Information: The authors would like to thank Jundong Li for his help throughout the paper. This material is based upon the work supported, in part, by NSF 1614576, ARO W911NF-15-1-0328 and ONR N00014-17-1-2605. Publisher Copyright: {\textcopyright} 2019 Association for Computational Linguistics; 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019 ; Conference date: 03-11-2019 Through 07-11-2019",
  year = "2019",
  language = "English (US)",
  series = "EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference",
  publisher = "Association for Computational Linguistics",
  pages = "2360--2369",
  booktitle = "EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference",
}

TY - GEN
T1 - Deep reinforcement learning-based text anonymization against private-attribute inference
AU - Mosallanezhad, Ahmadreza
AU - Beigi, Ghazaleh
AU - Liu, Huan
N1 - Funding Information: The authors would like to thank Jundong Li for his help throughout the paper. This material is based upon the work supported, in part, by NSF 1614576, ARO W911NF-15-1-0328 and ONR N00014-17-1-2605. Publisher Copyright: © 2019 Association for Computational Linguistics
PY - 2019
Y1 - 2019
N2 - User-generated textual data is rich in content and has been used in many user behavioral modeling tasks. However, it could also leak user private-attribute information that they may not want to disclose such as age and location. User's privacy concerns mandate data publishers to protect privacy. One effective way is to anonymize the textual data. In this paper, we study the problem of textual data anonymization and propose a novel Reinforcement Learning-based Text Anonymizor, RLTA, which addresses the problem of private-attribute leakage while preserving the utility of textual data. Our approach first extracts a latent representation of the original text w.r.t. a given task, then leverages deep reinforcement learning to automatically learn an optimal strategy for manipulating text representations w.r.t. the received privacy and utility feedback. Experiments show the effectiveness of this approach in terms of preserving both privacy and utility.
AB - User-generated textual data is rich in content and has been used in many user behavioral modeling tasks. However, it could also leak user private-attribute information that they may not want to disclose such as age and location. User's privacy concerns mandate data publishers to protect privacy. One effective way is to anonymize the textual data. In this paper, we study the problem of textual data anonymization and propose a novel Reinforcement Learning-based Text Anonymizor, RLTA, which addresses the problem of private-attribute leakage while preserving the utility of textual data. Our approach first extracts a latent representation of the original text w.r.t. a given task, then leverages deep reinforcement learning to automatically learn an optimal strategy for manipulating text representations w.r.t. the received privacy and utility feedback. Experiments show the effectiveness of this approach in terms of preserving both privacy and utility.
UR - http://www.scopus.com/inward/record.url?scp=85084292760&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084292760&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85084292760
T3 - EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference
SP - 2360
EP - 2369
BT - EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference
PB - Association for Computational Linguistics
T2 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019
Y2 - 3 November 2019 through 7 November 2019
ER -
