TY - GEN
T1 - Attributable Watermarking of Speech Generative Models
AU - Cho, Yongbaek
AU - Kim, Changhoon
AU - Yang, Yezhou
AU - Ren, Yi
N1 - Funding Information:
This work is partially supported by the National Science Foundation under Grant No. 2101052 and by an Amazon AWS Machine Learning Research Award (MLRA). Any opinions, findings, and conclusions expressed in this material are those of the author(s) and do not reflect the views of the funding entities.
Funding Information:
YC, CK and YY are with the Active Perception Group at the School of Computing and Augmented Intelligence, Arizona State University. YR is with the School for Engineering of Matter, Transport, and Energy, Arizona State University.
Publisher Copyright:
© 2022 IEEE
PY - 2022
Y1 - 2022
N2 - Generative models are now capable of synthesizing images, speech, and videos that are hardly distinguishable from authentic content. Such capabilities raise concerns such as malicious impersonation and IP theft. This paper investigates a solution for model attribution, i.e., the classification of synthetic content by its source model via watermarks embedded in the content. Building on the past success of model attribution in the image domain, we discuss algorithmic improvements for generating user-end speech models that empirically achieve high attribution accuracy while maintaining high generation quality. We show the tradeoff between attributability and generation quality under a variety of attacks on generated speech signals that attempt to remove the watermarks, and the feasibility of learning robust watermarks against these attacks. Watermarked speech samples are available at https://attdemo.github.io/attdemofull.github.io.
AB - Generative models are now capable of synthesizing images, speech, and videos that are hardly distinguishable from authentic content. Such capabilities raise concerns such as malicious impersonation and IP theft. This paper investigates a solution for model attribution, i.e., the classification of synthetic content by its source model via watermarks embedded in the content. Building on the past success of model attribution in the image domain, we discuss algorithmic improvements for generating user-end speech models that empirically achieve high attribution accuracy while maintaining high generation quality. We show the tradeoff between attributability and generation quality under a variety of attacks on generated speech signals that attempt to remove the watermarks, and the feasibility of learning robust watermarks against these attacks. Watermarked speech samples are available at https://attdemo.github.io/attdemofull.github.io.
KW - Model Attribution
KW - Speech Generation
KW - Speech Watermarking
KW - Voice Impersonation
UR - http://www.scopus.com/inward/record.url?scp=85131250648&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131250648&partnerID=8YFLogxK
U2 - 10.1109/ICASSP43922.2022.9746578
DO - 10.1109/ICASSP43922.2022.9746578
M3 - Conference contribution
AN - SCOPUS:85131250648
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 3069
EP - 3073
BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Y2 - 23 May 2022 through 27 May 2022
ER -