TY - GEN
T1 - Attributable Watermarking of Speech Generative Models
AU - Cho, Yongbaek
AU - Kim, Changhoon
AU - Yang, Yezhou
AU - Ren, Yi
N1 - Funding Information:
This work is partially supported by the National Science Foundation under Grant No. 2101052 and by an Amazon AWS Machine Learning Research Award (MLRA). Any opinions, findings, and conclusions expressed in this material are those of the author(s) and do not reflect the views of the funding entities.
Funding Information:
YC, CK and YY are with the Active Perception Group at the School of Computing and Augmented Intelligence, Arizona State University. YR is with the School for Engineering of Matter, Transport, and Energy, Arizona State University.
Publisher Copyright:
© 2022 IEEE
PY - 2022
Y1 - 2022
N2 - Generative models are now capable of synthesizing images, speech, and videos that are hardly distinguishable from authentic content. Such capabilities raise concerns such as malicious impersonation and IP theft. This paper investigates a solution for model attribution, i.e., the classification of synthetic content by its source model via watermarks embedded in the content. Building on the past success of model attribution in the image domain, we discuss algorithmic improvements for generating user-end speech models that empirically achieve high attribution accuracy while maintaining high generation quality. We show the tradeoff between attributability and generation quality under a variety of attacks on generated speech signals that attempt to remove the watermarks, and the feasibility of learning robust watermarks against these attacks. Watermarked speech samples are available at https://attdemo.github.io/attdemofull.github.io.
AB - Generative models are now capable of synthesizing images, speech, and videos that are hardly distinguishable from authentic content. Such capabilities raise concerns such as malicious impersonation and IP theft. This paper investigates a solution for model attribution, i.e., the classification of synthetic content by its source model via watermarks embedded in the content. Building on the past success of model attribution in the image domain, we discuss algorithmic improvements for generating user-end speech models that empirically achieve high attribution accuracy while maintaining high generation quality. We show the tradeoff between attributability and generation quality under a variety of attacks on generated speech signals that attempt to remove the watermarks, and the feasibility of learning robust watermarks against these attacks. Watermarked speech samples are available at https://attdemo.github.io/attdemofull.github.io.
KW - Model Attribution
KW - Speech Generation
KW - Speech Watermarking
KW - Voice Impersonation
UR - http://www.scopus.com/inward/record.url?scp=85131250648&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131250648&partnerID=8YFLogxK
U2 - 10.1109/ICASSP43922.2022.9746578
DO - 10.1109/ICASSP43922.2022.9746578
M3 - Conference contribution
AN - SCOPUS:85131250648
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 3069
EP - 3073
BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Y2 - 23 May 2022 through 27 May 2022
ER -