Investigating the Effects of Word Substitution Errors on Sentence Embeddings

Rohit Voleti; Julie M. Liss; Visar Berisha

doi:10.1109/ICASSP.2019.8683367

Health Solutions, College of (CHS)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Scopus citations

Abstract

A key initial step in several natural language processing (NLP) tasks involves embedding phrases of text into vectors of real numbers that preserve semantic meaning. To that end, several methods have been recently proposed with impressive results on semantic similarity tasks. However, all of these approaches assume that perfect transcripts are available when generating the embeddings. While this is a reasonable assumption for analysis of written text, it is limiting for analysis of transcribed text. In this paper we investigate the effects of word substitution errors, such as those produced by automatic speech recognition (ASR), on several state-of-the-art sentence embedding methods. To do this, we propose a new simulator that allows the experimenter to induce ASR-plausible word substitution errors in a corpus at a desired word error rate. We use this simulator to evaluate the robustness of several sentence embedding methods. Our results show that pre-trained neural sentence encoders are both robust to ASR errors and perform well on textual similarity tasks after errors are introduced. Meanwhile, unweighted averages of word vectors perform well with perfect transcriptions, but their performance degrades rapidly on textual similarity tasks for text with word substitution errors.
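Two mechanisms named in the abstract can be made concrete: injecting word substitutions at a target word error rate, and building a sentence embedding as an unweighted average of word vectors. The Python sketch below illustrates both under stated assumptions. The confusion table `CONFUSIONS`, the toy 3-dimensional `VECTORS`, and the helpers `inject_substitutions`, `average_embedding` and `cosine` are all hypothetical. This is not the authors' simulator, which chooses ASR-plausible substitutions by its own method; here, replacements are simply drawn at random from a hand-written table. Because only substitutions are injected (no insertions or deletions), WER reduces to S / N: the number of substituted words divided by the reference length.

```python
import math
import random

# HYPOTHETICAL confusion table: each word maps to similar-sounding words an
# ASR system might output instead. Illustrative only; not the paper's data.
CONFUSIONS = {
    "cat": ["cap", "hat"],
    "sat": ["sad", "set"],
    "down": ["town"],
}

# HYPOTHETICAL toy 3-dimensional word vectors (real embeddings are learned
# from data and typically have hundreds of dimensions).
VECTORS = {
    "the": [0.1, 0.1, 0.1],
    "cat": [0.9, 0.1, 0.0],
    "cap": [0.0, 0.2, 0.9],
    "hat": [0.1, 0.3, 0.8],
    "sat": [0.2, 0.9, 0.1],
    "sad": [0.7, 0.0, 0.6],
    "set": [0.1, 0.5, 0.7],
    "down": [0.3, 0.6, 0.2],
    "town": [0.0, 0.1, 0.9],
}


def inject_substitutions(words, wer, confusions, rng):
    """Substitute round(wer * len(words)) randomly chosen words.

    With substitutions only, WER = S / N. If too few words have confusion
    entries, fewer substitutions are made and the achieved WER falls short.
    """
    candidates = [i for i, w in enumerate(words) if w in confusions]
    n_errors = min(len(candidates), round(wer * len(words)))
    noisy = list(words)
    for i in rng.sample(candidates, n_errors):
        noisy[i] = rng.choice(confusions[words[i]])
    return noisy


def average_embedding(words, vectors):
    """Unweighted mean of the vectors of in-vocabulary words."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    clean = "the cat sat down".split()
    for wer in (0.0, 0.25, 0.5):
        noisy = inject_substitutions(clean, wer, CONFUSIONS, rng)
        sim = cosine(average_embedding(clean, VECTORS),
                     average_embedding(noisy, VECTORS))
        print(f"WER={wer:.2f}  {' '.join(noisy):<20}  cosine={sim:.3f}")
```

Averaging gives every word equal weight, so a single substituted word can pull the mean toward an unrelated region of the vector space. That offers one plausible intuition for the degradation reported above for unweighted averages, but it is an illustration, not a result from the paper.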

Original language	English (US)
Title of host publication	2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	7315-7319
Number of pages	5
ISBN (Electronic)	9781479981311
DOIs	https://doi.org/10.1109/ICASSP.2019.8683367
State	Published - May 2019
Event	44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Brighton, United Kingdom. Duration: May 12, 2019 → May 17, 2019

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2019-May
ISSN (Print)	1520-6149

Conference

Conference	44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Country/Territory	United Kingdom
City	Brighton
Period	5/12/19 → 5/17/19

Keywords

ASR Error Simulator
Natural Language Processing
Semantic Embedding
Sentence Embeddings
Speech Recognition

ASJC Scopus subject areas

Software
Signal Processing
Electrical and Electronic Engineering

Access to Document

10.1109/ICASSP.2019.8683367

Cite this

Voleti, R., Liss, J. M., & Berisha, V. (2019). Investigating the Effects of Word Substitution Errors on Sentence Embeddings. In 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings (pp. 7315-7319). Article 8683367 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2019-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2019.8683367

Investigating the Effects of Word Substitution Errors on Sentence Embeddings. / Voleti, Rohit; Liss, Julie M.; Berisha, Visar.
2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2019. p. 7315-7319, 8683367 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2019-May).

Voleti, R, Liss, JM & Berisha, V 2019, Investigating the Effects of Word Substitution Errors on Sentence Embeddings. in 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings, 8683367, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2019-May, Institute of Electrical and Electronics Engineers Inc., pp. 7315-7319, 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019, Brighton, United Kingdom, 5/12/19. https://doi.org/10.1109/ICASSP.2019.8683367

Voleti R, Liss JM, Berisha V. Investigating the Effects of Word Substitution Errors on Sentence Embeddings. In 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2019. p. 7315-7319. 8683367. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2019.8683367

Voleti, Rohit ; Liss, Julie M. ; Berisha, Visar. / Investigating the Effects of Word Substitution Errors on Sentence Embeddings. 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 7315-7319 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{596c2528d4ed4ce9b2c0e056cf5385f7,

title = "Investigating the Effects of Word Substitution Errors on Sentence Embeddings",

abstract = "A key initial step in several natural language processing (NLP) tasks involves embedding phrases of text into vectors of real numbers that preserve semantic meaning. To that end, several methods have been recently proposed with impressive results on semantic similarity tasks. However, all of these approaches assume that perfect transcripts are available when generating the embeddings. While this is a reasonable assumption for analysis of written text, it is limiting for analysis of transcribed text. In this paper we investigate the effects of word substitution errors, such as those produced by automatic speech recognition (ASR), on several state-of-the-art sentence embedding methods. To do this, we propose a new simulator that allows the experimenter to induce ASR-plausible word substitution errors in a corpus at a desired word error rate. We use this simulator to evaluate the robustness of several sentence embedding methods. Our results show that pre-trained neural sentence encoders are both robust to ASR errors and perform well on textual similarity tasks after errors are introduced. Meanwhile, unweighted averages of word vectors perform well with perfect transcriptions, but their performance degrades rapidly on textual similarity tasks for text with word substitution errors.",

keywords = "ASR Error Simulator, Natural Language Processing, Semantic Embedding, Sentence Embeddings, Speech Recognition",

author = "Rohit Voleti and Liss, {Julie M.} and Visar Berisha",

note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 ; Conference date: 12-05-2019 Through 17-05-2019",

year = "2019",

month = may,

doi = "10.1109/ICASSP.2019.8683367",

language = "English (US)",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "7315--7319",

booktitle = "2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings",

}

TY - GEN

T1 - Investigating the Effects of Word Substitution Errors on Sentence Embeddings

AU - Voleti, Rohit

AU - Liss, Julie M.

AU - Berisha, Visar

PY - 2019/5

Y1 - 2019/5

N2 - A key initial step in several natural language processing (NLP) tasks involves embedding phrases of text into vectors of real numbers that preserve semantic meaning. To that end, several methods have been recently proposed with impressive results on semantic similarity tasks. However, all of these approaches assume that perfect transcripts are available when generating the embeddings. While this is a reasonable assumption for analysis of written text, it is limiting for analysis of transcribed text. In this paper we investigate the effects of word substitution errors, such as those produced by automatic speech recognition (ASR), on several state-of-the-art sentence embedding methods. To do this, we propose a new simulator that allows the experimenter to induce ASR-plausible word substitution errors in a corpus at a desired word error rate. We use this simulator to evaluate the robustness of several sentence embedding methods. Our results show that pre-trained neural sentence encoders are both robust to ASR errors and perform well on textual similarity tasks after errors are introduced. Meanwhile, unweighted averages of word vectors perform well with perfect transcriptions, but their performance degrades rapidly on textual similarity tasks for text with word substitution errors.

AB - A key initial step in several natural language processing (NLP) tasks involves embedding phrases of text into vectors of real numbers that preserve semantic meaning. To that end, several methods have been recently proposed with impressive results on semantic similarity tasks. However, all of these approaches assume that perfect transcripts are available when generating the embeddings. While this is a reasonable assumption for analysis of written text, it is limiting for analysis of transcribed text. In this paper we investigate the effects of word substitution errors, such as those produced by automatic speech recognition (ASR), on several state-of-the-art sentence embedding methods. To do this, we propose a new simulator that allows the experimenter to induce ASR-plausible word substitution errors in a corpus at a desired word error rate. We use this simulator to evaluate the robustness of several sentence embedding methods. Our results show that pre-trained neural sentence encoders are both robust to ASR errors and perform well on textual similarity tasks after errors are introduced. Meanwhile, unweighted averages of word vectors perform well with perfect transcriptions, but their performance degrades rapidly on textual similarity tasks for text with word substitution errors.

KW - ASR Error Simulator

KW - Natural Language Processing

KW - Semantic Embedding

KW - Sentence Embeddings

KW - Speech Recognition

UR - http://www.scopus.com/inward/record.url?scp=85068991322&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85068991322&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2019.8683367

DO - 10.1109/ICASSP.2019.8683367

M3 - Conference contribution

AN - SCOPUS:85068991322

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 7315

EP - 7319

BT - 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019

Y2 - 12 May 2019 through 17 May 2019

ER -
