Say what? A dataset for exploring the error patterns that two ASR engines make

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citations

Abstract

We present a new metadataset that provides insight into where and how two ASR systems make errors on several different speech datasets. By making this data readily available to researchers, we hope to stimulate research on WER estimation models and, in turn, a deeper understanding of how intelligibility is encoded in speech. Using this dataset, we attempt to estimate intelligibility with a state-of-the-art model for speech quality estimation and find that this model fails to capture speech intelligibility. This finding sheds light on the relationship between how speech quality and how intelligibility are encoded in acoustic features, and shows that much remains to be learned about modeling intelligibility effectively. We hope the metadataset we present will stimulate research into systems that model intelligibility more effectively.
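The abstract centers on WER (word error rate) estimation. For context, WER itself is the standard ASR error metric: the word-level edit distance between a reference transcript and an ASR hypothesis, normalized by reference length. A minimal sketch (the function name and inputs are illustrative, not from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,               # substitution or match
                           dp[i - 1][j] + 1,  # deletion
                           dp[i][j - 1] + 1)  # insertion
    return dp[len(ref)][len(hyp)] / len(ref)

# One deleted word out of four reference words -> WER of 0.25
print(wer("say what a dataset", "say what dataset"))
```

A WER *estimation* model, as studied in the paper, tries to predict this number from the audio alone, without access to the reference transcript.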

Original language: English (US)
Pages (from-to): 2528-2532
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2019-September
DOIs
State: Published - Jan 1 2019
Event: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration: Sep 15 2019 - Sep 19 2019

Keywords

  • Auditory perception
  • Automatic speech recognition
  • Error detection
  • Estimation models
  • Intelligibility
  • Quality

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

