Say what? A dataset for exploring the error patterns that two ASR engines make

Research output: Contribution to journal › Conference article

Abstract

We present a new metadataset which provides insight into where and how two ASR systems make errors on several different speech datasets. By making this data readily available to researchers, we hope to stimulate research in the area of WER estimation models, in order to gain a deeper understanding of how intelligibility is encoded in speech. Using this dataset, we attempt to estimate intelligibility using a state-of-the-art model for speech quality estimation and found that this model did not work to model speech intelligibility. This finding sheds light on the relationship between how speech quality is encoded in acoustic features and how intelligibility is encoded. It shows that we have a lot more to learn in how to effectively model intelligibility. It is our hope that the metadataset we present will stimulate research into creating systems that more effectively model intelligibility.

Original language: English (US)
Pages (from-to): 2528-2532
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2019-September
DOIs: 10.21437/Interspeech.2019-3096
State: Published - Jan 1 2019
Event: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration: Sep 15 2019 - Sep 19 2019

Keywords

  • Auditory perception
  • Automatic speech recognition
  • Error detection
  • Estimation models
  • Intelligibility
  • Quality

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

@article{ec9a1c2ad2024d22a0efb7b90f010a26,
title = "Say what? A dataset for exploring the error patterns that two ASR engines make",
abstract = "We present a new metadataset which provides insight into where and how two ASR systems make errors on several different speech datasets. By making this data readily available to researchers, we hope to stimulate research in the area of WER estimation models, in order to gain a deeper understanding of how intelligibility is encoded in speech. Using this dataset, we attempt to estimate intelligibility using a state-of-the-art model for speech quality estimation and found that this model did not work to model speech intelligibility. This finding sheds light on the relationship between how speech quality is encoded in acoustic features and how intelligibility is encoded. It shows that we have a lot more to learn in how to effectively model intelligibility. It is our hope that the metadataset we present will stimulate research into creating systems that more effectively model intelligibility.",
keywords = "Auditory perception, Automatic speech recognition, Error detection, Estimation models, Intelligibility, Quality",
author = "Meredith Moore and Michael Saxon and Hemanth Venkateswara and Visar Berisha and Sethuraman Panchanathan",
year = "2019",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2019-3096",
language = "English (US)",
volume = "2019-September",
pages = "2528--2532",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X"
}

TY  - JOUR
T1  - Say what? A dataset for exploring the error patterns that two ASR engines make
AU  - Moore, Meredith
AU  - Saxon, Michael
AU  - Venkateswara, Hemanth
AU  - Berisha, Visar
AU  - Panchanathan, Sethuraman
PY  - 2019/1/1
Y1  - 2019/1/1
N2  - We present a new metadataset which provides insight into where and how two ASR systems make errors on several different speech datasets. By making this data readily available to researchers, we hope to stimulate research in the area of WER estimation models, in order to gain a deeper understanding of how intelligibility is encoded in speech. Using this dataset, we attempt to estimate intelligibility using a state-of-the-art model for speech quality estimation and found that this model did not work to model speech intelligibility. This finding sheds light on the relationship between how speech quality is encoded in acoustic features and how intelligibility is encoded. It shows that we have a lot more to learn in how to effectively model intelligibility. It is our hope that the metadataset we present will stimulate research into creating systems that more effectively model intelligibility.
AB  - We present a new metadataset which provides insight into where and how two ASR systems make errors on several different speech datasets. By making this data readily available to researchers, we hope to stimulate research in the area of WER estimation models, in order to gain a deeper understanding of how intelligibility is encoded in speech. Using this dataset, we attempt to estimate intelligibility using a state-of-the-art model for speech quality estimation and found that this model did not work to model speech intelligibility. This finding sheds light on the relationship between how speech quality is encoded in acoustic features and how intelligibility is encoded. It shows that we have a lot more to learn in how to effectively model intelligibility. It is our hope that the metadataset we present will stimulate research into creating systems that more effectively model intelligibility.
KW  - Auditory perception
KW  - Automatic speech recognition
KW  - Error detection
KW  - Estimation models
KW  - Intelligibility
KW  - Quality
UR  - http://www.scopus.com/inward/record.url?scp=85074730761&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85074730761&partnerID=8YFLogxK
U2  - 10.21437/Interspeech.2019-3096
DO  - 10.21437/Interspeech.2019-3096
M3  - Conference article
AN  - SCOPUS:85074730761
VL  - 2019-September
SP  - 2528
EP  - 2532
JO  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN  - 2308-457X
ER  -