Say what? A dataset for exploring the error patterns that two ASR engines make

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citations

Abstract

We present a new metadataset that provides insight into where and how two ASR systems make errors on several different speech datasets. By making this data readily available to researchers, we hope to stimulate research on WER estimation models and, in turn, a deeper understanding of how intelligibility is encoded in speech. Using this dataset, we attempt to estimate intelligibility with a state-of-the-art model for speech quality estimation and find that this model fails to capture speech intelligibility. This finding sheds light on the relationship between how speech quality and how intelligibility are encoded in acoustic features, and shows that much remains to be learned about modeling intelligibility effectively. We hope the metadataset we present will stimulate research into systems that model intelligibility more effectively.
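The abstract centers on WER (word error rate) estimation. For context, WER itself is the standard ASR error metric: the word-level edit distance between a reference transcript and an ASR hypothesis, normalized by reference length. A minimal sketch (the function name and inputs are illustrative, not from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,               # substitution or match
                           dp[i - 1][j] + 1,  # deletion
                           dp[i][j - 1] + 1)  # insertion
    return dp[len(ref)][len(hyp)] / len(ref)

# One deleted word out of four reference words -> WER of 0.25
print(wer("say what a dataset", "say what dataset"))
```

A WER *estimation* model, as studied in the paper, tries to predict this number from the audio alone, without access to the reference transcript.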

Original language: English (US)
Pages (from-to): 2528-2532
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2019-September
DOIs
State: Published - Jan 1 2019
Event: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration: Sep 15 2019 - Sep 19 2019

Keywords

  • Auditory perception
  • Automatic speech recognition
  • Error detection
  • Estimation models
  • Intelligibility
  • Quality

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

