Say what? A dataset for exploring the error patterns that two ASR engines make

Meredith Moore, Michael Saxon, Hemanth Venkateswara, Visar Berisha, Sethuraman Panchanathan

Research output: Contribution to journal › Conference article › peer-review

11 Scopus citations

Abstract

We present a new metadataset that provides insight into where and how two ASR systems make errors on several different speech datasets. By making these data readily available to researchers, we hope to stimulate research on WER estimation models and thereby deepen our understanding of how intelligibility is encoded in speech. Using this dataset, we attempted to estimate intelligibility with a state-of-the-art model for speech quality estimation and found that it failed to model speech intelligibility. This finding sheds light on the gap between how speech quality is encoded in acoustic features and how intelligibility is encoded, and it shows that much remains to be learned about modeling intelligibility effectively. We hope the metadataset presented here will spur the development of systems that model intelligibility more effectively.
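For context, the per-utterance error labels in a metadataset like this are typically derived from word error rate (WER), the standard proxy for ASR-measured intelligibility. Below is a minimal sketch of how such a label could be computed from a reference transcript and an ASR hypothesis; the function name and example strings are illustrative, not taken from the paper's code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution ("what" -> "that") and one deletion ("this")
# over four reference words gives WER = 2/4 = 0.5.
print(word_error_rate("say what is this", "say that is"))
```

A WER estimation model of the kind the abstract calls for would predict this value directly from the audio, without access to the reference transcript.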

Keywords

  • Auditory perception
  • Automatic speech recognition
  • Error detection
  • Estimation models
  • Intelligibility
  • Quality

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

