TY - JOUR
T1 - Whistle-blowing ASRs
T2 - 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018
AU - Moore, Meredith
AU - Demakethepalli Venkateswara, Hemanth
AU - Panchanathan, Sethuraman
N1 - Funding Information:
We offer our sincere gratitude to the National Science Foundation’s Alliance for Person-Centered Accessible Technologies Interdisciplinary Graduate Education Traineeship (APAcT IGERT), as well as the National Science Foundation’s Graduate Research Fellowship, without which this research would not have been possible.
Publisher Copyright:
© 2018 International Speech Communication Association. All rights reserved.
PY - 2018
Y1 - 2018
N2 - Speech is a complex process that can break down in many different ways and lead to a variety of voice disorders. Dysarthria is a voice disorder in which individuals are unable to control one or more aspects of speech (articulation, breathing, voicing, or prosody), leading to less intelligible speech. In this paper, we evaluate the accuracy of state-of-the-art automatic speech recognition systems (ASRs) on two dysarthric speech datasets and compare the results to ASR performance on control speech. The limits of ASR performance on different voices have not been explored since the field shifted from generative models of speech recognition to deep neural network architectures. To assess how far the field has come in recognizing disordered speech, we test two different ASR systems: (1) Carnegie Mellon University's Sphinx Open Source Recognition and (2) Google® Speech Recognition. While (1) uses generative models of speech recognition, (2) uses deep neural networks. As expected, although (2) achieved lower word error rates (WER) on dysarthric speech than (1), control speech had a WER 59% lower than dysarthric speech. Future studies should focus not only on making ASRs robust to environmental noise but also on making them robust to different voices.
AB - Speech is a complex process that can break down in many different ways and lead to a variety of voice disorders. Dysarthria is a voice disorder in which individuals are unable to control one or more aspects of speech (articulation, breathing, voicing, or prosody), leading to less intelligible speech. In this paper, we evaluate the accuracy of state-of-the-art automatic speech recognition systems (ASRs) on two dysarthric speech datasets and compare the results to ASR performance on control speech. The limits of ASR performance on different voices have not been explored since the field shifted from generative models of speech recognition to deep neural network architectures. To assess how far the field has come in recognizing disordered speech, we test two different ASR systems: (1) Carnegie Mellon University's Sphinx Open Source Recognition and (2) Google® Speech Recognition. While (1) uses generative models of speech recognition, (2) uses deep neural networks. As expected, although (2) achieved lower word error rates (WER) on dysarthric speech than (1), control speech had a WER 59% lower than dysarthric speech. Future studies should focus not only on making ASRs robust to environmental noise but also on making them robust to different voices.
KW - Dysarthric speech
KW - Speech recognition
KW - Voice disorders
UR - http://www.scopus.com/inward/record.url?scp=85054992004&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054992004&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2018-2391
DO - 10.21437/Interspeech.2018-2391
M3 - Conference article
AN - SCOPUS:85054992004
SN - 2308-457X
VL - 2018-September
SP - 466
EP - 470
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Y2 - 2 September 2018 through 6 September 2018
ER -