Abstract

Speech is a complex process that can break down in many different ways, leading to a variety of voice disorders. Dysarthria is a voice disorder in which individuals are unable to control one or more aspects of speech (articulation, breathing, voicing, or prosody), leading to less intelligible speech. In this paper, we evaluate the accuracy of state-of-the-art automatic speech recognition (ASR) systems on two dysarthric speech datasets and compare the results to ASR performance on control speech. The limits of ASR performance on different voices have not been explored since the field shifted from generative models of speech recognition to deep neural network architectures. To test how far the field has come in recognizing disordered speech, we test two different ASR systems: (1) Carnegie Mellon University's Sphinx Open Source Recognition, and (2) Google Speech Recognition. While (1) uses generative models of speech recognition, (2) uses deep neural networks. As expected, (2) achieved lower word error rates (WER) on dysarthric speech than (1); even so, control speech had a WER 59% lower than dysarthric speech. Future studies should focus not only on making ASRs robust to environmental noise, but also on making them more robust to different voices.
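Word error rate, the metric reported in the abstract, is the word-level Levenshtein distance between the reference transcript and the ASR hypothesis, divided by the number of reference words. The paper does not include an implementation; the following is a minimal, self-contained sketch of the standard dynamic-programming computation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j          # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("call the doctor now", "call a doctor")` is 0.5: one substitution ("the" for "a") and one deletion ("now") against four reference words. Note that WER can exceed 1.0 when the hypothesis contains many insertions.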

Original language: English (US)
Pages (from-to): 466-470
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2018-September
State: Published - Jan 1 2018
Event: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India
Duration: Sep 2 2018 - Sep 6 2018

Keywords

  • Dysarthric speech
  • Speech recognition
  • Voice disorders

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
