Abstract

Improved performance in speech applications using deep neural networks (DNNs) has come at the expense of reduced model interpretability. For consumer applications this is not a problem; however, for health applications, clinicians must be able to interpret why a predictive model made the decision that it did. In this paper, we propose an interpretable DNN-based model for objective assessment of dysarthric speech in speech therapy applications. Our model aims to predict a general impression of the severity of the speech disorder; however, instead of directly generating a severity prediction from a high-dimensional input acoustic feature space, we add an intermediate interpretable layer that acts as a bottleneck feature extractor and constrains the solution space of the DNNs. During inference, the model provides an estimate of severity at the output of the network and a set of explanatory features from the intermediate layer that explain the final decision. We evaluate the model on a dysarthric speech dataset and show that it provides an interpretable output that is highly correlated with the subjective evaluations of Speech-Language Pathologists (SLPs).
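The two-stage structure described in the abstract (acoustic features → low-dimensional interpretable bottleneck → severity estimate) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the feature dimensions, weights, and activation choices below are placeholder assumptions, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not taken from the paper): a 120-dim acoustic
# input, a 4-dim interpretable bottleneck, and a scalar severity score.
N_ACOUSTIC, N_INTERPRETABLE = 120, 4

# Stage 1: high-dimensional acoustic features -> interpretable bottleneck.
W1 = rng.standard_normal((N_ACOUSTIC, N_INTERPRETABLE)) * 0.1
b1 = np.zeros(N_INTERPRETABLE)

# Stage 2: bottleneck features -> severity estimate.
W2 = rng.standard_normal((N_INTERPRETABLE, 1)) * 0.1
b2 = np.zeros(1)

def predict(x):
    """Return (severity, interpretable_features) for one utterance.

    The bottleneck activations z are exposed alongside the severity
    score so a clinician can inspect what drove the prediction.
    """
    z = np.tanh(x @ W1 + b1)          # constrained, low-dim representation
    severity = float(z @ W2 + b2)     # scalar severity impression
    return severity, z

x = rng.standard_normal(N_ACOUSTIC)   # stand-in for extracted acoustic features
severity, feats = predict(x)
```

The key design point is that the severity score depends on the input only through the small set of bottleneck features, so those features constitute a complete explanation of the final decision.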

Original language: English (US)
Pages (from-to): 1849-1853
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
Publication status: Published - Jan 1 2017

Keywords

  • Deep neural networks
  • Dysarthric speech
  • Model interpretability
  • Objective assessment

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
