Online speaking rate estimation using recurrent neural networks

Yishan Jiao, Ming Tu, Visar Berisha, Julie Liss

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

A reliable online speaking rate estimation tool is useful in many domains, including speech recognition, speech therapy intervention, and speaker identification. This paper proposes an online speaking rate estimation model based on recurrent neural networks (RNNs). Speaking rate is a long-term feature of speech, determined by how many syllables are spoken over an extended time window (seconds). We posit that since RNNs can capture long-term dependencies through the memory of previous hidden states, they are a good match for the speaking rate estimation task. Here we train a long short-term memory (LSTM) RNN on a set of speech features known to correlate with speech rhythm. An evaluation on spontaneous speech shows that the method yields a higher correlation between the estimated rate and the ground-truth rate than state-of-the-art alternatives. An evaluation on longitudinal pathological speech shows that the proposed method can capture both long-term and short-term changes in speaking rate.
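The idea described in the abstract can be sketched in code. The following is a minimal illustrative example, not the authors' implementation: a single-layer LSTM consumes frame-level speech features one at a time and emits a causal (online) rate estimate after each frame. The class name `TinyLSTM`, the feature dimensions, and the random (untrained) weights are all hypothetical; in the paper the network would be trained on rhythm-correlated features against ground-truth syllable rates.

```python
# Minimal sketch of online speaking-rate estimation with an LSTM.
# Weights are random stand-ins; a real system would learn them.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Single-layer LSTM with a scalar linear readout (the rate estimate)."""
    def __init__(self, n_in, n_hid):
        s = 1.0 / np.sqrt(n_hid)
        # One stacked weight matrix for the input, forget, cell, output gates.
        self.W = rng.uniform(-s, s, (4 * n_hid, n_in + n_hid))
        self.b = np.zeros(4 * n_hid)
        self.w_out = rng.uniform(-s, s, n_hid)  # linear readout weights
        self.n_hid = n_hid

    def step(self, x, h, c):
        """One LSTM time step: update cell state c and hidden state h."""
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

    def online_rate(self, frames):
        """Yield one rate estimate per incoming frame (causal, no lookahead)."""
        h = np.zeros(self.n_hid)
        c = np.zeros(self.n_hid)
        for x in frames:
            h, c = self.step(x, h, c)
            yield float(self.w_out @ h)  # e.g. syllables per second

lstm = TinyLSTM(n_in=8, n_hid=16)
feats = rng.normal(size=(50, 8))       # 50 frames of 8-dim speech features
rates = list(lstm.online_rate(feats))  # one estimate per frame
print(len(rates))                      # prints 50
```

Because the hidden and cell states carry information across frames, each estimate can reflect syllable activity over the preceding seconds of input, which is what makes the recurrent formulation a natural fit for this long-term feature.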

Original language: English (US)
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5245-5249
Number of pages: 5
Volume: 2016-May
ISBN (Electronic): 9781479999880
DOI: 10.1109/ICASSP.2016.7472678
State: Published - May 18 2016
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: Mar 20 2016 - Mar 25 2016

Other

Other: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country: China
City: Shanghai
Period: 3/20/16 - 3/25/16


Keywords

  • clinical tool
  • recurrent neural networks
  • speaking rate estimation

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Jiao, Y., Tu, M., Berisha, V., & Liss, J. (2016). Online speaking rate estimation using recurrent neural networks. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings (Vol. 2016-May, pp. 5245-5249). [7472678] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2016.7472678

