A speech emotion recognition framework based on latent Dirichlet allocation: Algorithm and FPGA implementation

Mohit Shah, Lifeng Miao, Chaitali Chakrabarti, Andreas Spanias

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Citations (Scopus)

Abstract

In this paper, we present a speech-based emotion recognition framework based on a latent Dirichlet allocation model. This method assumes that incoming speech frames are conditionally independent and exchangeable. While this leads to a loss of temporal structure, it is able to capture significant statistical information between frames. In contrast, a hidden Markov model-based approach captures the temporal structure in speech. Using the German emotional speech database EMO-DB for evaluation, we achieve an average classification accuracy of 80.7% compared to 73% for hidden Markov models. This improvement is achieved at the cost of a slight increase in computational complexity. We map the proposed algorithm onto an FPGA platform and show that emotions in a speech utterance of duration 1.5s can be identified in 1.8ms, while utilizing 70% of the resources. This further demonstrates the suitability of our approach for real-time applications on hand-held devices.
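The abstract describes treating the frames of an utterance as an exchangeable bag of observations modeled by LDA, with the resulting topic representation used for emotion classification. The following is a generic sketch of that bag-of-frames idea, not the authors' exact pipeline: the synthetic data stands in for EMO-DB, and the KMeans frame quantization and logistic-regression classifier are illustrative assumptions.

```python
# Hedged sketch of a bag-of-frames LDA pipeline for utterance-level emotion
# classification. All data and model choices here are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Fake corpus: 40 utterances, each a variable-length sequence of 13-dim
# MFCC-like frame vectors; labels are two toy emotion classes.
utterances = [rng.normal(loc=y, size=(rng.integers(80, 120), 13))
              for y in (0, 1) * 20]
labels = np.array([0, 1] * 20)

# 1) Quantize frames into a discrete "vocabulary" of codewords.
all_frames = np.vstack(utterances)
vocab_size = 32
km = KMeans(n_clusters=vocab_size, n_init=3, random_state=0).fit(all_frames)

# 2) Count codeword occurrences per utterance -- frame order is discarded,
#    matching the exchangeability assumption described in the abstract.
counts = np.zeros((len(utterances), vocab_size))
for i, utt in enumerate(utterances):
    words = km.predict(utt)
    counts[i] = np.bincount(words, minlength=vocab_size)

# 3) Fit LDA and take per-utterance topic proportions as features.
lda = LatentDirichletAllocation(n_components=4, random_state=0)
theta = lda.fit_transform(counts)  # shape: (n_utterances, n_topics)

# 4) Classify emotions from the topic proportions.
clf = LogisticRegression().fit(theta, labels)
print("training accuracy:", clf.score(theta, labels))
```

The key property the sketch shares with the described method is step 2: once frames are reduced to counts, all temporal structure is gone, which is the trade-off the abstract contrasts against HMMs.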

Original language: English (US)
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Pages: 2553-2557
Number of pages: 5
DOIs: 10.1109/ICASSP.2013.6638116
State: Published - Oct 18 2013
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: May 26 2013 - May 31 2013

Other

Other: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country: Canada
City: Vancouver, BC
Period: 5/26/13 - 5/31/13

Fingerprint

  • Speech recognition
  • Field programmable gate arrays (FPGA)
  • Hidden Markov models
  • Computational complexity

Keywords

  • affective computing
  • emotion recognition
  • FPGA implementation
  • latent Dirichlet allocation

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Shah, M., Miao, L., Chakrabarti, C., & Spanias, A. (2013). A speech emotion recognition framework based on latent Dirichlet allocation: Algorithm and FPGA implementation. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 2553-2557). [6638116] https://doi.org/10.1109/ICASSP.2013.6638116

A speech emotion recognition framework based on latent Dirichlet allocation: Algorithm and FPGA implementation. / Shah, Mohit; Miao, Lifeng; Chakrabarti, Chaitali; Spanias, Andreas.

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2013. p. 2553-2557 6638116.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Shah, M, Miao, L, Chakrabarti, C & Spanias, A 2013, A speech emotion recognition framework based on latent Dirichlet allocation: Algorithm and FPGA implementation. in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings., 6638116, pp. 2553-2557, 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, 5/26/13. https://doi.org/10.1109/ICASSP.2013.6638116
Shah M, Miao L, Chakrabarti C, Spanias A. A speech emotion recognition framework based on latent Dirichlet allocation: Algorithm and FPGA implementation. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2013. p. 2553-2557. 6638116 https://doi.org/10.1109/ICASSP.2013.6638116
Shah, Mohit ; Miao, Lifeng ; Chakrabarti, Chaitali ; Spanias, Andreas. / A speech emotion recognition framework based on latent Dirichlet allocation: Algorithm and FPGA implementation. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. 2013. pp. 2553-2557
@inproceedings{33fe42248b874da3a2f916399595f8f5,
title = "A speech emotion recognition framework based on latent Dirichlet allocation: Algorithm and FPGA implementation",
abstract = "In this paper, we present a speech-based emotion recognition framework based on a latent Dirichlet allocation model. This method assumes that incoming speech frames are conditionally independent and exchangeable. While this leads to a loss of temporal structure, it is able to capture significant statistical information between frames. In contrast, a hidden Markov model-based approach captures the temporal structure in speech. Using the German emotional speech database EMO-DB for evaluation, we achieve an average classification accuracy of 80.7{\%} compared to 73{\%} for hidden Markov models. This improvement is achieved at the cost of a slight increase in computational complexity. We map the proposed algorithm onto an FPGA platform and show that emotions in a speech utterance of duration 1.5s can be identified in 1.8ms, while utilizing 70{\%} of the resources. This further demonstrates the suitability of our approach for real-time applications on hand-held devices.",
keywords = "affective computing, emotion recognition, FPGA implementation, latent Dirichlet allocation",
author = "Mohit Shah and Lifeng Miao and Chaitali Chakrabarti and Andreas Spanias",
year = "2013",
month = "10",
day = "18",
doi = "10.1109/ICASSP.2013.6638116",
language = "English (US)",
isbn = "9781479903566",
pages = "2553--2557",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
}

TY - GEN

T1 - A speech emotion recognition framework based on latent Dirichlet allocation

T2 - Algorithm and FPGA implementation

AU - Shah, Mohit

AU - Miao, Lifeng

AU - Chakrabarti, Chaitali

AU - Spanias, Andreas

PY - 2013/10/18

Y1 - 2013/10/18

N2 - In this paper, we present a speech-based emotion recognition framework based on a latent Dirichlet allocation model. This method assumes that incoming speech frames are conditionally independent and exchangeable. While this leads to a loss of temporal structure, it is able to capture significant statistical information between frames. In contrast, a hidden Markov model-based approach captures the temporal structure in speech. Using the German emotional speech database EMO-DB for evaluation, we achieve an average classification accuracy of 80.7% compared to 73% for hidden Markov models. This improvement is achieved at the cost of a slight increase in computational complexity. We map the proposed algorithm onto an FPGA platform and show that emotions in a speech utterance of duration 1.5s can be identified in 1.8ms, while utilizing 70% of the resources. This further demonstrates the suitability of our approach for real-time applications on hand-held devices.

AB - In this paper, we present a speech-based emotion recognition framework based on a latent Dirichlet allocation model. This method assumes that incoming speech frames are conditionally independent and exchangeable. While this leads to a loss of temporal structure, it is able to capture significant statistical information between frames. In contrast, a hidden Markov model-based approach captures the temporal structure in speech. Using the German emotional speech database EMO-DB for evaluation, we achieve an average classification accuracy of 80.7% compared to 73% for hidden Markov models. This improvement is achieved at the cost of a slight increase in computational complexity. We map the proposed algorithm onto an FPGA platform and show that emotions in a speech utterance of duration 1.5s can be identified in 1.8ms, while utilizing 70% of the resources. This further demonstrates the suitability of our approach for real-time applications on hand-held devices.

KW - affective computing

KW - emotion recognition

KW - FPGA implementation

KW - latent Dirichlet allocation

UR - http://www.scopus.com/inward/record.url?scp=84890452441&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84890452441&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2013.6638116

DO - 10.1109/ICASSP.2013.6638116

M3 - Conference contribution

AN - SCOPUS:84890452441

SN - 9781479903566

SP - 2553

EP - 2557

BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

ER -