TY - GEN
T1 - A speech emotion recognition framework based on latent Dirichlet allocation
T2 - 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
AU - Shah, Mohit
AU - Miao, Lifeng
AU - Chakrabarti, Chaitali
AU - Spanias, Andreas
PY - 2013/10/18
Y1 - 2013/10/18
AB - In this paper, we present a speech-based emotion recognition framework built on a latent Dirichlet allocation (LDA) model. The method assumes that incoming speech frames are conditionally independent and exchangeable. While this discards temporal structure, it captures significant statistical information across frames; in contrast, a hidden Markov model-based approach captures the temporal structure in speech. Evaluated on the German emotional speech database EMO-DB, the framework achieves an average classification accuracy of 80.7%, compared to 73% for hidden Markov models. This improvement comes at the cost of a slight increase in computational complexity. We map the proposed algorithm onto an FPGA platform and show that the emotions in a speech utterance of duration 1.5 s can be identified in 1.8 ms while utilizing 70% of the FPGA resources. This further demonstrates the suitability of our approach for real-time applications on hand-held devices.
KW - FPGA implementation
KW - affective computing
KW - emotion recognition
KW - latent Dirichlet allocation
UR - http://www.scopus.com/inward/record.url?scp=84890452441&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84890452441&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2013.6638116
DO - 10.1109/ICASSP.2013.6638116
M3 - Conference contribution
AN - SCOPUS:84890452441
SN - 9781479903566
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 2553
EP - 2557
BT - 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Y2 - 26 May 2013 through 31 May 2013
ER -