Abstract

A multi-modal framework for emotion recognition using bag-of-words features and undirected, replicated softmax topic models is proposed here. Topic models ignore the temporal information between features, allowing them to capture the complex structure without a brute-force collection of statistics. Experiments are performed on face, speech, and language features extracted from the USC IEMOCAP database. Performance on facial features yields an unweighted average recall of 60.71%, a relative improvement of 8.89% over state-of-the-art approaches. Comparable performance is achieved when considering only speech (57.39%) or a fusion of speech and face information (66.05%). Individually, each source is shown to be strong at recognizing sadness (speech), happiness (face), or neutral (language) emotions, while a multi-modal fusion retains these properties and improves the accuracy to 68.92%. Implementation time for each source and their combination is provided. Results show that a turn of 1-second duration can be classified in approximately 666.65 ms, making this method highly amenable to real-time implementation.
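
The core component described above is a replicated softmax model: an undirected topic model (a restricted Boltzmann machine over bag-of-words count vectors) whose hidden units serve as compact, order-free topic features for an emotion classifier. The sketch below is only a minimal illustration of that general technique, assuming NumPy and one-step contrastive divergence; the class name, hyperparameters, and toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ReplicatedSoftmaxRBM:
    """Undirected topic model over bag-of-words count vectors
    (replicated softmax RBM), trained with CD-1. Illustrative only."""

    def __init__(self, n_vocab, n_hidden, lr=0.01):
        self.W = 0.01 * rng.standard_normal((n_vocab, n_hidden))
        self.b_vis = np.zeros(n_vocab)    # visible (word) biases
        self.b_hid = np.zeros(n_hidden)   # hidden (topic) biases
        self.lr = lr

    def hidden_probs(self, counts):
        # Document length D scales the hidden bias (the "replication").
        D = counts.sum(axis=1, keepdims=True)
        return sigmoid(counts @ self.W + D * self.b_hid)

    def _sample_visible(self, h, D):
        # Softmax over the vocabulary, then draw D words per document.
        logits = h @ self.W.T + self.b_vis
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        return np.stack([rng.multinomial(int(d), pi)
                         for d, pi in zip(D.ravel(), p)]).astype(float)

    def cd1_update(self, counts):
        # One step of contrastive divergence on a batch of count vectors.
        D = counts.sum(axis=1, keepdims=True)
        h0 = self.hidden_probs(counts)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self._sample_visible(h0_sample, D)
        h1 = self.hidden_probs(v1)
        n = counts.shape[0]
        self.W += self.lr * (counts.T @ h0 - v1.T @ h1) / n
        self.b_vis += self.lr * (counts - v1).mean(axis=0)
        self.b_hid += self.lr * (h0 - h1).mean(axis=0)

# Toy usage: 100 "turns", a 500-word vocabulary, 20 latent topics.
counts = rng.poisson(0.2, size=(100, 500)).astype(float)
rbm = ReplicatedSoftmaxRBM(n_vocab=500, n_hidden=20)
for epoch in range(10):
    rbm.cd1_update(counts)
features = rbm.hidden_probs(counts)  # topic posteriors, fed to a classifier
```

The hidden-unit activations (`features` above) play the role of the order-free representation the abstract describes; in the paper's setting such features would be extracted per modality (face, speech, language) and fused before emotion classification.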

Original language: English (US)
Title of host publication: Proceedings - IEEE International Symposium on Circuits and Systems
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 754-757
Number of pages: 4
ISBN (Print): 9781479934324
DOIs: https://doi.org/10.1109/ISCAS.2014.6865245
State: Published - 2014
Event: 2014 IEEE International Symposium on Circuits and Systems, ISCAS 2014 - Melbourne, VIC, Australia
Duration: Jun 1, 2014 - Jun 5, 2014

Other

Other: 2014 IEEE International Symposium on Circuits and Systems, ISCAS 2014
Country: Australia
City: Melbourne, VIC
Period: 6/1/14 - 6/5/14

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Cite this

Shah, M., Chakrabarti, C., & Spanias, A. (2014). A multi-modal approach to emotion recognition using undirected topic models. In Proceedings - IEEE International Symposium on Circuits and Systems (pp. 754-757). [6865245] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISCAS.2014.6865245

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
