A multi-modal framework for emotion recognition using bag-of-words features and undirected, replicated softmax topic models is proposed here. Topic models ignore the temporal information between features, allowing them to capture the complex structure without a brute-force collection of statistics. Experiments are performed over face, speech and language features extracted from the USC IEMOCAP database. Performance on facial features yields an unweighted average recall of 60.71%, a relative improvement of 8.89% over state-of-the-art approaches. A comparable performance is achieved when considering only speech (57.39%) or a fusion of speech and face information (66.05%). Individually, each source is shown to be strong at recognizing either sadness (speech) or happiness (face) or neutral (language) emotions, while, a multi-modal fusion retains these properties and improves the accuracy to 68.92%. Implementation time for each source and their combination is provided. Results show that a turn of 1 second duration can be classified in approximately 666.65ms, thus making this method highly amenable for real-time implementation.

Original languageEnglish (US)
Title of host publication2014 IEEE International Symposium on Circuits and Systems, ISCAS 2014
PublisherInstitute of Electrical and Electronics Engineers Inc.
Number of pages4
ISBN (Print)9781479934324
StatePublished - Jan 1 2014
Event2014 IEEE International Symposium on Circuits and Systems, ISCAS 2014 - Melbourne, VIC, Australia
Duration: Jun 1 2014Jun 5 2014

Publication series

NameProceedings - IEEE International Symposium on Circuits and Systems
ISSN (Print)0271-4310


Other2014 IEEE International Symposium on Circuits and Systems, ISCAS 2014
CityMelbourne, VIC

ASJC Scopus subject areas

  • Electrical and Electronic Engineering


Dive into the research topics of 'A multi-modal approach to emotion recognition using undirected topic models'. Together they form a unique fingerprint.

Cite this