Vocal emotion recognition with cochlear implants

Xin Luo, Qian Jie Fu, John J. Galvin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

Besides conveying linguistic information, spoken language can also transmit important cues regarding the emotion of a talker. These prosodic cues are most strongly coded by changes in amplitude, pitch, speech rate, voice quality, and articulation. The present study investigated the ability of cochlear implant (CI) users to recognize vocal emotions, as well as the relative contributions of spectral and temporal cues to vocal emotion recognition. An English sentence database was recorded for the experiment; each test sentence was produced according to five target emotions. Vocal emotion recognition was tested in 6 CI and 6 normal-hearing (NH) subjects. With unprocessed speech, NH listeners' mean vocal emotion recognition performance was 90% correct, while CI users' mean performance was only 45% correct. Vocal emotion recognition was also measured in NH subjects listening to acoustic, sine-wave vocoder CI simulations. To test the contribution of spectral cues to vocal emotion recognition, 1-, 2-, 4-, 8-, and 16-channel CI processors were simulated; to test the contribution of temporal cues, the temporal envelope filter cutoff frequency in each channel was either 50 or 500 Hz. Results showed that both spectral and temporal cues significantly contributed to performance. With the 50-Hz envelope filter, performance generally improved as the number of spectral channels was increased. With the 500-Hz envelope filter, performance significantly improved only when the spectral resolution was increased from 1 to 2 channels, and then from 2 to 16 channels. For all but the 16-channel simulations, increasing the envelope filter cutoff frequency from 50 Hz to 500 Hz significantly improved performance. CI users' vocal emotion recognition performance was statistically similar to that of NH subjects listening to 1-8 spectral channels with the 50-Hz envelope filter, and to 1 channel with the 500-Hz envelope filter.
The results suggest that, while spectral cues may contribute more strongly to recognition of linguistic information, temporal cues may contribute more strongly to recognition of emotional content coded in spoken language.
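The sine-wave vocoder simulation described above can be sketched in code. The following is a minimal illustration, not the authors' actual processing chain: the signal is split into logarithmically spaced analysis bands, each band's temporal envelope is extracted by half-wave rectification and low-pass filtering at the chosen cutoff (50 or 500 Hz in the study), and the envelope modulates a sine tone at the band's center frequency. The filter orders, frequency range (200-7000 Hz), and function name `sine_vocoder` are assumptions chosen for the sketch.

```python
import numpy as np
from scipy.signal import butter, sosfilt, sosfiltfilt

def sine_vocoder(x, fs, n_channels=4, env_cutoff=50.0,
                 fmin=200.0, fmax=7000.0):
    """Sine-wave vocoder CI simulation (illustrative sketch).

    Parameters here (band edges, filter orders) are assumptions,
    not the processing described in the paper.
    """
    # Logarithmically spaced band edges across the analysis range.
    edges = np.geomspace(fmin, fmax, n_channels + 1)
    t = np.arange(len(x)) / fs
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Band-pass the input into one analysis channel.
        sos_bp = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos_bp, x)
        # Envelope: half-wave rectification + low-pass at env_cutoff
        # (50 Hz keeps only slow amplitude cues; 500 Hz passes
        # periodicity/pitch cues as well).
        sos_lp = butter(2, env_cutoff, btype="lowpass", fs=fs, output="sos")
        env = np.maximum(sosfiltfilt(sos_lp, np.maximum(band, 0.0)), 0.0)
        # Modulate a sine carrier at the band's geometric center frequency.
        fc = np.sqrt(lo * hi)
        out += env * np.sin(2 * np.pi * fc * t)
    return out
```

Lowering `env_cutoff` to 50 Hz discards periodicity cues in each channel, while reducing `n_channels` coarsens spectral resolution — the two manipulations the study used to separate temporal and spectral contributions.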

Original language: English (US)
Title of host publication: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Publisher: International Speech Communication Association
Pages: 1830-1833
Number of pages: 4
ISBN (Print): 9781604234497
State: Published - Jan 1 2006
Externally published: Yes
Event: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP - Pittsburgh, PA, United States
Duration: Sep 17 2006 - Sep 21 2006

Publication series

Name: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Volume: 4

Other

Other: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Country/Territory: United States
City: Pittsburgh, PA
Period: 9/17/06 - 9/21/06

Keywords

  • Cochlear implant
  • Vocal emotion recognition

ASJC Scopus subject areas

  • General Computer Science
