Vocal emotion recognition with cochlear implants

Xin Luo, Qian Jie Fu, John J. Galvin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

Besides conveying linguistic information, spoken language can also transmit important cues regarding the emotion of a talker. These prosodic cues are most strongly coded by changes in amplitude, pitch, speech rate, voice quality, and articulation. The present study investigated the ability of cochlear implant (CI) users to recognize vocal emotions, as well as the relative contributions of spectral and temporal cues to vocal emotion recognition. An English sentence database was recorded for the experiment; each test sentence was produced according to five target emotions. Vocal emotion recognition was tested in 6 CI and 6 normal-hearing (NH) subjects. With unprocessed speech, NH listeners' mean vocal emotion recognition performance was 90% correct, while CI users' mean performance was only 45% correct. Vocal emotion recognition was also measured in NH subjects listening to acoustic, sine-wave vocoder CI simulations. To test the contribution of spectral cues to vocal emotion recognition, 1-, 2-, 4-, 8-, and 16-channel CI processors were simulated; to test the contribution of temporal cues, the temporal envelope filter cutoff frequency in each channel was either 50 or 500 Hz. Results showed that both spectral and temporal cues significantly contributed to performance. With the 50-Hz envelope filter, performance generally improved as the number of spectral channels was increased. With the 500-Hz envelope filter, performance significantly improved only when the spectral resolution was increased from 1 to 2 channels, and then from 2 to 16 channels. For all but the 16-channel simulations, increasing the envelope filter cutoff frequency from 50 Hz to 500 Hz significantly improved performance. CI users' vocal emotion recognition performance was statistically similar to that of NH subjects listening to 1-8 spectral channels with the 50-Hz envelope filter, and to 1 channel with the 500-Hz envelope filter.
The results suggest that, while spectral cues may contribute more strongly to recognition of linguistic information, temporal cues may contribute more strongly to recognition of emotional content coded in spoken language.
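The sine-wave vocoder simulation described above can be sketched in code. The following is a minimal illustration, not the authors' actual processing chain: the signal is split into logarithmically spaced analysis bands, each band's temporal envelope is extracted by half-wave rectification and low-pass filtering at the chosen cutoff (50 or 500 Hz in the study), and the envelope modulates a sine tone at the band's center frequency. The filter orders, frequency range (200-7000 Hz), and function name `sine_vocoder` are assumptions chosen for the sketch.

```python
import numpy as np
from scipy.signal import butter, sosfilt, sosfiltfilt

def sine_vocoder(x, fs, n_channels=4, env_cutoff=50.0,
                 fmin=200.0, fmax=7000.0):
    """Sine-wave vocoder CI simulation (illustrative sketch).

    Parameters here (band edges, filter orders) are assumptions,
    not the processing described in the paper.
    """
    # Logarithmically spaced band edges across the analysis range.
    edges = np.geomspace(fmin, fmax, n_channels + 1)
    t = np.arange(len(x)) / fs
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Band-pass the input into one analysis channel.
        sos_bp = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos_bp, x)
        # Envelope: half-wave rectification + low-pass at env_cutoff
        # (50 Hz keeps only slow amplitude cues; 500 Hz passes
        # periodicity/pitch cues as well).
        sos_lp = butter(2, env_cutoff, btype="lowpass", fs=fs, output="sos")
        env = np.maximum(sosfiltfilt(sos_lp, np.maximum(band, 0.0)), 0.0)
        # Modulate a sine carrier at the band's geometric center frequency.
        fc = np.sqrt(lo * hi)
        out += env * np.sin(2 * np.pi * fc * t)
    return out
```

Lowering `env_cutoff` to 50 Hz discards periodicity cues in each channel, while reducing `n_channels` coarsens spectral resolution — the two manipulations the study used to separate temporal and spectral contributions.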

Original language: English (US)
Title of host publication: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Publisher: International Speech Communication Association
Pages: 1830-1833
Number of pages: 4
ISBN (Print): 9781604234497
State: Published - Jan 1 2006
Externally published: Yes
Event: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP - Pittsburgh, PA, United States
Duration: Sep 17 2006 - Sep 21 2006

Publication series

Name: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Volume: 4

Other

Other: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Country/Territory: United States
City: Pittsburgh, PA
Period: 9/17/06 - 9/21/06

Keywords

  • Cochlear implant
  • Vocal emotion recognition

ASJC Scopus subject areas

  • General Computer Science
