Human Frequency Following Responses to Vocoded Speech

Saradha Ananthakrishnan, Xin Luo, Ananthanarayan Krishnan

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

OBJECTIVES: Vocoders offer an effective platform to simulate the effects of cochlear implant speech processing strategies in normal-hearing listeners. Several behavioral studies have examined the effects of varying spectral and temporal cues on vocoded speech perception; however, little is known about the neural indices of vocoded speech perception. Here, the scalp-recorded frequency following response (FFR) was used to study the effects of varying spectral and temporal cues on the brainstem neural representation of two acoustic cues: the temporal envelope periodicity related to fundamental frequency (F0) and the temporal fine structure (TFS) related to formant and formant-related frequencies, as reflected in the phase-locked neural activity in response to vocoded speech.

DESIGN: In experiment 1, FFRs were measured in 12 normal-hearing adult listeners in response to a steady-state English back vowel /u/ presented in an unprocessed condition and six sine-vocoder conditions with varying numbers of channels (1, 2, 4, 8, 16, and 32), while the temporal envelope cutoff frequency was fixed at 500 Hz. In experiment 2, FFRs were obtained from 14 normal-hearing adult listeners in response to the same English vowel /u/, presented in an unprocessed condition and four vocoded conditions in which the temporal envelope cutoff frequency (50 versus 500 Hz) and the carrier type (sine wave versus noise band) were varied separately, with the number of channels fixed at 8. A fast Fourier transform was applied to the FFR time waveforms to analyze the strength of the brainstem neural representation of temporal envelope periodicity (F0) and TFS-related peaks (formant structure).
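The spectral measure described in the design can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: it uses a direct single-bin discrete Fourier transform in place of a full FFT, and the sampling rate, the 120-Hz F0, and the synthetic waveform are hypothetical stand-ins, since the abstract does not report these values.

```python
import math

def spectral_amplitude(signal, fs, freq):
    """Amplitude of the component at `freq` (Hz), from a single-bin DFT."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

# Hypothetical values, chosen only for illustration (not from the study).
fs = 8000        # sampling rate in Hz
n = 2000         # 0.25 s of samples, an integer number of cycles at each test frequency
f0 = 120.0       # assumed fundamental frequency in Hz

# Synthetic stand-in for an averaged FFR: a component at F0 plus a weaker third harmonic.
waveform = [
    1.0 * math.sin(2 * math.pi * f0 * i / fs)
    + 0.3 * math.sin(2 * math.pi * 3 * f0 * i / fs)
    for i in range(n)
]

amp_f0 = spectral_amplitude(waveform, fs, f0)        # strength at F0
amp_h3 = spectral_amplitude(waveform, fs, 3 * f0)    # strength at the third harmonic
amp_off = spectral_amplitude(waveform, fs, 200.0)    # a frequency with no component
```

In a real recording, the amplitude at F0 would be compared against the noise floor in neighboring frequencies; that step is omitted here.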

RESULTS: Brainstem neural representation of both temporal envelope and TFS cues improved when the number of channels increased from 1 to 4, followed by a plateau with 8 and 16 channels, and a reduction in phase-locking strength with 32 channels. For the sine vocoders, peaks in the FFR_TFS spectra corresponded with the low-frequency sine-wave carriers and side band frequencies in the stimulus spectra. When the temporal envelope cutoff frequency increased from 50 to 500 Hz, an improvement was observed in brainstem F0 representation with no change in brainstem representation of spectral peaks proximal to the first formant frequency (F1). There was no significant effect of carrier type (sine- versus noise-vocoder) on brainstem neural representation of F0 cues when the temporal envelope cutoff frequency was 500 Hz.

CONCLUSIONS: While the improvement in neural representation of temporal envelope and TFS cues with up to 4 vocoder channels is consistent with the behavioral literature, the reduced neural phase-locking strength noted with 32 channels may result from the narrower bandwidth of each channel as the number of channels increases. Stronger neural representation of temporal envelope cues with the higher temporal envelope cutoff frequency likely reflects brainstem neural phase-locking to F0-related periodicity fluctuations that are preserved in the 500-Hz temporal envelopes but unavailable in the 50-Hz temporal envelopes. No effect of temporal envelope cutoff frequency was seen for neural representation of TFS cues, suggesting that spectral side band frequencies created by the 500-Hz temporal envelopes did not improve neural representation of F1 cues over the 50-Hz temporal envelopes. Finally, brainstem F0 representation was not significantly affected by carrier type with a temporal envelope cutoff frequency of 500 Hz, which is inconsistent with previous results of behavioral studies examining pitch perception of vocoded stimuli.
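The side band frequencies mentioned above follow from amplitude modulation: a sine carrier at frequency fc multiplied by an envelope that fluctuates at F0 has spectral components at fc - F0 and fc + F0. The sketch below illustrates this under stated assumptions and is not the study's vocoder. The 120-Hz F0, 600-Hz carrier, and modulation depth are hypothetical, and the 50-Hz versus 500-Hz cutoffs are idealized as simply removing or keeping the F0 fluctuation in the envelope, which presumes an F0 between the two cutoffs.

```python
import math

def amplitude_at(signal, fs, freq):
    """Amplitude of the component at `freq` (Hz), from a single-bin DFT."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

# Hypothetical values, chosen only for illustration (not from the study).
fs = 8000      # sampling rate in Hz
n = 2000       # 0.25 s of samples
f0 = 120.0     # assumed fundamental frequency in Hz
fc = 600.0     # assumed sine-carrier frequency of one vocoder channel
depth = 0.8    # assumed modulation depth of the F0 fluctuation

def vocoded_channel(keep_f0_modulation):
    """One sine-carrier channel; the envelope keeps or loses its F0 fluctuation."""
    out = []
    for i in range(n):
        t = i / fs
        fluctuation = depth * math.cos(2 * math.pi * f0 * t) if keep_f0_modulation else 0.0
        out.append((1.0 + fluctuation) * math.sin(2 * math.pi * fc * t))
    return out

high_cutoff = vocoded_channel(True)    # idealized 500-Hz envelope: F0 fluctuation kept
low_cutoff = vocoded_channel(False)    # idealized 50-Hz envelope: F0 fluctuation removed

lower_high = amplitude_at(high_cutoff, fs, fc - f0)   # side band at 480 Hz
upper_high = amplitude_at(high_cutoff, fs, fc + f0)   # side band at 720 Hz
lower_low = amplitude_at(low_cutoff, fs, fc - f0)     # no side band expected
carrier_low = amplitude_at(low_cutoff, fs, fc)        # carrier alone
```

With the fluctuation kept, each side band has half the modulation depth as its amplitude; with it removed, only the carrier remains.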

Original language: English (US)
Pages (from-to): e256-e267
Journal: Ear and Hearing
Volume: 38
Issue number: 5
State: Published - Sep 1 2017

ASJC Scopus subject areas

  • Otorhinolaryngology
  • Speech and Hearing
