Residual + capsule networks (RESCAP) for simultaneous single-channel overlapped keyword recognition

Yan Xiong; Visar Berisha; Chaitali Chakrabarti

doi:10.21437/Interspeech.2019-2913

Research output: Contribution to journal › Conference article › peer-review

6 Scopus citations

Abstract

Overlapped speech poses a significant problem in a variety of speech processing applications, including speaker identification, speaker diarization, and speech recognition. To address it, existing systems combine source separation with algorithms for processing non-overlapped speech (e.g., source separation + follow-on speech recognition). In this paper we propose a modified network architecture to simultaneously recognize keywords from overlapped speech without explicitly having to perform source separation. We build our network by adding capsule layers to a ResNet architecture that has shown state-of-the-art performance on a traditional keyword recognition task. We evaluate the model on a series of 10-word overlapped keyword recognition experiments, using speaker-dependent and speaker-independent training. Results indicate that the Residual + Capsule (ResCap) network shows marked improvement in recognizing overlapped speech, especially in experiments where there is a mismatch in the number of overlapped speakers between the training set and the test set.

Original language	English (US)
Pages (from-to)	3337-3341
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2019-September
DOIs	https://doi.org/10.21437/Interspeech.2019-2913
State	Published - 2019
Event	20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration	Sep 15 2019 → Sep 19 2019

Keywords

Capsule networks
Keyword spotting
Overlapped speech
Recognition
ResNet
Residual networks
Speech recognition

ASJC Scopus subject areas

Language and Linguistics
Human-Computer Interaction
Signal Processing
Software
Modeling and Simulation

Cite this

Residual + capsule networks (RESCAP) for simultaneous single-channel overlapped keyword recognition. / Xiong, Yan; Berisha, Visar; Chakrabarti, Chaitali.
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2019-September, 2019, p. 3337-3341.

@article{fc77799c65a44a308adb83005d8dda69,

title = "Residual + capsule networks (RESCAP) for simultaneous single-channel overlapped keyword recognition",

abstract = "Overlapped speech poses a significant problem in a variety of applications in speech processing including speaker identification, speaker diarization, and speech recognition among others. To address it, existing systems combine source separation with algorithms for processing non-overlapped speech (e.g. source separation + follow-on speech recognition). In this paper we propose a modified network architecture to simultaneously recognize keywords from overlapped speech without explicitly having to perform source separation. We build our network by adding capsule layers to a ResNet architecture that has shown state-of-the-art performance on a traditional keyword recognition task. We evaluate the model on a series of 10-word overlapped keyword recognition experiments, using speaker dependent and speaker independent training. Results indicate that Residual + Capsule (ResCap) network shows marked improvement in recognizing overlapped speech, especially in experiments where there is a mismatch in the number of overlapped speakers between the training set and the test set.",

keywords = "Capsule networks, Keyword spotting, Overlapped speech, Recognition, ResNet, Residual networks, Speech recognition",

author = "Yan Xiong and Visar Berisha and Chaitali Chakrabarti",

note = "Funding Information: This research was supported by the National Institutes of Health Grant R01DC006859. Publisher Copyright: Copyright {\textcopyright} 2019 ISCA; 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019; Conference date: 15-09-2019 Through 19-09-2019",

year = "2019",

doi = "10.21437/Interspeech.2019-2913",

language = "English (US)",

volume = "2019-September",

pages = "3337--3341",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - Residual + capsule networks (RESCAP) for simultaneous single-channel overlapped keyword recognition

AU - Xiong, Yan

AU - Berisha, Visar

AU - Chakrabarti, Chaitali

PY - 2019

Y1 - 2019

N2 - Overlapped speech poses a significant problem in a variety of applications in speech processing including speaker identification, speaker diarization, and speech recognition among others. To address it, existing systems combine source separation with algorithms for processing non-overlapped speech (e.g. source separation + follow-on speech recognition). In this paper we propose a modified network architecture to simultaneously recognize keywords from overlapped speech without explicitly having to perform source separation. We build our network by adding capsule layers to a ResNet architecture that has shown state-of-the-art performance on a traditional keyword recognition task. We evaluate the model on a series of 10-word overlapped keyword recognition experiments, using speaker dependent and speaker independent training. Results indicate that Residual + Capsule (ResCap) network shows marked improvement in recognizing overlapped speech, especially in experiments where there is a mismatch in the number of overlapped speakers between the training set and the test set.

AB - Overlapped speech poses a significant problem in a variety of applications in speech processing including speaker identification, speaker diarization, and speech recognition among others. To address it, existing systems combine source separation with algorithms for processing non-overlapped speech (e.g. source separation + follow-on speech recognition). In this paper we propose a modified network architecture to simultaneously recognize keywords from overlapped speech without explicitly having to perform source separation. We build our network by adding capsule layers to a ResNet architecture that has shown state-of-the-art performance on a traditional keyword recognition task. We evaluate the model on a series of 10-word overlapped keyword recognition experiments, using speaker dependent and speaker independent training. Results indicate that Residual + Capsule (ResCap) network shows marked improvement in recognizing overlapped speech, especially in experiments where there is a mismatch in the number of overlapped speakers between the training set and the test set.

KW - Capsule networks

KW - Keyword spotting

KW - Overlapped speech

KW - Recognition

KW - ResNet

KW - Residual networks

KW - Speech recognition

UR - http://www.scopus.com/inward/record.url?scp=85074733289&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85074733289&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2019-2913

DO - 10.21437/Interspeech.2019-2913

M3 - Conference article

AN - SCOPUS:85074733289

SN - 2308-457X

VL - 2019-September

SP - 3337

EP - 3341

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019

Y2 - 15 September 2019 through 19 September 2019

ER -
