Abstract
Alphabet recognition is a difficult task because many letters are acoustically similar, especially those in the E-set. Recognition systems based on whole-word hidden Markov models (HMMs) perform poorly on this task because the models cannot capture fine phonetic details, especially details occurring within segments of short duration. The letters B and D, for example, differ mainly in the 10-20 ms segment prior to vowel onset. In this paper, we use context-dependent phoneme-based HMMs to capture the fine phonetic detail required to discriminate such a confusable vocabulary. Our results show that context-dependent modeling gives about a 9% improvement in speaker-independent performance over whole-word modeling, and an 18% improvement on the E-set. Furthermore, an improved spectral representation of the stop consonants in the E-set yields an additional 6% improvement on the E-set. Our best speaker-independent E-set performance over 15 speakers is 90.3%, with overall alphabet recognition of 94.1%.
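The contrast between whole-word and context-dependent phoneme units can be illustrated with a minimal sketch. The abstract does not give the paper's phoneme inventory or unit definitions, so everything here is an assumption made for illustration: the pronunciations in `LETTER_PHONES`, the `sil` boundary symbol, the `left-phone+right` label format, and the function names `to_context_dependent` and `letter_units`.

```python
# Hypothetical pronunciations for three E-set letters (illustrative only,
# not taken from the paper).
LETTER_PHONES = {
    "B": ["b", "iy"],
    "D": ["d", "iy"],
    "E": ["iy"],
}


def to_context_dependent(phones, boundary="sil"):
    """Label each phoneme with its left and right neighbors (triphone-style)."""
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def letter_units(letter):
    """Return the context-dependent unit labels for one letter."""
    return to_context_dependent(LETTER_PHONES[letter])


for letter in ("B", "D", "E"):
    print(letter, letter_units(letter))
```

Under these assumptions, a whole-word system would train one model per letter, whereas here the short stop segment that separates B from D gets its own unit (`sil-b+iy` versus `sil-d+iy`), and the following vowel is also modeled separately for each preceding consonant.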
Original language | English (US) |
---|---|
Title of host publication | Proceedings - IEEE International Symposium on Circuits and Systems |
Publisher | IEEE |
Pages | 189-192 |
Number of pages | 4 |
Volume | 2 |
State | Published - 1994 |
Event | Proceedings of the 1994 IEEE International Symposium on Circuits and Systems. Part 3 (of 6) - London, England |
Duration | May 30 1994 → Jun 2 1994 |
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Electronic, Optical and Magnetic Materials