Context-dependent modeling in alphabet recognition

Philipos C. Loizou, Andreas Spanias

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Alphabet recognition is known to be a difficult task due to the acoustic similarities among different letters, especially letters in the E-set. Recognition systems based on whole-word Hidden-Markov Models (HMM) perform poorly on this task due to the inability of the models to capture fine phonetic details, especially details occurring within segments of short duration. Letters B and D, for example, differ mainly in the 10-20 msec segment prior to vowel onset. In this paper, we use context-dependent phoneme-based HMMs to capture the fine phonetic detail that is required to discriminate such a confusable vocabulary. Our results reveal that context-dependent modeling gives about 9% improvement on speaker-independent performance over whole-word modeling, and an 18% improvement on the E-set. Furthermore, using an improved spectral representation of the stop consonants in the E-set, an additional 6% improvement in the E-set can be achieved. Our best speaker-independent E-set performance over 15 speakers is 90.3%, with overall alphabet recognition of 94.1%.
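To make the contrast between whole-word and context-dependent phoneme units concrete, the following is a minimal sketch of how letter names might be expanded into context-dependent unit labels. It is an illustration only, not the authors' system: the phone symbols in `LEXICON`, the `sil` boundary symbol, the `left-phone+right` label format, and the function name `context_dependent_units` are all assumptions introduced here, and the abstract does not state which context width the paper used.

```python
# Hypothetical phone transcriptions for a few E-set letters.
# The symbols are illustrative assumptions, not taken from the paper.
LEXICON = {
    "B": ["b", "iy"],
    "D": ["d", "iy"],
    "E": ["iy"],
    "P": ["p", "iy"],
    "T": ["t", "iy"],
}


def context_dependent_units(phones, boundary="sil"):
    """Label each phone with its left and right neighbours.

    The "left-phone+right" label format and the "sil" boundary
    symbol are assumptions made for this sketch.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# One list of context-dependent unit labels per letter.
units = {letter: context_dependent_units(p) for letter, p in LEXICON.items()}

for letter, labels in units.items():
    print(letter, labels)
```

In this sketch the short stop consonant of B and D gets its own unit, and the following vowel unit also differs because its left context differs. A whole-word model would instead represent each letter with a single model.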

Original language: English (US)
Title of host publication: Proceedings - IEEE International Symposium on Circuits and Systems
Publisher: IEEE
Pages: 189-192
Number of pages: 4
Volume: 2
State: Published - 1994
Event: Proceedings of the 1994 IEEE International Symposium on Circuits and Systems. Part 3 (of 6) - London, England
Duration: May 30 1994 to Jun 2 1994

Other

Other: Proceedings of the 1994 IEEE International Symposium on Circuits and Systems. Part 3 (of 6)
City: London, England
Period: 5/30/94 to 6/2/94

Fingerprint

Speech analysis
Hidden Markov models
Acoustics

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Electronic, Optical and Magnetic Materials

Cite this

Loizou, P. C., & Spanias, A. (1994). Context-dependent modeling in alphabet recognition. In Proceedings - IEEE International Symposium on Circuits and Systems (Vol. 2, pp. 189-192). IEEE.

Context-dependent modeling in alphabet recognition. / Loizou, Philipos C.; Spanias, Andreas.

Proceedings - IEEE International Symposium on Circuits and Systems. Vol. 2 IEEE, 1994. p. 189-192.

Loizou, PC & Spanias, A 1994, Context-dependent modeling in alphabet recognition. in Proceedings - IEEE International Symposium on Circuits and Systems. vol. 2, IEEE, pp. 189-192, Proceedings of the 1994 IEEE International Symposium on Circuits and Systems. Part 3 (of 6), London, England, 5/30/94.
Loizou PC, Spanias A. Context-dependent modeling in alphabet recognition. In Proceedings - IEEE International Symposium on Circuits and Systems. Vol. 2. IEEE. 1994. p. 189-192
Loizou, Philipos C. ; Spanias, Andreas. / Context-dependent modeling in alphabet recognition. Proceedings - IEEE International Symposium on Circuits and Systems. Vol. 2 IEEE, 1994. pp. 189-192
@inproceedings{e7366e19fd1844c7b3c17baf0623f92a,
title = "Context-dependent modeling in alphabet recognition",
abstract = "Alphabet recognition is known to be a difficult task due to the acoustic similarities among different letters, especially letters in the E-set. Recognition systems based on whole-word Hidden-Markov Models (HMM) perform poorly on this task due to the inability of the models to capture fine phonetic details, especially details occurring within segments of short duration. Letters B and D, for example, differ mainly in the 10-20 msec segment prior to vowel onset. In this paper, we use context-dependent phoneme-based HMMs to capture the fine phonetic detail that is required to discriminate such a confusable vocabulary. Our results reveal that context-dependent modeling gives about 9{\%} improvement on speaker-independent performance over whole-word modeling, and an 18{\%} improvement on the E-set. Furthermore, using an improved spectral representation of the stop consonants in the E-set, an additional 6{\%} improvement in the E-set can be achieved. Our best speaker-independent E-set performance over 15 speakers is 90.3{\%}, with overall alphabet recognition of 94.1{\%}.",
author = "Loizou, {Philipos C.} and Andreas Spanias",
year = "1994",
language = "English (US)",
volume = "2",
pages = "189--192",
booktitle = "Proceedings - IEEE International Symposium on Circuits and Systems",
publisher = "IEEE",

}

TY - GEN

T1 - Context-dependent modeling in alphabet recognition

AU - Loizou, Philipos C.

AU - Spanias, Andreas

PY - 1994

Y1 - 1994

N2 - Alphabet recognition is known to be a difficult task due to the acoustic similarities among different letters, especially letters in the E-set. Recognition systems based on whole-word Hidden-Markov Models (HMM) perform poorly on this task due to the inability of the models to capture fine phonetic details, especially details occurring within segments of short duration. Letters B and D, for example, differ mainly in the 10-20 msec segment prior to vowel onset. In this paper, we use context-dependent phoneme-based HMMs to capture the fine phonetic detail that is required to discriminate such a confusable vocabulary. Our results reveal that context-dependent modeling gives about 9% improvement on speaker-independent performance over whole-word modeling, and an 18% improvement on the E-set. Furthermore, using an improved spectral representation of the stop consonants in the E-set, an additional 6% improvement in the E-set can be achieved. Our best speaker-independent E-set performance over 15 speakers is 90.3%, with overall alphabet recognition of 94.1%.

UR - http://www.scopus.com/inward/record.url?scp=0028573857&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0028573857&partnerID=8YFLogxK

M3 - Conference contribution

VL - 2

SP - 189

EP - 192

BT - Proceedings - IEEE International Symposium on Circuits and Systems

PB - IEEE

ER -