High-performance alphabet recognition

P. C. Loizou, Andreas Spanias

Research output: Contribution to journal (Article)

52 Citations (Scopus)

Abstract

Alphabet recognition is needed in many applications for retrieving information associated with the spelling of a name, such as telephone numbers, addresses, etc. This is a difficult recognition task due to the acoustic similarities existing between letters in the alphabet (e.g., the E-set letters). This paper presents the development of a high-performance alphabet recognizer that has been evaluated on studio quality as well as on telephone-bandwidth speech. Unlike previously proposed systems, the alphabet recognizer presented here is based on context-dependent phoneme hidden Markov models (HMM's), which have been found to outperform whole-word models by as much as 8%. The proposed recognizer incorporates a series of new approaches to tackle the problems associated with the confusions occurring between the stop consonants in the E-set and the confusions between the nasals (i.e., letters M and N). First, a new feature representation is proposed for improved stop consonant discrimination, and second, two subspace approaches are proposed for improved nasal discrimination. The subspace approach was found to yield a 45% error-rate reduction in nasal discrimination. Various other techniques are also proposed, yielding a 97.3% speaker-independent performance on alphabet recognition and 95% speaker-independent performance on E-set recognition. A telephone alphabet recognizer was also developed using context-dependent HMM's. When tested on the recognition of 300 last names (which are contained in a list of 50 000 common last names) spelled by 300 speakers, the recognizer achieved 91.7% correct letter recognition with 1.1% letter insertions.
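The figures quoted in the abstract (a relative error-rate reduction, a percent-correct letter score, and a letter-insertion rate) can be illustrated with a short sketch. It assumes the conventional alignment-based definitions, where correct = (N - S - D) / N and insertion rate = I / N for N reference letters; the abstract does not state which definitions the authors used. The function names and all counts below are hypothetical, chosen only to make the arithmetic easy to follow, and are not taken from the paper.

```python
def percent_correct(n_ref: int, substitutions: int, deletions: int) -> float:
    """Percentage of reference letters recognized correctly.

    Insertions are not penalized here; they are reported separately.
    """
    return 100.0 * (n_ref - substitutions - deletions) / n_ref


def insertion_rate(n_ref: int, insertions: int) -> float:
    """Inserted letters as a percentage of the reference letter count."""
    return 100.0 * insertions / n_ref


def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Relative drop in error rate, in percent of the baseline error rate."""
    return 100.0 * (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical alignment counts (NOT the paper's data).
    n_ref = 1000          # reference letters spoken
    substitutions = 60    # letters recognized as a different letter
    deletions = 23        # letters missed entirely
    insertions = 11       # spurious letters added by the recognizer

    print(f"correct:    {percent_correct(n_ref, substitutions, deletions):.1f}%")
    print(f"insertions: {insertion_rate(n_ref, insertions):.1f}%")

    # Hypothetical nasal-discrimination error rates before and after a change.
    print(f"relative error reduction: {relative_error_reduction(20.0, 11.0):.1f}%")
```

Reporting insertions separately from the percent-correct score, as the abstract does, keeps the two failure modes visible; a single accuracy figure would subtract the insertions as well.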

Original language: English (US)
Pages (from-to): 430-445
Number of pages: 16
Journal: IEEE Transactions on Speech and Audio Processing
Volume: 4
Issue number: 6
DOIs: 10.1109/89.544528
State: Published - 1996
Externally published: Yes

Fingerprint

alphabets
Telephone
Hidden Markov models
telephones
discrimination
confusion
Studios
Acoustics
Bandwidth
phonemes
lists
insertion
bandwidth
acoustics

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

High-performance alphabet recognition. / Loizou, P. C.; Spanias, Andreas.

In: IEEE Transactions on Speech and Audio Processing, Vol. 4, No. 6, 1996, p. 430-445.

@article{d4bd9dab56274707890e9fca2a44f524,
title = "High-performance alphabet recognition",
abstract = "Alphabet recognition is needed in many applications for retrieving information associated with the spelling of a name, such as telephone numbers, addresses, etc. This is a difficult recognition task due to the acoustic similarities existing between letters in the alphabet (e.g., the E-set letters). This paper presents the development of a high-performance alphabet recognizer that has been evaluated on studio quality as well as on telephone-bandwidth speech. Unlike previously proposed systems, the alphabet recognizer presented here is based on context-dependent phoneme hidden Markov models (HMM's), which have been found to outperform whole-word models by as much as 8{\%}. The proposed recognizer incorporates a series of new approaches to tackle the problems associated with the confusions occurring between the stop consonants in the E-set and the confusions between the nasals (i.e., letters M and N). First, a new feature representation is proposed for improved stop consonant discrimination, and second, two subspace approaches are proposed for improved nasal discrimination. The subspace approach was found to yield a 45{\%} error-rate reduction in nasal discrimination. Various other techniques are also proposed, yielding a 97.3{\%} speaker-independent performance on alphabet recognition and 95{\%} speaker-independent performance on E-set recognition. A telephone alphabet recognizer was also developed using context-dependent HMM's. When tested on the recognition of 300 last names (which are contained in a list of 50 000 common last names) spelled by 300 speakers, the recognizer achieved 91.7{\%} correct letter recognition with 1.1{\%} letter insertions.",
author = "Loizou, P. C. and Spanias, Andreas",
year = "1996",
doi = "10.1109/89.544528",
language = "English (US)",
volume = "4",
pages = "430--445",
journal = "IEEE Transactions on Speech and Audio Processing",
issn = "1558-7916",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "6",

}

TY - JOUR

T1 - High-performance alphabet recognition

AU - Loizou, P. C.

AU - Spanias, Andreas

PY - 1996

Y1 - 1996

AB - Alphabet recognition is needed in many applications for retrieving information associated with the spelling of a name, such as telephone numbers, addresses, etc. This is a difficult recognition task due to the acoustic similarities existing between letters in the alphabet (e.g., the E-set letters). This paper presents the development of a high-performance alphabet recognizer that has been evaluated on studio quality as well as on telephone-bandwidth speech. Unlike previously proposed systems, the alphabet recognizer presented here is based on context-dependent phoneme hidden Markov models (HMM's), which have been found to outperform whole-word models by as much as 8%. The proposed recognizer incorporates a series of new approaches to tackle the problems associated with the confusions occurring between the stop consonants in the E-set and the confusions between the nasals (i.e., letters M and N). First, a new feature representation is proposed for improved stop consonant discrimination, and second, two subspace approaches are proposed for improved nasal discrimination. The subspace approach was found to yield a 45% error-rate reduction in nasal discrimination. Various other techniques are also proposed, yielding a 97.3% speaker-independent performance on alphabet recognition and 95% speaker-independent performance on E-set recognition. A telephone alphabet recognizer was also developed using context-dependent HMM's. When tested on the recognition of 300 last names (which are contained in a list of 50 000 common last names) spelled by 300 speakers, the recognizer achieved 91.7% correct letter recognition with 1.1% letter insertions.

UR - http://www.scopus.com/inward/record.url?scp=0030286185&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0030286185&partnerID=8YFLogxK

U2 - 10.1109/89.544528

DO - 10.1109/89.544528

M3 - Article

AN - SCOPUS:0030286185

VL - 4

SP - 430

EP - 445

JO - IEEE Transactions on Speech and Audio Processing

JF - IEEE Transactions on Speech and Audio Processing

SN - 1558-7916

IS - 6

ER -