Abstract
Spectro-temporal dynamics of consonant-vowel (CV) transition regions are considered to provide robust cues related to articulation. In this work, we propose an objective measure of articulation precision, dubbed the objective articulation measure (OAM), obtained by analyzing CV transitions segmented around vowel onsets. The OAM is derived from the posteriors of a convolutional neural network pre-trained to discriminate between consonants using CV regions as input. We demonstrate that the OAM is correlated with perceptual measures in a variety of contexts, including (a) adult dysarthric speech, (b) the speech of children with cleft lip/palate, and (c) a database of accented English speech from native Mandarin and Spanish speakers.
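The abstract does not state how the classifier posteriors are combined into a score, so the following is only an illustrative sketch of one possible posterior-based measure, not the OAM as defined in the paper. It assumes a hypothetical array of per-segment posteriors over consonant classes and a hypothetical list of intended consonant labels, and averages the posterior assigned to the intended consonant.

```python
import numpy as np

def articulation_score(posteriors, targets):
    """Mean posterior probability assigned to the intended consonant.

    posteriors: (n_segments, n_consonants) array; each row sums to 1.
    targets: (n_segments,) integer index of the intended consonant.
    NOTE: an illustrative assumption, not the OAM definition from the paper.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    targets = np.asarray(targets, dtype=int)
    # Pick, for each segment, the posterior of its intended consonant class.
    picked = posteriors[np.arange(len(targets)), targets]
    return float(picked.mean())

# Hypothetical posteriors for three CV segments over three consonant classes.
posteriors = [[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]]
targets = [0, 1, 2]
score = articulation_score(posteriors, targets)  # (0.8 + 0.6 + 0.4) / 3
```

Under this assumed definition, a higher score means the classifier more confidently recognizes the intended consonant, which is the kind of quantity one could then compare against perceptual ratings.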
Original language | English (US) |
---|---|
Pages (from-to) | 86-95 |
Number of pages | 10 |
Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Volume | 31 |
State | Published - 2023 |
Keywords
- Articulation precision
- second language learning
- cleft lip and palate
- consonant-vowel transitions
- convolutional neural networks
- dysarthria
- pronunciation scores
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Acoustics and Ultrasonics
- Computational Mathematics
- Electrical and Electronic Engineering