Consonant-Vowel Transition Models Based on Deep Learning for Objective Evaluation of Articulation

Vikram C. Mathad; Julie M. Liss; Kathy Chapman; Nancy Scherer; Visar Berisha

doi:10.1109/TASLP.2022.3209937

College of Health Solutions (CHS)

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Spectro-temporal dynamics of consonant-vowel (CV) transition regions are considered to provide robust cues related to articulation. In this work, we propose an objective measure of precise articulation, dubbed the objective articulation measure (OAM), by analyzing the CV transitions segmented around vowel onsets. The OAM is derived based on the posteriors of a convolutional neural network pre-trained to classify between different consonants using CV regions as input. We demonstrate that the OAM is correlated with perceptual measures in a variety of contexts including (a) adult dysarthric speech, (b) the speech of children with cleft lip/palate, and (c) a database of accented English speech from native Mandarin and Spanish speakers.
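
The abstract describes the OAM only at a high level: CV regions are cut out around vowel onsets, and the posteriors of a consonant-classification CNN are turned into a score. The Python sketch that follows illustrates that pipeline under explicit assumptions. The window lengths (`before`, `after`), the `classifier` callable that stands in for the pre-trained CNN, and the aggregation rule (mean posterior of the intended consonant, in the spirit of goodness-of-pronunciation scores) are all hypothetical choices made for illustration; the actual OAM definition is given in the paper itself.

```python
import math
from typing import Callable, List, Sequence

Frame = List[float]  # one spectral frame, e.g. a vector of log-mel energies


def extract_cv_segments(frames: Sequence[Frame], vowel_onsets: Sequence[int],
                        before: int = 10, after: int = 10) -> List[List[Frame]]:
    """Cut a fixed-length window around each vowel onset.

    `before`/`after` are frame counts chosen for illustration only, not the
    paper's values. Windows that would run past either end of the utterance
    are skipped so that every segment has the same length.
    """
    segments: List[List[Frame]] = []
    for onset in vowel_onsets:
        start, end = onset - before, onset + after
        if start >= 0 and end <= len(frames):
            segments.append(list(frames[start:end]))
    return segments


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def articulation_score(segments: Sequence[List[Frame]], targets: Sequence[int],
                       classifier: Callable[[List[Frame]], Sequence[float]]) -> float:
    """Hypothetical OAM-style score: mean posterior of the intended consonant.

    `classifier` is an assumed stand-in for the pre-trained CNN; it maps one
    CV segment to one logit per consonant class. `targets` holds the index
    of the consonant the speaker was supposed to produce in each segment.
    """
    if not segments:
        raise ValueError("no complete CV segments to score")
    if len(segments) != len(targets):
        raise ValueError("need exactly one target consonant per segment")
    total = 0.0
    for segment, target in zip(segments, targets):
        posteriors = softmax(classifier(segment))
        total += posteriors[target]
    return total / len(segments)


if __name__ == "__main__":
    # Toy utterance: 50 one-dimensional frames; the third onset is too close
    # to the end of the utterance for a full window, so it is skipped.
    frames = [[float(i)] for i in range(50)]
    segments = extract_cv_segments(frames, [12, 30, 45])
    # A dummy classifier with uniform logits over two consonant classes.
    print(articulation_score(segments, [0, 1], lambda seg: [0.0, 0.0]))  # 0.5
```

With a real model, a score near 1 would mean the classifier confidently recognized the intended consonants from their CV transitions, and lower values would point to less distinct transitions. Treating that relationship as a proxy for articulatory precision is an assumption of this sketch, not a claim about the paper's exact formulation.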

Original language	English (US)
Pages (from-to)	86-95
Number of pages	10
Journal	IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume	31
DOI	https://doi.org/10.1109/TASLP.2022.3209937
State	Published - 2023

Keywords

Articulation precision
second language learning
cleft lip and palate
consonant-vowel transitions
convolutional neural networks
dysarthria
pronunciation scores

ASJC Scopus subject areas

Computer Science (miscellaneous)
Acoustics and Ultrasonics
Computational Mathematics
Electrical and Electronic Engineering

Cite this

@article{20660ac08d134bcd989494849bb00619,
  title = "Consonant-Vowel Transition Models Based on Deep Learning for Objective Evaluation of Articulation",
  abstract = "Spectro-temporal dynamics of consonant-vowel (CV) transition regions are considered to provide robust cues related to articulation. In this work, we propose an objective measure of precise articulation, dubbed the objective articulation measure (OAM), by analyzing the CV transitions segmented around vowel onsets. The OAM is derived based on the posteriors of a convolutional neural network pre-trained to classify between different consonants using CV regions as input. We demonstrate that the OAM is correlated with perceptual measures in a variety of contexts including (a) adult dysarthric speech, (b) the speech of children with cleft lip/palate, and (c) a database of accented English speech from native Mandarin and Spanish speakers.",
  keywords = "Articulation precision, second language learning, cleft lip and palate, consonant-vowel transitions, convolutional neural networks, dysarthria, pronunciation scores",
  author = "Mathad, Vikram C. and Liss, Julie M. and Chapman, Kathy and Scherer, Nancy and Berisha, Visar",
  note = "Publisher Copyright: {\textcopyright} 2014 IEEE.",
  year = "2023",
  doi = "10.1109/TASLP.2022.3209937",
  language = "English (US)",
  volume = "31",
  pages = "86--95",
  journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",
  issn = "2329-9290",
  publisher = "IEEE",
}

TY  - JOUR
T1  - Consonant-Vowel Transition Models Based on Deep Learning for Objective Evaluation of Articulation
AU  - Mathad, Vikram C.
AU  - Liss, Julie M.
AU  - Chapman, Kathy
AU  - Scherer, Nancy
AU  - Berisha, Visar
PY  - 2023
Y1  - 2023
N2  - Spectro-temporal dynamics of consonant-vowel (CV) transition regions are considered to provide robust cues related to articulation. In this work, we propose an objective measure of precise articulation, dubbed the objective articulation measure (OAM), by analyzing the CV transitions segmented around vowel onsets. The OAM is derived based on the posteriors of a convolutional neural network pre-trained to classify between different consonants using CV regions as input. We demonstrate that the OAM is correlated with perceptual measures in a variety of contexts including (a) adult dysarthric speech, (b) the speech of children with cleft lip/palate, and (c) a database of accented English speech from native Mandarin and Spanish speakers.
AB  - Spectro-temporal dynamics of consonant-vowel (CV) transition regions are considered to provide robust cues related to articulation. In this work, we propose an objective measure of precise articulation, dubbed the objective articulation measure (OAM), by analyzing the CV transitions segmented around vowel onsets. The OAM is derived based on the posteriors of a convolutional neural network pre-trained to classify between different consonants using CV regions as input. We demonstrate that the OAM is correlated with perceptual measures in a variety of contexts including (a) adult dysarthric speech, (b) the speech of children with cleft lip/palate, and (c) a database of accented English speech from native Mandarin and Spanish speakers.
KW  - Articulation precision
KW  - second language learning
KW  - cleft lip and palate
KW  - consonant-vowel transitions
KW  - convolutional neural networks
KW  - dysarthria
KW  - pronunciation scores
UR  - http://www.scopus.com/inward/record.url?scp=85139817031&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85139817031&partnerID=8YFLogxK
U2  - 10.1109/TASLP.2022.3209937
DO  - 10.1109/TASLP.2022.3209937
M3  - Article
AN  - SCOPUS:85139817031
SN  - 2329-9290
VL  - 31
SP  - 86
EP  - 95
JO  - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF  - IEEE/ACM Transactions on Audio, Speech, and Language Processing
ER  - 
