TY - GEN
T1 - The impact of forced-alignment errors on automatic pronunciation evaluation
AU - Mathad, Vikram C.
AU - Mahr, Tristan J.
AU - Scherer, Nancy
AU - Chapman, Kathy
AU - Hustad, Katherine C.
AU - Liss, Julie
AU - Berisha, Visar
N1 - Publisher Copyright:
Copyright © 2021 ISCA.
PY - 2021
Y1 - 2021
N2 - Automatic evaluation of phone-level pronunciation scores typically involves two stages: (1) automatic phonetic segmentation via text-constrained phoneme alignment and (2) quantification of acoustic deviation for each phoneme-level relative to a database of correctly-pronounced speech. It's clear that the second stage depends on the first. That is, if there is misalignment, the acoustic deviation will also be impacted. In this paper, we analyzed the impact of alignment error on a measure of goodness of pronunciation. We computed (1) automatic pronunciation scores using force-aligned samples, (2) the forced-alignment error rate, and (3) acoustic deviation using manually-aligned samples. We used a bivariate linear regression model to characterize the contributions of forced alignment errors and acoustic deviation on the automatic pronunciation scores. This was done across two different children speech databases, namely children with cleft lip/palate and typically developing children between the ages of 3-6 years. The analysis shows that, for speech from typically-developing children, most of the variation in the automatic pronunciation scores is explained by acoustic deviation, with the errors in forced alignment playing a relatively minor role. The forced alignment errors have a small but significant downstream impact on pronunciation assessment for children with cleft lip/palate.
AB - Automatic evaluation of phone-level pronunciation scores typically involves two stages: (1) automatic phonetic segmentation via text-constrained phoneme alignment and (2) quantification of acoustic deviation for each phoneme-level relative to a database of correctly-pronounced speech. It's clear that the second stage depends on the first. That is, if there is misalignment, the acoustic deviation will also be impacted. In this paper, we analyzed the impact of alignment error on a measure of goodness of pronunciation. We computed (1) automatic pronunciation scores using force-aligned samples, (2) the forced-alignment error rate, and (3) acoustic deviation using manually-aligned samples. We used a bivariate linear regression model to characterize the contributions of forced alignment errors and acoustic deviation on the automatic pronunciation scores. This was done across two different children speech databases, namely children with cleft lip/palate and typically developing children between the ages of 3-6 years. The analysis shows that, for speech from typically-developing children, most of the variation in the automatic pronunciation scores is explained by acoustic deviation, with the errors in forced alignment playing a relatively minor role. The forced alignment errors have a small but significant downstream impact on pronunciation assessment for children with cleft lip/palate.
KW - Alignment error
KW - Force alignments
KW - Goodness of pronunciation
KW - Manual alignments
UR - http://www.scopus.com/inward/record.url?scp=85119495651&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119495651&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2021-1403
DO - 10.21437/Interspeech.2021-1403
M3 - Conference contribution
AN - SCOPUS:85119495651
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 176
EP - 180
BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
PB - International Speech Communication Association
T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Y2 - 30 August 2021 through 3 September 2021
ER -