Abstract

Automatic pronunciation evaluation plays an important role in pronunciation training and second language education. This field draws heavily on concepts from automatic speech recognition (ASR) to quantify how close the pronunciation of nonnative speech is to native-like pronunciation. However, it is known that the formation of accent is related to pronunciation patterns of both the target language (L2) and the speaker's first language (L1). In this paper, we propose to use two native speech acoustic models, one trained on L2 speech and the other trained on L1 speech. We develop two sets of measurements that can be extracted from two acoustic models given accented speech. A new utterance-level feature extraction scheme is used to convert these measurements into a fixed-dimension vector which is used as an input to a statistical model to predict the accentedness of a speaker. On a data set consisting of speakers from 4 different L1 backgrounds, we show that the proposed system yields improved correlation with human evaluators compared to systems only using the L2 acoustic model.
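The pipeline the abstract describes can be illustrated with a minimal sketch: per-frame goodness-of-pronunciation-style scores from two acoustic models (one L2-trained, one L1-trained) are pooled into a fixed-dimension utterance vector, which feeds a simple statistical model regressed against human accentedness ratings. All function names, the pooling statistics, and the ridge regressor below are hypothetical illustrations, not the paper's actual feature set or model.

```python
import numpy as np

def pool_features(frame_scores):
    # Utterance-level pooling: collapse a variable-length sequence of
    # per-frame scores into fixed summary statistics (mean, std, min, max).
    s = np.asarray(frame_scores, dtype=float)
    return np.array([s.mean(), s.std(), s.min(), s.max()])

def utterance_vector(l2_scores, l1_scores):
    # Concatenate pooled features from the L2-trained and L1-trained
    # acoustic models into one fixed-dimension vector (here 8-dim).
    return np.concatenate([pool_features(l2_scores), pool_features(l1_scores)])

def fit_ridge(X, y, lam=1.0):
    # Closed-form ridge regression with a bias column; stands in for the
    # "statistical model" that maps utterance vectors to accentedness.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    d = Xb.shape[1]
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(d), Xb.T @ y)

def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w
```

In this toy setup, system quality would be assessed the same way the paper evaluates its system: by correlating predicted accentedness with human evaluators' scores (e.g. via `np.corrcoef`).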

Original language: English (US)
Pages (from-to): 1636-1640
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2018-September
DOI: 10.21437/Interspeech.2018-1350
State: Published - Jan 1 2018
Event: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India
Duration: Sep 2 2018 to Sep 6 2018

Keywords

  • Accentedness
  • Automatic speech recognition
  • Pronunciation evaluation

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

Investigating the role of L1 in automatic pronunciation evaluation of L2 speech. / Tu, Ming; Grabek, Anna; Liss, Julie; Berisha, Visar.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2018-September, 01.01.2018, p. 1636-1640.

Research output: Contribution to journal › Conference article

@article{02dc974d9d564e53a8a0c89e62749617,
  title = "Investigating the role of L1 in automatic pronunciation evaluation of L2 speech",
  abstract = "Automatic pronunciation evaluation plays an important role in pronunciation training and second language education. This field draws heavily on concepts from automatic speech recognition (ASR) to quantify how close the pronunciation of nonnative speech is to native-like pronunciation. However, it is known that the formation of accent is related to pronunciation patterns of both the target language (L2) and the speaker's first language (L1). In this paper, we propose to use two native speech acoustic models, one trained on L2 speech and the other trained on L1 speech. We develop two sets of measurements that can be extracted from two acoustic models given accented speech. A new utterance-level feature extraction scheme is used to convert these measurements into a fixed-dimension vector which is used as an input to a statistical model to predict the accentedness of a speaker. On a data set consisting of speakers from 4 different L1 backgrounds, we show that the proposed system yields improved correlation with human evaluators compared to systems only using the L2 acoustic model.",
  keywords = "Accentedness, Automatic speech recognition, Pronunciation evaluation",
  author = "Ming Tu and Anna Grabek and Julie Liss and Visar Berisha",
  year = "2018",
  month = jan,
  day = "1",
  doi = "10.21437/Interspeech.2018-1350",
  language = "English (US)",
  volume = "2018-September",
  pages = "1636--1640",
  journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  issn = "2308-457X",
}

TY - JOUR

T1 - Investigating the role of L1 in automatic pronunciation evaluation of L2 speech

AU - Tu, Ming

AU - Grabek, Anna

AU - Liss, Julie

AU - Berisha, Visar

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Automatic pronunciation evaluation plays an important role in pronunciation training and second language education. This field draws heavily on concepts from automatic speech recognition (ASR) to quantify how close the pronunciation of nonnative speech is to native-like pronunciation. However, it is known that the formation of accent is related to pronunciation patterns of both the target language (L2) and the speaker's first language (L1). In this paper, we propose to use two native speech acoustic models, one trained on L2 speech and the other trained on L1 speech. We develop two sets of measurements that can be extracted from two acoustic models given accented speech. A new utterance-level feature extraction scheme is used to convert these measurements into a fixed-dimension vector which is used as an input to a statistical model to predict the accentedness of a speaker. On a data set consisting of speakers from 4 different L1 backgrounds, we show that the proposed system yields improved correlation with human evaluators compared to systems only using the L2 acoustic model.

AB - Automatic pronunciation evaluation plays an important role in pronunciation training and second language education. This field draws heavily on concepts from automatic speech recognition (ASR) to quantify how close the pronunciation of nonnative speech is to native-like pronunciation. However, it is known that the formation of accent is related to pronunciation patterns of both the target language (L2) and the speaker's first language (L1). In this paper, we propose to use two native speech acoustic models, one trained on L2 speech and the other trained on L1 speech. We develop two sets of measurements that can be extracted from two acoustic models given accented speech. A new utterance-level feature extraction scheme is used to convert these measurements into a fixed-dimension vector which is used as an input to a statistical model to predict the accentedness of a speaker. On a data set consisting of speakers from 4 different L1 backgrounds, we show that the proposed system yields improved correlation with human evaluators compared to systems only using the L2 acoustic model.

KW - Accentedness

KW - Automatic speech recognition

KW - Pronunciation evaluation

UR - http://www.scopus.com/inward/record.url?scp=85054974214&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054974214&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2018-1350

DO - 10.21437/Interspeech.2018-1350

M3 - Conference article

VL - 2018-September

SP - 1636

EP - 1640

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -