MLP-based isolated phoneme classification using likelihood features extracted from reconstructed phase space

Yasser Shekofteh, Farshad Almasganj, Ayoub Daliri

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Nonlinear properties of a complex signal can be represented in reconstructed phase space (RPS). Researchers have previously developed RPS-based feature extraction approaches to capture these nonlinear properties. Typically, however, such approaches are more computationally demanding (that is, they have longer run times) and less accurate than traditional techniques such as Mel-frequency cepstral coefficients (MFCCs), which fail to capture the nonlinear properties of signals. To overcome these issues, we propose a new RPS-based feature extraction approach that builds on a previously reported one. The proposed approach calculates the similarities between the embedded speech signals and a set of predefined speech attractor models in the RPS, and uses these similarities as input features for a final phonetic classifier. A set of Gaussian mixture models (GMMs) is trained to represent the variety of all phoneme attractors in the RPS. For each embedded out-of-sample speech signal, the trained GMMs are used to calculate a feature vector consisting of log-likelihoods. A multilayer perceptron (MLP) classifier then estimates the posterior probabilities of the phoneme classes. We evaluate the proposed approach on FARSDAT, a Persian speech corpus. Results show a 1.89% absolute improvement in classification accuracy over a baseline system that uses MFCC features. When classifiers using the proposed RPS-based features are combined with classifiers using MFCC features, the combined system achieves the highest phoneme classification rate, 68.85%, an absolute improvement of 4.78% over a baseline system.
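
The feature-extraction pipeline summarized above (time-delay embedding into an RPS, per-phoneme GMMs, and a vector of log-likelihoods as classifier input) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embedding dimension (dim=3) and delay (tau=6), the z-score amplitude normalization, one 4-component diagonal-covariance GMM per phoneme class, averaging the log-likelihood over embedded points, and the synthetic sinusoid "phonemes" are all hypothetical choices. The MLP that would consume the feature vector is not shown.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def zscore(signal):
    """Assumed preprocessing: zero-mean, unit-variance amplitude normalization."""
    signal = np.asarray(signal, dtype=float)
    return (signal - signal.mean()) / (signal.std() + 1e-12)


def embed(signal, dim=3, tau=6):
    """Time-delay embedding: row n is [s(n), s(n+tau), ..., s(n+(dim-1)*tau)]."""
    signal = np.asarray(signal, dtype=float)
    n_points = len(signal) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("signal is too short for the chosen dim and tau")
    return np.stack([signal[i * tau : i * tau + n_points] for i in range(dim)], axis=1)


class DiagGMM:
    """Diagonal-covariance GMM fitted with plain EM (illustrative, not tuned)."""

    def __init__(self, n_components=4, n_iter=50, seed=0, reg=1e-4):
        self.k, self.n_iter, self.seed, self.reg = n_components, n_iter, seed, reg

    def _log_joint(self, X):
        # log w_k + log N(x | mu_k, diag(var_k)); rows are points, columns are components
        diff = X[:, None, :] - self.means[None, :, :]
        maha = np.sum(diff ** 2 / self.vars[None, :, :], axis=2)
        log_det = np.sum(np.log(self.vars), axis=1)
        const = X.shape[1] * np.log(2.0 * np.pi)
        return np.log(self.weights)[None, :] - 0.5 * (const + log_det[None, :] + maha)

    def fit(self, X):
        rng = np.random.default_rng(self.seed)
        n = X.shape[0]
        self.means = X[rng.choice(n, size=self.k, replace=False)]
        self.vars = np.tile(X.var(axis=0) + self.reg, (self.k, 1))
        self.weights = np.full(self.k, 1.0 / self.k)
        for _ in range(self.n_iter):
            log_p = self._log_joint(X)
            resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])  # E-step
            nk = resp.sum(axis=0) + 1e-12                              # M-step
            self.weights = nk / n
            self.means = (resp.T @ X) / nk[:, None]
            second = (resp.T @ (X ** 2)) / nk[:, None]
            self.vars = np.maximum(second - self.means ** 2, 0.0) + self.reg
        return self

    def mean_log_likelihood(self, X):
        """Average per-point log-likelihood of an embedded trajectory."""
        return float(np.mean(logsumexp(self._log_joint(X), axis=1)))


def likelihood_features(frame, models, dim=3, tau=6):
    """One log-likelihood score per phoneme GMM; an MLP would take this vector as input."""
    points = embed(zscore(frame), dim, tau)
    return np.array([m.mean_log_likelihood(points) for m in models])


def toy_signal(period, rng, n=400):
    """Hypothetical stand-in for a phoneme segment: a noisy sinusoid."""
    t = np.arange(n)
    phase = rng.uniform(0, 2 * np.pi)
    return np.sin(2 * np.pi * t / period + phase) + 0.05 * rng.standard_normal(n)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    periods = [40, 15]  # two hypothetical "phoneme" classes
    models = []
    for p in periods:
        train = np.vstack([embed(zscore(toy_signal(p, rng))) for _ in range(5)])
        models.append(DiagGMM().fit(train))
    for label, p in enumerate(periods):
        feats = likelihood_features(toy_signal(p, rng), models)
        print(label, feats.round(2), int(np.argmax(feats)))
```

In this toy setup each test segment should score highest under the GMM of its own class. Taking the argmax is only a stand-in here; in the approach described above, the full log-likelihood vector is passed to an MLP that estimates phoneme posteriors.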

Original language: English (US)
Article number: 2332
Pages (from-to): 1-9
Number of pages: 9
Journal: Engineering Applications of Artificial Intelligence
Volume: 44
DOIs: 10.1016/j.engappai.2015.05.001
State: Published - Jan 1 2015
Externally published: Yes

Fingerprint

  • Classifiers
  • Feature extraction
  • Speech analysis

Keywords

  • Gaussian mixture models
  • Isolated phoneme classification
  • Nonlinear speech processing
  • Phoneme attractor
  • Reconstructed phase space

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Artificial Intelligence
  • Electrical and Electronic Engineering

Cite this

MLP-based isolated phoneme classification using likelihood features extracted from reconstructed phase space. / Shekofteh, Yasser; Almasganj, Farshad; Daliri, Ayoub.

In: Engineering Applications of Artificial Intelligence, Vol. 44, 2332, 01.01.2015, p. 1-9.


@article{bc20b55d53fc4ff780e05d93043ebbfb,
title = "MLP-based isolated phoneme classification using likelihood features extracted from reconstructed phase space",
abstract = "Nonlinear properties of a complex signal can be represented in reconstructed phase space (RPS). Researchers have previously developed RPS-based feature extraction approaches to capture these nonlinear properties. Typically, however, such approaches are more computationally demanding (that is, they have longer run times) and less accurate than traditional techniques such as Mel-frequency cepstral coefficients (MFCCs), which fail to capture the nonlinear properties of signals. To overcome these issues, we propose a new RPS-based feature extraction approach that builds on a previously reported one. The proposed approach calculates the similarities between the embedded speech signals and a set of predefined speech attractor models in the RPS, and uses these similarities as input features for a final phonetic classifier. A set of Gaussian mixture models (GMMs) is trained to represent the variety of all phoneme attractors in the RPS. For each embedded out-of-sample speech signal, the trained GMMs are used to calculate a feature vector consisting of log-likelihoods. A multilayer perceptron (MLP) classifier then estimates the posterior probabilities of the phoneme classes. We evaluate the proposed approach on FARSDAT, a Persian speech corpus. Results show a 1.89{\%} absolute improvement in classification accuracy over a baseline system that uses MFCC features. When classifiers using the proposed RPS-based features are combined with classifiers using MFCC features, the combined system achieves the highest phoneme classification rate, 68.85{\%}, an absolute improvement of 4.78{\%} over a baseline system.",
keywords = "Gaussian mixture models, Isolated phoneme classification, Nonlinear speech processing, Phoneme attractor, Reconstructed phase space",
author = "Yasser Shekofteh and Farshad Almasganj and Ayoub Daliri",
year = "2015",
month = "1",
day = "1",
doi = "10.1016/j.engappai.2015.05.001",
language = "English (US)",
volume = "44",
pages = "1--9",
journal = "Engineering Applications of Artificial Intelligence",
issn = "0952-1976",
publisher = "Elsevier Limited",

}

TY - JOUR

T1 - MLP-based isolated phoneme classification using likelihood features extracted from reconstructed phase space

AU - Shekofteh, Yasser

AU - Almasganj, Farshad

AU - Daliri, Ayoub

PY - 2015/1/1

Y1 - 2015/1/1

N2 - Nonlinear properties of a complex signal can be represented in reconstructed phase space (RPS). Researchers have previously developed RPS-based feature extraction approaches to capture these nonlinear properties. Typically, however, such approaches are more computationally demanding (that is, they have longer run times) and less accurate than traditional techniques such as Mel-frequency cepstral coefficients (MFCCs), which fail to capture the nonlinear properties of signals. To overcome these issues, we propose a new RPS-based feature extraction approach that builds on a previously reported one. The proposed approach calculates the similarities between the embedded speech signals and a set of predefined speech attractor models in the RPS, and uses these similarities as input features for a final phonetic classifier. A set of Gaussian mixture models (GMMs) is trained to represent the variety of all phoneme attractors in the RPS. For each embedded out-of-sample speech signal, the trained GMMs are used to calculate a feature vector consisting of log-likelihoods. A multilayer perceptron (MLP) classifier then estimates the posterior probabilities of the phoneme classes. We evaluate the proposed approach on FARSDAT, a Persian speech corpus. Results show a 1.89% absolute improvement in classification accuracy over a baseline system that uses MFCC features. When classifiers using the proposed RPS-based features are combined with classifiers using MFCC features, the combined system achieves the highest phoneme classification rate, 68.85%, an absolute improvement of 4.78% over a baseline system.

AB - Nonlinear properties of a complex signal can be represented in reconstructed phase space (RPS). Researchers have previously developed RPS-based feature extraction approaches to capture these nonlinear properties. Typically, however, such approaches are more computationally demanding (that is, they have longer run times) and less accurate than traditional techniques such as Mel-frequency cepstral coefficients (MFCCs), which fail to capture the nonlinear properties of signals. To overcome these issues, we propose a new RPS-based feature extraction approach that builds on a previously reported one. The proposed approach calculates the similarities between the embedded speech signals and a set of predefined speech attractor models in the RPS, and uses these similarities as input features for a final phonetic classifier. A set of Gaussian mixture models (GMMs) is trained to represent the variety of all phoneme attractors in the RPS. For each embedded out-of-sample speech signal, the trained GMMs are used to calculate a feature vector consisting of log-likelihoods. A multilayer perceptron (MLP) classifier then estimates the posterior probabilities of the phoneme classes. We evaluate the proposed approach on FARSDAT, a Persian speech corpus. Results show a 1.89% absolute improvement in classification accuracy over a baseline system that uses MFCC features. When classifiers using the proposed RPS-based features are combined with classifiers using MFCC features, the combined system achieves the highest phoneme classification rate, 68.85%, an absolute improvement of 4.78% over a baseline system.

KW - Gaussian mixture models

KW - Isolated phoneme classification

KW - Nonlinear speech processing

KW - Phoneme attractor

KW - Reconstructed phase space

UR - http://www.scopus.com/inward/record.url?scp=84940025234&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84940025234&partnerID=8YFLogxK

U2 - 10.1016/j.engappai.2015.05.001

DO - 10.1016/j.engappai.2015.05.001

M3 - Article

VL - 44

SP - 1

EP - 9

JO - Engineering Applications of Artificial Intelligence

JF - Engineering Applications of Artificial Intelligence

SN - 0952-1976

M1 - 2332

ER -