Low bit-rate speech coding based on an improved sinusoidal model

Sassan Ahmadi, Andreas Spanias

Research output: Contribution to journal › Article

15 Citations (Scopus)

Abstract

This paper addresses the design, implementation and evaluation of efficient low bit-rate speech coding algorithms based on an improved sinusoidal model. A series of algorithms were developed for speech classification and pitch frequency determination, modeling of sinusoidal amplitudes and phases, and frame interpolation. An improved paradigm for sinusoidal phase coding is presented, where short-time sinusoidal phases are modeled using a combination of linear prediction, spectral sampling, linear phase alignment and all-pass phase error correction components. A class-dependent split vector quantization scheme is used to encode the sinusoidal amplitudes. The masking properties of the human auditory system are effectively exploited in the algorithms. The algorithms were successfully integrated into a 2.4 kbps sinusoidal coder. The performance of the 2.4 kbps coder was evaluated in terms of informal subjective tests such as the mean opinion score (MOS) and the diagnostic rhyme test (DRT), as well as some perceptually motivated objective distortion measures. Performance analysis on a large speech database indicates considerable improvement in short-time signal matching both in the time and the spectral domains. In addition, subjective quality of the reproduced speech is considerably improved.
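As a rough illustration of the sinusoidal model the abstract refers to, the sketch below reconstructs one speech frame as a sum of harmonically related sinusoids and works out the per-frame bit budget at 2.4 kbps. It is a generic textbook form, not the coder described in the paper: the function names, the 8 kHz sampling rate, the 20 ms frame length, and the restriction to harmonics of a single pitch frequency are all assumptions made for this example.

```python
import math


def synthesize_frame(amplitudes, phases, f0_hz, fs_hz, num_samples):
    """Sum of harmonic sinusoids: s[n] = sum_k A_k * cos(2*pi*k*f0*n/fs + phi_k).

    Hypothetical helper for illustration; harmonic k uses amplitudes[k-1]
    and phases[k-1].
    """
    frame = []
    for n in range(num_samples):
        sample = 0.0
        for k, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            sample += amp * math.cos(2.0 * math.pi * k * f0_hz * n / fs_hz + phase)
        frame.append(sample)
    return frame


def bits_per_frame(bit_rate_bps, frame_ms):
    """Bits available per frame at a given bit rate and frame length."""
    return bit_rate_bps * frame_ms / 1000.0


# Assumed values: 8 kHz sampling, 20 ms frames (160 samples), 100 Hz pitch.
frame = synthesize_frame([1.0, 0.5], [0.0, 0.0], 100.0, 8000.0, 160)
budget = bits_per_frame(2400, 20)
```

Under the assumed 20 ms frame, a 2.4 kbps coder has 48 bits per frame to cover classification, pitch, amplitudes, and phase information, which is why compact amplitude and phase models matter at this rate.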

Original language: English (US)
Pages (from-to): 369-390
Number of pages: 22
Journal: Speech Communication
Volume: 34
Issue number: 4
DOI: 10.1016/S0167-6393(00)00057-1
State: Published - Jul 2001

Fingerprint

Speech Coding
coding
Phase Error
Linear Prediction
Diagnostic Tests
Vector Quantization
Masking
Error correction
Routine Diagnostic Tests
Model
Performance Analysis
performance
Interpolation
diagnostic
Alignment
Interpolate

Keywords

  • Frame interpolation
  • Linear prediction
  • Phase modeling
  • Sinusoidal model
  • Speech classification
  • Speech coding

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Experimental and Cognitive Psychology
  • Linguistics and Language

Cite this

Low bit-rate speech coding based on an improved sinusoidal model. / Ahmadi, Sassan; Spanias, Andreas.

In: Speech Communication, Vol. 34, No. 4, 07.2001, p. 369-390.

@article{f0762cc25c3e47a2a0808fe603a32673,
title = "Low bit-rate speech coding based on an improved sinusoidal model",
abstract = "This paper addresses the design, implementation and evaluation of efficient low bit-rate speech coding algorithms based on an improved sinusoidal model. A series of algorithms were developed for speech classification and pitch frequency determination, modeling of sinusoidal amplitudes and phases, and frame interpolation. An improved paradigm for sinusoidal phase coding is presented, where short-time sinusoidal phases are modeled using a combination of linear prediction, spectral sampling, linear phase alignment and all-pass phase error correction components. A class-dependent split vector quantization scheme is used to encode the sinusoidal amplitudes. The masking properties of the human auditory system are effectively exploited in the algorithms. The algorithms were successfully integrated into a 2.4 kbps sinusoidal coder. The performance of the 2.4 kbps coder was evaluated in terms of informal subjective tests such as the mean opinion score (MOS) and the diagnostic rhyme test (DRT), as well as some perceptually motivated objective distortion measures. Performance analysis on a large speech database indicates considerable improvement in short-time signal matching both in the time and the spectral domains. In addition, subjective quality of the reproduced speech is considerably improved.",
keywords = "Frame interpolation, Linear prediction, Phase modeling, Sinusoidal model, Speech classification, Speech coding",
author = "Sassan Ahmadi and Andreas Spanias",
year = "2001",
month = jul,
doi = "10.1016/S0167-6393(00)00057-1",
language = "English (US)",
volume = "34",
pages = "369--390",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",
number = "4"
}

TY  - JOUR
T1  - Low bit-rate speech coding based on an improved sinusoidal model
AU  - Ahmadi, Sassan
AU  - Spanias, Andreas
PY  - 2001/7
Y1  - 2001/7
AB  - This paper addresses the design, implementation and evaluation of efficient low bit-rate speech coding algorithms based on an improved sinusoidal model. A series of algorithms were developed for speech classification and pitch frequency determination, modeling of sinusoidal amplitudes and phases, and frame interpolation. An improved paradigm for sinusoidal phase coding is presented, where short-time sinusoidal phases are modeled using a combination of linear prediction, spectral sampling, linear phase alignment and all-pass phase error correction components. A class-dependent split vector quantization scheme is used to encode the sinusoidal amplitudes. The masking properties of the human auditory system are effectively exploited in the algorithms. The algorithms were successfully integrated into a 2.4 kbps sinusoidal coder. The performance of the 2.4 kbps coder was evaluated in terms of informal subjective tests such as the mean opinion score (MOS) and the diagnostic rhyme test (DRT), as well as some perceptually motivated objective distortion measures. Performance analysis on a large speech database indicates considerable improvement in short-time signal matching both in the time and the spectral domains. In addition, subjective quality of the reproduced speech is considerably improved.
KW  - Frame interpolation
KW  - Linear prediction
KW  - Phase modeling
KW  - Sinusoidal model
KW  - Speech classification
KW  - Speech coding
UR  - http://www.scopus.com/inward/record.url?scp=0035400321&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0035400321&partnerID=8YFLogxK
U2  - 10.1016/S0167-6393(00)00057-1
DO  - 10.1016/S0167-6393(00)00057-1
M3  - Article
AN  - SCOPUS:0035400321
VL  - 34
SP  - 369
EP  - 390
JO  - Speech Communication
JF  - Speech Communication
SN  - 0167-6393
IS  - 4
ER  -