Low bit-rate speech coding based on an improved sinusoidal model

Sassan Ahmadi, Andreas Spanias

Research output: Contribution to journal › Article › peer-review

17 Scopus citations

Abstract

This paper addresses the design, implementation and evaluation of efficient low bit-rate speech coding algorithms based on an improved sinusoidal model. A series of algorithms were developed for speech classification and pitch frequency determination, modeling of sinusoidal amplitudes and phases, and frame interpolation. An improved paradigm for sinusoidal phase coding is presented, where short-time sinusoidal phases are modeled using a combination of linear prediction, spectral sampling, linear phase alignment and all-pass phase error correction components. A class-dependent split vector quantization scheme is used to encode the sinusoidal amplitudes. The masking properties of the human auditory system are effectively exploited in the algorithms. The algorithms were successfully integrated into a 2.4 kbps sinusoidal coder. The performance of the 2.4 kbps coder was evaluated in terms of informal subjective tests such as the mean opinion score (MOS) and the diagnostic rhyme test (DRT), as well as some perceptually motivated objective distortion measures. Performance analysis on a large speech database indicates considerable improvement in short-time signal matching both in the time and the spectral domains. In addition, subjective quality of the reproduced speech is considerably improved.
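As a rough illustration of the sinusoidal model the abstract refers to, the sketch below rebuilds one speech frame as a sum of sinusoids at harmonics of the pitch frequency, each with its own amplitude and phase. It is not the paper's algorithm: the function names, the 8 kHz sampling rate, the 160-sample (20 ms) frame, and the strictly harmonic frequency grid are assumptions made for this example. The helper `bits_per_frame` only shows the arithmetic of a bit budget under an assumed frame duration, which this record does not state.

```python
import math


def synthesize_frame(amplitudes, phases, pitch_hz, sample_rate=8000, frame_len=160):
    """Return s[n] = sum_k A_k * cos(2*pi*k*f0*n/fs + phi_k) for n in [0, frame_len).

    Assumptions (not from the paper): harmonics sit exactly at k * pitch_hz,
    and the sampling rate and frame length default to 8 kHz and 160 samples.
    """
    if len(amplitudes) != len(phases):
        raise ValueError("need one phase per amplitude")
    frame = []
    for n in range(frame_len):
        sample = 0.0
        # k = 1 is the fundamental, k = 2 the second harmonic, and so on.
        for k, (amp, phi) in enumerate(zip(amplitudes, phases), start=1):
            omega = 2.0 * math.pi * k * pitch_hz / sample_rate
            sample += amp * math.cos(omega * n + phi)
        frame.append(sample)
    return frame


def bits_per_frame(bit_rate_bps, frame_ms):
    """Bits available per frame at a given bit rate and (assumed) frame duration."""
    return bit_rate_bps * frame_ms / 1000.0


if __name__ == "__main__":
    # Three harmonics of a hypothetical 100 Hz pitch, all with zero phase.
    frame = synthesize_frame([1.0, 0.5, 0.25], [0.0, 0.0, 0.0], pitch_hz=100.0)
    print(len(frame), frame[0])
    print(bits_per_frame(2400, 20))
```

Under these assumptions a 2.4 kbps coder has only a few dozen bits per frame, which is why the amplitudes and phases have to be modeled and vector-quantized rather than transmitted directly.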

Original language: English (US)
Pages (from-to): 369-390
Number of pages: 22
Journal: Speech Communication
Volume: 34
Issue number: 4
State: Published - Jul 2001

Keywords

  • Frame interpolation
  • Linear prediction
  • Phase modeling
  • Sinusoidal model
  • Speech classification
  • Speech coding

ASJC Scopus subject areas

  • Software
  • Modeling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
