Perceptual segmentation and component selection for sinusoidal representations of audio

Ted Painter; Andreas Spanias

doi:10.1109/TSA.2004.841050

Electrical Engineering

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

This paper presents two fundamental enhancements in a hybrid audio signal model consisting of sinusoidal, transient, and noise (STN) components. The first enhancement involves a novel application of a perceptual metric for optimal time segmentation for the analysis of transients. In particular, Moore and Glasberg's model of partial loudness is modified for use with general signals and then integrated into a novel time segmentation scheme. The second, and perhaps more significant, STN enhancement is concerned with a new methodology for ranking and selection of the most perceptually relevant sinusoids. A systematic procedure is developed for the selection of a compact set of sinusoids, and comparative results are given to demonstrate the merit of this method.

Original language	English (US)
Pages (from-to)	149-161
Number of pages	13
Journal	IEEE Transactions on Speech and Audio Processing
Volume	13
Issue number	2
DOIs	https://doi.org/10.1109/TSA.2004.841050
State	Published - Mar 2005

Keywords

Audio coding
Psychoacoustics
Segmentation
Sinusoidal models

ASJC Scopus subject areas

Software
Acoustics and Ultrasonics
Computer Vision and Pattern Recognition
Electrical and Electronic Engineering

Cite this

@article{45ff41038bd249f4a48142eab92f460b,
  title     = "Perceptual segmentation and component selection for sinusoidal representations of audio",
  abstract  = "This paper presents two fundamental enhancements in a hybrid audio signal model consisting of sinusoidal, transient, and noise (STN) components. The first enhancement involves a novel application of a perceptual metric for optimal time segmentation for the analysis of transients. In particular, Moore and Glasberg's model of partial loudness is modified for use with general signals and then integrated into a novel time segmentation scheme. The second, and perhaps more significant STN enhancement is concerned with a new methodology for ranking and selection of the most perceptually relevant sinusoids. A systematic procedure is developed for the selection of a compact set of sinusoids and comparative results are given to demonstrate the merit of this method.",
  keywords  = "Audio coding, Psychoacoustics, Segmentation, Sinusoidal models",
  author    = "Ted Painter and Andreas Spanias",
  note      = "Funding Information: Manuscript received June 18, 2001; revised April 15, 2003. This work was supported in part by the Intel Research Council. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Peter Vary. T. Painter is with the Intel Corporation HD2-230, Handheld Computing Division, Hudson, MA 01749 USA (e-mail: ted.painter@intel.com). A. Spanias is with the Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287-7206 USA (e-mail: spanias@asu.edu). Digital Object Identifier 10.1109/TSA.2004.841050",
  year      = "2005",
  month     = mar,
  doi       = "10.1109/TSA.2004.841050",
  language  = "English (US)",
  volume    = "13",
  number    = "2",
  pages     = "149--161",
  journal   = "IEEE Transactions on Speech and Audio Processing",
  issn      = "1063-6676",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR
T1 - Perceptual segmentation and component selection for sinusoidal representations of audio
AU - Painter, Ted
AU - Spanias, Andreas
N1 - Funding Information: Manuscript received June 18, 2001; revised April 15, 2003. This work was supported in part by the Intel Research Council. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Peter Vary. T. Painter is with the Intel Corporation HD2-230, Handheld Computing Division, Hudson, MA 01749 USA (e-mail: ted.painter@intel.com). A. Spanias is with the Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287-7206 USA (e-mail: spanias@asu.edu). Digital Object Identifier 10.1109/TSA.2004.841050
PY - 2005/3
Y1 - 2005/3
N2 - This paper presents two fundamental enhancements in a hybrid audio signal model consisting of sinusoidal, transient, and noise (STN) components. The first enhancement involves a novel application of a perceptual metric for optimal time segmentation for the analysis of transients. In particular, Moore and Glasberg's model of partial loudness is modified for use with general signals and then integrated into a novel time segmentation scheme. The second, and perhaps more significant STN enhancement is concerned with a new methodology for ranking and selection of the most perceptually relevant sinusoids. A systematic procedure is developed for the selection of a compact set of sinusoids and comparative results are given to demonstrate the merit of this method.
AB - This paper presents two fundamental enhancements in a hybrid audio signal model consisting of sinusoidal, transient, and noise (STN) components. The first enhancement involves a novel application of a perceptual metric for optimal time segmentation for the analysis of transients. In particular, Moore and Glasberg's model of partial loudness is modified for use with general signals and then integrated into a novel time segmentation scheme. The second, and perhaps more significant STN enhancement is concerned with a new methodology for ranking and selection of the most perceptually relevant sinusoids. A systematic procedure is developed for the selection of a compact set of sinusoids and comparative results are given to demonstrate the merit of this method.
KW - Audio coding
KW - Psychoacoustics
KW - Segmentation
KW - Sinusoidal models
UR - http://www.scopus.com/inward/record.url?scp=14644436350&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=14644436350&partnerID=8YFLogxK
U2 - 10.1109/TSA.2004.841050
DO - 10.1109/TSA.2004.841050
M3 - Article
AN - SCOPUS:14644436350
SN - 1063-6676
VL - 13
SP - 149
EP - 161
JO - IEEE Transactions on Speech and Audio Processing
JF - IEEE Transactions on Speech and Audio Processing
IS - 2
ER -
