Segmentation, indexing, and retrieval for environmental and natural sounds

Gordon Wichern, Jiachen Xue, Harvey Thornburg, Brandon Mechtley, Andreas Spanias

Research output: Contribution to journal › Article

46 Citations (Scopus)

Abstract

We propose a method for characterizing sound activity in fixed spaces through segmentation, indexing, and retrieval of continuous audio recordings. Regarding segmentation, we present a dynamic Bayesian network (DBN) that jointly infers onsets and end times of the most prominent sound events in the space, along with an extension of the algorithm for covering large spaces with distributed microphone arrays. Each segmented sound event is indexed with a hidden Markov model (HMM) that models the distribution of example-based queries that a user would employ to retrieve the event (or similar events). In order to increase the efficiency of the retrieval search, we recursively apply a modified spectral clustering algorithm to group similar sound events based on the distance between their corresponding HMMs. We then conduct a formal user study to obtain the relevancy decisions necessary for evaluation of our retrieval algorithm on both automatically and manually segmented sound clips. Furthermore, our segmentation and retrieval algorithms are shown to be effective in both quiet indoor and noisy outdoor recording conditions.
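
The indexing and retrieval idea in the abstract (each sound event carries an HMM, and an example query is matched against those HMMs) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes discrete observation symbols (0 = quiet, 1 = loud), a plain likelihood ranking, and a hypothetical two-event index (`INDEX`, `door_slam`, `hum`) with hand-picked parameters. The clustering step used to speed up the search is omitted.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | HMM) for discrete symbols."""
    n = len(start)
    # Initialise with the start distribution and the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p

def rank_events(query, index):
    """Order indexed events by how likely their HMM is to generate the query."""
    scores = {name: log_likelihood(query, *hmm) for name, hmm in index.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical index: (start, transition, emission) per event, values made up.
INDEX = {
    # Loud onset followed by an absorbing quiet state.
    "door_slam": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
    # Mostly loud in both states.
    "hum": ([1.0, 0.0], [[0.9, 0.1], [0.1, 0.9]], [[0.2, 0.8], [0.3, 0.7]]),
}

if __name__ == "__main__":
    print(rank_events([1, 0, 0, 0], INDEX))
    print(rank_events([1, 1, 1, 1], INDEX))
```

Ranking by log-likelihood keeps the comparison numerically stable for long queries. A real system would replace the toy symbols with acoustic feature trajectories and continuous emission densities.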

Original language: English (US)
Article number: 5410056
Pages (from-to): 688-707
Number of pages: 20
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 18
Issue number: 3
DOI: 10.1109/TASL.2010.2041384
State: Published - Mar 2010

Fingerprint

retrieval
Acoustic waves
acoustics
recording
Audio recordings
clips
Bayesian networks
Hidden Markov models
Microphones
microphones
Clustering algorithms
coverings
evaluation

Keywords

  • Acoustic signal analysis
  • Acoustic signal detection
  • Bayes procedures
  • Clustering methods
  • Database query processing

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

Segmentation, indexing, and retrieval for environmental and natural sounds. / Wichern, Gordon; Xue, Jiachen; Thornburg, Harvey; Mechtley, Brandon; Spanias, Andreas.

In: IEEE Transactions on Audio, Speech and Language Processing, Vol. 18, No. 3, 5410056, 03.2010, p. 688-707.

@article{78c1f4c5eb354940ba29222aa287f487,
title = "Segmentation, indexing, and retrieval for environmental and natural sounds",
abstract = "We propose a method for characterizing sound activity in fixed spaces through segmentation, indexing, and retrieval of continuous audio recordings. Regarding segmentation, we present a dynamic Bayesian network (DBN) that jointly infers onsets and end times of the most prominent sound events in the space, along with an extension of the algorithm for covering large spaces with distributed microphone arrays. Each segmented sound event is indexed with a hidden Markov model (HMM) that models the distribution of example-based queries that a user would employ to retrieve the event (or similar events). In order to increase the efficiency of the retrieval search, we recursively apply a modified spectral clustering algorithm to group similar sound events based on the distance between their corresponding HMMs. We then conduct a formal user study to obtain the relevancy decisions necessary for evaluation of our retrieval algorithm on both automatically and manually segmented sound clips. Furthermore, our segmentation and retrieval algorithms are shown to be effective in both quiet indoor and noisy outdoor recording conditions.",
keywords = "Acoustic signal analysis, Acoustic signal detection, Bayes procedures, Clustering methods, Database query processing",
author = "Gordon Wichern and Jiachen Xue and Harvey Thornburg and Brandon Mechtley and Andreas Spanias",
year = "2010",
month = "3",
doi = "10.1109/TASL.2010.2041384",
language = "English (US)",
volume = "18",
pages = "688--707",
journal = "IEEE Transactions on Audio, Speech and Language Processing",
issn = "1558-7916",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "3",
}

TY - JOUR

T1 - Segmentation, indexing, and retrieval for environmental and natural sounds

AU - Wichern, Gordon

AU - Xue, Jiachen

AU - Thornburg, Harvey

AU - Mechtley, Brandon

AU - Spanias, Andreas

PY - 2010/3

Y1 - 2010/3

AB - We propose a method for characterizing sound activity in fixed spaces through segmentation, indexing, and retrieval of continuous audio recordings. Regarding segmentation, we present a dynamic Bayesian network (DBN) that jointly infers onsets and end times of the most prominent sound events in the space, along with an extension of the algorithm for covering large spaces with distributed microphone arrays. Each segmented sound event is indexed with a hidden Markov model (HMM) that models the distribution of example-based queries that a user would employ to retrieve the event (or similar events). In order to increase the efficiency of the retrieval search, we recursively apply a modified spectral clustering algorithm to group similar sound events based on the distance between their corresponding HMMs. We then conduct a formal user study to obtain the relevancy decisions necessary for evaluation of our retrieval algorithm on both automatically and manually segmented sound clips. Furthermore, our segmentation and retrieval algorithms are shown to be effective in both quiet indoor and noisy outdoor recording conditions.

KW - Acoustic signal analysis

KW - Acoustic signal detection

KW - Bayes procedures

KW - Clustering methods

KW - Database query processing

UR - http://www.scopus.com/inward/record.url?scp=76949085351&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=76949085351&partnerID=8YFLogxK

U2 - 10.1109/TASL.2010.2041384

DO - 10.1109/TASL.2010.2041384

M3 - Article

AN - SCOPUS:76949085351

VL - 18

SP - 688

EP - 707

JO - IEEE Transactions on Audio, Speech and Language Processing

JF - IEEE Transactions on Audio, Speech and Language Processing

SN - 1558-7916

IS - 3

M1 - 5410056

ER -