A scalable feature learning and tag prediction framework for natural environment sounds

P. Sattigeri, J. J. Thiagarajan, M. Shah, K. N. Ramamurthy, Andreas Spanias

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Building feature extraction approaches that can effectively characterize natural environment sounds is challenging due to their dynamic nature. In this paper, we develop a framework for extracting features and obtaining semantic inferences from such data. In particular, we propose a new pooling strategy for deep architectures that can preserve the temporal dynamics in the resulting representation. By constructing an ensemble of semantic embeddings, we employ an ℓ1-reconstruction-based prediction algorithm for estimating the relevant tags. We evaluate our approach on challenging environmental sound recognition datasets and show that the proposed features outperform traditional spectral features.

Original language: English (US)
Title of host publication: Conference Record - Asilomar Conference on Signals, Systems and Computers
Publisher: IEEE Computer Society
Pages: 1779-1783
Number of pages: 5
Volume: 2015-April
ISBN (Print): 9781479982974
DOI: https://doi.org/10.1109/ACSSC.2014.7094773
State: Published - Apr 24 2015
Event: 48th Asilomar Conference on Signals, Systems and Computers, ACSSC 2015 - Pacific Grove, United States
Duration: Nov 2 2014 to Nov 5 2014

Other

Other: 48th Asilomar Conference on Signals, Systems and Computers, ACSSC 2015
Country: United States
City: Pacific Grove
Period: 11/2/14 to 11/5/14

Fingerprint

Feature extraction
Semantics
Acoustic waves

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Signal Processing

Cite this

Sattigeri, P., Thiagarajan, J. J., Shah, M., Ramamurthy, K. N., & Spanias, A. (2015). A scalable feature learning and tag prediction framework for natural environment sounds. In Conference Record - Asilomar Conference on Signals, Systems and Computers (Vol. 2015-April, pp. 1779-1783). [7094773] IEEE Computer Society. https://doi.org/10.1109/ACSSC.2014.7094773

A scalable feature learning and tag prediction framework for natural environment sounds. / Sattigeri, P.; Thiagarajan, J. J.; Shah, M.; Ramamurthy, K. N.; Spanias, Andreas.

Conference Record - Asilomar Conference on Signals, Systems and Computers. Vol. 2015-April IEEE Computer Society, 2015. p. 1779-1783 7094773.

Sattigeri, P, Thiagarajan, JJ, Shah, M, Ramamurthy, KN & Spanias, A 2015, A scalable feature learning and tag prediction framework for natural environment sounds. in Conference Record - Asilomar Conference on Signals, Systems and Computers. vol. 2015-April, 7094773, IEEE Computer Society, pp. 1779-1783, 48th Asilomar Conference on Signals, Systems and Computers, ACSSC 2015, Pacific Grove, United States, 11/2/14. https://doi.org/10.1109/ACSSC.2014.7094773
Sattigeri P, Thiagarajan JJ, Shah M, Ramamurthy KN, Spanias A. A scalable feature learning and tag prediction framework for natural environment sounds. In Conference Record - Asilomar Conference on Signals, Systems and Computers. Vol. 2015-April. IEEE Computer Society. 2015. p. 1779-1783. 7094773 https://doi.org/10.1109/ACSSC.2014.7094773
Sattigeri, P. ; Thiagarajan, J. J. ; Shah, M. ; Ramamurthy, K. N. ; Spanias, Andreas. / A scalable feature learning and tag prediction framework for natural environment sounds. Conference Record - Asilomar Conference on Signals, Systems and Computers. Vol. 2015-April IEEE Computer Society, 2015. pp. 1779-1783
@inproceedings{19767103795a48c4b73d7697c704b233,
title = "A scalable feature learning and tag prediction framework for natural environment sounds",
abstract = "Building feature extraction approaches that can effectively characterize natural environment sounds is challenging due to their dynamic nature. In this paper, we develop a framework for extracting features and obtaining semantic inferences from such data. In particular, we propose a new pooling strategy for deep architectures that can preserve the temporal dynamics in the resulting representation. By constructing an ensemble of semantic embeddings, we employ an $\ell_1$-reconstruction-based prediction algorithm for estimating the relevant tags. We evaluate our approach on challenging environmental sound recognition datasets and show that the proposed features outperform traditional spectral features.",
author = "Sattigeri, P. and Thiagarajan, J. J. and Shah, M. and Ramamurthy, K. N. and Spanias, Andreas",
year = "2015",
month = "4",
day = "24",
doi = "10.1109/ACSSC.2014.7094773",
language = "English (US)",
isbn = "9781479982974",
volume = "2015-April",
pages = "1779--1783",
booktitle = "Conference Record - Asilomar Conference on Signals, Systems and Computers",
publisher = "IEEE Computer Society",

}

TY  - GEN
T1  - A scalable feature learning and tag prediction framework for natural environment sounds
AU  - Sattigeri, P.
AU  - Thiagarajan, J. J.
AU  - Shah, M.
AU  - Ramamurthy, K. N.
AU  - Spanias, Andreas
PY  - 2015/4/24
Y1  - 2015/4/24
N2  - Building feature extraction approaches that can effectively characterize natural environment sounds is challenging due to their dynamic nature. In this paper, we develop a framework for extracting features and obtaining semantic inferences from such data. In particular, we propose a new pooling strategy for deep architectures that can preserve the temporal dynamics in the resulting representation. By constructing an ensemble of semantic embeddings, we employ an l1-reconstruction-based prediction algorithm for estimating the relevant tags. We evaluate our approach on challenging environmental sound recognition datasets and show that the proposed features outperform traditional spectral features.
AB  - Building feature extraction approaches that can effectively characterize natural environment sounds is challenging due to their dynamic nature. In this paper, we develop a framework for extracting features and obtaining semantic inferences from such data. In particular, we propose a new pooling strategy for deep architectures that can preserve the temporal dynamics in the resulting representation. By constructing an ensemble of semantic embeddings, we employ an l1-reconstruction-based prediction algorithm for estimating the relevant tags. We evaluate our approach on challenging environmental sound recognition datasets and show that the proposed features outperform traditional spectral features.
UR  - http://www.scopus.com/inward/record.url?scp=84940510433&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84940510433&partnerID=8YFLogxK
U2  - 10.1109/ACSSC.2014.7094773
DO  - 10.1109/ACSSC.2014.7094773
M3  - Conference contribution
SN  - 9781479982974
VL  - 2015-April
SP  - 1779
EP  - 1783
BT  - Conference Record - Asilomar Conference on Signals, Systems and Computers
PB  - IEEE Computer Society
ER  -