A Compact and Discriminative Feature Based on Auditory Summary Statistics for Acoustic Scene Classification

Hongwei Song, Jiqing Han, Shiwen Deng
One of the biggest challenges of acoustic scene classification (ASC) is finding proper features to represent and characterize environmental sounds. Environmental sounds generally involve many sound sources and exhibit little structure in time-frequency representations. However, the background of an acoustic scene exhibits temporal homogeneity in its acoustic properties, suggesting it could be characterized by distribution statistics rather than temporal details. In this work, we…


Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events
This study indicates that recognizing acoustic scenes by identifying distinct sound events is effective and paves the way for future studies that combine this strategy with previous ones.
A Robust Framework for Acoustic Scene Classification
This paper aims at a robust framework that can explore and utilise a range of different time-frequency features and neural networks, either singly or merged, to achieve good classification performance.
Sound Context Classification Basing on Join Learning Model and Multi-Spectrogram Features
A deep learning framework for Acoustic Scene Classification (ASC), the task of classifying different environments from the sounds they produce, is presented, and a novel joint learning model using a parallel architecture of a Convolutional Neural Network and a C-RNN is proposed.
Acoustic scene classification using multi-layer temporal pooling based on convolutional neural network
A multi-layer temporal pooling method using a CNN feature sequence as input is proposed, which can effectively capture the temporal dynamics of an entire audio signal of arbitrary duration by building direct connections between the sequence and its time indexes.
This work targets tasks 1A and 1B of the DCASE2019 challenge, Acoustic Scene Classification (ASC) over ten classes recorded by the same device and by mismatched devices, and proposes a combination of three types of spectrograms: Gammatone, log-Mel, and Constant Q Transform.
A Re-trained Model Based On Multi-kernel Convolutional Neural Network for Acoustic Scene Classification
This paper proposes a deep learning framework for Acoustic Scene Classification (ASC), the task of identifying the recording environment. Three types of spectrograms are applied: Gammatone (GAM), log-Mel, and Constant Q Transform.
This work proposes a deep learning framework for Acoustic Scene Classification (ASC), targeting DCASE2019 task 1A, using a combination of three types of spectrograms: Gammatone, log-Mel, and Constant Q Transform.


Summary statistics in auditory perception
Evidence is provided that the auditory system summarizes the temporal details of sounds using time-averaged statistics which, for different examples of the same texture, converge to the same values with increasing duration, indicating that once these sounds are of moderate length, the brain's representation is limited to time-averaged statistics.
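The idea of replacing temporal details with time-averaged statistics can be sketched in a few lines of NumPy. The sketch below is illustrative only: it uses a coarse STFT band split as a stand-in for the cochlear filterbank of the actual auditory model, and the choice of statistics (subband envelope means, variances, and cross-band correlations) is a simplified subset of the full summary-statistic set.

```python
import numpy as np

def summary_statistics(signal, n_bands=8, frame=1024):
    """Time-averaged statistics of subband envelopes (illustrative sketch).

    Splits the spectrum into coarse bands via the STFT (a stand-in for a
    cochlear filterbank), extracts each band's envelope over time, and
    summarizes it with moments and cross-band correlations instead of
    keeping the temporal details themselves.
    """
    hop = frame // 2
    n_frames = 1 + (len(signal) - frame) // hop
    window = np.hanning(frame)
    # Magnitude spectrogram: (n_frames, frame // 2 + 1)
    spec = np.abs(np.stack([
        np.fft.rfft(window * signal[i * hop:i * hop + frame])
        for i in range(n_frames)
    ]))
    # Coarse subband envelopes: mean magnitude within each band, per frame
    bands = np.array_split(spec, n_bands, axis=1)
    env = np.stack([b.mean(axis=1) for b in bands])   # (n_bands, n_frames)
    # Time-averaged summary statistics
    mean = env.mean(axis=1)
    var = env.var(axis=1)
    # Cross-band envelope correlations (upper triangle, off-diagonal)
    corr = np.corrcoef(env)[np.triu_indices(n_bands, k=1)]
    return np.concatenate([mean, var, corr])

rng = np.random.default_rng(0)
feat = summary_statistics(rng.standard_normal(16000))
print(feat.shape)  # 8 means + 8 variances + 28 correlations -> (44,)
```

Because every statistic is an average over time, two different excerpts of the same texture yield nearly identical feature vectors once the excerpts are long enough, which is exactly the convergence property the paper above reports.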
Classifying soundtracks with audio texture features
It is shown that the texture statistics perform as well as the best conventional statistics (based on MFCC covariance) and the relative contributions of the different statistics are examined, showing the importance of modulation spectra and cross-band envelope correlations.
Feature Learning With Matrix Factorization Applied to Acoustic Scene Classification
It is shown that the unsupervised learning methods provide better representations of acoustic scenes than the best conventional hand-crafted features on both datasets and the introduction of a novel nonnegative supervised matrix factorization model and deep neural networks trained on spectrograms allow for further improvements.
A hybrid approach with multi-channel i-vectors and convolutional neural networks for acoustic scene classification
A novel multi-channel i-vector extraction and scoring scheme for ASC and a CNN architecture that achieves promising ASC results are proposed, and it is shown that i-vectors and CNNs capture complementary information from acoustic scenes.
Acoustic Scene Classification: Classifying environments from the sounds they produce
An account of the state of the art in acoustic scene classification (ASC), the task of classifying environments from the sounds they produce, together with a range of algorithms submitted to a data challenge that provides a general and fair benchmark for ASC techniques.
Enhanced LBP texture features from time frequency representations for acoustic scene classification
This paper proposes a novel zoning mechanism that provides a simple solution to extract spectrally relevant local features which better characterize the audio TFRs and demonstrates an improved performance by achieving a classification accuracy of 95.2% using a fusion of time-frequency derived features.
The bag-of-frames approach to audio pattern recognition: a sufficient model for urban soundscapes but not for polyphonic music.
This paper explicitly examines the difference between urban soundscapes and polyphonic music with respect to their modeling with the BOF approach, and reveals critical differences in the temporal and statistical structure of the typical frame distributions of each type of signal.
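The bag-of-frames assumption, that frame order can be discarded and a clip characterized by its long-term frame distribution alone, can be sketched with a single diagonal Gaussian per class. This is a deliberately minimal stand-in: real BOF systems typically fit Gaussian mixture models over MFCC frames, and all names and dimensions below are illustrative.

```python
import numpy as np

def fit_bof(frames):
    """Fit a diagonal-Gaussian 'bag of frames' model: frame order is
    discarded and only the distribution of frame features is kept."""
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def bof_loglik(frames, model):
    """Average per-frame log-likelihood under a fitted distribution."""
    mu, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mu) ** 2 / var)
    return ll.sum(axis=1).mean()

rng = np.random.default_rng(0)
scene_a = rng.normal(0.0, 1.0, size=(500, 13))   # e.g. 13-dim MFCC frames
scene_b = rng.normal(3.0, 1.0, size=(500, 13))
models = {"a": fit_bof(scene_a), "b": fit_bof(scene_b)}

# Classify a new clip by which class's frame distribution fits it best
test_clip = rng.normal(0.0, 1.0, size=(200, 13))
pred = max(models, key=lambda k: bof_loglik(test_clip, models[k]))
print(pred)  # -> "a"
```

The paper's finding fits this picture: soundscape classes differ mainly in their frame distributions, so this kind of model suffices for them, whereas polyphonic music depends on temporal structure that the averaging step throws away.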
TUT database for acoustic scene classification and sound event detection
The recording and annotation procedure, the database content, a recommended cross-validation setup, and the performance of a supervised acoustic scene classification system and a sound event detection baseline system using mel-frequency cepstral coefficients and Gaussian mixture models are presented.
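The MFCC features underlying this baseline can be computed from scratch in a short NumPy sketch: power spectrum, triangular mel filterbank, log compression, then a DCT-II to decorrelate. All parameter values below (FFT size, hop, number of filters and coefficients) are common illustrative defaults, not the baseline's exact configuration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Frame-level MFCCs: power spectrum -> triangular mel filterbank ->
    log -> DCT-II. A compact sketch of the classic feature pipeline."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    power = np.abs(np.stack([
        np.fft.rfft(window * signal[i * hop:i * hop + n_fft])
        for i in range(n_frames)
    ])) ** 2                                      # (n_frames, n_fft//2 + 1)

    # Triangular filters equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    log_mel = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep the first n_mfcc
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (n + 0.5) / n_mels)
    return log_mel @ dct.T                        # (n_frames, n_mfcc)

rng = np.random.default_rng(0)
feats = mfcc(rng.standard_normal(16000), sr=16000)
print(feats.shape)  # one second at 16 kHz -> (61, 13)
```

A per-class Gaussian mixture model, as in the TUT baseline, would then be fit on these frame vectors and clips scored by per-class likelihood.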
A comparison of Deep Learning methods for environmental sound detection
This work presents a comparison of several state-of-the-art Deep Learning models on the IEEE challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 challenge task and data, classifying sounds into one of fifteen common indoor and outdoor acoustic scenes.