A Compact and Discriminative Feature Based on Auditory Summary Statistics for Acoustic Scene Classification

@inproceedings{Song2018ACA,
  title={A Compact and Discriminative Feature Based on Auditory Summary Statistics for Acoustic Scene Classification},
  author={Hongwei Song and Jiqing Han and Shiwen Deng},
  booktitle={INTERSPEECH},
  year={2018}
}
One of the biggest challenges of acoustic scene classification (ASC) is finding proper features to represent and characterize environmental sounds. Environmental sounds generally involve many sound sources and exhibit little structure in their time-frequency representations. However, the background of an acoustic scene is temporally homogeneous in its acoustic properties, suggesting that it could be characterized by distribution statistics rather than by temporal details. In this work, we…
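The truncated abstract points to time-averaged distribution statistics of subband envelopes as the scene representation. As a rough illustration only (the filterbank and the exact statistic set used by the authors are not given here; a mel filterbank and four marginal moments per band are my assumptions), such a compact summary-statistics feature could look like:

import numpy as np
import librosa
from scipy.stats import skew, kurtosis

def summary_statistics_feature(path, sr=22050, n_bands=32):
    """Illustrative auditory-summary-statistics feature:
    time-averaged moments of compressed subband envelopes."""
    y, sr = librosa.load(path, sr=sr)
    # Mel filterbank as a stand-in for a cochlear filterbank.
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_bands)
    env = np.log(S + 1e-10)                       # compressed envelopes
    # Marginal moments per band: mean, variance, skewness, kurtosis.
    return np.concatenate([env.mean(axis=1), env.var(axis=1),
                           skew(env, axis=1), kurtosis(env, axis=1)])

The result is a fixed-length vector of 4 x n_bands values, independent of clip duration.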

Citations

Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events
TLDR
This study indicates that recognizing acoustic scenes by identifying distinct sound events is effective and paves the way for future studies that combine this strategy with previous ones.
A Robust Framework for Acoustic Scene Classification
TLDR
This paper aims at a robust framework that can explore and utilise a range of different time-frequency features and neural networks, either singly or merged, to achieve good classification performance.
Sound Context Classification Basing on Join Learning Model and Multi-Spectrogram Features
TLDR
A deep learning framework for Acoustic Scene Classification (ASC), the task of classifying scene contexts from environmental input sounds, is presented, and a novel join learning architecture using parallel convolutional recurrent networks is proposed, which is effective for learning spatial features and temporal sequences from spectrogram input.
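As a rough sketch of what such a parallel convolutional recurrent ("join learning") model can look like (layer sizes and the fusion scheme are illustrative assumptions, not the paper's architecture):

import torch
import torch.nn as nn

class ParallelCRNN(nn.Module):
    """Sketch of a parallel CNN + RNN model: a conv branch for
    spatial patterns, a GRU branch for temporal structure,
    fused before classification."""
    def __init__(self, n_mels=128, n_classes=10):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))                 # -> (B, 16, 1, 1)
        self.rnn = nn.GRU(n_mels, 32, batch_first=True)
        self.fc = nn.Linear(16 + 32, n_classes)

    def forward(self, spec):                         # spec: (B, F, T)
        c = self.cnn(spec.unsqueeze(1)).flatten(1)   # (B, 16)
        _, h = self.rnn(spec.transpose(1, 2))        # h: (1, B, 32)
        return self.fc(torch.cat([c, h[-1]], dim=1))

logits = ParallelCRNN()(torch.randn(4, 128, 250))    # (4, 10)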
Acoustic scene classification using multi-layer temporal pooling based on convolutional neural network
TLDR
A multi-layer temporal pooling method using a CNN feature sequence as input is proposed, which can effectively capture the temporal dynamics of an entire audio signal of arbitrary duration by building direct connections between the sequence and its time indexes.
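The core mechanism, collapsing a variable-length CNN feature sequence into a fixed-length vector by pooling over time, can be illustrated minimally as follows (the mean/max/std pooling set is an assumption for illustration, not the paper's exact method):

import numpy as np

def temporal_pool(feature_seq):
    """Pool a (T, D) CNN feature sequence of arbitrary length T
    into a fixed 3*D vector via simple time statistics."""
    return np.concatenate([feature_seq.mean(axis=0),
                           feature_seq.max(axis=0),
                           feature_seq.std(axis=0)])

pooled = temporal_pool(np.random.randn(137, 128))    # any T works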
A Re-trained Model Based On Multi-kernel Convolutional Neural Network for Acoustic Scene Classification
This paper proposes a deep learning framework applied for Acoustic Scene Classification (ASC), which identifies the recording location. In general, we apply three types of spectrograms: Gammatone (GAM), …
A MULTI-SPECTROGRAM DEEP NEURAL NETWORK FOR ACOUSTIC SCENE CLASSIFICATION Technical Report
TLDR
This work targets tasks 1A and 1B of the DCASE2019 challenge, i.e. Acoustic Scene Classification (ASC) over ten different classes recorded by the same device and by mismatched devices, and proposes a combination of three types of spectrograms: Gammatone, log-Mel and Constant-Q Transform.
CDNN-CRNN JOINED MODEL FOR ACOUSTIC SCENE CLASSIFICATION Technical Report
TLDR
This work proposes a deep learning framework for Acoustic Scene Classification (ASC), targeting DCASE2019 task 1A, which uses a combination of three types of spectrograms: Gammatone, log-Mel and Constant-Q Transform.
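Two of the three spectrogram types recurring in the entries above are directly available in librosa; a Gammatone spectrogram needs an external filterbank implementation, so it is only noted in a comment. A minimal sketch (the stand-in signal and parameter values are assumptions):

import numpy as np
import librosa

sr = 22050
y = np.random.randn(sr * 5).astype(np.float32)   # stand-in 5 s clip

# Two of the three front ends; a Gammatone spectrogram would
# require an external gammatone-filterbank implementation.
logmel = librosa.power_to_db(
    librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128))
cqt = librosa.amplitude_to_db(
    np.abs(librosa.cqt(y=y, sr=sr)))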

References

Showing 1-10 of 23 references
HOG and subband power distribution image features for acoustic scene classification
TLDR
This work proposes to use the Subband Power Distribution (SPD) as a feature that captures the occurrences of acoustic events by computing the histogram of amplitude values in each frequency band of a spectrogram image, with classification performed using the so-called Sinkhorn kernel.
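The SPD itself reduces to one amplitude histogram per frequency band; a minimal sketch (bin count and dB range are illustrative assumptions, not the paper's settings):

import numpy as np

def spd_feature(spec_db, n_bins=50, db_range=(-80.0, 0.0)):
    """Subband Power Distribution: histogram of amplitude values
    in each frequency band of a (F, T) dB-scaled spectrogram."""
    hists = [np.histogram(band, bins=n_bins, range=db_range,
                          density=True)[0]
             for band in spec_db]          # one histogram per band
    return np.stack(hists)                 # (F, n_bins) SPD "image"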
Acoustic Scene Classification Using a CNN-SuperVector System Trained with Auditory and Spectrogram Image Features
TLDR
This study analyzes the performance of a state-of-the-art CNN system for different auditory image and spectrogram features, including Mel-scaled, logarithmically scaled and linearly scaled filterbank spectrograms and Stabilized Auditory Image (SAI) features, and benchmarks an MFCC-based Gaussian Mixture Model (GMM) SuperVector (SV) system for acoustic scene classification.
Summary statistics in auditory perception
TLDR
Evidence is provided that the auditory system summarizes the temporal details of sounds using time-averaged statistics which, for different examples of the same texture, converge to the same values with increasing duration, indicating that once these sounds are of moderate length, the brain's representation is limited to time-averaged statistics.
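The convergence claim can be checked numerically on a synthetic stationary signal; this toy check is my illustration, not the paper's experiment:

import numpy as np

rng = np.random.default_rng(0)
texture = rng.normal(size=22050 * 30)      # 30 s of stationary "texture"

# Time-averaged statistics settle as the excerpt gets longer.
for seconds in (1, 4, 16):
    x = texture[: 22050 * seconds]
    print(f"{seconds:2d}s  mean={x.mean():+.4f}  var={x.var():.4f}")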
Classifying soundtracks with audio texture features
TLDR
It is shown that the texture statistics perform as well as the best conventional statistics (based on MFCC covariance) and the relative contributions of the different statistics are examined, showing the importance of modulation spectra and cross-band envelope correlations.
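Cross-band envelope correlations, one of the texture statistics highlighted here, are simply the correlation matrix of subband envelopes; a rough sketch, with a mel filterbank standing in for the cochlear model used in the texture literature:

import numpy as np
import librosa

def envelope_correlations(y, sr, n_bands=20):
    """Correlation matrix of compressed subband envelopes,
    one of the audio-texture statistics discussed above."""
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_bands)
    env = np.log(S + 1e-10)
    return np.corrcoef(env)                # (n_bands, n_bands)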
Feature Learning With Matrix Factorization Applied to Acoustic Scene Classification
TLDR
It is shown that the unsupervised learning methods provide better representations of acoustic scenes than the best conventional hand-crafted features on both datasets, and that the introduction of a novel nonnegative supervised matrix factorization model and of deep neural networks trained on spectrograms allows for further improvements.
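Unsupervised matrix factorization of spectrogram frames, the starting point of this reference, can be prototyped with scikit-learn; the component count and initialization here are arbitrary assumptions:

import numpy as np
from sklearn.decomposition import NMF

# V: nonnegative (F, T) spectrogram; columns of H become frame-level
# activations that serve as learned features for a classifier.
V = np.abs(np.random.randn(64, 200))       # stand-in spectrogram
nmf = NMF(n_components=16, init='nndsvda', max_iter=400)
W = nmf.fit_transform(V)                   # (F, 16) spectral bases
H = nmf.components_                        # (16, T) activations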
A hybrid approach with multi-channel i-vectors and convolutional neural networks for acoustic scene classification
TLDR
A novel multi-channel i-vector extraction and scoring scheme for ASC and a CNN architecture that achieves promising ASC results are proposed, and it is shown that i-vectors and CNNs capture complementary information from acoustic scenes.
Histogram of gradients of Time-Frequency Representations for Audio scene detection
TLDR
This paper addresses the problem of audio scene classification and contributes to the state of the art by proposing a novel feature based on the histogram of gradients (HOG) of a time-frequency representation of an audio scene, and evaluates its performance against state-of-the-art competitors.
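HOG over a time-frequency representation treated as an image can be reproduced with scikit-image; the cell and block sizes here are illustrative, not the paper's settings:

import numpy as np
from skimage.feature import hog

spec = np.abs(np.random.randn(128, 256))   # stand-in (F, T) TFR "image"
feat = hog(spec, orientations=8,
           pixels_per_cell=(16, 16),
           cells_per_block=(2, 2))         # histogram-of-gradients vector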
Acoustic Scene Classification: Classifying environments from the sounds they produce
TLDR
An account of the state of the art in acoustic scene classification (ASC), the task of classifying environments from the sounds they produce, is given, together with a range of different algorithms submitted to a data challenge that provides a general and fair benchmark for ASC techniques.
Enhanced LBP texture features from time frequency representations for acoustic scene classification
TLDR
This paper proposes a novel zoning mechanism that provides a simple solution for extracting spectrally relevant local features which better characterize audio TFRs, and demonstrates improved performance by achieving a classification accuracy of 95.2% using a fusion of time-frequency-derived features.
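A zoned-LBP feature of this flavour can be sketched with scikit-image; the zone count, LBP parameters and 8-bit quantisation are my assumptions, not the paper's exact mechanism:

import numpy as np
from skimage.feature import local_binary_pattern

def zoned_lbp(spec, n_zones=4, P=8, R=1.0):
    """Uniform LBP histograms per frequency zone of a (F, T)
    time-frequency representation, concatenated into one vector."""
    # Quantise to 8-bit grey levels, as LBP expects an image.
    img = np.uint8(255 * (spec - spec.min()) / (np.ptp(spec) + 1e-12))
    lbp = local_binary_pattern(img, P, R, method='uniform')
    zones = np.array_split(lbp, n_zones, axis=0)   # split along frequency
    hists = [np.histogram(z, bins=P + 2, range=(0, P + 2),
                          density=True)[0] for z in zones]
    return np.concatenate(hists)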