Identifying Optimal Features for Multi-channel Acoustic Scene Classification

  • Abigail Copiaco, Christian Ritz, Nidhal Abdulaziz, Stefano Fasciani
  • Published 1 October 2019
  • Computer Science
  • 2019 2nd International Conference on Signal Processing and Information Security (ICSPIS)
Recent approaches to audio classification are typically developed for single-channel recordings of acoustic events. In contrast, the acoustic classification of multi-channel recordings of domestic audio has not been thoroughly investigated, especially for acoustic scenes recorded in the household. In this paper, we consider domestic multi-channel audio classification through the use of a Deep Convolutional Neural Network (DCNN) model. The DCNN is applied to cepstral and spectral-based… 
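Since the abstract is truncated, the cepstral and spectral features it names can only be illustrated with a hedged sketch; the log-mel/MFCC-style recipe, sample rate, frame parameters, and four-microphone setup below are assumptions for illustration, not the paper's actual configuration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters mapping an FFT power spectrum to n_mels bands."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        if center > left:
            fb[i, left:center] = (np.arange(left, center) - left) / (center - left)
        if right > center:
            fb[i, center:right] = (right - np.arange(center, right)) / (right - center)
    return fb

def log_mel(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Log-mel spectrogram of one channel, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2        # (n_frames, bins)
    return np.log(mel_filterbank(sr, n_fft, n_mels) @ power.T + 1e-10)

def mfcc(x, n_ceps=13, **kw):
    """Cepstral features: DCT-II of the log-mel bands (an MFCC-style recipe)."""
    lm = log_mel(x, **kw)
    n = lm.shape[0]
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(n) + 0.5) / n)
    return dct @ lm

# Stack per-channel spectral maps into a (channels, mels, frames) tensor
# that an image-style DCNN can consume; 4 hypothetical microphone channels.
rng = np.random.default_rng(0)
multichannel = rng.standard_normal((4, 16000))             # 4 mics, 1 s at 16 kHz
features = np.stack([log_mel(ch) for ch in multichannel])
print(features.shape)                                      # (4, 40, 61)
```

Stacking per-channel log-mel maps yields an image-like tensor for the DCNN's input layer, while the DCT of the log-mel bands gives the corresponding cepstral features.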


Citations
A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification
A detailed study of the most widely used cepstral and spectral features for multi-channel audio applications, together with spectro-temporal features, is presented, and the development of a compact version of the AlexNet model for computationally limited platforms is detailed.
Identifying Sound Source Node Locations Using Neural Networks Trained with Phasograms
This work focuses on the phase component of the STFT coefficients to estimate the sound source location, classifying the closest microphone array (node) by mapping phase-difference information in the time-frequency domain.
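The phase-difference input described above can be sketched as the per-bin STFT phase difference between two channels; the two-microphone setup, tone frequency, and five-sample delay here are invented for illustration, not taken from the paper:

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Complex STFT, shape (n_fft // 2 + 1, n_frames)."""
    w = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * w for i in range(n_frames)])
    return np.fft.rfft(frames, n_fft).T

# Hypothetical two-microphone node: mic_b hears the source 5 samples later.
sr, delay, freq = 8000, 5, 440
src = np.sin(2 * np.pi * freq * np.arange(sr + delay) / sr)
mic_a, mic_b = src[delay:], src[:-delay]

# Inter-channel phase difference per time-frequency bin, wrapped to (-pi, pi].
ipd = np.angle(stft(mic_a) * np.conj(stft(mic_b)))

# At the 440 Hz bin the difference approximates 2*pi*freq*delay/sr (~1.73 rad).
peak_bin = round(freq * 256 / sr)                # bin width = sr / n_fft
print(peak_bin, round(float(np.median(ipd[peak_bin])), 2))
```

The delay-dependent phase slope across frequency is what lets a classifier map such features to the nearest node.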
DASEE A Synthetic Database of Domestic Acoustic Scenes and Events in Dementia Patients Environment
This work details an approach to generating an unbiased synthetic domestic audio database consisting of sound scenes and events, emulated in both quiet and noisy environments, and presents an 11-class database containing excerpts of clean and noisy signals.
An Application for Dementia Patient Monitoring with Sound Level Assessment Tool
This work proposes an application with an intuitive interface that allows acoustic monitoring of the patient without infringing their privacy, and implements a sound level assessment tool in which time-averaged sound levels are compared to the levels recommended for the specific location and time of day.
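The time-averaged level comparison in such a tool can be sketched as an equivalent continuous level (Leq) computed over a window; the reference, threshold, and test tone below are placeholders, not the application's actual settings:

```python
import numpy as np

def leq_db(x, ref=1.0):
    """Equivalent continuous level: 10*log10 of the mean square over the window."""
    return 10.0 * np.log10(np.mean(np.square(x)) / ref**2 + 1e-12)

# Hypothetical check against a recommended level for a given room and time of day.
recommended_db = -6.0                              # placeholder threshold (dBFS)
t = np.arange(48000) / 48000.0
window = 0.5 * np.sin(2 * np.pi * 1000 * t)        # half-scale 1 kHz tone, 1 s
level = leq_db(window)
print(round(level, 1), level > recommended_db)     # ~ -9.0 dBFS, below threshold
```

In a deployed tool the window would slide over the incoming stream and the threshold would be looked up per room and time of day.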

References
A convolutional neural network approach for acoustic scene classification
This paper proposes the use of a CNN trained to classify short sequences of audio represented by their log-mel spectrograms, and introduces a training method that can be used under particular circumstances to make full use of small datasets.
Acoustic Features for Environmental Sound Analysis
This chapter describes the general processing chain for converting a sound signal into a feature vector that can be efficiently exploited by a classifier, and relates these features to those used for speech and music processing.
Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition
  • Chanwoo Kim, R. Stern
  • Computer Science
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2016
Experimental results demonstrate that PNCC processing provides substantial improvements in recognition accuracy compared to MFCC and PLP processing for speech in the presence of various types of additive noise and in reverberant environments, with only slightly greater computational cost than conventional MFCC processing.
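The key difference from MFCC named above, the power-law nonlinearity, can be sketched in isolation; the rest of the PNCC chain (gammatone filterbank, medium-time power-bias subtraction) is omitted, and the filterbank energies below are synthetic:

```python
import numpy as np

# MFCC-style log compression vs. the PNCC power-law nonlinearity x**(1/15).
def log_compress(energies, eps=1e-12):
    return np.log(np.asarray(energies, dtype=float) + eps)

def power_law(energies, exponent=1.0 / 15.0):
    return np.asarray(energies, dtype=float) ** exponent

# Synthetic filterbank energies spanning eight orders of magnitude.
e = np.logspace(-4, 4, 5)
compressed = power_law(e)
print(compressed)   # stays positive and finite even as energies approach zero
```

Unlike the log, the power law is bounded below at zero energy, which is one reason the paper reports better robustness to additive noise.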
Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features
The proposed SED system is compared against a state-of-the-art single-channel method on the development subset of the TUT Sound Events Detection 2016 database, and the use of spatial and harmonic features is shown to improve SED performance.
Deep Convolutional Neural Network with Scalogram for Audio Scene Modeling
An approach to learning audio scene patterns from scalograms, extracted from the raw signal with simple wavelet transforms, is proposed; experiments showed that multi-scale features led to a clear increase in accuracy.
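A scalogram of the kind described can be sketched with a hand-rolled Morlet continuous wavelet transform; the center frequency w0, scale range, and test tone are illustrative choices, not the paper's configuration:

```python
import numpy as np

def scalogram(x, scales, w0=6.0):
    """Magnitude CWT with a complex Morlet wavelet, one row per scale."""
    out = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):
        n = int(10 * s) | 1                       # odd support, ~10 scales wide
        t = np.arange(n) - n // 2
        wavelet = np.exp(1j * w0 * t / s - 0.5 * (t / s) ** 2) / np.sqrt(s)
        out[i] = np.abs(np.convolve(x, wavelet, mode="same"))
    return out

# A 100 Hz tone should light up near scale s = w0 * sr / (2 * pi * f), ~19 here.
sr = 2000
x = np.sin(2 * np.pi * 100 * np.arange(sr) / sr)
scales = np.arange(5, 40)
sg = scalogram(x, scales)
best = scales[np.argmax(sg.mean(axis=1))]
print(sg.shape, best)
```

Each row of the scalogram covers a different scale (inversely, frequency), giving the multi-scale image that the paper feeds to its DCNN.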
Audio source separation with time-frequency velocities
A new approach is introduced that relies on the time dynamics of rigid audio models based on harmonic templates, providing piecewise-constant velocity approximations for blind source separation from single-channel audio signals.
Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction
Experimental results demonstrate that PNCC processing provides substantial improvements in recognition accuracy compared to MFCC and PLP processing for various types of additive noise.
The SINS Database for Detection of Daily Activities in a Home Environment Using an Acoustic Sensor Network
A database recorded in a single home over a period of one week is introduced, containing activities performed in a spontaneous manner, captured by an acoustic sensor network and recorded as a continuous stream.
This technical report describes the design and implementation of the system used for the DCASE 2018 Challenge submission, and proposes data augmentation techniques that shuffle and mix two sounds from the same class to mitigate the unbalanced training dataset.