Scalogram Neural Network Activations with Machine Learning for Domestic Multi-channel Audio Classification

@inproceedings{copiaco2019scalogram,
  title={Scalogram Neural Network Activations with Machine Learning for Domestic Multi-channel Audio Classification},
  author={Abigail Copiaco and Christian Ritz and Stefano Fasciani and Nidhal Abdulaziz},
  booktitle={2019 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)},
  year={2019}
}
  • Published 1 December 2019
  • Computer Science
Current methodologies for audio classification, particularly multi-channel audio, commonly involve individual deep learning approaches. In this paper, we look at domestic multi-channel audio classification through a comparison of various combinations of existing pre-trained Neural Network (NN) models with a Support Vector Machine (SVM) for classification. The NN model is first trained with spectro-temporal features extracted from the audio, characterized by scalogram images…
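The pipeline the abstract describes — scalogram images fed to a pre-trained NN, with an SVM classifying the activations — begins with computing a scalogram from the audio. As a minimal sketch of that first step, the hand-rolled Morlet scalogram below stands in for the toolbox scalograms used in the paper; the wavelet parameter `w`, the analysis frequencies, and the test tone are all illustrative, not taken from the paper.

```python
import numpy as np

def morlet_scalogram(x, fs, freqs, w=6.0):
    """CWT magnitude (scalogram) of signal x via Morlet wavelets.

    Minimal illustrative stand-in for the scalogram features described
    in the abstract; freqs are the analysis frequencies in Hz.
    """
    n = len(x)
    t = np.arange(-n // 2, n // 2) / fs
    out = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        s = w / (2 * np.pi * f)              # Gaussian width for centre frequency f
        wavelet = np.exp(1j * 2 * np.pi * f * t) * np.exp(-t**2 / (2 * s**2))
        wavelet /= np.sqrt(s)                # energy normalisation across scales
        out[i] = np.abs(np.convolve(x, wavelet, mode="same"))
    return out

# 1 s of a 50 Hz tone sampled at 1 kHz: energy should concentrate in the 50 Hz row
fs = 1000
x = np.sin(2 * np.pi * 50 * np.arange(fs) / fs)
S = morlet_scalogram(x, fs, freqs=np.array([25.0, 50.0, 100.0]))
print(S.shape)  # (3, 1000)
```

In the paper's full pipeline, images of such scalograms would then be passed through a pre-trained NN, whose intermediate activations serve as the feature vectors for the SVM.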


Identifying Sound Source Node Locations Using Neural Networks Trained with Phasograms
This work focuses on the phase component of the STFT coefficients, estimating the sound source location by classifying the closest microphone array (node) from the phase-difference information in the time-frequency domain.
Automated detection of the head-twitch response using wavelet scalograms and a deep convolutional neural network
An automated method is developed that can detect head twitches unambiguously, without relying on features in the amplitude-time domain, and can be used to automate HTR detection with robust sensitivity and reliability.
DASEE A Synthetic Database of Domestic Acoustic Scenes and Events in Dementia Patients Environment
This work details its approach on generating an unbiased synthetic domestic audio database, consisting of sound scenes and events, emulated in both quiet and noisy environments, and presents an 11-class database containing excerpts of clean and noisy signals.
Multi-channel Convolutional Neural Networks with Multi-level Feature Fusion for Environmental Sound Classification
The proposed method, inspired by VGG networks, outperforms state-of-the-art end-to-end methods for environmental sound classification in terms of classification accuracy.
Deep Scalogram Representations for Acoustic Scene Classification
Spectrogram representations of acoustic scenes have achieved competitive performance for acoustic scene classification. Yet, the spectrogram alone does not take into account a substantial amount of…
Acoustic Scene Classification Using Deep Convolutional Neural Network and Multiple Spectrograms Fusion
By fusing DCNN features of the standard and CQT spectrograms, classification accuracy is significantly improved in the authors' experiments compared with single-spectrogram schemes, demonstrating the effectiveness of the proposed multi-spectrogram fusion method.
Mixup-Based Acoustic Scene Classification Using Multi-Channel Convolutional Neural Network
This paper explores the use of a multi-channel CNN for the classification task, aiming to extract features from different channels in an end-to-end manner, and the use of the mixup method, which provides higher prediction accuracy and robustness in contrast with previous models.
Acoustic scene classification using convolutional neural networks and multi-scale multi-feature extraction
Experimental results show that multi-scale multi-feature extraction significantly improves system performance; the resulting convolutional neural network outperforms the baseline approach by a large margin.
Domestic Cat Sound Classification Using Learned Features from Deep Neural Nets
The domestic cat (Felis catus) is one of the most popular pets in the world and produces a variety of sounds according to its mood and situation; this work builds a small dataset named CatSound across 10 categories to address the automatic classification of cat sounds using machine learning.
This paper proposes folded mean aggregation, which multiplies the output probabilities of static and delta augmentation data from the same window prior to audio clip-wise aggregation; this method is found to further reduce the error rate.
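As a concrete reading of the aggregation just described, the sketch below multiplies per-window class probabilities from the static and delta streams, then averages over a clip's windows; the probability values are hypothetical, chosen only to illustrate the operation order.

```python
import numpy as np

# Hypothetical per-window class probabilities for one audio clip:
# rows = analysis windows, columns = classes.
p_static = np.array([[0.7, 0.3],
                     [0.6, 0.4]])
p_delta  = np.array([[0.8, 0.2],
                     [0.5, 0.5]])

# Folded mean aggregation: multiply the static and delta probabilities
# window-wise first, then average over the clip's windows.
folded = p_static * p_delta          # element-wise, per window
clip_scores = folded.mean(axis=0)    # clip-wise aggregation
pred = clip_scores.argmax()
print(clip_scores, pred)             # class 0 wins for these numbers
```

Multiplying before averaging rewards windows where both streams agree, which is the intuition the summary attributes to the method.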
Improved Audio Scene Classification Based on Label-Tree Embeddings and Convolutional Neural Networks
This paper investigates classification with label-tree embedding features learned from different low-level features, as well as their fusion; the low-level features are quantized and reduced to the likelihoods of the metaclasses, on which template learning and matching are efficient.
ImageNet classification with deep convolutional neural networks
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 different classes, employing a recently developed regularization method called "dropout" that proved to be very effective.
Speech and crosstalk detection in multichannel audio
Tests performed on a large corpus of recorded meetings show classification accuracies of up to 96%, and automatic speech recognition performance close to that obtained using ground truth segmentation.