Mixup-Based Acoustic Scene Classification Using Multi-Channel Convolutional Neural Network

@article{Xu2018MixupBasedAS,
  title={Mixup-Based Acoustic Scene Classification Using Multi-Channel Convolutional Neural Network},
  author={Kele Xu and Dawei Feng and Haibo Mi and Boqing Zhu and Dezhi Wang and Lilun Zhang and Hengxing Cai and S. Liu},
  journal={ArXiv},
  year={2018},
  volume={abs/1805.07319}
}
Audio scene classification, the problem of predicting class labels of audio scenes, has drawn considerable attention in recent years. However, it remains challenging, and existing approaches fall short in accuracy and efficiency. Recently, Convolutional Neural Network (CNN)-based methods have achieved better performance than traditional methods. Nevertheless, a conventional single-channel CNN may overlook additional cues that may be embedded in multi-channel recordings… 
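The mixup augmentation named in the title can be illustrated with a minimal sketch. It follows the general mixup formulation (a convex combination of two training examples and of their labels, with the weight drawn from a Beta distribution) and is not necessarily the exact configuration used in this paper. The function name `mixup_batch`, the default `alpha=0.2`, and the (batch, channels, frequency, time) array layout are assumptions made for illustration.

```python
import numpy as np


def mixup_batch(x, y, alpha=0.2, rng=None):
    """Blend every example in a batch with a randomly chosen partner.

    x: float array, assumed shape (batch, channels, freq, time)
    y: one-hot labels, shape (batch, num_classes)
    alpha: Beta(alpha, alpha) parameter (illustrative default, not from the paper)
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # one mixing weight in [0, 1] for the batch
    perm = rng.permutation(x.shape[0])    # partner index for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam


# Hypothetical batch: 4 two-channel spectrograms and 3 scene classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2, 64, 128))
y = np.eye(3)[[0, 1, 2, 0]]
x_mix, y_mix, lam = mixup_batch(x, y, rng=rng)
```

Because the labels are mixed with the same weight as the inputs, each row of `y_mix` still sums to 1 and can be used directly with a cross-entropy loss that accepts soft targets.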

Sample Dropout for Audio Scene Classification Using Multi-Scale Dense Connected Convolutional Neural Network
TLDR
Inspired by silence removal in speech signal processing, a novel sample dropout approach is proposed that removes outliers from the training dataset and can further improve the classification robustness of the multi-scale DenseNet.
CNNs-based Acoustic Scene Classification using Multi-Spectrogram Fusion and Label Expansions
TLDR
A novel multi-spectrogram fusion framework is proposed in which the spectrograms complement each other; it achieves promising accuracies on both the DCASE2017 and the LITIS Rouen datasets.
Environmental Sound Classification Based on Multi-temporal Resolution Convolutional Neural Network Combining with Multi-level Features
TLDR
Results demonstrate that the proposed method, which employs multi-temporal resolution and multi-level features, is highly effective in the classification tasks and outperforms previous methods that account only for single-level features.
Acoustic Scene Classification from Binaural Signals using Convolutional Neural Networks
TLDR
This paper describes the audio pre-processing, feature extraction steps and the time-frequency representations employed for acoustic scene classification using binaural recordings, and proposes two distinct and light-weight architectures of convolutional neural networks for processing the extracted audio features and classification.
Clustering by Errors: A Self-Organized Multitask Learning Method for Acoustic Scene Classification
TLDR
The experiments have shown that the proposed multitask learning method improves the performance of ASC, and the similarity relations amongst scenes are correlated with the classification error.
Low-complexity deep learning frameworks for acoustic scene classification
TLDR
This report presents low-complexity deep learning frameworks for acoustic scene classification (ASC) and fuses the probabilities obtained from three individual classifiers, each independently trained on one of three types of spectrograms, to achieve the best classification accuracy.
A Two-Stage Approach to Device-Robust Acoustic Scene Classification
  • Hu Hu, C. Yang, Chin-Hui Lee
  • Computer Science, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
TLDR
The results show that the proposed ASC system attains state-of-the-art accuracy on the development set, where the best system, a two-stage fusion of CNN ensembles, delivers an 81.9% average accuracy on multi-device test data and obtains a significant improvement on unseen devices.
Multi-Scale Recalibrated Features Fusion for Acoustic Scene Classification Technical Report
TLDR
This work embeds the Squeeze-and-Excitation unit in the Xception backbone to recalibrate the channel weights of the feature maps in each block, and uses the Mixup method to augment the data during training to reduce over-fitting.
A Robust Framework for Acoustic Scene Classification
TLDR
This paper aims at a robust framework that can explore and utilise a range of different time-frequency features and neural networks, either singly or merged, to achieve good classification performance.

References

SHOWING 1-10 OF 29 REFERENCES
Classifying Variable-Length Audio Files with All-Convolutional Networks and Masked Global Pooling
TLDR
A deep all-convolutional neural network with masked global pooling, used for single-label acoustic scene classification and multi-label domestic audio tagging in the DCASE-2016 contest, improves on the baselines by a relative 17% and 19%, respectively.
Acoustic scene classification using convolutional neural network and multiple-width frequency-delta data augmentation
TLDR
This work applies ConvNet to acoustic scene classification, and shows that the error rate can be further decreased by using delta features in the frequency domain, and describes a ConvNet output aggregation method designed for MWFD augmentation, folded mean aggregation, which combines output probabilities of static and MWFD features from the same analysis window using multiplication first.
Convolutional gated recurrent neural network incorporating spatial features for audio tagging
TLDR
This paper proposes using a convolutional neural network (CNN) to extract robust features from mel-filter banks, spectrograms, or even raw waveforms for audio tagging, and evaluates the proposed methods on Task 4 of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge.
AN I-VECTOR BASED APPROACH FOR AUDIO SCENE DETECTION
TLDR
The i-vector system is state-of-the-art in Speaker Verification and Scene Detection and outperforms conventional Gaussian Mixture Model (GMM)-based approaches; it compensates for undesired acoustic variability and extracts information from the acoustic environment, making it a meaningful choice for detection on UGC.
CP-JKU SUBMISSIONS FOR DCASE-2016 : A HYBRID APPROACH USING BINAURAL I-VECTORS AND DEEP CONVOLUTIONAL NEURAL NETWORKS
TLDR
This report describes the 4 submissions for Task 1 (Audio scene classification) of the DCASE-2016 challenge of the CP-JKU team and proposes a novel i-vector extraction scheme for ASC using both left and right audio channels and a Deep Convolutional Neural Network architecture trained on spectrograms of audio excerpts in end-to-end fashion.
Time Series Classification Using Multi-Channels Deep Convolutional Neural Networks
TLDR
A novel deep learning framework for multivariate time series classification is proposed that is not only more efficient than the state of the art but also competitive in accuracy, demonstrating that feature learning is worth investigating for time series classification.
Performance comparison of GMM, HMM and DNN based approaches for acoustic event detection within Task 3 of the DCASE 2016 challenge
TLDR
It is shown that the DNN-based system performs worse than the traditional systems for this task and that the best results are achieved using GFB features in combination with a single-label GMM-HMM approach.
THE UP SYSTEM FOR THE 2016 DCASE CHALLENGE USING DEEP RECURRENT NEURAL NETWORK AND MULTISCALE KERNEL SUBSPACE LEARNING
TLDR
A system for acoustic scene classification using pairwise decomposition with deep neural networks and dimensionality reduction by multiscale kernel subspace learning and a description of the actual system submitted to the DCASE2016 challenge is provided.
Score Fusion of Classification Systems for Acoustic Scene Classification
TLDR
This study explores several methods in three areas (feature extraction, generative/discriminative machine learning, and score fusion for the final decision) on the acoustic scene classification task of the IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events.
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.