An Interpretable Deep Learning Model for Automatic Sound Classification

  Pablo Zinemanas, Martín Rocamora, Marius Miron, Frederic Font, Xavier Serra
Deep learning models have improved cutting-edge technologies in many research areas, but their black-box structure makes it difficult to understand their inner workings and the rationale behind their predictions. This may lead to unintended effects, such as susceptibility to adversarial attacks or the reinforcement of biases. Despite the increasing interest in developing deep learning models that provide explanations of their decisions, research in the audio domain is still lacking…


A novel interpretable model for polyphonic sound event detection is proposed that addresses a limitation of previous work, namely the inability to handle a multi-label setting properly. Its performance is comparable to that of two opaque baselines, with fewer parameters, while offering interpretability.

A Model You Can Hear: Audio Identification with Playable Prototypes

An audio identification model is proposed that uses learnable spectral prototypes, combined with dedicated transformation networks, to cluster and classify input audio samples from large collections of sounds.

Listen to Interpret: Post-hoc Interpretability for Audio Networks with NMF

A novel interpreter design is proposed that incorporates non-negative matrix factorization (NMF) and is trained to take hidden layer representations of the targeted network as input and produce time activations of pre-learnt NMF components as intermediate outputs.
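The interpreter maps a network's hidden representations to time activations of pre-learnt NMF components. The NMF step itself can be sketched with the classic Lee-Seung multiplicative updates; this is a generic illustration of factoring a non-negative (e.g. spectrogram-like) matrix, not the authors' implementation:

```python
import numpy as np

def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (freq x time) into W (freq x k)
    spectral components and H (k x time) activations, using
    Lee-Seung multiplicative updates for the Frobenius objective."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, k)) + eps
    H = rng.random((k, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update components
    return W, H

# Toy "spectrogram": an exact mixture of two spectral patterns over time
rng = np.random.default_rng(1)
V = rng.random((32, 2)) @ rng.random((2, 50))
W, H = nmf(V, k=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The multiplicative form keeps `W` and `H` non-negative by construction, which is what makes the learnt components readable as additive spectral parts.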

Sound Classification and Processing of Urban Environments: A Systematic Literature Review

The review finds that deep learning architectures, attention mechanisms, data augmentation techniques, and pretraining are the most crucial factors to consider when building an efficient sound classification model.

Concept-Based Techniques for "Musicologist-friendly" Explanations in a Deep Music Classifier

This research focuses on more human-friendly explanations based on high-level musical concepts and explores two approaches: a supervised one, where the user can define a musical concept and test whether it is relevant to the system; and an unsupervised one, where musical excerpts containing relevant concepts are automatically selected and given to the user for interpretation.

Prototype Learning for Interpretable Respiratory Sound Analysis

  • Zhao Ren, T. Nguyen, W. Nejdl
  • Computer Science
    ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2022
A prototype learning framework is proposed that jointly generates exemplar samples for explanation and integrates these samples into a layer of DNNs; it outperforms state-of-the-art approaches on the largest public respiratory sound database.

Light Field Image Quality Enhancement by a Lightweight Deformable Deep Learning Framework for Intelligent Transportation Systems

A Lightweight Deformable Deep Learning Framework is implemented that addresses the disparity problem in light field (LF) images by introducing an angular alignment module and a soft activation function into the Convolutional Neural Network.

Acoustic Sensor-Based Approach for Detecting Damage in Masonry Structures

This article proposes an acoustic sensor-based classification model whose task is to detect damage in masonry structures. In impact acoustics, a small metal object is used to tap on a…

ProtoMF: Prototype-based Matrix Factorization for Effective and Explainable Recommendations

The idea of prototypes is extended to the recommender system domain by introducing ProtoMF, a novel collaborative filtering algorithm that learns sets of user/item prototypes that represent the general consumption characteristics of users/items in the underlying dataset.

Wind Sounds Classification Using Different Audio Feature Extraction Techniques

Experimental results show that each of these feature extraction methods gives different results, with the classification accuracy obtained using PLP features being the best.

Local Interpretable Model-Agnostic Explanations for Music Content Analysis

This work proposes three versions of model-agnostic explanations for the neural network model, based on frequency and time-frequency segmentation, and demonstrates that despite achieving 71.4% classification accuracy, the decision tree model fails to generalise.

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

It is shown that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a “shallow” dictionary learning model with augmentation.

SampleCNN: End-to-End Deep Convolutional Neural Networks Using Very Small Filters for Music Classification

A CNN architecture is proposed that learns representations using sample-level filters beyond typical frame-level input representations; it is extended using a multi-level and multi-scale feature aggregation technique, and transfer learning is subsequently conducted for several music classification tasks.
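"Sample-level" here means very small filters (width 3, stride 3) applied directly to the raw waveform rather than to frame-level spectrogram inputs. A minimal numpy sketch of the first such block, assuming the 59049-sample (3^10) input length used in the SampleCNN paper; filter weights here are random placeholders, not a trained model:

```python
import numpy as np

def conv1d(x, kernels, stride):
    """Valid 1D convolution: x is a raw waveform (length,),
    kernels is (n_filters, width). Returns (n_frames, n_filters)."""
    n_f, w = kernels.shape
    n_out = (len(x) - w) // stride + 1
    # Gather strided sample windows, then one matmul applies all filters
    frames = np.stack([x[i * stride:i * stride + w] for i in range(n_out)])
    return frames @ kernels.T

rng = np.random.default_rng(0)
waveform = rng.standard_normal(59049)              # 3^10 raw audio samples
first = conv1d(waveform, rng.standard_normal((16, 3)), stride=3)
hidden = np.maximum(first, 0)                      # ReLU
# each subsequent block convolves (width 3) and max-pools by 3,
# shrinking the time axis by 3x per block until one vector remains
```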

Deep Learning for Case-based Reasoning through Prototypes: A Neural Network that Explains its Predictions

This work creates a novel network architecture for deep learning that naturally explains its own reasoning for each prediction, and the explanations are faithful to what the network actually computes.

Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals

This paper presents a novel audio dataset of English spoken digits, used for classification of spoken digits and speaker gender, and confirms that the networks rely heavily on features marked as relevant by layer-wise relevance propagation (LRP).

Intriguing properties of neural networks

It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, suggesting that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.

This looks like that: deep learning for interpretable image recognition

A deep network architecture, the prototypical part network (ProtoPNet), is proposed that reasons in a way similar to how ornithologists, physicians, and others would explain to people how to solve challenging image classification tasks, providing a level of interpretability that is absent in other interpretable deep models.
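ProtoPNet's core operation scores each learnt prototype by its closest latent patch in the input: the squared L2 distance between a patch and a prototype is passed through a log activation and max-pooled over spatial positions. A minimal sketch of that similarity computation (random placeholder features, not a trained network):

```python
import numpy as np

def prototype_similarities(patches, prototypes, eps=1e-4):
    """patches: (n_patches, d) latent patches of one image, flattened
    from the conv feature map; prototypes: (m, d) learnt prototypes.
    Returns one similarity score per prototype: ProtoPNet's
    log((d2 + 1) / (d2 + eps)) activation of the squared L2 distance,
    max-pooled over spatial patches."""
    d2 = ((patches[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    sim = np.log((d2 + 1.0) / (d2 + eps))   # large when a patch is close
    return sim.max(axis=0)                  # best-matching patch per prototype

rng = np.random.default_rng(0)
patches = rng.standard_normal((49, 128))    # e.g. a 7x7 latent map, flattened
prototypes = rng.standard_normal((10, 128))
prototypes[0] = patches[5]                  # prototype 0 matches one patch exactly
scores = prototype_similarities(patches, prototypes)
```

Because each prototype is (periodically projected to be) an actual training patch, the max-scoring patch can be shown to the user as "this part of the input looks like that part of a training example".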

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

Concept Activation Vectors (CAVs) are introduced, which provide an interpretation of a neural net's internal state in terms of human-friendly concepts, and may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
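A CAV is the normal vector of a linear classifier trained to separate activations of concept examples from activations of random examples; the TCAV score is then the fraction of class examples whose directional derivative along the CAV is positive. A self-contained sketch with a hand-rolled logistic-regression probe on synthetic activations (all data here is illustrative):

```python
import numpy as np

def train_cav(concept_acts, random_acts, lr=0.1, n_iter=500):
    """Fit a linear probe (logistic regression via gradient descent)
    separating concept activations from random ones; the weight
    vector is the CAV, pointing toward the concept."""
    X = np.vstack([concept_acts, random_acts])
    y = np.r_[np.ones(len(concept_acts)), np.zeros(len(random_acts))]
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w

def tcav_score(gradients, cav):
    """Fraction of examples whose class-score gradient has a positive
    component along the CAV, i.e. for which the concept increases
    the class score."""
    return float((gradients @ cav > 0).mean())

rng = np.random.default_rng(0)
# Synthetic layer activations: the "concept" is a shift along dimension 0
concept = rng.standard_normal((50, 20)) + np.r_[3.0, np.zeros(19)]
random_ = rng.standard_normal((50, 20))
cav = train_cav(concept, random_)
# Synthetic per-example gradients, mostly aligned with the concept direction
grads = rng.standard_normal((200, 20)) + np.r_[1.0, np.zeros(19)]
score = tcav_score(grads, cav)
```

In practice the significance of a score is checked against CAVs trained on multiple random splits; that bootstrap step is omitted here.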

Concept Whitening for Interpretable Image Recognition

This work introduces a mechanism, called concept whitening (CW), that alters a given layer of the network to allow a better understanding of the computation leading up to that layer, and can provide a much clearer picture of how the network gradually learns concepts over layers.