Deep Learning and Music Adversaries

@article{Kereliuk2015DeepLA,
  title={Deep Learning and Music Adversaries},
  author={Corey Kereliuk and Bob L. Sturm and Jan Larsen},
  journal={IEEE Transactions on Multimedia},
  year={2015},
  volume={17},
  pages={2059--2071}
}
An adversary is an agent designed to make a classification system perform in some particular way, e.g., increase the probability of a false negative. Recent work builds adversaries for deep learning systems applied to image object recognition, exploiting the parameters of the system to find the minimal perturbation of the input image such that the system misclassifies it with high confidence. We adapt this approach to construct and deploy an adversary of deep learning systems applied to music… 
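
The idea described in the abstract, using a system's own parameters to find a small input change that produces a confident misclassification, can be illustrated with a minimal sketch. This is not the authors' adversary: it assumes a toy linear softmax classifier with random weights standing in for a deep network, and the names `targeted_adversary`, `step`, `confidence`, and `max_iter` are hypothetical choices made for illustration only.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def targeted_adversary(x, W, b, target, step=0.01, confidence=0.9, max_iter=5000):
    """Take small normalized gradient steps on the input until the toy
    classifier softmax(W @ x + b) gives `target` at least `confidence`
    probability, or until `max_iter` steps have been taken."""
    x_adv = x.copy()
    for _ in range(max_iter):
        p = softmax(W @ x_adv + b)
        if p[target] >= confidence:
            break
        # Gradient of log p[target] with respect to the input,
        # computed from the classifier's parameters W.
        onehot = np.zeros_like(p)
        onehot[target] = 1.0
        grad = W.T @ (onehot - p)
        x_adv += step * grad / (np.linalg.norm(grad) + 1e-12)
    return x_adv


# Hypothetical toy setup: 3 classes, 20 input features, random parameters.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 20))
b = np.zeros(3)
x = rng.normal(size=20)

original = int(np.argmax(softmax(W @ x + b)))
target = (original + 1) % 3  # any class other than the original one

x_adv = targeted_adversary(x, W, b, target)
p_adv = softmax(W @ x_adv + b)

print("original class:", original)
print("adversarial class:", int(np.argmax(p_adv)))
print("perturbation norm:", float(np.linalg.norm(x_adv - x)))
```

The loop stops as soon as the confidence threshold is reached, so the perturbation stays small, although this sketch makes no claim that it is the minimal one.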

Universal Adversarial Audio Perturbations
TLDR
This work demonstrates the existence of universal adversarial perturbations, which can fool a family of audio classification architectures in both targeted and untargeted attack scenarios, and proves that the proposed penalty method theoretically converges to a solution that corresponds to universal adversaries.
Attack Agnostic Statistical Method for Adversarial Detection
TLDR
This paper presents a novel statistical approach for adversarial detection in image classification based on constructing a per-class feature distribution and detecting adversaries based on comparison of features of a test image with the feature distribution of its class.
Classification score approach for detecting adversarial example in deep neural network
TLDR
A new method for detecting adversarial examples is proposed that does not invoke any additional process and detects adversarial examples by using a pattern feature of their classification scores.
Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions
TLDR
A novel probabilistic framework is proposed that generalizes and extends adversarial attacks in order to produce a desired probability distribution over the classes when the attack method is applied to a large number of inputs.
Advanced Ensemble Adversarial Example on Unknown Deep Neural Network Classifiers
TLDR
This work proposes a generalized adversarial example attack method that can effectively attack unknown classifiers by using a hierarchical ensemble method and shows that the proposed method can achieve attack success rates for an unknown classifier of up to 9.25% and 18.94% higher on MNIST data and CIFAR10 data, respectively.
Efficient Generation of Speech Adversarial Examples with Generative Model
TLDR
This paper proposes to use the conditional generative adversarial network (CGAN) to efficiently generate speech adversarial examples and demonstrates the transferability of generated adversarial examples among different KWS classification networks.
SirenAttack: Generating Adversarial Audio for End-to-End Acoustic Systems
TLDR
SirenAttack is evaluated on a set of state-of-the-art deep learning-based acoustic systems (including speech command recognition, speaker recognition and sound event classification), with results showing the versatility, effectiveness, and stealthiness of SirenAttack.
Random Untargeted Adversarial Example on Deep Neural Network
TLDR
This paper proposes a new method for generating untargeted adversarial examples, one that uses an arbitrary class in the generation process, and shows that the proposed scheme can be applied to steganography and can be used to fool humans.
On the human evaluation of audio adversarial examples
TLDR
This paper investigates to what extent the distortion metrics proposed in the literature for audio adversarial examples, which are commonly applied to evaluate the effectiveness of methods for generating these attacks, are a reliable measure of the human perception of the perturbations.

References

SHOWING 1-10 OF 91 REFERENCES
Deep learning, audio adversaries, and music content analysis
TLDR
This work designs an adversary for a DNN that takes as input short-time spectral magnitudes of recorded music and outputs a high-level music descriptor, and demonstrates how this adversary can make the DNN behave in any way with only extremely minor changes to the music recording signal.
Towards Deep Neural Network Architectures Robust to Adversarial Examples
TLDR
Deep Contractive Network is proposed, a model with a new end-to-end training procedure that includes a smoothness penalty inspired by the contractive autoencoder (CAE) to increase the network robustness to adversarial examples, without a significant performance penalty.
Deep neural networks are easily fooled: High confidence predictions for unrecognizable images
TLDR
This work takes convolutional neural networks trained to perform well on either the ImageNet or MNIST datasets and finds images with evolutionary algorithms or gradient ascent that DNNs label with high confidence as belonging to each dataset class, and produces fooling images, which are then used to raise questions about the generality of DNN computer vision.
Learning Features from Music Audio with Deep Belief Networks
TLDR
This work presents a system that can automatically extract relevant features from audio for a given task by using a Deep Belief Network on Discrete Fourier Transforms of the audio to solve the task of genre recognition.
Unsupervised feature learning for audio classification using convolutional deep belief networks
In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. However, to our knowledge, these deep learning
Deep Learning
TLDR
Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
¿El Caballo Viejo? Latin Genre Recognition with Deep Learning and Spectral Periodicity
TLDR
It is found that tempo changes of at most ±6% greatly diminish and improve the “winning” deep learning system using LMD, and this supports the conclusion that the system is exploiting some low-level absolute time characteristics to reproduce ground truth in LMD.
Improved music feature learning with deep neural networks
  • Siddharth Sigtia, S. Dixon
  • Computer Science
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014
TLDR
Three ways to improve feature learning for audio data using neural networks are examined: using Rectified Linear Units (ReLUs) instead of standard sigmoid units; using a powerful regularisation technique called Dropout; and using Hessian-Free (HF) optimisation to improve training of sigmoid nets.
End-to-end learning for music audio
  • S. Dieleman, B. Schrauwen
  • Computer Science
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014
TLDR
Although convolutional neural networks do not outperform a spectrogram-based approach, the networks are able to autonomously discover frequency decompositions from raw audio, as well as phase- and translation-invariant feature representations.
A deep representation for invariance and music classification
TLDR
A novel, mid-level representation for acoustical signals is proposed, using the empirical distributions of projections on a set of templates and their transformations, and is empirically evaluated on music genre classification.