Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

@article{Carlini2018AudioAE,
  title={Audio Adversarial Examples: Targeted Attacks on Speech-to-Text},
  author={Nicholas Carlini and David A. Wagner},
  journal={2018 IEEE Security and Privacy Workshops (SPW)},
  year={2018},
  pages={1-7}
}
We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (recognizing up to 50 characters per second of audio). We apply our white-box iterative optimization-based attack to Mozilla's end-to-end DeepSpeech implementation and show it has a 100% success rate. The feasibility of this attack introduces a new domain to study adversarial examples.
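
The attack described above treats the perturbation itself as the variable of an optimization problem. As a rough, simplified illustration of that idea only (not the paper's actual implementation, which targets Mozilla's DeepSpeech and measures distortion in decibels relative to the source), the Python sketch below minimizes an L2 penalty on the perturbation plus the CTC loss of the chosen target phrase. Here logits_fn, the learning rate, and the trade-off constant c are hypothetical placeholders standing in for a differentiable speech-to-text model and its tuning.

# Simplified sketch of a white-box, iterative optimization-based targeted attack
# on a CTC-based speech-to-text model. `logits_fn` is a HYPOTHETICAL stand-in for
# a differentiable model (DeepSpeech-style) mapping a waveform to time-major
# character logits; it is not Mozilla's API.
import numpy as np
import tensorflow as tf

def db(x):
    # Peak loudness in decibels; the paper quantifies distortion as
    # dB_x(delta) = db(delta) - db(x), i.e. loudness relative to the source.
    return 20.0 * np.log10(np.max(np.abs(x)) + 1e-12)

def targeted_attack(x, target_labels, logits_fn, num_steps=1000, lr=10.0, c=1.0):
    """x: [T] float32 waveform in [-1, 1]; target_labels: [L] int32 transcript."""
    x = tf.constant(x, dtype=tf.float32)
    target = tf.expand_dims(tf.constant(target_labels, dtype=tf.int32), 0)
    delta = tf.Variable(tf.zeros_like(x))                 # perturbation being optimized
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    for _ in range(num_steps):
        with tf.GradientTape() as tape:
            adv = tf.clip_by_value(x + delta, -1.0, 1.0)  # keep a valid waveform
            logits = logits_fn(adv)                       # [frames, 1, num_chars]
            ctc = tf.nn.ctc_loss(
                labels=target,
                logits=logits,
                label_length=tf.shape(target)[1:2],
                logit_length=tf.shape(logits)[0:1],
                logits_time_major=True,
                blank_index=-1)
            # keep the perturbation small while forcing the target transcription
            loss = tf.reduce_sum(tf.square(delta)) + c * tf.reduce_sum(ctc)
        grads = tape.gradient(loss, [delta])
        opt.apply_gradients(zip(grads, [delta]))
    return tf.clip_by_value(x + delta, -1.0, 1.0).numpy()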


Audio Adversarial Examples: Attacks Using Vocal Masks
TLDR
The feasibility of this audio adversarial attack introduces a new domain to study machine and human perception of speech and shows that these adversarial examples fool state-of-the-art speech-to-text systems, yet humans are able to consistently pick out the speech.
Towards Mitigating Audio Adversarial Perturbations
TLDR
This work aims to explore the robustness of these audio adversarial examples generated via two attack strategies by applying different signal processing methods to recover the original audio sequence, and shows that by inspecting the temporal consistency in speech signals, it can potentially identify non-adaptive audio adversaries.
Adversarial attack on Speech-to-Text Recognition Models
TLDR
This paper introduces the first study of weighted-sampling audio adversarial examples, focusing on the number and positions of the distortions to reduce the search space, and proposes a new attack scenario, the audio injection attack, which offers novel insights into concealing adversarial attacks.
Towards Weighted-Sampling Audio Adversarial Example Attack.
TLDR
Experiments show that this method is the first in the field to generate audio adversarial examples with low noise and high robustness while keeping generation time on the order of minutes.
Weighted-Sampling Audio Adversarial Example Attack
TLDR
Weighted-sampling audio adversarial examples are proposed, focusing on the number and weights of the distortions to strengthen the attack, and a denoising term in the loss function is applied to make the adversarial attack more imperceptible.
Universal Adversarial Perturbations for Speech Recognition Systems
TLDR
This work proposes an algorithm to find a single quasi-imperceptible perturbation, which when added to any arbitrary speech signal, will most likely fool the victim speech recognition model.
EvolMusic: towards musical adversarial examples for black-box attacks on speech-to-text
TLDR
EvolMusic is presented, the first targeted adversarial attack based on musical note-sequences, generated via an adaptive evolutionary approach in a black-box setting and evaluated against DeepSpeech v0.9.1 using the Fluent Speech Commands dataset.
Perceptual Based Adversarial Audio Attacks
TLDR
This paper demonstrates a physically realizable audio adversarial attack, based on a psychoacoustic-property-based loss function and automated generation of room impulse responses, to create adversarial attacks that remain robust when played over a speaker in multiple environments.
Robust Audio Adversarial Example for a Physical Attack
TLDR
Evaluation and a listening experiment demonstrated that adversarial examples generated by the proposed method are able to attack a state-of-the-art speech recognition model in the physical world without being noticed by humans, suggesting that audio adversarial examples may become a real threat.
Generating Robust Audio Adversarial Examples Using Iterative Proportional Clipping
TLDR
A new approach to generating adversarial audio using Iterative Proportional Clipping (IPC), which exploits temporal dependency in the original audio to significantly limit human-perceptible noise and can bypass temporal-dependency-based defense mechanisms (a loose sketch of the proportional-clipping idea follows this list).
...
...
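
The "Generating Robust Audio Adversarial Examples Using Iterative Proportional Clipping" entry above names proportional clipping as its core operation. As a loose illustration of that general idea only (not the authors' algorithm), the sketch below bounds a candidate adversarial waveform, sample by sample, to stay within a fraction of the original signal's magnitude, so the perturbation follows the temporal envelope of the original audio; the function name and the `ratio` parameter are hypothetical.

import numpy as np

def proportional_clip(x, adv, ratio=0.05):
    """Clip the candidate adversarial waveform `adv` so each sample stays within
    +/- ratio * |x[i]| of the original sample x[i]. The perturbation is therefore
    largest where the original signal is loud and vanishes in silent regions,
    which keeps the added noise harder for a listener to notice."""
    bound = ratio * np.abs(x)
    return np.clip(adv, x - bound, x + bound)

# Example: apply after each gradient step of an iterative attack.
# adv = proportional_clip(x, adv_candidate, ratio=0.05)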

References

Crafting Adversarial Examples For Speech Paralinguistics Applications
TLDR
This work proposes a novel end-to-end scheme to generate adversarial examples by perturbing directly the raw waveform of an audio recording rather than specific acoustic features, which can lead to a significant performance drop of state-of-the-art deep neural networks.
Deep Learning and Music Adversaries
TLDR
This work builds adversaries for deep learning systems applied to image object recognition by exploiting the parameters of the system to find the minimal perturbation of the input image such that the system misclassifies it with high confidence.
Houdini: Fooling Deep Structured Prediction Models
TLDR
This work introduces a novel flexible approach named Houdini for generating adversarial examples specifically tailored for the final performance measure of the task considered, be it combinatorial and non-decomposable.
Adversarial Attacks on Neural Network Policies
TLDR
This work shows existing adversarial example crafting techniques can be used to significantly degrade test-time performance of trained policies, even with small adversarial perturbations that do not interfere with human perception.
Deep Speech: Scaling up end-to-end speech recognition
TLDR
Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set.
Adversarial Examples for Generative Models
TLDR
This work explores methods of producing adversarial examples on deep generative models such as the variational autoencoder (VAE) and the VAE-GAN and presents three classes of attacks, motivating why an attacker might be interested in deploying such techniques against a target generative network.
Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
TLDR
This work introduces the first practical demonstration that the cross-model transfer phenomenon enables attackers to control a remotely hosted DNN with no access to the model, its parameters, or its training data; the attack strategy fits a substitute model to input-output pairs obtained by querying the target, then crafts adversarial examples on this auxiliary model (a loose sketch of this substitute-model strategy follows this list).
Synthesizing Robust Adversarial Examples
TLDR
The existence of robust 3D adversarial objects is demonstrated, and the first algorithm for synthesizing examples that are adversarial over a chosen distribution of transformations is presented, which synthesizes two-dimensional adversarial images that are robust to noise, distortion, and affine transformation.
Adversarial Diversity and Hard Positive Generation
TLDR
A new psychometric perceptual adversarial similarity score (PASS) measure for quantifying adversarial images, the notion of hard positive generation is introduced, and a novel hot/cold approach for adversarial example generation is presented, which provides multiple possible adversarial perturbations for every single image.
Towards Evaluating the Robustness of Neural Networks
TLDR
It is demonstrated that defensive distillation does not significantly increase the robustness of neural networks, and three new attack algorithms are introduced that succeed on both distilled and undistilled neural networks with 100% probability.
...
...
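
The "Practical Black-Box Attacks" entry above describes the substitute-model strategy: query the remote model for labels, fit a local substitute to those input-output pairs, and craft adversarial examples on the substitute, relying on transferability. The sketch below is only a generic illustration of that strategy under assumed interfaces; `query_blackbox`, the substitute architecture, and `epsilon` are hypothetical placeholders, not the authors' implementation.

import numpy as np
import tensorflow as tf

def black_box_attack(x_seed, query_blackbox, num_classes, epsilon=0.1, epochs=10):
    """x_seed: [N, D] float32 seed inputs; query_blackbox(x) -> int label."""
    # 1. Label the seed set by querying the remote model (input-output pairs only).
    y_seed = np.array([query_blackbox(x) for x in x_seed], dtype=np.int64)

    # 2. Fit a small local substitute model to mimic those labels.
    substitute = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes)])
    substitute.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    substitute.fit(x_seed, y_seed, epochs=epochs, verbose=0)

    # 3. Craft adversarial examples on the substitute (FGSM here) and rely on
    #    transferability to fool the remote model.
    x = tf.convert_to_tensor(x_seed, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(
            y_seed, substitute(x), from_logits=True)
    grad = tape.gradient(loss, x)
    return (x + epsilon * tf.sign(grad)).numpy()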