Robust Audio Adversarial Example for a Physical Attack

@inproceedings{Yakura2019RobustAA,
  title={Robust Audio Adversarial Example for a Physical Attack},
  author={Hiromu Yakura and Jun Sakuma},
  booktitle={IJCAI},
  year={2019}
}
We propose a method to generate audio adversarial examples that can attack a state-of-the-art speech recognition model in the physical world. […] In contrast, our method obtains robust adversarial examples by simulating transformations caused by playback or recording in the physical world and incorporating the transformations into the generation process. Evaluation and a listening experiment demonstrated that our adversarial examples are able to attack without being noticed by humans. This result…
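The generation idea in the abstract (simulate playback/recording transformations and fold them into optimization) can be sketched as averaging a loss over randomly sampled simulated channels. This is a minimal sketch under stated assumptions, not the authors' implementation: the channel is assumed to be convolution with a randomly chosen room impulse response followed by additive white Gaussian noise, and `simulate_playback`, `expected_loss`, and the toy loss are hypothetical names. A real attack would differentiate this averaged loss with respect to the perturbation using an autodiff framework.

```python
import random
from typing import Callable, List, Sequence

Signal = List[float]

def convolve(signal: Sequence[float], impulse_response: Sequence[float]) -> Signal:
    """Full linear convolution: y[n] = sum_k h[k] * x[n - k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[i + k] += x * h
    return out

def simulate_playback(audio: Sequence[float],
                      impulse_responses: Sequence[Sequence[float]],
                      noise_std: float,
                      rng: random.Random) -> Signal:
    """Hypothetical channel model: random room impulse response, then white Gaussian noise."""
    h = rng.choice(impulse_responses)
    reverberant = convolve(audio, h)[:len(audio)]  # truncate to the input length
    return [s + rng.gauss(0.0, noise_std) for s in reverberant]

def expected_loss(audio: Sequence[float],
                  loss_fn: Callable[[Signal], float],
                  impulse_responses: Sequence[Sequence[float]],
                  noise_std: float,
                  samples: int,
                  seed: int = 0) -> float:
    """Monte Carlo estimate of E_t[loss_fn(t(audio))] over simulated transformations t."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += loss_fn(simulate_playback(audio, impulse_responses, noise_std, rng))
    return total / samples

if __name__ == "__main__":
    # Toy usage: two made-up impulse responses and signal energy as a stand-in loss.
    irs = [[1.0], [0.8, 0.3, 0.1]]
    energy = lambda a: sum(s * s for s in a)
    print(expected_loss([0.1, -0.2, 0.05], energy, irs, noise_std=0.01, samples=100))
```

Averaging over sampled transformations, rather than optimizing against a single clean signal, is what pushes the optimizer toward perturbations that survive the simulated channel.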


Adversarial attack on Speech-to-Text Recognition Models
TLDR
This paper introduces the first study of weighted-sampling audio adversarial examples, focusing specifically on the number and positions of distortions in order to reduce the search space, and proposes a new attack scenario, the audio injection attack, which offers novel insights into concealing adversarial attacks.
Generating Robust Audio Adversarial Examples with Temporal Dependency
TLDR
A new Iterative Proportional Clipping (IPC) algorithm is proposed that preserves temporal dependency in audio to generate more robust adversarial examples; it significantly reduces human-perceptible noise and resists defenses based on temporal structure.
Towards Resistant Audio Adversarial Examples
TLDR
This work finds that, due to flaws in the generation process, state-of-the-art adversarial example generation methods overfit because of the binning operation in the target speech recognition system (e.g., Mozilla DeepSpeech), and devises an approach to mitigate this flaw, which improves the generation of adversarial examples with varying offsets.
Detecting Audio Adversarial Examples with Logit Noising
TLDR
This paper proposes a novel method to detect audio adversarial examples by adding noise to the logits before feeding them into the decoder of the ASR system, and shows that carefully selected noise can significantly change the transcriptions of audio adversarial examples while having minimal impact on the transcriptions of benign audio.
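The logit-noising detector summarized above can be illustrated with a toy CTC-style decoder. This is a sketch under assumptions, not the paper's method: it assumes greedy CTC decoding with the blank at index 0, Gaussian logit noise, and a hypothetical decision rule (flag the input if more than a `threshold` fraction of noisy decodings differ from the clean one). The paper's decoder, noise distribution, and thresholds may all differ.

```python
import random
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol

def greedy_ctc_decode(logits: Sequence[Sequence[float]], alphabet: Sequence[str]) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    chars: List[str] = []
    prev = None
    for idx in best:
        if idx != prev and idx != BLANK:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)

def flags_as_adversarial(logits: Sequence[Sequence[float]],
                         alphabet: Sequence[str],
                         noise_std: float,
                         trials: int = 20,
                         threshold: float = 0.5,
                         seed: int = 0) -> bool:
    """Hypothetical rule: flag if noisy-logit transcriptions change too often."""
    rng = random.Random(seed)
    clean = greedy_ctc_decode(logits, alphabet)
    changed = 0
    for _ in range(trials):
        # Perturb every logit independently before decoding.
        noisy = [[v + rng.gauss(0.0, noise_std) for v in frame] for frame in logits]
        if greedy_ctc_decode(noisy, alphabet) != clean:
            changed += 1
    return changed / trials > threshold
```

The intuition is that benign inputs tend to have confident per-frame decisions that small noise does not flip, whereas adversarial inputs are often pushed just across a decision boundary.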
Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition
TLDR
This paper develops effectively imperceptible audio adversarial examples by leveraging the psychoacoustic principle of auditory masking while retaining a 100% targeted success rate on arbitrary full-sentence targets, and makes progress towards physical-world, over-the-air audio adversarial examples by constructing perturbations that remain effective even after realistic simulated environmental distortions are applied.
Robustness of Adversarial Attacks in Sound Event Classification
TLDR
This paper investigates the robustness of adversarial examples to simple input transformations, such as MP3 compression, resampling, white noise, and reverb, in the task of sound event classification, to provide insights into the strengths and weaknesses of current adversarial attack algorithms and a baseline for defenses against adversarial attacks.
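One of the transformations listed, additive white noise, is commonly parameterized by a target signal-to-noise ratio. A small sketch, assuming Gaussian noise and SNR defined as the ratio of mean signal power to mean noise power in decibels; the function names are illustrative and the paper's exact noise settings are not stated here.

```python
import math
import random
from typing import List, Sequence

def signal_power(signal: Sequence[float]) -> float:
    """Mean squared amplitude."""
    return sum(s * s for s in signal) / len(signal)

def noise_std_for_snr(signal: Sequence[float], snr_db: float) -> float:
    """Noise std giving the requested SNR: P_noise = P_signal / 10^(SNR/10)."""
    return math.sqrt(signal_power(signal) / 10 ** (snr_db / 10))

def add_white_noise(signal: Sequence[float], snr_db: float, seed: int = 0) -> List[float]:
    """Add zero-mean Gaussian noise scaled to the target SNR."""
    rng = random.Random(seed)
    std = noise_std_for_snr(signal, snr_db)
    return [s + rng.gauss(0.0, std) for s in signal]

def snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """Measured SNR in dB, treating (noisy - clean) as the noise."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(signal_power(clean) / signal_power(noise))
```

Sweeping `snr_db` from high to low values gives a simple robustness curve: the SNR at which an adversarial example stops fooling the classifier.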
A Unified Framework for Detecting Audio Adversarial Examples
TLDR
A unified framework for detecting adaptive audio adversarial examples, which combines noise padding with sound reverberation, is proposed; it consistently outperforms state-of-the-art audio defense methods, even against adaptive and robust attacks.
Audio Adversarial Examples Generation with Recurrent Neural Networks
TLDR
A new real-time adversarial attack methodology is introduced that applies recurrent neural networks with a two-step training process to generate adversarial examples targeting a keyword spotting (KWS) system; it is extended to the physical world by adding extra constraints in order to eliminate the effect of real-world distortions.
AdvPulse: Universal, Synchronization-free, and Targeted Audio Adversarial Attacks via Subsecond Perturbations
TLDR
AdvPulse, a systematic approach to generating subsecond audio adversarial perturbations, is proposed. It can alter the recognition results of streaming audio inputs in a targeted and synchronization-free manner by using a penalty-based universal adversarial perturbation generation algorithm and incorporating varying time delays into the optimization process.
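Incorporating varying time delay into optimization, as summarized above, can be read as averaging the attack loss over random onset positions of a short universal pulse, so the pulse works wherever it lands in the stream. This is a hypothetical illustration: `inject`, `delay_averaged_loss`, and the uniform delay distribution are assumptions, and AdvPulse's actual penalty-based formulation is not reproduced.

```python
import random
from typing import Callable, List, Sequence

def inject(stream: Sequence[float], pulse: Sequence[float], delay: int) -> List[float]:
    """Add `pulse` to `stream` starting at sample `delay`, clipping at the stream end."""
    out = list(stream)
    for i, p in enumerate(pulse):
        if delay + i < len(out):
            out[delay + i] += p
    return out

def delay_averaged_loss(stream: Sequence[float],
                        pulse: Sequence[float],
                        loss_fn: Callable[[List[float]], float],
                        max_delay: int,
                        samples: int,
                        seed: int = 0) -> float:
    """Average loss over onset delays drawn uniformly from [0, max_delay]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        delay = rng.randint(0, max_delay)  # inclusive on both ends
        total += loss_fn(inject(stream, pulse, delay))
    return total / samples
```

Because the attacker cannot synchronize with the victim's audio, the optimizer only ever sees the pulse at sampled offsets, which discourages solutions tied to one alignment.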
WaveGuard: Understanding and Mitigating Audio Adversarial Examples
TLDR
WaveGuard is introduced: a framework for detecting adversarial inputs crafted to attack ASR systems. It is empirically demonstrated that audio transformations which recover audio from perceptually informed representations can provide a strong defense that is robust against an adaptive adversary, even in a complete white-box setting.

References

Showing 1-10 of 33 references
Audio Adversarial Examples: Targeted Attacks on Speech-to-Text
TLDR
A white-box, iterative, optimization-based attack on Mozilla's end-to-end DeepSpeech implementation achieves a 100% success rate, and the feasibility of this attack introduces a new domain for studying adversarial examples.
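That attack measures perturbation size with a relative loudness metric. Assuming the definition dB(x) = max_i 20·log10(|x_i|) and dB_x(δ) = dB(δ) − dB(x), where the optimization keeps dB_x(δ) below a chosen threshold, it can be computed as follows; the function names are illustrative.

```python
import math
from typing import Sequence

def db(signal: Sequence[float]) -> float:
    """Peak level in decibels: max_i 20 * log10(|x_i|), i.e. 20 * log10 of the peak."""
    peak = max(abs(s) for s in signal)
    if peak == 0.0:
        raise ValueError("dB is undefined for an all-zero signal")
    return 20.0 * math.log10(peak)

def relative_db(delta: Sequence[float], x: Sequence[float]) -> float:
    """Loudness of perturbation delta relative to source audio x: dB(delta) - dB(x)."""
    return db(delta) - db(x)
```

For example, a perturbation whose peak is 1% of the source peak sits at -40 dB relative to the source, so more negative values mean a quieter perturbation.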
Did you hear that? Adversarial Examples Against Automatic Speech Recognition
TLDR
A first-of-its-kind demonstration of adversarial attacks against a speech classification model is presented, which adds small background noise without requiring knowledge of the underlying model parameters and architecture.
Synthesizing Robust Adversarial Examples
TLDR
The existence of robust 3D adversarial objects is demonstrated, and the first algorithm for synthesizing examples that are adversarial over a chosen distribution of transformations is presented, which synthesizes two-dimensional adversarial images that are robust to noise, distortion, and affine transformation.
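The expectation-over-transformation objective from that work, which closely matches the idea of incorporating simulated transformations into the generation process, can be written as below. Here $T$ is the chosen distribution of transformations, $y_t$ the target label, $d$ a distance function, and $\epsilon$ the allowed expected distortion; the notation is a restatement for illustration.

```latex
\hat{x}' = \operatorname*{arg\,max}_{x'} \;
  \mathbb{E}_{t \sim T}\left[\log P\left(y_t \mid t(x')\right)\right]
\quad \text{subject to} \quad
\mathbb{E}_{t \sim T}\left[d\left(t(x'),\, t(x)\right)\right] < \epsilon
```

In practice both expectations are estimated by sampling transformations $t \sim T$ at each optimization step.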
Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples
TLDR
This work introduces a novel flexible approach named Houdini for generating adversarial examples specifically tailored for the final performance measure of the task considered, be it combinatorial and non-decomposable.
CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition
TLDR
Novel techniques are developed that address a key technical challenge: integrating the commands into a song in a way that can be effectively recognized by ASR through the air, in the presence of background noise, while not being detected by a human listener.
Reverberation robust acoustic modeling using i-vectors with time delay neural networks
TLDR
i-vectors are used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing a 10% relative improvement in word error rate, and training time is reduced by subsampling the outputs of TDNN layers across time steps.
Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin
TLDR
It is shown that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages, and that it is competitive with the transcription of human workers when benchmarked on standard datasets.
A binaural room impulse response database for the evaluation of dereverberation algorithms
TLDR
This paper describes a new database of binaural room impulse responses (BRIR), referred to as the Aachen Impulse Response (AIR) database, which covers a wide range of situations in which digital hearing aids or other hands-free devices can be used.
An effective quality evaluation protocol for speech enhancement algorithms
TLDR
It is proposed that researchers use the evaluation core test set of TIMIT, with a set of noise files, and a combination of objective measures and subjective testing for broad and phone-level quality assessment for speech enhancement evaluation.
Deep Learning
TLDR
Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.