Robust Audio Adversarial Example for a Physical Attack

Hiromu Yakura and Jun Sakuma
We propose a method to generate audio adversarial examples that can attack a state-of-the-art speech recognition model in the physical world. [...] In contrast, our method obtains robust adversarial examples by simulating transformations caused by playback or recording in the physical world and incorporating the transformations into the generation process. Evaluation and a listening experiment demonstrated that our adversarial examples are able to attack without being noticed by humans. This result…
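As a rough illustration of what "simulating transformations caused by playback or recording" could look like, the sketch below models over-the-air distortion as convolution with a room impulse response followed by additive Gaussian noise. The abstract does not say which transformations the authors use, so this choice, the function names `convolve` and `simulate_playback`, and the toy impulse response are all assumptions made for illustration.

```python
import random

def convolve(signal, impulse_response):
    """Discrete convolution, truncated to the length of the input signal."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k < 0:
                break
            acc += h * signal[n - k]
        out[n] = acc
    return out

def simulate_playback(signal, impulse_response, noise_std, rng):
    """Hypothetical over-the-air model: reverberation, then additive Gaussian noise."""
    reverberated = convolve(signal, impulse_response)
    return [s + rng.gauss(0.0, noise_std) for s in reverberated]

rng = random.Random(0)
clean = [0.0, 1.0, 0.0, 0.0, 0.0]   # unit impulse delayed by one sample
toy_ir = [1.0, 0.5, 0.25]           # made-up decaying impulse response
noiseless = simulate_playback(clean, toy_ir, 0.0, rng)
noisy = simulate_playback(clean, toy_ir, 0.01, rng)
```

In an attack pipeline, a function of this kind would sit between the perturbed waveform and the recognizer, so that the perturbation is optimized on the distorted signal rather than the clean one.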


Weighted-Sampling Audio Adversarial Example Attack

Weighted-sampling audio adversarial examples are proposed, focusing on the numbers and the weights of distortion to reinforce the attack, and a denoising method in the loss function is applied to make the adversarial attack more imperceptible.

Towards Weighted-Sampling Audio Adversarial Example Attack.

Experiments show that this method is the first in the field to generate audio adversarial examples with low noise and high robustness in a matter of minutes.

Adversarial attack on Speech-to-Text Recognition Models

This paper introduces the first study of weighted-sampling audio adversarial examples, specifically focusing on the number and positions of distortions to reduce the search space, and proposes a new attack scenario, the audio injection attack, which offers some novel insights into the concealment of adversarial attacks.

Perceptual Based Adversarial Audio Attacks

This paper demonstrates a physically realizable audio adversarial attack, based on a psychoacoustic-property-based loss function, and automated generation of room impulse responses, to create adversarial attacks that are robust when played over a speaker in multiple environments.

Generating Robust Audio Adversarial Examples with Temporal Dependency

A new Iterative Proportional Clipping (IPC) algorithm is proposed that preserves temporal dependency in audio to generate more robust adversarial examples, and that can significantly reduce human-perceptible noise and resist defenses based on temporal structure.

Towards Resistant Audio Adversarial Examples

This work finds that, due to flaws in the generation process, state-of-the-art adversarial example generation methods cause overfitting because of the binning operation in the target speech recognition system (e.g., Mozilla DeepSpeech), and devises an approach to mitigate this flaw, which improves generation of adversarial examples with varying offsets.

Detecting Audio Adversarial Examples with Logit Noising

This paper proposes a novel method to detect audio adversarial examples by adding noise to the logits before feeding them into the decoder of the ASR, and shows that carefully selected noise can significantly impact the transcription results of the audio adversarial examples, whereas it has minimal impact on the transcription results of benign audio waves.

Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition

This paper develops effectively imperceptible audio adversarial examples by leveraging the psychoacoustic principle of auditory masking, while retaining a 100% targeted success rate on arbitrary full-sentence targets, and makes progress towards physical-world over-the-air audio adversarial examples by constructing perturbations which remain effective even after applying realistic simulated environmental distortions.

Robustness of Adversarial Attacks in Sound Event Classification

This paper investigates the robustness of adversarial examples to simple input transformations such as mp3 compression, resampling, white noise and reverb in the task of sound event classification to provide insights on strengths and weaknesses in current adversarial attack algorithms and provide a baseline for defenses against adversarial attacks.

A Unified Framework for Detecting Audio Adversarial Examples

A unified framework for detecting adaptive audio adversarial examples is proposed, which combines noise padding with sound reverberation and consistently outperforms state-of-the-art audio defense methods, even against adaptive and robust attacks.



Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

A white-box iterative optimization-based attack on Mozilla's end-to-end DeepSpeech implementation has a 100% success rate, and the feasibility of this attack introduces a new domain in which to study adversarial examples.

Did you hear that? Adversarial Examples Against Automatic Speech Recognition

A first-of-its-kind demonstration of adversarial attacks against a speech classification model, adding small background noise without having to know the underlying model parameters and architecture, is presented.

Synthesizing Robust Adversarial Examples

The existence of robust 3D adversarial objects is demonstrated, and the first algorithm for synthesizing examples that are adversarial over a chosen distribution of transformations is presented, which synthesizes two-dimensional adversarial images that are robust to noise, distortion, and affine transformation.
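The idea of an example that is "adversarial over a chosen distribution of transformations" can be sketched on a toy problem. The code below is not the cited paper's algorithm or setting: it assumes a hypothetical linear score `w·v`, a made-up transformation family `t(v) = gain * v + noise`, and a signed-gradient update with an L-infinity bound `eps`. It only shows the general pattern of averaging gradients over sampled transformations.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def eot_perturbation(x, w, eps, steps, samples, lr, rng):
    """Raise the expected score w·t(x + delta) over random t, with |delta_i| <= eps."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        grad = [0.0] * len(x)
        for _ in range(samples):
            gain = rng.uniform(0.5, 1.5)   # sampled transformation t(v) = gain * v + noise
            # The gradient of w·(gain * (x + delta) + noise) w.r.t. delta is gain * w;
            # the additive noise does not depend on delta, so it drops out.
            for i, wi in enumerate(w):
                grad[i] += gain * wi / samples
        # Signed-gradient ascent step, then projection onto the L-infinity ball.
        delta = [max(-eps, min(eps, d + lr * (1 if g > 0 else -1 if g < 0 else 0)))
                 for d, g in zip(delta, grad)]
    return delta

rng = random.Random(0)
w = [1.0, -2.0, 0.5]        # hypothetical linear model
x = [-0.2, 0.1, -0.05]      # clean input with a negative score
delta = eot_perturbation(x, w, eps=0.3, steps=5, samples=20, lr=0.1, rng=rng)
adv = [a + d for a, d in zip(x, delta)]

def transformed_score(v):
    """Score of v after one freshly sampled transformation."""
    gain = rng.uniform(0.5, 1.5)
    return dot(w, [gain * vi + rng.gauss(0.0, 0.01) for vi in v])

success_rate = sum(transformed_score(adv) > 0 for _ in range(200)) / 200
```

With a real, nonlinear model the per-sample gradient would depend on the sampled transformation, which is where averaging over many samples matters; the linear toy keeps the arithmetic easy to check by hand.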

Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples

This work introduces a novel flexible approach named Houdini for generating adversarial examples specifically tailored for the final performance measure of the task considered, be it combinatorial and non-decomposable.

Explaining and Harnessing Adversarial Examples

It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.

CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition

Novel techniques are developed that address a key technical challenge: integrating the commands into a song in a way that can be effectively recognized by ASR through the air, in the presence of background noise, while not being detected by a human listener.

Deep Speech: Scaling up end-to-end speech recognition

Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set.

Reverberation robust acoustic modeling using i-vectors with time delay neural networks

i-vectors are used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing a 10% relative improvement in word error rate, and by subsampling the outputs at TDNN layers across time steps, training time is reduced.

Intriguing properties of neural networks

It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.

The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech

A common evaluation framework including datasets, tasks, and evaluation metrics for both speech enhancement and ASR techniques is proposed, which will be used as a common basis for the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge.