Corpus ID: 245131182

Real-Time Neural Voice Camouflage

@article{Chiquier2021RealTimeNV,
  title={Real-Time Neural Voice Camouflage},
  author={Mia Chiquier and Chengzhi Mao and Carl Vondrick},
  journal={ArXiv},
  year={2021},
  volume={abs/2112.07076}
}
Automatic speech recognition systems have created exciting possibilities for applications; however, they also enable opportunities for systematic eavesdropping. We propose a method to camouflage a person’s voice over-the-air from these systems without inconveniencing the conversation between people in the room. Standard adversarial attacks are not effective in real-time streaming situations because the characteristics of the signal will have changed by the time the attack is executed. We… 

References

Adversarial Attacks Against Automatic Speech Recognition Systems via Psychoacoustic Hiding
TLDR
A new type of adversarial examples based on psychoacoustic hiding is introduced, which allows us to embed an arbitrary audio input with a malicious voice command that is then transcribed by the ASR system, with the audio signal remaining barely distinguishable from the original signal.
Adversarial Attacks and Defenses for Speech Recognition Systems
TLDR
It is shown that a WaveGAN vocoder can be a useful countermeasure to adversarial attacks on ASR systems – even when it is jointly attacked with the ASR, the target phrases’ word error rate is high.
Robust Audio Adversarial Example for a Physical Attack
TLDR
Evaluation and a listening experiment demonstrated that adversarial examples generated by the proposed method are able to attack a state-of-the-art speech recognition model in the physical world without being noticed by humans, suggesting that audio adversarial examples may become a real threat.
Audio Adversarial Examples: Targeted Attacks on Speech-to-Text
TLDR
A white-box iterative optimization-based attack on Mozilla's implementation of DeepSpeech end-to-end has a 100% success rate, and the feasibility of this attack introduces a new domain to study adversarial examples.
Adversarial Music: Real World Audio Adversary Against Wake-word Detection System
TLDR
This is the first real-world adversarial attack against a commercial grade VA wake-word detection system and can effectively reduce the recognition F1 score of the emulated model from 93.4% to 11.0%.
Deep Speech: Scaling up end-to-end speech recognition
TLDR
Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set.
Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin
TLDR
It is shown that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages, and that it is competitive with the transcription of human workers when benchmarked on standard datasets.
Adversarial Auto-Encoding for Packet Loss Concealment
TLDR
This work proposes a non-autoregressive adversarial auto-encoder, named PLAAE, to perform real-time PLC in the waveform domain, and highlights the superiority of PLAAE over two classic PLCs and two deep autoregressive models in terms of spectral and intonation reconstruction, perceptual quality, and intelligibility.
Listening to Sounds of Silence for Speech Denoising
TLDR
A deep learning model is presented for speech denoising, a long-standing challenge in audio analysis arising in numerous applications. It is based on a key observation about human speech: there is often a short pause between each sentence or word, which exposes not just pure noise but its time-varying features.
Adversarial Generation of Time-Frequency Features with application in audio synthesis
TLDR
The potential of deliberate generative TF modeling is demonstrated by training a generative adversarial network (GAN) on short-time Fourier features and it is shown that by applying guidelines, the TF-based network was able to outperform a state-of-the-art GAN generating waveforms directly, despite the similar architecture in the two networks.