Hear "No Evil", See "Kenansville"*: Efficient and Transferable Black-Box Attacks on Speech Recognition and Voice Identification Systems

Hadi Abdullah, Muhammad Sajidur Rahman, Washington Garcia, Logan Blue, Kevin Warren, Anurag Swarnim Yadav, Tom Shrimpton, and Patrick Traynor. 2021 IEEE Symposium on Security and Privacy (SP).
Automatic speech recognition and voice identification systems are being deployed in a wide array of applications, from providing control mechanisms to devices lacking traditional interfaces, to the automatic transcription of conversations and authentication of users. Many of these applications have significant security and privacy considerations. We develop attacks that force mistranscription and misidentification in state-of-the-art systems, with minimal impact on human comprehension…

Attacks as Defenses: Designing Robust Audio CAPTCHAs Using Attacks on Automatic Speech Recognition Systems

This work proposes a new mechanism that is both comparatively intelligible (evaluated through a user study) and hard to automatically transcribe (i.e., P(transcription) = 4 × 10^-5), demonstrating a CAPTCHA that is approximately four orders of magnitude more difficult to crack.

CrossASR: Efficient Differential Testing of Automatic Speech Recognition via Text-To-Speech

CrossASR is a differential testing solution that compares outputs of multiple ASR systems to uncover erroneous behaviors among ASRs, and efficiently generates test cases to uncover failures with as few generated tests as possible.

PhoneyTalker: An Out-of-the-Box Toolkit for Adversarial Example Attack on Speaker Recognition

PhoneyTalker is proposed, an out-of-the-box toolkit for any adversary to generate universal and transferable adversarial examples with low complexity, removing the need for a professional background and specialized equipment.

SoK: A Modularized Approach to Study the Security of Automatic Speech Recognition Systems

This article presents the systematization of knowledge for ASR security and provides a comprehensive taxonomy for existing work based on a modularized workflow, and shows that transfer attacks across ASR models are feasible, even in the absence of knowledge about models and training data.

Practical Attacks on Voice Spoofing Countermeasures

The first practical attack on voice spoofing countermeasures (CMs) is developed, showing how a malicious actor may efficiently craft audio samples that bypass voice authentication in its strictest form. This calls into question the security of modern voice authentication systems, given the real threat of attackers bypassing these measures to gain access to users' most valuable resources.

SoK: A Study of the Security on Voice Processing Systems

This paper identifies and classifies a range of unique attacks on voice processing systems, surveys the current state of attacks and defenses, and suggests future developments and theoretical improvements.

Demystifying Limited Adversarial Transferability in Automatic Speech Recognition Systems

This paper discovers and quantifies six factors that impact the targeted transferability of optimization attacks against Automatic Speech Recognition systems (ASRs). These factors can be leveraged to design ASRs that are more robust to transferable attacks, or to modify architectures in other domains to reduce their vulnerability to targeted transferable samples.

Catch Me If You Can: Blackbox Adversarial Attacks on Automatic Speech Recognition using Frequency Masking

This work proposes techniques that generate blackbox, untargeted adversarial attacks that are portable across ASRs, are not easily detected by a state-of-the-art defense system, and produce significantly different output transcriptions while sounding similar to the original audio.

FenceSitter: Black-box, Content-Agnostic, and Synchronization-Free Enrollment-Phase Attacks on Speaker Recognition Systems

A new attack surface of SRSs is explored through an enrollment-phase attack paradigm, named FenceSitter, in which the adversary poisons the SRS using imperceptible adversarial ambient sound while the legitimate user registers into the system.

Disappeared Command: Spoofing Attack On Automatic Speech Recognition Systems with Sound Masking

A non-contact black-box adversarial attack algorithm with high transferability is proposed, which achieves an 81.57% success rate against a commercially available speech API and searches for the most suitable masking music for the adversarial samples, based on a psychoacoustic model, to improve the concealment of the samples.

Cocaine Noodles: Exploiting the Gap between Human and Machine Speech Recognition

It is found that differences in how humans and machines understand speech can be easily exploited by an adversary to produce sound that is intelligible as a command to a computer speech recognition system but is not easily understandable by humans.

Hidden Voice Commands

This paper explores how voice interfaces can be attacked with hidden voice commands that are unintelligible to human listeners but are interpreted as commands by devices.

Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems

This paper exploits the fact that many different source audio samples map to similar feature vectors under acoustic feature extraction, leveraging knowledge of the signal processing algorithms commonly used by VPSes to craft the data fed into machine learning systems.
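The feature-collision idea can be illustrated with a toy example: MFCC-style feature pipelines start from a magnitude spectrum and discard phase, so a phase-scrambled waveform (which can sound very different to a listener) presents an identical magnitude spectrum to the feature extractor. A minimal numpy sketch, with a pure tone standing in for a real voice command; all parameters here are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
fs, n = 16_000, 4096
audio = np.sin(2 * np.pi * 300 * np.arange(n) / fs)  # stand-in for a command

# Keep every bin's magnitude but randomize its phase.  DC and Nyquist bins
# are left untouched so the result remains a valid real signal.
spectrum = np.fft.rfft(audio)
random_phase = np.exp(1j * rng.uniform(0, 2 * np.pi, spectrum.shape))
random_phase[0] = random_phase[-1] = 1.0
perturbed = np.fft.irfft(np.abs(spectrum) * random_phase, n)

# The waveform differs, but the magnitude spectrum (the input to MFCC-style
# feature extraction) is unchanged.
same_features = np.allclose(np.abs(np.fft.rfft(perturbed)), np.abs(spectrum))
```

This is only the core observation; the actual attack additionally shapes the perturbed audio so humans find it unintelligible while the feature vectors stay close.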

CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition

Novel techniques are developed that address a key technical challenge: integrating the commands into a song in a way that can be effectively recognized by ASR through the air, in the presence of background noise, while not being detected by a human listener.

DolphinAttack: Inaudible Voice Commands

A totally inaudible attack, DolphinAttack, is presented that modulates voice commands on ultrasonic carriers to achieve inaudibility; it is validated on popular speech recognition systems, including Siri, Google Now, Samsung S Voice, Huawei HiVoice, Cortana, and Alexa.
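The modulation step can be sketched in a few lines: amplitude-modulate a baseband "command" onto an ultrasonic carrier, then model the microphone's nonlinearity as a square-law term, which demodulates the command back into the audible band. A hedged numpy sketch; the sample rate, carrier frequency, and nonlinearity model are illustrative assumptions, not values from the paper:

```python
import numpy as np

fs = 192_000  # assumed high sample rate so a 25 kHz carrier is representable
t = np.arange(0, 0.01, 1 / fs)

# Hypothetical "voice command" baseband: a 400 Hz tone standing in for speech.
baseband = np.sin(2 * np.pi * 400 * t)

# Amplitude-modulate onto a 25 kHz carrier, above the human hearing range.
carrier_hz = 25_000
modulated = (1 + baseband) * np.cos(2 * np.pi * carrier_hz * t)

# A nonlinear microphone front-end (modeled here as a simple square-law term)
# demodulates the signal: squaring (1 + b)cos(wt) produces a low-frequency
# component proportional to b, i.e. the command reappears at 400 Hz.
recorded = modulated + 0.1 * modulated**2
spectrum = np.abs(np.fft.rfft(recorded))
freqs = np.fft.rfftfreq(len(recorded), 1 / fs)
band = (freqs > 100) & (freqs < 20_000)  # audible band, excluding DC
audible_peak = freqs[band][np.argmax(spectrum[band])]
```

The design point is that the transmitted signal has no energy in the audible band; only the recording hardware's nonlinearity recreates it.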

Adversarial Attacks Against Automatic Speech Recognition Systems via Psychoacoustic Hiding

A new type of adversarial examples based on psychoacoustic hiding is introduced, which allows us to embed an arbitrary audio input with a malicious voice command that is then transcribed by the ASR system, with the audio signal remaining barely distinguishable from the original signal.
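The core constraint here, keeping the perturbation under a masking threshold derived from the carrier audio, can be sketched crudely. In this sketch the "threshold" is just a fixed margin below the carrier's magnitude spectrum, a deliberate simplification of the psychoacoustic model the paper builds on; all parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n = 16_000, 2048
signal = np.sin(2 * np.pi * 440 * np.arange(n) / fs)  # stand-in for carrier audio

# Crude stand-in for a psychoacoustic masking threshold: each frequency bin
# of the perturbation is kept a fixed margin (in dB) below the carrier.
margin_db = 20.0
carrier_mag = np.abs(np.fft.rfft(signal))
threshold = carrier_mag * 10 ** (-margin_db / 20)

# Random noise stands in for an adversarial perturbation; clip its spectrum
# bin-by-bin to the threshold so it stays masked by the carrier.
noise_fft = np.fft.rfft(rng.normal(size=n))
scale = np.minimum(1.0, threshold / (np.abs(noise_fft) + 1e-12))
shaped = np.fft.irfft(noise_fft * scale, n)

adversarial = signal + shaped
```

In the real attack the perturbation is optimized against the ASR's loss rather than drawn at random, with the masking threshold enforced at every step.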

The Failure of Noise-Based Non-continuous Audio Captchas

Decaptcha's performance on actual observed and synthetic CAPTCHAs indicates that such speech CAPTCHAs are inherently weak and, because of the importance of audio for various classes of users, alternative audio CAPTCHAs must be developed.

Deep Speech: Scaling up end-to-end speech recognition

Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set.

Fooling End-To-End Speaker Verification With Adversarial Examples

This paper presents white-box attacks on a deep end-to-end network trained on either YOHO or NTIMIT, and shows that one can significantly decrease the accuracy of a target system even when the adversarial examples are generated with a different system, potentially using different features.

Did you hear that? Adversarial Examples Against Automatic Speech Recognition

A first-of-its-kind demonstration of adversarial attacks against a speech classification model is presented, achieved by adding small background noise without knowledge of the underlying model parameters or architecture.