Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification

  • Xu Li, N. Li, Jinghua Zhong, Xixin Wu, Xunying Liu, Dan Su, Dong Yu, Helen M. Meng
  • Published in Interspeech 11 June 2020
  • Computer Science
Recently, adversarial attacks on automatic speaker verification (ASV) systems have attracted widespread attention, as they pose severe threats to ASV systems. However, methods to defend against such attacks are limited. Existing approaches mainly focus on retraining ASV systems with adversarial data augmentation. Also, countermeasure robustness against different attack settings is insufficiently investigated. Orthogonal to prior approaches, this work proposes to defend ASV systems against adversarial…

Adversarial Attack and Defense Strategies of Speaker Recognition Systems: A Survey

A comprehensive survey of speaker recognition systems (SRSs), adversarial attacks and defenses against SRSs, which includes the development of SRSs, adversarial training, attack detection, and input refactoring against existing attacks.

Pairing Weak with Strong: Twin Models for Defending Against Adversarial Attack on Speaker Verification

This work frames adversarial defense as an attack-detection problem, made possible by the verification scores from a pair of purposely selected SV models; the approach can be combined with existing single-model countermeasures for even stronger defenses.

LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification

An attacker-independent and interpretable method, named learnable mask detector (LMD), to separate adversarial examples from the genuine ones, and experimental results show that the proposed method outperforms state-of-the-art baselines.

Defending against FakeBob Adversarial Attacks in Speaker Verification Systems with Noise-Adding

This work designs and implements a simple, lightweight defense system that is effective against FakeBob, and specifically studies two opposite pre-processing operations on input audio in speaker verification systems: denoising, which attempts to remove or reduce perturbations, and noise-adding, which adds small noise to the input audio.

Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion

This work focuses on fusion-based SASV solutions and proposes a multi-model fusion framework that leverages multiple state-of-the-art ASV and CM models, improving the SASV-EER from 8.75% to 1.17%, an 86% relative improvement over the best baseline system in the SASV challenge.

Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition

It is demonstrated that the proposed novel feature-level transformation combined with adversarial training is rather effective compared to adversarial training alone in a complete white-box setting, while other transformations do not necessarily improve the overall defense capability.

AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification

An improvement to representation learning for classifying and detecting adversarial attacks is proposed, and the work empirically validates the claim that training the representation learning network on adversarial perturbations, as opposed to adversarial examples, is beneficial because it potentially eliminates nuisance information.

A Survey on Voice Assistant Security: Attacks and Countermeasures

A broad range of relevant but seemingly unrelated attacks is systematized by vulnerable system component and attack method, and existing countermeasures are categorized by defensive strategy from a system designer’s perspective.

On the Detection of Adaptive Adversarial Attacks in Speaker Verification Systems

The proposed detector, called MEH-FEST, calculates the minimum energy in high frequencies from the short-time Fourier transform of an audio signal and uses it as a detection metric, and is effective in determining whether the audio is corrupted by FAKEBOB attacks.
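
The metric described in the MEH-FEST summary above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, frame length, hop size, Hann window, and 3 kHz cutoff are all assumptions chosen for illustration, and the summary does not state how the metric is thresholded. The sketch only shows the general idea of taking a short-time Fourier transform, summing the energy in the high-frequency bins of each frame, and keeping the minimum over frames.

```python
import cmath
import math


def frame_high_freq_energies(signal, sample_rate, frame_len=64, hop=32,
                             cutoff_hz=3000.0):
    """Per-frame energy in DFT bins at or above cutoff_hz (hypothetical helper).

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    # Periodic Hann window to limit spectral leakage from low frequencies.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    # First DFT bin whose centre frequency is >= cutoff_hz.
    first_bin = math.ceil(cutoff_hz * frame_len / sample_rate)
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        energy = 0.0
        # Naive DFT over the high-frequency bins only (up to Nyquist).
        for k in range(first_bin, frame_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            energy += abs(coeff) ** 2
        energies.append(energy)
    return energies


def min_high_freq_energy(signal, sample_rate, **kwargs):
    """Minimum over frames of the high-frequency energy (the detection metric)."""
    energies = frame_high_freq_energies(signal, sample_rate, **kwargs)
    if not energies:
        raise ValueError("signal is shorter than one frame")
    return min(energies)


# Toy comparison: a 200 Hz tone versus the same tone with a faint 3.5 kHz
# component standing in for a perturbation (synthetic data, not real attacks).
SAMPLE_RATE = 8000
clean = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(512)]
perturbed = [x + 0.01 * math.sin(2 * math.pi * 3500 * n / SAMPLE_RATE)
             for n, x in enumerate(clean)]
print(min_high_freq_energy(clean, SAMPLE_RATE))
print(min_high_freq_energy(perturbed, SAMPLE_RATE))
```

In this toy setup the perturbed signal yields the larger metric, because the added component is present in every frame. Whether a fixed threshold on this value separates real FAKEBOB samples from clean speech depends on details that the summary does not give.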



Adversarial Machine Learning at Scale

This research applies adversarial training to ImageNet and finds that single-step attacks are the best for mounting black-box attacks; it also resolves a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples.

The Limitations of Deep Learning in Adversarial Settings

This work formalizes the space of adversaries against deep neural networks (DNNs) and introduces a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.

Voxceleb: Large-scale speaker verification in the wild

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

VoxCeleb: A Large-Scale Speaker Identification Dataset

This paper proposes a fully automated pipeline based on computer vision techniques to create a large scale text-independent speaker identification dataset collected 'in the wild', and shows that a CNN based architecture obtains the best performance for both identification and verification.

Defense Against Adversarial Attacks on Spoofing Countermeasures of ASV

This paper is among the first to use defense methods to improve the robustness of ASV spoofing countermeasure models under adversarial attacks, and the experimental results show that these two defense methods help spoofing countermeasure models counter adversarial examples.

Real-Time, Universal, and Robust Adversarial Attacks Against Speaker Recognition Systems

The first real-time, universal, and robust adversarial attack against the state-of-the-art deep neural network (DNN) based speaker recognition system is proposed, adding an audio-agnostic universal perturbation to an arbitrary enrolled speaker’s voice input so that the speaker is identified as any target (i.e., adversary-desired) speaker label.

Practical Adversarial Attacks Against Speaker Recognition Systems

This paper launches a practical and systematic adversarial attack against X-vector, the state-of-the-art deep neural network (DNN) based speaker recognition system, and integrates the estimated room impulse response (RIR) into the adversarial example training process toward practical audio adversarial examples which could remain effective while being played over the air in the physical world.

Adversarial Attacks on GMM I-Vector Based Speaker Verification Systems

Experimental results show that GMM i-vector systems are seriously vulnerable to adversarial attacks, and the crafted adversarial samples are shown to be transferable and to pose threats to neural network speaker embedding based systems (e.g. x-vector systems).

Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems

This paper conducts the first comprehensive and systematic study of the adversarial attacks on SR systems (SRSs) to understand their security weakness in the practical black-box setting, and proposes an adversarial attack, named FakeBob, to craft adversarial samples.