Corpus ID: 236976127

FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset

  • Hasam Khalid, Shahroz Tariq, Simon S. Woo
With the significant advancements made in the generation of forged video and audio, commonly known as deepfakes, using deep learning technologies, their misuse is now a well-known problem. Deepfakes can cause serious security and privacy issues, as they can impersonate a person's identity in an image by replacing his or her face with another person's face. Recently, a new problem of generating a cloned or synthesized human voice of a person has emerged. AI-based deep learning models can… 

Learning Pairwise Interaction for Generalizable DeepFake Detection

This work proposes a new approach, Multi-Channel Xception Attention Pairwise Interaction (MCX-API), that exploits the power of pairwise learning and complementary information from different color space representations in a fine-grained manner and can generalize better than the state-of-the-art Deepfakes detectors.

Attack Agnostic Dataset: Towards Generalization and Stabilization of Audio DeepFake Detection

This work introduces the Attack Agnostic Dataset, a combination of two audio DeepFake datasets and one anti-spoofing dataset that, thanks to the disjoint use of attacks, can lead to better generalization of detection methods.

Voice-Face Homogeneity Tells Deepfake

A voice-face matching method is devised to measure the matching degree of the identities behind voices and faces in deepfake videos, and it is shown that this method achieves competitive results when fine-tuned on limited deepfake data.
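The matching idea can be sketched as a similarity score between a voice identity embedding and a face identity embedding; the embedding values and the 0.5 threshold below are illustrative assumptions, not the paper's actual model:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two identity embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def voices_match_face(voice_emb, face_emb, threshold=0.5):
    # Hypothetical decision rule: low voice-face similarity suggests the
    # identities behind the audio and the video differ, a deepfake cue.
    return cosine_similarity(voice_emb, face_emb) >= threshold

# Toy embeddings: the first pair points the same way, the second does not.
print(voices_match_face([1.0, 0.9, 0.8], [0.9, 1.0, 0.7]))  # → True
print(voices_match_face([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # → False
```

In practice both embeddings would come from pretrained voice and face recognition networks projected into a shared space; the threshold would be tuned on validation data.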

Deepfake Detection for Facial Images with Facemasks

The extensive experiments show that, of the two methods, face-crop performs better than face-patch and could serve as a training method for deepfake detection models to detect fake faces with facemasks in the real world.

Self-Supervised Video Forensics by Audio-Visual Anomaly Detection

An autoregressive model is trained to generate sequences of audio-visual features, using feature sets that capture the temporal synchronization between video frames and sound, and obtains strong performance on the task of detecting manipulated speech videos.
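As a hedged sketch of the anomaly-detection idea (a toy first-order autoregressive predictor, not the paper's actual model), one can fit a predictor on synchronized audio-visual features and use its prediction error as an anomaly score:

```python
def fit_ar1(sequence):
    # Least-squares AR(1) coefficient a, so that x[t] ≈ a * x[t-1].
    num = sum(sequence[t] * sequence[t - 1] for t in range(1, len(sequence)))
    den = sum(x * x for x in sequence[:-1])
    return num / den

def anomaly_score(sequence, a):
    # Mean squared one-step prediction error; clips whose audio-visual
    # timing breaks the learned dynamics should score higher.
    errs = [(sequence[t] - a * sequence[t - 1]) ** 2
            for t in range(1, len(sequence))]
    return sum(errs) / len(errs)

# Toy 1-D "sync feature": a smooth real track vs. a jittery fake one.
real = [1.0, 0.9, 0.81, 0.73, 0.66]
fake = [1.0, -0.8, 0.9, -0.7, 0.95]
a = fit_ar1(real)
print(anomaly_score(real, a) < anomaly_score(fake, a))  # → True
```

The paper's model is a learned sequence model over rich audio-visual features; the AR(1) above only illustrates the "predict, then score the surprise" pattern.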

TIMIT-TTS: A Text-to-Speech Dataset for Multimodal Synthetic Media Detection

A general pipeline for synthesizing deepfake speech content from a given video is presented, facilitating the creation of counterfeit multimodal material, and TIMIT-TTS, a synthetic speech dataset built with the most cutting-edge methods in the TTS field, is released.

How Do Deepfakes Move? Motion Magnification for Deepfake Source Detection

This work contrasts the movement in deepfakes and authentic videos by motion magnification towards building a generalized deepfake source detector, and exploits the difference between real motion and the amplifier GANs to detect whether a video is fake and its source generator if so.
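As a crude, assumption-heavy sketch of motion magnification (a linear Eulerian-style amplification, not the paper's pipeline), tiny temporal intensity changes can be amplified frame to frame so that subtle motion differences become visible:

```python
def magnify_motion(frames, alpha=5.0):
    # Linear Eulerian-style magnification: amplify the temporal
    # difference between consecutive frames by a factor alpha.
    out = [list(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        out.append([c + alpha * (c - p) for p, c in zip(prev, cur)])
    return out

# A 1-pixel "video" with a tiny brightness wobble becomes a large one.
frames = [[100.0], [100.2], [99.9]]
print(magnify_motion(frames))  # wobble scaled up by roughly alpha
```

Real motion magnification operates on band-passed spatial pyramids per pixel; the point here is only that amplifying temporal variation exaggerates the motion cues that distinguish real faces from generator artifacts.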

Audio-Visual Person-of-Interest DeepFake Detection

This work extracts audio-visual features which characterize the identity of a person, and uses a contrastive learning paradigm to create a person-of-interest (POI) deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world.

Multimodal Forgery Detection Using Ensemble Learning

This paper focuses on the multimodal forgery detection task and proposes a deep forgery detection method based on audio-visual ensemble learning, which significantly outperforms existing models.

SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection

  • Piotr Kawa, Marcin Plata, P. Syga
  • 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2022
This work focuses on increasing accessibility to the audio DeepFake detection methods by providing SpecRNet, a neural network architecture characterized by a quick inference time and low computational requirements, and provides benchmarks in three unique settings that confirm the correctness of the model.

A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild

This work investigates the problem of lip-syncing a talking face video of an arbitrary identity to match a target speech segment, and identifies key reasons pertaining to this and hence resolves them by learning from a powerful lip-sync discriminator.

FSGAN: Subject Agnostic Face Swapping and Reenactment

A novel recurrent neural network (RNN)-based approach for face reenactment which adjusts for both pose and expression variations and can be applied to a single image or a video sequence and uses a novel Poisson blending loss which combines Poisson optimization with perceptual loss.

The DeepFake Detection Challenge Dataset

Although deepfake detection is extremely difficult and still an unsolved problem, a deepfake detection model trained only on the DFDC can generalize to real "in-the-wild" deepfake videos, and such a model can be a valuable analysis tool when analyzing potentially deepfaked videos.

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

It is shown that randomly sampled speaker embeddings can be used to synthesize speech in the voice of novel speakers dissimilar from those used in training, indicating that the model has learned a high quality speaker representation.

VoxCeleb2: Deep Speaker Recognition

A very large-scale audio-visual speaker recognition dataset collected from open-source media is introduced and Convolutional Neural Network models and training strategies that can effectively recognise identities from voice under various conditions are developed and compared.

The Deepfake Detection Challenge (DFDC) Preview Dataset

A set of specific metrics to evaluate the performance have been defined and two existing models for detecting deepfakes have been tested to provide a reference performance baseline.

FaceForensics++: Learning to Detect Manipulated Facial Images

This paper proposes an automated benchmark for facial manipulation detection, and shows that the use of additional domain-specific knowledge improves forgery detection to unprecedented accuracy, even in the presence of strong compression, and clearly outperforms human observers.

Exposing Deep Fakes Using Inconsistent Head Poses

  • Xin Yang, Yuezun Li, Siwei Lyu
  • ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing, 2019
This paper proposes a new method to expose AI-generated fake face images or videos based on the observation that Deep Fakes are created by splicing a synthesized face region into the original image, introducing errors that can be revealed when 3D head poses are estimated from the face images.
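The inconsistency cue can be illustrated, purely as an assumption-laden sketch, by comparing two head-pose direction estimates (one from the whole face, one from the central landmarks) and flagging large angular disagreement; the 10-degree threshold and the toy vectors are illustrative, not the paper's values:

```python
import math

def angle_between(p, q):
    # Angle in degrees between two head-pose direction vectors.
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(a * a for a in q))
    cos = max(-1.0, min(1.0, dot / (norm_p * norm_q)))
    return math.degrees(math.acos(cos))

def pose_inconsistent(whole_face_pose, central_pose, max_deg=10.0):
    # Hypothetical rule: spliced fake faces tend to yield poses that
    # disagree between the central face region and the whole head.
    return angle_between(whole_face_pose, central_pose) > max_deg

print(pose_inconsistent([0.0, 0.0, 1.0], [0.05, 0.0, 1.0]))  # → False
print(pose_inconsistent([0.0, 0.0, 1.0], [0.6, 0.0, 0.8]))   # → True
```

In the actual pipeline, each pose would come from a PnP solve of 2D facial landmarks against a 3D face model; the sketch only shows the comparison step.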

Exploring the Asynchronous of the Frequency Spectra of GAN-generated Facial Images

This paper proposes a new approach that explores the asynchronous frequency spectra of color channels, which is simple but effective for training both unsupervised and supervised learning models to distinguish GAN-based synthetic images.
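The cross-channel cue can be sketched as comparing the magnitude spectra of two color channels; the naive 1-D DFT, toy pixel rows, and gap metric below are illustrative assumptions, not the paper's actual features:

```python
import cmath

def magnitude_spectrum(signal):
    # Naive 1-D DFT magnitudes (fine for a toy row of pixel values).
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal)))
            for k in range(n)]

def spectra_gap(channel_a, channel_b):
    # Mean absolute gap between two channels' magnitude spectra.
    # Hypothesis sketched here: GAN upsampling can leave the color
    # channels' spectra "asynchronous" (mismatched), while camera
    # pixels keep them closely aligned.
    sa, sb = magnitude_spectrum(channel_a), magnitude_spectrum(channel_b)
    return sum(abs(a - b) for a, b in zip(sa, sb)) / len(sa)

aligned = spectra_gap([10, 12, 11, 13], [10, 12, 11, 13])
mismatched = spectra_gap([10, 12, 11, 13], [10, 20, 10, 20])
print(aligned < mismatched)  # → True
```

A real detector would compute 2-D spectra over full channels and feed the cross-channel statistics to a classifier; the sketch only shows the per-channel comparison.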

ADD: Frequency Attention and Multi-View based Knowledge Distillation to Detect Low-Quality Compressed Deepfake Images

This work applies frequency domain learning and optimal transport theory in knowledge distillation to specifically improve the detection of low-quality compressed deepfake images and proposes the Attention-based Deepfake detection Distiller (ADD), which consists of two novel distillations.
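The distillation half of the idea can be sketched with a generic temperature-scaled teacher-student objective; this is the standard knowledge-distillation KL term, not ADD's actual frequency-attention or optimal-transport losses:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of logits.
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence from softened teacher to softened student outputs:
    # the generic distillation objective a teacher (trained on
    # high-quality images) uses to guide a student on compressed ones.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student that agrees with the teacher incurs a smaller loss.
close = kd_loss([2.0, 1.0, 0.1], [2.1, 0.9, 0.2])
far = kd_loss([0.1, 1.0, 2.0], [2.1, 0.9, 0.2])
print(close < far)  # → True
```

ADD layers two specialized distillation terms on top of this pattern; the sketch shows only the baseline mechanism being specialized.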