Deep4SNet: deep learning for fake speech classification

@article{DoraMBallesteros2021Deep4SNetDL,
  title={Deep4SNet: deep learning for fake speech classification},
  author={Dora M. {Ballesteros L.} and Yohanna Rodr{\'i}guez-Ortega and Diego Renza and Gonzalo R. Arce},
  journal={Expert Syst. Appl.},
  year={2021},
  volume={184},
  pages={115465}
}

On the Generalizability of Two-dimensional Convolutional Neural Networks for Fake Speech Detection

TLDR
Modern text-to-speech methods can produce synthetic, computer-generated voice that is hard to tell apart from real audio, so a new fake audio detection dataset based on the TIMIT corpus is created.

A Review of Modern Audio Deepfake Detection Methods: Challenges and Future Directions

TLDR
A review of existing audio deepfake (AD) detection methods is presented, along with a comparative description of the available fake audio datasets, in what is believed to be the first review targeting detection methods for imitated and synthetically generated audio.

A modified DeepLabV3+ based semantic segmentation of chest computed tomography images for COVID‐19 lung infections

  • Hasan Polat
  • Medicine
    International journal of imaging systems and technology
  • 2022
TLDR
An efficient segmentation framework based on the modified DeepLabV3+ using lower atrous rates in the Atrous Spatial Pyramid Pooling (ASPP) module is proposed to provide robust solutions for improving segmentation performance and hardware implementation.

CNN-Based Model for Landslide Susceptibility Assessment from Multispectral Data

In this work, a new convolutional neural network architecture is proposed to evaluate the susceptibility to landslides. It is a supervised learning algorithm that has been trained from data whose

References

Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms

TLDR
A new implementation of emotion recognition from the para-lingual information in speech, based on a deep neural network applied directly to spectrograms, achieves higher recognition accuracy than previously published results while also limiting latency.

Neural Predictive Coding Using Convolutional Neural Networks Toward Unsupervised Learning of Speaker Characteristics

TLDR
A convolutional deep siamese network is trained to produce “speaker embeddings” by learning to separate “same” versus “different” speaker pairs, which are generated from unlabeled audio streams.

Speech Emotion Recognition Using Spectrogram & Phoneme Embedding

TLDR
A combined phoneme and spectrogram CNN model proved to be the most accurate in recognizing emotions on IEMOCAP data, achieving an increase of more than 4% in overall accuracy and average class accuracy over the existing state-of-the-art methods.

Copy-Move Forgery Detection (CMFD) Using Deep Learning for Image and Video Forensics

TLDR
A model obtained by transfer learning from VGG-16 achieves metrics about 10% higher than a model with a custom architecture; however, it requires approximately twice as much inference time as the latter.

Deep Voice: Real-time Neural Text-to-Speech

TLDR
Deep Voice lays the groundwork for truly end-to-end neural speech synthesis. The paper shows that inference with the system can be performed faster than real time, and describes optimized WaveNet inference kernels on both CPU and GPU that achieve up to 400x speedups over existing implementations.

Deep Voice 3: 2000-Speaker Neural Text-to-Speech

TLDR
Deep Voice 3 is presented, a fully-convolutional attention-based neural text-to-speech (TTS) system that matches state-of-the-art neural speech synthesis systems in naturalness while training ten times faster.

Dual branch convolutional neural network for copy move forgery detection

TLDR
A deep learning–based passive Copy Move Forgery Detection algorithm is proposed that uses a novel dual branch convolutional neural network to classify images as original or forged.

Fake Colorized Image Detection with Channel-wise Convolution based Deep-learning Framework

TLDR
This paper brings WISERNet (Wider Separate-then-reunion Network), a recently proposed deep-learning-based, data-driven color image steganalyzer, to the field of fake colorized image detection, on the belief that the statistical inconsistencies introduced by different automatic colorization methods can be captured by advanced deep-learning-based, data-driven color image steganalyzers such as WISERNet.

Deep Nonlinear Metric Learning for Speaker Verification in the I-Vector Space

TLDR
A nonlinear metric learning method is proposed that learns an explicit mapping from the original space to an optimal subspace using a deep Restricted Boltzmann Machine network, and it performs better than some state-of-the-art methods.