Spectrogram-Based Classification Of Spoken Foul Language Using Deep CNN

  • Abdulaziz Saleh Ba Wazir, Hezerul Abdul Karim, Mohd Haris Lye Abdullah, Sarina Mansor, Nouar Aldahoul, Mohammad Faizal Ahmad Fauzi, John See
  • Published 21 September 2020
  • Computer Science
  • 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)
Excessive profanity in audio and video content has been shown to shape one’s character and behavior. Detection and censorship are currently done manually, which is time-consuming and prone to missing foul language. This paper proposes an intelligent model for foul language censorship through automated and robust detection by deep convolutional neural networks (CNNs). A dataset of foul language was collected and processed for the… 
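
As a rough illustration of the spectrogram step named in the title, the sketch below turns a waveform into a log-magnitude spectrogram, the kind of 2-D array a CNN could take as an image-like input. The excerpt above does not give the paper's actual parameters or pipeline, so the function name `log_spectrogram`, the frame length, the hop size, and the toy sine-wave input are all assumptions made for this example, not the authors' settings.

```python
import cmath
import math


def log_spectrogram(samples, frame_len=64, hop=32, eps=1e-10):
    """Return a list of frames, each a list of log-magnitude values (dB)
    for frequency bins 0..frame_len // 2. All defaults are illustrative."""
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT for clarity; a real pipeline would use an FFT.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(20 * math.log10(abs(acc) + eps))
        frames.append(row)
    return frames


# Toy input (assumption): a 1 kHz sine sampled at 8 kHz stands in for speech.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = log_spectrogram(tone)

# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

With a bin spacing of `sample_rate / frame_len` = 125 Hz, the 1 kHz tone falls in bin 8. A classifier would then treat the frames-by-bins array as a single-channel image.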


Design and Implementation of Fast Spoken Foul Language Recognition with Different End-to-End Deep Neural Network Architectures
The proposed system outperformed state-of-the-art pre-trained neural networks on the novel foul language dataset and proved to reduce the computational cost with minimal trainable parameters.


Acoustic Scene Classification Using a CNN-SuperVector System Trained with Auditory and Spectrogram Image Features
This study analyzes the performance of a state-of-the-art CNN system for different auditory image and spectrogram features, including Mel-scaled, logarithmically scaled, and linearly scaled filterbank spectrograms and Stabilized Auditory Image (SAI) features, and benchmarks an MFCC-based Gaussian Mixture Model (GMM) SuperVector (SV) system for acoustic scene classification.
Acoustic Pornography Recognition Using Recurrent Neural Network
The experimental results confirm the feasibility of the proposed acoustic-driven approach by demonstrating an accuracy of 86.50% and an F-score of 86 in the task of pornography recognition.
Robust sound event recognition using convolutional neural networks
This work proposes novel features derived from spectrogram energy triggering, allied with the powerful classification capabilities of a convolutional neural network (CNN), which demonstrates excellent performance under noise-corrupted conditions when compared against state-of-the-art approaches on standard evaluation tasks.
Spoken Arabic Digits Recognition Using Deep Learning
This research proposes an Arabic digit speech recognition model using a Recurrent Neural Network (RNN), which achieves its highest accuracy, 80%, when recognizing the digit zero.
Speech Emotion Recognition from Spectrograms with Deep Convolutional Neural Network
Preliminary results indicate that the proposed approach based on a freshly trained model is better than the fine-tuned model, and is capable of predicting emotions accurately and efficiently.
Acoustic Characteristics of Emotional Speech Using Spectrogram Image Classification
Amplitude-frequency analysis of emotional speech was performed to determine relative differences between seven emotional categories of speech in the Berlin Emotional Speech (EMO-DB) database.
ImageNet classification with deep convolutional neural networks
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Spoken Digit Recognition in Portuguese Using Line Spectral Frequencies
Line Spectral Frequencies (LSF) provide a set of highly predictive coefficients for digit recognition. The study shows that choosing the right attribute extraction method is more important than the specific classification paradigm, and that the right combination of classifier and attributes can provide almost perfect accuracy.