Generalizing AUC Optimization to Multiclass Classification for Audio Segmentation With Limited Training Data

@article{Gimeno2021GeneralizingAO,
  title={Generalizing AUC Optimization to Multiclass Classification for Audio Segmentation With Limited Training Data},
  author={Pablo Gimeno and Victoria Mingote and Alfonso Ortega and Antonio Miguel and Eduardo Lleida},
  journal={IEEE Signal Processing Letters},
  year={2021},
  volume={28},
  pages={1135-1139}
}
Area under the ROC curve (AUC) optimisation techniques developed for neural networks have recently demonstrated their capabilities in different audio and speech related tasks. However, due to its intrinsic nature, AUC optimisation has focused only on binary tasks so far. In this paper, we introduce an extension to the AUC optimisation framework so that it can be easily applied to an arbitrary number of classes, aiming to overcome the issues derived from training data limitations in deep… 
