Stochastic Shake-Shake Regularization for Affective Learning from Speech

@inproceedings{Huang2018StochasticSR,
  title={Stochastic Shake-Shake Regularization for Affective Learning from Speech},
  author={Che-Wei Huang and Shrikanth S. Narayanan},
  booktitle={INTERSPEECH},
  year={2018}
}
We propose stochastic Shake-Shake regularization based on multi-branch residual architectures to mitigate over-fitting in affective learning from speech. Inspired by the recent Shake-Shake [1] and ShakeDrop [2] regularization techniques, we introduce negative scaling into the Shake-Shake regularization algorithm while still maintaining a consistent stochastic convex combination of branches, so as to encourage diversity among branches whether they are scaled by positive or negative coefficients. In addition…
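The core operation described in the abstract can be illustrated with a minimal sketch: two parallel branch outputs are blended with random coefficients that always sum to one, so the combination stays affine even when the sampling range is extended below zero to allow negative scaling. The function name, the sampling range `[low, high]`, and the inference-time rule (using the expected coefficient) are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def shake_shake_combine(branch1, branch2, low=-1.0, high=2.0, training=True):
    """Blend two branch outputs with stochastic coefficients.

    alpha is drawn from [low, high]; the pair (alpha, 1 - alpha) always
    sums to one, so the mix remains a consistent combination of branches
    even when alpha is negative (the "negative scaling" idea). The range
    and the inference rule below are illustrative assumptions.
    """
    if training:
        alpha = rng.uniform(low, high)
    else:
        # At inference, use the expected value of alpha over [low, high].
        alpha = (low + high) / 2.0
    return alpha * branch1 + (1.0 - alpha) * branch2

# Toy usage: blend a ones-branch with a zeros-branch.
y = shake_shake_combine(np.ones((2, 3)), np.zeros((2, 3)))
```

Because the coefficients sum to one, feeding identical branches through the blend returns them unchanged regardless of the sampled alpha, which is the sanity check that the combination is affine rather than an arbitrary rescaling.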
Normalization Before Shaking Toward Learning Symmetrically Distributed Representation Without Margin in Speech Emotion Recognition

TLDR
This work presents an investigation on Shake-Shake regularization, drawing connections to the vicinal risk minimization principle and discriminative feature learning in verification tasks, and identifies a strong resemblance between batch normalized residual blocks and batch normalized recurrent neural networks.

On Role and Location of Normalization before Model-based Data Augmentation in Residual Blocks for Classification Tasks

TLDR
One of the findings is that batch normalization in residual blocks is indispensable when shaking is applied to model branches; the work also empirically identifies the most effective location to place a batch normalization layer in a shaking-regularized residual block.

Emotion Recognition Robust to Indoor Environmental Distortions and Non-targeted Emotions Using Out-Of-Distribution Detection

TLDR
This work significantly improves realistic considerations for emotion detection by more comprehensively assessing different situations and combining CNNs with out-of-distribution detection; it increases the situations in which emotions can be effectively detected and outperforms a state-of-the-art baseline.


References

Showing 1–10 of 27 references

Shaking Acoustic Spectral Sub-Bands Can Better Regularize Learning in Affective Computing

TLDR
The experimental results demonstrate that independently shaking subbands delivers favorable models compared to shaking the entire spectral-temporal feature maps, and with proper patience in early stopping, the proposed models can simultaneously outperform the baseline and maintain a smaller performance gap between training and validation.

ShakeDrop regularization

TLDR
ShakeDrop is inspired by Shake-Shake regularization that decreases error rates by disturbing learning and can be applied to not only ResNeXt but also ResNet, Wide ResNet and PyramidNet in a memory efficient way.

Adam: A Method for Stochastic Optimization

TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Deep convolutional recurrent neural network with attention mechanism for robust speech emotion recognition

TLDR
A deep convolutional recurrent neural network for speech emotion recognition based on log-Mel filterbank energies is presented, where the convolutional layers are responsible for discriminative feature learning and a convolutional attention mechanism is proposed to learn the utterance structure relevant to the task.

Shake-Shake regularization

The method introduced in this paper aims at helping deep learning practitioners faced with an overfit problem. The idea is to replace, in a multi-branch network, the standard summation of parallel branches with a stochastic affine combination.

IEMOCAP: interactive emotional dyadic motion capture database

TLDR
A new corpus named the “interactive emotional dyadic motion capture database” (IEMOCAP), collected by the Speech Analysis and Interpretation Laboratory at the University of Southern California (USC), which provides detailed information about their facial expressions and hand movements during scripted and spontaneous spoken communication scenarios.

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TLDR
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

Dropout: a simple way to prevent neural networks from overfitting

TLDR
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

Convolutional Neural Networks for Speech Recognition

TLDR
It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.

EMOVO Corpus: an Italian Emotional Speech Database

TLDR
It is observed that emotions less easy to recognize are joy and disgust, whereas the most easy to detect are anger, sadness and the neutral state.