• Corpus ID: 3087174

Surprisal-Driven Zoneout

Author: Kamil Rocki
We propose a novel method of regularization for recurrent neural networks called surprisal-driven zoneout. In this method, states zone out (maintain their previous value rather than updating) when the surprisal (the discrepancy between the last state's prediction and the target) is small. Thus, regularization is adaptive and input-driven on a per-neuron basis. We demonstrate the effectiveness of this idea by achieving a state-of-the-art 1.31 bits per character on the Hutter Prize Wikipedia dataset…
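The zoneout rule described in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation: the per-unit weight vector `unit_weights`, the fixed `threshold`, and both function names are hypothetical, chosen only to show how a small surprisal can make individual units keep their previous value.

```python
import numpy as np


def surprisal_bits(prev_probs, target_index):
    """Surprisal (in bits) of the observed target under the previous step's prediction."""
    return float(-np.log2(prev_probs[target_index]))


def surprisal_zoneout_step(h_prev, h_candidate, surprisal, unit_weights, threshold):
    """Per-unit zoneout gate driven by surprisal (illustrative assumption, not the paper's equations).

    Each unit scales the scalar surprisal by its own hypothetical weight. Units whose
    scaled surprisal falls below `threshold` zone out (keep h_prev); the rest take
    the candidate update.
    """
    score = surprisal * np.abs(unit_weights)      # per-unit surprisal score
    zone_out = score < threshold                  # True where the unit keeps its old value
    h_new = np.where(zone_out, h_prev, h_candidate)
    return h_new, zone_out


# Toy usage: the previous step assigned probability 0.5 to the symbol that actually occurred.
prev_probs = np.array([0.5, 0.25, 0.25])
s = surprisal_bits(prev_probs, target_index=0)

h_prev = np.array([0.1, 0.2, 0.3])
h_candidate = np.array([0.9, 0.8, 0.7])
unit_weights = np.array([0.1, 1.0, 2.0])          # hypothetical per-unit sensitivities

h_new, zone_out = surprisal_zoneout_step(h_prev, h_candidate, s, unit_weights, threshold=1.0)
print(s, zone_out, h_new)
```

In this toy setting only the first unit has a score below the threshold, so it alone retains its previous activation. A well-predicted input (low surprisal) would zone out more units, which is the adaptive, per-neuron behavior the abstract describes.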

Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
This work proposes zoneout, a novel method for regularizing RNNs that uses random noise to train a pseudo-ensemble, improving generalization. It also performs an empirical investigation of various RNN regularizers and finds that zoneout gives significant performance improvements across tasks.
Surprisal-Driven Feedback in Recurrent Networks
This paper introduces surprisal-driven recurrent networks, which take past error information into account when making new predictions. The approach outperforms other stochastic and fully deterministic approaches on the enwik8 character-level prediction task.
On Multiplicative Integration with Recurrent Neural Networks
This work introduces a general and simple structural design called Multiplicative Integration, which changes the way information from different sources flows and is integrated in the computational building block of an RNN, while introducing almost no extra parameters.
Recurrent Memory Array Structures
It is shown that the nondeterministic Array-LSTM approach improves state-of-the-art performance on character-level text prediction, achieving 1.402 BPC on the enwik8 dataset.
Generating Text with Recurrent Neural Networks
This work demonstrates the power of RNNs trained with the new Hessian-Free optimizer by applying them to character-level language modeling tasks, and introduces a new RNN variant that uses multiplicative connections, which allow the current input character to determine the transition matrix from one hidden state vector to the next.
Gated Feedback Recurrent Neural Networks
The empirical evaluation of different RNN units revealed that the proposed gated-feedback RNN outperforms the conventional approaches to build deep stacked RNNs in the tasks of character-level language modeling and Python program evaluation.
Grid Long Short-Term Memory
The Grid LSTM is used to define a novel two-dimensional translation model, the Reencoder, and it is shown that it outperforms a phrase-based reference system on a Chinese-to-English translation task.
Hierarchical Multiscale Recurrent Neural Networks
This work proposes a novel multiscale approach, the hierarchical multiscale recurrent neural network, which can capture the latent hierarchical structure in a sequence by encoding temporal dependencies at different timescales using a novel update mechanism.
Layer Normalization
Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called
Understanding the difficulty of training deep feedforward neural networks
The objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future.