• Corpus ID: 3087174

Surprisal-Driven Zoneout

Kamil Rocki
We propose a novel method of regularization for recurrent neural networks called surprisal-driven zoneout. In this method, states zone out (maintain their previous value rather than updating) when the surprisal (the discrepancy between the last state's prediction and the target) is small. Regularization is thus adaptive and input-driven on a per-neuron basis. We demonstrate the effectiveness of this idea by achieving state-of-the-art bits per character of 1.31 on the Hutter Prize Wikipedia dataset…
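
The gating idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's exact formulation: the function names, the per-neuron sensitivity vector `w`, the fixed `threshold`, and the use of negative log-probability as the surprisal measure are all assumptions made for illustration.

```python
import numpy as np

def surprisal(prev_probs, target_index):
    """Negative log-probability that the previous step assigned to the observed target."""
    return -np.log(prev_probs[target_index])

def zoneout_step(h_prev, h_candidate, s, w, threshold):
    """Per-neuron gate (hypothetical form): a neuron takes its candidate value only
    when its scaled surprisal reaches the threshold; otherwise it zones out and
    keeps its previous value."""
    update_mask = (np.abs(w) * s) >= threshold
    h_new = np.where(update_mask, h_candidate, h_prev)
    return h_new, update_mask

# Toy values, chosen only for illustration.
h_prev = np.array([0.1, -0.2, 0.3])
h_candidate = np.array([0.9, 0.8, -0.7])
w = np.array([1.0, 0.1, 0.5])          # assumed per-neuron sensitivity to surprisal
probs = np.array([0.7, 0.2, 0.1])      # previous step's predicted distribution

# Well-predicted target (low surprisal): every neuron zones out.
h_low, mask_low = zoneout_step(h_prev, h_candidate, surprisal(probs, 0), w, 0.5)

# Poorly predicted target (high surprisal): sensitive neurons update.
h_high, mask_high = zoneout_step(h_prev, h_candidate, surprisal(probs, 2), w, 0.5)
```

With these toy values, the low-surprisal step leaves the whole state unchanged, while the high-surprisal step updates only the neurons whose scaled surprisal crosses the threshold, which is what makes the gating per-neuron and input-driven.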

Surprisal-Driven Feedback in Recurrent Networks

This paper introduces surprisal-driven recurrent networks, which take past error information into account when making new predictions and outperform other stochastic and fully deterministic approaches on the enwik8 character-level prediction task.

On Multiplicative Integration with Recurrent Neural Networks

This work introduces a general and simple structural design called Multiplicative Integration, which changes the way in which information from different sources flows and is integrated in the computational building block of an RNN, while introducing almost no extra parameters.

Recurrent Memory Array Structures

It is shown that the nondeterministic Array-LSTM approach improves state-of-the-art performance on character-level text prediction, achieving 1.402 BPC on the enwik8 dataset.

Generating Text with Recurrent Neural Networks

The power of RNNs trained with the new Hessian-Free optimizer is demonstrated on character-level language modeling tasks, and a new RNN variant is introduced that uses multiplicative connections, which allow the current input character to determine the transition matrix from one hidden state vector to the next.

Gated Feedback Recurrent Neural Networks

The empirical evaluation of different RNN units revealed that the proposed gated-feedback RNN outperforms the conventional approaches to build deep stacked RNNs in the tasks of character-level language modeling and Python program evaluation.

Grid Long Short-Term Memory

The Grid LSTM is used to define a novel two-dimensional translation model, the Reencoder, and it is shown that it outperforms a phrase-based reference system on a Chinese-to-English translation task.

Hierarchical Multiscale Recurrent Neural Networks

A novel multiscale approach, the hierarchical multiscale recurrent neural network, is proposed; it captures the latent hierarchical structure in a sequence by encoding temporal dependencies at different timescales using a novel update mechanism.

Layer Normalization

Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called

Understanding the difficulty of training deep feedforward neural networks

The objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future.

Long Short-Term Memory

A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.