Training Recurrent Networks by Evolino

Jürgen Schmidhuber, Daan Wierstra, Matteo Gagliolo, Faustino J. Gomez. Neural Computation.

In recent years, gradient-based LSTM recurrent neural networks (RNNs) solved many previously RNN-unlearnable tasks. Sometimes, however, gradient information is of little use for training RNNs, due to numerous local minima. For such cases, we present a novel method: EVOlution of systems with LINear Outputs (Evolino). Evolino evolves weights to the nonlinear, hidden nodes of RNNs while computing optimal linear mappings from hidden state to output, using methods such as pseudo-inverse-based linear… 
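The core Evolino recipe — evolve the nonlinear recurrent weights, solve the linear readout in closed form — can be sketched in a few lines. This is a minimal illustration with a plain tanh RNN, a toy sine task, and a (1+1)-style mutation loop; the paper's actual method uses LSTM networks and Enforced SubPopulations, and all names here are my own:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_rnn(W_in, W_rec, inputs):
    """Roll a simple tanh RNN over the input sequence, collecting hidden states."""
    h = np.zeros(W_rec.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W_in @ x + W_rec @ h)
        states.append(h)
    return np.array(states)                  # shape (T, n_hidden)

# Toy task: one-step-ahead prediction of a sine wave.
T = 200
u = np.sin(np.linspace(0, 8 * np.pi, T + 1))
inputs = u[:-1].reshape(-1, 1)
targets = u[1:].reshape(-1, 1)

n_hidden = 20

def fitness(theta):
    """Evolino-style evaluation: decode the genome into hidden weights,
    compute the optimal linear readout by pseudo-inverse, return the error."""
    W_in = theta[:n_hidden].reshape(n_hidden, 1)
    W_rec = theta[n_hidden:].reshape(n_hidden, n_hidden)
    H = run_rnn(W_in, W_rec, inputs)
    W_out = np.linalg.pinv(H) @ targets      # optimal linear mapping, no gradients
    return np.mean((H @ W_out - targets) ** 2)

# Minimal (1+1)-style evolution over the hidden weights only.
theta = rng.normal(0, 0.3, n_hidden + n_hidden * n_hidden)
best = fitness(theta)
for _ in range(100):
    child = theta + rng.normal(0, 0.05, theta.shape)
    f = fitness(child)
    if f < best:
        theta, best = child, f
```

Only the hidden weights are searched; the output layer is always set to its optimum for the current hidden dynamics, which is what makes the fitness landscape easier than evolving all weights jointly.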

Supervised and Evolutionary Learning of Echo State Networks

This paper proposes to apply CMA-ES, the state-of-the-art method in evolutionary continuous parameter optimization, to the evolutionary learning of ESN parameters, and shows that the evolutionary ESN obtains results comparable with those of the best topology-learning methods.
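For context, the only supervised part of an ESN is its linear readout; an evolutionary wrapper such as CMA-ES searches over the choices made when the reservoir is generated (e.g. spectral radius, input scaling). A minimal sketch of the ESN half on a toy sine task, with a ridge-regression readout (the task and parameter names are assumptions for the sketch):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy task: one-step-ahead prediction of a sine wave.
n_res, T = 50, 300
u = np.sin(np.linspace(0, 6 * np.pi, T + 1))

# Random, fixed reservoir; only its spectral radius is controlled.
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))
W = rng.normal(0, 1, (n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))    # rescale spectral radius to 0.9

# Collect reservoir states driven by the input signal.
h = np.zeros(n_res)
states = []
for x in u[:-1]:
    h = np.tanh(W_in @ np.array([x]) + W @ h)
    states.append(h)
H = np.array(states)                          # shape (T, n_res)

# Ridge-regression readout: the only trained part of an ESN.
ridge = 1e-6
W_out = np.linalg.solve(H.T @ H + ridge * np.eye(n_res), H.T @ u[1:])
mse = np.mean((H @ W_out - u[1:]) ** 2)
```

An outer CMA-ES loop would treat quantities like the spectral radius (0.9 above) and input scaling as the genome and use the readout error as fitness.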

Evolutionary Echo State Network: evolving reservoirs in the Fourier space

A new computational model of the ESN type is proposed that represents the reservoir weights in the Fourier space and fine-tunes these weights by applying genetic algorithms in the frequency domain, thus providing a dimensionality-reduction transformation of the initial method.

Evolving reservoir weights in the frequency domain

This work introduces an evolutionary method for adjusting the reservoir non-null weights, called EvoESN (Evolutionary ESN), which combines an evolutionary search in the Fourier space with supervised learning for the readout weights.
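The encoding idea shared by these Fourier-space approaches can be illustrated briefly: search over a truncated set of frequency coefficients instead of the raw weight matrix. This is only a sketch of the dimensionality-reduction step, using NumPy's real FFT; the papers' exact transform, masking of non-null weights, and GA are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30                                    # reservoir size
w = rng.normal(0, 1, n * n)               # flattened reservoir weights

# Represent the weights by their Fourier coefficients and keep only the
# first k, reducing the search space from n*n = 900 real numbers.
k = 50
coeffs = np.fft.rfft(w)[:k]

def decode(c):
    """Rebuild an n-by-n reservoir from k complex Fourier coefficients."""
    full = np.zeros(n * n // 2 + 1, dtype=complex)
    full[:k] = c
    return np.fft.irfft(full, n * n).reshape(n, n)

# A genetic algorithm would mutate `coeffs` (2*k real numbers); decode()
# maps each genome back to a full reservoir for fitness evaluation.
W = decode(coeffs)
```

Because only low-frequency coefficients are kept, a single mutated coefficient changes many reservoir weights coherently, which is the point of searching in the frequency domain.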

Knowledge-based recurrent neural networks in Reinforcement Learning

Several methods with the potential of transferring knowledge in RL using RNNs are presented: Directed Transfer, Cascade-Correlation, Mixture of Expert Systems, and Two-Level Architecture.

The Power of Linear Recurrent Neural Networks.

Predictive neural networks outperform the previous state-of-the-art for the MSO task with a minimal number of units and can approximate any time-dependent function f(t) given by a number of function values by simply solving a linear equation system.

The Power of Linear Recurrent Neural Networks - Predictive Neural Networks

Predictive neural networks outperform the previous state-of-the-art for the MSO task with a minimal number of units and can effectively be learned by simply solving a linear equation system; no backpropagation or similar methods are needed.
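The "solve a linear equation system" claim is easy to demonstrate on an MSO-like signal: a sum of sinusoids obeys an exact linear recurrence, so least squares recovers a perfect one-step predictor with no backpropagation. A toy sketch (the signal and recurrence order are illustrative choices, not the paper's benchmark settings):

```python
import numpy as np

# MSO-style signal: a sum of two sinusoids satisfies an exact linear
# recurrence of order 4, so a short history determines the next value.
t = np.arange(200)
f = np.sin(0.2 * t) + np.sin(0.311 * t)

# One least-squares system: each row holds `order` past values, and the
# target is the value that follows them. No backpropagation is needed.
order = 8
A = np.column_stack([f[i:len(f) - order + i] for i in range(order)])
b = f[order:]
coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)

pred = A @ coeffs
err = np.max(np.abs(pred - b))   # essentially exact on this signal
```

The recovered coefficients define a linear recurrent predictor; iterating it from the last `order` values continues the signal.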

LSTM: A Search Space Odyssey

This paper presents the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling, and observes that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.

Re-visiting Reservoir Computing architectures optimized by Evolutionary Algorithms

A systematic brief survey of applications of evolutionary algorithms (EAs) to the family of recurrent neural networks known as Reservoir Computing, where EAs are helpful tools for finding optimal RC architectures.

A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures

The LSTM cell and its variants are reviewed to explore the learning capacity of the LSTM cell, and LSTM networks are divided into two broad categories: LSTM-dominated networks and integrated LSTM networks.

Evolino: Hybrid Neuroevolution/Optimal Linear Search for Sequence Learning

A general framework for sequence learning, EVOlution of recurrent systems with LINear outputs (Evolino), which uses evolution to discover good RNN hidden node weights, while using methods such as linear regression or quadratic programming to compute optimal linear mappings from hidden state to output.

Modeling systems with internal state using evolino

This work uses the general framework for sequence learning, EVOlution of recurrent systems with LINear Outputs (Evolino), to discover good RNN hidden node weights through evolution, while using linear regression to compute an optimal linear mapping from hidden state to output.

Co-evolving recurrent neurons learn deep memory POMDPs

A new neuroevolution algorithm called Hierarchical Enforced SubPopulations is introduced that simultaneously evolves networks at two levels of granularity: full networks and network components, or neurons.

Evolino for recurrent support vector machines

This work introduces a new class of recurrent, truly sequential SVM-like devices with internal adaptive states, trained by a novel method called EVOlution of systems with KErnel-based outputs (Evoke), an instance of the recent Evolino class of methods.

Learning to Forget: Continual Prediction with LSTM

This work identifies a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset, and proposes a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.
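For reference, the cell-state update with the forget gate, in standard LSTM notation (a sketch in common notation, not quoted from the paper):

```latex
f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
```

With $f_t \equiv 1$ this reduces to the original LSTM cell, whose state can only accumulate; learning $f_t < 1$ lets the cell decay or reset its state when a subsequence ends.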

Gradient calculations for dynamic recurrent neural networks: a survey

The author discusses advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones and presents some "tricks of the trade" for training, using, and simulating continuous time and recurrent neural networks.

Sequential Behavior and Learning in Evolved Dynamical Neural Networks

This article explores the use of a real-valued modular genetic algorithm to evolve continuous-time recurrent neural networks capable of sequential behavior and learning and utilizes concepts from dynamical systems theory to understand the operation of some of these evolved networks.

Learning Precise Timing with LSTM Recurrent Networks

This work finds that LSTM augmented by "peephole connections" from its internal cells to its multiplicative gates can learn the fine distinction between sequences of spikes spaced either 50 or 49 time steps apart without the help of any short training exemplars.