• Corpus ID: 220525932

Shuffling Recurrent Neural Networks

Michael Rotman and Lior Wolf
We propose a novel recurrent neural network model, where the hidden state $h_t$ is obtained by permuting the vector elements of the previous hidden state $h_{t-1}$ and adding the output of a learned function $b(x_t)$ of the input $x_t$ at time $t$. In our model, the prediction is given by a second learned function $s(h_t)$, which is applied to the hidden state. The method is easy to implement, extremely efficient, and suffers from neither vanishing nor exploding gradients. In an extensive set… 
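The recurrence described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the permutation is fixed and random, and the learned functions $b$ and $s$ are stood in for by random linear maps (`W_b`, `W_s` are illustrative names).

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, inp, out = 8, 4, 2

perm = rng.permutation(hidden)           # fixed permutation of hidden units
W_b = rng.standard_normal((hidden, inp)) # stand-in for learned b(x_t)
W_s = rng.standard_normal((out, hidden)) # stand-in for learned readout s(h_t)

def step(h_prev, x_t):
    """One recurrence step: permute the previous state, then add b(x_t)."""
    return h_prev[perm] + W_b @ x_t

def predict(h_t):
    """Prediction: apply the readout s to the hidden state."""
    return W_s @ h_t

h = np.zeros(hidden)
for x_t in rng.standard_normal((5, inp)):  # a toy sequence of 5 inputs
    h = step(h, x_t)
y = predict(h)
```

Because a permutation is an orthogonal map, the recurrence's Jacobian with respect to the hidden state has all singular values equal to one, which is the intuition behind the claim that gradients neither vanish nor explode.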


Deep Incremental RNN for Learning Sequential Data: A Lyapunov Stable Dynamical System
This paper proposes a novel network, the deep incremental RNN (DIRNN), proves that DIRNN is essentially a Lyapunov-stable dynamical system with no vanishing or exploding gradients in training, and evaluates it on seven benchmark datasets, achieving state-of-the-art results.
Research on the Feasibility of Applying GRU and Attention Mechanism Combined with Technical Indicators in Stock Trading Strategies
The vigorous development of time-series neural networks in recent years has opened many possibilities for financial-technology applications. This research proposes a stock trend…


Unitary Evolution Recurrent Neural Networks
This work constructs an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned, and demonstrates the potential of this architecture by achieving state-of-the-art results on several hard tasks involving very long-term dependencies.
A Simple Way to Initialize Recurrent Networks of Rectified Linear Units
This paper proposes a simpler solution that uses recurrent neural networks composed of rectified linear units and is comparable to LSTM on four benchmarks: two toy problems involving long-range temporal structure, a large language-modeling problem, and a benchmark speech-recognition problem.
Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN
It is shown that an IndRNN can easily be regulated to prevent exploding and vanishing gradients while allowing the network to learn long-term dependencies, and that it works with non-saturating activation functions such as ReLU while still training robustly.
Orthogonal Recurrent Neural Networks with Scaled Cayley Transform
This work proposes a simple, novel update scheme that maintains orthogonal recurrent weight matrices without using complex-valued matrices, by parametrizing them with a skew-symmetric matrix via the Cayley transform.
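The Cayley transform mentioned above maps any skew-symmetric matrix $A$ (i.e. $A^T = -A$) to an orthogonal matrix $W = (I + A)^{-1}(I - A)$, which is well-defined because $I + A$ is always invertible for skew-symmetric $A$. A minimal sketch (omitting the paper's scaling matrix; $A$ is random here, whereas training would optimize its entries):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

M = rng.standard_normal((n, n))
A = M - M.T                        # skew-symmetric parameter matrix
I = np.eye(n)
W = np.linalg.solve(I + A, I - A)  # Cayley transform: (I + A)^{-1} (I - A)

orth_err = np.linalg.norm(W.T @ W - I)  # should be ~0: W is orthogonal
```

Updating $A$ (an unconstrained parameter) and recomputing $W$ keeps the recurrent matrix exactly orthogonal throughout training without any projection step.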
Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNNs
This work presents a new architecture for implementing Efficient Unitary Neural Networks (EUNNs), and finds that this architecture significantly outperforms both other state-of-the-art unitary RNNs and the LSTM architecture, in terms of final performance and/or wall-clock training speed.
Capacity and Trainability in Recurrent Neural Networks
It is found that for several tasks it is the per-task parameter capacity bound that determines performance, and two novel RNN architectures are proposed, one of which is easier to train than the LSTM or GRU for deeply stacked architectures.
Towards Non-saturating Recurrent Units for Modelling Long-term Dependencies
This work proposes a new recurrent architecture (Non-saturating Recurrent Unit; NRU) that relies on a memory mechanism but forgoes both saturating activation functions and saturating gates, in order to further alleviate vanishing gradients.
Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections
A new parametrisation of the transition matrix is presented which allows efficient training of an RNN while ensuring that the matrix is always orthogonal, and gives similar benefits to the unitary constraint, without the time complexity limitations.
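The Householder parametrisation referenced above builds an orthogonal transition matrix as a product of reflections: each unit vector $v$ defines a reflection $H = I - 2vv^T$, and any product of such reflections is orthogonal. A minimal sketch with random vectors (training would learn them):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

def householder(v):
    """Reflection H = I - 2 v v^T about the hyperplane orthogonal to v."""
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

# Compose a few reflections; the product stays exactly orthogonal.
W = np.eye(n)
for _ in range(3):
    W = W @ householder(rng.standard_normal(n))

err = np.linalg.norm(W.T @ W - np.eye(n))  # should be ~0
```

Because each factor is applied as a rank-one update, multiplying a hidden state by this parametrised matrix costs $O(kn)$ for $k$ reflections rather than $O(n^2)$, which is the efficiency argument in the summary above.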
Full-Capacity Unitary Recurrent Neural Networks
This work provides a theoretical argument to determine if a unitary parameterization has restricted capacity, and shows how a complete, full-capacity unitary recurrence matrix can be optimized over the differentiable manifold of unitary matrices.
Long Short-Term Memory
A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.