Corpus ID: 3331622

Kronecker Recurrent Units

@inproceedings{Jose2018KroneckerRU,
  title={Kronecker Recurrent Units},
  author={C. Jose and Moustapha Ciss{\'e} and F. Fleuret},
  booktitle={ICML},
  year={2018}
}
Our work addresses two important issues with recurrent neural networks: (1) they are over-parameterized, and (2) the recurrent matrix is ill-conditioned. The former increases the sample complexity of learning and the training time. The latter causes the vanishing and exploding gradient problem. We present a flexible recurrent neural network model called Kronecker Recurrent Units (KRU). KRU achieves parameter efficiency in RNNs through a Kronecker factored recurrent matrix. It overcomes the ill…
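To make the core idea concrete, here is a minimal NumPy sketch (not the authors' code; the sizes and the plain tanh cell are illustrative): the recurrent matrix is stored as a Kronecker product of small factors, so the number of recurrent parameters scales with the factor sizes rather than with the square of the hidden dimension.

import numpy as np

# Hypothetical sizes: the hidden state has dimension 8 = 2 * 4, so the
# recurrent matrix W (8 x 8, 64 entries) is represented by two small
# factors with 2*2 + 4*4 = 20 parameters.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))   # first Kronecker factor
B = rng.standard_normal((4, 4))   # second Kronecker factor

W = np.kron(A, B)                 # full 8 x 8 recurrent matrix

def rnn_step(h, x, U, b):
    """One vanilla RNN step h_next = tanh(W h + U x + b) with W = kron(A, B)."""
    return np.tanh(W @ h + U @ x + b)

h = np.zeros(8)
U = rng.standard_normal((8, 3))   # input-to-hidden weights for a 3-dimensional input
b = np.zeros(8)
x = rng.standard_normal(3)
h = rnn_step(h, x, U, b)
print(h.shape)  # (8,)

In practice the Kronecker product need not be formed explicitly: multiplying the hidden state by A ⊗ B reduces to reshaping the state into a small matrix and multiplying by the factors, which is what makes the factorization cheap.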
Citations

AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks
TLDR: This paper draws connections between recurrent networks and ordinary differential equations and proposes a special form of recurrent network, AntisymmetricRNN, which is able to capture long-term dependencies thanks to the stability properties of its underlying differential equation.
AntisymmetricRNN: A Dynamical System View (2018)
Recurrent neural networks have gained widespread use in modeling sequential data. Learning long-term dependencies using these models remains difficult, however, due to exploding or vanishing gradients.
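A rough sketch of the construction suggested by the two entries above, assuming the usual antisymmetric recipe (recurrence matrix M − Mᵀ, a small damping term, and an explicit Euler step); the exact formulation in the paper may differ.

import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 4            # hidden size, input size (illustrative)
M = rng.standard_normal((n, n))
V = rng.standard_normal((n, d))
b = np.zeros(n)

gamma, eps = 0.01, 0.1  # assumed damping and Euler step size

def antisymmetric_step(h, x):
    # M - M.T is antisymmetric, so its eigenvalues are purely imaginary;
    # the small -gamma * I term adds a bit of damping for numerical stability.
    A = (M - M.T) - gamma * np.eye(n)
    return h + eps * np.tanh(A @ h + V @ x + b)

h = np.zeros(n)
for _ in range(100):
    h = antisymmetric_step(h, rng.standard_normal(d))
print(np.linalg.norm(h))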
Stable Recurrent Models
TLDR: Theoretically, stable recurrent neural networks are shown to be well approximated by feed-forward networks for the purposes of both inference and training by gradient descent, and it is demonstrated that stable recurrent models often perform as well as their unstable counterparts on benchmark sequence tasks.
When Recurrent Models Don't Need To Be Recurrent
TLDR: It is proved that stable recurrent neural networks are well approximated by feed-forward networks for the purposes of both inference and training by gradient descent, and that recurrent models satisfying the stability assumption of the theory can achieve excellent performance on real sequence learning tasks.
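For the two entries above, a common way to make the stability assumption concrete for a vanilla RNN with a 1-Lipschitz nonlinearity is to require the recurrent matrix to have spectral norm below 1. A small illustrative sketch (not the papers' code) that enforces this by clipping singular values:

import numpy as np

def project_to_stable(W, margin=0.99):
    """Clip singular values so the spectral norm of W is at most `margin` < 1."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.minimum(s, margin)) @ Vt

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 32))
W_stable = project_to_stable(W)
print(np.linalg.norm(W, 2), np.linalg.norm(W_stable, 2))  # second value <= 0.99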
…wards Better Optimization
Recurrent neural networks are known for their notorious exploding and vanishing gradient problem (EVGP). This problem becomes more evident in tasks where the information needed to correctly solve…
Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs
TLDR: This work develops a mean field theory of signal propagation in LSTMs and GRUs that enables calculation of the time scales of signal propagation as well as the spectral properties of the state-to-state Jacobians, and derives a novel initialization scheme that eliminates or reduces training instabilities.
Monotonic Kronecker-Factored Lattice
TLDR: This paper proves that the function class of an ensemble of M base KFL models strictly increases as M increases up to a certain threshold, and shows that every multilinear interpolated lattice function can be expressed.
Short-Term Memory Optimization in Recurrent Neural Networks by Autoencoder-based Initialization
TLDR: An initialization scheme is introduced that pretrains the weights of a recurrent neural network to approximate the linear autoencoder of the input sequences, and it is shown how such pretraining can better support solving hard classification tasks with long sequences.
FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network
TLDR: The FastRNN and FastGRNN algorithms are developed to address the twin RNN limitations of inaccurate training and inefficient prediction, and can be deployed on severely resource-constrained IoT microcontrollers too tiny to store other RNN models.
Deterministic Inference of Neural Stochastic Differential Equations
TLDR: A novel algorithm is introduced that solves a generic NSDE using only deterministic approximation methods and comes with theoretical guarantees on numerical stability and convergence to the true solution, enabling its computational use for robust, accurate, and efficient prediction of long sequences.

References

Showing 1-10 of 47 references
Unitary Evolution Recurrent Neural Networks
TLDR: This work constructs an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned, and demonstrates the potential of this architecture by achieving state-of-the-art results in several hard tasks involving very long-term dependencies.
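A toy sketch of the building-block idea (the particular blocks and their ordering here are illustrative, not the paper's exact composition): each structured factor is unitary by construction and cheap to parameterize, and a product of unitary matrices remains unitary.

import numpy as np

rng = np.random.default_rng(0)
n = 8

def diag_phase(theta):
    return np.diag(np.exp(1j * theta))               # unitary diagonal of phases

def reflection(v):
    v = v / np.linalg.norm(v)
    return np.eye(n) - 2.0 * np.outer(v, v.conj())   # complex Householder reflection

theta = rng.uniform(0, 2 * np.pi, n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
P = np.eye(n)[rng.permutation(n)]                    # permutation matrix
F = np.fft.fft(np.eye(n), norm="ortho")              # unitary DFT matrix

W = diag_phase(theta) @ F.conj().T @ P @ reflection(v) @ F  # composed recurrence
print(np.allclose(W.conj().T @ W, np.eye(n)))        # True: W is unitary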
Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections
TLDR: A new parametrisation of the transition matrix is presented which allows efficient training of an RNN while ensuring that the matrix is always orthogonal; it gives similar benefits to the unitary constraint, without the time complexity limitations.
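A minimal real-valued sketch of the underlying idea (the paper's actual parametrisation and training procedure are more involved): each Householder reflection I − 2uuᵀ/‖u‖² is orthogonal, so a product of k reflections yields an orthogonal transition matrix parameterized by k vectors.

import numpy as np

def householder(u):
    u = u / np.linalg.norm(u)
    return np.eye(len(u)) - 2.0 * np.outer(u, u)

def orthogonal_from_reflections(vectors):
    """Product of Householder reflections; orthogonal by construction."""
    n = len(vectors[0])
    W = np.eye(n)
    for u in vectors:
        W = householder(u) @ W
    return W

rng = np.random.default_rng(0)
n, k = 16, 4                       # hidden size, number of reflections (illustrative)
W = orthogonal_from_reflections([rng.standard_normal(n) for _ in range(k)])
print(np.allclose(W.T @ W, np.eye(n)))  # True

In practice each reflection is applied to the hidden state directly in O(n) time; the full matrix is formed here only to check orthogonality.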
Full-Capacity Unitary Recurrent Neural Networks
TLDR: This work provides a theoretical argument to determine whether a unitary parameterization has restricted capacity, and shows how a complete, full-capacity unitary recurrence matrix can be optimized over the differentiable manifold of unitary matrices.
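A rough sketch of one standard way to take a gradient step while staying exactly on the manifold of unitary matrices, via a Cayley-type multiplicative update; whether this matches the paper's precise retraction is an assumption here.

import numpy as np

def cayley_step(W, G, lr=0.1):
    """One multiplicative update that keeps W exactly unitary.

    G is the (Euclidean) gradient of the loss with respect to W.
    A = G W^H - W G^H is skew-Hermitian, and the Cayley transform of a
    skew-Hermitian matrix is unitary, so the update preserves unitarity.
    """
    n = W.shape[0]
    A = G @ W.conj().T - W @ G.conj().T
    I = np.eye(n)
    return np.linalg.solve(I + (lr / 2) * A, (I - (lr / 2) * A) @ W)

rng = np.random.default_rng(0)
n = 8
W = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))[0]
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # dummy gradient
W = cayley_step(W, G)
print(np.allclose(W.conj().T @ W, np.eye(n)))  # True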
On orthogonality and learning recurrent networks with long term dependencies
TLDR: This paper proposes a weight matrix factorization and parameterization strategy through which the degree of expansivity induced during backpropagation can be controlled, and finds that hard constraints on orthogonality can negatively affect the speed of convergence and model performance.
Low-rank passthrough neural networks
TLDR: This work proposes simple, yet effective, low-rank and low-rank-plus-diagonal matrix parametrizations for Passthrough Networks which exploit this decoupling property, reducing the data complexity and memory requirements of the network while preserving its memory capacity.
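A small sketch of the parametrization family named above (sizes are illustrative): the hidden-to-hidden matrix is stored as a diagonal plus a rank-r product, using (2r + 1)·n parameters instead of n², and can be applied without ever forming the dense matrix.

import numpy as np

rng = np.random.default_rng(0)
n, r = 64, 4                         # hidden size and rank (illustrative)
d = rng.standard_normal(n)           # diagonal part, n parameters
U = rng.standard_normal((n, r))      # low-rank factors, 2 * n * r parameters
V = rng.standard_normal((n, r))

def apply_W(h):
    """Compute (diag(d) + U V^T) h in O(n r) time without forming the n x n matrix."""
    return d * h + U @ (V.T @ h)

h = rng.standard_normal(n)
dense = (np.diag(d) + U @ V.T) @ h   # same result, formed explicitly only to check
print(np.allclose(apply_W(h), dense))  # True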
Gated Feedback Recurrent Neural Networks
TLDR: The empirical evaluation of different RNN units revealed that the proposed gated-feedback RNN outperforms conventional approaches to building deep stacked RNNs on the tasks of character-level language modeling and Python program evaluation.
A Simple Way to Initialize Recurrent Networks of Rectified Linear Units
TLDR: This paper proposes a simpler solution that uses recurrent neural networks composed of rectified linear units and is comparable to LSTM on four benchmarks: two toy problems involving long-range temporal structure, a large language modeling problem, and a benchmark speech recognition problem.
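A minimal sketch of the initialization this work is associated with, setting the recurrent weights to the (scaled) identity and the biases to zero so that a ReLU RNN starts out close to integrating its inputs; treat the details below as an illustration rather than the paper's exact recipe.

import numpy as np

def init_relu_rnn(hidden_size, input_size, scale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    W_hh = scale * np.eye(hidden_size)                       # identity recurrence
    W_xh = rng.standard_normal((hidden_size, input_size)) * 0.01
    b = np.zeros(hidden_size)
    return W_hh, W_xh, b

def step(h, x, W_hh, W_xh, b):
    return np.maximum(0.0, W_hh @ h + W_xh @ x + b)          # ReLU recurrence

W_hh, W_xh, b = init_relu_rnn(hidden_size=32, input_size=8)
h = np.zeros(32)
h = step(h, np.ones(8), W_hh, W_xh, b)
print(h.shape)  # (32,)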
Recurrent Orthogonal Networks and Long-Memory Tasks
TLDR: This work carefully analyzes two synthetic datasets originally outlined in Hochreiter and Schmidhuber (1997) which are used to evaluate the ability of RNNs to store information over many time steps, and explicitly constructs RNN solutions to these problems.
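One of the synthetic long-memory benchmarks commonly used in this line of work is the "adding problem"; the sketch below generates such data as an illustration of the kind of task meant, not code from the paper.

import numpy as np

def adding_problem(batch, length, seed=0):
    """Generate one batch of the adding problem: each input step is (value, marker);
    exactly two steps are marked, and the target is the sum of their values."""
    rng = np.random.default_rng(seed)
    values = rng.uniform(0.0, 1.0, size=(batch, length))
    markers = np.zeros((batch, length))
    for i in range(batch):
        first, second = rng.choice(length, size=2, replace=False)
        markers[i, [first, second]] = 1.0
    x = np.stack([values, markers], axis=-1)       # shape (batch, length, 2)
    y = (values * markers).sum(axis=1)             # shape (batch,)
    return x, y

x, y = adding_problem(batch=4, length=100)
print(x.shape, y.shape)  # (4, 100, 2) (4,)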
Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNNs
TLDR: This work presents a new architecture for implementing Efficient Unitary Neural Networks (EUNNs), and finds that this architecture significantly outperforms both other state-of-the-art unitary RNNs and the LSTM architecture in terms of final performance and/or wall-clock training speed.
Optimizing Neural Networks with Kronecker-factored Approximate Curvature
TLDR: K-FAC is an efficient method for approximating natural gradient descent in neural networks, based on an efficiently invertible approximation of the network's Fisher information matrix that is neither diagonal nor low-rank and, in some cases, is completely non-sparse.
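A tiny sketch of the algebraic fact that makes a Kronecker-factored curvature approximation cheap to invert (the random SPD factors are illustrative): if a Fisher block is approximated as A ⊗ G, then its inverse is A⁻¹ ⊗ G⁻¹, so only the small factors ever need to be inverted.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = A @ A.T + 5 * np.eye(5)                    # small symmetric positive definite factor
G = rng.standard_normal((7, 7))
G = G @ G.T + 7 * np.eye(7)                    # small symmetric positive definite factor

F_approx = np.kron(A, G)                       # 35 x 35 approximate Fisher block
inv_via_factors = np.kron(np.linalg.inv(A), np.linalg.inv(G))

print(np.allclose(inv_via_factors, np.linalg.inv(F_approx)))  # True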