• Corpus ID: 236772790

Coordinate descent on the orthogonal group for recurrent neural network training

@article{Massart2021CoordinateDO,
title={Coordinate descent on the orthogonal group for recurrent neural network training},
author={Estelle M. Massart and Vinayak Abrol},
journal={ArXiv},
year={2021},
volume={abs/2108.00051}
}
• Published 30 July 2021
• Computer Science
• ArXiv
We propose to use stochastic Riemannian coordinate descent on the orthogonal group for recurrent neural network training. The algorithm successively rotates two columns of the recurrent matrix, an operation that can be efficiently implemented as a multiplication by a Givens matrix. When the coordinate is selected uniformly at random at each iteration, we prove the convergence of the proposed algorithm under standard assumptions on the loss function, stepsize and minibatch noise. In…
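The column-rotation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `rotate_columns` and `orthogonality_error` are hypothetical, and the rotation angle is drawn at random here as a stand-in for whatever stepsize and gradient information the actual algorithm uses. The sketch only shows that right-multiplying by a Givens matrix touches two columns and keeps the matrix orthogonal.

```python
import math
import random


def rotate_columns(W, i, j, theta):
    """Right-multiply W (list of rows) by a Givens rotation on columns i and j, in place."""
    c, s = math.cos(theta), math.sin(theta)
    for row in W:
        wi, wj = row[i], row[j]
        # Only the two selected columns change; all other entries are untouched.
        row[i] = c * wi - s * wj
        row[j] = s * wi + c * wj
    return W


def orthogonality_error(W):
    """Return the largest absolute entry of W^T W - I."""
    n = len(W)
    err = 0.0
    for a in range(n):
        for b in range(n):
            dot = sum(W[k][a] * W[k][b] for k in range(n))
            err = max(err, abs(dot - (1.0 if a == b else 0.0)))
    return err


n = 4
# Start from the identity, which is orthogonal.
W = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
rng = random.Random(0)
for _ in range(100):
    # Pick a coordinate (a pair of distinct columns) uniformly at random.
    i, j = rng.sample(range(n), 2)
    # Assumption: a random small angle replaces the real update rule.
    theta = rng.uniform(-0.1, 0.1)
    rotate_columns(W, i, j, theta)

print(orthogonality_error(W))
```

Each update costs O(n) operations for an n-by-n matrix, since only two columns are modified, which is the efficiency argument the abstract alludes to.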

References

Orthogonal Recurrent Neural Networks with Scaled Cayley Transform
• Computer Science
ICML
• 2018
This work proposes a simpler and novel update scheme to maintain orthogonal recurrent weight matrices without using complex valued matrices by parametrizing with a skew-symmetric matrix using the Cayley transform.
Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections
• Computer Science
ICML
• 2017
A new parametrisation of the transition matrix is presented which allows efficient training of an RNN while ensuring that the matrix is always orthogonal, and gives similar benefits to the unitary constraint, without the time complexity limitations.
Complex Unitary Recurrent Neural Networks using Scaled Cayley Transform
• Computer Science
AAAI
• 2019
In the experiments conducted, the scaled Cayley unitary recurrent neural network (scuRNN) achieves comparable or better results than scoRNN and other unitary RNNs without fixing the scaling matrix.
Coordinate-descent for learning orthogonal matrices through Givens rotations
• Computer Science
ICML
• 2014
A framework for optimizing orthogonal matrices is proposed that parallels coordinate descent in Euclidean spaces. It is based on Givens rotations, a fast-to-compute operation that affects a small number of entries in the learned matrix and preserves orthogonality.
Unitary Evolution Recurrent Neural Networks
• Computer Science
ICML
• 2016
This work constructs an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned, and demonstrates the potential of this architecture by achieving state of the art results in several hard tasks involving very long-term dependencies.
Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group
• Computer Science, Mathematics
ICML
• 2019
A novel approach to perform first-order optimization with orthogonal and unitary constraints based on a parametrization stemming from Lie group theory through the exponential map is introduced, showing faster, accurate, and more stable convergence in several tasks designed to test RNNs.
Learning Unitary Operators with Help From u(n)
• Mathematics, Computer Science
AAAI
• 2017
This work describes a parametrization using the Lie algebra $\mathfrak{u}(n)$ associated with the Lie group $U(n)$ of $n \times n$ unitary matrices, and provides a simple space in which to do gradient descent.
Orthogonal Deep Neural Networks
• Computer Science
IEEE Transactions on Pattern Analysis and Machine Intelligence
• 2021
This paper proves that DNNs are locally isometric on data distributions of practical interest, and establishes a new generalization error bound that is both scale- and range-sensitive to the singular value spectrum of each of the network's weight matrices.
On the difficulty of training recurrent neural networks
• Computer Science
ICML
• 2013
This paper proposes a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem, and empirically validates the hypothesis and the proposed solutions.
Stochastic Gradient Descent on Riemannian Manifolds
• S. Bonnabel
• Computer Science, Mathematics
IEEE Transactions on Automatic Control
• 2013
This paper develops a procedure extending stochastic gradient descent algorithms to the case where the function is defined on a Riemannian manifold and proves that, as in the Euclidean case, the gradient descent algorithm converges to a critical point of the cost function.