Corpus ID: 236772790

Coordinate descent on the orthogonal group for recurrent neural network training

@article{Massart2021CoordinateDO,
  title={Coordinate descent on the orthogonal group for recurrent neural network training},
  author={Estelle M. Massart and Vinayak Abrol},
  journal={ArXiv},
  year={2021},
  volume={abs/2108.00051}
}
We propose to use stochastic Riemannian coordinate descent on the orthogonal group for recurrent neural network training. The algorithm successively rotates two columns of the recurrent matrix, an operation that can be efficiently implemented as multiplication by a Givens matrix. When the coordinate is selected uniformly at random at each iteration, we prove convergence of the proposed algorithm under standard assumptions on the loss function, stepsize and minibatch noise. In…
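
As a rough illustration of the kind of update the abstract describes (a sketch, not the authors' implementation), the NumPy code below performs one stochastic coordinate step: it picks a random pair of columns, computes the derivative of the loss along the corresponding Givens direction from a user-supplied Euclidean gradient, and rotates only those two columns. The names givens, coordinate_step, euc_grad and the stepsize handling are assumptions made for this example.

import numpy as np

def givens(n, i, j, theta):
    # n x n rotation acting in the (i, j) coordinate plane
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c
    G[j, j] = c
    G[i, j] = -s
    G[j, i] = s
    return G

def coordinate_step(W, euc_grad, stepsize, rng):
    # One stochastic coordinate step on the orthogonal group:
    # pick a random pair of columns (the "coordinate") and rotate them.
    n = W.shape[0]
    i, j = rng.choice(n, size=2, replace=False)
    # Derivative of the loss along the Givens direction at theta = 0,
    # obtained by projecting the Euclidean (minibatch) gradient.
    dtheta = euc_grad[:, i] @ W[:, j] - euc_grad[:, j] @ W[:, i]
    theta = -stepsize * dtheta
    # Equivalent to W @ givens(n, i, j, theta), but only two columns move.
    ci, cj = W[:, i].copy(), W[:, j].copy()
    W[:, i] = np.cos(theta) * ci + np.sin(theta) * cj
    W[:, j] = -np.sin(theta) * ci + np.cos(theta) * cj
    return W

Right-multiplication by a Givens matrix mixes only two columns of the recurrent matrix, so the update itself costs O(n) operations while keeping the matrix exactly orthogonal.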

References

Showing 1-10 of 47 references
Orthogonal Recurrent Neural Networks with Scaled Cayley Transform
Proposes a simple, novel update scheme that maintains orthogonal recurrent weight matrices without resorting to complex-valued matrices, by parametrizing them with a skew-symmetric matrix through the Cayley transform.
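
For context, here is a minimal NumPy sketch of the Cayley-transform idea this summary refers to, assuming a hypothetical helper name cayley_orthogonal; the scaled variant in the cited paper additionally learns a diagonal matrix D with entries in {+1, -1}.

import numpy as np

def cayley_orthogonal(A, D=None):
    # Map a skew-symmetric A to an orthogonal W via the (scaled) Cayley transform.
    n = A.shape[0]
    I = np.eye(n)
    W = np.linalg.solve(I + A, I - A)      # (I + A)^{-1} (I - A)
    if D is not None:                      # optional diagonal scaling
        W = W @ D
    return W

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M - M.T) / 2                          # skew-symmetric parameter
W = cayley_orthogonal(A)
print(np.allclose(W.T @ W, np.eye(4)))     # True: W is orthogonal
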
Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections
Presents a new parametrisation of the transition matrix that allows efficient training of an RNN while keeping the matrix exactly orthogonal, giving benefits similar to the unitary constraint without its time-complexity limitations.
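
A brief sketch, assuming a hypothetical helper householder_product, of how a product of Householder reflections yields an orthogonal matrix; the cited parametrisation learns the reflection vectors themselves.

import numpy as np

def householder_product(vectors):
    # Compose Householder reflections I - 2 u u^T / ||u||^2 into one orthogonal matrix.
    n = vectors[0].shape[0]
    W = np.eye(n)
    for u in vectors:
        u = u / np.linalg.norm(u)
        W = W - 2.0 * np.outer(W @ u, u)   # W @ (I - 2 u u^T), without forming the reflection
    return W

rng = np.random.default_rng(1)
vs = [rng.standard_normal(5) for _ in range(3)]
W = householder_product(vs)
print(np.allclose(W.T @ W, np.eye(5)))     # True
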
Complex Unitary Recurrent Neural Networks using Scaled Cayley Transform
In the experiments conducted, the scaled Cayley unitary recurrent neural network (scuRNN) achieves comparable or better results than scoRNN and other unitary RNNs without fixing the scaling matrix.
Coordinate-descent for learning orthogonal matrices through Givens rotations
Proposes a framework for optimizing orthogonal matrices that parallels coordinate descent in Euclidean spaces; it is based on Givens rotations, a fast-to-compute operation that affects only a small number of entries of the learned matrix and preserves orthogonality.
Unitary Evolution Recurrent Neural Networks
This work constructs an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned, and demonstrates the potential of this architecture by achieving state-of-the-art results on several hard tasks involving very long-term dependencies.
Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group
Introduces a novel approach to first-order optimization with orthogonal and unitary constraints, based on a parametrization stemming from Lie group theory through the exponential map, showing faster, more accurate and more stable convergence in several tasks designed to test RNNs.
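
A minimal sketch of the exponential-map idea behind this parametrization, assuming SciPy's expm and a hypothetical wrapper exp_orthogonal: the matrix exponential maps skew-symmetric matrices (the Lie algebra) to orthogonal matrices (the group).

import numpy as np
from scipy.linalg import expm

def exp_orthogonal(A):
    # Map a skew-symmetric A to an orthogonal matrix via the matrix exponential.
    return expm(A)

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = M - M.T                                # skew-symmetric: A.T == -A
W = exp_orthogonal(A)
print(np.allclose(W.T @ W, np.eye(4)))     # True
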
Learning Unitary Operators with Help From u(n)
This work describes a parametrization using the Lie algebra $\mathfrak{u}(n)$ associated with the Lie group $U(n)$ of $n \times n$ unitary matrices, and provides a simple space in which to do gradient descent.
Orthogonal Deep Neural Networks
This paper proves that DNNs act as local isometries on data distributions of practical interest, and establishes a new generalization error bound that is both scale- and range-sensitive to the singular value spectrum of each of the network's weight matrices.
On the difficulty of training recurrent neural networks
This paper proposes a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem, and empirically validates the hypothesis and the proposed solutions.
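
A minimal sketch of gradient-norm clipping as described, assuming a hypothetical helper clip_gradient_norm operating on a list of NumPy gradient arrays.

import numpy as np

def clip_gradient_norm(grads, max_norm):
    # Rescale a list of gradient arrays so their global norm is at most max_norm.
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
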
Stochastic Gradient Descent on Riemannian Manifolds
S. Bonnabel, IEEE Transactions on Automatic Control, 2013
This paper develops a procedure extending stochastic gradient descent algorithms to the case where the function is defined on a Riemannian manifold and proves that, as in the Euclidean case, the gradient descent algorithm converges to a critical point of the cost function.
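
For illustration, a hedged sketch of one retraction-based Riemannian SGD step specialized to the orthogonal group, assuming a hypothetical helper riemannian_sgd_step, the embedded metric and a QR retraction (the cited paper treats general manifolds and retractions).

import numpy as np

def riemannian_sgd_step(W, euc_grad, stepsize):
    # Project the Euclidean gradient onto the tangent space at W.
    S = W.T @ euc_grad
    rgrad = W @ (S - S.T) / 2.0
    # Retract the updated point back onto the manifold with a QR decomposition.
    Q, R = np.linalg.qr(W - stepsize * rgrad)
    Q = Q @ np.diag(np.sign(np.diag(R)))   # fix column signs so the retraction is well defined
    return Q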