Corpus ID: 3286670

Expressive power of recurrent neural networks

Valentin Khrulkov, Alexander Novikov, I. Oseledets
Deep neural networks are surprisingly efficient at solving practical tasks, but the theory behind this phenomenon is only starting to catch up with practice. Numerous works show that depth is the key to this efficiency. A certain class of deep convolutional networks -- namely, those that correspond to the Hierarchical Tucker (HT) tensor decomposition -- has been proven to have exponentially higher expressive power than shallow networks; that is, a shallow network of exponential width is required…
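The paper's analysis rests on the correspondence between network architectures and tensor decompositions; for recurrent networks the relevant format is the Tensor-Train (TT) decomposition, whose ranks play the role of network width. The following is a minimal numpy sketch of the standard TT-SVD algorithm (sequential truncated SVDs), not code from the paper; function names and the `max_rank` parameter are my own.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a d-way tensor into Tensor-Train (TT) cores via
    sequential SVDs, truncating every unfolding to at most `max_rank`.

    Each core has shape (r_prev, n_k, r_next); the ranks r_k are the
    quantities that the expressivity analysis bounds.
    """
    shape = tensor.shape
    d = len(shape)
    cores = []
    rank = 1
    mat = tensor.reshape(rank * shape[0], -1)
    for k in range(d - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r_next = min(max_rank, len(s))
        cores.append(u[:, :r_next].reshape(rank, shape[k], r_next))
        # fold the remaining factor and move on to the next mode
        mat = (s[:r_next, None] * vt[:r_next]).reshape(r_next * shape[k + 1], -1)
        rank = r_next
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into the full tensor."""
    res = cores[0]  # shape (1, n_0, r_1)
    for core in cores[1:]:
        res = np.tensordot(res, core, axes=[[-1], [0]])
    return res.squeeze(axis=(0, -1))  # drop the boundary rank-1 axes
```

With `max_rank` large enough the reconstruction is exact; truncating it yields a low-rank approximation, mirroring how constraining width constrains the functions a network can represent.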

Papers citing this work

Generalized Tensor Models for Recurrent Neural Networks
This work attempts to reduce the gap between theory and practice by extending the theoretical analysis to RNNs which employ various nonlinearities, such as the Rectified Linear Unit (ReLU), and shows that they also benefit from properties of universality and depth efficiency.
Tucker Decomposition Network: Expressive Power and Comparison
The main contribution of this paper is to develop a deep network based on Tucker tensor decomposition, and analyze its expressive power.
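The Tucker decomposition referenced above factors a tensor into a small core contracted with one factor matrix per mode. As a hedged illustration (my own sketch, not that paper's network), the truncated higher-order SVD (HOSVD) is a standard way to compute it:

```python
import numpy as np

def hosvd(tensor, ranks):
    """Truncated higher-order SVD: a simple way to compute a Tucker
    decomposition (core tensor + one factor matrix per mode).

    `ranks[k]` is the Tucker rank kept in mode k.
    """
    factors = []
    for mode in range(tensor.ndim):
        # mode-k unfolding: bring mode k to the front, flatten the rest
        unfolding = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
        u, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(u[:, :ranks[mode]])
    # project the tensor onto the factor bases; contracting axis 0
    # repeatedly cycles the modes, so the order is restored at the end
    core = tensor
    for u in factors:
        core = np.tensordot(core, u, axes=[[0], [0]])
    return core, factors

def tucker_reconstruct(core, factors):
    """Contract the core with the factor matrices to rebuild the tensor."""
    res = core
    for u in factors:
        res = np.tensordot(res, u, axes=[[0], [1]])
    return res
```

With full ranks the reconstruction is exact; shrinking `ranks` gives the compressed representation whose expressive power such networks inherit.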
On the Memory Mechanism of Tensor-Power Recurrent Models
This work proves that a large degree p is an essential condition for achieving the long-memory effect, yet that it can lead to unstable dynamical behaviors, and extends the degree p from a discrete to a differentiable domain so that it can be learned efficiently from a variety of datasets.
Depth Enables Long-Term Memory for Recurrent Neural Networks
  A. Ziv, 2020
It is proved that deep recurrent networks support Start-End separation ranks which are combinatorially higher than those supported by their shallow counterparts, and it is established that depth brings forth an overwhelming advantage in the ability of recurrent networks to model long-term dependencies.
Connecting Weighted Automata, Tensor Networks and Recurrent Neural Networks through Spectral Learning
In this paper, we present connections between three models used in different research fields: weighted finite automata (WFA) from formal languages and linguistics, recurrent neural networks used in…
Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks
Inspired by the theory, an explicit regularization discouraging locality is designed, and its ability to improve the performance of modern convolutional networks on non-local tasks is demonstrated, in defiance of the conventional wisdom that architectural changes are needed.
On the Long-Term Memory of Deep Recurrent Networks
It is established that depth brings forth an overwhelming advantage in the ability of recurrent networks to model long-term dependencies, and an exemplar of quantifying this key attribute is provided, which may be readily extended to other RNN architectures of interest, e.g. variants of LSTM networks.
Compact Neural Architecture Designs by Tensor Representations
A framework of tensorial neural networks (TNNs) is proposed, extending existing linear layers on low-order tensors to multilinear operations on higher-order tensors; TNNs are demonstrated to outperform the state-of-the-art low-rank methods on a wide range of backbone networks and datasets.
Adaptive Learning of Tensor Network Structures
This work develops a generic and efficient adaptive algorithm to jointly learn the structure and the parameters of a TN from data; it outperforms the state-of-the-art evolutionary topology search introduced in [18] for tensor decomposition of images, and finds efficient structures for compressing neural networks that outperform popular TT-based approaches.
Tensor-Train Recurrent Neural Networks for Interpretable Multi-Way Financial Forecasting
It is shown, through the analysis of TT-factors, that the physical meaning underlying the tensor decomposition enables the TT-RNN model to aid the interpretability of results, thus mitigating the notorious "black-box" issue associated with neural networks.

References

On the Expressive Power of Deep Learning: A Tensor Analysis
It is proved that, besides a negligible set, all functions that can be implemented by a deep network of polynomial size require exponential size in order to be realized (or even approximated) by a shallow network.
Convolutional Rectifier Networks as Generalized Tensor Decompositions
Developing effective methods for training convolutional arithmetic circuits may give rise to a deep learning architecture that is provably superior to convolutional rectifier networks but has so far been overlooked by practitioners.
On the Expressive Power of Deep Neural Networks
We propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute.
Opening the Black Box of Deep Neural Networks via Information
This work demonstrates the effectiveness of the Information-Plane visualization of DNNs and shows that the training time is dramatically reduced when adding more hidden layers, and that the main advantage of the hidden layers is computational.
On the Number of Linear Regions of Deep Neural Networks
We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have. Deep…
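The linear regions counted in that paper can be observed numerically: restricted to a line through input space, a ReLU network is piecewise linear, and each piece corresponds to a distinct pattern of active units. The following toy sketch (my own illustration, not the paper's construction) counts the pieces crossed along a 1-D input line:

```python
import numpy as np

def count_linear_regions_on_line(weights, biases, x_min=-5.0, x_max=5.0, n=10001):
    """Count the linear pieces of a deep ReLU net restricted to a 1-D
    input interval, by tracking the ReLU activation pattern on a dense
    grid: each maximal run of constant pattern is one linear region.
    """
    xs = np.linspace(x_min, x_max, n).reshape(-1, 1)
    h = xs
    patterns = []
    for w, b in zip(weights, biases):
        pre = h @ w.T + b           # pre-activations of this layer
        patterns.append(pre > 0)    # which units are active
        h = np.maximum(pre, 0.0)
    codes = np.concatenate(patterns, axis=1)  # (n, total_units) boolean pattern
    # a region boundary is wherever the pattern changes between neighbors
    changes = np.any(codes[1:] != codes[:-1], axis=1)
    return 1 + int(changes.sum())

# a small random 2-hidden-layer ReLU net with 1-D input (hypothetical example)
rng = np.random.default_rng(0)
widths = [1, 8, 8]
weights = [rng.standard_normal((widths[i + 1], widths[i])) for i in range(2)]
biases = [rng.standard_normal(widths[i + 1]) for i in range(2)]
```

The grid resolution bounds how finely regions are resolved; the count is exact only when adjacent breakpoints are farther apart than the grid spacing.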
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
On the Expressive Efficiency of Sum Product Networks
A result is established showing the existence of a relatively simple distribution with fully tractable marginal densities that cannot be efficiently captured by D&C SPNs of any depth, but which can be efficiently captured by various other deep generative models.
Shallow vs. Deep Sum-Product Networks
It is proved that there exist families of functions that can be represented much more efficiently with a deep network than with a shallow one, i.e. with substantially fewer hidden units.
On the importance of initialization and momentum in deep learning
It is shown that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs to levels of performance that were previously achievable only with Hessian-Free optimization.
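The "slowly increasing schedule" referred to above can be sketched in a few lines. The following is a hedged illustration of classical momentum with the schedule mu_t = min(1 - 2^(-1 - log2(floor(t/250) + 1)), mu_max) reported in that line of work; the function name and defaults are my own:

```python
import numpy as np

def sgd_momentum(grad, x0, lr=0.01, mu_max=0.99, steps=500):
    """Gradient descent with classical momentum and a slowly increasing
    momentum schedule: mu doubles its distance to 1 away in stages,
    capped at `mu_max`. `grad` maps parameters to their gradient.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        mu = min(1.0 - 2.0 ** (-1 - np.log2(np.floor(t / 250.0) + 1)), mu_max)
        v = mu * v - lr * grad(x)   # velocity accumulates past gradients
        x = x + v
    return x
```

On a simple quadratic this converges where plain gradient descent with the same learning rate would be much slower, which is the intuition behind pairing momentum with careful initialization in the cited work.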
Speech recognition with deep recurrent neural networks
This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.