# Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs

@article{Gilboa2019DynamicalIA, title={Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs}, author={Dar Gilboa and Bo Chang and Minmin Chen and Greg Yang and Samuel S. Schoenholz and Ed Huai-hsin Chi and Jeffrey Pennington}, journal={ArXiv}, year={2019}, volume={abs/1901.08987} }

Training recurrent neural networks (RNNs) on long sequence tasks is plagued with difficulties arising from the exponential explosion or vanishing of signals as they propagate forward or backward through the network. Many techniques have been proposed to ameliorate these issues, including various algorithmic and architectural modifications. Two of the most successful RNN architectures, the LSTM and the GRU, do exhibit modest improvements over vanilla RNN cells, but they still suffer from…
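The sketch below is a toy illustration of exponential explosion or vanishing, not the paper's mean field analysis. It backpropagates an error vector through a linear recurrence h_t = W h_{t-1} with W = sigma * Q, where Q is a random orthogonal matrix. Because Q preserves norms, the backward signal's norm changes by exactly a factor of sigma per step, so over T steps it grows or shrinks like sigma^T. Only sigma = 1 keeps every singular value of the end-to-end Jacobian equal to 1, which is the condition known as dynamical isometry. The function `backprop_norm_ratio`, its parameters, and the orthogonal-times-sigma setup are hypothetical choices made for illustration.

```python
import numpy as np


def backprop_norm_ratio(sigma: float, steps: int, n: int = 64, seed: int = 0) -> float:
    """Return ||g_T|| / ||g_0|| after backpropagating `steps` times through a
    hypothetical linear RNN h_t = W h_{t-1}, with W = sigma * Q (Q orthogonal).

    Toy model only: no nonlinearity and no gating, unlike the LSTM/GRU setting.
    """
    rng = np.random.default_rng(seed)
    # QR decomposition of a Gaussian matrix yields a random orthogonal Q.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    w = sigma * q
    g0 = rng.standard_normal(n)  # error signal arriving at the last time step
    g = g0.copy()
    for _ in range(steps):
        # One step of backpropagation through time multiplies by W^T.
        g = w.T @ g
    return float(np.linalg.norm(g) / np.linalg.norm(g0))


if __name__ == "__main__":
    for s in (0.9, 1.0, 1.1):
        print(s, backprop_norm_ratio(s, steps=50))
```

With 50 steps, sigma = 1.1 amplifies the signal by 1.1^50 (about 117), sigma = 0.9 shrinks it to 0.9^50 (about 0.005), and sigma = 1.0 preserves it, up to floating-point rounding. Random Gaussian weights or saturating nonlinearities make the picture less clean, which is part of what a mean field treatment is designed to capture.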
