Publications (sorted by influence)
Long Short-Term Memory
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's… (A minimal sketch of the LSTM cell update follows this entry.)
  • Citations: 31,005
  • Highly influential citations: 5,989
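As a rough illustration of the mechanism this paper introduces, here is a minimal NumPy sketch of one LSTM cell step in its common modern form (including the forget gate, which was added in later work); the weight shapes, gate ordering, and toy usage loop are illustrative choices, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell (input, forget, output gates + candidate).
    W: (4H, X) input weights, U: (4H, H) recurrent weights, b: (4H,) biases."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])        # input gate
    f = sigmoid(z[H:2*H])      # forget gate (added to the original 1997 cell later)
    o = sigmoid(z[2*H:3*H])    # output gate
    g = np.tanh(z[3*H:4*H])    # candidate cell update
    c = f * c_prev + i * g     # additive, gated cell state keeps error flowing
    h = o * np.tanh(c)         # hidden state / output
    return h, c

# Toy usage with random parameters
X, H = 3, 5
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(4*H, X)), rng.normal(size=(4*H, H)), np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
for t in range(10):
    h, c = lstm_step(rng.normal(size=X), h, c, W, U, b)
```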
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
Generative Adversarial Networks (GANs) excel at creating realistic images with complex models for which maximum likelihood is infeasible. However, the convergence of GAN training has still not been… (A toy two-learning-rate training sketch follows this entry.)
  • Citations: 1,754
  • Highly influential citations: 643
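To make the "two time-scale" idea concrete, below is a hedged PyTorch sketch in which the discriminator and generator get separate Adam optimizers with different learning rates. The toy data, network sizes, Adam betas, and the specific rates (4e-4 vs. 1e-4) are placeholder assumptions, not the paper's experimental setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))   # generator
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))   # discriminator

# Two time scales: the discriminator optimizer uses a larger learning rate (placeholder values).
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=4e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(64, 2) * 0.5 + 1.0      # toy 2-D "real" data
    fake = G(torch.randn(64, 8))               # generator samples

    # Discriminator update (fast time scale)
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update (slow time scale)
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```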
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
We introduce the "exponential linear unit" (ELU), which speeds up learning in deep neural networks and leads to higher classification accuracies. Like rectified linear units (ReLUs), leaky ReLUs… (A short ELU sketch follows this entry.)
  • Citations: 2,402
  • Highly influential citations: 321
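The activation itself is simple enough to state directly. The NumPy sketch below applies the standard ELU definition (identity for positive inputs, alpha*(exp(x)-1) otherwise), with alpha=1.0 as an illustrative default.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (exp(x) - 1) for x <= 0.
    The negative part saturates at -alpha, pushing mean activations toward zero."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.linspace(-4, 4, 9)
print(elu(x))
```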
Self-Normalizing Neural Networks
Deep Learning has revolutionized vision via convolutional neural networks (CNNs) and natural language processing via recurrent neural networks (RNNs). However, success stories of Deep Learning with… (A short SELU sketch follows this entry.)
  • Citations: 824
  • Highly influential citations: 114
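A minimal sketch of the SELU activation at the core of self-normalizing networks, using the fixed-point constants reported for zero mean and unit variance. The random deep stack below is just a toy check of the self-normalizing effect; the layer sizes and LeCun-normal-style initialization scale are illustrative assumptions.

```python
import numpy as np

# Fixed-point constants for zero-mean, unit-variance activations
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    """Scaled ELU: lambda * x for x > 0, lambda * alpha * (exp(x) - 1) otherwise."""
    return LAMBDA * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

# A deep stack of SELU layers keeps activations roughly zero-mean / unit-variance.
rng = np.random.default_rng(0)
a = rng.normal(size=(1024, 256))
for _ in range(20):
    W = rng.normal(size=(256, 256)) * np.sqrt(1.0 / 256)   # LeCun-normal-style init
    a = selu(a @ W)

print(a.mean(), a.std())   # should stay close to 0 and 1
```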
GANs Trained by a Two Time-Scale Update Rule Converge to a Nash Equilibrium
Generative Adversarial Networks (GANs) excel at creating realistic images with complex models for which maximum likelihood is infeasible. However, the convergence of GAN training has still not been…
  • Citations: 249
  • Highly influential citations: 84
LSTM can Solve Hard Long Time Lag Problems
Standard recurrent nets cannot deal with long minimal time lags between relevant signals. Several recent NIPS papers propose alternative methods. We first show: problems used to promote various…
  • Citations: 344
  • Highly influential citations: 44
The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions
  • S. Hochreiter
  • Mathematics, Computer Science
  • Int. J. Uncertain. Fuzziness Knowl. Based Syst.
  • 1 April 1998
Recurrent nets are in principle capable of storing past inputs to produce the currently desired output. Because of this property, recurrent nets are used in time series prediction and process control. (A small numerical illustration of gradient decay through time follows this entry.)
  • Citations: 900
  • Highly influential citations: 37
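A tiny NumPy illustration of the effect the paper analyzes: backpropagating an error vector through many steps of a linear recurrence multiplies it by the transposed recurrent matrix at each step, so with spectral radius below one the gradient norm decays exponentially. The hidden size, horizon, and weight scale are arbitrary choices, and the nonlinearity derivatives (which shrink the gradient further) are ignored.

```python
import numpy as np

rng = np.random.default_rng(0)
H, T = 16, 100
# Random recurrent matrix with spectral radius well below 1
W = rng.normal(size=(H, H)) * 0.3 / np.sqrt(H)

grad = np.ones(H)                 # error signal at the final time step
for t in range(T):
    grad = W.T @ grad             # one step of backpropagation through time
    if t % 20 == 19:
        print(f"step {t+1:3d}: ||grad|| = {np.linalg.norm(grad):.3e}")
```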
Untersuchungen zu dynamischen neuronalen Netzen (Investigations on dynamic neural networks)
  • Citations: 490
  • Highly influential citations: 32
FABIA: factor analysis for bicluster acquisition
Motivation: Biclustering of transcriptomic data groups genes and samples simultaneously. It is emerging as a standard tool for extracting knowledge from gene expression measurements. We propose a…
  • Citations: 227
  • Highly influential citations: 29
Flat Minima
We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a flat minimum of the error function. A flat minimum is a large… (A toy weight-perturbation probe of flatness follows this entry.)
  • Citations: 302
  • Highly influential citations: 18
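As a crude intuition pump for "flatness" (not the paper's flat-minimum-search algorithm), the sketch below measures how much a simple model's error rises under small random weight perturbations; a small average rise indicates a flat region of weight space. The linear least-squares model, perturbation scale, and scoring are all illustrative assumptions.

```python
import numpy as np

def loss(w, X, y):
    """Mean squared error of a linear model, standing in for the error function."""
    return np.mean((X @ w - y) ** 2)

def flatness_probe(w, X, y, scale=1e-2, trials=100, seed=0):
    """Average error increase under small random weight perturbations.
    Smaller values indicate a flatter region of weight space."""
    rng = np.random.default_rng(seed)
    base = loss(w, X, y)
    deltas = [loss(w + scale * rng.normal(size=w.shape), X, y) - base
              for _ in range(trials)]
    return float(np.mean(deltas))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=200)
w_fit = np.linalg.lstsq(X, y, rcond=None)[0]   # fitted weights
print("flatness score:", flatness_probe(w_fit, X, y))
```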