• Publications
  • Influence
Long Short-Term Memory
A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
This work proposes a two time-scale update rule (TTUR) for training GANs with stochastic gradient descent on arbitrary GAN loss functions and introduces the "Frechet Inception Distance" (FID) which captures the similarity of generated images to real ones better than the Inception Score.
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
The "exponential linear unit" (ELU) which speeds up learning in deep neural networks and leads to higher classification accuracies and significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers.
Self-Normalizing Neural Networks
Self-normalizing neural networks (SNNs) are introduced to enable high-level abstract representations and it is proved that activations close to zero mean and unit variance that are propagated through many network layers will converge towards zero meanand unit variance -- even under the presence of noise and perturbations.
LSTM can Solve Hard Long Time Lag Problems
This work shows that problems used to promote various previous algorithms can be solved more quickly by random weight guessing than by the proposed algorithms, and uses LSTM, its own recent algorithm, to solve a hard problem.
Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies
D3EGF(FIH)J KMLONPEGQSRPETN UCV.WYX(Z R.[ V R6\M[ X N@]_^O\`JaNcb V RcQ W d EGKeL(^(QgfhKeLOE?i)^(QSj ETNPfPQkRl[ V R)m"[ X ^(KeLOEG^ npo qarpo m"[ X ^(KeLOEG^tsAu EGNPb V ^ v wyx
The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions
  • S. Hochreiter
  • Computer Science
    Int. J. Uncertain. Fuzziness Knowl. Based Syst.
  • 1 April 1998
The de-caying error flow is theoretically analyzed, methods trying to overcome vanishing gradients are briefly discussed, and experiments comparing conventional algorithms and alternative methods are presented.
Flat Minima
A new algorithm for finding low-complexity neural networks with high generalization capability that outperforms conventional backprop, weight decay, and optimal brain surgeon/optimal brain damage and requires the computation of second-order derivatives, but has backpropagation's order of complexity.
cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate
‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data outperformed its five competitors in terms of precision (1–FDR) and recall for both gains and losses in all benchmark data sets.