Trainability of ReLU networks and Data-dependent Initialization.

@article{Shin2019TrainabilityOR,
  title={Trainability of ReLU networks and Data-dependent Initialization.},
  author={Yeonjong Shin and G. Karniadakis},
  journal={arXiv: Learning},
  year={2019}
}
  • Yeonjong Shin, G. Karniadakis
  • Published 2019
  • Computer Science
  • arXiv: Learning
  • In this paper, we study the trainability of rectified linear unit (ReLU) networks. A ReLU neuron is said to be dead if it outputs only a constant for any input. Two death states of neurons are introduced: tentative and permanent death. A network is then said to be trainable if the number of permanently dead neurons is sufficiently small for a learning task. We refer to the probability of a network being trainable as trainability. We show that a network being trainable is a necessary condition…
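
The abstract's notion of a dead neuron is concrete enough to check numerically. Below is a minimal sketch (plain NumPy, not the authors' code; all sizes and names are made up for illustration) that flags hidden ReLU neurons whose output is constant over an entire toy dataset:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy setup: 256 inputs with 10 features, one hidden layer of 32 ReLU
    # neurons (all sizes are arbitrary for the demo).
    X = rng.normal(size=(256, 10))
    W = rng.normal(size=(10, 32))
    b = rng.normal(size=(32,))
    b[:4] = -50.0                    # force a few neurons to be dead on this data

    pre = X @ W + b                  # pre-activations, shape (256, 32)
    post = np.maximum(pre, 0.0)      # ReLU outputs

    # Dead in the abstract's sense: the output is the same constant for every
    # input. For ReLU on continuous data that constant is 0, i.e. every
    # pre-activation is <= 0.
    dead = np.all(post == post[0], axis=0)

    print(f"{dead.sum()} of {dead.size} hidden neurons are dead on this data")

Informally, a neuron that is dead on every training input receives zero gradient through the ReLU; whether it can still be revived (earlier layers may push its pre-activations back above zero) or never can is, roughly, the tentative-versus-permanent distinction the abstract draws.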
    3 Citations

    • Probabilistic bounds on data sensitivity in deep rectifier networks
    • Training Linear Neural Networks: Non-Local Convergence and Complexity Results
