Corpus ID: 221397331

Learning explanations that are hard to vary

  • Giambattista Parascandolo, Alexander Neitz, Antonio Orvieto, Luigi Gresele, B. Schölkopf
  • Published 2020
  • Computer Science, Mathematics
  • ArXiv
  • In this paper, we investigate the principle that 'good explanations are hard to vary' in the context of deep learning. We show that averaging gradients across examples -- akin to a logical OR of patterns -- can favor memorization and 'patchwork' solutions that sew together different strategies, instead of identifying invariances. To inspect this, we first formalize a notion of consistency for minima of the loss surface, which measures to what extent a minimum appears only when examples are…
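
The contrast the abstract draws can be sketched numerically. Below, "OR" aggregation is the usual arithmetic mean of per-example gradients, which still takes a step on components where examples conflict; an "AND"-style aggregation, in the spirit of the paper's sign-agreement masking, zeroes out components whose sign is not shared across examples. This is an illustrative sketch with NumPy, not the authors' implementation; the `agreement` threshold is an assumed parameter.

```python
import numpy as np

def or_update(grads):
    """Arithmetic mean of per-example gradients: a component moves if
    ANY example pushes it (akin to a logical OR of patterns)."""
    return np.asarray(grads).mean(axis=0)

def and_mask_update(grads, agreement=1.0):
    """AND-style aggregation (a sketch of sign-agreement masking):
    keep a gradient component only where the sign agrees across at
    least a fraction `agreement` of examples; zero it otherwise."""
    grads = np.asarray(grads)
    mean_sign = np.sign(grads).mean(axis=0)      # in [-1, 1] per component
    mask = np.abs(mean_sign) >= agreement        # full agreement by default
    return mask * grads.mean(axis=0)

# Two examples agree on the first component but conflict on the second:
g = np.array([[1.0,  2.0],
              [1.0, -1.0]])
or_update(g)        # -> [1.0, 0.5]: the mean still moves the conflicting component
and_mask_update(g)  # -> [1.0, 0.0]: the conflicting component is masked out
```

The masked update only descends along directions that reduce the loss for every example, which is one way to formalize preferring explanations that are "hard to vary" over patchwork solutions.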


    Publications referenced by this paper:
    • Adam: A Method for Stochastic Optimization (50,856 citations; Highly Influential)
    • Automatic Differentiation in PyTorch (2017)
    • Quantifying Generalization in Reinforcement Learning (153 citations; Highly Influential)
    • Domain-Adversarial Training of Neural Networks (2,082 citations; Highly Influential)
    • Geometric Means (147 citations; Highly Influential)
    • Invariant Risk Minimization (125 citations; Highly Influential)
    • Proximal Policy Optimization Algorithms (2,795 citations; Highly Influential)
    • Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent (136 citations)
    • Accelerated Mirror Descent in Continuous and Discrete Time (130 citations)
    • AdaGrad stepsizes: sharp convergence over nonconvex landscapes (35 citations)