
- Pratik Chaudhari, Anna Choromanska, +6 authors Riccardo Zecchina
- ArXiv
- 2016

This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape at solutions found by gradient descent. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative…
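The local-entropy idea described above can be sketched in a few lines of numpy: an inner Langevin loop estimates the mean of a Gibbs measure centered at the current weights, and the outer step moves the weights toward that mean. This is a toy 1-D sketch under assumed hyperparameters (the γ, step sizes, loop length, and EMA weight are illustrative choices, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(x):
    # Gradient of a toy non-convex loss f(x) = x**4 - 3*x**2 (two wide minima near ±1.22).
    return 4 * x**3 - 6 * x

def entropy_sgd_step(x, gamma=0.1, eta=0.5, eta_inner=0.05, L=20, eps=1e-4):
    # Inner SGLD loop samples from the Gibbs measure of
    # f(x') + (gamma/2) * (x' - x)**2 and tracks its mean mu via an EMA.
    x_prime, mu = x, x
    for _ in range(L):
        noise = eps * np.sqrt(eta_inner) * rng.standard_normal()
        x_prime = x_prime - eta_inner * (loss_grad(x_prime) + gamma * (x_prime - x)) + noise
        mu = 0.75 * mu + 0.25 * x_prime
    # The gradient of the local entropy is approximately gamma * (x - mu),
    # so the outer update nudges x toward the Gibbs mean mu.
    return x - eta * gamma * (x - mu)

x = 2.5
for _ in range(200):
    x = entropy_sgd_step(x)
# x drifts into the wide minimum of f near 1.22
```

In the paper the inner loop runs on minibatch gradients of a network loss; here an analytic gradient of a scalar function stands in for it.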

- Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Güney, Volkan Cirik, Kyunghyun Cho
- ArXiv
- 2017

We publicly release a new large-scale dataset, called SearchQA, for machine comprehension, or question-answering. Unlike recently released datasets, such as DeepMind CNN/DailyMail and SQuAD, the proposed SearchQA was constructed to reflect a full pipeline of general question-answering. That is, we start not from an existing article and generate a…

- Levent Sagun, V. Ugur Güney, Yann LeCun
- ArXiv
- 2014

Finding minima of a real-valued non-convex function over a high-dimensional space is a major challenge in science. We provide evidence that some such functions that are defined on high-dimensional domains have a narrow band of values whose pre-image contains the bulk of its critical points. This is in contrast with the low-dimensional picture in which this…

- Levent Sagun, Léon Bottou, Yann LeCun
- ArXiv
- 2016

We look at the eigenvalues of the Hessian of a loss function before and after training. The eigenvalue distribution is seen to be composed of two parts, the bulk which is concentrated around zero, and the edges which are scattered away from zero. We present empirical evidence for the bulk indicating how over-parametrized the system is, and for the edges…
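The bulk/edge split is easy to reproduce on an over-parametrized least-squares toy, where the Hessian X^T X / n has rank at most the number of samples, so over-parametrization forces a bulk of exactly-zero eigenvalues. A numpy sketch (the dimensions are chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Over-parametrized least squares: 10 samples, 50 parameters.
n, d = 10, 50
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# The loss L(w) = ||Xw - y||**2 / (2n) has constant Hessian H = X^T X / n,
# whose rank is at most n, so at least d - n eigenvalues are exactly zero.
H = X.T @ X / n
eigs = np.linalg.eigvalsh(H)

bulk = int(np.sum(np.abs(eigs) < 1e-8))   # eigenvalues in the bulk at zero
edges = int(np.sum(eigs > 1e-8))          # positive edges away from zero
```

Here the bulk has exactly d − n = 40 members and the edges n = 10, mirroring the paper's observation that the size of the bulk tracks the degree of over-parametrization.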

- Levent Sagun, Thomas Trogdon, Yann LeCun
- ArXiv
- 2015

We present empirical universal distributions for the halting time (measured by the number of iterations needed to reach a given accuracy) of optimization algorithms applied to two random systems: spin glasses and deep learning. Given an algorithm, which we take to be both the optimization routine and the form of the random landscape, the fluctuations of…
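A minimal version of that experiment, with gradient descent on random quadratics standing in for the paper's spin-glass and deep-learning landscapes: record the halting time over many random instances, then center and scale the fluctuations. All sizes and tolerances below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def halting_time(d=50, tol=1e-6, max_iter=10000):
    # Random quadratic landscape f(x) = x^T A x / 2 with a Wishart Hessian.
    W = rng.standard_normal((d, 2 * d))
    A = W @ W.T / (2 * d)
    step = 1.0 / np.linalg.eigvalsh(A)[-1]  # step = 1 / largest eigenvalue
    x = rng.standard_normal(d)
    for t in range(1, max_iter + 1):
        g = A @ x
        if np.linalg.norm(g) < tol:
            return t  # iterations needed to reach the accuracy threshold
        x = x - step * g
    return max_iter

times = np.array([halting_time() for _ in range(30)])
# Center and scale: the paper's claim is that these normalized
# fluctuations follow a universal distribution across landscapes.
fluct = (times - times.mean()) / times.std()
```

The universality claim is about the shape of the distribution of `fluct`, which a histogram over many more instances would reveal.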

- Andrew J Ballard, Ritankar Das, +4 authors David J Wales
- Physical Chemistry Chemical Physics (PCCP)
- 2017

Machine learning techniques are being increasingly used as flexible non-linear fitting and prediction tools in the physical sciences. Fitting functions that exhibit multiple solutions as local minima can be analysed in terms of the corresponding machine learning landscape. Methods to explore and visualise molecular potential energy landscapes can be applied…

- Levent Sagun, Utku Evci, V. Ugur Güney, Yann Dauphin, Léon Bottou
- ArXiv
- 2017

We study the properties of common loss surfaces through their Hessian matrix. In particular, in the context of deep learning, we empirically show that the spectrum of the Hessian is composed of two parts: (1) the bulk centered near zero, and (2) outliers away from the bulk. We present numerical evidence and mathematical justifications to the following…
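For models too large to form the Hessian explicitly, the outliers of the spectrum can be probed with Hessian-vector products alone. A sketch using power iteration on a toy quadratic, where `H_true` and `grad` are assumed stand-ins for a real loss Hessian and a backprop gradient (the finite-difference product happens to be exact here because the loss is quadratic):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in Hessian of a toy quadratic loss f(x) = x^T H_true x / 2.
A = rng.standard_normal((20, 20))
H_true = A @ A.T / 20

def grad(z):
    # Gradient of the toy loss; in practice this would be a backprop call.
    return H_true @ z

def hvp(x, v, h=1e-5):
    # Finite-difference Hessian-vector product:
    # H v ≈ (grad(x + h v) - grad(x - h v)) / (2 h)
    return (grad(x + h * v) - grad(x - h * v)) / (2 * h)

def top_eigenvalue(x, iters=500):
    # Power iteration on the Hessian, using only Hessian-vector products.
    v = rng.standard_normal(x.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = hvp(x, v)
        v = w / np.linalg.norm(w)
    return float(v @ hvp(x, v))  # Rayleigh quotient at the converged vector

lam = top_eigenvalue(np.zeros(20))
```

Running power iteration on shifted or deflated operators extends the same trick to further outliers, without ever storing the full matrix.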
