
Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule…
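The "slowly increasing schedule" refers to ramping the momentum coefficient up over the course of training. A minimal sketch of that idea on a toy quadratic — the schedule below is an illustrative placeholder, not the one proposed in the paper, and plain heavy-ball momentum stands in for the Nesterov variant:

```python
import numpy as np

def sgd_momentum(grad, x0, steps=2000, lr=0.01, mu_max=0.99):
    """Gradient descent with a slowly increasing momentum coefficient.

    The schedule mu_t = min(1 - 3/(t + 5), mu_max) is an illustrative
    choice, not the schedule from the paper.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for t in range(steps):
        mu = min(1.0 - 3.0 / (t + 5), mu_max)
        v = mu * v - lr * grad(x)   # classical (heavy-ball) momentum update
        x = x + v
    return x

# Toy problem: ill-conditioned quadratic f(x) = 0.5 x^T A x, minimum at 0
A = np.diag([1.0, 100.0])
grad = lambda x: A @ x
x_final = sgd_momentum(grad, [1.0, 1.0])
print(np.linalg.norm(x_final))  # small residual norm, near the minimum
```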

- James Martens
- ICML
- 2010

We develop a 2nd-order optimization method based on the "Hessian-free" approach, and apply it to training deep auto-encoders. Without using pre-training, we obtain results superior to those reported by Hinton & Salakhutdinov (2006) on the same tasks they considered. Our method is practical, easy to use, scales nicely to very large datasets, and isn't…
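"Hessian-free" means taking Newton-style steps computed by conjugate gradient, which needs only Hessian-vector products rather than the full Hessian. A minimal sketch on a toy quadratic — real Hessian-free training adds damping and obtains the Hessian-vector products from automatic differentiation rather than an explicit matrix:

```python
import numpy as np

def conjugate_gradient(hvp, b, iters=50, tol=1e-10):
    """Solve H x = b using only Hessian-vector products hvp(v) = H v."""
    x = np.zeros_like(b)
    r = b - hvp(x)          # residual
    p = r.copy()            # search direction
    rs = r @ r
    for _ in range(iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Newton step on a quadratic: grad f(x) = H x, step d solves H d = -grad
H = np.array([[3.0, 1.0], [1.0, 2.0]])
x = np.array([5.0, -3.0])
g = H @ x                                   # gradient of 0.5 x^T H x at x
d = conjugate_gradient(lambda v: H @ v, -g)
print(x + d)  # approximately [0, 0], the minimizer
```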

- Ilya Sutskever, James Martens, Geoffrey E. Hinton
- ICML
- 2011

Recurrent Neural Networks (RNNs) are very powerful sequence models that do not enjoy widespread use because it is extremely difficult to train them properly. Fortunately, recent advances in Hessian-free optimization have been able to overcome the difficulties associated with training RNNs, making it possible to apply them successfully to challenging…

- James Martens, Ilya Sutskever
- ICML
- 2011

In this work we resolve the long-outstanding problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. Utilizing recent advances in the Hessian-free optimization approach (Martens, 2010), together with a novel damping scheme, we successfully train…
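Damping here means regularizing the curvature matrix so each update stays inside the region where the local quadratic model can be trusted. A generic Tikhonov/Levenberg-Marquardt sketch of the mechanism — the paper's "structural damping" is more specialized; this only illustrates how a damping parameter is adapted:

```python
import numpy as np

def damped_newton_step(H, g, lam):
    """Solve (H + lam*I) d = -g; larger lam gives shorter, safer steps."""
    return np.linalg.solve(H + lam * np.eye(len(g)), -g)

def adjust_damping(lam, rho):
    """Levenberg-Marquardt heuristic: rho = actual / predicted reduction."""
    if rho > 0.75:
        return lam / 1.5      # model is trustworthy: damp less
    if rho < 0.25:
        return lam * 1.5      # model is poor: damp more
    return lam

# Toy quadratic objective f(x) = 0.5 x^T A x - b^T x
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x, lam = np.zeros(2), 1.0
for _ in range(20):
    g = grad(x)
    d = damped_newton_step(A, g, lam)
    predicted = -(g @ d + 0.5 * d @ A @ d)   # decrease predicted by the model
    actual = f(x) - f(x + d)
    rho = actual / predicted if predicted > 0 else 0.0
    if rho > 0:
        x = x + d
    lam = adjust_damping(lam, rho)
print(np.linalg.norm(grad(x)))  # near 0: damping shrank toward a pure Newton step
```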

- Arvind Neelakantan, Luke Vilnis, +4 authors James Martens
- ArXiv
- 2015

Deep feedforward and recurrent networks have achieved impressive results in many perception and language processing applications. This success is partially attributed to architectural innovations such as convolutional and long short-term memory networks. The main motivation for these architectural innovations is that they capture better domain knowledge,…

- James Martens, Ilya Sutskever
- Neural Networks: Tricks of the Trade
- 2012

- James Martens, Roger B. Grosse
- ICML
- 2015

We propose an efficient method for approximating natural gradient descent in neural networks which we call Kronecker-factored Approximate Curvature (K-FAC). K-FAC is based on an efficiently invertible approximation of a neural network's Fisher information matrix which is neither diagonal nor low-rank, and in some cases is completely non-sparse. It is…

- James Martens
- ICML
- 2010

We develop a new algorithm, based on EM, for learning the Linear Dynamical System model. Called the method of Approximated Second-Order Statistics (ASOS), our approach achieves dramatically superior computational performance over standard EM through its use of approximations, which we justify with both intuitive explanations and rigorous convergence results…

- James Martens, Ilya Sutskever, Kevin Swersky
- ICML
- 2012

In this work we develop Curvature Propagation (CP), a general technique for efficiently computing unbiased approximations of the Hessian of any function that is computed using a computational graph. At the cost of roughly two gradient evaluations, CP can give a rank-1 approximation of the whole Hessian, and can be repeatedly applied to give increasingly…
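The core trick behind such estimators: if v is a random vector with E[v vᵀ] = I, then (H v) vᵀ is an unbiased rank-1 estimate of H, and averaging independent samples tightens it. A Hutchinson-style numpy illustration — the paper's CP propagates such estimates through a computational graph, whereas here H is just an explicit matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

H = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])  # "true" Hessian, standing in for an hvp oracle

def rank1_estimate(hvp, n):
    """One unbiased rank-1 sample: E[(H v) v^T] = H when E[v v^T] = I."""
    v = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe vector
    return np.outer(hvp(v), v)

# Average many independent rank-1 samples to approximate H.
samples = [rank1_estimate(lambda v: H @ v, 3) for _ in range(20000)]
est = np.mean(samples, axis=0)
print(np.max(np.abs(est - H)))  # shrinks toward 0 as more samples are averaged
```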

- James Martens, Ilya Sutskever
- AISTATS
- 2010

Markov Random Fields (MRFs) are an important class of probabilistic models which are used for density estimation, classification, denoising, and for constructing Deep Belief Networks. Every application of an MRF requires addressing its inference problem, which can be done using deterministic inference methods or using stochastic Markov Chain Monte Carlo…