- James Martens
- ICML
- 2010

We develop a 2nd-order optimization method based on the “Hessian-free” approach, and apply it to training deep auto-encoders. Without using pre-training, we obtain results superior to those reported by Hinton & Salakhutdinov (2006) on the same tasks they considered. Our method is practical, easy to use, scales nicely to very large datasets, and isn’t…

Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule…

- Ilya Sutskever, James Martens, Geoffrey E. Hinton
- ICML
- 2011

Recurrent Neural Networks (RNNs) are very powerful sequence models that do not enjoy widespread use because it is extremely difficult to train them properly. Fortunately, recent advances in Hessian-free optimization have been able to overcome the difficulties associated with training RNNs, making it possible to apply them successfully to challenging…

- James Martens, Ilya Sutskever
- ICML
- 2011

In this work we resolve the long-outstanding problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. Utilizing recent advances in the Hessian-free optimization approach (Martens, 2010), together with a novel damping scheme, we successfully train…

- Arvind Neelakantan, Luke Vilnis, +4 authors James Martens
- ArXiv
- 2015

Deep feedforward and recurrent networks have achieved impressive results in many perception and language processing applications. This success is partially attributed to architectural innovations such as convolutional and long short-term memory networks. A major reason for these architectural innovations is that they capture better domain knowledge, and…

- James Martens, Ilya Sutskever
- Neural Networks: Tricks of the Trade
- 2012

Hessian-Free optimization (HF) is an approach for unconstrained minimization of real-valued smooth objective functions. Like standard Newton’s method, it uses local quadratic approximations to generate update proposals. It belongs to the broad class of approximate Newton methods that are practical for problems of very high dimensionality, such as the…

- James Martens, Roger B. Grosse
- ICML
- 2015

We propose an efficient method for approximating natural gradient descent in neural networks which we call Kronecker-factored Approximate Curvature (K-FAC). K-FAC is based on an efficiently invertible approximation of a neural network’s Fisher information matrix which is neither diagonal nor low-rank, and in some cases is completely non-sparse. It is…

- James Martens, Ilya Sutskever
- AISTATS
- 2010

Markov Random Fields (MRFs) are an important class of probabilistic models which are used for density estimation, classification, denoising, and for constructing Deep Belief Networks. Every application of an MRF requires addressing its inference problem, which can be done using deterministic inference methods or using stochastic Markov Chain Monte Carlo…

This paper examines the question: What kinds of distributions can be efficiently represented by Restricted Boltzmann Machines (RBMs)? We characterize the RBM’s unnormalized log-likelihood function as a type of neural network, and through a series of simulation results relate these networks to ones whose representational properties are better understood. We…

- James Martens, Venkatesh Medabalimi
- ArXiv
- 2014

Sum Product Networks (SPNs) are a recently developed class of deep generative models which compute their associated unnormalized density functions using a special type of arithmetic circuit. When certain sufficient structural conditions are imposed on these circuits (called the decomposability and completeness conditions or D&C conditions), marginal…