# Scaling learning algorithms towards AI

@inproceedings{Bengio2007ScalingLA, title={Scaling learning algorithms towards AI}, author={Yoshua Bengio and Yann LeCun}, year={2007} }

One long-term goal of machine learning research is to produce methods that are applicable to highly complex tasks, such as perception (vision, audition), reasoning, intelligent control, and other artificially intelligent behaviors. We argue that in order to progress toward this goal, the Machine Learning community must endeavor to discover algorithms that can learn highly complex functions, with minimal need for prior knowledge, and with minimal human intervention. We present mathematical and…

## 1,199 Citations

### Learning Deep Architectures for AI

- Computer Science, Found. Trends Mach. Learn.
- 2007

The motivations and principles behind learning algorithms for deep architectures are discussed, in particular those that exploit unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, as building blocks for constructing deeper models such as Deep Belief Networks.

### How do We Train Deep Architectures?

- Computer Science
- 2009

The motivations and principles behind learning algorithms for deep architectures are discussed, in particular those that exploit unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, as building blocks for constructing deeper models such as Deep Belief Networks.

### Learning and Evaluating Deep Boltzmann Machines

- Computer Science
- 2009

A fast, greedy learning algorithm for Deep Belief Networks can be performed in a single bottom-up pass, but such a pass will fail to adequately account for uncertainty when interpreting ambiguous sensory inputs.

### Learning deep generative models

- Computer Science
- 2009

The aim of the thesis is to demonstrate that deep generative models that contain many layers of latent variables and millions of parameters can be learned efficiently, and that the learned high-level feature representations can be successfully applied in a wide spectrum of application domains, including visual object recognition, information retrieval, and classification and regression tasks.

### Tradeoffs in Neural Variational Inference Thesis

- Computer Science
- 2017

A thorough comparison between several selected improvements of Variational Auto-Encoders is presented and the modeling times for these various approaches are compared to provide practical guidelines regarding the trade-offs between the variational lower bound achieved and the run time required for training.

### Deep Learning

- Computer Science, Encyclopedia of Big Data Technologies
- 2019

Deep learning is a learning method that combines deep architectures with effective learning algorithms, enabling intelligent capabilities such as automatic feature learning, and it points toward a new direction for machine learning.

### Deep Woods

- Computer Science
- 2008

The principle of training a deep architecture by greedy layer-wise unsupervised training has been shown to be successful for deep connectionist architectures and this work attempts to exploit this principle to develop new deep architectures based on deterministic or stochastic decision trees.

### Deep Learning of Representations

- Computer Science, Handbook on Neural Information Processing
- 2013

This chapter reviews the main motivations and ideas behind deep learning algorithms and their representation-learning components, as well as recent results, and proposes a vision of challenges and hopes on the road ahead, focusing on the questions of invariance and disentangling.

### Building a Subspace of Policies for Scalable Continual Learning

- Computer Science, ArXiv
- 2022

Continual Subspace of Policies (CSP), a method that iteratively learns a subspace of policies in the continual reinforcement learning setting where tasks are presented sequentially, outperforms state-of-the-art methods on a wide range of scenarios in two different domains.

## References

Showing 1-10 of 53 references

### Greedy Layer-Wise Training of Deep Networks

- Computer Science, NIPS
- 2006

These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
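The greedy layer-wise strategy this abstract describes can be illustrated with a minimal numpy sketch: each layer is trained as a one-layer unsupervised model on the codes produced by the layer below it, then frozen. Here tied-weight sigmoid autoencoders stand in for the single-layer building block; all names and hyperparameters are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, hidden, epochs=200, lr=0.05):
    """Train a one-layer tied-weight autoencoder by gradient descent
    on the squared reconstruction error."""
    n, d = X.shape
    W = rng.normal(0, 0.1, (d, hidden))
    b = np.zeros(hidden)   # hidden bias
    c = np.zeros(d)        # visible (reconstruction) bias
    for _ in range(epochs):
        H = sigmoid(X @ W + b)      # encode
        R = H @ W.T + c             # linear decode with tied weights
        err = R - X
        dH = (err @ W) * H * (1 - H)
        gW = X.T @ dH + err.T @ H   # encoder + decoder gradient (tied W)
        W -= lr * gW / n
        b -= lr * dH.sum(0) / n
        c -= lr * err.sum(0) / n
    return W, b

def greedy_pretrain(X, layer_sizes):
    """Greedily train each layer on the previous layer's codes."""
    params, H = [], X
    for h in layer_sizes:
        W, b = train_autoencoder(H, h)
        params.append((W, b))
        H = sigmoid(H @ W + b)      # propagate codes upward, layer frozen
    return params, H

X = rng.random((64, 8))
params, codes = greedy_pretrain(X, [6, 4])
```

In the full recipe of the paper, the stacked weights would then initialize a deep network that is fine-tuned with supervised gradient descent; the sketch stops at the unsupervised stage.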

### A Fast Learning Algorithm for Deep Belief Nets

- Computer Science, Neural Computation
- 2006

A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.

### The Curse of Highly Variable Functions for Local Kernel Machines

- Computer Science, NIPS
- 2005

We present a series of theoretical arguments supporting the claim that a large class of modern learning algorithms that rely solely on the smoothness prior - with similarity between examples…

### Practical issues in temporal difference learning

- Computer Science, Machine Learning
- 2004

It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which surpasses comparable networks trained on a massive human expert data set.

### The Curse of Dimensionality for Local Kernel Machines

- Computer Science
- 2005

We present a series of theoretical arguments supporting the claim that a large class of modern learning algorithms based on local kernels are sensitive to the curse of dimensionality. These include…

### Efficient Non-Parametric Function Induction in Semi-Supervised Learning

- Computer Science, AISTATS
- 2005

The proposed non-parametric algorithms, which provide an estimated continuous label for given unlabeled examples, are extended to function-induction algorithms that minimize a regularization criterion applied to an out-of-sample example, and the resulting estimator happens to take the form of a Parzen windows regressor.
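The Parzen windows regressor mentioned in the abstract is the classical Nadaraya-Watson form: the prediction at a point is a kernel-weighted average of the training labels. A minimal numpy sketch, with illustrative names and a Gaussian kernel assumed:

```python
import numpy as np

def parzen_regress(Xtrain, ytrain, Xtest, bandwidth=0.5):
    """Parzen windows (Nadaraya-Watson) regression: predict by a
    Gaussian-kernel-weighted average of the training labels."""
    # Squared distances between every test and training point.
    d2 = ((Xtest[:, None, :] - Xtrain[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * bandwidth ** 2))
    return (K @ ytrain) / K.sum(1)

# Sanity check: with constant labels, the weighted average is constant.
Xtr = np.linspace(0.0, 1.0, 20)[:, None]
ytr = np.ones(20)
pred = parzen_regress(Xtr, ytr, np.array([[0.3], [0.7]]))
```

Such purely local estimators are exactly the class the accompanying paper argues suffers from the curse of highly variable functions.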

### Fast Kernel Classifiers with Online and Active Learning

- Computer Science, J. Mach. Learn. Res.
- 2005

This contribution presents an online SVM algorithm based on the premise that active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.

### Many-Layered Learning

- Computer Science, Neural Computation
- 2002

This work explores incremental assimilation of new knowledge by sequential learning, and demonstrates a method for simultaneously acquiring and organizing a collection of concepts and functions as a network from a stream of unstructured information.

### Large-scale Learning with SVM and Convolutional for Generic Object Categorization

- Computer Science, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)
- 2006

It is shown that architectures such as convolutional networks are good at learning invariant features but not always optimal for classification, while Support Vector Machines are good at producing decision surfaces from well-behaved feature vectors but cannot learn complicated invariances.

### Training Products of Experts by Minimizing Contrastive Divergence

- Computer Science, Neural Computation
- 2002

A product of experts (PoE) is an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary, but training is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule.
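The contrastive-divergence trick this paper introduces sidesteps the intractable renormalization term by comparing data statistics against statistics after just one Gibbs step. A minimal numpy sketch of a CD-1 update for a binary RBM (a simple product of experts); names, shapes, and the learning rate are illustrative, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b, c, lr=0.1):
    """One contrastive-divergence (CD-1) update for a binary RBM.
    v0: batch of visible vectors; W: visible-by-hidden weights;
    b: hidden bias; c: visible bias."""
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
    # One Gibbs step: reconstruct visibles, recompute hidden probabilities.
    pv1 = sigmoid(h0 @ W.T + c)
    ph1 = sigmoid(pv1 @ W + b)
    # CD-1 gradient: data correlations minus reconstruction correlations.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b += lr * (ph0 - ph1).mean(0)
    c += lr * (v0 - pv1).mean(0)
    return W, b, c

V = (rng.random((32, 6)) < 0.5).astype(float)  # toy binary data
W = rng.normal(0, 0.01, (6, 4))
b, c = np.zeros(4), np.zeros(6)
for _ in range(50):
    W, b, c = cd1_step(V, W, b, c)
```

This is the same update used, layer by layer, in the greedy Deep Belief Network training cited above.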