Corpus ID: 15559637

Scaling learning algorithms towards AI

@inproceedings{Bengio2007ScalingLA,
  title={Scaling learning algorithms towards AI},
  author={Yoshua Bengio and Yann LeCun},
  year={2007}
}
One long-term goal of machine learning research is to produce methods that are applicable to highly complex tasks, such as perception (vision, audition), reasoning, intelligent control, and other artificially intelligent behaviors. We argue that in order to progress toward this goal, the Machine Learning community must endeavor to discover algorithms that can learn highly complex functions, with minimal need for prior knowledge, and with minimal human intervention. We present mathematical and… 

Citations

Learning Deep Architectures for AI

The motivations and principles regarding learning algorithms for deep architectures are discussed, in particular those that use as building blocks the unsupervised learning of single-layer models such as Restricted Boltzmann Machines, which are then used to construct deeper models such as Deep Belief Networks.

How do We Train Deep Architectures?

The motivations and principles regarding learning algorithms for deep architectures are discussed, in particular those that use as building blocks the unsupervised learning of single-layer models such as Restricted Boltzmann Machines, which are then used to construct deeper models such as Deep Belief Networks.

Learning and Evaluating Deep Boltzmann Machines

A fast, greedy learning algorithm for Deep Belief Networks can be performed in a single bottom-up pass, but it will fail to adequately account for uncertainty when interpreting ambiguous sensory inputs.

Learning deep generative models

The aim of the thesis is to demonstrate that deep generative models that contain many layers of latent variables and millions of parameters can be learned efficiently, and that the learned high-level feature representations can be successfully applied in a wide spectrum of application domains, including visual object recognition, information retrieval, and classification and regression tasks.

Tradeoffs in Neural Variational Inference (Thesis)

A thorough comparison between several selected improvements of Variational Auto-Encoders is presented and the modeling times for these various approaches are compared to provide practical guidelines regarding the trade-offs between the variational lower bound achieved and the run time required for training.
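
For context, the variational lower bound referred to here is the standard evidence lower bound; the notation below (encoder q_phi(z|x), decoder p_theta(x|z), prior p(z)) is assumed rather than taken from the thesis itself:

\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \;-\; \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)

The improvements being compared typically tighten this bound (better posteriors, richer priors) at the cost of additional training time.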

Deep Learning

  • 奥村 義和
  • Computer Science
    Encyclopedia of Big Data Technologies
  • 2019
Deep learning is a learning method that combines deep architectures with effective learning algorithms; it can perform intellectual learning such as learning features, and it points research in a new direction.

Deep Woods

The principle of training a deep architecture by greedy layer-wise unsupervised training has been shown to be successful for deep connectionist architectures, and this work attempts to exploit the same principle to develop new deep architectures based on deterministic or stochastic decision trees.

Deep Learning of Representations

This chapter reviews the main motivations and ideas behind deep learning algorithms and their representation-learning components, as well as recent results, and proposes a vision of challenges and hopes on the road ahead, focusing on the questions of invariance and disentangling.

Building a Subspace of Policies for Scalable Continual Learning

Continual Subspace of Policies (CSP), a method that iteratively learns a subspace of policies in the continual reinforcement learning setting where tasks are presented sequentially, outperforms state-of-the-art methods on a wide range of scenarios in two different domains.
...

References

Showing 1-10 of 53 references

Greedy Layer-Wise Training of Deep Networks

These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
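
A minimal sketch of the greedy layer-wise idea, assuming simple autoencoder layers trained with PyTorch rather than the exact procedure studied in the paper; layer sizes and learning rates are illustrative:

# Hedged sketch: greedy layer-wise unsupervised pretraining with plain
# autoencoders, followed by supervised fine-tuning of the stacked network.
import torch
import torch.nn as nn

def pretrain_layer(data, in_dim, hidden_dim, epochs=10, lr=1e-3):
    """Train one encoder layer to reconstruct its input; return the encoder."""
    encoder = nn.Linear(in_dim, hidden_dim)
    decoder = nn.Linear(hidden_dim, in_dim)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):
        h = torch.sigmoid(encoder(data))
        loss = nn.functional.mse_loss(decoder(h), data)
        opt.zero_grad(); loss.backward(); opt.step()
    return encoder

def greedy_pretrain(data, layer_dims):
    """Stack encoders: each layer is trained on the codes of the layer below."""
    encoders, x = [], data
    for in_dim, hid_dim in zip(layer_dims[:-1], layer_dims[1:]):
        enc = pretrain_layer(x, in_dim, hid_dim)
        encoders.append(enc)
        x = torch.sigmoid(enc(x)).detach()   # codes become the next layer's input
    return encoders

# Usage: pretrain a 784-500-250 stack on unlabeled data, then add a classifier
# head and fine-tune the whole network with backpropagation on labeled data.
unlabeled = torch.rand(1024, 784)            # stand-in for real inputs
encoders = greedy_pretrain(unlabeled, [784, 500, 250])
model = nn.Sequential(encoders[0], nn.Sigmoid(), encoders[1], nn.Sigmoid(), nn.Linear(250, 10))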

A Fast Learning Algorithm for Deep Belief Nets

A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
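
In the standard formulation, a deep belief network with \ell hidden layers factorizes as

P(\mathbf{x}, h^1, \ldots, h^{\ell}) \;=\; P(h^{\ell-1}, h^{\ell}) \left( \prod_{k=1}^{\ell-2} P(h^k \mid h^{k+1}) \right) P(\mathbf{x} \mid h^1)

where the top factor P(h^{\ell-1}, h^{\ell}) is a Restricted Boltzmann Machine (the undirected associative memory) and the remaining factors are directed layers; the greedy algorithm fits one layer at a time from the bottom up.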

The Curse of Highly Variable Functions for Local Kernel Machines

We present a series of theoretical arguments supporting the claim that a large class of modern learning algorithms that rely solely on the smoothness prior - with similarity between examples expressed with a local kernel - are sensitive to the curse of dimensionality.

Practical issues in temporal difference learning

It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which surpasses comparable networks trained on a massive human expert data set.
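
As a minimal illustration of the temporal-difference idea (TD-Gammon itself used TD(lambda) with a neural-network value function, not the tabular rule sketched here):

# Hedged sketch: a tabular TD(0) value update on observed transitions.
def td0_update(V, state, next_state, reward, alpha=0.1, gamma=1.0):
    """Move V[state] toward the bootstrapped target reward + gamma * V[next_state]."""
    td_error = reward + gamma * V.get(next_state, 0.0) - V.get(state, 0.0)
    V[state] = V.get(state, 0.0) + alpha * td_error
    return V

# Usage: after each observed transition (s, r, s'), nudge the value estimate.
V = {}
V = td0_update(V, state="s0", next_state="s1", reward=0.0)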

The Curse of Dimensionality for Local Kernel Machines

We present a series of theoretical arguments supporting the claim that a large class of modern learning algorithms based on local kernels are sensitive to the curse of dimensionality. These include

Efficient Non-Parametric Function Induction in Semi-Supervised Learning

The proposed non-parametric algorithms, which provide an estimated continuous label for the given unlabeled examples, are extended to function induction algorithms that minimize a regularization criterion applied to an out-of-sample example and happen to have the form of a Parzen windows regressor.
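
The Parzen windows (Nadaraya-Watson) form referred to above is, in standard notation,

\hat{f}(x) \;=\; \frac{\sum_{i} K(x, x_i)\, \hat{y}_i}{\sum_{j} K(x, x_j)}

with K typically a Gaussian kernel and \hat{y}_i the estimated labels of the labeled and unlabeled examples.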

Fast Kernel Classifiers with Online and Active Learning

This contribution presents an online SVM algorithm based on the premise that active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.
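
A rough sketch of the general recipe (online updates plus margin-based active selection of examples); this is an illustrative stand-in using scikit-learn's SGDClassifier on a NumPy array, not the algorithm from the paper:

# Hedged sketch: train a linear SVM online, repeatedly querying the pool
# example with the smallest margin (assumes binary labels so that
# decision_function returns one signed margin per example).
import numpy as np
from sklearn.linear_model import SGDClassifier

def active_online_svm(X_pool, y_pool, n_seed=20, n_queries=200):
    clf = SGDClassifier(loss="hinge")                        # linear SVM objective
    classes = np.unique(y_pool)
    clf.partial_fit(X_pool[:n_seed], y_pool[:n_seed], classes=classes)
    unseen = list(range(n_seed, len(y_pool)))
    for _ in range(min(n_queries, len(unseen))):
        margins = np.abs(clf.decision_function(X_pool[unseen]))
        pick = unseen[int(np.argmin(margins))]               # most uncertain example
        clf.partial_fit(X_pool[pick:pick + 1], y_pool[pick:pick + 1])
        unseen.remove(pick)
    return clf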

Many-Layered Learning

This work explores incremental assimilation of new knowledge by sequential learning, and demonstrates a method for simultaneously acquiring and organizing a collection of concepts and functions as a network from a stream of unstructured information.

Large-scale Learning with SVM and Convolutional for Generic Object Categorization

  • F. Huang, Yann LeCun
  • Computer Science
    2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)
  • 2006
It is shown that architectures such as convolutional networks are good at learning invariant features, but not always optimal for classification, while Support Vector Machines are good for producing decision surfaces from well-behaved feature vectors, but cannot learn complicated invariances.
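
A toy sketch of the hybrid this points to: a convolutional network used as a feature extractor whose outputs feed an SVM. The tiny architecture and the use of scikit-learn's LinearSVC are illustrative assumptions, not the setup of the original paper:

# Hedged sketch: convnet features (here from an untrained network) -> linear SVM.
import torch
import torch.nn as nn
from sklearn.svm import LinearSVC

conv_features = nn.Sequential(               # stand-in convolutional feature extractor
    nn.Conv2d(1, 8, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
)

def extract(images):
    """Run images of shape (N, 1, 28, 28) through the convnet without gradients."""
    with torch.no_grad():
        return conv_features(images).numpy()

# Usage: extract features for a batch of images, then fit a linear SVM on them.
images, labels = torch.rand(256, 1, 28, 28), torch.randint(0, 10, (256,))
svm = LinearSVC().fit(extract(images), labels.numpy())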

Training Products of Experts by Minimizing Contrastive Divergence

A product of experts (PoE) is an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary, although it is hard even to approximate the derivatives of the renormalization term in the combination rule.
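
A minimal sketch of the contrastive divergence recipe (CD-1) for a binary Restricted Boltzmann Machine, written in NumPy; the learning rate and the full-batch update are illustrative choices rather than a specific implementation:

# Hedged sketch: one CD-1 parameter update from a batch of visible vectors.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b_vis, b_hid, v0, lr=0.01, rng=None):
    """Update RBM parameters from a batch v0 of shape (batch, n_visible)."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Positive phase: hidden probabilities given the data.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: one step of Gibbs sampling (reconstruct, then re-infer).
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    # Approximate gradient: data statistics minus reconstruction statistics.
    batch = v0.shape[0]
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
    b_vis += lr * (v0 - v1_prob).mean(axis=0)
    b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_vis, b_hid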
...