A Practical Bayesian Framework for Backpropagation Networks

@article{Mackay1992APB,
  title={A Practical Bayesian Framework for Backpropagation Networks},
  author={David J. C. Mackay},
  journal={Neural Computation},
  year={1992},
  volume={4},
  pages={448-472}
}
  • D. Mackay
  • Published 1 May 1992
  • Computer Science
  • Neural Computation
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between solutions using alternative network architectures, (2) objective stopping rules for network pruning or growing procedures, (3) objective choice of magnitude and type of weight decay terms or additive regularizers (for penalizing large weights, etc.), (4) a measure of the effective number of well-determined parameters in a… 
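Points (3) and (4) go together: the effective number of well-determined parameters is the quantity used to re-estimate the weight decay. A minimal sketch of that evidence-style update, assuming a regularized objective of the form alpha*E_W + beta*E_D; the function name and the toy Hessian below are illustrative assumptions, not code from the paper:

```python
import numpy as np

def reestimate_hyperparameters(H_data, E_W, E_D, N, alpha, beta):
    """One evidence-style re-estimation of the weight-decay coefficient alpha
    and the noise level beta.  The effective number of well-determined
    parameters is gamma = sum_i lambda_i / (lambda_i + alpha), where lambda_i
    are the eigenvalues of beta * (Hessian of the data error E_D)."""
    lam = np.clip(np.linalg.eigvalsh(beta * H_data), 0.0, None)
    gamma = np.sum(lam / (lam + alpha))      # well-determined parameters
    alpha_new = gamma / (2.0 * E_W)          # re-estimated weight decay
    beta_new = (N - gamma) / (2.0 * E_D)     # re-estimated noise level
    return alpha_new, beta_new, gamma

# Toy example with a made-up 3x3 data Hessian (illustrative numbers only):
H = np.diag([50.0, 5.0, 0.01])
print(reestimate_hyperparameters(H, E_W=1.3, E_D=4.2, N=100, alpha=1.0, beta=1.0))
```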
Bayesian Methods for Backpropagation Networks
TLDR
This chapter describes numerical techniques based on Gaussian approximations for implementation of powerful and practical methods for controlling, comparing, and using adaptive network models.
Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks
TLDR
Practical techniques based on Gaussian approximations for implementation of these powerful methods for controlling, comparing and using adaptive networks are described.
Bayesian Regularization of Neural Networks
TLDR
This chapter outlines the equations that define the BRANN method, together with a flowchart for producing a BRANN-QSAR model, and illustrates results from applying BRANNs to a number of data sets, compared with other linear and nonlinear models.
Ace of Bayes: Application of Neural Networks with Pruning
TLDR
Bayesian backprop is applied to the prediction of fat content in minced meat from near-infrared spectra, and outperforms "early stopping" as well as quadratic regression.
Bayesian Backprop in Action: Pruning, Committees, Error Bars and an Application to Spectroscopy
TLDR
The Bayesian framework for backpropagation is extended to pruned nets, leading to an Ockham Factor for "tuning the architecture to the data".
Bayesian Learning for Neural Networks
TLDR
Bayesian Learning for Neural Networks shows that Bayesian methods allow complex neural network models to be used without fear of the "overfitting" that can occur with traditional neural network learning methods.
Bayesian approach for neural networks--review and case studies
Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks
TLDR
This work presents a novel scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP), which works by computing a forward propagation of probabilities through the network and then doing a backward computation of gradients.
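The "forward propagation of probabilities" can be illustrated for a single linear layer with independent Gaussian weights; this is a simplified moment-matching sketch under that independence assumption, not the full PBP update:

```python
import numpy as np

def linear_layer_moments(m_in, v_in, W_mean, W_var):
    """Propagate the mean and variance of an input through a linear layer
    whose weights are independent Gaussians with mean W_mean and variance
    W_var; inputs and weights are assumed mutually independent."""
    m_out = W_mean @ m_in
    v_out = (W_mean ** 2) @ v_in + W_var @ (m_in ** 2 + v_in)
    return m_out, v_out

# Toy 2 -> 3 layer with uncertain weights and a noisy input (made-up numbers):
rng = np.random.default_rng(0)
m, v = linear_layer_moments(np.array([1.0, -0.5]), np.array([0.05, 0.05]),
                            rng.normal(size=(3, 2)), np.full((3, 2), 0.1))
print(m, v)
```

The full method additionally moment-matches through the nonlinearity and uses gradients of the resulting log marginal likelihood to update the weight means and variances.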
Robust Full Bayesian Learning for Radial Basis Networks
TLDR
It is shown that, by calibrating the full hierarchical Bayesian prior, the classical Akaike information criterion, Bayesian information criterion, and minimum description length model selection criteria can be obtained within a penalized likelihood framework.
New prior distribution for Bayesian neural network and learning via Hamiltonian Monte Carlo
TLDR
A new prior law for the weight parameters is proposed that motivates network regularization more strongly than the previously proposed l1 and l2 priors, and learning is based on Hamiltonian Monte Carlo to simulate from the prior and the posterior distribution.
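For readers unfamiliar with the sampler, a minimal Hamiltonian Monte Carlo step over a generic log posterior looks roughly as follows; the quadratic log posterior, step size, and trajectory length are placeholders, not the prior construction of the cited paper:

```python
import numpy as np

def hmc_step(w, log_post, grad_log_post, step=0.01, n_leapfrog=20,
             rng=np.random.default_rng()):
    """One Hamiltonian Monte Carlo step on the weight vector w, using a
    leapfrog trajectory followed by a Metropolis accept/reject test."""
    p = rng.standard_normal(w.shape)                      # sample momentum
    w_new, p_new = w.copy(), p.copy()
    p_new = p_new + 0.5 * step * grad_log_post(w_new)     # half momentum step
    for _ in range(n_leapfrog):
        w_new = w_new + step * p_new                      # full position step
        p_new = p_new + step * grad_log_post(w_new)       # full momentum step
    p_new = p_new - 0.5 * step * grad_log_post(w_new)     # trim to a half step
    h_old = -log_post(w) + 0.5 * p @ p
    h_new = -log_post(w_new) + 0.5 * p_new @ p_new
    return w_new if np.log(rng.uniform()) < h_old - h_new else w

# Illustrative target: a standard normal "posterior" over three weights.
log_post = lambda w: -0.5 * w @ w
grad_log_post = lambda w: -w
w = np.zeros(3)
for _ in range(100):
    w = hmc_step(w, log_post, grad_log_post)
print(w)
```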

References

Showing 1-10 of 29 references
Consistent inference of probabilities in layered networks: predictions and generalizations
TLDR
The problem of learning a general input-output relation using a layered neural network is discussed in a statistical framework and the authors arrive at a Gibbs distribution on a canonical ensemble of networks with the same architecture.
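In the notation typically used for these statistical-mechanics treatments, the data induce a Gibbs distribution over the weights of a fixed architecture, P(w | D) = (1/Z(beta)) exp(-beta * E_D(w)), where E_D is the training error and beta plays the role of an inverse temperature; this paraphrases the setting rather than quoting the paper's exact formula.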
A statistical approach to learning and generalization in layered neural networks
TLDR
The proposed formalism is applied to the problems of selecting an optimal architecture and the prediction of learning curves and the Gibbs distribution on the ensemble of networks with a fixed architecture is derived.
The Evidence Framework Applied to Classification Networks
  • D. Mackay
  • Computer Science
    Neural Computation
  • 1992
TLDR
It is demonstrated that the Bayesian framework for model comparison described for regression models in MacKay (1992a,b) can also be applied to classification problems and an information-based data selection criterion is derived and demonstrated within this framework.
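A central quantity in that paper is the "moderated" output obtained by integrating the logistic function over the Gaussian posterior of the output-unit activation. A sketch of that approximation, with placeholder variable names:

```python
import numpy as np

def moderated_probability(a_map, s2):
    """Approximate the marginalized classifier output: squash the most
    probable activation a_map with the logistic, after scaling it down
    according to the posterior variance s2 of that activation."""
    kappa = 1.0 / np.sqrt(1.0 + np.pi * s2 / 8.0)
    return 1.0 / (1.0 + np.exp(-kappa * a_map))

print(moderated_probability(2.0, 0.0))   # ~0.88 when the activation is certain
print(moderated_probability(2.0, 10.0))  # pulled back toward 0.5 by uncertainty
```

The factor kappa is the standard approximation to convolving a logistic with a Gaussian: the more uncertain the activation, the closer the predicted probability moves toward 0.5.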
Bayesian Interpolation
  • D. Mackay
  • Computer Science
    Neural Computation
  • 1992
TLDR
The Bayesian approach to regularization and model comparison is demonstrated on the problem of interpolating noisy data, by examining the posterior probability distribution over regularizing constants and noise levels.
Soft competitive adaptation: neural network learning algorithms based on fitting statistical mixtures
TLDR
Two models are considered: an unsupervised algorithm that is an alternative to the classical winner-take-all competitive algorithms, and a supervised modular architecture in which a number of simple "expert" networks compete to solve distinct pieces of a large task.
Note on generalization, regularization and architecture selection in nonlinear learning systems
  • J. Moody
  • Computer Science
    Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop
  • 1991
The author proposes a new estimate of generalization performance for nonlinear learning systems, called the generalized prediction error (GPE), which is based upon the notion of the effective number of parameters.
Transforming Neural-Net Output Levels to Probability Distributions
TLDR
A method is presented for computing the first two moments of the probability distribution over outputs that are consistent with the input and the training data; the results shed new light on, and generalize, the well-known "softmax" scheme.
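The companion quantity to such output distributions is the error bar obtained by linearizing the network output around the most probable weights under a Gaussian posterior; a minimal sketch with illustrative names and numbers, not the paper's code:

```python
import numpy as np

def output_variance(g, A):
    """Variance of a network output under a Gaussian weight posterior:
    g is the gradient of the output with respect to the weights at the most
    probable weights, A is the Hessian of the regularized objective there,
    and the linearized variance is g^T A^{-1} g."""
    return g @ np.linalg.solve(A, g)

g = np.array([0.2, -0.1, 0.4])         # made-up output sensitivities
A = np.diag([10.0, 4.0, 2.0])          # made-up (diagonal) Hessian
print(np.sqrt(output_variance(g, A)))  # one-standard-deviation error bar
```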
Exact Calculation of the Hessian Matrix for the Multilayer Perceptron
TLDR
This paper presents an extended backpropagation algorithm that allows all elements of the Hessian matrix to be evaluated exactly for a feedforward network of arbitrary topology.
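The value of an exact algorithm is easiest to see next to the brute-force check it replaces: differentiating the gradient by finite differences. A small reference implementation of that check, with an illustrative quadratic error whose Hessian is known:

```python
import numpy as np

def numerical_hessian(grad_fn, w, eps=1e-5):
    """Finite-difference Hessian of a scalar error function, built column by
    column from central differences of its gradient; used only as a check
    against an exact (analytic) Hessian."""
    n = w.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        H[:, i] = (grad_fn(w + e) - grad_fn(w - e)) / (2 * eps)
    return 0.5 * (H + H.T)   # symmetrize away rounding error

# Toy error 0.5 * w^T Q w, whose true Hessian is Q.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
print(numerical_hessian(lambda w: Q @ w, np.array([0.3, -0.7])))
```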
Developments in Maximum Entropy Data Analysis
The Bayesian derivation of "Classic" MaxEnt image processing (Skilling 1989a) shows that exp(αS(f,m)), where S(f,m) is the entropy of image f relative to model m, is the only consistent prior for positive, additive distributions.
Learning representations by back-propagating errors
TLDR
Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector; as a result of the weight adjustments, internal "hidden" units come to represent important features of the task domain.
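A minimal version of the weight-adjustment rule summarized here, for a one-hidden-layer tanh network trained on the sum-squared error (the dimensions and learning rate are assumptions, not the paper's settings):

```python
import numpy as np

def backprop_step(W1, W2, x, t, lr=0.1):
    """One gradient-descent update of a 1-hidden-layer tanh network on the
    sum-squared error between the output y and the target t."""
    h = np.tanh(W1 @ x)                              # hidden activations
    y = W2 @ h                                       # linear output
    delta_out = y - t                                # output error signal
    delta_hid = (W2.T @ delta_out) * (1 - h ** 2)    # back-propagated error
    W2 -= lr * np.outer(delta_out, h)                # output-layer update
    W1 -= lr * np.outer(delta_hid, x)                # hidden-layer update
    return W1, W2, 0.5 * float(delta_out @ delta_out)

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
for _ in range(20):
    W1, W2, err = backprop_step(W1, W2, np.array([0.5, -1.0, 0.2]),
                                np.array([1.0, 0.0]))
print(err)   # decreases toward zero on this single training pair
```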