# A Practical Bayesian Framework for Backprop Networks

@article{Mackay1991APB, title={A Practical Bayesian Framework for Backprop Networks}, author={David J. C. MacKay}, journal={Neural Computation}, year={1991} }

A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible: (1) objective comparisons between solutions using alternative network architectures; (2) objective stopping rules for deletion of weights; (3) objective choice of magnitude and type of weight decay terms or additive regularisers (for penalising large weights, etc.); (4) a measure of the effective number of well-determined parameters in a model; (5) quanti…
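Point (4) has a compact form in the evidence framework: with data-Hessian eigenvalues λᵢ and weight-decay coefficient α, the effective number of well-determined parameters is γ = Σᵢ λᵢ/(λᵢ + α). A minimal numpy sketch with illustrative eigenvalues (not the paper's code):

```python
import numpy as np

def effective_params(eigvals, alpha):
    """Effective number of well-determined parameters:
    gamma = sum_i lambda_i / (lambda_i + alpha).
    Eigenvalues far above alpha contribute ~1 (well determined);
    eigenvalues far below alpha contribute ~0 (set by the prior)."""
    eigvals = np.asarray(eigvals, dtype=float)
    return float(np.sum(eigvals / (eigvals + alpha)))

# Hypothetical Hessian eigenvalues: two well-determined directions,
# two poorly determined ones.
lam = [100.0, 50.0, 0.1, 0.01]
gamma = effective_params(lam, alpha=1.0)
# gamma lies between 0 and len(lam); here it is close to 2.
```

γ replaces the raw parameter count in the framework's estimates of noise level and weight-decay strength.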

## 253 Citations

Posterior Simulation for FeedForward Neural Network

- Computer Science
- 1996

Bayesian inference and prediction with feed-forward neural network models (FFNNs), specifically those with one hidden layer of M hidden nodes, p input nodes, one output node, and logistic activation functions, is considered.

Bayesian Methods for Adaptive Models

- Computer Science
- 2011

The Bayesian framework for model comparison and regularisation is demonstrated by studying interpolation and classification problems modelled with both linear and non–linear models and it is shown that the careful incorporation of error bar information into a classifier’s predictions yields improved performance.

Refinement of Theories Represented on Bayesian Networks: CNRS Research Project Proposal

- Computer Science

This proposal describes a project to develop learning algorithms for the knowledge representation formalism known as Bayesian (or belief) networks, and links the problems encountered in learning Bayesian networks with those of neural networks, the authors' previous area of interest.

Bayesian training of backpropagation networks by the hybrid Monte-Carlo method

- Computer Science
- 1992

It is shown that Bayesian training of backpropagation neural networks can feasibly be performed by the Hybrid Monte Carlo method, and the method has been applied to a test problem, demonstrating that it can produce good predictions, as well as an indication of the uncertainty of these predictions.
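The Hybrid Monte Carlo method referenced here alternates deterministic leapfrog dynamics with a Metropolis accept/reject step on the total energy. A minimal sketch, with the network posterior replaced by a standard-normal stand-in target for illustration (this is not Neal's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def logp_and_grad(x):
    # Standard-normal target, standing in for a network's posterior.
    return -0.5 * x * x, -x

def hmc_step(x, eps=0.2, n_steps=10):
    """One Hybrid Monte Carlo step: leapfrog integration of Hamiltonian
    dynamics, then Metropolis accept/reject on the energy change."""
    p0 = rng.normal()                      # resample momentum
    logp0, grad = logp_and_grad(x)
    x_new, p = x, p0
    p += 0.5 * eps * grad                  # initial half-step for momentum
    for i in range(n_steps):
        x_new += eps * p                   # full position step
        logp_new, grad = logp_and_grad(x_new)
        if i < n_steps - 1:
            p += eps * grad                # full momentum step
    p += 0.5 * eps * grad                  # final half-step for momentum
    # Total energy H = -log p(x) + p^2 / 2; accept with prob exp(h0 - h1).
    h0 = -logp0 + 0.5 * p0 * p0
    h1 = -logp_new + 0.5 * p * p
    return x_new if np.log(rng.uniform()) < h0 - h1 else x

samples = []
x = 0.0
for _ in range(2000):
    x = hmc_step(x)
    samples.append(x)
# samples should resemble draws from N(0, 1)
```

The gradient used by the leapfrog integrator is exactly what backpropagation supplies for a neural-network posterior, which is why the method pairs naturally with backprop networks.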

Natural-Parameter Networks: A Class of Probabilistic Neural Networks

- Computer Science, NIPS
- 2016

A class of probabilistic neural networks, dubbed natural-parameter networks (NPN), is proposed as a novel and lightweight Bayesian treatment of NN, which allows the usage of arbitrary exponential-family distributions to model the weights and neurons.
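NPN itself parameterises weights and neurons by the natural parameters of exponential-family distributions; as a simplified Gaussian moment-matching illustration of the same idea (closed-form uncertainty propagation, not the NPN algorithm), the mean and variance of y = w·x for independent Gaussian w and x are available in closed form:

```python
import numpy as np

def product_gaussian(mu_x, var_x, mu_w, var_w):
    """Closed-form mean and variance of y = w * x for independent
    Gaussians w and x (a simplified stand-in for propagating
    distribution parameters through a layer)."""
    mu_y = mu_w * mu_x
    var_y = var_w * var_x + var_w * mu_x**2 + var_x * mu_w**2
    return mu_y, var_y

# Monte Carlo check of the closed form.
rng = np.random.default_rng(0)
w = rng.normal(3.0, np.sqrt(0.2), 100_000)
x = rng.normal(2.0, np.sqrt(0.1), 100_000)
mu_y, var_y = product_gaussian(2.0, 0.1, 3.0, 0.2)
# np.mean(w * x) and np.var(w * x) should be close to mu_y and var_y.
```

Propagating such parameters layer by layer avoids the sampling that heavier Bayesian treatments require.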

Sparse Bayesian learning and the relevance multi-layer perceptron network

- Computer Science, Proceedings of the 2005 IEEE International Joint Conference on Neural Networks
- 2005

A Bayesian prior is adopted that includes separate hyperparameters for each weight, allowing redundant weights and hidden layer units to be identified and subsequently pruned from the network, whilst also providing a means to avoid over-fitting the training data.

A Comprehensive guide to Bayesian Convolutional Neural Network with Variational Inference

- Computer Science, ArXiv
- 2019

This paper predicts how certain the model prediction is based on the epistemic and aleatoric uncertainties and empirically shows how the uncertainty can decrease, allowing the decisions made by the network to become more deterministic as the training accuracy increases.

A Bayesian decision theoretical approach to supervised learning, selective sampling, and empirical function optimization

- Computer Science
- 2010

The Extended Bayesian Formalism model can serve as a cohesive theory and framework in which a broad range of questions can be analyzed and studied and is used to reanalyze many important theoretical issues in Machine Learning, including No-Free-Lunch, utility implications, and active learning.

Bayesian Regularization and Pruning Using a Laplace Prior

- Computer Science, Neural Computation
- 1995

Standard techniques for improving generalization in neural networks include weight decay and pruning; a comparison is made with MacKay's results using the evidence framework and a Gaussian regularizer.
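The practical difference between a Laplace and a Gaussian prior shows up in their shrinkage operators: the Laplace (L1) proximal step sets small weights exactly to zero, which is what makes pruning natural, while Gaussian (L2) decay only scales weights toward zero. A minimal numpy sketch (illustrative, not the paper's algorithm):

```python
import numpy as np

def prox_l1(w, lam):
    """Soft-threshold step for a Laplace prior / L1 penalty:
    weights with |w| < lam are set exactly to zero, enabling pruning."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def shrink_l2(w, lam):
    """Comparable step for a Gaussian prior / L2 weight decay:
    weights shrink toward zero but never reach it exactly."""
    return w / (1.0 + lam)

w = np.array([1.5, 0.05, -0.3, -0.01])
print(prox_l1(w, 0.1))    # small weights pruned to exactly 0
print(shrink_l2(w, 0.1))  # all weights shrunk, none exactly 0
```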

## References

Showing 1-10 of 41 references

Consistent inference of probabilities in layered networks: predictions and generalizations

- Computer Science, International Joint Conference on Neural Networks
- 1989

The problem of learning a general input-output relation using a layered neural network is discussed in a statistical framework and the authors arrive at a Gibbs distribution on a canonical ensemble of networks with the same architecture.

The Evidence Framework Applied to Classification Networks

- Computer Science, Neural Computation
- 1992

It is demonstrated that the Bayesian framework for model comparison described for regression models in MacKay (1992a,b) can also be applied to classification problems and an information-based data selection criterion is derived and demonstrated within this framework.

Soft competitive adaptation: neural network learning algorithms based on fitting statistical mixtures

- Computer Science
- 1991

An unsupervised algorithm which is an alternative to the classical winner-take-all competitive algorithms and a supervised modular architecture in which a number of simple "expert" networks compete to solve distinct pieces of a large task are considered.

A statistical approach to learning and generalization in layered neural networks

- Computer Science, COLT '89
- 1989

The proposed formalism is applied to the problems of selecting an optimal architecture and the prediction of learning curves and the Gibbs distribution on the ensemble of networks with a fixed architecture is derived.

Note on generalization, regularization and architecture selection in nonlinear learning systems

- Computer Science, Neural Networks for Signal Processing: Proceedings of the 1991 IEEE Workshop
- 1991

The author proposes a new estimate of generalization performance for nonlinear learning systems called the generalized prediction error (GPE) which is based upon the notion of the effective number of…

Transforming Neural-Net Output Levels to Probability Distributions

- Computer Science, NIPS
- 1990

A method is presented for computing the first two moments of the probability distribution over outputs consistent with the input and the training data; the results shed new light on, and generalize, the well-known "softmax" scheme.

The Use of Bayesian and Entropic Methods in Neural Network Theory

- Computer Science
- 1989

This paper introduces a novel method of cluster decomposing a PDF by using topographic mappings and presents a technique for designing MRF potentials with low information redundancy for modelling image texture.

Exact Calculation of the Hessian Matrix for the Multilayer Perceptron

- Computer Science, Neural Computation
- 1992

This paper presents an extended backpropagation algorithm that allows all elements of the Hessian matrix to be evaluated exactly for a feedforward network of arbitrary topology.
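Exact Hessian code of this kind is conventionally validated against central finite differences. A hedged sketch of that baseline check for a tiny, hypothetical 1-2-1 tanh network (chosen for brevity; not the paper's extended backpropagation algorithm):

```python
import numpy as np

def mlp_loss(w, x, t):
    """Tiny 1-2-1 MLP with tanh hidden layer and squared-error loss.
    w packs [w1 (2,), b1 (2,), w2 (2,), b2 (scalar)] into length 7."""
    w1, b1, w2, b2 = w[0:2], w[2:4], w[4:6], w[6]
    h = np.tanh(w1 * x + b1)
    y = h @ w2 + b2
    return 0.5 * (y - t) ** 2

def numerical_hessian(f, w, eps=1e-5):
    """Central-difference Hessian: the standard reference against which
    exact (extended-backprop) Hessian implementations are checked."""
    n = w.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            wpp = w.copy(); wpp[i] += eps; wpp[j] += eps
            wpm = w.copy(); wpm[i] += eps; wpm[j] -= eps
            wmp = w.copy(); wmp[i] -= eps; wmp[j] += eps
            wmm = w.copy(); wmm[i] -= eps; wmm[j] -= eps
            H[i, j] = (f(wpp) - f(wpm) - f(wmp) + f(wmm)) / (4 * eps**2)
    return H

w = np.linspace(0.1, 0.7, 7)
H = numerical_hessian(lambda v: mlp_loss(v, x=0.5, t=1.0), w)
# The Hessian of a smooth loss is symmetric, and since dy/db2 = 1,
# the (b2, b2) entry of the squared-error Hessian should be ~1.
```

The exact algorithm replaces this O(n²) differencing with backpropagation-style recursions, at far lower cost and without truncation error.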

Developments in Maximum Entropy Data Analysis

- Mathematics
- 1989

The Bayesian derivation of “Classic” MaxEnt image processing (Skilling 1989a) shows that exp(αS(f,m)), where S(f,m) is the entropy of image f relative to model m, is the only consistent prior…