
- Michael E. Tipping, Christopher M. Bishop
- 1997

Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor… (More)

- Michael E. Tipping, Christopher M. Bishop
- Neural Computation
- 1999

Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However,… (More)

- Christopher M. Bishop, Markus Svensén, Christopher K. I. Williams
- Neural Computation
- 1998

Latent variable models represent the probability density of data in a space of several dimensions in terms of a smaller number of latent, or hidden, variables. A familiar example is factor analysis, which is based on a linear transformation between the latent space and the data space. In this article, we introduce a form of nonlinear latent variable model… (More)

One of the central issues in the use of principal component analysis (PCA) for data modelling is that of choosing the appropriate number of retained components. This problem was recently addressed through the formulation of a Bayesian treatment of PCA (Bishop, 1999a) in terms of a probabilistic latent variable model. A central feature of this approach is… (More)

- John M. Winn, Christopher M. Bishop
- Journal of Machine Learning Research
- 2005

Bayesian inference is now widely established as one of the principal foundations for machine learning. In practice, exact inference is rarely possible, and so a variety of approximation techniques have been developed, one of the most widely used being a deterministic framework called variational inference. In this paper we introduce Variational Message… (More)

The Support Vector Machine (SVM) of Vapnik [9] has become widely established as one of the leading approaches to pattern recognition and machine learning. It expresses predictions in terms of a linear combination of kernel functions centred on a subset of the training data, known as support vectors. Despite its widespread success, the SVM suffers from some… (More)

- Adrian Corduneanu, Christopher M. Bishop
- 2002

Mixture models, in which a probability distribution is represented as a linear superposition of component distributions, are widely used in statistical modelling and pattern recognition. One of the key tasks in the application of mixture models is the determination of a suitable number of components. Conventional approaches based on cross-validation are… (More)

- Christopher M. Bishop, Markus Svensén
- UAI
- 2003

The Hierarchical Mixture of Experts (HME) is a well-known tree-structured model for regression and classification, based on soft probabilistic splits of the input space. In its original formulation its parameters are determined by maximum likelihood, which is prone to severe over-fitting, including singularities in the likelihood function. Furthermore, the… (More)

It is well known that the addition of noise to the input data of a neural network during training can, in some circumstances, lead to significant improvements in generalization performance. Previous work has shown that such training with noise is equivalent to a form of regularization in which an extra term is added to the error function. However, the… (More)

- Julia A. Lasserre, Christopher M. Bishop, Tom Minka
- 2006 IEEE Computer Society Conference on Computer…
- 2006

When labelled training data is plentiful, discriminative techniques are widely used since they give excellent generalization performance. However, for large-scale applications such as object recognition, hand labelling of data is expensive, and there is much interest in semi-supervised techniques based on generative models in which the majority of the… (More)