- Geoffrey E. Hinton, Simon Osindero, Yee Whye Teh
- Neural Computation
- 2006

We show how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected…

We consider problems involving groups of data, where each observation within a group is a draw from a mixture model, and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet…

- Yee Whye Teh
- ACL
- 2006

We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model…
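
The power-law behaviour mentioned in the abstract above can be illustrated with the two-parameter Chinese restaurant process, a standard way to draw partitions from a Pitman-Yor process. The sketch below is only an illustration under stated assumptions: the function name `pitman_yor_crp` and the parameter values are hypothetical, and it draws a single flat partition rather than the hierarchical language model the paper describes.

```python
import random


def pitman_yor_crp(n_customers, discount=0.5, concentration=1.0, seed=0):
    """Seat customers one at a time and return the resulting table sizes.

    Assumes 0 <= discount < 1 and concentration > -discount.
    """
    rng = random.Random(seed)
    table_sizes = []
    for _ in range(n_customers):
        k = len(table_sizes)
        # An existing table is chosen with weight (size - discount);
        # a new table is opened with weight (concentration + discount * k).
        weights = [size - discount for size in table_sizes]
        weights.append(concentration + discount * k)
        choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            table_sizes.append(1)
        else:
            table_sizes[choice] += 1
    return table_sizes


if __name__ == "__main__":
    dirichlet_like = pitman_yor_crp(2000, discount=0.0)
    heavy_tailed = pitman_yor_crp(2000, discount=0.8)
    print("tables with discount 0.0:", len(dirichlet_like))
    print("tables with discount 0.8:", len(heavy_tailed))
```

With `discount=0.0` the process reduces to the Dirichlet-process case, where the number of tables grows roughly logarithmically in the number of customers. A positive discount makes the number of tables grow polynomially, giving many small tables, which is the heavy-tailed pattern associated with word frequencies.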

- Max Welling, Yee Whye Teh
- ICML
- 2011

In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize. This seamless…
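
The idea in the abstract above (a mini-batch gradient step plus injected Gaussian noise, with an annealed step size) can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the toy model (unit-variance Gaussian likelihood with a zero-mean Gaussian prior on the mean), the function name `sgld_gaussian_mean`, the polynomially decaying step-size schedule, and the hyperparameter values. The injected noise has variance equal to the step size, which is the usual Langevin scaling.

```python
import math
import random


def sgld_gaussian_mean(data, n_steps=5000, batch_size=10,
                       a=0.01, b=10.0, gamma=0.55,
                       prior_var=100.0, seed=0):
    """Return noisy-gradient iterates for the mean of a unit-variance Gaussian."""
    rng = random.Random(seed)
    n_total = len(data)
    theta = 0.0
    samples = []
    for t in range(n_steps):
        # Annealed step size (assumed schedule): eps_t = a * (b + t) ** (-gamma).
        eps = a * (b + t) ** (-gamma)
        batch = rng.sample(data, batch_size)
        # Gradient of the log prior N(0, prior_var) with respect to theta.
        grad_prior = -theta / prior_var
        # Mini-batch gradient of the log likelihood, rescaled to the full data set.
        grad_lik = (n_total / batch_size) * sum(x - theta for x in batch)
        # Half a gradient step plus Gaussian noise with variance eps.
        theta += 0.5 * eps * (grad_prior + grad_lik) + rng.gauss(0.0, math.sqrt(eps))
        samples.append(theta)
    return samples


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(2.0, 1.0) for _ in range(100)]
    samples = sgld_gaussian_mean(data)
    tail = samples[len(samples) // 2:]
    print("data mean:", sum(data) / len(data))
    print("mean of later iterates:", sum(tail) / len(tail))
```

In this toy setting the later iterates should scatter around the data mean, which is close to the posterior mean because the prior is broad.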

Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and this variety motivates the need for careful empirical…

- Yee Whye Teh, David Newman, Max Welling
- NIPS
- 2006

Latent Dirichlet allocation (LDA) is a Bayesian network that has recently gained much popularity in applications ranging from document modeling to computer vision. Due to the large scale nature of these applications, current inference procedures like variational Bayes and Gibbs sampling have been found lacking. In this paper we propose the collapsed…

- Andriy Mnih, Yee Whye Teh
- ICML
- 2012

In spite of their superior performance, neural probabilistic language models (NPLMs) remain far less widely used than n-gram models due to their notoriously long training times, which are measured in weeks even for moderately-sized datasets. Training NPLMs is computationally expensive because they are explicitly normalized, which leads to having to consider…

- Tamara L. Berg, Alexander C. Berg, +5 authors David A. Forsyth
- Proceedings of the 2004 IEEE Computer Society…
- 2004

We show quite good face clustering is possible for a dataset of inaccurately and ambiguously labelled face images. Our dataset is 44,773 face images, obtained by applying a face finder to approximately half a million captioned news images. This dataset is more realistic than usual face recognition datasets, because it contains faces captured "in the wild"…

- Yew Jin Lim, Yee Whye Teh
- 2007

Singular value decomposition (SVD) is a matrix decomposition algorithm that returns the optimal (in the sense of squared error) low-rank decomposition of a matrix. SVD has found widespread use across a variety of machine learning applications, where its output is interpreted as compact and informative representations of data. The Netflix Prize challenge,…

- Yee Whye Teh, Dilan Görür, Zoubin J. C. Ghahramani
- AISTATS
- 2007

The Indian buffet process (IBP) is a Bayesian nonparametric distribution whereby objects are modelled using an unbounded number of latent features. In this paper we derive a stick-breaking representation for the IBP. Based on this new representation, we develop slice samplers for the IBP that are efficient, easy to implement and are more generally…
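
To make the stick-breaking idea in the abstract above concrete, the following Python sketch draws feature probabilities as running products of Beta(alpha, 1) variables, which is the construction usually stated for the IBP, and then samples a binary feature matrix. It is a sketch under assumptions: the fixed truncation `n_sticks` is a simplification introduced here (the abstract's slice samplers are not implemented), and the function names `ibp_stick_breaking` and `sample_feature_matrix` are hypothetical.

```python
import random


def ibp_stick_breaking(alpha, n_sticks, seed=0):
    """Return decreasing feature probabilities mu_(1) >= mu_(2) >= ...

    Each probability is the running product of independent Beta(alpha, 1) draws.
    """
    rng = random.Random(seed)
    mus = []
    mu = 1.0
    for _ in range(n_sticks):
        mu *= rng.betavariate(alpha, 1.0)
        mus.append(mu)
    return mus


def sample_feature_matrix(mus, n_objects, seed=0):
    """Give each object feature k independently with probability mus[k]."""
    rng = random.Random(seed)
    return [[1 if rng.random() < mu else 0 for mu in mus]
            for _ in range(n_objects)]


if __name__ == "__main__":
    mus = ibp_stick_breaking(alpha=2.0, n_sticks=15)
    matrix = sample_feature_matrix(mus, n_objects=5)
    for row in matrix:
        print("".join(str(bit) for bit in row))
```

Because the probabilities shrink multiplicatively, later features are used by fewer and fewer objects, so a finite truncation captures most of the active features for moderate `alpha`.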