Corpus ID: 6153756

A simple example of Dirichlet process mixture inconsistency for the number of components

@inproceedings{Miller2013ASE,
  title={A simple example of Dirichlet process mixture inconsistency for the number of components},
  author={Jeffrey W. Miller and Matthew T. Harrison},
  booktitle={NIPS},
  year={2013}
}
For data assumed to come from a finite mixture with an unknown number of components, it has become common to use Dirichlet process mixtures (DPMs) not only for density estimation, but also for inferences about the number of components. The typical approach is to use the posterior distribution on the number of clusters — that is, the posterior on the number of components represented in the observed data. However, it turns out that this posterior is not consistent — it does not concentrate at the true number of components.
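To make the abstract's claim concrete, here is a minimal simulation sketch (ours, not the authors' code) of the setting it describes: a conjugate normal-normal DPM with component means drawn from a N(0, 1) base measure, unit-variance components, and concentration α = 1 (all assumed here), fit by a collapsed Gibbs sampler in the style of Neal (2000, Algorithm 3) to data that truly come from a single standard normal component. All function names are ours.

```python
# A minimal illustrative sketch (ours, not the paper's code), assuming a
# conjugate normal-normal DPM: component means theta ~ N(0, 1), data
# x | theta ~ N(theta, 1), concentration alpha = 1. We fit it to data from a
# single N(0, 1) component and track the posterior number of occupied
# clusters with a collapsed Gibbs sampler (in the style of Neal 2000, Alg. 3).
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def dpm_gibbs(x, alpha=1.0, sweeps=400):
    """Return the number of occupied clusters after each Gibbs sweep."""
    n = len(x)
    z = np.zeros(n, dtype=int)     # start with everyone in one cluster
    counts = {0: n}                # cluster label -> size
    sums = {0: float(x.sum())}     # cluster label -> sum of its points
    k_trace = []
    for _ in range(sweeps):
        for i in range(n):
            c = z[i]
            counts[c] -= 1
            sums[c] -= x[i]
            if counts[c] == 0:     # drop the now-empty cluster
                del counts[c], sums[c]
            labels = list(counts)
            # Predictive of x[i] under an existing cluster with m points
            # summing to s: theta | cluster ~ N(s/(1+m), 1/(1+m)), hence
            # x_new | cluster ~ N(s/(1+m), 1 + 1/(1+m)).
            probs = [counts[c2] * normal_pdf(x[i], sums[c2] / (1 + counts[c2]),
                                             1.0 + 1.0 / (1 + counts[c2]))
                     for c2 in labels]
            # New cluster: x_new ~ N(0, 2) under the N(0, 1) base measure.
            probs.append(alpha * normal_pdf(x[i], 0.0, 2.0))
            probs = np.asarray(probs)
            j = rng.choice(len(probs), p=probs / probs.sum())
            if j == len(labels):   # open a fresh cluster
                c_new = max(counts, default=-1) + 1
                counts[c_new], sums[c_new] = 0, 0.0
            else:
                c_new = labels[j]
            z[i] = c_new
            counts[c_new] += 1
            sums[c_new] += x[i]
        k_trace.append(len(counts))
    return np.array(k_trace)

x = rng.standard_normal(500)       # true number of components: 1
k = dpm_gibbs(x)[100:]             # discard burn-in
print("posterior P(exactly one cluster) ≈", (k == 1).mean())
print("posterior mode of #clusters:", np.bincount(k).argmax())
```

With settings like these, the estimated probability of exactly one cluster typically stays well below 1, in line with the paper's claim that the posterior does not concentrate at the true number of components.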

Citations

Inconsistency of Pitman-Yor process mixtures for the number of components
TLDR
It is shown that the posterior on data from a finite mixture does not concentrate at the true number of components, and this result applies to a large class of nonparametric mixtures, including DPMs and PYMs, over a wide variety of families of component distributions.
Finite mixture models are typically inconsistent for the number of components
Scientists and engineers are often interested in learning the number of subpopulations (or components) present in a data set. Practitioners commonly use a Dirichlet process mixture model (DPMM) for…
Mixture Models With a Prior on the Number of Components
TLDR
It turns out that many of the essential properties of DPMs are also exhibited by MFMs, and the MFM analogues are simple enough that they can be used much like the corresponding DPM properties; this simplifies the implementation of MFMs and can substantially improve mixing.
Posterior Distribution for the Number of Clusters in Dirichlet Process Mixture Models
TLDR
This work provides a rigorous study for the posterior distribution of the number of clusters in DPMM under different prior distributions on the parameters and constraints on the distributions of the data.
Inference for the Number of Topics in the Latent Dirichlet Allocation Model via Bayesian Mixture Modeling
TLDR
A variant of the Metropolis–Hastings algorithm is presented that can be used to estimate the posterior distribution of the number of topics and it is evaluated on synthetic data and with procedures that are currently used in the machine learning literature.
Infinite Gaussian Mixture Modeling with an Improved Estimation of the Number of Clusters
TLDR
This paper shows that IGMM provides an inconsistent estimation of the number of clusters and proposes a modified training procedure that uses the inverse χ² distribution for this purpose and demonstrates good results when compared to other methods used to evaluate model order, using real-world databases.
From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering
TLDR
It is illustrated that the concept of sparse finite mixtures is very generic and easily extended to cluster various types of non-Gaussian data, in particular discrete data and continuous multivariate data arising from non-Gaussian clusters.
Infinite mixtures of multivariate normal-inverse Gaussian distributions for clustering of skewed data
Mixtures of multivariate normal inverse Gaussian (MNIG) distributions can be used to cluster data that exhibit features such as skewness and heavy tails. However, for cluster analysis, using a…
Dirichlet process mixtures under affine transformations of the data
TLDR
This work devises a coherent prior specification of the model which makes posterior inference invariant with respect to affine transformations of the data, and shows that mild assumptions on the true data-generating process are sufficient to ensure that DPM-G models feature such a property.
Dynamic mixtures of finite mixtures and telescoping sampling
TLDR
A novel sampling scheme for MFMs, called the telescoping sampler, is proposed; it allows Bayesian inference for mixtures with arbitrary component distributions, and the ease of its application using different component distributions is demonstrated on real data sets.

References

SHOWING 1-10 OF 32 REFERENCES
Inconsistency of Pitman-Yor process mixtures for the number of components
TLDR
It is shown that the posterior on data from a finite mixture does not concentrate at the true number of components, and this result applies to a large class of nonparametric mixtures, including DPMs and PYMs, over a wide variety of families of component distributions.
The L1-consistency of Dirichlet mixtures in multivariate Bayesian density estimation
TLDR
The L"1-consistency of Dirichlet mixutures in the multivariate density estimation setting is extended and the Kullback-Leibler property of the prior holds and the size of the sieve in the parameter space in terms of L" 1-metric entropy is not larger than the order of n. Expand
Bayesian finite mixtures with an unknown number of components: The allocation sampler
TLDR
A new Markov chain Monte Carlo method for the Bayesian analysis of finite mixture distributions with an unknown number of components is presented and can be used for mixtures of components from any parametric family, under the assumption that the component parameters can be integrated out of the model analytically.
A NONPARAMETRIC BAYESIAN APPROACH TO DETECT THE NUMBER OF REGIMES IN MARKOV SWITCHING MODELS
The literature on Markov switching models is increasing and producing interesting results at both theoretical and applied levels. Most often the number of regimes, i.e., of data generating processes, …
Bayesian analysis of finite mixture distributions using the allocation sampler
Finite mixture distributions are receiving more and more attention from statisticians in many different fields of research because they are a very flexible class of models. They are typically used…
Particle filters for mixture models with an unknown number of components
TLDR
The performance of this particle filter, when analyzing both simulated and real data from a Gaussian mixture model, is uniformly better than the particle filter algorithm of Chen and Liu, and in many situations it outperforms a Gibbs Sampler.
How many clusters
The title poses a deceptively simple question that must be addressed by any statistical model or computational algorithm for the clustering of points. Two distinct interpretations are possible, one…
Computing Nonparametric Hierarchical Models
TLDR
The ease with which the strict parametric assumptions common to most standard Bayesian hierarchical models can be relaxed to incorporate uncertainties about functional forms using Dirichlet process components is illustrated, partly enabled by the approach to computation using MCMC methods.
Modelling Heterogeneity With and Without the Dirichlet Process
We investigate the relationships between Dirichlet process (DP) based models and allocation models for a variable number of components, based on exchangeable distributions. It is shown that the DP…
Dirichlet Process
  • Y. Teh
  • Computer Science
  • Encyclopedia of Machine Learning
  • 2010
The Dirichlet process is a stochastic process used in Bayesian nonparametric models of data, particularly in Dirichlet process mixture models (also known as infinite mixture models). It is a…
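Since this is the reference that defines the Dirichlet process itself, a short sketch of the partition distribution it induces may help (our illustration, not from the encyclopedia entry; function names are hypothetical): sequential seating under the Chinese restaurant process, whose expected number of occupied tables grows roughly like α log n. This unbounded growth is what lets a DPM introduce new clusters indefinitely.

```python
# Illustrative sketch (ours; not from the encyclopedia entry): the Chinese
# restaurant process is the partition distribution that a Dirichlet process
# with concentration alpha induces on observations. Its expected number of
# occupied tables (clusters) grows roughly like alpha * log(n), which is why
# a DPM can keep introducing new clusters as the sample size grows.
import numpy as np

def crp_table_sizes(n, alpha, rng):
    """Seat n customers by the CRP; return the occupied table sizes."""
    tables = []                                   # sizes of occupied tables
    for i in range(n):
        # Customer i joins table t with probability size_t / (i + alpha),
        # or opens a new table with probability alpha / (i + alpha).
        probs = np.array(tables + [alpha], dtype=float) / (i + alpha)
        t = rng.choice(len(probs), p=probs)
        if t == len(tables):
            tables.append(1)
        else:
            tables[t] += 1
    return tables

rng = np.random.default_rng(1)
for n in (100, 1000, 10000):
    ks = [len(crp_table_sizes(n, alpha=1.0, rng=rng)) for _ in range(20)]
    print(f"n={n}: mean #tables {np.mean(ks):.1f} (log n = {np.log(n):.1f})")
```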