Corpus ID: 126191494

Statistical Convergence Analysis of Gradient EM on General Gaussian Mixture Models

@article{Yan2017StatisticalCA,
  title={Statistical Convergence Analysis of Gradient EM on General Gaussian Mixture Models},
  author={Bowei Yan and Mingzhang Yin and Purnamrita Sarkar},
  journal={ArXiv},
  year={2017},
  volume={abs/1705.08530}
}
In this paper, we study convergence properties of the gradient Expectation-Maximization algorithm \cite{lange1995gradient} for Gaussian Mixture Models with a general number of clusters and general mixing coefficients. We derive a convergence rate that depends on the mixing coefficients, the minimum and maximum pairwise distances between the true centers, the dimensionality, and the number of components, and we obtain a near-optimal local contraction radius. While there have been some recent notable works that derive…
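As a concrete illustration of the update analyzed here, the following is a minimal sketch of one gradient EM iteration for a K-component mixture with identity covariances and known mixing weights; the function name, the step-size argument, and the plain-numpy implementation are illustrative choices, not taken from the paper.

```python
import numpy as np

def gradient_em_step(X, mus, pis, step=1.0):
    """One gradient EM update for a GMM with identity covariances (sketch).

    E-step: posterior responsibilities w_ik. Instead of the closed-form
    M-step, each center takes one gradient ascent step on the expected
    complete-data log-likelihood Q.

    X: (n, d) data, mus: (K, d) current centers, pis: (K,) mixing weights.
    """
    n, d = X.shape
    K = len(pis)
    # E-step: w_ik proportional to pi_k * N(x_i; mu_k, I), computed stably in log space
    log_dens = np.stack([
        -0.5 * np.sum((X - mus[k]) ** 2, axis=1) + np.log(pis[k])
        for k in range(K)
    ], axis=1)                                   # shape (n, K)
    log_dens -= log_dens.max(axis=1, keepdims=True)
    w = np.exp(log_dens)
    w /= w.sum(axis=1, keepdims=True)
    # Gradient step on Q: grad_k = (1/n) * sum_i w_ik (x_i - mu_k)
    new_mus = mus.copy()
    for k in range(K):
        grad_k = (w[:, k, None] * (X - mus[k])).sum(axis=0) / n
        new_mus[k] = mus[k] + step * grad_k
    return new_mus
```

The step size is left as a free parameter here; in the paper's analysis it is a constant tied to the problem parameters.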

Sınıflandırıcı Performanslarının Gauss Karışım Modeline Uygulanan Beklenti-Maksimizasyon Algoritmasına Göre Analiz Edilmesi (Analysis of Classifier Performance Using the Expectation-Maximization Algorithm Applied to the Gaussian Mixture Model)

Parametric density estimation methods such as maximum likelihood, mixture models, Bayesian inference, and maximum entropy are frequently used when the type of the distribution is known or can be estimated.

References

Showing 1-10 of 26 references

Convergence of the EM Algorithm for Gaussian Mixtures with Unbalanced Mixing Coefficients

TLDR
A deterministic anti-annealing algorithm is proposed that significantly improves the speed of convergence of EM for mixtures with unbalanced mixing coefficients, and it is compared against other standard optimization techniques.

Local Maxima in the Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences

TLDR
It is established that a first-order variant of EM will not converge to strict saddle points almost surely, indicating that the poor performance of the first-order method can be attributed to the existence of bad local maxima rather than bad saddle points.

Statistical guarantees for the EM algorithm: From population to sample-based analysis

TLDR
A general framework is developed for proving rigorous guarantees on the performance of the EM algorithm and a variant known as gradient EM, and consequences of the general theory are derived for three canonical examples of incomplete-data problems.

ON THE CONVERGENCE PROPERTIES OF THE EM ALGORITHM

Two convergence aspects of the EM algorithm are studied: (i) does the EM algorithm find a local maximum or a stationary value of the (incomplete-data) likelihood function? (ii) does the sequence of…

Mixture densities, maximum likelihood, and the EM algorithm

TLDR
This work discusses the formulation and theoretical and practical properties of the EM algorithm, a specialization to the mixture density context of a general algorithm used to approximate maximum-likelihood estimates for incomplete data problems.
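For contrast with the gradient update sketched above, here is a minimal sketch of the corresponding full EM iteration with the closed-form M-step, under the same identity-covariance assumption; the function name and interface are illustrative.

```python
import numpy as np

def em_step(X, mus, pis):
    """One full EM iteration for an identity-covariance GMM (sketch).

    E-step: posterior responsibilities w_ik. M-step: closed-form
    maximizers (responsibility-weighted means and average weights).
    """
    K = len(pis)
    log_dens = np.stack([
        -0.5 * np.sum((X - mus[k]) ** 2, axis=1) + np.log(pis[k])
        for k in range(K)
    ], axis=1)
    log_dens -= log_dens.max(axis=1, keepdims=True)
    w = np.exp(log_dens)
    w /= w.sum(axis=1, keepdims=True)   # responsibilities, shape (n, K)
    Nk = w.sum(axis=0)                  # effective cluster sizes
    new_mus = (w.T @ X) / Nk[:, None]   # M-step: weighted means
    new_pis = Nk / X.shape[0]           # M-step: mixing weights
    return new_mus, new_pis
```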

Clustering subgaussian mixtures by semidefinite programming

TLDR
A model-free relax-and-round algorithm for k-means clustering, based on a semidefinite relaxation due to Peng and Wei, is introduced, and a generic method for proving performance guarantees for this algorithm is provided.
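A minimal sketch of the relax-and-round idea described here, using the Peng-Wei SDP relaxation of k-means; the cvxpy/scikit-learn implementation and the rounding-by-k-means step are illustrative choices rather than the reference's exact procedure.

```python
import numpy as np
import cvxpy as cp
from sklearn.cluster import KMeans

def sdp_kmeans(X, k):
    """Relax-and-round k-means via the Peng-Wei SDP (sketch).

    Minimize <D, Z> over the SDP relaxation of the k-means partition
    polytope, then round the relaxed solution back to a hard clustering.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances D_ij = ||x_i - x_j||^2
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2 * X @ X.T
    Z = cp.Variable((n, n), PSD=True)
    constraints = [Z >= 0,                  # entrywise nonnegative
                   cp.sum(Z, axis=1) == 1,  # rows sum to one
                   cp.trace(Z) == k]        # trace equals number of clusters
    cp.Problem(cp.Minimize(cp.trace(D @ Z)), constraints).solve()
    # Round: cluster the rows of the (approximately low-rank) solution.
    return KMeans(n_clusters=k, n_init=10).fit_predict(Z.value)
```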

A Two-Round Variant of EM for Gaussian Mixtures

We show that, given data from a mixture of k well-separated spherical Gaussians in R^n, a simple two-round variant of EM will, with high probability, learn the centers of the Gaussians to near-optimal…

Learning mixtures of Gaussians

  • S. Dasgupta
  • Computer Science
    40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039)
  • 1999
TLDR
This work presents the first provably correct algorithm for learning a mixture of Gaussians, which returns the true centers of the Gaussians to within the precision specified by the user with high probability.

A gradient algorithm locally equivalent to the EM algorithm

TLDR
This EM gradient algorithm approximately solves the M-step of the EM algorithm with one iteration of Newton's method; the proof of global convergence applies and improves existing theory for the EM algorithm.
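A sketch of what that single Newton iteration looks like for the unit-covariance mixture model considered above (the responsibilities $w_{ik}$ come from the E-step; this instantiation is an assumption, not taken from the reference): since the M-step objective $Q$ is quadratic in each center $\mu_k$, with $\nabla_{\mu_k} Q = \sum_i w_{ik}(x_i - \mu_k)$ and $\nabla^2_{\mu_k} Q = -\big(\sum_i w_{ik}\big) I$, one Newton step gives

$$\mu_k^{+} \;=\; \mu_k + \frac{\sum_{i=1}^{n} w_{ik}\,(x_i - \mu_k)}{\sum_{i=1}^{n} w_{ik}} \;=\; \frac{\sum_{i=1}^{n} w_{ik}\, x_i}{\sum_{i=1}^{n} w_{ik}},$$

which coincides with the exact M-step in this quadratic case; the Newton approximation differs from exact maximization only when $Q$ is not quadratic in the parameters.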

On Robustness of Kernel Clustering

TLDR
A semidefinite programming relaxation is introduced for the kernel clustering problem, and it is proved that, under a suitable model specification, both the K-SVD and SDP approaches are consistent in the limit, with SDP strongly consistent and K-SVD only weakly consistent.