Likelihood Landscape and Local Minima Structures of Gaussian Mixture Models

@article{Chen2020LikelihoodLA,
  title={Likelihood Landscape and Local Minima Structures of Gaussian Mixture Models},
  author={Yudong Chen and Xumei Xi},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.13040}
}
In this paper, we study the landscape of the population negative log-likelihood function of Gaussian Mixture Models with a general number of components. Due to nonconvexity, there exist multiple local minima that are not globally optimal, even when the mixture is well-separated. We show that all local minima share the same form of structure that partially identifies the component centers of the true mixture, in the sense that each local minimum involves a non-overlapping combination of fitting… 
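For reference, the population objective in question can be written explicitly in the common setting of uniform mixing weights and unit-variance spherical components (an assumption of this sketch, not necessarily the paper's exact model):

$$ L(\mu_1,\dots,\mu_k) = -\,\mathbb{E}_{X \sim p^*}\left[\log\left(\frac{1}{k}\sum_{j=1}^{k}(2\pi)^{-d/2}\exp\left(-\tfrac{1}{2}\lVert X-\mu_j\rVert^2\right)\right)\right], $$

where $p^*$ denotes the true mixture with centers $\mu_1^*,\dots,\mu_k^*$, and a local minimum corresponds to a stationary configuration of the fitted centers $\mu_1,\dots,\mu_k$.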

Estimating Gaussian mixtures using sparse polynomial moment systems

TLDR
This work presents an algorithm, scaling linearly in the dimension, that performs parameter recovery, and therefore density estimation, for high-dimensional Gaussian mixture models.

A Geometric Approach to $k$-means

TLDR
This work proposes a general algorithmic framework for escaping undesirable local solutions and recovering the global solution (or the ground truth) of k-means clustering by iteratively alternating between two steps.
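The two alternating steps are, at bottom, the standard assignment and center-update steps of Lloyd's algorithm. The sketch below shows only that baseline alternation, not the paper's escaping mechanism; the function name kmeans_lloyd and all variable names are illustrative.

import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    # Baseline Lloyd alternation for k-means (illustrative sketch):
    # step 1 assigns each point to its nearest center,
    # step 2 recomputes each center as the mean of its assigned points.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 1: assignment to the nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: center update (keep the old center if a cluster is empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

Like EM for Gaussian mixtures, this alternation is only guaranteed to reach a local solution, which is precisely the failure mode the proposed framework is designed to escape.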

References

Showing 1-10 of 34 references

Local Maxima in the Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences

TLDR
It is established that a first-order variant of EM will not converge to strict saddle points almost surely, indicating that the poor performance of the first-order method can be attributed to the existence of bad local maxima rather than bad saddle points.

Are There Local Maxima in the Infinite-Sample Likelihood of Gaussian Mixture Estimation?

Consider the problem of estimating the centers of a uniform mixture of unit-variance spherical Gaussians in $\mathbb{R}^d$.

Convergence of Gradient EM on Multi-component Mixture of Gaussians

TLDR
The convergence properties of the gradient variant of the Expectation-Maximization algorithm are studied for Gaussian Mixture Models with an arbitrary number of clusters and arbitrary mixing coefficients, and a near-optimal local contraction radius is obtained.
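As a rough illustration, one gradient-EM iteration for a spherical Gaussian mixture with uniform weights and unit variance (assumptions of this sketch) replaces the exact M-step with a single gradient step on the average log-likelihood; the step size eta below is illustrative, not a quantity from the paper.

import numpy as np

def gradient_em_step(X, centers, eta=0.1):
    # E-step: responsibilities for a uniform-weight, unit-variance spherical GMM.
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logits = -0.5 * sq_dists
    logits -= logits.max(axis=1, keepdims=True)
    resp = np.exp(logits)
    resp /= resp.sum(axis=1, keepdims=True)            # shape (n, k)
    # Gradient step: d/d(mu_j) of the average log-likelihood is
    # (1/n) * sum_i resp[i, j] * (x_i - mu_j).
    n = X.shape[0]
    grad = (resp[:, :, None] * (X[:, None, :] - centers[None, :, :])).sum(axis=0) / n
    return centers + eta * grad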

Strong identifiability and optimal minimax rates for finite mixture estimation

We study the rates of estimation of finite mixing distributions, that is, the parameters of the mixture. We prove that under some regularity and strong identifiability conditions, around a given mixing distribution…

Challenges with EM in application to weakly identifiable mixture models

TLDR
This work demonstrates, via simulation studies, a broad range of over-specified mixture models for which the EM algorithm converges very slowly, both in one dimension and in higher dimensions, and reveals distinct regimes in the convergence behavior of EM as a function of the dimension $d$.

Ten Steps of EM Suffice for Mixtures of Two Gaussians

TLDR
This work shows that the population version of EM, where the algorithm is given access to infinitely many samples from the mixture, converges geometrically to the correct mean vectors, and provides simple, closed-form expressions for the convergence rate.
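In the balanced symmetric setting 0.5 N(theta, I) + 0.5 N(-theta, I), which is a common model in this line of work (an assumption of this sketch, not necessarily the paper's exact setting), the finite-sample EM update collapses to a single closed-form expression:

import numpy as np

def em_step_symmetric_2gmm(X, theta):
    # One EM step for the balanced mixture 0.5*N(theta, I) + 0.5*N(-theta, I).
    # The posterior weight of the +theta component is sigmoid(2*<x, theta>),
    # so the weighted mean update reduces to a tanh expression.
    w = np.tanh(X @ theta)               # signed responsibility in (-1, 1)
    return (w[:, None] * X).mean(axis=0)

Iterating this map from a reasonable initialization mirrors the population-EM recursion whose geometric convergence the paper analyzes.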

Benefits of over-parameterization with EM

TLDR
It is proved that introducing the (statistically redundant) weight parameters enables EM to find the global maximizer of the log-likelihood starting from almost any initial mean parameters, whereas EM without this over-parameterization may very often fail.
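A minimal sketch of what such an over-parameterized update can look like, assuming a two-component unit-variance spherical mixture in which the mixing weight is fitted along with the means; the model and the names mu1, mu2, pi1 are illustrative assumptions rather than details taken from the paper.

import numpy as np

def em_step_2gmm_with_weights(X, mu1, mu2, pi1):
    # One EM step that treats the mixing weight pi1 as a free parameter
    # (the "over-parameterized" variant) and updates it alongside the means.
    d1 = ((X - mu1) ** 2).sum(axis=1)
    d2 = ((X - mu2) ** 2).sum(axis=1)
    # E-step: responsibility of component 1 for each point.
    log_r1 = np.log(pi1) - 0.5 * d1
    log_r2 = np.log(1.0 - pi1) - 0.5 * d2
    r1 = 1.0 / (1.0 + np.exp(log_r2 - log_r1))
    # M-step: update the weight and both means.
    pi1_new = r1.mean()
    mu1_new = (r1[:, None] * X).sum(axis=0) / r1.sum()
    mu2_new = ((1.0 - r1)[:, None] * X).sum(axis=0) / (1.0 - r1).sum()
    return mu1_new, mu2_new, pi1_new

Fixing pi1 = 0.5 instead recovers the standard equally-weighted EM iteration that the paper contrasts against.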

Singularity, misspecification and the convergence rate of EM

TLDR
This work makes use of a careful form of localization in the associated empirical process, and develops a recursive argument to progressively sharpen the statistical rate of the EM algorithm in over-specified settings.

Global Convergence of EM Algorithm for Mixtures of Two Component Linear Regression

TLDR
It is shown here that EM converges for mixed linear regression with two components (it is known that it may fail to converge for three or more), and moreover that this convergence holds for random initialization.
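For intuition, a minimal sketch of one EM step in the symmetric two-component model y = s * <x, beta> + noise, with s uniform on {+1, -1} and unit noise variance; this particular parameterization is an assumption of the illustration, not necessarily the paper's exact setting.

import numpy as np

def em_step_mixed_linear_regression(X, y, beta):
    # E-step: signed responsibility of the +1 component, tanh(y * <x, beta>).
    w = np.tanh(y * (X @ beta))
    # M-step: weighted least squares, equivalent to regressing w*y on X.
    return np.linalg.solve(X.T @ X, X.T @ (w * y))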

Global Convergence of Least Squares EM for Demixing Two Log-Concave Densities

TLDR
It is demonstrated that Least Squares EM, a variant of the EM algorithm, converges to the true location parameter from a randomly initialized point, and this global convergence property is robust under model mis-specification.