Corpus ID: 7150984

A Method of Moments for Mixture Models and Hidden Markov Models

@inproceedings{Anandkumar2012AMO,
  title={A Method of Moments for Mixture Models and Hidden Markov Models},
  author={Anima Anandkumar and Daniel J. Hsu and Sham M. Kakade},
  booktitle={COLT},
  year={2012}
}
Mixture models are a fundamental tool in applied statistics and machine learning for treating data taken from multiple subpopulations. The current practice for estimating the parameters of such models relies on local search heuristics (e.g., the EM algorithm), which are prone to failure, and existing consistent methods are unfavorable due to their high computational and sample complexities, which typically scale exponentially with the number of mixture components. This work develops an efficient…
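
To make the contrast concrete, here is a minimal, self-contained sketch of the method-of-moments idea on a toy model: an equal-weight mixture of two unit-variance Gaussians in one dimension, whose means can be recovered in closed form from the first two empirical moments. This is only an illustration of the principle, not the paper's estimator, which works with low-order multivariate moments and spectral decompositions.

```python
# Toy method-of-moments sketch: X ~ 0.5*N(mu1, 1) + 0.5*N(mu2, 1).
# The model's moment equations are
#   E[X]   = (mu1 + mu2) / 2
#   E[X^2] = 1 + (mu1^2 + mu2^2) / 2
# so the means are recoverable in closed form from two empirical moments.
import numpy as np

rng = np.random.default_rng(0)
mu1, mu2, n = -2.0, 3.0, 100_000

z = rng.integers(0, 2, size=n)                   # latent component labels
x = rng.normal(np.where(z == 0, mu1, mu2), 1.0)  # unit-variance components

m1, m2 = x.mean(), (x ** 2).mean()
s = 2.0 * m1            # estimate of mu1 + mu2
q = 2.0 * (m2 - 1.0)    # estimate of mu1^2 + mu2^2
p = (s ** 2 - q) / 2.0  # estimate of mu1 * mu2, since s^2 = q + 2*p

# The means are the two roots of t^2 - s*t + p = 0.
print(np.sort(np.roots([1.0, -s, p])))  # approximately [-2.  3.]
```

Unlike EM, there is no iteration and nothing to initialize; the estimate is consistent because the empirical moments converge to the population moments.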

Citations

Estimating Mixture Models via Mixtures of Polynomials
TLDR
This work presents Polymom, a unifying framework based on the method of moments in which estimation procedures are easily derivable, just as in EM, and which casts estimation as a Generalized Moment Problem.
Learning High-Dimensional Mixtures of Graphical Models
TLDR
This work proposes a novel approach for estimating the mixture components; its output is a tree-mixture model that serves as a good approximation to the underlying graphical model mixture.
Learning Mixtures of Tree Graphical Models
TLDR
A novel method with provable guarantees is proposed for estimating the mixture components of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables.
Efficient Learning for Time Series Models by Non-Negative Moment Matrix Factorization
  • Computer Science
  • 2014
TLDR
This paper develops a MoM-based approach using Non-negative Matrix Factorization (NMF) for learning several time series models, including the Mixture of HMMs (MHMM), the Switching HMM (SHMM), and the Factorial HMM.
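
As a rough illustration of the NMF-on-moments idea (a sketch of the general technique, not the cited paper's exact estimator), one can count consecutive symbol pairs from a discrete observation sequence into an empirical co-occurrence matrix and factorize it into nonnegative factors whose inner dimension plays the role of the hidden state:

```python
# Sketch: factorize an empirical second-order moment matrix with NMF.
# Illustrative only; the sequence here is a stand-in, not HMM-generated data.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_symbols, n_states = 6, 2
seq = rng.integers(0, n_symbols, size=50_000)  # stand-in observation sequence

# Empirical pair probabilities P[x_t = i, x_{t+1} = j].
pairs = np.zeros((n_symbols, n_symbols))
np.add.at(pairs, (seq[:-1], seq[1:]), 1.0)
pairs /= pairs.sum()

# Nonnegative low-rank factorization: pairs ≈ W @ H.
model = NMF(n_components=n_states, init="nndsvda", max_iter=500)
W = model.fit_transform(pairs)  # shape (n_symbols, n_states)
H = model.components_           # shape (n_states, n_symbols)
print(np.linalg.norm(pairs - W @ H))  # reconstruction error
```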
On the method of moments for estimation in latent linear models
TLDR
This thesis introduces several semiparametric models in the topic modeling context and for multi-view models, and develops moment-matching methods for estimation in these models, which come with improved sample complexity results compared to previously proposed methods.
Online and Distributed learning of Gaussian mixture models by Bayesian Moment Matching
TLDR
This work proposes a Bayesian learning technique that lends itself naturally to online and distributed computation and compares favorably to online EM in terms of time and accuracy on a set of data modeling benchmarks.
Learning latent variable models: efficient algorithms and applications
TLDR
This thesis extends the existing theory of methods of moments to models traditionally used for topic modeling, such as the single-topic model and Latent Dirichlet Allocation, providing improved learning techniques that outperform existing methods in speed and learning accuracy.
Fast and Consistent Learning of Hidden Markov Models by Incorporating Non-Consecutive Correlations
TLDR
This paper proposes extending method-of-moments estimators for HMMs to also include non-consecutive correlations, in a way that does not significantly increase the computational cost (which scales linearly with the number of additional lags included).
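
The statistics behind such estimators are simple: empirical co-occurrence matrices at several lags, each costing one extra pass over the sequence. A minimal sketch (illustrative only; the helper name is hypothetical):

```python
# Hypothetical helper: empirical lagged co-occurrence matrices for a discrete
# observation sequence. One extra lag costs one extra pass, so the cost grows
# linearly in the number of lags, as the summary above notes.
import numpy as np

def lagged_cooccurrence(seq, n_symbols, lags):
    """Return {k: empirical matrix of P[x_t = i, x_{t+k} = j]} for each lag k."""
    moments = {}
    for k in lags:
        m = np.zeros((n_symbols, n_symbols))
        np.add.at(m, (seq[:-k], seq[k:]), 1.0)
        moments[k] = m / m.sum()
    return moments

rng = np.random.default_rng(0)
seq = rng.integers(0, 4, size=10_000)  # stand-in observation sequence
stats = lagged_cooccurrence(seq, n_symbols=4, lags=(1, 2, 3))
print({k: v.shape for k, v in stats.items()})
```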
...

References

Showing 1-10 of 45 references
A Spectral Algorithm for Learning Hidden Markov Models
PAC Learning Mixtures of Axis-Aligned Gaussians with No Separation Assumption
We propose and analyze a new vantage point for the learning of mixtures of Gaussians: namely, the PAC-style model of learning probability distributions introduced by Kearns et al. [13]. Here the task…
The Spectral Method for General Mixture Models
TLDR
An algorithm for learning a mixture of distributions based on spectral projection is presented, and it is shown that the resulting algorithm is efficient when the components of the mixture are log-concave distributions in $\Re^{n}$ whose means are separated.
Multivariate Normal Mixtures: A Fast Consistent Method of Moments
A longstanding difficulty in multivariate statistics is identifying and evaluating nonnormal data structures in high dimensions with high statistical efficiency and low search effort. Here…
Efficiently learning mixtures of two Gaussians
TLDR
This work provides a polynomial-time algorithm for this problem in the case of two Gaussians in $n$ dimensions (even if they overlap), with provably minimal assumptions on the Gaussians and polynomial data requirements, and it efficiently performs near-optimal clustering.
On Spectral Learning of Mixtures of Distributions
TLDR
It is proved that a very simple algorithm, namely spectral projection followed by single-linkage clustering, properly classifies every point in the sample; it is also shown, however, that there are many Gaussian mixtures in which each pair of means is separated, yet upon spectral projection the mixture collapses completely.
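
The positive half of that result is easy to sketch: project the sample onto its top singular directions, then run single-linkage clustering. The snippet below is a rough illustration under the assumption of well-separated unit-variance components, not the paper's analysis:

```python
# Sketch of spectral projection + single-linkage clustering (illustrative).
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
k, d = 2, 50
means = rng.normal(0, 8, size=(k, d))  # well-separated component means
X = np.concatenate([rng.normal(m, 1.0, size=(500, d)) for m in means])

# Spectral projection: keep the top-k singular directions of the centered data.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ Vt[:k].T

# Single-linkage clustering on the projected points.
labels = fcluster(linkage(proj, method="single"), t=k, criterion="maxclust")
print(np.bincount(labels)[1:])  # cluster sizes; ideally [500 500]
```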
Settling the Polynomial Learnability of Mixtures of Gaussians
  • Ankur Moitra, G. Valiant
  • Computer Science
  • 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
  • 2010
TLDR
This paper gives the first polynomial-time algorithm for proper density estimation for mixtures of k Gaussians that needs no assumptions on the mixture, and it proves that the running time's exponential dependence on k is necessary.
Learning mixtures of arbitrary gaussians
TLDR
This paper presents the first algorithm that provably learns the component Gaussians in time that is polynomial in the dimension.
Mixture densities, maximum likelihood, and the EM algorithm
TLDR
This work discusses the formulation and theoretical and practical properties of the EM algorithm, a specialization to the mixture density context of a general algorithm used to approximate maximum-likelihood estimates for incomplete data problems.
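
For contrast with the moment-based methods above, here is a minimal EM sketch for a two-component, unit-variance Gaussian mixture in one dimension; this is the kind of local-search heuristic the main paper argues can fail, since it can converge to poor local optima depending on initialization:

```python
# Minimal EM for an equal-weight mixture of two unit-variance Gaussians in 1D.
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 5_000), rng.normal(3, 1, 5_000)])

w, mu = np.array([0.5, 0.5]), np.array([-1.0, 1.0])  # initial guess
for _ in range(100):
    # E-step: posterior responsibility of each component for each point.
    log_p = -0.5 * (x[:, None] - mu[None, :]) ** 2 + np.log(w)[None, :]
    r = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # M-step: update mixing weights and means from the responsibilities.
    w = r.mean(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)

print(w, mu)  # approximately [0.5 0.5] and [-2.  3.]
```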
Observable Operator Models for Discrete Stochastic Time Series
  • H. Jaeger
  • Mathematics, Computer Science
  • Neural Computation
  • 2000
TLDR
A novel, simple characterization of linearly dependent processes, called observable operator models, is provided, which leads to a constructive learning algorithm for the identification of linearly dependent processes.
...