A Method of Moments for Mixture Models and Hidden Markov Models
@inproceedings{Anandkumar2012AMO,
  title={A Method of Moments for Mixture Models and Hidden Markov Models},
  author={Anima Anandkumar and Daniel J. Hsu and Sham M. Kakade},
  booktitle={COLT},
  year={2012}
}
Mixture models are a fundamental tool in applied statistics and machine learning for treating data taken from multiple subpopulations. The current practice for estimating the parameters of such models relies on local search heuristics (e.g., the EM algorithm), which are prone to failure, and existing consistent methods are unfavorable due to their high computational and sample complexity, which typically scale exponentially with the number of mixture components. This work develops an efficient…
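To make the contrast with EM concrete, here is a minimal method-of-moments sketch (not the paper's algorithm, which handles general multivariate mixtures and HMMs): for a toy symmetric mixture 0.5·N(+μ, 1) + 0.5·N(−μ, 1), the second moment satisfies E[x²] = μ² + 1, so μ can be recovered in closed form from an empirical moment, with no iterative local search.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: x ~ 0.5 * N(+mu, 1) + 0.5 * N(-mu, 1).
# Moment equation: E[x^2] = mu^2 + 1, so mu_hat = sqrt(m2 - 1).
mu_true = 2.0
n = 100_000
signs = rng.choice([-1.0, 1.0], size=n)
x = signs * mu_true + rng.standard_normal(n)

m2 = np.mean(x ** 2)                      # empirical second moment
mu_hat = np.sqrt(max(m2 - 1.0, 0.0))      # clamp to guard against sampling noise
```

The estimator is consistent and requires one pass over the data; the price, in general, is that higher-order moment equations must be solved, which is what spectral methods like the paper's make tractable.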
301 Citations
Estimating Mixture Models via Mixtures of Polynomials
- Computer Science, NIPS
- 2015
This work presents Polymom, a unifying framework based on the method of moments in which estimation procedures are easily derivable, just as in EM, and which allows estimation to be cast as a Generalized Moment Problem.
Evaluation of Spectral Learning for the Identification of Hidden Markov Models
- Computer Science, ArXiv
- 2015
Learning High-Dimensional Mixtures of Graphical Models
- Computer Science, Mathematics, ArXiv
- 2012
This work proposes a novel approach for estimating the mixture components, and its output is a tree-mixture model which serves as a good approximation to the underlying graphical model mixture.
Learning Mixtures of Tree Graphical Models
- Computer Science, Mathematics, NIPS
- 2012
A novel method is proposed for estimating, with provable guarantees, the mixture components of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables.
Efficient Learning for Time Series Models by Non-Negative Moment Matrix Factorization
- Computer Science
- 2014
This paper develops a MoM-based approach using Non-negative Matrix Factorization (NMF) for learning several time series models, including the Mixture of HMMs (MHMM), Switching HMM, Switching SHMM and Factorial HMM.
Probabilistic sequence clustering with spectral learning
- Computer Science, Digit. Signal Process.
- 2014
On the method of moments for estimation in latent linear models
- Computer Science
- 2016
This thesis introduces several semiparametric models in the topic modeling context and for multi-view models, and develops moment-matching estimation methods for these models, which come with improved sample complexity results compared to previously proposed methods.
Online and Distributed learning of Gaussian mixture models by Bayesian Moment Matching
- Computer Science, ArXiv
- 2016
This work proposes a Bayesian learning technique that lends itself naturally to online and distributed computation and compares favorably to online EM in terms of time and accuracy on a set of data modeling benchmarks.
Learning latent variable models: efficient algorithms and applications
- Computer Science
- 2019
This thesis extends the existing theory of methods of moments to learn models traditionally used for topic modeling, such as the single-topic model and Latent Dirichlet Allocation, providing improved learning techniques and comparing them with existing methods, which they outperform in terms of speed and learning accuracy.
Fast and Consistent Learning of Hidden Markov Models by Incorporating Non-Consecutive Correlations
- Computer Science, ICML
- 2020
This paper proposes extending method-of-moments estimators for HMMs to also include non-consecutive correlations, in a way that does not significantly increase the computational cost (which scales linearly with the number of additional lags included).
References
Showing 1–10 of 45 references
PAC Learning Mixtures of Axis-Aligned Gaussians with No Separation Assumption
- Computer Science, ArXiv
- 2006
We propose and analyze a new vantage point for the learning of mixtures of Gaussians: namely, the PAC-style model of learning probability distributions introduced by Kearns et al. [13]. Here the task…
The Spectral Method for General Mixture Models
- Computer Science, COLT
- 2005
An algorithm for learning a mixture of distributions based on spectral projection is presented and it is shown that the resulting algorithm is efficient when the components of the mixture are logconcave distributions in $\Re^{n}$ whose means are separated.
Multivariate Normal Mixtures: A Fast Consistent Method of Moments
- Mathematics
- 1993
Abstract A longstanding difficulty in multivariate statistics is identifying and evaluating nonnormal data structures in high dimensions with high statistical efficiency and low search effort. Here…
Efficiently learning mixtures of two Gaussians
- Computer Science, Mathematics, STOC '10
- 2010
This work provides a polynomial-time algorithm for this problem for the case of two Gaussians in $n$ dimensions (even if they overlap), with provably minimal assumptions on the Gaussians and polynomial data requirements, and efficiently performs near-optimal clustering.
On Spectral Learning of Mixtures of Distributions
- Mathematics, Computer Science, COLT
- 2005
It is proved that a very simple algorithm, namely spectral projection followed by single-linkage clustering, properly classifies every point in the sample, and there are many Gaussian mixtures such that each pair of means is separated, yet upon spectral projection the mixture collapses completely.
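The spectral projection step described above can be sketched in a few lines. This is an illustrative toy (a well-separated two-component spherical mixture with hypothetical parameters), not the paper's full algorithm: project the centered data onto the top singular direction, after which the sign of the projection separates the components.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two spherical Gaussians in R^50 with well-separated means at +/- 6*e1.
d, n = 50, 500
mu = np.zeros(d)
mu[0] = 6.0
labels = rng.integers(0, 2, size=n)          # true component of each point
X = rng.standard_normal((n, d)) + np.where(labels[:, None] == 1, mu, -mu)

# Spectral projection: project centered data onto the top singular direction,
# which aligns with the direction separating the two means.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ Vt[0]

# With enough separation, the sign of the projection recovers the clusters
# (up to a global label swap, hence the max over the two matchings).
pred = (proj > 0).astype(int)
accuracy = max(np.mean(pred == labels), np.mean(pred != labels))
```

The cautionary result in the summary is the converse: for some mixtures whose means are pairwise separated, this projection can collapse the components, so separation alone is not sufficient.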
Settling the Polynomial Learnability of Mixtures of Gaussians
- Computer Science, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
- 2010
This paper gives the first polynomial time algorithm for proper density estimation for mixtures of k Gaussians that needs no assumptions on the mixture, and proves that such a dependence is necessary.
Learning mixtures of arbitrary gaussians
- Computer Science, STOC '01
- 2001
This paper presents the first algorithm that provably learns the component gaussians in time that is polynomial in the dimension.
Mixture densities, maximum likelihood, and the EM algorithm
- Computer Science
- 1984
This work discusses the formulation and theoretical and practical properties of the EM algorithm, a specialization to the mixture density context of a general algorithm used to approximate maximum-likelihood estimates for incomplete data problems.
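For contrast with the moment-based approaches above, here is a minimal EM sketch for the mixture density context, assuming a toy 1-D two-component Gaussian mixture with unit variances (the general algorithm also re-estimates variances):

```python
import numpy as np

rng = np.random.default_rng(2)

# Data from a two-component 1-D Gaussian mixture with unit variances:
# 0.3 * N(-2, 1) + 0.7 * N(3, 1).
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 700)])

# EM for the mixing weight and the two means (variances fixed at 1 for brevity).
w, mu0, mu1 = 0.5, -1.0, 1.0
for _ in range(100):
    # E-step: posterior responsibility of component 1 for each point
    p0 = (1 - w) * np.exp(-0.5 * (x - mu0) ** 2)
    p1 = w * np.exp(-0.5 * (x - mu1) ** 2)
    r = p1 / (p0 + p1)
    # M-step: re-estimate parameters as responsibility-weighted averages
    w = r.mean()
    mu0 = np.sum((1 - r) * x) / np.sum(1 - r)
    mu1 = np.sum(r * x) / np.sum(r)
```

Each iteration increases the likelihood, but convergence is only to a local optimum, which is exactly the failure mode that motivates the method-of-moments alternatives surveyed on this page.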
Observable Operator Models for Discrete Stochastic Time Series
- Mathematics, Computer Science, Neural Computation
- 2000
A novel, simple characterization of linearly dependent processes, called observable operator models, is provided, which leads to a constructive learning algorithm for the identification of linearly dependent processes.