Corpus ID: 4693805

Hilbert Space Embeddings of Hidden Markov Models

@inproceedings{Song2010HilbertSE,
  title={Hilbert Space Embeddings of Hidden Markov Models},
  author={Le Song and Byron Boots and Sajid M. Siddiqi and Geoffrey J. Gordon and Alex Smola},
  booktitle={ICML},
  year={2010}
}
Hidden Markov Models (HMMs) are important tools for modeling sequence data. However, they are restricted to discrete latent states, and are largely restricted to Gaussian and discrete observations. Moreover, learning algorithms for HMMs have predominantly relied on local search heuristics, with the exception of spectral methods such as those described below. We propose a nonparametric HMM that extends traditional HMMs to structured and non-Gaussian continuous distributions. Furthermore, we derive a…
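The abstract describes a kernel spectral algorithm for HMMs with continuous, non-Gaussian observations. As rough orientation only, here is a minimal numpy sketch of the general idea behind such methods, not the paper's actual algorithm: featurize observations with an (approximate) kernel map, form the empirical cross-covariance between consecutive time steps, and treat its top singular subspace as a surrogate latent state space. All data, sizes, and bandwidths below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff(X, W, b):
    """Random Fourier feature map approximating a Gaussian kernel."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

# Hypothetical data: one real-valued observation sequence of length T.
T, d, D, m = 2000, 3, 200, 5           # D random features, m surrogate states
obs = rng.normal(size=(T, d))          # stand-in for a real sequence

W = rng.normal(size=(d, D))            # frequencies for a unit-bandwidth kernel
b = rng.uniform(0, 2 * np.pi, size=D)
Phi = rff(obs, W, b)                   # T x D feature matrix

# Empirical cross-covariance between features at times t and t+1.
C21 = Phi[1:].T @ Phi[:-1] / (T - 1)   # D x D

# The top-m singular subspace serves as a continuous latent "state" space.
U = np.linalg.svd(C21)[0][:, :m]       # D x m

# Project the latest observation's features into the learned subspace.
state = U.T @ Phi[-1]
print(state.shape)                     # (m,)
```

The paper's method works with exact RKHS quantities and adds the machinery needed to update and normalize the belief embedding over time; the sketch only shows the spectral backbone.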

Citations

Using Regression for Spectral Estimation of HMMs
This work introduces a new spectral model for HMM estimation and a corresponding spectral bilinear regression model, and systematically compares them with a variety of competing simplified models, explaining when and why each method gives superior performance.
A Spectral Algorithm for Learning Hidden Markov Models
Regularization of Hidden Markov Models Embedded into Reproducing Kernel Hilbert Space
Discusses an approach that extends HMMs to non-Gaussian continuous distributions by embedding the belief about the state into a reproducing kernel Hilbert space (RKHS), and that reduces the tendency to overfit and the computational complexity of the algorithm by means of various regularization techniques, specifically Nyström subsampling.
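Since this summary names Nyström subsampling as the regularization device, here is a minimal sketch of the standard Nyström approximation of a kernel matrix from a small set of landmark points. The landmark count, kernel, and data below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def gauss_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

X = rng.normal(size=(1000, 4))
q = 50                                  # number of landmark points, q << n
idx = rng.choice(len(X), size=q, replace=False)
L = X[idx]

K_nq = gauss_kernel(X, L)               # n x q block
K_qq = gauss_kernel(L, L)               # q x q block
K_approx = K_nq @ np.linalg.pinv(K_qq) @ K_nq.T   # rank-q approximation

K_full = gauss_kernel(X, X)
print(np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full))
```

Downstream computations then work with the n x q factor instead of the full n x n kernel matrix, which is where both the regularization effect and the computational savings come from.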
Spectral dimensionality reduction for HMMs
This work provides a new spectral method that significantly reduces the number of model parameters to be estimated and achieves a sample complexity that does not depend on the size of the observation vocabulary.
On learning parametric-output HMMs
We present a novel approach for learning an HMM whose outputs are distributed according to a parametric family. This is done by decoupling the learning task into two steps: first estimating the…
Discriminative spectral learning of hidden Markov models for human activity recognition
Extends spectral learning of HMMs, a moment-matching learning technique free from local maxima, to discriminative HMMs; the resulting method provides the posterior probabilities of the classes without explicitly determining the HMM parameters and is able to deal with missing labels.
Identification of hidden Markov models using spectral learning with likelihood maximization
This paper proposes a two-step procedure that combines spectral learning with a single Newton-like iteration for maximum likelihood estimation, and demonstrates improved statistical performance of the proposed algorithm in numerical simulations.
Spectral Learning of Hidden Markov Models
The Hidden Markov Model is the most fundamental model of partially observable uncontrolled systems in machine learning. For many years the state of the art was the Baum-Welch algorithm, which…
Learning HMMs with Nonparametric Emissions via Spectral Decompositions of Continuous Matrices
This paper studies the estimation of an $m$-state hidden Markov model (HMM) under only smoothness assumptions on the emission densities, such as Hölder conditions, and develops a computationally efficient spectral algorithm for learning nonparametric HMMs.
Spectral estimation of hidden Markov models
It is shown that spectral estimation of hidden Markov models can be factored into two major components: estimation of the hidden state-space dynamics, and estimation of the observation probability distributions. This factoring leads to extremely flexible estimation procedures that can be tailored precisely to the task of interest.
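For context, this factoring builds on the observable-operator style of spectral learning (as in "A Spectral Algorithm for Learning Hidden Markov Models" above, due to Hsu, Kakade, and Zhang). The following sketch runs that style of algorithm on the exact low-order moments of a toy HMM instead of sample estimates; the toy parameters are illustrative, and in practice the moments are estimated from observation triples.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 5                                   # hidden states, observation symbols

T = rng.dirichlet(np.ones(m), size=m).T       # T[i, j] = Pr(h' = i | h = j)
O = rng.dirichlet(np.ones(n), size=m).T       # O[x, j] = Pr(obs = x | h = j)
pi = rng.dirichlet(np.ones(m))                # initial state distribution

# Exact low-order moments of the toy model.
P1 = O @ pi                                             # Pr(x1)
P21 = O @ T @ np.diag(pi) @ O.T                         # Pr(x2, x1)
P3x1 = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T   # Pr(x3, x2 = x, x1)
        for x in range(n)]

U = np.linalg.svd(P21)[0][:, :m]              # top-m left singular vectors

b1 = U.T @ P1
binf = np.linalg.pinv(P21.T @ U) @ P1
B = [U.T @ P3x1[x] @ np.linalg.pinv(U.T @ P21) for x in range(n)]

# Spectral formula vs. direct computation of Pr(x1 = 0, x2 = 1).
spec = binf @ B[1] @ B[0] @ b1
direct = (np.diag(O[1]) @ T @ np.diag(O[0]) @ pi).sum()
print(spec, direct)                            # should agree
```

The work cited on this page replaces the discrete moment matrices with RKHS embeddings, which is what lifts the approach to continuous, non-Gaussian observations.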

References

Showing 1-10 of 28 references
A Spectral Algorithm for Learning Hidden Markov Models
Reduced-Rank Hidden Markov Models
This paper proves a tighter finite-sample error bound for the case of reduced-rank HMMs, i.e., HMMs with low-rank transition matrices, and generalizes the algorithm and bounds to models where multiple observations are needed to disambiguate state, and to models that emit multivariate real-valued observations.
Observable Operator Models for Discrete Stochastic Time Series
H. Jaeger. Neural Computation, 2000.
A novel, simple characterization of linearly dependent processes, called observable operator models, is provided, which leads to a constructive learning algorithm for the identification of linearly dependent processes.
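A minimal sketch of the observable-operator viewpoint: for an HMM, each symbol x induces an operator A_x = T diag(O_x), and sequence probabilities are products of these operators applied to the initial distribution. The toy parameters below are illustrative, and no learning is performed; only the representation is demonstrated.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 4                                # hidden states, observation symbols

T = rng.dirichlet(np.ones(m), size=m).T    # state transitions, columns sum to 1
O = rng.dirichlet(np.ones(n), size=m).T    # O[x, j] = Pr(obs = x | state = j)
pi = rng.dirichlet(np.ones(m))             # initial state distribution

A = [T @ np.diag(O[x]) for x in range(n)]  # one observable operator per symbol

def seq_prob(seq):
    """Pr(x_1, ..., x_t) = 1^T A_{x_t} ... A_{x_1} pi."""
    v = pi
    for x in seq:
        v = A[x] @ v
    return v.sum()

# Sanity check: probabilities over all length-2 sequences sum to 1.
print(sum(seq_prob((a, b)) for a in range(n) for b in range(n)))
```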
Automatic state discovery for unstructured audio scene classification
A novel scheme for unstructured audio scene classification that possesses three highly desirable and powerful features: autonomy, scalability, and robustness. The scheme has proven to be highly effective for building real-world applications and has been integrated into a commercial surveillance system as an event-detection component.
Hilbert space embeddings of conditional distributions with applications to dynamical systems
This paper derives a kernel estimate for the conditional embedding and shows its connection to ordinary embeddings, with the aim of obtaining a nonparametric method for modeling dynamical systems in which the belief state of the system is maintained as a conditional embedding.
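A minimal sketch of the conditional-embedding estimate in the special case where the output feature map is the identity, so the embedding mu_{Y|x} = Phi_Y (K + lambda n I)^{-1} k_X(x) reduces to a kernel ridge prediction of E[Y | x]. The data, bandwidth, and regularization below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def gauss_kernel(A, B, gamma=2.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

n = 400
X = rng.uniform(-2, 2, size=(n, 1))
Y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=n)   # noisy targets

lam = 1e-3
K = gauss_kernel(X, X)
alpha = np.linalg.solve(K + lam * n * np.eye(n), Y)  # regularized weights

x_new = np.array([[0.5]])
mu = gauss_kernel(X, x_new)[:, 0] @ alpha            # estimate of E[Y | x = 0.5]
print(mu, np.sin(1.0))                               # should be close
```

With a nontrivial output feature map, the same weight vector combines feature vectors of the training outputs instead of raw targets, which is how the paper maintains an entire belief distribution rather than a point estimate.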
Random Features for Large-Scale Kernel Machines
Two sets of random features are explored, convergence bounds on their ability to approximate various radial basis kernels are provided, and it is shown that in large-scale classification and regression tasks, linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
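A minimal sketch of the random Fourier feature construction for the Gaussian kernel k(x, y) = exp(-||x - y||^2 / 2): with frequencies drawn from the kernel's spectral density, the inner product of the feature maps approximates the kernel value. Sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
d, D = 5, 2000                         # input dimension, number of features

W = rng.normal(size=(d, D))            # frequencies ~ the kernel's spectrum
b = rng.uniform(0, 2 * np.pi, size=D)

def z(X):
    """Random Fourier feature map; z(x) . z(y) approximates k(x, y)."""
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

x, y = rng.normal(size=(2, d))
exact = np.exp(-np.sum((x - y) ** 2) / 2)
approx = (z(x[None]) @ z(y[None]).T).item()
print(exact, approx)                   # close for large D
```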
A Hilbert Space Embedding for Distributions
We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert…
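The mean-map comparison this reference describes can be made concrete with the (biased) empirical maximum mean discrepancy under a Gaussian kernel; the samples and bandwidth below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

def gauss_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y):
    """Biased empirical MMD^2: squared RKHS distance of the mean embeddings."""
    return (gauss_kernel(X, X).mean()
            - 2 * gauss_kernel(X, Y).mean()
            + gauss_kernel(Y, Y).mean())

X1 = rng.normal(0.0, 1.0, size=(500, 2))
X2 = rng.normal(0.0, 1.0, size=(500, 2))   # same distribution as X1
Y = rng.normal(0.5, 1.0, size=(500, 2))    # shifted distribution
print(mmd2(X1, X2), mmd2(X1, Y))           # near zero vs. clearly positive
```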
A general regression technique for learning transductions
A novel and conceptually cleaner formulation of kernel dependency estimation provides a simple framework for estimating the regression coefficients, and an efficient algorithm for computing the pre-image from the regression coefficients extends the applicability of kernel dependency estimation to output sequences.
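A minimal and simplified sketch of the kernel dependency estimation pipeline: ridge-regress from the input kernel onto the output RKHS, then solve the pre-image step by picking the training output whose embedding is closest to the predicted one. This nearest-candidate pre-image is a simplification; the paper computes pre-images more efficiently and handles output sequences. All data and kernels here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def gauss_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

n = 300
X = rng.uniform(-2, 2, size=(n, 2))
Y = np.stack([X[:, 0] ** 2, np.sin(X[:, 1])], axis=1)   # structured outputs

lam = 1e-2
KX = gauss_kernel(X, X)
KY = gauss_kernel(Y, Y)
A = np.linalg.solve(KX + lam * np.eye(n), np.eye(n))    # regression operator

def predict(x_new):
    alpha = A @ gauss_kernel(X, x_new[None])[:, 0]      # weights over phi(y_i)
    # Squared RKHS distance from each candidate phi(y_j) to the prediction,
    # up to a constant that does not depend on j.
    scores = np.diag(KY) - 2.0 * KY @ alpha
    return Y[np.argmin(scores)]

x = np.array([1.0, 0.5])
print(predict(x), [x[0] ** 2, np.sin(x[1])])            # should be close
```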
Injective Hilbert Space Embeddings of Probability Measures
This work considers more broadly the problem of specifying characteristic kernels, defined as kernels for which the RKHS embedding of probability measures is injective, restricting attention to translation-invariant kernels on Euclidean space.
Large Margin Methods for Structured and Interdependent Output Variables
This paper proposes to appropriately generalize the well-known notion of a separation margin, derives a corresponding maximum-margin formulation, and presents a cutting-plane algorithm that solves the optimization problem in polynomial time for a large class of problems.