# Estimation of Markov Chain via Rank-constrained Likelihood

```bibtex
@article{Li2018EstimationOM,
  title   = {Estimation of Markov Chain via Rank-constrained Likelihood},
  author  = {Xudong Li and Mengdi Wang and Anru R. Zhang},
  journal = {ArXiv},
  year    = {2018},
  volume  = {abs/1804.00795}
}
```

This paper studies the estimation of low-rank Markov chains from empirical trajectories. We propose a non-convex estimator based on rank-constrained likelihood maximization. Statistical upper bounds are provided for the Kullback-Leibler divergence and the $\ell_2$ risk between the estimator and the true transition matrix. The estimator reveals a compressed state space of the Markov chain. We also develop a novel DC (difference of convex functions) programming algorithm to tackle the rank…
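As a rough sketch of the estimation setting, one can form the empirical transition matrix from a trajectory and hard-truncate it to rank r via SVD. This is a simple surrogate, not the paper's rank-constrained MLE or its DC algorithm, and the function names below are hypothetical:

```python
import numpy as np

def empirical_transition_matrix(trajectory, n_states):
    """Count transitions along a single trajectory and row-normalize;
    this is the unconstrained maximum-likelihood estimator."""
    counts = np.zeros((n_states, n_states))
    for s, t in zip(trajectory[:-1], trajectory[1:]):
        counts[s, t] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # leave unvisited states as zero rows
    return counts / row_sums

def rank_r_truncation(P_hat, r):
    """Hard-truncate to rank r via SVD. The result need not stay
    row-stochastic, which is part of what makes the rank-constrained
    likelihood formulation nontrivial."""
    U, s, Vt = np.linalg.svd(P_hat, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
```

Unlike this truncation, the paper's estimator maximizes the likelihood subject to the rank constraint, so the output remains a valid transition matrix.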

## 12 Citations

Maximum Likelihood Tensor Decomposition of Markov Decision Process

- Computer Science · 2019 IEEE International Symposium on Information Theory (ISIT)
- 2019

A tensor-rank-constrained maximum likelihood estimator is developed, and statistical upper bounds are proved for the Kullback-Leibler divergence error and the ℓ2 error between the estimated model and the true model.

State Compression of Markov Processes via Empirical Low-Rank Estimation

- Computer Science · ArXiv
- 2018

A spectral method is proposed for estimating the frequency and transition matrices, estimating the compressed state space, and recovering the state aggregation structure if one exists; upper bounds for the estimation and recovery errors are provided, along with matching minimax lower bounds.
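One generic way to turn spectral estimates into a compressed state space is to cluster states by their leading singular-vector embeddings. The sketch below uses a naive k-means and hypothetical names, with no claim to match the cited method or its guarantees:

```python
import numpy as np

def aggregate_states(P_hat, r, n_iter=50, seed=0):
    """Cluster states by their leading left singular vectors,
    one simple way to read off a compressed state space."""
    U, _, _ = np.linalg.svd(P_hat, full_matrices=False)
    features = U[:, :r]  # r-dimensional spectral embedding of each state
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), r, replace=False)]
    for _ in range(n_iter):  # naive k-means on the embedding
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for k in range(r):
            if np.any(labels == k):
                centers[k] = features[labels == k].mean(axis=0)
    return labels
```

States with identical transition rows receive identical embeddings, so any exact aggregation structure is preserved by the clustering.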

Adaptive Low-Nonnegative-Rank Approximation for State Aggregation of Markov Chains

- Computer Science · SIAM J. Matrix Anal. Appl.
- 2020

This paper develops a low-nonnegative-rank approximation method to identify the state aggregation structure of a finite-state Markov chain under an assumption that the state space can be mapped into…

Spectral thresholding for the estimation of Markov chain transition operators

- Mathematics · Electronic Journal of Statistics
- 2021

We consider nonparametric estimation of the transition operator $P$ of a Markov chain and its transition density $p$ where the singular values of $P$ are assumed to decay exponentially fast. This is…

Spectral State Compression of Markov Processes

- Computer Science, Mathematics · IEEE Transactions on Information Theory
- 2020

Model reduction of Markov processes is a basic problem in modeling state-transition systems. Motivated by the state aggregation approach rooted in control theory, we study the statistical state…

Identifying Low-Dimensional Structures in Markov Chains: A Nonnegative Matrix Factorization Approach

- Computer Science · ArXiv
- 2019

The task of representation learning is formulated as mapping the state space of the model to a low-dimensional state space, referred to as the kernel space, containing a set of meta-states, each intended to represent only a small subset of the original states.
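A generic nonnegative factorization of a matrix, P ≈ W·H with W, H ≥ 0, can be computed with Lee-Seung multiplicative updates. This is only a sketch of the factorization idea, not the cited paper's formulation or constraints:

```python
import numpy as np

def nmf(P, r, n_iter=500, seed=0):
    """Lee-Seung multiplicative updates for P ~ W @ H with W, H >= 0.
    Generic NMF sketch; the cited paper's method differs."""
    rng = np.random.default_rng(seed)
    n, m = P.shape
    W = rng.random((n, r)) + 0.1
    H = rng.random((r, m)) + 0.1
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ P) / (W.T @ W @ H + eps)
        W *= (P @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The multiplicative form keeps every entry of W and H nonnegative throughout, which is what distinguishes NMF from an unconstrained low-rank factorization.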

State Aggregation Learning from Markov Transition Data

- Computer Science · NeurIPS
- 2019

A tractable algorithm is proposed that estimates the probabilistic aggregation map from the system's trajectory and generates a data-driven state aggregation map with clear interpretations; sharp error bounds are proved for estimating the aggregation and disaggregation distributions and for identifying anchor states.

Statistical inference in high-dimensional matrix models

- Mathematics, Computer Science
- 2020

This thesis considers three exemplary matrix models: matrix completion, Principal Component Analysis (PCA) with Gaussian data, and transition operators of Markov chains, and investigates the existence of adaptive confidence sets in the 'Bernoulli' and 'trace-regression' models.

Learning low-dimensional state embeddings and metastable clusters from time series data

- Computer Science · NeurIPS
- 2019

This paper studies how to find compact state embeddings from high-dimensional Markov state trajectories, where the transition kernel has a small intrinsic rank; sharp statistical error bounds and misclassification rates are proved.

Can Agents Learn by Analogy? An Inferable Model for PAC Reinforcement Learning

- Computer Science · AAMAS
- 2020

A new model-based method called Greedy Inference Model (GIM) is proposed that infers the unknown dynamics from known dynamics based on the internal spectral properties of the environment, which means GIM can "learn by analogy".

## References

Showing 1–10 of 57 references

Model reduction of Markov chains via low-rank approximation

- Computer Science, Mathematics · 2012 American Control Conference (ACC)
- 2012

A nuclear-norm regularized optimization problem is proposed for model reduction of Markov chain models, in which the Kullback-Leibler divergence rate measures the similarity between two Markov chains and the nuclear norm is used to approximate the rank function.
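The basic building block of nuclear-norm formulations like this is singular-value soft-thresholding, the proximal operator of the nuclear norm. A minimal sketch, illustrative rather than the cited paper's exact solver:

```python
import numpy as np

def svt(M, tau):
    """Singular-value soft-thresholding: shrink each singular value
    by tau and drop those that fall to zero. This is the proximal
    operator of tau times the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # soft-threshold the spectrum
    return U @ np.diag(s_shrunk) @ Vt
```

Because small singular values are zeroed out, each application of `svt` tends to lower the rank of its argument, which is how the nuclear-norm surrogate promotes low-rank solutions.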

State Compression of Markov Processes via Empirical Low-Rank Estimation

- Computer Science · ArXiv
- 2018

A spectral method is proposed for estimating the frequency and transition matrices, estimating the compressed state space, and recovering the state aggregation structure if one exists; upper bounds for the estimation and recovery errors are provided, along with matching minimax lower bounds.

Optimal Kullback-Leibler Aggregation via Spectral Theory of Markov Chains

- Mathematics, Computer Science · IEEE Transactions on Automatic Control
- 2011

This paper shows that for a certain relaxation of the bi-partition model reduction problem, the solution is characterized by an associated eigenvalue problem, closely related to Markov spectral theory for model reduction.

Minimax Estimation of Discrete Distributions Under $\ell _{1}$ Loss

- Mathematics, Computer Science · IEEE Transactions on Information Theory
- 2015

This work provides tight upper and lower bounds on the maximum risk of the empirical distribution and on the minimax risk in regimes where the support size S may grow with the number of observations n, and shows that a hard-thresholding estimator, oblivious to the unknown upper bound H, is essentially minimax.
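For reference, the empirical distribution in question is just the normalized histogram of the observations; a minimal sketch (the cited paper's thresholding refinements are not reproduced here):

```python
import numpy as np

def empirical_distribution(samples, support_size):
    """Empirical (maximum-likelihood) estimate of a discrete
    distribution from i.i.d. samples in {0, ..., support_size-1}."""
    counts = np.bincount(samples, minlength=support_size)
    return counts / counts.sum()

def l1_loss(p_hat, p):
    """The l1 loss under which the cited minimax risk is studied."""
    return np.abs(p_hat - p).sum()
```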

Spectral State Compression of Markov Processes

- Computer Science, Mathematics · IEEE Transactions on Information Theory
- 2020

Model reduction of Markov processes is a basic problem in modeling state-transition systems. Motivated by the state aggregation approach rooted in control theory, we study the statistical state…

Restricted strong convexity and weighted matrix completion: Optimal bounds with noise

- Computer Science, Mathematics · J. Mach. Learn. Res.
- 2012

The matrix completion problem under a form of row/column-weighted entrywise sampling is considered, including uniform entrywise sampling as a special case, and it is proved that, with high probability, the objective satisfies a form of restricted strong convexity with respect to a weighted Frobenius norm.

Near-optimal stochastic approximation for online principal component estimation

- Computer Science · Math. Program.
- 2018

A nearly optimal finite-sample error bound is proved for the first time for the online PCA algorithm under the sub-Gaussian assumption, and it is shown that the finite-sample error bound closely matches the minimax information lower bound.

A Majorized Penalty Approach for Calibrating Rank Constrained Correlation Matrix Problems

- Mathematics, Computer Science
- 2010

This paper first considers a penalized version of the problem and applies the essential ideas of the majorization method to the penalized problem by iteratively solving a sequence of least-squares correlation matrix problems without the rank constraint.

Diffusion Approximations for Online Principal Component Estimation and Global Convergence

- Mathematics · NIPS
- 2017

Diffusion approximation tools are adopted to study the dynamics of Oja's iteration, an online stochastic gradient method for principal component analysis, and it is shown that Oja's iteration for the top eigenvector generates a continuous-state, discrete-time Markov chain over the unit sphere.
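Oja's iteration itself is short: a stochastic gradient step followed by retraction to the unit sphere, which is why the iterates form a Markov chain on the sphere. A minimal sketch, with step size and initialization chosen for illustration:

```python
import numpy as np

def oja_top_eigvec(samples, eta=0.01, seed=0):
    """Oja's online iteration for the top eigenvector of the data
    covariance: one pass over the samples, one rank-one update each."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(samples.shape[1])
    w /= np.linalg.norm(w)
    for x in samples:
        w = w + eta * x * (x @ w)  # stochastic gradient step
        w /= np.linalg.norm(w)     # retract to the unit sphere
    return w
```

With a spiked covariance (one dominant direction), the iterate aligns with the top eigenvector up to sign.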

Stochastic DCA for the Large-sum of Non-convex Functions Problem and its Application to Group Variable Selection in Classification

- Computer Science, Mathematics · ICML
- 2017

A stochastic version of DCA (Difference of Convex functions Algorithm) is presented to solve a class of optimization problems whose objective function is a large sum of nonconvex functions and a regularization term.
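The deterministic DCA scheme underlying this work linearizes the concave part at each iterate and solves the resulting convex subproblem. A toy one-dimensional sketch on f(x) = x^4 - x^2, split as g(x) = x^4 and h(x) = x^2 (not the paper's stochastic variant):

```python
import numpy as np

def dca_minimize(x0, n_iter=50):
    """DCA on f(x) = x**4 - x**2, written as f = g - h with
    g(x) = x**4 and h(x) = x**2 (both convex). Each step linearizes
    h at the current iterate x_k and solves the convex subproblem
        min_x g(x) - h'(x_k) * x,
    which here has the closed form x_{k+1} = cbrt(x_k / 2)."""
    x = x0
    for _ in range(n_iter):
        x = np.cbrt(x / 2.0)  # argmin_x of x**4 - 2*x_k*x
    return x
```

Starting from x0 = 1.0, the iterates converge to 1/sqrt(2), a minimizer of f, illustrating how DCA reaches stationary points of a nonconvex objective through convex subproblems.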