Structure Learning in Conditional Probability Models via an Entropic Prior and Parameter Extinction

Abstract

We introduce an entropic prior for multinomial parameter estimation problems and solve for its maximum a posteriori (MAP) estimator. The prior is a bias for maximally structured and minimally ambiguous models. In conditional probability models with hidden state, iterative MAP estimation drives weakly supported parameters toward extinction, effectively turning them off. Thus structure discovery is folded into parameter estimation. We then establish criteria for simplifying a probabilistic model's graphical structure by trimming parameters and states, with a guarantee that any such deletion will increase the posterior probability of the model. Trimming accelerates learning by sparsifying the model. All operations monotonically and maximally increase the posterior probability, yielding structure-learning algorithms only slightly slower than parameter estimation via expectation-maximization (EM), and orders of magnitude faster than search-based structure induction. When applied to hidden Markov model (HMM) training, the resulting models show superior generalization to held-out test data. In many cases the resulting models are so sparse and concise that they are interpretable, with hidden states that strongly correlate with meaningful categories.
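A minimal numerical sketch of the estimator the abstract describes, for a single multinomial: given expected counts ω, the entropic prior P(θ) ∝ ∏ θ_i^θ_i leads to the stationarity condition ω_i/θ_i + log θ_i + 1 + λ = 0, whose solution can be written with the Lambert W function. The function name, the choice of the W_{-1} branch, and the bisection on the multiplier λ below are illustrative assumptions, not the paper's full HMM training or trimming procedure.

import numpy as np
from scipy.special import lambertw

def entropic_map_multinomial(omega, tol=1e-10, max_iter=200):
    """MAP estimate of multinomial parameters theta under the entropic prior
    P(theta) ~ prod_i theta_i^theta_i, given evidence (expected counts) omega.

    Solves  omega_i/theta_i + log(theta_i) + 1 + lam = 0  via
    theta_i = -omega_i / W_{-1}(-omega_i * exp(1 + lam)),
    with lam set by bisection so that sum_i theta_i = 1.

    Assumption: omega_i >= 0 and max(omega) >= 1 (typical of EM expected
    counts); zero-count components are extinguished (set to 0).
    """
    omega = np.asarray(omega, dtype=float)
    theta = np.zeros_like(omega)
    pos = omega > 0
    w = omega[pos]

    def theta_of(lam):
        z = -w * np.exp(1.0 + lam)          # must lie in [-1/e, 0)
        return -w / lambertw(z, k=-1).real  # W_{-1} branch gives theta_i <= omega_i

    # Largest feasible lam: -omega_i * exp(1 + lam) >= -1/e for every component.
    lam_hi = -2.0 - np.log(w.max())
    lam_lo = lam_hi - 1.0
    while theta_of(lam_lo).sum() > 1.0:     # widen bracket until sum(theta) < 1
        lam_lo -= 1.0

    for _ in range(max_iter):               # bisection on the Lagrange multiplier
        lam = 0.5 * (lam_lo + lam_hi)
        s = theta_of(lam).sum()
        if abs(s - 1.0) < tol:
            break
        if s > 1.0:
            lam_hi = lam
        else:
            lam_lo = lam

    theta[pos] = theta_of(lam)
    return theta / theta.sum()              # absorb residual bisection error

if __name__ == "__main__":
    counts = np.array([40.0, 8.0, 1.5, 0.2])
    print("ML :", counts / counts.sum())             # maximum-likelihood estimate
    print("MAP:", entropic_map_multinomial(counts))  # entropic MAP: small counts shrink toward 0

Compared with the maximum-likelihood estimate ω/Σω, the entropic MAP estimate has lower entropy: well-supported parameters grow while weakly supported ones shrink toward zero, which is the parameter-extinction effect the abstract refers to. In the paper's setting an update of this kind replaces the M-step of EM, and parameters driven near zero become candidates for trimming.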

DOI: 10.1162/089976699300016395
