The power of amnesia: Learning probabilistic automata with variable memory length

@article{Ron1996ThePO,
  title={The power of amnesia: Learning probabilistic automata with variable memory length},
  author={Dana Ron and Yoram Singer and Naftali Tishby},
  journal={Machine Learning},
  year={1996},
  volume={25},
  pages={117-149}
}
We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KL-divergence… 
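
The truncated sentence states the paper's main guarantee, which takes the standard PAC-style form for distribution learning. A hedged reconstruction of that form (the exact polynomial dependencies are spelled out in the paper and not reproduced here):

\Pr\Big[ \tfrac{1}{m}\, D_{\mathrm{KL}}\big( P_M^m \,\big\|\, P_{\widehat{M}}^m \big) \le \varepsilon \Big] \;\ge\; 1 - \delta

where P_M^m is the distribution the target PSA M induces on strings of length m, P_{\widehat{M}}^m is the corresponding distribution under the learned hypothesis \widehat{M}, and the sample size and running time are polynomial in 1/\varepsilon, 1/\delta, and the size of M.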

Learning Probability Distributions Generated by Finite-State Machines

We review methods for inference of probability distributions generated by probabilistic automata and related models for sequence generation. We focus on methods that can be proved to learn in the…

Probabilistic Deterministic Infinite Automata

The results suggest that the probabilistic deterministic infinite automaton (PDIA) presents an attractive compromise between the computational cost of hidden Markov models and the storage requirements of hierarchically smoothed Markov models.

Learning deterministic probabilistic automata from a model checking perspective

This paper shows how to extend the basic algorithm to also learn automata models for both reactive and timed systems, and establishes theoretical convergence properties for the learning algorithm as well as for probability estimates of system properties expressed in linear-time temporal logic and continuous stochastic logic.

Probabilistic Trees and Automata for Application Behavior Modeling

Methods are described for inferring and using probabilistic models that capture characteristic pattern structures that may exist in symbolic data sequences; the models can be used for real-time data monitoring by means of a matching algorithm.

Learning Markov Decision Processes for Model Checking

An algorithm for automatically learning a deterministic labeled Markov decision process model from the observed behavior of a reactive system, adapted from algorithms for learning deterministic probabilistic finite automata and extended to include both probabilistic and nondeterministic transitions.

Simple Variable Length N-grams for Probabilistic Automata Learning

Experiments show that, using the test sets provided by the 2012 Probabilistic Automata Learning Competition, the variable-length approach works better than fixed 3-grams.

Automata Cascades: Expressivity and Sample Complexity

The results show that one can in principle learn automata with infinite input alphabets and a number of states that is exponential in the amount of data available.

Sample Complexity of Automata Cascades

This work shows that the sample complexity of automata cascades is linear in the number of components and the maximum complexity of a single component, modulo logarithmic factors, which opens the possibility of learning large dynamical systems consisting of many parts interacting with each other.

Lower Bounds for Learning Discrete Distributions

A class of efficiently learnable distributions is presented which has the following interesting property: while a computationally unbounded learning algorithm can learn the class from O(1) examples, the computational sample complexity of the class is essentially Ω(1/ε), in the sense that any polynomial-time learning algorithm must use at least this many examples.

The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees

This paper takes a decision theoretic view of PSTs for the task of sequence prediction and presents an online PST learning algorithm that generates a bounded-depth PST while being competitive with any fixed PST determined in hindsight.
...

References

Showing 1–10 of 48 references

On the learnability and usage of acyclic probabilistic finite automata

It is proved that the proposed algorithm can efficiently learn distributions generated by the subclass of APFAs it considers, and it is shown that the KL-divergence between the distribution generated by the target source and the distribution generated by the authors' hypothesis can be made arbitrarily small with high confidence in polynomial time.

Efficient learning of typical finite automata from random walks

The main contribution of this paper is in presenting the first efficient algorithms for learning nontrivial classes of automata in an entirely passive learning model.

The Power of Amnesia

The algorithm is based on minimizing the statistical prediction error by extending the memory, or state length, adaptively, until the total prediction error is sufficiently small; using fewer than 3000 states, the model's performance is far superior to that of fixed-memory models with a similar number of states.
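
To make the adaptive memory-extension rule concrete, here is a minimal Python sketch of the underlying idea: a longer suffix is kept as a prediction context only if its empirical next-symbol probabilities differ multiplicatively from those of its parent. The function names and the thresholds min_count and ratio are illustrative placeholders, and pruning a branch as soon as one extension fails the test is a simplification; this is a sketch of the idea, not the paper's analyzed algorithm.

from collections import defaultdict

def next_symbol_counts(seq, max_depth):
    # Count, for every suffix (context) of length <= max_depth,
    # how often each symbol follows it in the training sequence.
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq)):
        for d in range(max_depth + 1):
            if d > i:
                break
            counts[seq[i - d:i]][seq[i]] += 1
    return counts

def grow_contexts(seq, alphabet, max_depth=5, min_count=20, ratio=1.05):
    counts = next_symbol_counts(seq, max_depth)

    def prob(ctx, a):
        # Smoothed conditional probability P(a | ctx), so ratios are defined.
        total = sum(counts[ctx].values())
        return (counts[ctx][a] + 1) / (total + len(alphabet))

    kept = {""}            # the empty context (memoryless root) is always kept
    frontier = [""]
    while frontier:
        parent = frontier.pop()
        if len(parent) == max_depth:
            continue
        for sigma in alphabet:
            child = sigma + parent   # extend the suffix by one older symbol
            if sum(counts[child].values()) < min_count:
                continue             # too rare to estimate reliably
            # Keep the longer context only if it changes some prediction
            # by more than the multiplicative threshold `ratio`.
            if any(max(prob(child, a) / prob(parent, a),
                       prob(parent, a) / prob(child, a)) > ratio
                   for a in alphabet):
                kept.add(child)
                frontier.append(child)
    return kept

On a sequence from a genuine variable-memory source, only suffixes that actually change the next-symbol distribution survive the test, so the learned model keeps long memory exactly where it pays and "forgets" it elsewhere.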

On the computational complexity of approximating distributions by probabilistic automata

We introduce a rigorous performance criterion for training algorithms for probabilistic automata (PAs) and hidden Markov models (HMMs), used extensively for speech recognition, and analyze the…

On the learnability of discrete distributions

A new model of learning probability distributions from independent draws is introduced, inspired by the popular Probably Approximately Correct (PAC) model for learning boolean functions from labeled examples, in the sense that it emphasizes efficient and approximate learning, and it studies the learnability of restricted classes of target distributions.

Inference and minimization of hidden Markov chains

It is proved that inference is hard: any algorithm for inference must make exponentially many oracle calls; from this, a new algorithm for deciding equivalence of hidden Markov chains follows.

Learning decision trees using the Fourier spectrum

The authors demonstrate that any function f whose L1-norm is polynomial can be approximated by a polynomially sparse function, and prove that Boolean decision trees with linear operations are a subset of this class of functions.

Discrete Sequence Prediction and Its Applications

This work presents a simple and practical algorithm (TDAG) for discrete sequence prediction based on a text-compression method that limits the growth of storage by retaining the most likely prediction contexts and discarding less likely ones.

Applications of DAWGs to data compression

This paper presents two algorithms for string compression using DAWGs: the first is a very simple idea which generalizes run-length coding but is provably non-optimal; the second combines the main idea of the first with arithmetic coding, resulting in a great improvement in performance.

Markov Source Modeling of Text Generation

A language model is a conceptual device which, given a string of past words, provides an estimate of the probability that any given word from an allowed vocabulary will follow the string. In speech…
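
In symbols, a language model supplies an estimate \hat{P}(w_{t+1} = w \mid w_1, \dots, w_t) for every word w in the vocabulary; fixed n-gram and variable-memory Markov models differ only in how much of the history w_1, \dots, w_t they condition on.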