# The power of amnesia: Learning probabilistic automata with variable memory length

@article{Ron1996ThePO, title={The power of amnesia: Learning probabilistic automata with variable memory length}, author={Dana Ron and Yoram Singer and Naftali Tishby}, journal={Machine Learning}, year={1996}, volume={25}, pages={117-149} }

We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KL-divergence…
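The core idea of variable memory length — keep a longer context only when it changes the next-symbol distribution — can be illustrated with a short sketch. This is a simplified, illustrative version, not the paper's analyzed algorithm: the parameters `min_count` and `div_threshold` and the total-variation test are placeholders for the paper's smoothed, KL-based growth criteria.

```python
from collections import defaultdict, Counter

def learn_pst(seq, max_depth=3, min_count=2, div_threshold=0.1):
    """Grow a prediction-suffix-tree-like set of contexts from one sequence.

    A suffix ctx is kept only when its next-symbol distribution differs
    noticeably from that of its parent suffix ctx[1:].  The thresholds
    here are illustrative placeholders, not the paper's analyzed settings.
    """
    alphabet = sorted(set(seq))
    counts = defaultdict(Counter)          # context -> next-symbol counts
    for i in range(len(seq)):
        for d in range(max_depth + 1):
            if i - d < 0:
                break
            counts[seq[i - d:i]][seq[i]] += 1

    def dist(ctx):
        """Empirical next-symbol distribution after context ctx."""
        c = counts[ctx]
        total = sum(c.values())
        return {a: c[a] / total for a in alphabet} if total else {}

    kept = {""}                            # the empty context (root)
    for depth in range(1, max_depth + 1):
        for ctx in [c for c in counts if len(c) == depth]:
            if sum(counts[ctx].values()) < min_count or ctx[1:] not in kept:
                continue
            p, q = dist(ctx), dist(ctx[1:])
            # keep ctx if some symbol's conditional probability shifts
            if any(abs(p.get(a, 0.0) - q.get(a, 0.0)) > div_threshold
                   for a in alphabet):
                kept.add(ctx)
    return kept, dist
```

On the alternating sequence `"abab…"`, the length-1 contexts `"a"` and `"b"` survive while `"ab"` is pruned, since it predicts nothing beyond what `"b"` already does — this selective pruning is what keeps the model small.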

## 500 Citations

### Learning Probability Distributions Generated by Finite-State Machines

- Computer Science
- 2016

We review methods for inference of probability distributions generated by probabilistic automata and related models for sequence generation. We focus on methods that can be proved to learn in the…

### Probabilistic Deterministic Infinite Automata

- Computer Science, NIPS
- 2010

The results suggest that the probabilistic deterministic infinite automata (PDIA) presents an attractive compromise between the computational cost of hidden Markov models and the storage requirements of hierarchically smoothed Markov models.

### Learning deterministic probabilistic automata from a model checking perspective

- Computer Science, Machine Learning
- 2016

This paper shows how to extend the basic algorithm to also learn automata models for both reactive and timed systems and establishes theoretical convergence properties for the learning algorithm as well as for probability estimates of system properties expressed in linear time temporal logic and linear continuous stochastic logic.

### Probabilistic Trees and Automata for Application Behavior Modeling

- Computer Science
- 2003

Methods are described for inferring and using probabilistic models that capture characteristic pattern structures that may exist in symbolic data sequences, and that can be used for real-time data monitoring by means of a matching algorithm.

### Learning Markov Decision Processes for Model Checking

- Computer Science, QFM
- 2012

An algorithm for automatically learning a deterministic labeled Markov decision process model from the observed behavior of a reactive system, adapted from algorithms for learning deterministic probabilistic finite automata and extended to include both probabilistic and nondeterministic transitions.

### Simple Variable Length N-grams for Probabilistic Automata Learning

- Computer Science, ICGI
- 2012

Experiments show that, using the test sets provided by the 2012 Probabilistic Automata Learning Competition, the variable-length approach works better than fixed 3-grams.
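The variable-length approach can be contrasted with a fixed n-gram by a simple back-off predictor: count next-symbol frequencies for contexts of every length up to a bound, then predict from the longest context that has enough support. A minimal sketch with invented parameter names (`max_order`, `min_count`); it is not the competition entry's actual method.

```python
from collections import defaultdict, Counter

def train_counts(seq, max_order=5):
    """Next-symbol counts for every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for i in range(len(seq)):
        for d in range(max_order + 1):
            if i - d < 0:
                break
            counts[seq[i - d:i]][seq[i]] += 1
    return counts

def predict(counts, history, max_order=5, min_count=3):
    """Back off to the longest suffix of history with enough support."""
    for k in range(min(len(history), max_order), -1, -1):
        ctx = history[len(history) - k:]
        c = counts.get(ctx)                # .get avoids creating entries
        if c and sum(c.values()) >= min_count:
            return c.most_common(1)[0][0]
    return None                            # empty training sequence
```

Unlike a fixed 3-gram, the predictor degrades gracefully: an unseen long context falls back to progressively shorter suffixes, ultimately to the unconditional symbol frequencies.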

### Automata Cascades: Expressivity and Sample Complexity

- Computer Science
- 2022

The results show that one can in principle learn automata with infinite input alphabets and a number of states that is exponential in the amount of data available.

### Sample Complexity of Automata Cascades

- Computer Science, ArXiv
- 2022

This work shows that the sample complexity of automata cascades is linear in the number of components and the maximum complexity of a single component, modulo logarithmic factors, which opens the possibility of learning large dynamical systems consisting of many parts interacting with each other.

### Lower Bounds for Learning Discrete

- Computer Science, Mathematics
- 2007

A class of efficiently learnable distributions is presented which has the following interesting property: while a computationally unbounded learning algorithm can learn the class from O(1) examples, the computational sample complexity of the class is essentially Ω(1/ε), in the sense that any polynomial-time learning algorithm must use at least this many examples.

### The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees

- Computer Science, NIPS
- 2004

This paper takes a decision theoretic view of PSTs for the task of sequence prediction and presents an online PST learning algorithm that generates a bounded-depth PST while being competitive with any fixed PST determined in hindsight.

## References

Showing 1–10 of 48 references

### On the learnability and usage of acyclic probabilistic finite automata

- Computer Science, COLT '95
- 1995

It is proved that the proposed algorithm can efficiently learn distributions generated by the subclass of APFAs it considers, and it is shown that the KL-divergence between the distribution generated by the target source and the distribution generated by the authors' hypothesis can be made arbitrarily small with high confidence, in polynomial time.

### Efficient learning of typical finite automata from random walks

- Computer Science, STOC
- 1993

The main contribution of this paper is in presenting the first efficient algorithms for learning nontrivial classes of automata in an entirely passive learning model.

### The Power of Amnesia

- Computer Science, NIPS
- 1993

The algorithm is based on minimizing the statistical prediction error by extending the memory, or state length, adaptively until the total prediction error is sufficiently small; using fewer than 3000 states, the model's performance is far superior to that of fixed-memory models with a similar number of states.

### On the computational complexity of approximating distributions by probabilistic automata

- Computer Science, Machine Learning
- 2004

We introduce a rigorous performance criterion for training algorithms for probabilistic automata (PAs) and hidden Markov models (HMMs), used extensively for speech recognition, and analyze the…

### On the learnability of discrete distributions

- Computer Science, STOC '94
- 1994

A new model of learning probability distributions from independent draws is introduced, inspired by the popular Probably Approximately Correct (PAC) model for learning boolean functions from labeled examples, in the sense that it emphasizes efficient and approximate learning, and it studies the learnability of restricted classes of target distributions.

### Inference and minimization of hidden Markov chains

- Computer Science, Mathematics, COLT '94
- 1994

It is proved that inference is hard: any algorithm for inference must make exponentially many oracle calls, and from this a new algorithm for the equivalence of HMCs follows.

### Learning decision trees using the Fourier spectrum

- Computer Science, Mathematics, STOC '91
- 1991

The authors demonstrate that any function f whose L1-norm is polynomial can be approximated by a polynomially sparse function, and prove that Boolean decision trees with linear operations are a subset of this class of functions.

### Discrete Sequence Prediction and Its Applications

- Computer Science, Machine Learning
- 2004

This work presents a simple and practical algorithm (TDAG) for discrete sequence prediction based on a text-compression method that limits the growth of storage by retaining the most likely prediction contexts and discarding less likely ones.

### Applications of DAWGs to data compression

- Computer Science
- 1990

This paper presents two algorithms for string compression using DAWGs, the first is a very simple idea which generalizes run-length coding, but is provably non-optimal, and the second combines the main idea of the first with arithmetic coding, resulting in a great improvement in performance.

### Markov Source Modeling of Text Generation

- Computer Science
- 1985

A language model is a conceptual device which, given a string of past words, provides an estimate of the probability that any given word from an allowed vocabulary will follow the string. In speech…