Computing entropy rate of symbol sources & a distribution-free limit theorem

  • I. Chattopadhyay, Hod Lipson
  • Published 3 January 2014
  • Mathematics, Computer Science
  • 2014 48th Annual Conference on Information Sciences and Systems (CISS)
Entropy rate of sequential data-streams naturally quantifies the complexity of the generative process. Thus entropy rate fluctuations could be used as a tool to recognize dynamical perturbations in signal sources, potentially without explicit characterization of background noise. However, state-of-the-art algorithms for estimating the entropy rate have markedly slow convergence, making such entropic approaches non-viable in practice. We present here a fundamentally new… 
A high probability bound on the mutual information across an observed Discrete Memoryless Channel
  • M. A. Tope, Joel M. Morris
  • Mathematics, Computer Science
    2015 49th Annual Conference on Information Sciences and Systems (CISS)
  • 2015
Several new high-probability bounds on the average mutual information I(X;Y) between the input and output of a Discrete Memoryless Channel (DMC) are introduced, based on a set of observed input-output samples.
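The quantity being bounded can be illustrated with a simple plug-in estimate of I(X;Y) from observed input-output pairs. This is a sketch of the standard empirical estimator, not the high-probability bounds derived in the paper:

```python
from collections import Counter
from math import log2

def plugin_mutual_information(pairs):
    """Plug-in (empirical) estimate of I(X;Y) in bits, computed from
    observed (x, y) samples of a discrete channel."""
    n = len(pairs)
    pxy = Counter(pairs)                 # joint counts
    px = Counter(x for x, _ in pairs)    # input marginal counts
    py = Counter(y for _, y in pairs)    # output marginal counts
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Noiseless identity channel: Y == X, so I(X;Y) = H(X) = 1 bit exactly.
identity = [(b, b) for b in (0, 1) * 500]
# Independent uniform X and Y: I(X;Y) = 0 exactly.
independent = [(x, y) for x in (0, 1) for y in (0, 1)] * 250

i_id = plugin_mutual_information(identity)
i_ind = plugin_mutual_information(independent)
```

The plug-in estimator is biased upward for finite samples, which is precisely why high-probability bounds of the kind the paper introduces are of interest.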
Causality Networks
This work presents a new non-parametric test of Granger causality for quantized or symbolic data streams generated by ergodic stationary sources; it makes precise, and computes, the degree of causal dependence between data streams without restrictive assumptions, linearity or otherwise.
A Tamper-Free Semi-Universal Communication System for Deletion Channels
A theoretical framework based on probabilistic finite-state automata is developed to define novel encoding and decoding schemes that ensure small error probability in both message decoding and tamper detection.
Universal risk phenotype of US counties for flu-like transmission to improve county-specific COVID-19 incidence forecasts
This study demonstrates that knowledge of past epidemics may be used to chart the course of future ones, if transmission mechanisms are broadly similar, despite distinct disease processes and causative pathogens.


Entropy estimation of symbol sequences.
Algorithms for estimating the Shannon entropy h of finite symbol sequences with long-range correlations are considered, and a scaling law is proposed for extrapolation from finite sample lengths.
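The block-entropy approach underlying such estimators can be sketched as follows; the function names are illustrative, and the scaling-law extrapolation itself is not reproduced here:

```python
import random
from collections import Counter
from math import log2

def block_entropy(seq, n):
    """Empirical Shannon entropy H(n), in bits, of length-n blocks."""
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def entropy_rate_estimate(seq, n):
    """Finite-block estimate h_n = H(n) - H(n-1). As n grows, h_n
    approaches the entropy rate h, but convergence is slow for strongly
    correlated sources -- the motivation for extrapolating in n."""
    return block_entropy(seq, n) - block_entropy(seq, n - 1)

# Sanity check on i.i.d. fair coin flips, where the true rate is 1 bit/symbol.
random.seed(0)
bits = [random.randrange(2) for _ in range(100_000)]
h3 = entropy_rate_estimate(bits, 3)
```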
Compression of individual sequences via variable-rate coding
The proposed concept of compressibility is shown to play a role analogous to that of entropy in classical information theory where one deals with probabilistic ensembles of sequences rather than with individual sequences.
Estimating the information content of symbol sequences and efficient codes
  • P. Grassberger
  • Mathematics, Computer Science
    IEEE Trans. Inf. Theory
  • 1989
Several variants of an algorithm for estimating Shannon entropies of symbol sequences are presented, which seem to be the optimal algorithms for sequences with strong long-range correlations, e.g. natural languages.
Numerical Methods for Computing Stationary Distributions of Finite Irreducible Markov Chains
In this chapter our attention will be devoted to computational methods for computing stationary distributions of finite irreducible Markov chains. We let q_ij denote the rate at which an n-state…
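As a minimal illustration of one such numerical method, here is a power-iteration sketch for a discrete-time chain; the 3-state transition matrix is a made-up example, not taken from the chapter:

```python
def stationary_distribution(P, iters=100_000, tol=1e-12):
    """Power iteration for the stationary distribution pi of a finite
    irreducible, aperiodic Markov chain with row-stochastic transition
    matrix P: iterate pi <- pi P until successive iterates agree."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, new)) < tol:
            return new
        pi = new
    return pi

# Example: a 3-state birth-death chain; by detailed balance its
# stationary distribution is (0.25, 0.5, 0.25).
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
pi = stationary_distribution(P)
```

Power iteration is the simplest of the methods such a chapter surveys; direct solvers and Gaussian-elimination variants are typically preferred for ill-conditioned chains.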
The sliding-window Lempel-Ziv algorithm is asymptotically optimal
The sliding-window version of the Lempel-Ziv data-compression algorithm is described, and it is shown that as the "window size," a quantity related to the memory and complexity of the procedure, goes to infinity, the compression rate approaches the source entropy.
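This compression-entropy connection can be seen in practice by using a stock LZ77 sliding-window coder as a crude entropy-rate upper bound. The sketch below uses Python's zlib (DEFLATE with a 32 KB window) purely as an illustration; it is not the paper's analysis:

```python
import random
import zlib

def lz_entropy_upper_bound(symbols):
    """Crude entropy-rate estimate (bits/symbol): compressed size of the
    byte stream divided by its length. The finite window and format
    overhead make this an upper bound that tightens only slowly."""
    data = bytes(symbols)
    return 8.0 * len(zlib.compress(data, 9)) / len(data)

random.seed(0)
fair_coin = [random.randrange(2) for _ in range(100_000)]  # true h = 1 bit
constant = [0] * 100_000                                   # true h = 0 bits

h_fair = lz_entropy_upper_bound(fair_coin)
h_const = lz_entropy_upper_bound(constant)
```

As the theorem suggests, the bound is near zero for the constant stream, while for the fair-coin stream it sits somewhat above the true 1 bit/symbol because of finite-window and coding overhead.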
Elements of Information Theory
The authors examine the role of entropy, inequality, and randomness in the design and construction of codes in a rapidly changing environment.
PAC-Learning of Markov Models with Hidden State
This work proposes a new PAC framework for learning both the topology and the parameters in partially observable Markov models, and learns a Probabilistic Deterministic Finite Automaton (PDFA) which approximates a Hidden Markov Model (HMM) up to some desired degree of accuracy.
A note on Kolmogorov complexity and entropy
  • Y. Horibe
  • Mathematics, Computer Science
    Appl. Math. Lett.
  • 2003
It is shown that the Kolmogorov complexity per symbol of an n-sequence from a stationary ergodic source of finite alphabet approaches the entropy rate of the source in probability as n becomes large.
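The convergence stated above can be written as a one-line limit, using K(·) for Kolmogorov complexity and h for the entropy rate of the source:

```latex
\lim_{n \to \infty} \Pr\!\left( \left| \frac{K(X_1^n)}{n} - h \right| > \epsilon \right) = 0
\quad \text{for every } \epsilon > 0 .
```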
Probabilistic finite-state machines - part I
The relation of probabilistic finite-state automata to other well-known devices that generate strings, such as hidden Markov models and n-grams, is studied, and theorems, algorithms, and properties that represent the current state of the art of these objects are provided.
Abductive learning of quantized stochastic processes with probabilistic finite automata
  • I. Chattopadhyay, Hod Lipson
  • Mathematics, Medicine
    Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
  • 2013
We present an unsupervised learning algorithm (GenESeSS) to infer the causal structure of quantized stochastic processes, defined as stochastic dynamical systems evolving over discrete time, and…