Asymptotically Normal Estimators for Zipf’s Law

Mikhail Chebunin and Artyom P. Kovalevskii. Sankhya A.
Zipf's law states that word frequencies in a text, ranked in decreasing order, follow a power function of rank. Its probabilistic model is an infinite urn scheme with an asymptotically power-law distribution, whose exponent must be estimated. We use the number of different words in a text and similar statistics to construct asymptotically normal estimators of the exponent.
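To make the idea in the abstract concrete, here is a minimal simulation sketch. It is not taken from the paper: it assumes word probabilities p_k proportional to k^(-1/theta) on a truncated dictionary, and it uses the classical fact that under regular variation the number of different words R_n grows roughly like n^theta, so that log2(R_n / R_{n/2}) should be close to theta. The function names, the truncation size `vocab`, and the choice of this particular ratio statistic are illustrative assumptions, not necessarily the authors' estimators.

```python
import math
import random
from itertools import accumulate


def sample_words(n, theta, vocab=200_000, seed=0):
    """Draw n i.i.d. word indices with p_k proportional to k**(-1/theta).

    Assumption: the infinite dictionary is truncated at `vocab` words,
    which is harmless as long as the discarded tail mass is tiny.
    """
    rng = random.Random(seed)
    cum = list(accumulate(k ** (-1.0 / theta) for k in range(1, vocab + 1)))
    return rng.choices(range(vocab), cum_weights=cum, k=n)


def distinct_counts(words):
    """Return [R_1, ..., R_n], where R_j counts distinct words among the first j."""
    seen = set()
    counts = []
    for w in words:
        seen.add(w)
        counts.append(len(seen))
    return counts


def estimate_theta(words):
    """Illustrative estimator: log2 of R_n over R_{n/2} (n // 2 for odd n)."""
    n = len(words)
    r = distinct_counts(words)
    return math.log2(r[n - 1] / r[n // 2 - 1])


if __name__ == "__main__":
    text = sample_words(200_000, theta=0.5, seed=1)
    print("estimated exponent:", estimate_theta(text))
```

The estimate fluctuates from run to run; quantifying that fluctuation is exactly what asymptotic normality results are for, and the sketch makes no attempt to reproduce the limiting variance.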
A statistical test for the Zipf's law by deviations from the Heaps' law
We explore a probabilistic model of an artistic text: words of the text are chosen independently of each other in accordance with a discrete probability distribution on an infinite dictionary. The
A statistical test for correspondence of texts to the Zipf-Mandelbrot law
An algorithm for approximate calculation of eigenvalues of the covariance function of the limit Gaussian process, and then an algorithm for calculating the probability distribution of the integral of the square of this process are developed and implemented.
Modifications of Simon text model
We discuss probabilistic text models and their modifications. We have proved a theorem on the convergence of a multidimensional process of the number of urns containing a fixed number of balls in the


Rare Probability Estimation under Regularly Varying Heavy Tails
This paper uses Karamata’s theory of regular variation to prove that regularly varying heavy tails are sufficient for consistency, and derives a family of estimators which, in addition to being consistent, address some of the shortcomings of the Good-Turing estimator.
Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws
This paper collects facts about the number of occupied boxes in the classical balls-in-boxes occupancy scheme with infinitely many positive frequencies: equivalently, about the number of species
Local limit theorems are derived for the number of occupied urns in general finite and infinite urn models under the minimum condition that the variance tends to infinity. Our results represent an
Bounds on Random Infinite Urn Model
Let N(n) be a Poisson random variable with parameter n. An infinite urn model is defined as follows: N(n) balls are independently placed in an infinite set of urns and each ball has probability pk
Divergence rates for the number of rare numbers
Suppose that X1, X2, ... is a sequence of i.i.d. random variables taking values in Z+. Consider the random sequence A(X) ≡ (X1, X2, ...). Let Yn be the number of integers which appear exactly once in the
Gaps in Discrete Random Samples
Let (X_i)_{i∈ℕ} be a sequence of independent and identically distributed random variables with values in the set ℕ_0 of nonnegative integers. Motivated by applications in enumerative combinatorics and
Rare numbers
Suppose that X1, X2, ... is a sequence of i.i.d. random variables taking values in Z+. Consider the random sequence A(X) ≡ (X1, X2, ...). Let Yn be the number of integers which appear exactly once in the first n
Small counts in the infinite occupancy scheme
The paper is concerned with the classical occupancy scheme in which balls are thrown independently into infinitely many boxes, with given probability of hitting each of the boxes. We establish joint
Univariate approximations in the infinite occupancy scheme
The paper concerns the classical occupancy scheme with infinitely many boxes. We establish approximations to the distributions of the number of occupied boxes, and of the number of boxes containing