# Asymptotically Normal Estimators for Zipf’s Law

@article{Chebunin2018AsymptoticallyNE, title={Asymptotically Normal Estimators for Zipf’s Law}, author={Mikhail Chebunin and Artyom P. Kovalevskii}, journal={Sankhya A}, year={2018} }

Zipf's law states that the ranked frequencies of words in a text follow a power function. Its probabilistic model is an infinite urn scheme with an asymptotically power-law distribution. The exponent of this distribution must be estimated. We use the number of different words in a text and similar statistics to construct asymptotically normal estimators of the exponent.
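The idea in the abstract can be illustrated with a minimal simulation sketch. It assumes the exponent is parameterized as θ in (0, 1) with word probabilities decaying like k^(-1/θ); under that assumption the number of distinct words R_n grows roughly like n^θ, so log2(R_n / R_{n/2}) is a simple consistent estimate of θ. The function names (`sample_word`, `distinct_counts`, `estimate_theta`) and the particular sampling distribution are hypothetical choices made for illustration, and this ratio estimator is not necessarily the exact construction used in the paper.

```python
import math
import random


def sample_word(theta, rng):
    """Draw a word index k >= 1 with P(X >= k) = k ** (-(1 - theta) / theta).

    This gives p_k roughly proportional to k ** (-1 / theta), an assumed
    power-law model. Intended for moderate theta (e.g. 0.3 to 0.7); values
    very close to 1 can overflow in the power below.
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids u == 0
    return int(u ** (-theta / (1.0 - theta)))


def distinct_counts(n, theta, seed=0):
    """Simulate a text of n words; return (R_{n/2}, R_n), the numbers of
    distinct words after the first n // 2 words and after all n words."""
    rng = random.Random(seed)
    seen = set()
    r_half = 0
    for i in range(1, n + 1):
        seen.add(sample_word(theta, rng))
        if i == n // 2:
            r_half = len(seen)
    return r_half, len(seen)


def estimate_theta(r_half, r_full):
    """Ratio estimator: if R_n ~ C * n ** theta, then R_n / R_{n/2} ~ 2 ** theta."""
    return math.log2(r_full / r_half)


if __name__ == "__main__":
    r_half, r_full = distinct_counts(200_000, theta=0.5, seed=1)
    print("distinct words:", r_half, r_full)
    print("estimated theta:", estimate_theta(r_half, r_full))
```

On a simulated text of this size the estimate should land near the true θ, but with visible sampling noise, since the fluctuations of R_n are of order n^(θ/2).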

## 3 Citations

A statistical test for the Zipf's law by deviations from the Heaps' law

- Mathematics
- 2017

We explore a probabilistic model of an artistic text: words of the text are chosen independently of each other in accordance with a discrete probability distribution on an infinite dictionary. The…

A statistical test for correspondence of texts to the Zipf-Mandelbrot law

- Computer Science
- Sibirskie Elektronnye Matematicheskie Izvestiya
- 2020

An algorithm for approximate calculation of eigenvalues of the covariance function of the limit Gaussian process, and then an algorithm for calculating the probability distribution of the integral of the square of this process are developed and implemented.

Modifications of Simon text model

- Mathematics
- 2020

We discuss probability text models and their modifications. We have proved a theorem on the convergence of a multidimensional process of the number of urns containing a fixed number of balls in the…

## References

Rare Probability Estimation under Regularly Varying Heavy Tails

- Mathematics, Computer Science
- COLT
- 2012

This paper uses Karamata’s theory of regular variation to prove that regularly varying heavy tails are sufficient for consistency, and derives a family of estimators which, in addition to being consistent, address some of the shortcomings of the Good-Turing estimator.

Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws

- Mathematics
- 2007

This paper collects facts about the number of occupied boxes in the classical balls-in-boxes occupancy scheme with infinitely many positive frequencies: equivalently, about the number of species…

LOCAL LIMIT THEOREMS FOR FINITE AND INFINITE URN MODELS

- Mathematics
- 2008

Local limit theorems are derived for the number of occupied urns in general finite and infinite urn models under the minimum condition that the variance tends to infinity. Our results represent an…

Bounds on Random Infinite Urn Model

- Mathematics
- 2007

Let N(n) be a Poisson random variable with parameter n. An infinite urn model is defined as follows: N(n) balls are independently placed in an infinite set of urns and each ball has probability pk…

Divergence rates for the number of rare numbers

- Mathematics
- 1996

Suppose that X1, X2, ... is a sequence of i.i.d. random variables taking value in Z+. Consider the random sequence A(X) ≡ (X1, X2, ...). Let Yn be the number of integers which appear exactly once in the…

Gaps in Discrete Random Samples

- Mathematics
- Journal of Applied Probability
- 2009

Let (X_i), i ∈ ℕ, be a sequence of independent and identically distributed random variables with values in the set ℕ₀ of nonnegative integers. Motivated by applications in enumerative combinatorics and…

Rare numbers

- Mathematics
- 1992

Suppose that X1, X2, ... is a sequence of iid random variables taking values in Z+. Consider the random sequence A(X) ≡ (X1, X2, ...). Let Yn be the number of integers which appear exactly once in the first n…

Small counts in the infinite occupancy scheme

- Mathematics
- 2008

The paper is concerned with the classical occupancy scheme in which balls are thrown independently into infinitely many boxes, with given probability of hitting each of the boxes. We establish joint…

Univariate approximations in the infinite occupancy scheme

- Mathematics
- 2009

The paper concerns the classical occupancy scheme with infinitely many boxes. We establish approximations to the distributions of the number of occupied boxes, and of the number of boxes containing…