- Lukasz Debowski
- IEEE Transactions on Information Theory
- 2011

This paper presents a new interpretation for Zipf-Mandelbrot's law in natural language which rests on two areas of information theory. Firstly, we construct a new class of grammar-based codes and,…

- Lukasz Debowski
- Language Resources and Evaluation
- 2009

This paper discusses two new procedures for extracting verb valences from raw texts, with an application to the Polish language. The first novel technique, the EM selection algorithm, performs…

- Lukasz Debowski
- Entropy
- 2015

The article discusses two mutually-incompatible hypotheses about the stochastic mechanism of the generation of texts in natural language, which could be related to entropy. The first hypothesis, the…

- Lukasz Debowski
- Journal of Quantitative Linguistics
- 2006

Hilberg (1990) supposed that finite-order excess entropy of a random human text is proportional to the square root of the text length. Assuming that Hilberg's hypothesis is true, we derive Guiraud's…

- Lukasz Debowski
- Challenges in Computational Statistics and Data…
- 2016

Subword complexity is a function that describes how many different substrings of a given length are contained in a given string. In this paper, two estimators of block entropy are proposed, based on…
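
The definition of subword complexity in the abstract above can be illustrated with a minimal Python sketch. The function name `subword_complexity` and its handling of edge cases are assumptions made for illustration only; this is not the paper's code, and it does not implement the block entropy estimators the paper proposes.

```python
def subword_complexity(text: str, k: int) -> int:
    """Count the distinct substrings of length k contained in text.

    Hypothetical helper for illustration; the name and the edge-case
    conventions are assumptions, not taken from the paper.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    if k > len(text):
        # No substring of this length fits in the string.
        return 0
    # Slide a window of width k over the string and collect unique windows.
    return len({text[i:i + k] for i in range(len(text) - k + 1)})


if __name__ == "__main__":
    sample = "banana"
    for k in range(1, len(sample) + 1):
        print(k, subword_complexity(sample, k))
```

The set-based approach takes time and memory roughly proportional to the string length times `k`, which is adequate for short strings; a suffix-based structure would be the usual choice for long texts.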

- Lukasz Debowski
- Journal of Quantitative Linguistics
- 2015

The relaxed Hilberg conjecture states that the mutual information between two adjacent blocks of text in natural language grows as a power of the block length. The present paper reviews recent…

- Lukasz Debowski
- Exact Methods in the Study of Language and Text
- 2007

The aim of this article is to develop a discussion of Menzerath’s law from the point of view of information theory. More precisely, we shall seek links between the law and the recently abstracted…

- Lukasz Debowski
- Intelligent Information Systems
- 2004

We introduce an implementation of a plain trigram part-of-speech tagger which appears to work well on Polish texts. At this moment the tagger achieves a 9.4% error rate, which makes it significantly…

- Lukasz Debowski, Jan Hajic, Vladislav Kubon
- Prague Bull. Math. Linguistics
- 2002

This paper deals with the problem of applying an MT method developed for a pair of very closely related languages to a pair of languages whose degree of relatedness (and thus also the degree…

- Lukasz Debowski
- Recent Contributions to Quantitative Linguistics
- 2015

Using a new universal distribution called the switch distribution, we reveal a prominent statistical difference between a text in natural language and its unigram version. For the text in natural…