Heiki-Jaan Kaalep

The EU Copernicus project Multext-East has created a multilingual corpus of text and speech data covering the six languages of the project; lexicons for each of the languages were also developed. The corpus includes a parallel component consisting of Orwell's Nineteen Eighty-Four, with versions in all six languages tagged for part-of-speech and aligned to…
The paper describes a morphological analyser for Estonian and how using a text corpus influenced both the process of creating it and the resulting program. The influence is not limited to the lexicon; it is also noticeable in the resulting algorithm and implementation. When work on the analyser started, there was no computational treatment of…
This paper describes experiments that apply phrase-based statistical machine translation to Estonian. The work has two main aims. The first is to identify the main problems in the output of Estonian-English statistical machine translation and to set a baseline for further experiments with this language pair. The second is to compare the two available…
This paper describes the automatic treatment of multi-word expressions in Estonian, a morphologically complex flective language. It focuses on a special type of multi-word expression: verbal multi-word expressions that can function as predicates. The authors describe two language resources, a database of verbal multi-word expressions and a corpus where…