This paper describes in detail an algorithm for the unsupervised learning of natural language morphology, with emphasis on challenges that are encountered in languages typologically similar to European languages. It utilizes the Minimum Description Length analysis described in Goldsmith 2001 and has been implemented in software that is available for…
In this paper we report on an exploration of noun-noun compounds in a large German corpus. The morphological parsing providing the analysis of words into stems and suffixes was entirely data-driven, in that no knowledge of German was used to determine what the correct set of stems and suffixes was, nor how to break any given word into its component…
Unsupervised learning of grammar is a problem that can be important in many areas, ranging from text preprocessing for information retrieval and classification to machine translation. We describe an MDL-based grammar of a language that contains morphology and lexical categories. We use an unsupervised learner of morphology to bootstrap the acquisition of…
This paper describes a heuristic for morpheme- and morphology-learning based on string edit distance. Experiments with a 7,000-word corpus of Swahili, a language with a rich morphology, support the effectiveness of this approach.
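As a rough illustration of the underlying idea, the sketch below computes the standard Levenshtein edit distance between two words and backtraces one optimal alignment to recover the character runs the words share. It is not the heuristic from the paper, whose details are not given in the abstract; the function names (`edit_distance_table`, `edit_distance`, `shared_chunks`) and the example word pair are assumptions chosen for illustration only.

```python
def edit_distance_table(a, b):
    """Fill the standard Levenshtein DP table with unit costs."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # match or substitution
            )
    return d


def edit_distance(a, b):
    """Return the Levenshtein distance between strings a and b."""
    return edit_distance_table(a, b)[len(a)][len(b)]


def shared_chunks(a, b):
    """Backtrace one optimal alignment and return its maximal matched runs.

    The runs are candidate shared morphemes (hypothetical use; a real
    learner would aggregate evidence over many word pairs).
    """
    d = edit_distance_table(a, b)
    i, j = len(a), len(b)
    chunks, current = [], []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            # Matching characters extend the current shared run.
            current.append(a[i - 1])
            i, j = i - 1, j - 1
            continue
        if current:
            chunks.append("".join(reversed(current)))
            current = []
        if d[i][j] == d[i - 1][j - 1] + 1:
            i, j = i - 1, j - 1  # substitution
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1               # deletion from a
        else:
            j -= 1               # insertion from b
    if current:
        chunks.append("".join(reversed(current)))
    return list(reversed(chunks))


if __name__ == "__main__":
    # Illustrative word pair, not taken from the paper's corpus.
    print(edit_distance("ninasoma", "anasoma"))
    print(shared_chunks("ninasoma", "anasoma"))
```

Aligning many such word pairs and counting the recurring shared and unshared runs is one plausible way to propose stem and affix candidates, though the paper's own procedure may differ.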
Within the information-theoretical framework, pointers are used to avoid repetition of phonological material. Work with which we are familiar has assumed that there is only one way in which items could be pointed to. The purpose of this paper is to describe and compare several different methods, each of which satisfies MDL's basic requirements, but which…