Publications
Unsupervised Morpheme Segmentation and Morphology Induction from Text Corpora Using Morfessor 1.0
TLDR
The first public version of the Morfessor software is described: a program that takes as input a corpus of unannotated text and produces a segmentation of the word forms observed in the text.
Unsupervised Discovery of Morphemes
TLDR
Two methods for unsupervised segmentation of words into morpheme-like units are presented: one based on the Minimum Description Length (MDL) principle and one based on Maximum Likelihood (ML) optimization.
Inducing the Morphological Lexicon of a Natural Language from Unannotated Text
TLDR
An algorithm is presented for the unsupervised learning, or induction, of a simple morphology of a natural language. It builds hierarchical representations for a set of morphs, which are morpheme-like units discovered from unannotated text corpora.
Unsupervised models for morpheme segmentation and morphology learning
TLDR
Morfessor can handle highly inflecting and compounding languages where words can consist of lengthy sequences of morphemes and is shown to perform very well compared to a widely known benchmark algorithm on Finnish data.
Induction of the Morphology of Natural Language: Unsupervised Morpheme Segmentation with Application to Automatic Speech Recognition
TLDR
The main objective of this thesis is to devise a method that discovers the likely locations of the morpheme boundaries in words of any language by learning a simple model of concatenative morphology (word forming) in an unsupervised manner from plain text.
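Once a model of concatenative morphology has been learned, the likely boundary locations in a word can be read off as its most probable split into morphs. The snippet below is a minimal sketch of that step only, assuming a simple unigram model in which each morph has a known log-probability; the lexicon, counts, and function name `viterbi_segment` are hypothetical illustrations, not taken from the thesis. It uses Viterbi-style dynamic programming over all split points.

```python
import math


def viterbi_segment(word, morph_logprob):
    """Most probable split of `word` into known morphs (unigram model).

    `morph_logprob` maps morph strings to log-probabilities (hypothetical
    input). Returns a list of morphs, or None if no split uses only known morphs.
    """
    n = len(word)
    best = [-math.inf] * (n + 1)  # best[i]: best log-prob of the prefix word[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last morph in that prefix
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            morph = word[start:end]
            if morph in morph_logprob and best[start] > -math.inf:
                score = best[start] + morph_logprob[morph]
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        return None
    # Follow back-pointers from the end of the word to recover the boundaries.
    morphs, end = [], n
    while end > 0:
        start = back[end]
        morphs.append(word[start:end])
        end = start
    return morphs[::-1]


# Hypothetical morph counts turned into maximum-likelihood log-probabilities.
counts = {"talo": 5, "i": 20, "ssa": 10, "taloi": 1, "s": 3, "sa": 2}
total = sum(counts.values())
logprob = {m: math.log(c / total) for m, c in counts.items()}

print(viterbi_segment("taloissa", logprob))  # ['talo', 'i', 'ssa']
```

With these made-up counts, the split talo + i + ssa ("in the houses" in Finnish) outscores alternatives such as taloi + ssa, because frequent morphs make the product of probabilities larger even though the split has more pieces.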
Morph-based speech recognition and modeling of out-of-vocabulary words across languages
TLDR
The morph models are shown to perform fairly well on out-of-vocabulary (OOV) words without compromising recognition accuracy on in-vocabulary words. The Arabic experiment is the only exception: there, the standard word model outperforms the morph model.
Unsupervised Segmentation of Words Using Prior Distributions of Morph Length and Frequency
We present a language-independent and unsupervised algorithm for the segmentation of words into morphs. The algorithm is based on a new generative probabilistic model, which makes use of relevant …
Unlimited vocabulary speech recognition with morph language models applied to Finnish
TLDR
This article presents a language-independent algorithm for discovering word fragments in an unsupervised manner from text that uses the Minimum Description Length principle to find an inventory of word fragments that is compact but models the training text effectively.
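The trade-off named here, a compact inventory versus an effective model of the text, is the two-part code of MDL: bits to describe the fragment lexicon plus bits to encode the corpus with it. The following is a deliberately simplified sketch, not the article's exact cost function. It assumes a uniform code over a 26-letter alphabet plus an end-of-morph marker for the lexicon, and maximum-likelihood unigram probabilities for the corpus; the Finnish word list and the function name `description_length` are hypothetical.

```python
import math
from collections import Counter


def description_length(segmented_corpus, alphabet_size=26):
    """Simplified two-part MDL cost in bits: lexicon cost + corpus cost.

    `segmented_corpus` is a list of words, each given as a list of morphs.
    """
    tokens = [m for word in segmented_corpus for m in word]
    counts = Counter(tokens)
    total = len(tokens)
    # Corpus cost: encode every morph token with its ML unigram probability.
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    # Lexicon cost: spell out each distinct morph letter by letter, plus an
    # end-of-morph marker, using a uniform code over alphabet + marker.
    bits_per_symbol = math.log2(alphabet_size + 1)
    lexicon_bits = sum((len(m) + 1) * bits_per_symbol for m in counts)
    return lexicon_bits + corpus_bits


# Hypothetical corpus: each word kept whole vs. split into shared fragments.
whole = [["talo"], ["talossa"], ["taloissa"], ["auto"], ["autossa"], ["autoissa"]]
split = [["talo"], ["talo", "ssa"], ["talo", "i", "ssa"],
         ["auto"], ["auto", "ssa"], ["auto", "i", "ssa"]]

print(description_length(whole), description_length(split))
```

Splitting costs a few extra bits in the corpus part (more tokens to encode), but the lexicon shrinks from six long entries to four short reusable ones, so the segmented version has the smaller total description length. A search procedure would compare such totals for candidate splits and keep the cheaper one.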
Morphology-Aware Statistical Machine Translation Based on Morphs Induced in an Unsupervised Manner
TLDR
The proposed morph-based solution has clear benefits: morphologically well-motivated structures (phrases) are learned, and the proportion of words left untranslated is clearly reduced.