The goal of Morpho Challenge 2009 was to evaluate unsupervised algorithms that provide morpheme analyses for words in different languages and in various practical applications. Morpheme analysis is particularly useful in speech recognition, information retrieval and machine translation for morphologically rich languages where the amount of different word…
This paper presents the evaluation of Morpho Challenge Competition 2 (information retrieval). Competition 1 (linguistic gold standard) is described in a companion paper. In Morpho Challenge 2007, the objective was to design statistical machine learning algorithms that discover which morphemes (smallest individually meaningful units of language) words…
In this paper, we investigate methods for improving the performance of morph-based spoken document retrieval in Finnish by extracting relevant index terms from confusion networks. Our approach uses morpheme-like subword units ("morphs") for recognition and indexing. This alleviates the problem of out-of-vocabulary words, especially with inflectional…
Unsupervised and semi-supervised learning of morphology provide practical solutions for processing morphologically rich languages with less human labor than the traditional rule-based analyzers. Direct evaluation of the learning methods using linguistic reference analyses is important for their development, as evaluation through the final applications is…
Morph-based spoken document retrieval uses morpheme-like subword units for both language modeling and as index terms. Problems of out-of-vocabulary (OOV) words are avoided as the morph recognizer can recognize any word in speech as a sequence of subwords. The effect of previously unseen query words (i.e. words that are not in the language model training…
An important difference between the retrieval of spoken and written documents is that the indexing of the speech data is usually based on automatic speech transcripts that contain recognition errors. However, there are several ways of reducing the effect of incorrect index terms in the retrieval. This paper presents retrieval experiments with unlimited…
This article examines the use of statistically discovered morpheme-like units for Spoken Document Retrieval (SDR). The morpheme-like units ("morphs") are used both for language modeling in speech recognition and as index terms. Traditional word-based methods suffer from out-of-vocabulary words. If a word is not in the recognizer vocabulary, any…