Corpus ID: 2666779

The Study of Effect of Length in Morphological Segmentation of Agglutinative Languages

Loganathan Ramasamy, Z. Žabokrtský, Sowmya Vajjala
Morph length is one of the indicative features that help in learning the morphology of languages, in particular agglutinative languages. In this paper, we introduce a simple unsupervised model for morphological segmentation and study how knowledge of morph length affects the performance of the segmentation task under the Bayesian framework. The model is based on the unigram word segmentation model of Goldwater et al. (2006) and assumes a simple prior distribution over morph length. We experiment this…
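The abstract does not specify which length prior the model uses. As a minimal sketch, assuming a Poisson length prior shifted to start at length 1, a uniform distribution over a 26-letter alphabet, and hypothetical parameter values (`alpha`, `lam`), the code below scores a fixed segmentation under a Dirichlet-process unigram model in the style of Goldwater et al. (2006). Each morph is generated either by reusing a previously generated morph or from a base distribution P0 that combines the length prior with per-character probabilities. All function names are illustrative. It only scores one segmentation; an actual learner would search over segmentations (Goldwater et al. use Gibbs sampling).

```python
import math
from collections import Counter


def poisson_length_prior(length: int, lam: float) -> float:
    """P(len = length) under a Poisson(lam) shifted so that lengths start at 1 (assumed prior)."""
    k = length - 1
    return math.exp(-lam) * lam ** k / math.factorial(k)


def base_prob(morph: str, lam: float, alphabet_size: int) -> float:
    """Base distribution P0: draw a length from the prior, then each character uniformly."""
    return poisson_length_prior(len(morph), lam) * (1.0 / alphabet_size) ** len(morph)


def segmentation_log_prob(segmented_words, alpha=1.0, lam=3.0, alphabet_size=26):
    """Log probability of a morph sequence under a Dirichlet-process unigram model.

    Each morph m is drawn from the predictive distribution
    (count(m) + alpha * P0(m)) / (n + alpha), where n is the number of morphs so far.
    """
    counts = Counter()
    n = 0
    log_p = 0.0
    for word in segmented_words:
        for morph in word:
            p = (counts[morph] + alpha * base_prob(morph, lam, alphabet_size)) / (n + alpha)
            log_p += math.log(p)
            counts[morph] += 1
            n += 1
    return log_p


corpus_split = [["walk", "ed"], ["walk", "ing"], ["talk", "ed"], ["talk", "ing"]]
corpus_whole = [["walked"], ["walking"], ["talked"], ["talking"]]
print(segmentation_log_prob(corpus_split) > segmentation_log_prob(corpus_whole))  # True
```

The split analysis wins because reused morphs are cheap under the predictive distribution, while the length prior and per-character probabilities make long, novel strings expensive; changing `lam` shifts which morph lengths the model prefers.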
Ascertaining the morphological components of Tamil language using unsupervised approach
This paper focuses on unsupervised segmentation of Tamil lexicons using a novel algorithm across various parameters, and reports promising results for identifying morphemes and their suffixes.
CLARA: A New Generation of Researchers in Common Language Resources and Their Applications
The project has trained a new generation of researchers who can perform advanced research and development in language resources and technologies.
Unsupervised and Semi-Supervised Multilingual Learning for Resource-Poor Languages
This work presents a generative model that can use word representations as extra features, and experiments show that these extra features improve the performance of the unsupervised model.


Unsupervised Learning of the Morphology of a Natural Language
This study reports the results of using minimum description length (MDL) analysis to model unsupervised learning of the morphological segmentation of European languages, using corpora ranging in size…
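Under MDL, the preferred analysis is the one that minimizes the combined length of a description of the morph lexicon and of the corpus encoded with that lexicon. The sketch below is a deliberately simplified two-part code, not the encoding used in the paper. It assumes each lexicon entry is spelled out letter by letter over a 26-letter alphabet plus an end-of-morph symbol, and that corpus tokens are coded with their maximum-likelihood unigram probabilities. The function name is hypothetical.

```python
import math
from collections import Counter


def description_length(segmented_words, alphabet_size=26):
    """Two-part code length in bits: lexicon cost + corpus cost (simplified, assumed encoding)."""
    counts = Counter(m for word in segmented_words for m in word)
    total = sum(counts.values())
    # Lexicon: spell each distinct morph letter by letter, plus one end-of-morph symbol.
    bits_per_symbol = math.log2(alphabet_size + 1)
    lexicon_bits = sum((len(m) + 1) * bits_per_symbol for m in counts)
    # Corpus: code each morph token optimally under its relative frequency.
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits + corpus_bits


split = [["walk", "ed"], ["walk", "ing"], ["talk", "ed"], ["talk", "ing"]]
whole = [["walked"], ["walking"], ["talked"], ["talking"]]
print(description_length(split) < description_length(whole))  # True
```

Splitting shrinks the lexicon (four short entries instead of four long ones) at the price of more corpus tokens; MDL accepts the split whenever the lexicon savings outweigh the extra corpus bits.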
High-Performance, Language-Independent Morphological Segmentation
This paper introduces an unsupervised morphological segmentation algorithm that shows robust performance for four languages with different levels of morphological complexity and achieves performance that is comparable to the best results for all three PASCAL evaluation datasets.
A Bayesian Model for Morpheme and Paradigm Identification
This paper presents a system for unsupervised learning of morphological affixes from texts or word lists. The system consists of a generative probability model and a search algorithm that can be formalized in terms of the lattice formed by subsets of suffixes under inclusion.
Unsupervised Morpheme Segmentation and Morphology Induction from Text Corpora Using Morfessor 1.0
This paper describes the first public version of the Morfessor software, a program that takes as input a corpus of unannotated text and produces a segmentation of the word forms observed in the text.
Unsupervised Segmentation of Words Using Prior Distributions of Morph Length and Frequency
We present a language-independent and unsupervised algorithm for the segmentation of words into morphs. The algorithm is based on a new generative probabilistic model, which makes use of relevant…
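A length prior lets a segmentation model charge fewer bits for morphs of plausible length and more for implausibly short or long ones. As a sketch, assuming a gamma-shaped prior discretized over integer lengths (the shape and scale values are hypothetical, and the paper's exact distribution and parameterization may differ), the code below builds such a prior and converts it into a per-morph code length. The function names are illustrative.

```python
import math


def gamma_length_prior(max_len: int, shape: float, scale: float) -> list[float]:
    """Discretized gamma prior over morph lengths 1..max_len (assumed form).

    The continuous gamma density is evaluated at each integer length and the
    values are renormalized so they sum to 1 over the allowed range.
    """
    def log_density(x: float) -> float:
        return ((shape - 1) * math.log(x) - x / scale
                - math.lgamma(shape) - shape * math.log(scale))

    weights = [math.exp(log_density(length)) for length in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def length_cost_bits(length: int, prior: list[float]) -> float:
    """Code length in bits that the prior assigns to a morph of the given length."""
    return -math.log2(prior[length - 1])


prior = gamma_length_prior(max_len=20, shape=3.0, scale=2.0)
most_likely = max(range(1, 21), key=lambda length: prior[length - 1])
print(most_likely)  # 4, i.e. (shape - 1) * scale
print(length_cost_bits(1, prior) > length_cost_bits(4, prior))  # True
```

Unlike a uniform prior, this shape penalizes single-letter morphs, which discourages the oversegmentation that purely frequency-driven models tend to produce.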
Unsupervised Multilingual Learning for Morphological Segmentation
A nonparametric Bayesian model is presented that jointly induces morpheme segmentations of each language under consideration and at the same time identifies cross-lingual morpheme patterns, or abstract morphemes, of multiple languages.
The present proposal demonstrates the optimal organization of a linguistic database and its performance in a computational environment, ensuring high precision and coverage in the parsing of wordforms, and gives a detailed description of the database and its compilation for the purpose of morphological analysis using the Word and Paradigm Model.
Improving morphology induction by learning spelling rules
A Bayesian model for simultaneously inducing both morphology and spelling rules is developed and it is shown that the addition of spelling rules improves performance over the baseline morphology-only model.
A Language-Independent Unsupervised Model for Morphological Segmentation
An algorithm is described that draws from previous approaches and combines them into a simple model for morphological segmentation that outperforms other approaches on English and German, and also yields good results on agglutinative languages such as Finnish and Turkish.
Generic Morphological Analysis Shell
This paper describes a generic shell that can be used to develop morphological analyzers for different languages, particularly minority languages.