Corpus ID: 2666779

The Study of Effect of Length in Morphological Segmentation of Agglutinative Languages

L. Ramasamy, Z. Žabokrtský, Sowmya Vajjala
Morph length is one of the indicative features that help in learning the morphology of languages, in particular agglutinative languages. In this paper, we introduce a simple unsupervised model for morphological segmentation and study how knowledge of morph length affects the performance of the segmentation task under the Bayesian framework. The model is based on the unigram word segmentation model of Goldwater et al. (2006) and assumes a simple prior distribution over morph length. We experiment this…
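The idea in the abstract, a unigram segmentation model whose prior favours certain morph lengths, can be illustrated with a small sketch. This is not the authors' implementation: the Poisson length prior, the uniform character model, the concentration parameter `alpha`, the mean length `lam`, and all function names are assumptions chosen for illustration, since the abstract only says the prior over morph length is "simple". The exhaustive enumeration stands in for proper Bayesian inference and is only practical for short words.

```python
import math
from collections import Counter


def poisson_log_pmf(k, lam):
    """Log probability of length k under a Poisson with mean lam (assumed prior)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def segmentation_log_score(morphs, counts, total, alpha=1.0, lam=3.0, alphabet_size=26):
    """Log score of one candidate segmentation under a smoothed unigram model.

    Each morph's probability mixes its observed count with a base distribution
    that combines the length prior and uniformly drawn characters.
    """
    score = 0.0
    for m in morphs:
        # Base distribution: length prior times uniform character probabilities.
        log_base = poisson_log_pmf(len(m), lam) - len(m) * math.log(alphabet_size)
        # Count-based predictive probability, backed off to the base distribution.
        p = (counts[m] + alpha * math.exp(log_base)) / (total + alpha)
        score += math.log(p)
    return score


def all_segmentations(word):
    """Yield every way of splitting word into contiguous non-empty morphs."""
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        for rest in all_segmentations(word[i:]):
            yield [word[:i]] + rest


def best_segmentation(word, counts, total, **kwargs):
    """Pick the highest-scoring segmentation by brute-force enumeration."""
    return max(
        all_segmentations(word),
        key=lambda seg: segmentation_log_score(seg, counts, total, **kwargs),
    )


# Toy morph counts (hypothetical data, not taken from the paper).
toy_counts = Counter({"walk": 5, "ed": 4, "ing": 3})
toy_total = sum(toy_counts.values())
chosen = best_segmentation("walked", toy_counts, toy_total)
```

With these toy counts the sketch prefers the split walk + ed, because both morphs have been seen before while the unsplit word must be paid for entirely by the base distribution. Changing `lam` shifts which morph lengths the base distribution favours, which is the kind of effect the paper sets out to study.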
Ascertaining the morphological components of Tamil language using unsupervised approach
This paper focuses on unsupervised segmentation of Tamil lexicons using a novel algorithm across various parameters and shows promising results for the identification of morphemes with their suffixes.
CLARA: A New Generation of Researchers in Common Language Resources and Their Applications
The project has trained a new generation of researchers who can perform advanced research and development in language resources and technologies.
Unsupervised and Semi-Supervised Multilingual Learning for Resource-Poor Languages
The experiment shows that using extra features improves the performance of the unsupervised model, and the work presents a generative model that could use word representations as extra features.


Unsupervised Learning of the Morphology of a Natural Language
This study reports the results of using minimum description length (MDL) analysis to model unsupervised learning of the morphological segmentation of European languages, using corpora ranging in size…
High-Performance, Language-Independent Morphological Segmentation
This paper introduces an unsupervised morphological segmentation algorithm that shows robust performance for four languages with different levels of morphological complexity and achieves performance that is comparable to the best results for all three PASCAL evaluation datasets.
A Bayesian Model for Morpheme and Paradigm Identification
A system for unsupervised learning of morphological affixes from texts or word lists composed of a generative probability model and a search algorithm that can be formalized in terms of the lattice formed by subsets of suffixes under inclusion.
Unsupervised Morpheme Segmentation and Morphology Induction from Text Corpora Using Morfessor 1.0
The first public version of the Morfessor software is described, which is a program that takes as input a corpus of unannotated text and produces a segmentation of the word forms observed in the text.
Unsupervised Segmentation of Words Using Prior Distributions of Morph Length and Frequency
We present a language-independent and unsupervised algorithm for the segmentation of words into morphs. The algorithm is based on a new generative probabilistic model, which makes use of relevant…
Morphological Analyzer for Agglutinative Languages Using Machine Learning Approaches
This new, state-of-the-art machine learning approach, based on sequence labeling and training by kernel methods, captures the non-linear relationships among the different aspects of morphological features of natural languages in a better and simpler way.
Unsupervised Multilingual Learning for Morphological Segmentation
A nonparametric Bayesian model is presented that jointly induces morpheme segmentations of each language under consideration and at the same time identifies cross-lingual morpheme patterns, or abstract morphemes, of multiple languages.
A Morphological Analyzer (MA) is a program which compiles and analyses words of a natural language into their roots and their constituent morpho-syntactic elements along with their attributes. The…
Improving morphology induction by learning spelling rules
A Bayesian model for simultaneously inducing both morphology and spelling rules is developed and it is shown that the addition of spelling rules improves performance over the baseline morphology-only model.
A Language-Independent Unsupervised Model for Morphological Segmentation
An algorithm is described that draws from previous approaches and combines them into a simple model for morphological segmentation that outperforms other approaches on English and German, and also yields good results on agglutinative languages such as Finnish and Turkish.