The Study of Effect of Length in Morphological Segmentation of Agglutinative Languages
@inproceedings{Ramasamy2012TheSO, title={The Study of Effect of Length in Morphological Segmentation of Agglutinative Languages}, author={Loganathan Ramasamy and Z. Žabokrtsk{\'y} and Sowmya Vajjala}, year={2012} }
Morph length is one of the indicative features that help in learning the morphology of languages, in particular agglutinative languages. In this paper, we introduce a simple unsupervised model for morphological segmentation and study how knowledge of morph length affects the performance of the segmentation task within a Bayesian framework. The model is based on the unigram word segmentation model of Goldwater et al. (2006) and assumes a simple prior distribution over morph length. We experiment this…
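To make the role of a length prior concrete, here is a minimal Python sketch of the kind of model the abstract describes: a Dirichlet-process unigram model in the style of Goldwater et al. (2006) that scores a sequence of morphs with the Chinese-restaurant predictive probability. The abstract does not specify the prior. The shifted Poisson length prior (parameter `lam`), the uniform distribution over characters, and the names `log_length_prior`, `log_base`, and `log_prob_segmentation` are assumptions for illustration, not the paper's definitions. The word-boundary term of the original word segmentation model is also omitted.

```python
import math
from collections import Counter


def log_length_prior(length, lam):
    """Log P(len) under an ASSUMED shifted Poisson: (length - 1) ~ Poisson(lam)."""
    if length < 1:
        raise ValueError("morphs must contain at least one character")
    k = length - 1
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def log_base(morph, lam, alphabet_size):
    """Log P0(m): length prior times a uniform choice of each character (assumption)."""
    return log_length_prior(len(morph), lam) - len(morph) * math.log(alphabet_size)


def log_prob_segmentation(morphs, alpha, lam, alphabet_size):
    """Joint log probability of a morph sequence under a DP(alpha, P0) unigram model.

    Uses the Chinese-restaurant predictive rule
        P(m_i | m_1..m_{i-1}) = (count(m_i) + alpha * P0(m_i)) / (i + alpha),
    where i is the number of morphs already generated.
    """
    counts = Counter()
    total = 0.0
    for i, m in enumerate(morphs):
        lp0 = log_base(m, lam, alphabet_size)
        if counts[m] == 0:
            # Unseen morph: only the base distribution contributes (kept in log space).
            log_num = math.log(alpha) + lp0
        else:
            # Seen morph: its reuse count plus the base-distribution share.
            log_num = math.log(counts[m] + alpha * math.exp(lp0))
        total += log_num - math.log(i + alpha)
        counts[m] += 1
    return total


unsegmented = ["walked", "walking", "talked", "talking"]
segmented = ["walk", "ed", "walk", "ing", "talk", "ed", "talk", "ing"]
print(log_prob_segmentation(unsegmented, alpha=1.0, lam=3.0, alphabet_size=26))
print(log_prob_segmentation(segmented, alpha=1.0, lam=3.0, alphabet_size=26))
```

On this toy word list, the analysis walk+ed, walk+ing, talk+ed, talk+ing receives a higher log probability than the unsegmented words, because each reused morph draws on the base distribution only at its first occurrence. Lowering `lam` shifts prior mass toward shorter morphs. A Gibbs sampler over boundary positions, as used by Goldwater et al., would compare scores of this kind.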
3 Citations
Ascertaining the morphological components of Tamil language using unsupervised approach
- Linguistics, Computer Science
- 2016 Online International Conference on Green Engineering and Technologies (IC-GET)
- 2016
This paper focuses on unsupervised segmentation of Tamil lexicons using a novel algorithm across various parameters, and reports promising results in identifying morphemes along with their suffixes.
CLARA: A New Generation of Researchers in Common Language Resources and Their Applications
- Computer Science
- LREC
- 2014
The project has trained a new generation of researchers who can perform advanced research and development in language resources and technologies.
Unsupervised and Semi-Supervised Multilingual Learning for Resource-Poor Languages
- Computer Science
- 2012
This work presents a generative model that can use word representations as extra features; experiments show that using these extra features improves the performance of the unsupervised model.
25 References
Unsupervised Learning of the Morphology of a Natural Language
- Computer Science
- CL
- 2001
This study reports the results of using minimum description length (MDL) analysis to model unsupervised learning of the morphological segmentation of European languages, using corpora ranging in size…
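The MDL idea can be illustrated with a generic two-part code: the cost of a segmentation is the number of bits needed to spell out the morph lexicon plus the number of bits needed to encode the corpus as a sequence of lexicon entries. The sketch below is a hypothetical simplification (a uniform character code with one end-of-morph delimiter, maximum-likelihood token probabilities, and no cost for coding the counts themselves). The cited 2001 study's actual encoding scheme is not reproduced here, and `description_length` is an illustrative name.

```python
import math
from collections import Counter


def description_length(morph_tokens, alphabet_size):
    """Two-part code length in bits: lexicon cost + corpus cost (hypothetical simple scheme)."""
    counts = Counter(morph_tokens)
    n = len(morph_tokens)
    # Lexicon: spell each distinct morph character by character plus one end-of-morph
    # symbol, using a uniform code over the alphabet extended with that delimiter.
    bits_per_symbol = math.log2(alphabet_size + 1)
    lexicon_bits = sum((len(m) + 1) * bits_per_symbol for m in counts)
    # Corpus: encode each token with its maximum-likelihood probability count / n.
    corpus_bits = -sum(c * math.log2(c / n) for c in counts.values())
    return lexicon_bits + corpus_bits


unsegmented = ["walked", "walking", "talked", "talking"]
segmented = ["walk", "ed", "walk", "ing", "talk", "ed", "talk", "ing"]
print(description_length(unsegmented, 26))
print(description_length(segmented, 26))
```

For this toy word list, the segmented analysis costs about 97 bits against about 151 bits for the unsegmented words, so the MDL criterion prefers the segmentation: the shared morphs are spelled out in the lexicon only once, which outweighs the longer token sequence.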
High-Performance, Language-Independent Morphological Segmentation
- Linguistics
- NAACL
- 2007
This paper introduces an unsupervised morphological segmentation algorithm that shows robust performance for four languages with different levels of morphological complexity and achieves performance that is comparable to the best results for all three PASCAL evaluation datasets.
A Bayesian Model for Morpheme and Paradigm Identification
- Linguistics
- ACL
- 2001
A system is presented for unsupervised learning of morphological affixes from texts or word lists; it consists of a generative probability model and a search algorithm that can be formalized in terms of the lattice formed by subsets of suffixes under inclusion.
Unsupervised Morpheme Segmentation and Morphology Induction from Text Corpora Using Morfessor 1.0
- Linguistics, Computer Science
- 2005
This report describes the first public version of the Morfessor software, a program that takes as input a corpus of unannotated text and produces a segmentation of the word forms observed in the text.
Unsupervised Segmentation of Words Using Prior Distributions of Morph Length and Frequency
- Computer Science
- ACL
- 2003
We present a language-independent and unsupervised algorithm for the segmentation of words into morphs. The algorithm is based on a new generative probabilistic model, which makes use of relevant…
Morphological Analyzer for Agglutinative Languages Using Machine Learning Approaches
- Computer Science
- ARTCom
- 2009
This new, state-of-the-art machine learning approach, based on sequence labeling and training with kernel methods, captures the non-linear relationships among the different aspects of the morphological features of natural languages in a better and simpler way.
Unsupervised Multilingual Learning for Morphological Segmentation
- Computer Science, Linguistics
- ACL
- 2008
A nonparametric Bayesian model is presented that jointly induces morpheme segmentations for each language under consideration and, at the same time, identifies cross-lingual morpheme patterns, or abstract morphemes, across multiple languages.
A TELUGU MORPHOLOGICAL ANALYZER
- Computer Science
- 2011
The present proposal demonstrates an optimal organization of a linguistic database and its performance in a computational environment, ensuring high precision and coverage in the parsing of word forms, and gives a detailed description of the database and its compilation for morphological analysis using the Word and Paradigm model.
Improving morphology induction by learning spelling rules
- Computer Science
- IJCAI 2009
- 2009
A Bayesian model for simultaneously inducing both morphology and spelling rules is developed and it is shown that the addition of spelling rules improves performance over the baseline morphology-only model.
A Language-Independent Unsupervised Model for Morphological Segmentation
- Computer Science
- ACL
- 2007
An algorithm is described that draws from previous approaches and combines them into a simple model for morphological segmentation that outperforms other approaches on English and German, and also yields good results on agglutinative languages such as Finnish and Turkish.