Corpus ID: 15628229

A Rule based Approach to Word Lemmatization

@inproceedings{Plisson2004ARB,
  title={A Rule based Approach to Word Lemmatization},
  author={Jo{\"e}l Plisson and Nada Lavra{\v{c}} and Dunja Mladenic},
  year={2004}
}
Lemmatization is the process of finding the normalized form of a word. [...] Key Result: When learning from a corpus of lemmatized Slovene words, the RDR approach results in easy-to-understand rules with improved classification accuracy compared to the rule learning results achieved in previous work.
A Lemmatization Web Service Based on Machine Learning Techniques
Lemmatization is the process of finding the normalized form of words from surface word-forms as they appear in the running text. It is a useful pre-processing step for any number of language [...]
Lemmatization in Balinese Language
Lemmatization is the process of extracting the root word from an affixed word, with the aim of reducing variations of the word to the root word. Previous research on extraction of root words in Balinese [...]
Highly Language-Independent Word Lemmatization Using a Machine-Learning Classifier
TLDR
An open-source language-independent lemmatizer based on the Random Forest classification model, which is a supervised machine-learning algorithm with decision trees that are constructed corresponding to the grammatical features of the language.
Design of a Rule Based Hindi Lemmatizer
TLDR
In this paper, an inflectional lemmatizer is created which generates rules for extracting suffixes and adds rules for generating a proper, meaningful root word.
Design and Development of a Rule-Based Urdu Lemmatizer
TLDR
A rule-based Urdu lemmatizer is created that works by eliminating the suffix from the root word and adding some required and relevant information to extract the meaningful root.
A Comparative Study of Stemming Algorithms
TLDR
This paper discusses different methods of stemming and compares them in terms of usage, advantages, and limitations, and explains the basic difference between stemming and lemmatization.
Design and development of lemmatizer for Sindhi language in devanagri script
TLDR
This is the first attempt to develop a lemmatizer for the Sindhi language in Devanagari script; it shows that a correct root word or dictionary word can be generated by applying rules for affix removal together with additional rules for forming a correct dictionary word.
Automatic training of lemmatization rules that handle morphological changes in pre-, in- and suffixes alike
We propose a method to automatically train lemmatization rules that handle prefix, infix and suffix changes to generate the lemma from the full form of a word. We explain how the lemmatization rules [...]
Lemmatisation for under-resourced languages with sequence-to-sequence learning: A case of Early Irish
Lemmatisation, which is one of the most important stages of text preprocessing, consists in grouping the inflected forms of a word together so they can be analysed as a single item, identified by the [...]

References

MACHINE LEARNING OF MORPHOSYNTACTIC STRUCTURE: LEMMATIZING UNKNOWN SLOVENE WORDS
TLDR
A statistics-based trigram tagger is used to learn morphosyntactic tagging and a first-order decision list learning system is used to learn rules for morphological analysis, which produces the lemma from the word-form given the correct morphosyntactic tag.
An algorithm for suffix stripping
TLDR
An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL and performs slightly better than a much more elaborate system with which it has been compared.
The MULTEXT-East Slovene Lexicon
TLDR
The Slovene lexicon contains the full inflectional paradigms for over 15,000 lemmas; it has over half a million entries, where each entry gives the word-form, its lemma and morphosyntactic description.
Learning Decision Lists
This paper introduces a new representation for Boolean functions, called decision lists, and shows that they are efficiently learnable from examples. More precisely, this result is established for [...]
A Sequential Model for Multi-Class Classification
TLDR
A sequential learning model is suggested that utilizes classifiers to sequentially restrict the number of competing classes while maintaining, with high probability, the presence of the true outcome in the candidates set.
Combinatorial Optimization in Inductive Concept Learning
TLDR
The main objective of this paper is an empirical analysis of different optimization algorithms and some of their combinations in comparison with a decision tree learning algorithm.
The CN2 Induction Algorithm
TLDR
A description and empirical evaluation of a new induction system, CN2, designed for the efficient induction of simple, comprehensible production rules in domains where problems of poor description language and/or noise may be present.
The Multi-Purpose Incremental Learning System AQ15 and Its Testing Application to Three Medical Domains
TLDR
It is demonstrated that by applying the proposed method of cover truncation and analogical matching, called TRUNC, one may drastically decrease the complexity of the knowledge base without affecting its performance accuracy.
Learning word normalization using word suffix and context from unlabeled data
A Sequential Model for Multiclass Classification
  • Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP)
  • 2001