Sanskrit lemmatizer for improvisation of morphological analyzer

@article{Raulji2019SanskritLF,
  title={Sanskrit lemmatizer for improvisation of morphological analyzer},
  author={Jaideepsinh K. Raulji and Jatinderkumar R. Saini},
  journal={Journal of Statistics and Management Systems},
  year={2019},
  volume={22},
  pages={613--625}
}
Abstract The process of stripping affixes from a word to arrive at its root word, or lemma, is known as lemmatization. The usefulness of a lemmatizer in natural language operations cannot be overlooked, especially if the language is morphologically rich. A lexicon-cum-rule-based lemmatizer is built for the Sanskrit language. The lemmatizer has profound applications in mainstream NLP tasks like Information Retrieval, Morphological Analyzer, POS taggers, Question-Answering Systems, Machine Translation…
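A minimal sketch of the lexicon-cum-rule-based idea described in the abstract, not the authors' implementation. The lexicon, the suffix rules, the ASCII romanization, and the function name `lemmatize` are all hypothetical toy assumptions chosen for illustration. Each rule strips a suffix and optionally restores a stem ending, and a candidate is accepted only if it appears in the lexicon.

```python
# Toy lexicon of romanized Sanskrit noun stems (hypothetical, for illustration only).
LEXICON = {"rama", "deva", "phala"}

# Hypothetical suffix rules: (suffix to strip, text to append to the remainder).
SUFFIX_RULES = [
    ("sya", ""),   # e.g. genitive singular: ramasya -> rama
    ("ena", "a"),  # e.g. instrumental singular: devena -> deva
    ("am", "a"),   # e.g. phalam -> phala
    ("h", ""),     # e.g. nominative singular: ramah -> rama
]


def lemmatize(word, lexicon=LEXICON, rules=SUFFIX_RULES):
    """Return a lexicon-validated lemma, or the word unchanged if no rule applies."""
    if word in lexicon:
        return word
    # Try longer suffixes first so a short suffix does not pre-empt a longer match.
    for suffix, replacement in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in lexicon:
                return candidate
    return word


for form in ["ramasya", "devena", "phalam", "ramah", "gacchati"]:
    print(form, "->", lemmatize(form))
```

Validating each candidate against the lexicon is what separates this from plain stemming: a stripped form that is not a known stem is rejected, and the word is returned unchanged.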
Peer Analysis of “Sanguj” with Other Sanskrit Morphological Analyzers
TLDR
Here, 328 Sanskrit words are tested through four morphological analyzers, namely Samsaadhanii, the morphological analyzers by JNU and TDIL (both of which are available online), and the locally developed and installed Sanguj morphological analyzer.
How low is too low? A monolingual take on lemmatisation in Indian languages
TLDR
It is shown that monolingual approaches with data augmentation can give competitive accuracy even in the low-resource setting, which augurs well for NLP in low-resource settings.
A Novel Framework for Sanskrit-Gujarati Symbolic Machine Translation System
Sanskrit falls under the Indo-European language family category. Gujarati, which has descended from the Sanskrit language, is a widely spoken language, particularly in the Indian state of Gujarat.
BaNeL: an encoder-decoder based Bangla neural lemmatizer
TLDR
This study presents an efficient framework of deriving lemma from an inflected Bangla word considering its parts-of-speech as context and an artificial neural network based efficient model for lemmatization that yields comparatively better performance than existing ones.
Measuring the Similarity between the Sanskrit Documents using the Context of the Corpus
TLDR
The proposed approach processes Sanskrit, one of the oldest, untouched, and morphologically critical languages, and builds a document term matrix for Sanskrit (DTMS) and a document synset matrix for Sanskrit (DSMS) to solve the problem of polysemy.
On Readability Metrics of Goal Statements of Universities and Brand-Promoting Lexicons for Industries
TLDR
The correlation between the found lexicons and the revenues generated by the considered companies is advocated and Pearson's correlation coefficient and Flesch Readability Index are deployed for the calculation of various metrics to form the basis of the conclusions.
Sanskrit Stemmer Design: A Literature Perspective

References

Design of rule based lemmatizer for Kannada inflectional words
  • R. Prathibha, M. Padma
  • Linguistics
    2015 International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT)
  • 2015
TLDR
The design of a rule-based lemmatizer is proposed by adding a set of linguistic rules to extract a proper and meaningful root from Kannada inflectional words.
BenLem (A Bengali Lemmatizer) and Its Role in WSD
TLDR
A lemmatization algorithm for Bengali has been developed and evaluated and is found to be capable of handling both inflectional and derivational morphology in Bengali, and improves the performance of all the WSD systems and the improvements are statistically significant.
Building a Wide Coverage Sanskrit Morphological Analyzer : A Practical Approach
TLDR
The complexity involved in building a wide-coverage analyzer for Sanskrit is pointed out, and a morphological analyzer that has been built using the available e-resources, based on ad hoc principles, is described.
Treebank based deep grammar acquisition and Part-Of-Speech Tagging for Sanskrit sentences
  • N. Tapaswi, S. Jain
  • Linguistics, Computer Science
    2012 CSI Sixth International Conference on Software Engineering (CONSEG)
  • 2012
TLDR
This work presents a simple rule-based POS tagger (POST) for the Sanskrit language that automatically assigns a suitable tag to each word in a given Sanskrit sentence.
Sanskrit Morphological Analyser: Some Issues
TLDR
Sanskrit has rich inflectional as well as derivational morphology, and in spite of the existence of a formally defined and well described grammar, construction of a set of computational tools for the analysis of Sanskrit texts could not take a momentum for a long time.
A Distributed Platform for Sanskrit Processing
Sanskrit, the classical language of India, presents specific challenges for computational linguistics: exact phonetic transcription in writing that obscures word boundaries, rich morphology and an
An innovative lemmatization technique for Bangla nouns by using longest suffix stripping methodology in decreasing order
  • A. R. Pal, N. Dash, D. Saha
  • Linguistics
    2015 International Conference on Computing and Network Communications (CoCoNet)
  • 2015
TLDR
An attempt is made to find out the root part from inflected Bangla nouns by applying an innovative technique by using longest suffix stripping methodology in decreasing order.
Sanskrit as a Programming Language and Natural Language Processing
TLDR
This paper presents work towards building a dependency parser for the Sanskrit language that uses deterministic finite automata (DFA) for morphological analysis and the 'utsarga apavaada' approach for relation analysis.
A Memory-Based Lemmatizer for Ancient Greek
TLDR
GLEM is the first publicly available lemmatizer for Ancient Greek that uses POS information to disambiguate and that also assigns output to unseen words, words that are not yet in the lexicon.
A Self-Learning Context-Aware Lemmatizer for German
TLDR
This work presents a self-learning lemmatizer capable of automatically creating a full-form lexicon by processing German documents.