• Corpus ID: 51951728

Building a Kannada POS Tagger Using Machine Learning and Neural Network Models

@article{Todi2018BuildingAK,
  title={Building a Kannada POS Tagger Using Machine Learning and Neural Network Models},
  author={Ketan Kumar Todi and Pruthwik Mishra and Dipti Misra Sharma},
  journal={ArXiv},
  year={2018},
  volume={abs/1808.03175}
}
POS tagging serves as a preliminary task for many NLP applications. Kannada is a relatively resource-poor Indian language with a very limited number of quality NLP tools available. An accurate and reliable POS tagger is essential for many NLP tasks, such as shallow parsing, dependency parsing, sentiment analysis, and named entity recognition. We present a statistical POS tagger for Kannada using different machine learning and neural network models. Our Kannada POS tagger outperforms the state-of-the-art… 
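The abstract does not list the features the tagger uses. As an illustration only, CRF-style statistical taggers commonly represent each token as a dictionary of features. The sketch below assumes a hypothetical feature set: the word form, its prefixes and suffixes, and the neighbouring words. This suits an agglutinative language like Kannada, where suffixes carry much of the grammatical information. The names `token_features` and `sentence_features` are illustrative and do not come from the paper.

```python
from typing import Dict, List


def token_features(sentence: List[str], i: int, max_affix: int = 3) -> Dict[str, object]:
    """Hypothetical feature dictionary for the token at position i.

    Assumed features: word form, prefixes/suffixes up to max_affix characters,
    and the previous/next word (with sentence-boundary markers).
    """
    word = sentence[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word": word,
        "is_digit": word.isdigit(),
        "len": len(word),
    }
    # Affix features: in agglutinative languages many grammatical cues sit in suffixes.
    for k in range(1, max_affix + 1):
        feats[f"prefix_{k}"] = word[:k]
        feats[f"suffix_{k}"] = word[-k:]
    # One-word context window on each side.
    feats["prev_word"] = sentence[i - 1] if i > 0 else "<BOS>"
    feats["next_word"] = sentence[i + 1] if i < len(sentence) - 1 else "<EOS>"
    return feats


def sentence_features(sentence: List[str]) -> List[Dict[str, object]]:
    """Feature dictionaries for every token in the sentence, in order."""
    return [token_features(sentence, i) for i in range(len(sentence))]
```

For example, `sentence_features(["naanu", "manege", "hoguttene"])` (romanized Kannada, used purely as an illustration) gives `"ge"` as the two-character suffix of the second token. Affixes here are sliced by Unicode code points. For text in Kannada script, a grapheme-aware split may be preferable so that vowel signs are not separated from their consonants.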

Using machine learning to build POS tagger for under-resourced language: the case of Somali

  • Siraj Mohammed
  • Computer Science
    International Journal of Information Technology
  • 2020
This paper presents a statistical POS tagger for the Somali language using different machine learning approaches (i.e., HMM and CRF) and a neural network model, and explores the use of word embeddings for Somali POS tagging.

Parts of Speech Tagging for Kannada and Hindi Languages using ML and DL models

The proposed work deals with the development of a POS tagger for both Kannada and Hindi by employing Machine Learning (ML) and Deep Learning (DL) algorithms.

Creation of Corpus and Analysis in Code-Mixed Kannada-English Social Media Data for POS Tagging

A Kannada-English code-mixed social media corpus annotated with the corresponding POS tags is presented, and the machine learning classification models CRF, Bi-LSTM, and Bi-LSTM-CRF are evaluated on the authors' corpus.

A study on the performance of Recurrent Neural Network based models in Maithili Part of Speech Tagging

  • Ankur Priyadarshi, S. Saha
  • Computer Science
    ACM Transactions on Asian and Low-Resource Language Information Processing
  • 2022
This paper presents the effort in developing a Maithili part-of-speech (POS) tagger that employs several recurrent neural network (RNN) based models, including Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), LSTM with a CRF layer (LSTM-CRF), and GRU with a CRF layer, and performs a comparative study.

A comprehensive survey on Indian regional language processing

The various approaches and techniques contributed by researchers for Indian regional language processing, including machine translation, named entity recognition, sentiment analysis, and part-of-speech tagging, are reviewed with respect to rule-based, statistical, and neural approaches.

Surfing the Modeling of PoS Taggers in Low-Resource Scenarios

This work evaluates the early estimation of learning curves as a practical mechanism for selecting the most appropriate model when non-deep learners are used in resource-lean settings. It also studies the reliability of this approach in a different and much more demanding operational environment.

Creation of Corpus and analysis in Code-Mixed Kannada-English Twitter data for Emotion Prediction

This paper analyzes the problem of emotion prediction on a code-mixed Kannada-English corpus extracted from Twitter, in which each tweet is annotated with its emotion, and experiments with SVM and LSTM prediction models using features such as character n-grams, word n-grams, and repetitive characters.

A Systematic Review on POS Tagging

This paper explains the strategies that researchers in the domain of text tagging have followed to enhance the performance of existing POS taggers.

Critical Analysis of Existing Punjabi Grammar Checker and a Proposed Hybrid Framework Involving Machine Learning and Rule-Base Criteria

  • V. Verma, S. Sharma
  • Computer Science
    ACM Transactions on Asian and Low-Resource Language Information Processing
  • 2022
A hybrid framework is proposed as an efficient way of analyzing and correcting sentences, combining machine learning, specifically deep neural networks, with the existing rule-based approach.

References

Kernel based part of speech tagger for Kannada

  • P. Antony, K. Soman
  • Computer Science
    2010 International Conference on Machine Learning and Cybernetics
  • 2010
This paper presents the development of a part-of-speech tagger for Kannada that can be used for analyzing and annotating Kannada texts, and finds that the results obtained were more efficient and accurate than those of earlier methods for Kannada POS tagging.

Kannada Part-Of-Speech Tagging with Probabilistic Classifiers

A second-order Hidden Markov Model (HMM) and Conditional Random Fields (CRF) are chosen in this work for POS tagging of Kannada; the accuracies of the HMM-based and CRF-based tools are 79.9% and 84.58%, respectively.
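HMM taggers of this kind find the best tag sequence with Viterbi decoding. The sketch below is a simplification under stated assumptions. It uses a first-order HMM, conditioning each tag on one previous tag, whereas the work above uses a second-order model that conditions on two. It also assumes the start, transition, and emission probabilities are already estimated and passed in as plain dictionaries. The function name `viterbi` and the `floor` parameter are illustrative, and `floor` is a crude stand-in for real smoothing.

```python
import math
from typing import Dict, List, Tuple


def viterbi(
    words: List[str],
    tags: List[str],
    start: Dict[str, float],
    trans: Dict[Tuple[str, str], float],
    emit: Dict[Tuple[str, str], float],
    floor: float = 1e-12,
) -> List[str]:
    """Return the most probable tag sequence under a first-order HMM.

    start[t]      = P(t at position 0)
    trans[(a, b)] = P(b | a)
    emit[(t, w)]  = P(w | t)
    Missing entries get a tiny floor probability instead of zero.
    """
    if not words:
        return []

    def lp(table, key) -> float:
        # Work in log space to avoid underflow on long sentences.
        return math.log(table.get(key, floor))

    # best[i][t]: log-probability of the best path ending in tag t at position i.
    best: List[Dict[str, float]] = [
        {t: lp(start, t) + lp(emit, (t, words[0])) for t in tags}
    ]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Predecessor tag maximizing (path score + transition score).
            prev = max(tags, key=lambda p: best[i - 1][p] + lp(trans, (p, t)))
            best[i][t] = best[i - 1][prev] + lp(trans, (prev, t)) + lp(emit, (t, words[i]))
            back[i][t] = prev

    # Trace back-pointers from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

A second-order version would index `best` by pairs of tags rather than single tags, which multiplies the state space by the tag-set size. The probabilities used to exercise this sketch are toy numbers, not estimates from any Kannada corpus.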

SVM Based Part of Speech Tagger for Malayalam

  • A. P.J., S. P. Mohan, Soman K.P.
  • Computer Science
    2010 International Conference on Recent Trends in Information, Telecommunication and Computing
  • 2010
The objective of this project was to identify the ambiguities in Malayalam lexical items and to develop an efficient and accurate POS tagger; the results obtained were more efficient and accurate than those of earlier methods for Malayalam POS tagging.

Pattern Based Bootstrapping Technique for Tamil POS Tagging

A pattern-based bootstrapping approach using only a small set of POS-labeled suffix context patterns is presented; it generates new patterns by iteratively masking suffixes with a low probability of occurrence in the suffix context and replacing them with other co-occurring suffixes.

Improving statistical POS tagging using Linguistic feature for Hindi and Telugu

This work describes how adding linguistic features to an HMM improves its accuracy, and presents a method for effectively handling compound words in Hindi and Telugu.

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

This paper presents a model for constructing vector representations of words by composing characters using bidirectional LSTMs. It requires only a single vector per character type and a fixed set of parameters for the compositional model, and yields state-of-the-art results in language modeling and part-of-speech tagging.

TnT - A Statistical Part-of-Speech Tagger

Contrary to claims found elsewhere in the literature, it is argued that a tagger based on Markov models performs at least as well as other current approaches, including the Maximum Entropy framework.
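For reference, the TnT model (Brants, 2000) is a trigram HMM, summarized below. The tagger selects, for words $w_1 \dots w_T$, the tag sequence that maximizes the product in the first line, where $t_{-1}$, $t_0$, and $t_{T+1}$ are sentence-boundary tags. It smooths the trigram transition estimate by linearly interpolating unigram, bigram, and trigram relative frequencies $\hat{P}$, with non-negative weights $\lambda_i$ set by deleted interpolation. This is a summary of the published model, not a derivation from the page above.

```latex
\begin{align*}
\hat{t}_1^{T} &= \operatorname*{arg\,max}_{t_1,\dots,t_T}
  \left[ \prod_{i=1}^{T} P(t_i \mid t_{i-1}, t_{i-2})\, P(w_i \mid t_i) \right] P(t_{T+1} \mid t_T) \\
P(t_3 \mid t_1, t_2) &= \lambda_1 \hat{P}(t_3) + \lambda_2 \hat{P}(t_3 \mid t_2)
  + \lambda_3 \hat{P}(t_3 \mid t_1, t_2), \qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
\end{align*}
```

TnT handles unknown words separately through suffix analysis, which is not shown here.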

Significance of an Accurate Sandhi-Splitter in Shallow Parsing of Dravidian Languages

This work provides an in-depth analysis of the effect of sandhi on developing a robust shallow parser pipeline, with experimental results emphasizing how sensitive the individual components of the shallow parser are to the accuracy of a sandhi splitter.

Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms

Experimental results on part-of-speech tagging and base noun phrase chunking are given, in both cases showing improvements over results for a maximum-entropy tagger.
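A minimal sketch of structured perceptron training for an HMM-style tagger follows. It decodes each training sentence with Viterbi. When the prediction is wrong, it adds the feature counts of the gold tag sequence to the weights and subtracts those of the predicted sequence. Three assumptions apply. Only two hypothetical feature types are used: tag-word emissions and tag-tag transitions. Plain rather than averaged weights are kept; Collins also proposes averaging the weights over all updates, which is omitted here for brevity. The names `features`, `decode`, and `train` are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[str, str, str]


def features(words: Sequence[str], tags: Sequence[str]) -> Dict[Feature, int]:
    """Global feature counts: emission (tag, word) and transition (prev tag, tag)."""
    counts: Dict[Feature, int] = defaultdict(int)
    prev = "<S>"  # sentence-start pseudo-tag
    for w, t in zip(words, tags):
        counts[("emit", t, w)] += 1
        counts[("trans", prev, t)] += 1
        prev = t
    return counts


def decode(words: Sequence[str], tagset: Sequence[str], weights: Dict[Feature, float]) -> List[str]:
    """Viterbi search for the highest-scoring tag sequence under linear weights."""
    if not words:
        return []

    def local(prev: str, t: str, w: str) -> float:
        # Score contributed by one position: transition weight + emission weight.
        return weights.get(("trans", prev, t), 0.0) + weights.get(("emit", t, w), 0.0)

    best = [{t: local("<S>", t, words[0]) for t in tagset}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tagset:
            p = max(tagset, key=lambda q: best[i - 1][q] + local(q, t, words[i]))
            best[i][t] = best[i - 1][p] + local(p, t, words[i])
            back[i][t] = p
    path = [max(tagset, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def train(data, tagset, epochs: int = 10) -> Dict[Feature, float]:
    """Perceptron updates: reward gold-sequence features, penalize predicted ones."""
    weights: Dict[Feature, float] = defaultdict(float)
    for _ in range(epochs):
        mistakes = 0
        for words, gold in data:
            pred = decode(words, tagset, weights)
            if pred != list(gold):
                mistakes += 1
                for f, c in features(words, gold).items():
                    weights[f] += c
                for f, c in features(words, pred).items():
                    weights[f] -= c
        if mistakes == 0:  # every training sentence is decoded correctly
            break
    return dict(weights)
```

Because the update uses the same Viterbi decoder as tagging, training and inference share one search procedure. The perceptron then adds or subtracts whole-sequence feature counts in place of the probability estimates of a generative HMM.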