Corpus ID: 226307957

The_Illiterati: Part-of-Speech Tagging for Magahi and Bhojpuri without even knowing the alphabet

Thomas Proisl, Peter Uhrig, Andreas Blombach, Natalie Dykes, Philipp Heinrich, Besim Kabashi, Sefora Mammarella

In this paper, we describe the part-of-speech tagging experiments for Magahi and Bhojpuri that we conducted for our participation in the NSURL 2019 shared tasks 9 and 10 (Low-level NLP Tools for (Magahi|Bhojpuri) Language). We experiment with three different part-of-speech taggers and evaluate the impact of additional resources such as Brown clusters, word embeddings, and transfer learning from additional tagged corpora in related languages. In a 10-fold cross-validation on the training data, our…


