Corpus ID: 227230540

KanCMD: Kannada CodeMixed Dataset for Sentiment Analysis and Offensive Language Detection

Adeep Hande, Ruba Priyadharshini, Bharathi Raja Chakravarthi
We introduce the Kannada CodeMixed Dataset (KanCMD), a multi-task learning dataset for sentiment analysis and offensive language identification. KanCMD addresses two real-world issues in social media text. First, it contains actual code-mixed comments posted by users on YouTube, rather than monolingual text drawn from textbooks. Second, it has been annotated for two tasks, namely sentiment analysis and offensive language detection for under-resourced Kannada…


