YOSM: A New Yoruba Sentiment Corpus for Movie Reviews

@article{Shode2022yosmAN,
  title={YOSM: A New Yoruba Sentiment Corpus for Movie Reviews},
  author={Iyanuoluwa Shode and David Ifeoluwa Adelani and Anna Feldman},
  journal={ArXiv},
  year={2022},
  volume={abs/2204.09711}
}
Sentiment analysis is a popular text classification task in natural language processing. It involves developing algorithms or machine learning models to determine the sentiment or opinion expressed in a piece of text. The results of this task can be used by business owners and product developers to understand their consumers’ perceptions of their products. Aside from customer feedback and product/service analysis, this task can be useful for social media monitoring (Martin et al., 2021). One of…

References

SENTIMENTAL ANALYSIS FOR MOVIE REVIEWS

TLDR
This work considered two datasets, both drawn predominantly from IMDB and consisting only of textual content, which was processed by removing unnecessary content and divided into two categories, positive and negative.

Sentiment Analysis on Movie Reviews

TLDR
This project explored the use of various supervised machine learning algorithms for learning sentiment classifiers and tested the effectiveness of different feature selection algorithms in improving those classifiers.

Sentiment Analysis on Urdu Tweets Using Markov Chains

TLDR
A sentiment analysis approach based on Markov chains for predicting the sentiment of Urdu tweets outperforms the lexicon-based and traditional machine learning-based approaches of sentiment analysis.

Exploring Amharic Sentiment Analysis from Social Media Texts: Building Annotation Tools and Classification Models

TLDR
The challenges in building a sentiment analysis system for Amharic are investigated, and it is found that the widespread use of sarcasm and figurative speech is the main issue in dealing with the problem.

Sentiment Classification in Swahili Language Using Multilingual BERT

TLDR
This study performs sentiment classification on Swahili datasets using the current state-of-the-art model, multilingual BERT, and achieves a best accuracy of 87.59%.

NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis

TLDR
The first large-scale human-annotated Twitter sentiment dataset for Nigeria—Hausa, Igbo, Nigerian-Pidgin, and Yorùbá—consisting of around 30,000 annotated tweets per language is introduced, including a significant fraction of code-mixed tweets.

The Effect of Domain and Diacritics in Yoruba–English Neural Machine Translation

TLDR
This paper presents MENYO-20k, the first multi-domain parallel corpus with a special focus on clean orthography for Yorùbá–English with standardized train-test splits for benchmarking and investigates how and when this training condition affects the final quality and intelligibility of a translation.

Massive vs. Curated Embeddings for Low-Resourced Languages: the Case of Yorùbá and Twi

TLDR
This paper focuses on two African languages, Yorùbá and Twi, and uses different architectures that learn word representations from both surface forms and characters to further exploit all the available information, which proved to be important for these languages.

Small Data? No Problem! Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages

TLDR
It is shown that it is possible to train competitive multilingual language models on less than 1 GB of text and results suggest that the “small data” approach based on similar languages may sometimes work better than joint training on large datasets with high-resource languages.

MasakhaNER: Named Entity Recognition for African Languages

TLDR
This work brings together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages and details the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks.