Corpus ID: 227305851

FinnSentiment - A Finnish Social Media Corpus for Sentiment Polarity Annotation

Krister Lindén, T. Jauhiainen, Sam Hardwick
Sentiment analysis and opinion mining are important tasks with obvious application areas in social media, e.g. identifying hate speech and fake news. In our survey of previous work, we note that there is no large-scale social media data set with sentiment polarity annotations for Finnish. This publication aims to remedy this shortcoming by introducing a 27,000-sentence data set annotated independently with sentiment polarity by three native annotators. We had the same three annotators for…
Evaluating morphological typology in zero-shot cross-lingual transfer
This paper examines the effect of morphological typology on zero-shot cross-lingual transfer for two tasks, part-of-speech tagging and sentiment analysis, and finds that transfer to another morphological type generally implies a higher loss than transfer to another language with the same morphological typology.
The Current State of Finnish NLP
This paper surveys recent papers focusing on Finnish NLP related to many different subcategories of NLP such as parsing, generation, semantics and speech.


RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian
RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, and a new set of comprehensive annotation guidelines that are extensible to other languages are presented.
SenTube: A Corpus for Sentiment Analysis on YouTube Social Media
In this paper we present SenTube -- a dataset of user-generated comments on YouTube videos annotated for information content and sentiment polarity. It contains annotations that allow to develop
Annotating evaluative sentences for sentiment analysis: a dataset for Norwegian
This paper documents the creation of a large-scale dataset of evaluative sentences – i.e. both subjective and objective sentences that are found to be sentiment-bearing – based on mixed-domain
The Challenges of Multi-dimensional Sentiment Analysis Across Languages
This paper outlines a pilot study on multi-dimensional and multilingual sentiment analysis of social media content. We use parallel corpora of movie subtitles as a proxy for colloquial language in
An Annotated Corpus for Sentiment Analysis in Political News
A corpus of news texts in Brazilian Portuguese, segmented into paragraphs and marked up by a group of four annotators, who built a gold standard in which paragraphs are classified according to the opinion of the majority of annotators.
Gold-standard for Topic-specific Sentiment Analysis of Economic Texts
The annotations of 297 documents and over 9000 sentences can be used for research purposes when developing methods for detecting topic-wise sentiment in financial text and are evaluated using a number of inter-annotator agreement metrics.
A Multilingual Social Media Linguistic Corpus
This paper focuses on multilingual social media and introduces the xLiMe Twitter Corpus that contains messages in German, Italian and Spanish manually annotated with Part-of-Speech, Named Entities,
Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates
It is concluded that the challenge in performing opinion mining on this type of content is correctly identifying the positive opinions, because they are much less frequent than negative opinions and are particularly exposed to verbal irony.
Datasets for Aspect-Based Sentiment Analysis in French
Two datasets for the development and testing of ABSA systems for French which comprise user reviews annotated with relevant entities, aspects and polarity values are described.
An annotated corpus for Turkish sentiment analysis at sentence level
A Turkish sentiment corpus, which is comprised of user reviews and is annotated semi-automatically, is constructed and this dataset is made easy to use for Java applications by creating JSON data.