SentimentWortschatz, or SentiWS for short, is a publicly available German-language resource for sentiment analysis, opinion mining, etc. It lists positive and negative sentiment-bearing words weighted within the interval [−1; 1], plus their part-of-speech tag and, if applicable, their inflections. The current version of SentiWS (v1.8b) contains 1,650 …
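A minimal sketch of how a weighted lexicon of this kind might be used to score a sentence. The entries and weights below are invented for illustration and are not taken from SentiWS; `TOY_LEXICON` and `score_tokens` are hypothetical names, and summing weights is only one simple way to aggregate them.

```python
# Toy lexicon: lowercased word -> weight in [-1, 1].
# These entries are made up for illustration; they are NOT SentiWS data.
TOY_LEXICON = {
    "gut": 0.5,
    "schlecht": -0.75,
    "freude": 0.25,
}


def score_tokens(tokens, lexicon):
    """Sum the weights of all tokens found in the lexicon (case-insensitive).

    Tokens missing from the lexicon contribute 0.0.
    """
    return sum(lexicon.get(tok.lower(), 0.0) for tok in tokens)


if __name__ == "__main__":
    print(score_tokens("Das Essen war gut".split(), TOY_LEXICON))
```

A real application would also need lemmatisation or a lookup over the listed inflections, which this sketch leaves out.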
We present ExB Themis – a word alignment-based semantic textual similarity system developed for SemEval-2015 Task 2: Semantic Textual Similarity. It combines both string and semantic similarity measures as well as alignment features using Support Vector Regression. It occupies the first three places on Spanish data and additionally places second on …
  • Robert Remus
  • 2012
We propose an approach to domain adaptation that selects those instances from a source domain training set that are most similar to a target domain. The factor by which the original source domain training set size is reduced is determined automatically by measuring domain similarity between source and target domain as well as their domain complexity variance. …
In this paper, we describe MLSA, a publicly available multi-layered reference corpus for German-language sentiment analysis. The construction of the corpus is based on the manual annotation of 270 German-language sentences considering three different layers of granularity. The sentence-layer annotation, as the most coarse-grained annotation, focuses on …
We show that the quality of sentence-level subjectivity classification, i.e. the task of deciding whether a sentence is subjective or objective, can be improved by incorporating hitherto unused features: readability measures. Hence we investigate 6 different readability formulae and propose one of our own. Their performance is evaluated in a 10-fold cross …
An analysis of a diachronically organised corpus of German-language newspaper articles and blog posts on economy and finance is presented using a prototype dictionary of affect in German. The changes in the frequency of occurrence of positive and negative polarity words are rendered as return time series and the properties of this time series are described. …
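A small sketch of how frequency counts could be turned into a return series. The abstract does not say how the returns are defined; this assumes log returns in the usual financial sense, ln(c_t / c_{t-1}), and `log_returns` is a hypothetical helper name. The counts in the usage line are invented.

```python
import math


def log_returns(counts):
    """Return [ln(c_t / c_{t-1})] for consecutive counts.

    Assumes every count is strictly positive; raises ValueError otherwise,
    because the logarithm of a non-positive ratio is undefined.
    """
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be strictly positive")
    return [math.log(b / a) for a, b in zip(counts, counts[1:])]


if __name__ == "__main__":
    # Invented per-period counts of positive-polarity words.
    print(log_returns([10, 20, 10]))
```

A series of n counts yields n - 1 returns, so the first period has no return value.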
This paper describes University of Leipzig's approach to SemEval-2013 task 2B on Sentiment Analysis in Twitter: message polarity classification. Our system is designed to function as a baseline, to see what we can accomplish with well-understood and purely data-driven lexical features, simple generalizations as well as standard machine learning techniques: …
We propose a scheme for explicitly modeling and representing negation of word n-grams in an augmented word n-gram feature space. For the purpose of negation scope detection, we compare 2 methods: the simpler regular expression-based NegEx, and the more sophisticated Conditional Random Field-based LingScope. Additionally, we capture negation implicitly via …
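One way to picture an augmented n-gram feature space with explicit negation is sketched below. The `NOT_` prefix marking and the function name `negated_ngrams` are assumptions made for illustration, not necessarily the scheme used in the paper, and the negation scope is supplied by hand here rather than detected by NegEx or LingScope.

```python
def negated_ngrams(tokens, scope, n=1):
    """Build word n-grams in which tokens inside a negation scope are marked.

    tokens: list of word strings
    scope:  set of token indices considered negated (given, not detected)
    n:      n-gram order
    """
    # Prefix every in-scope token so that "good" and "NOT_good" become
    # distinct features in the resulting feature space.
    marked = [("NOT_" + t) if i in scope else t for i, t in enumerate(tokens)]
    return [" ".join(marked[i:i + n]) for i in range(len(marked) - n + 1)]


if __name__ == "__main__":
    print(negated_ngrams(["not", "good", "at", "all"], {1}, n=2))
```

With this marking, negated and non-negated occurrences of the same word n-gram no longer share a feature, at the cost of a larger vocabulary.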