Corpus ID: 199448337

Atalaya at TASS 2019: Data Augmentation and Robust Embeddings for Sentiment Analysis

  • Franco Martín Luque
  • Published in IberLEF@SEPLN 25 September 2019
  • Computer Science
In this article we describe our participation in TASS 2019, a shared task aimed at the detection of sentiment polarity of Spanish tweets. We combined different representations such as bag-of-words, bag-of-characters, and tweet embeddings. In particular, we trained robust subword-aware word embeddings and computed tweet representations using a weighted-averaging strategy. We also used two data augmentation techniques to deal with data scarcity: two-way translation augmentation, and instance…
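
As a rough illustration of the weighted-averaging idea mentioned in the abstract, the sketch below computes a tweet vector as a weighted mean of word vectors. The weighting scheme (a smooth inverse-frequency weight a / (a + p(w))), the toy vectors, the function name `tweet_embedding`, and the parameter `a` are all assumptions made for illustration; the abstract does not say which weights the author used.

```python
from collections import Counter


def tweet_embedding(tokens, vectors, word_prob, a=1e-3):
    """Weighted average of word vectors for one tweet.

    Assumed weighting (not taken from the abstract): w = a / (a + p(word)),
    so frequent words contribute less than rare ones.
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in vectors:
            continue  # skip out-of-vocabulary tokens
        w = a / (a + word_prob.get(tok, 0.0))
        weight_sum += w
        for i, x in enumerate(vectors[tok]):
            total[i] += w * x
    if weight_sum == 0.0:
        return total  # all-zero vector when no token is known
    return [x / weight_sum for x in total]


# Toy data, purely hypothetical.
vectors = {"muy": [1.0, 0.0], "bueno": [0.0, 1.0]}
corpus = ["muy", "muy", "muy", "bueno"]
counts = Counter(corpus)
word_prob = {w: c / len(corpus) for w, c in counts.items()}

emb = tweet_embedding(["muy", "bueno", "xyz"], vectors, word_prob)
```

In this toy setup the rarer word "bueno" receives the larger weight, so the second component of `emb` dominates; a real system would use trained embeddings and corpus-level frequencies instead.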

Emotion Detection for Spanish with Data Augmentation and Transformer-Based Models
  • Hongxin Luo
  • Computer Science
  • 2021
This paper describes the participation of the Yeti team in the IberLEF EmoEvalEs task, which is based on the Spanish semantic analysis task of TASS 2020 and was proposed as a separate task at IberLEF 2021.
Overview of TASS 2019: One More Further for the Global Spanish Sentiment Analysis Corpus
This paper summarizes the approaches and the results of the submitted systems of the different groups for each task in the TASS workshop, and proposes a new approach to sentiment analysis at tweet level.
Quantifying the Evaluation of Heuristic Methods for Textual Data Augmentation
This work proposes a metric for evaluating augmentation heuristics, and quantifies the extent to which an example is “hard to distinguish” by considering the difference between the distribution of the augmented samples of different classes.
Unsupervised Document Embedding via Contrastive Augmentation
This study reveals the enormous benefits of contrastive augmentation for document representation learning with two additional insights: 1) including data augmentation in a contrastive way can substantially improve the embedding quality in unsupervised document representation learning, and 2) in general, stochastic augmentations generated by simple word-level manipulation work much better than sentence-level and document-level ones.
Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification
A combination of various operations is regarded as an augmentation policy, and an efficient Bayesian Optimization algorithm is utilized to automatically search for the best policy, which substantially improves the generalization capability of models.
Cross-Domain Polarity Models to Evaluate User eXperience in E-learning
This paper investigates how to automatically evaluate User eXperience in this domain using sentiment analysis techniques and applies the state-of-the-art sentiment analysis models, trained with a corpus of a different semantic domain, to study the use of cross-domain models for this task.
Data Augmentation for Text Classification Tasks
Thanks to increases in computing power and the growing availability of large datasets, neural networks have achieved state of the art results in many natural language processing (NLP) and computer…
Measuring the Effects of Bias in Training Data for Literary Classification
Downstream effects of biased training data have become a major concern of the NLP community. How this may impact the automated curation and annotation of cultural heritage material is currently not…
EXIST2021: Detecting Sexism with Transformers and Translation-Augmented Data
This paper presents an approach to multilingual problems that augments the data without the overfitting that aggressive backtranslation can cause, based mainly on fine-tuned BERT models and data augmentation with translation and backtranslation.
Data Augmentation Techniques on Arabic Data for Named Entity Recognition


Atalaya at TASS 2018: Sentiment Analysis with Tweet Embeddings and Data Augmentation
This work presents team Atalaya's participation in the task of polarity classification of tweets; the team followed standard techniques in preprocessing, representation and classification, and also explored some novel ideas.
A Simple but Tough-to-Beat Baseline for Sentence Embeddings
Overview of TASS 2015
This paper presents the tasks proposed at TASS 2015, the contents of the generated corpora, the participating groups, and an analysis of their results.
Enriching Word Vectors with Subword Information
A new approach based on the skipgram model in which each word is represented as a bag of character n-grams and its vector is the sum of the n-gram representations; it achieves state-of-the-art performance on word similarity and analogy tasks.
Thumbs up? Sentiment Classification using Machine Learning Techniques
This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web
A methodology for extracting small investor sentiment from stock message boards is developed, which comprises different classifier algorithms coupled together by a voting scheme that is similar to widely used Bayes classifiers.
Scikit-learn: Machine Learning in Python
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing…
NLTK: The Natural Language Toolkit
NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware. NLTK covers symbolic and…
Improvements in Part-of-Speech Tagging with an Application to German
This paper presents a meta-modelling system that automates the very labor-intensive and therefore time-heavy and expensive process of manually tagging part-of-speech content in a variety of languages.
Overview of TASS 2018: Opinions, Health and Emotions
This work has been partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER), the projects REDES (TIN2015-65136-C2-1-R, TIN2015-65136-C2-2-R) and SMART-DASCI…