• Corpus ID: 199448337

Atalaya at TASS 2019: Data Augmentation and Robust Embeddings for Sentiment Analysis

@inproceedings{Luque2019AtalayaAT,
  title={Atalaya at TASS 2019: Data Augmentation and Robust Embeddings for Sentiment Analysis},
  author={Franco Mart{\'i}n Luque},
  booktitle={IberLEF@SEPLN},
  year={2019}
}
  • F. Luque
  • Published in IberLEF@SEPLN 25 September 2019
  • Computer Science
In this article we describe our participation in TASS 2019, a shared task aimed at the detection of sentiment polarity of Spanish tweets. We combined different representations such as bag-of-words, bag-of-characters, and tweet embeddings. In particular, we trained robust subword-aware word embeddings and computed tweet representations using a weighted-averaging strategy. We also used two data augmentation techniques to deal with data scarcity: two-way translation augmentation, and instance… 
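The weighted-averaging strategy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the weighting scheme, so the frequency-based weight `a / (a + p(w))` used here is an assumption, in the spirit of the "Simple but Tough-to-Beat Baseline" listed in the references. The function names, the toy vectors, and the smoothing parameter `a` are hypothetical and are not taken from the paper.

```python
from collections import Counter

def sif_weights(corpus_tokens, a=1e-3):
    """Map each word to a / (a + p(word)); p is the relative corpus frequency.

    Assumed weighting scheme: frequent words get smaller weights.
    """
    counts = Counter(tok for tweet in corpus_tokens for tok in tweet)
    total = sum(counts.values())
    return {w: a / (a + c / total) for w, c in counts.items()}

def tweet_embedding(tokens, vectors, weights, dim):
    """Weighted average of the vectors of the tokens that are known."""
    acc = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok in vectors and tok in weights:
            w = weights[tok]
            weight_sum += w
            for i, x in enumerate(vectors[tok]):
                acc[i] += w * x
    if weight_sum == 0.0:
        return acc  # no known tokens: fall back to the zero vector
    return [x / weight_sum for x in acc]

# Toy data (hypothetical): two tokenized tweets and 2-dimensional word vectors.
corpus = [["muy", "bueno"], ["muy", "malo"]]
vectors = {"muy": [1.0, 0.0], "bueno": [0.0, 1.0], "malo": [0.0, -1.0]}

weights = sif_weights(corpus, a=1.0)
emb = tweet_embedding(["muy", "bueno"], vectors, weights, dim=2)
unknown = tweet_embedding(["xyz"], vectors, weights, dim=2)
```

In this sketch the rarer word "bueno" contributes more to the tweet vector than the more frequent "muy". In a real setup the vectors would come from trained subword-aware embeddings rather than a hand-written dictionary.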

Emotion Detection for Spanish with Data Augmentation and Transformer-Based Models
TLDR
The participation of the Yeti team in the IberLEF EmoEvalEs task, which is based on the Spanish semantic analysis task of TASS 2020 and was proposed as a separate task for IberLEF 2021, is described.
Overview of TASS 2019: One More Further for the Global Spanish Sentiment Analysis Corpus
TLDR
This paper summarizes the approaches and the results of the submitted systems of the different groups for each task in the TASS workshop, and proposes a new approach to sentiment analysis at tweet level.
Quantifying the Evaluation of Heuristic Methods for Textual Data Augmentation
TLDR
This work proposes a metric for evaluating augmentation heuristics, and quantifies the extent to which an example is “hard to distinguish” by considering the difference between the distribution of the augmented samples of different classes.
Unsupervised Document Embedding via Contrastive Augmentation
TLDR
This study reveals the enormous benefits of contrastive augmentation for document representation learning with two additional insights: 1) including data augmentation in a contrastive way can substantially improve the embedding quality in unsupervised document representation learning, and 2) in general, stochastic augmentations generated by simple word-level manipulation work much better than sentence-level and document-level ones.
Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification
TLDR
A combination of various operations is regarded as an augmentation policy, and an efficient Bayesian Optimization algorithm is used to automatically search for the best policy, which substantially improves the generalization capability of models.
Data Augmentation Approaches in Natural Language Processing: A Survey
Cross-Domain Polarity Models to Evaluate User eXperience in E-learning
TLDR
This paper investigates how to automatically evaluate User eXperience in this domain using sentiment analysis techniques and applies the state-of-the-art sentiment analysis models, trained with a corpus of a different semantic domain, to study the use of cross-domain models for this task.
Reducing and Exploiting Data Augmentation Noise through Meta Reweighting Contrastive Learning for Text Classification
TLDR
This work proposes a novel framework, which leverages both meta learning and contrastive learning techniques as parts of its design for reweighting the augmented samples and refining their feature representations based on their quality, and proposes novel weight-dependent enqueue and dequeue algorithms to utilize augmented samples' weight/quality information effectively.
Data Augmentation for Text Classification Tasks
TLDR
The results show that data augmentation is a powerful method of improving performance when training on datasets with fewer than 10,000 training examples, and the accuracy increases that they offer are reduced by recent advancements in transfer learning schemes, but they are certainly not eliminated.
Measuring the Effects of Bias in Training Data for Literary Classification
Downstream effects of biased training data have become a major concern of the NLP community. How this may impact the automated curation and annotation of cultural heritage material is currently not

References

Atalaya at TASS 2018: Sentiment Analysis with Tweet Embeddings and Data Augmentation
TLDR
This work presents the participation as team Atalaya in the task of polarity classification of tweets, which followed standard techniques in preprocessing, representation and classification, and also explored some novel ideas.
A Simple but Tough-to-Beat Baseline for Sentence Embeddings
Overview of TASS 2015
TLDR
The TASS 2015 proposed tasks, the contents of the generated corpora, the participant groups and the results and analysis of them are presented.
Enriching Word Vectors with Subword Information
TLDR
A new approach based on the skipgram model, where each word is represented as a bag of character n-grams, with words being represented as the sum of these representations, which achieves state-of-the-art performance on word similarity and analogy tasks.
Thumbs up? Sentiment Classification using Machine Learning Techniques
TLDR
This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web
TLDR
A methodology for extracting small investor sentiment from stock message boards is developed, which comprises different classifier algorithms coupled together by a voting scheme that is similar to widely used Bayes classifiers.
Scikit-learn: Machine Learning in Python
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing
NLTK: The Natural Language Toolkit
NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware. NLTK covers symbolic and
Improvements in Part-of-Speech Tagging with an Application to German
TLDR
This paper presents a meta-modelling system that automates the very labor-intensive and therefore time-heavy and expensive process of manually tagging part-of-speech content in a variety of languages.
Overview of TASS 2018: Opinions, Health and Emotions
This work has been partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER), the projects REDES (TIN2015-65136-C2-1-R, TIN2015-65136-C2-2-R) and SMART-DASCI