Atalaya at TASS 2019: Data Augmentation and Robust Embeddings for Sentiment Analysis
@inproceedings{Luque2019AtalayaAT, title={Atalaya at TASS 2019: Data Augmentation and Robust Embeddings for Sentiment Analysis}, author={Franco Mart{\'i}n Luque}, booktitle={IberLEF@SEPLN}, year={2019} }
In this article we describe our participation in TASS 2019, a shared task aimed at the detection of sentiment polarity of Spanish tweets. We combined different representations such as bag-of-words, bag-of-characters, and tweet embeddings. In particular, we trained robust subword-aware word embeddings and computed tweet representations using a weighted-averaging strategy. We also used two data augmentation techniques to deal with data scarcity: two-way translation augmentation, and instance…
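The weighted-averaging idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which weighting scheme was used, so the frequency-based weight `a / (a + p(w))` below is an assumption, as are the names `frequency_weights` and `tweet_embedding` and the toy vectors. A plain dictionary lookup stands in for the subword-aware embeddings, so unknown tokens are simply skipped here.

```python
from collections import Counter

def frequency_weights(corpus_tokens, a=1e-3):
    """Assumed weighting scheme: a / (a + p(w)), with p(w) the unigram frequency.

    Frequent words get smaller weights than rare ones.
    """
    counts = Counter(tok for tweet in corpus_tokens for tok in tweet)
    total = sum(counts.values())
    return {w: a / (a + c / total) for w, c in counts.items()}

def tweet_embedding(tokens, vectors, weights, dim):
    """Weighted average of the word vectors of a tokenized tweet.

    Tokens missing from `vectors` or `weights` are skipped; a tweet with no
    known tokens maps to the zero vector.
    """
    acc = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok in vectors and tok in weights:
            w = weights[tok]
            for i, x in enumerate(vectors[tok]):
                acc[i] += w * x
            weight_sum += w
    if weight_sum == 0.0:
        return acc
    return [x / weight_sum for x in acc]

# Toy data (hypothetical): 2-dimensional vectors and a two-tweet corpus.
vectors = {"buen": [1.0, 0.0], "día": [0.0, 1.0], "muy": [1.0, 1.0]}
corpus = [["muy", "buen", "día"], ["muy", "mal"]]
weights = frequency_weights(corpus)

print(tweet_embedding(["muy", "buen", "día"], vectors, weights, dim=2))
```

With this scheme the more frequent token "muy" contributes less to the average than "buen" or "día"; any other weighting (for example uniform or TF-IDF weights) could be swapped in by replacing `frequency_weights`.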
14 Citations
Emotion Detection for Spanish with Data Augmentation and Transformer-Based Models
- Computer Science, IberLEF@SEPLN
- 2021
The participation of the Yeti team in the IberLEF EmoEvalEs task, which is based on the Spanish semantic analysis task of TASS 2020 and was proposed as a separate task for IberLEF 2021, is described.
Overview of TASS 2019: One More Further for the Global Spanish Sentiment Analysis Corpus
- Computer Science, IberLEF@SEPLN
- 2019
This paper summarizes the approaches and the results of the submitted systems of the different groups for each task in the TASS workshop, and proposes a new approach to sentiment analysis at tweet level.
Quantifying the Evaluation of Heuristic Methods for Textual Data Augmentation
- Computer Science, WNUT
- 2020
This work proposes a metric for evaluating augmentation heuristics, and quantifies the extent to which an example is “hard to distinguish” by considering the difference between the distribution of the augmented samples of different classes.
Unsupervised Document Embedding via Contrastive Augmentation
- Computer Science, ArXiv
- 2021
This study reveals the enormous benefits of contrastive augmentation for document representation learning with two additional insights: 1) including data augmentation in a contrastive way can substantially improve the embedding quality in unsupervised document representation learning, and 2) in general, stochastic augmentations generated by simple word-level manipulation work much better than sentence-level and document-level ones.
Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification
- Computer Science, EMNLP
- 2021
A combination of various operations is regarded as an augmentation policy, and an efficient Bayesian Optimization algorithm is used to automatically search for the best policy, which substantially improves the generalization capability of models.
Data Augmentation Approaches in Natural Language Processing: A Survey
- Computer Science, AI Open
- 2022
Cross-Domain Polarity Models to Evaluate User eXperience in E-learning
- Computer Science, Neural Processing Letters
- 2020
This paper investigates how to automatically evaluate User eXperience in this domain using sentiment analysis techniques and applies the state-of-the-art sentiment analysis models, trained with a corpus of a different semantic domain, to study the use of cross-domain models for this task.
Reducing and Exploiting Data Augmentation Noise through Meta Reweighting Contrastive Learning for Text Classification
- Computer Science, 2021 IEEE International Conference on Big Data (Big Data)
- 2021
This work proposes a novel framework, which leverages both meta learning and contrastive learning techniques as parts of its design for reweighting the augmented samples and refining their feature representations based on their quality, and proposes novel weight-dependent enqueue and dequeue algorithms to utilize augmented samples' weight/quality information effectively.
Data Augmentation for Text Classification Tasks
- Computer Science
- 2020
The results show that data augmentation is a powerful method of improving performance when training on datasets with fewer than 10,000 training examples; the accuracy gains it offers are reduced by recent advancements in transfer learning schemes, but they are certainly not eliminated.
Measuring the Effects of Bias in Training Data for Literary Classification
- Sociology, LATECHCLFL
- 2020
Downstream effects of biased training data have become a major concern of the NLP community. How this may impact the automated curation and annotation of cultural heritage material is currently not…
References
Atalaya at TASS 2018: Sentiment Analysis with Tweet Embeddings and Data Augmentation
- Computer Science, TASS@SEPLN
- 2018
This work presents the participation as team Atalaya in the task of polarity classification of tweets, which followed standard techniques in preprocessing, representation and classification, and also explored some novel ideas.
Overview of TASS 2015
- Computer Science, TASS@SEPLN
- 2015
The TASS 2015 proposed tasks, the contents of the generated corpora, the participant groups and the results and analysis of them are presented.
Enriching Word Vectors with Subword Information
- Computer Science, Transactions of the Association for Computational Linguistics
- 2017
A new approach based on the skipgram model, where each word is represented as a bag of character n-grams, with words being represented as the sum of these representations, which achieves state-of-the-art performance on word similarity and analogy tasks.
Thumbs up? Sentiment Classification using Machine Learning Techniques
- Computer Science, EMNLP
- 2002
This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web
- Computer Science, Manag. Sci.
- 2007
A methodology for extracting small investor sentiment from stock message boards is developed, which comprises different classifier algorithms coupled together by a voting scheme that is similar to widely used Bayes classifiers.
Scikit-learn: Machine Learning in Python
- Computer Science, J. Mach. Learn. Res.
- 2011
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing…
NLTK: The Natural Language Toolkit
- Computer Science, ACL
- 2004
NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware. NLTK covers symbolic and…
Improvements in Part-of-Speech Tagging with an Application to German
- Education
- 1999
This paper presents a meta-modelling system that automates the very labor-intensive and therefore time-heavy and expensive process of manually tagging part-of-speech content in a variety of languages.
Overview of TASS 2018: Opinions, Health and Emotions
- Political Science, TASS@SEPLN
- 2018
This work has been partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER), the projects REDES (TIN2015-65136-C2-1-R, TIN2015-65136-C2-2-R) and SMART-DASCI…