• Corpus ID: 52299092

Atalaya at TASS 2018: Sentiment Analysis with Tweet Embeddings and Data Augmentation

@inproceedings{Luque2018AtalayaAT,
  title={Atalaya at TASS 2018: Sentiment Analysis with Tweet Embeddings and Data Augmentation},
  author={Franco M. Luque and Juan Manuel P{\'e}rez},
  booktitle={TASS@SEPLN},
  year={2018}
}
The TASS 2018 workshop proposes different challenges on semantic analysis in Spanish. This work presents our participation as team Atalaya in the task of polarity classification of tweets. We followed standard techniques in preprocessing, representation and classification, and also explored some novel ideas. In particular, to obtain tweet embeddings we trained subword-aware word embeddings and used a weighted scheme to average them. To deal with overfitting problems caused by training data scarcity… 
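
The weighted averaging of word embeddings mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which weighting scheme the authors used, so this example assumes a frequency-based weight a / (a + p(w)) in the spirit of the "Simple but Tough-to-Beat Baseline" paper listed in the references. The function name `weighted_tweet_embedding`, the smoothing constant `a`, and the toy vectors and counts are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def weighted_tweet_embedding(tokens, vectors, word_prob, a=1e-3):
    """Return a normalised weighted average of the word vectors of `tokens`.

    Assumption (not confirmed by the source): each word gets the weight
    a / (a + p(w)), so frequent words contribute less to the tweet vector.
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        vec = vectors.get(tok)
        if vec is None:
            # Out-of-vocabulary token: skipped here. A subword-aware model
            # could instead build a vector from character n-grams.
            continue
        w = a / (a + word_prob.get(tok, 0.0))
        for i, x in enumerate(vec):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known tokens: zero vector
    return [x / weight_sum for x in total]


# Toy data: two-dimensional vectors and unigram counts (hypothetical).
vectors = {"buen": [1.0, 0.0], "día": [0.0, 1.0], "de": [1.0, 1.0]}
counts = Counter({"de": 98, "buen": 1, "día": 1})
n_tokens = sum(counts.values())
word_prob = {w: c / n_tokens for w, c in counts.items()}

emb = weighted_tweet_embedding(["buen", "día", "de"], vectors, word_prob)
print(emb)
```

In this toy setup the very frequent token "de" receives a much smaller weight than "buen" and "día", so the resulting tweet vector is dominated by the rarer, more informative words.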

Atalaya at TASS 2019: Data Augmentation and Robust Embeddings for Sentiment Analysis
  • F. Luque
  • Computer Science
    IberLEF@SEPLN
  • 2019
TLDR
This article describes the Atalaya team's participation in TASS 2019, a shared task aimed at detecting the sentiment polarity of Spanish tweets. The team trained robust subword-aware word embeddings and computed tweet representations using a weighted-averaging strategy.
Atalaya at SemEval 2019 Task 5: Robust Embeddings for Tweet Classification
TLDR
This article describes the Atalaya team's participation in HatEval, a shared task aimed at detecting hate speech against immigrants and women. The team trained robust task-oriented subword-aware embeddings and computed tweet representations using a weighted-averaging strategy.
TASS 2018: The Strength of Deep Learning in Language Understanding Tasks
TLDR
Two new tasks focused on semantic relation extraction in the health domain and emotion classification in the news domain were added to the two traditional tasks of TASS, namely sentiment analysis at tweet level and aspect level.
Overview of TASS 2019: One More Further for the Global Spanish Sentiment Analysis Corpus
TLDR
This paper summarizes the approaches and the results of the submitted systems of the different groups for each task in the TASS workshop, and proposes a new approach to sentiment analysis at tweet level.
Neural Language Model Based Training Data Augmentation for Weakly Supervised Early Rumor Detection
  • Sooji Han, Jie Gao, F. Ciravegna
  • Computer Science
    2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
  • 2019
TLDR
Preliminary experiments with a state-of-the-art deep learning-based rumor detection model show that augmented data can alleviate over-fitting and class imbalance caused by limited train data and can help to train complex neural networks (NNs).
Deep Learning Hyper-parameter Tuning for Sentiment Analysis in Twitter based on Evolutionary Algorithms
TLDR
This work proposes the use of the evolutionary algorithm SHADE for the optimisation of the configuration of a deep learning model for the task of sentiment analysis in Twitter, and shows that the hyper-parameters found by the evolutionary algorithms enhance the performance of the deep learning method.
Learning Data Augmentation Schedules for Natural Language Processing
TLDR
This paper investigates whether data-driven augmentation scheduling and the integration of a wider set of transformations can lead to improved performance where fixed and limited policies were unsuccessful and suggests that, while this approach can help the training process in some settings, the improvements are unsubstantial.
Overview of TASS 2018: Opinions, Health and Emotions
This work has been partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER), the projects REDES (TIN2015-65136-C2-1-R, TIN2015-65136-C2-2-R) and SMART-DASCI

References

A Simple but Tough-to-Beat Baseline for Sentence Embeddings
Thumbs up? Sentiment Classification using Machine Learning Techniques
TLDR
This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Enriching Word Vectors with Subword Information
TLDR
This paper proposes a new approach based on the skip-gram model in which each word is represented as a bag of character n-grams and a word's vector is the sum of these n-gram representations; it achieves state-of-the-art performance on word similarity and analogy tasks.
Character-level Convolutional Networks for Text Classification
TLDR
This article constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results in text classification.
The democratization of deep learning in TASS 2017
This research work is partially supported by REDES project (TIN2015-65136-C2-1-R) and SMART project (TIN2017-89517-P) from the Spanish Government, and a grant from the Fondo Europeo de Desarrollo
Distributed Representations of Words and Phrases and their Compositionality
TLDR
This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
Vocal Tract Length Perturbation (VTLP) improves speech recognition
TLDR
Improvements in speech recognition are suggested without increasing the number of training epochs, and it is suggested that data transformations should be an important component of training neural networks for speech, especially for data limited projects.
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Audio augmentation for speech recognition
TLDR
This paper investigates audio-level speech augmentation methods which directly process the raw signal, and presents results on 4 different LVCSR tasks with training data ranging from 100 hours to 1000 hours, to examine the effectiveness of audio augmentation in a variety of data scenarios.
Improvements in Part-of-Speech Tagging with an Application to German
TLDR
This paper presents a meta-modelling system that automates the very labor-intensive and therefore time-heavy and expensive process of manually tagging part-of-speech content in a variety of languages.