Corpus ID: 214667372

Heavy-tailed Representations, Text Polarity Classification & Data Augmentation

@article{Jalalzai2020HeavytailedRT,
  title={Heavy-tailed Representations, Text Polarity Classification \& Data Augmentation},
  author={Hamid Jalalzai and Pierre Colombo and C. Clavel and {\'E}ric Gaussier and Giovanna Varni and Emmanuel Vignon and A. Sabourin},
  journal={ArXiv},
  year={2020},
  volume={abs/2003.11593}
}
The dominant approaches to text representation in natural language rely on learning, from massive corpora, embeddings that have convenient properties such as compositionality and distance preservation. In this paper, we develop a novel method to learn a heavy-tailed embedding with desirable regularity properties regarding the distributional tails, which makes it possible to analyze the points far away from the distribution bulk using the framework of multivariate extreme value theory. In particular, a…
Hierarchical Pre-training for Sequence Labelling in Spoken Dialog
Informative Clusters for Multivariate Extremes
