Enriching Word Vectors with Subword Information
@article{Bojanowski2017EnrichingWV,
  title   = {Enriching Word Vectors with Subword Information},
  author  = {P. Bojanowski and E. Grave and Armand Joulin and Tomas Mikolov},
  journal = {Transactions of the Association for Computational Linguistics},
  year    = {2017},
  volume  = {5},
  pages   = {135-146}
}
Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. [...] Key Method: a vector representation is associated with each character n-gram, and words are represented as the sum of these representations. Our method is fast, allowing us to train models on large corpora quickly, and lets us compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, both on word [...]
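The key method above (a word vector built as the sum of its character n-gram vectors) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the fastText implementation: real fastText hashes n-grams into a fixed number of buckets for memory efficiency, whereas `ngram_vecs` here is a hypothetical exact lookup table.

```python
def char_ngrams(word, nmin=3, nmax=6):
    """Extract character n-grams of length nmin..nmax from a word
    wrapped in boundary symbols < and >, as described in the paper."""
    w = f"<{word}>"
    grams = set()
    for n in range(nmin, nmax + 1):
        for i in range(len(w) - n + 1):
            grams.add(w[i:i + n])
    grams.add(w)  # the paper also keeps the whole word as one unit
    return grams


def word_vector(word, ngram_vecs, dim):
    """Sum the vectors of a word's n-grams.

    `ngram_vecs` is a hypothetical dict mapping n-gram strings to
    vectors; n-grams without an entry contribute nothing, which is how
    out-of-vocabulary words still receive a (partial) representation.
    """
    v = [0.0] * dim
    for g in char_ngrams(word):
        gv = ngram_vecs.get(g)
        if gv is not None:
            v = [a + b for a, b in zip(v, gv)]
    return v
```

For example, with n = 3 the word "where" yields the n-grams `<wh`, `whe`, `her`, `ere`, `re>`, plus the full sequence `<where>`, matching the worked example in the paper.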
Supplemental Code
GitHub repo (via Papers with Code): library for fast text representation and classification.
4,320 Citations
- Learning to Generate Word Representations using Subword Information. COLING 2018. 13 citations. Highly Influenced.
- Morphological Skip-Gram: Using morphological knowledge to improve word representation. ArXiv 2020.
- An Adaptive Wordpiece Language Model for Learning Chinese Word Embeddings. 2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), 2019. 1 citation.
- Measuring Enrichment Of Word Embeddings With Subword And Dictionary Information. 2019. Highly Influenced.
- Named Entity Recognition in Russian with Word Representation Learned by a Bidirectional Language Model. AINL 2018. 2 citations.
- Probabilistic FastText for Multi-Sense Word Embeddings. ACL 2018. 42 citations. Highly Influenced.