• Publications
Improving Vector Space Word Representations Using Multilingual Correlation
TLDR
This paper argues that lexico-semantic content should additionally be invariant across languages and proposes a simple technique based on canonical correlation analysis (CCA) for incorporating multilingual evidence into vectors generated monolingually.
DyNet: The Dynamic Neural Network Toolkit
TLDR
DyNet is a toolkit for implementing neural network models based on dynamic declaration of network structure. It has an optimized C++ backend and a lightweight graph representation, and is designed to let users implement their models in a way that is idiomatic in their preferred programming language.
Sparse Overcomplete Word Vector Representations
TLDR
This work proposes methods that transform word vectors into sparse (and optionally binary) vectors, which are more similar to the interpretable features typically used in NLP, though they are discovered automatically from raw corpora.
Cross-lingual Models of Word Embeddings: An Empirical Comparison
TLDR
It is shown that models which require expensive cross-lingual knowledge almost always perform better, but cheaply supervised models often prove competitive on certain tasks.
Problems With Evaluation of Word Embeddings Using Word Similarity Tasks
TLDR
This paper suggests that using word similarity tasks to evaluate word vectors is not sustainable and calls for further research on evaluation methods.
Morphological Inflection Generation Using Character Sequence to Sequence Learning
TLDR
This work models inflection generation as a character sequence-to-sequence learning problem and presents a variant of the neural encoder-decoder model for solving it, which is language independent and can be trained in both supervised and semi-supervised settings.
Learning To Split and Rephrase From Wikipedia Edit History
TLDR
It is shown that incorporating WikiSplit as training data produces a model with qualitatively better predictions that score 32 BLEU points above the prior best result on the WebSplit benchmark.
Evaluation of Word Vector Representations by Subspace Alignment
TLDR
This paper presents QVEC, a computationally inexpensive intrinsic evaluation measure of the quality of word embeddings based on alignment to a matrix of features extracted from manually crafted lexical resources. QVEC correlates strongly with the performance of the vectors in a battery of downstream semantic evaluation tasks.
Training and Evaluating a German Named Entity Recognizer with Semantic Generalization
TLDR
This work presents a freely available, optimized named entity recognizer for German that compensates for the small size of available German NER training corpora by using distributional generalization features trained on large unlabelled corpora.
Community Evaluation and Exchange of Word Vectors at wordvectors.org
TLDR
This work presents a website and suite of offline tools that facilitate evaluation of word vectors on standard lexical semantics benchmarks and permit exchange and archival by users who wish to find good vectors for their applications.