• Publications
Neural Word Embedding as Implicit Matrix Factorization
It is shown that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks, and it is conjectured that SGNS's advantage on analogy questions stems from the weighted nature of its factorization.
Improving Distributional Similarity with Lessons Learned from Word Embeddings
It is revealed that many of the performance gains of word embeddings are due to certain system design choices and hyperparameter optimizations rather than the embedding algorithms themselves, and that these modifications can be transferred to traditional distributional models, yielding similar gains.
Universal Dependencies v1: A Multilingual Treebank Collection
This paper describes v1 of the universal guidelines, the underlying design principles, and the currently available treebanks for 33 languages, and highlights the need for sound comparative evaluation and cross-lingual learning experiments.
word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method
This note is an attempt to explain equation (4) (negative sampling) in "Distributed Representations of Words and Phrases and their Compositionality" by Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado and Jeffrey Dean.
Dependency-Based Word Embeddings
The skip-gram model with negative sampling introduced by Mikolov et al. is generalized to include arbitrary contexts, and experiments with dependency-based contexts are performed, showing that they produce markedly different embeddings.
Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations
The effectiveness of the BiLSTM approach is demonstrated by applying it to a greedy transition-based parser as well as to a globally optimized graph-based parser.
Linguistic Regularities in Sparse and Explicit Word Representations
It is demonstrated that analogy recovery is not restricted to neural word embeddings, and that a similar amount of relational similarities can be recovered from traditional distributional word representations.
Improving Hypernymy Detection with an Integrated Path-based and Distributional Method
An improved path-based algorithm is suggested, in which the dependency paths are encoded using a recurrent neural network, that achieves results comparable to distributional methods.
A Primer on Neural Network Models for Natural Language Processing
  • Yoav Goldberg
  • Computer Science
    J. Artif. Intell. Res.
  • 2 October 2015
This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with the neural techniques.
Neural Network Methods for Natural Language Processing
This book focuses on the application of neural network models to natural language data, and introduces more specialized neural network architectures, including 1D convolutional neural networks, recurrent neural networks, conditioned-generation models, and attention-based models.