Combining Distributed Vector Representations for Words

@inproceedings{garten_combining,
  title={Combining Distributed Vector Representations for Words},
  author={Justin Garten and Kenji Sagae and Volkan Ustun and Morteza Dehghani},
}
Recent interest in distributed vector representations for words has resulted in an increased diversity of approaches, each with strengths and weaknesses. We demonstrate how diverse vector representations may be inexpensively composed into hybrid representations, effectively leveraging strengths of individual components, as evidenced by substantial improvements on a standard word analogy task. We further compare these results over different sizes of training sets and find these advantages are… 
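
The hybrid composition described in the abstract can be as inexpensive as concatenating each model's vector for a word. A minimal sketch with toy vectors (the model names, dimensions, and values are illustrative, not taken from the paper):

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length so no single model dominates the hybrid."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def hybrid_vector(word, models):
    """Concatenate the unit-normalized vectors a word receives from each model."""
    return np.concatenate([normalize(m[word]) for m in models])

# Toy 3-d "embeddings" standing in for, e.g., word2vec and GloVe outputs.
model_a = {"king": np.array([1.0, 0.0, 2.0])}
model_b = {"king": np.array([0.0, 3.0, 4.0])}

v = hybrid_vector("king", [model_a, model_b])
print(v.shape)  # (6,)
```

Per-model unit normalization is one simple way to keep a larger-magnitude component from dominating the concatenation; other weightings are possible.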

Tables from this paper

Estimating Distributed Representations of Compound Words Using Recurrent Neural Networks

The experimental results show that the RNN-based approach can estimate the distributed representations of compound words better than the average representation approach, which simply uses the average of individual word representations as an estimated representation of a compound word.
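
The average-representation baseline mentioned above is straightforward to state in code; a sketch with hypothetical 2-d embeddings:

```python
import numpy as np

def average_representation(compound, embeddings):
    """Estimate a compound's vector as the mean of its constituent word vectors."""
    parts = compound.split()
    return np.mean([embeddings[w] for w in parts], axis=0)

# Toy 2-d embeddings (illustrative values only).
emb = {"ice": np.array([1.0, 0.0]), "cream": np.array([0.0, 1.0])}
print(average_representation("ice cream", emb))  # [0.5 0.5]
```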

Asynchronous Training of Word Embeddings for Large Text Corpora

This paper proposes a scalable approach to training word embeddings by partitioning the input space, scaling to massive text corpora without sacrificing the performance of the embeddings.

Meta-Embedding Sentence Representation for Textual Similarity

By contrasting in-domain and pre-trained embedding models, it is shown under which conditions they can be jointly used for bottom-up sentence embeddings, and the first bottom-up meta-embedding representation at the sentence level for textual similarity is proposed.

dish2vec: A Comparison of Word Embedding Methods in an Unsupervised Setting

This paper elaborates on three popular word embedding methods — GloVe and two versions of word2vec (continuous skip-gram and continuous bag-of-words) — and addresses the instability of these methods with respect to their hyperparameters.

Bayesian Neural Word Embedding

Experimental results demonstrate the performance of the proposed scalable Bayesian neural word embedding algorithm for word analogy and similarity tasks on six different datasets and show it is competitive with the original Skip-Gram method.

Leveraging Meta-Embeddings for Bilingual Lexicon Extraction from Specialized Comparable Corpora

This work proposes the first systematic evaluation of different word embedding models for bilingual terminology extraction from specialized comparable corpora and emphasizes how the character-based embedding model outperforms other models on the quality of the extracted bilingual lexicons.

Word Embeddings, Analogies, and Machine Learning: Beyond king - man + woman = queen

It is shown that simple averaging over multiple word pairs improves over the state-of-the-art, and a further improvement in accuracy is achieved by combining cosine similarity with an estimation of the extent to which a candidate answer belongs to the correct word class.
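
The averaging-over-pairs idea can be sketched as follows: rather than deriving the offset from a single pair, it is averaged over several example pairs before being added to the query word's vector, and candidates are ranked by cosine similarity. The toy vectors below are illustrative only:

```python
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def analogy_by_avg_offset(pairs, query, candidates, emb):
    """Average the offset over several (a, b) example pairs, add it to the
    query word's vector, and return the candidate closest by cosine."""
    offset = np.mean([emb[b] - emb[a] for a, b in pairs], axis=0)
    target = emb[query] + offset
    return max(candidates, key=lambda w: cos(emb[w], target))

# Toy vectors where dimension 0 ~ "royalty" and dimension 1 ~ "gender".
emb = {
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "king":  np.array([2.0, 1.0]),
    "queen": np.array([2.0, -1.0]),
    "apple": np.array([-1.0, 0.0]),
}
ans = analogy_by_avg_offset([("man", "woman")], "king", ["queen", "apple"], emb)
print(ans)  # queen
```

With more example pairs in `pairs`, noise in any single pair's offset averages out, which is the improvement the summary above describes.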

Bilingual Word Embeddings for Bilingual Terminology Extraction from Specialized Comparable Corpora

Different word embedding models are explored and it is shown how a general-domain comparable corpus can enrich a specialized comparable corpus via neural networks.

Understanding Feature Focus in Multitask Settings for Lexico-semantic Relation Identification

Evaluation results over a set of gold-standard datasets show that combinations of similar features (feature sets) are beneficial, and that asymmetric distributional features are a strong cue for discriminating asymmetric relations and play an important role in multitask architectures.


References

Efficient Estimation of Word Representations in Vector Space

Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

Distributed Vector Representations of Words in the Sigma Cognitive Architecture

A new algorithm for learning distributed vector word representations from large, shallow information resources is described, along with how this algorithm can be implemented via small modifications to Sigma.

Dependency-Based Word Embeddings

The skip-gram model with negative sampling introduced by Mikolov et al. is generalized to include arbitrary contexts, and experiments with dependency-based contexts are performed, showing that they produce markedly different embeddings.

Linguistic Regularities in Continuous Space Word Representations

The vector-space word representations that are implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, and that each relationship is characterized by a relation-specific vector offset.

Representation Learning: A Review and New Perspectives

Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.

Low-Rank Tensors for Scoring Dependency Structures

This paper uses tensors to map high-dimensional feature vectors into low-dimensional representations of words in their syntactic roles, and to leverage modularity in the tensor for easy training with online algorithms.

Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors

An extensive evaluation of context-predicting models with classic, count-vector-based distributional semantic approaches, on a wide range of lexical semantics tasks and across many parameter settings shows that the buzz around these models is fully justified.
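
As a concrete example of the count-based side of this comparison, a context-counting vector space can be built from windowed co-occurrence counts reweighted with positive pointwise mutual information (PPMI). A minimal sketch on a toy corpus (not the evaluation setup of the paper):

```python
import numpy as np

def ppmi_vectors(sentences, window=2):
    """Count word-context co-occurrences in a symmetric window, then
    reweight with positive pointwise mutual information (PPMI)."""
    vocab = sorted({w for s in sentences for w in s})
    idx = {w: i for i, w in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    C[idx[w], idx[s[j]]] += 1
    total = C.sum()
    pw = C.sum(axis=1) / total          # marginal word probabilities
    pc = C.sum(axis=0) / total          # marginal context probabilities
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((C / total) / np.outer(pw, pc))
    return vocab, np.maximum(pmi, 0)    # clip negative values: PPMI

sents = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vocab, M = ppmi_vectors(sents)
print(vocab)  # ['cat', 'dog', 'sat', 'the']
```

Each row of `M` is then a count-based distributional vector for the corresponding vocabulary word.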

Indexing by Latent Semantic Analysis

A new method for automatic indexing and retrieval is described that takes advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries.
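
The latent "semantic structure" is extracted via a truncated singular value decomposition of the term-document matrix. A minimal numpy sketch (the terms, counts, and choice of k are illustrative):

```python
import numpy as np

def lsa(term_doc, k):
    """Truncated SVD of a term-document matrix: the rows of U_k * S_k are
    k-dimensional latent term vectors."""
    U, S, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k] * S[:k]  # term vectors in the latent space

# Toy term-document counts: 4 terms x 3 documents (illustrative only).
X = np.array([
    [2.0, 0.0, 1.0],   # "car"
    [1.0, 0.0, 2.0],   # "auto"
    [0.0, 3.0, 0.0],   # "flower"
    [0.0, 2.0, 1.0],   # "petal"
])
T = lsa(X, k=2)
print(T.shape)  # (4, 2)
```

Terms that never co-occur in a document can still end up close in the latent space when they occur in similar documents, which is what makes the method useful for retrieval.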

Representing word meaning and order information in a composite holographic lexicon.

A computational model that builds a holographic lexicon representing both word meaning and word order from unsupervised experience with natural language demonstrates that a broad range of psychological data can be accounted for directly from the structure of lexical representations learned in this way, without the need for complexity to be built into either the processing mechanisms or the representations.
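
In holographic models of this kind, binding (including order information) is typically encoded with circular convolution, computable efficiently via the FFT, and circular correlation approximately inverts the binding for random high-dimensional vectors. A sketch of just this binding operation, not the full model (dimensionality and distributions are illustrative):

```python
import numpy as np

def circular_convolution(a, b):
    """Bind two vectors with circular convolution, computed via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def circular_correlation(a, c):
    """Approximately unbind: recover b from c = a (*) b, for random a."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(c)))

rng = np.random.default_rng(0)
d = 512
a = rng.normal(0, 1 / np.sqrt(d), d)   # random "environment" vector
b = rng.normal(0, 1 / np.sqrt(d), d)
bound = circular_convolution(a, b)
b_hat = circular_correlation(a, bound)
sim = b @ b_hat / (np.linalg.norm(b) * np.linalg.norm(b_hat))
print(round(sim, 2))  # substantially positive for high-dimensional random vectors
```

The recovered `b_hat` is a noisy version of `b`; in a full model it would be cleaned up by comparison against the stored lexicon.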

Parallel Distributed Processing

There have been great breakthroughs in the understanding of cognition as a result of the development of expressive high-level computer languages and powerful algorithms.