Word Embeddings, Analogies, and Machine Learning: Beyond king - man + woman = queen
TLDR
We show that the information not detected by linear offset may still be recoverable by a more sophisticated search method, and thus is actually encoded in the embedding.
  • 76
  • 15
Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn't
TLDR
We present a balanced test set with 99,200 questions in 40 categories, and we systematically examine how accuracy for different categories is affected by window size and dimensionality of the SVD-based word embeddings.
  • 114
  • 14
Intrinsic Evaluations of Word Embeddings: What Can We Do Better?
TLDR
This paper presents an analysis of existing methods for the intrinsic evaluation of word embeddings.
  • 67
  • 3
Investigating Different Syntactic Context Types and Context Representations for Learning Word Embeddings
TLDR
We provide a systematic investigation of 4 different syntactic context types and context representations for learning word embeddings.
  • 28
  • 2
The (Too Many) Problems of Analogical Reasoning with Word Vectors
TLDR
We argue against such “linguistic regularities” as a model for linguistic relations in vector space models and as a benchmark, and we show that the vector offset (as well as two other, better-performing methods) suffers from dependence on vector similarity.
  • 41
  • 1
Subword-level Composition Functions for Learning Word Embeddings
TLDR
We propose CNN- and RNN-based composition functions for learning word embeddings, and systematically compare them with popular word-level and subword-level models (Skip-Gram and FastText).
  • 15
  • 1
GPU-Accelerated Large-Scale Distributed Sorting Coping with Device Memory Capacity
TLDR
We analyze the performance of several state-of-the-art distributed sorting algorithms and perform a case study of HykSort, an existing splitter-based algorithm, by offloading costly computation phases to GPUs.
  • 8
  • 1
Scaling Word2Vec on Big Corpus
TLDR
Word embedding has been well accepted as an important feature in the area of natural language processing (NLP).
  • 11
Large-scale distributed sorting for GPU-based heterogeneous supercomputers
TLDR
We investigate the applicability of GPU devices to splitter-based algorithms and extend HykSort, an existing splitter-based algorithm, by offloading costly computation phases to GPUs.
  • 9