Publications
Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t.
TLDR
This study applies the widely used vector offset method to four types of linguistic relations (inflectional and derivational morphology, lexicographic and encyclopedic semantics) and systematically examines how accuracy on each category is affected by the window size and dimensionality of the SVD-based word embeddings.
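To make the vector offset method concrete, here is a minimal sketch (not the study's code): for an analogy a : b :: c : ?, the answer is taken to be the vocabulary word whose vector lies closest, by cosine similarity, to b − a + c. The function name, the vocabulary/vector inputs, and the exclusion of query words are illustrative assumptions.

```python
import numpy as np

def vector_offset_analogy(a, b, c, vocab, vectors, exclude_queries=True):
    """Return the word whose embedding is closest (by cosine) to b - a + c."""
    # Normalise rows so that dot products are cosine similarities.
    norm = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    idx = {w: i for i, w in enumerate(vocab)}
    target = norm[idx[b]] - norm[idx[a]] + norm[idx[c]]
    sims = norm @ (target / np.linalg.norm(target))
    if exclude_queries:
        for w in (a, b, c):
            sims[idx[w]] = -np.inf  # standard protocol: never return a query word
    return vocab[int(np.argmax(sims))]

# e.g. vector_offset_analogy("man", "king", "woman", vocab, vectors)
# would ideally return "queen" for suitable embeddings.
```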
Word Embeddings, Analogies, and Machine Learning: Beyond king - man + woman = queen
TLDR
It is shown that simple averaging over multiple word pairs improves on the state of the art, and that a further gain in accuracy comes from combining cosine similarity with an estimate of the extent to which a candidate answer belongs to the correct word class.
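A minimal sketch of the averaging idea described above (the word-class term is omitted, and all names are illustrative): the offsets of several known pairs of the same relation are averaged, applied to the source word, and candidates are ranked by cosine similarity.

```python
import numpy as np

def avg_offset_analogy(pairs, c, vocab, vectors):
    """Average the offsets of several (a, b) pairs of one relation,
    apply the averaged offset to word c, and rank candidates by cosine."""
    norm = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    idx = {w: i for i, w in enumerate(vocab)}
    offset = np.mean([norm[idx[b]] - norm[idx[a]] for a, b in pairs], axis=0)
    target = norm[idx[c]] + offset
    sims = norm @ (target / np.linalg.norm(target))
    sims[idx[c]] = -np.inf  # exclude the query word itself
    return vocab[int(np.argmax(sims))]
```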
Intrinsic Evaluations of Word Embeddings: What Can We Do Better?
TLDR
This paper argues for a shift from abstract ratings of word embedding “quality” to exploration of their strengths and weaknesses, in order to do justice to distributional meaning representations.
The (too Many) Problems of Analogical Reasoning with Word Vectors
TLDR
This paper argues against such “linguistic regularities” both as a model of linguistic relations in vector space models and as a benchmark, and shows that the vector offset method suffers from a dependence on vector similarity.
Investigating Different Syntactic Context Types and Context Representations for Learning Word Embeddings
TLDR
This paper provides a systematic investigation of four different syntactic context types and context representations for learning word embeddings, intended to help practitioners choose the best context type and representation for a given task.
Subword-level Composition Functions for Learning Word Embeddings
TLDR
CNN- and RNN-based subword-level composition functions for learning word embeddings are proposed and evaluated on a set of intrinsic and extrinsic tasks, showing that subword-level models have an advantage on tasks related to morphology and on datasets with a high OOV rate, and can be combined with other types of embeddings.
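As a rough illustration of a CNN-based subword-level composition function (not the paper's implementation; layer choices and dimensions are placeholders), a character-level CNN can embed characters, convolve over positions, and max-pool into a single word vector:

```python
import torch
import torch.nn as nn

class CharCNNComposer(nn.Module):
    """Compose a word embedding from its character sequence."""
    def __init__(self, n_chars=1000, char_dim=32, word_dim=300, kernel=3):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, word_dim, kernel_size=kernel, padding=1)

    def forward(self, char_ids):           # char_ids: (batch, max_word_len)
        x = self.char_emb(char_ids)        # (batch, len, char_dim)
        x = x.transpose(1, 2)              # (batch, char_dim, len)
        x = torch.relu(self.conv(x))       # (batch, word_dim, len)
        return x.max(dim=2).values         # max-pool over positions -> word vector
```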
Migrating Legacy Fortran to Python While Retaining Fortran-Level Performance through Transpilation and Type Hints
TLDR
A framework implementing two-way transpilation of Python code with just-in-time compilation, leveraging the type hints mechanism introduced in Python 3.5, achieves performance equivalent to that of Python manually translated to Fortran and better than other currently available JIT alternatives.
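For context, this is the kind of type-hinted numerical kernel such a transpiler could target (a hypothetical example, not taken from the paper): the PEP 484 annotations, available since Python 3.5, declare the argument and return types that a static Fortran translation would need.

```python
from typing import List

def saxpy(a: float, x: List[float], y: List[float]) -> List[float]:
    """Element-wise a*x + y; the type hints make every type statically known."""
    return [a * xi + yi for xi, yi in zip(x, y)]

print(saxpy(2.0, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # [6.0, 9.0, 12.0]
```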
Subcharacter Information in Japanese Embeddings: When Is It Worth It?
TLDR
This work examines whether the subcharacter information recently shown to be beneficial for a number of intrinsic and extrinsic tasks in Chinese is also worth using for Japanese, and contributes a new analogy dataset for Japanese.
Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics
TLDR
This paper presents a case study of generalization in NLI (from MNLI to the adversarially constructed HANS dataset) across a range of BERT-based architectures, as well as with subsampled data and increased model size, providing insights into how Transformer-based models learn to generalize.
Learning Neural Representations for Predicting GPU Performance
TLDR
This work extends a previously proposed collaborative filtering based modeling technique to build an analytical model that can predict the performance of applications across different GPU systems, and improves on the state-of-the-art matrix factorization approach by building a multi-layer perceptron.
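A rough sketch of the modeling idea described above (not the authors' architecture; dimensions are illustrative): replace the dot product of matrix factorization with a multi-layer perceptron applied to learned application and GPU embeddings.

```python
import torch
import torch.nn as nn

class PerfMLP(nn.Module):
    """Predict a performance score for an (application, GPU) pair."""
    def __init__(self, n_apps, n_gpus, dim=32, hidden=64):
        super().__init__()
        self.app_emb = nn.Embedding(n_apps, dim)
        self.gpu_emb = nn.Embedding(n_gpus, dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),       # predicted performance score
        )

    def forward(self, app_ids, gpu_ids):
        z = torch.cat([self.app_emb(app_ids), self.gpu_emb(gpu_ids)], dim=-1)
        return self.mlp(z).squeeze(-1)
```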
...