GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge transfer across tasks.
SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation
SimLex-999 is presented, a gold standard resource for evaluating distributional semantic models that improves on existing resources in several important ways, and explicitly quantifies similarity rather than association or relatedness so that pairs of entities that are associated but not actually similar have a low rating.
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
A new benchmark styled after GLUE is presented, comprising a set of more difficult language understanding tasks, a software toolkit, and a public leaderboard.
The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations
There is a sweet spot, not too big and not too small, between single words and full sentences that allows the most meaningful information in a text to be effectively retained and recalled, and models which store explicit representations of long-term contexts outperform state-of-the-art neural language models at predicting semantic content words.
Learning Distributed Representations of Sentences from Unlabelled Data
A systematic comparison of models that learn distributed phrase or sentence representations from unlabelled data finds that the optimal approach depends critically on the intended application.
Measuring abstract reasoning in neural networks
A dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test, is proposed and ways to both measure and induce stronger abstract reasoning in neural networks are introduced.
SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity
SimVerb-3500, an evaluation resource that provides human ratings for the similarity of 3,500 verb pairs, is introduced, hoping that it will enable a richer understanding of the diversity and complexity of verb semantics and guide the development of systems that can effectively represent and interpret this meaning.
Grounded Language Learning in a Simulated 3D World
An agent is presented that learns to interpret language in a simulated 3D environment where it is rewarded for the successful execution of written instructions, and its comprehension of language extends beyond its prior experience, enabling it to apply familiar language to unfamiliar situations and to interpret entirely novel instructions.
Learning to Understand Phrases by Embedding the Dictionary
This work proposes using the definitions found in everyday dictionaries as a means of bridging the gap between lexical and phrasal semantics, and presents two applications of these architectures: reverse dictionaries that return the name of a concept given a definition or description and general-knowledge crossword question answerers.
HyperLex: A Large-Scale Evaluation of Graded Lexical Entailment
We introduce HyperLex—a data set and evaluation resource that quantifies the extent of semantic category membership, that is, the type-of relation, also known as hyponymy–hypernymy or lexical entailment.