High-risk learning: acquiring new word vectors from tiny data

@inproceedings{herbelot2017high,
  title={High-risk learning: acquiring new word vectors from tiny data},
  author={Aur{\'e}lie Herbelot and Marco Baroni},
  booktitle={Conference on Empirical Methods in Natural Language Processing},
  year={2017}
}
Distributional semantics models are known to struggle with small data. It is generally accepted that in order to learn ‘a good vector’ for a word, a model must have sufficient examples of its usage. This is at odds with the fact that humans can guess the meaning of a word from only a few occurrences. In this paper, we show that a neural language model such as Word2Vec requires only minor modifications to its standard architecture to learn new terms from tiny data, using background knowledge…
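The abstract's idea can be sketched concretely: initialise the unknown word's vector as the sum of the pretrained vectors of its context words, then take a few unusually aggressive, quickly decaying gradient steps on a skip-gram-style objective. The snippet below is a minimal illustrative sketch, not the authors' actual implementation; the toy vocabulary, dimensionality, and hyperparameters are all invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "background" embedding space standing in for a pretrained model
# (vocabulary and 50-d vectors are illustrative, not the paper's setup).
vocab = ["draws", "water", "from", "a", "well", "bucket", "rope"]
background = {w: rng.standard_normal(50) for w in vocab}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30, 30)))

def learn_nonce(context, background, lr=1.0, epochs=5, decay=0.5):
    """Learn a vector for an unseen word from one tiny context.

    Two ideas from the paper, in sketch form: (1) initialise the new
    vector as the sum of its context vectors (background knowledge),
    (2) take 'high-risk' updates with an unusually large learning
    rate that decays rapidly.
    """
    v = np.sum([background[w] for w in context], axis=0)
    for _ in range(epochs):
        for w in context:
            c = background[w]
            # Skip-gram-style positive update: pull v toward the context word.
            v = v + lr * (1.0 - sigmoid(v @ c)) * c
            # One random negative sample: push v away from it.
            neg = background[vocab[rng.integers(len(vocab))]]
            v = v - lr * sigmoid(v @ neg) * neg
        lr *= decay  # risky start, quick cool-down
    return v

v_nonce = learn_nonce(["draws", "water", "from", "well"], background)
```

Only the nonce vector is updated here; the background space is left frozen, which is what makes the large learning rate tolerable.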

Bad Form: Comparing Context-Based and Form-Based Few-Shot Learning in Distributional Semantic Models

It is shown that hyperparameters that have largely been ignored in previous work can consistently improve the performance of both baseline and advanced models, achieving a new state of the art on 4 out of 6 tasks.

Evaluating the Consistency of Word Embeddings from Small Data

This work addresses the evaluation of distributional semantic models trained on smaller, domain-specific texts, specifically, philosophical text, and proposes a measure of consistency which can be used as an evaluation metric when no in-domain gold-standard data is available.

Memory, Show the Way: Memory Based Few Shot Word Representation Learning

This paper proposes Mem2Vec, a memory-based embedding learning method that acquires high-quality word representations from fairly limited context by directly adapting the representations produced by a DSM with a long-term memory that guides its guess of a novel word.

One-shot and few-shot learning of word embeddings

This work highlights a simple technique by which deep recurrent networks can similarly exploit their prior knowledge to learn a useful representation for a new word from little data, which could make natural language processing systems much more flexible, by allowing them to learn continually from the new words they encounter.

Learning Semantic Representations for Novel Words: Leveraging Both Form and Context

This paper proposes an architecture that leverages both sources of information - surface-form and context - and shows that it results in large increases in embedding quality, and can be integrated into any existing NLP system and enhance its capability to handle novel words.

Few-Shot Representation Learning for Out-Of-Vocabulary Words

A novel hierarchical attention network-based embedding framework is proposed to serve as the neural regression function, in which the context information of a word is encoded and aggregated from K observations to predict an oracle embedding vector based on limited contexts.

Towards Incremental Learning of Word Embeddings Using Context Informativeness

This paper investigates the task of learning word embeddings from very sparse data in an incremental, cognitively-plausible way and incorporates informativeness in a previously proposed model of nonce learning, using it for context selection and learning rate modulation.

A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

A la carte embedding is introduced, a simple and general alternative to the usual word2vec-based approaches for building such representations that is based upon recent theoretical results for GloVe-like embeddings.
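The core recipe lends itself to a short sketch: fit a single linear transform that maps the average of a word's context vectors to that word's embedding, then apply the transform to the averaged contexts of an unseen word. Everything below (dimensions, the synthetic noise-free "context averages") is made up for illustration and is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 20

# Synthetic stand-ins: V holds pretrained embeddings (one row per word);
# U holds each word's averaged context vector. Here U is an exact linear
# image of V so the fit is near-perfect; real corpora would add noise.
V = rng.standard_normal((200, d))
M = 0.1 * rng.standard_normal((d, d))
U = V @ M

# Fit the linear "induction" transform A by least squares: A maps an
# averaged context vector to the corresponding word's embedding.
A, *_ = np.linalg.lstsq(U, V, rcond=None)
rel_err = np.linalg.norm(U @ A - V) / np.linalg.norm(V)

# Induce a vector for an unseen word from the average of its few contexts
# (the first five rows stand in for a novel word's context averages).
u_new = U[:5].mean(axis=0)
v_new = u_new @ A
```

The appeal of the approach is that the regression is fit once over the existing vocabulary; inducing a new word afterwards is a single matrix-vector product.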

Context and Embeddings in Language Modelling - an Exploration

This work explores different embedding models, data augmentation techniques and context selection strategies (subsampling on the input space) for real world language problems.

As the first approach to use machine learning techniques to learn similarities among questions for this particular data, it achieves satisfactory results, and the predictions obtained for the testing data and the visualization of the word embeddings in multidimensional space are reasonable.

Improving Word Representations via Global Context and Multiple Word Prototypes

A new neural network architecture is presented which learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and accounts for homonymy and polysemy by learning multiple embeddings per word.
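One common way to realise "multiple embeddings per word" is to cluster the context vectors of a word's occurrences and keep one prototype per cluster. The sketch below uses plain k-means on synthetic two-sense data; the dimensions, cluster count, and data are illustrative and do not reproduce the paper's architecture, which learns representations jointly with a neural language model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic context vectors for one ambiguous word: two senses, each a
# tight cloud around a different centre (illustrative data only).
centres = 3.0 * rng.standard_normal((2, 10))
contexts = np.vstack([c + 0.3 * rng.standard_normal((50, 10)) for c in centres])

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means; each resulting centroid is one word prototype."""
    r = np.random.default_rng(seed)
    centroids = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each context occurrence to its nearest prototype.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Recompute prototypes; keep the old one if a cluster empties.
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return centroids, labels

prototypes, labels = kmeans(contexts, k=2)
```

At lookup time, an occurrence is disambiguated by picking the prototype nearest to its current context vector.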

Better Word Representations with Recursive Neural Networks for Morphology

This paper combines recursive neural networks, in which each morpheme is a basic unit, with neural language models that consider contextual information when learning morphologically aware word representations, and proposes a novel model capable of building representations for morphologically complex words from their morphemes.

A Neural Probabilistic Language Model

This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.

Multimodal Word Meaning Induction From Minimal Exposure to Natural Text.

It is concluded that DSMs provide a convincing computational account of word learning even at the early stages in which a word is first encountered, and the way they build meaning representations can offer new insights into human language acquisition.

Natural Language Processing (Almost) from Scratch

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling.

Multimodal Distributional Semantics

This work proposes a flexible architecture to integrate text- and image-based distributional information, and shows in a set of empirical tests that the integrated model is superior to the purely text-based approach, and it provides somewhat complementary semantic information with respect to the latter.

Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics

This work adapts compositional methods originally developed for phrases to the task of deriving the distributional meaning of morphologically complex words from their parts, and demonstrates the usefulness of a compositional morphology component in distributional semantics.

Vector Space Models of Word Meaning and Phrase Meaning: A Survey

K. Erk, Lang. Linguistics Compass, 2012

This survey looks at the use of vector space models to describe the meaning of words and phrases: the phenomena that vector space models address, and the techniques that they use to do so.

Obtaining a Better Understanding of Distributional Models of German Derivational Morphology

A rank-based evaluation metric is introduced, which reveals the task to be challenging due to specific properties of German (compounding, capitalization), and shows that performance varies greatly between patterns and even among base-derived term pairs of the same pattern.

From Frequency to Meaning: Vector Space Models of Semantics

The goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs, and to provide pointers into the literature for those who are less familiar with the field.