Numeracy enhances the Literacy of Language Models

Avijit Thawani, Jay Pujara, Filip Ilievski
Specialized number representations in NLP have shown improvements on numerical reasoning tasks like arithmetic word problems and masked number prediction. But humans also use numeracy to make better sense of world concepts, e.g., you can seat 5 people in your ‘room’ but not 500. Does a better grasp of numbers improve a model’s understanding of other concepts and words? This paper studies the effect of using six different number encoders on the task of masked word prediction (MWP), as a proxy… 
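The masked word prediction (MWP) probe described in the abstract can be illustrated with a minimal sketch: a sentence keeps its number visible while a content word is masked, and a model is judged on whether it prefers words that fit the numeric context (e.g., 'room' for 5 seats, but not for 500). The helper below is hypothetical and only shows the probe's input format, not the paper's actual code or models.

```python
# Toy illustration of the MWP probe: mask a content word, keep the
# number visible, and record the gold word for scoring. All names are
# hypothetical; this is not the paper's implementation.

def make_mwp_example(template: str, number: int, target: str) -> dict:
    """Build one probe instance: the number stays in the input,
    the target word is replaced with a [MASK] token."""
    sentence = template.format(num=number, word="[MASK]")
    return {"input": sentence, "number": number, "gold": target}

# The abstract's example: 5 people fit in a room, 500 do not.
ex_small = make_mwp_example("You can seat {num} people in your {word}.", 5, "room")
ex_large = make_mwp_example("You can seat {num} people in your {word}.", 500, "stadium")
```

A numerate model should rank the gold word higher when the visible number matches its real-world scale.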


Masked Measurement Prediction: Learning to Jointly Predict Quantities and Units from Textual Context
A novel task, Masked Measurement Prediction (MMP), is introduced, in which a model learns to reconstruct a number together with its associated unit from masked text; it is useful both for training new numerically informed models and for evaluating the numeracy of existing systems.
FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining
This paper proposes FORTAP, the first exploration of leveraging spreadsheet formulas for table pretraining with tree attention, and introduces two novel self-supervised pretraining objectives derived from formulas: numerical reference prediction (NRP) and numerical calculation prediction (NCP).
Unsupervised Numerical Reasoning to Extract Phenotypes from Clinical Text by Leveraging External Knowledge
A novel unsupervised methodology leveraging external knowledge and contextualized word embeddings from ClinicalBERT for numerical reasoning in a variety of phenotypic contexts is presented.


Do NLP Models Know Numbers? Probing Numeracy in Embeddings
This work investigates the numerical reasoning capabilities of a state-of-the-art question answering model on the DROP dataset and finds this model excels on questions that require numerical reasoning, i.e., it already captures numeracy.
Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers
This paper explores different strategies for modelling numerals with language models, such as memorisation and digit-by-digit composition, and proposes a novel neural architecture that uses a continuous probability density function to model numerals from an open vocabulary.
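The digit-by-digit composition strategy mentioned above can be sketched in a few lines: instead of treating each numeral as an opaque vocabulary item (memorisation), the numeral is split into per-digit tokens so the model composes any number from a fixed, tiny alphabet. This is a hedged toy sketch of the tokenization step only, not the paper's architecture.

```python
def digit_tokens(numeral: str) -> list:
    """Split a numeral string into per-character tokens, keeping any
    decimal point, so an open-vocabulary model never sees an OOV number."""
    return list(numeral)

digit_tokens("3.14")  # -> ["3", ".", "1", "4"]
```

Because the digit alphabet is closed, this handles numerals of any magnitude without growing the vocabulary.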
Probing for Multilingual Numerical Understanding in Transformer-Based Language Models
Novel multilingual probing tasks tested on DistilBERT, XLM, and BERT find evidence that the information encoded in these pretrained models’ embeddings is sufficient for grammaticality judgments but generally not for value comparisons.
Injecting Numerical Reasoning Skills into Language Models
This work shows that numerical reasoning is amenable to automatic data generation, and thus one can inject this skill into pre-trained LMs, by generating large amounts of data, and training in a multi-task setup.
Learning Numeral Embedding
Two novel numeral embedding methods that can handle the out-of-vocabulary (OOV) problem for numerals are proposed, and their effectiveness is shown on four intrinsic and extrinsic tasks: word similarity, embedding numeracy, numeral prediction, and sequence labeling.
Representing Numbers in NLP: a Survey and a Vision
This work synthesizes best practices for representing numbers in text and articulates a vision for holistic numeracy in NLP, comprising design trade-offs and a unified evaluation.
An Empirical Investigation of Contextualized Number Prediction
A suite of output distribution parameterizations is introduced that incorporates latent variables to add expressivity and better fit the natural distribution of numeric values in running text, combined with both recurrent and transformer-based encoder architectures.
Do Language Embeddings capture Scales?
This work identifies contextual information in pre-training and numeracy as two key factors affecting their performance, and shows that a simple method of canonicalizing numbers can have a significant effect on the results.
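The canonicalization idea in the summary above can be sketched simply: rewrite surface numerals into a single normalized form (here, scientific notation) before they reach the model, so that "1,234" and "1234" present the same magnitude cues. This helper is a hypothetical illustration, not the paper's method.

```python
def canonicalize(num_str: str) -> str:
    """Normalize a numeral string to compact scientific notation,
    e.g. '1,234' -> '1.234e3'. Assumes a plain decimal numeral,
    optionally with thousands separators."""
    value = float(num_str.replace(",", ""))
    mantissa, exponent = f"{value:e}".split("e")
    # Strip trailing zeros from the mantissa and the sign/padding
    # from the exponent for a canonical form.
    return f"{float(mantissa):g}e{int(exponent)}"

canonicalize("1,234")  # -> "1.234e3"
canonicalize("500")    # -> "5e2"
```

Presenting every number in one canonical form removes spurious surface variation, which is one plausible reason such normalization affects scale-probing results.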
Numeracy-600K: Learning Numeracy for Detecting Exaggerated Information in Market Comments
Through comprehensive experiments, this paper attempts to answer whether neural network models can learn numeracy, i.e., the ability to predict the magnitude of a numeral at a specific position in a text description.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.