Representing Numbers in NLP: a Survey and a Vision

@article{Thawani2021RepresentingNI,
  title={Representing Numbers in NLP: a Survey and a Vision},
  author={Avijit Thawani and Jay Pujara and Pedro A. Szekely and Filip Ilievski},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.13136}
}
NLP systems rarely give special consideration to numbers found in text. This starkly contrasts with the consensus in neuroscience that, in the brain, numbers are represented differently from words. We arrange recent NLP work on numeracy into a comprehensive taxonomy of tasks and methods. We break down the subjective notion of numeracy into 7 subtasks, arranged along two dimensions: granularity (exact vs approximate) and units (abstract vs grounded). We analyze the myriad representational… 
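
To make the exact-vs-approximate axis concrete, here is a minimal, self-contained sketch (not taken from the survey; the metric names and the factor-of-10 threshold are illustrative assumptions) contrasting exact-match evaluation of a predicted number with a log-scale, order-of-magnitude comparison:

import math

def exact_match(pred: float, gold: float) -> bool:
    # Exact granularity: the prediction must equal the gold value.
    return pred == gold

def log_mae(pred: float, gold: float) -> float:
    # Approximate granularity: error measured on a log10 (order-of-magnitude) scale.
    return abs(math.log10(abs(pred)) - math.log10(abs(gold)))

def within_one_order_of_magnitude(pred: float, gold: float) -> bool:
    # Lenient check: the prediction is within a factor of 10 of the gold value.
    return log_mae(pred, gold) <= 1.0

print(exact_match(72, 73))                    # False
print(round(log_mae(72, 73), 3))              # 0.006
print(within_one_order_of_magnitude(72, 73))  # True

Under a purely string-level view, 72 and 73 are simply different tokens; a magnitude-aware view treats them as nearly identical, which is the kind of distinction the taxonomy is organized around.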

Citations

Numeracy enhances the Literacy of Language Models
TLDR
This work finds a significant improvement in MWP for sentences containing numbers, shows that exponent embeddings are the best number encoders, yielding an over 2-point jump in prediction accuracy over a BERT baseline, and shows that these enhanced literacy skills also generalize to contexts without annotated numbers (a rough sketch of exponent embeddings appears just after this list of citing papers).
Investigating Numeracy Learning Ability of a Text-to-Text Transfer Model
TLDR
This work investigates the ability of the text-to-text transfer model (T5), which has outperformed its predecessors on conventional NLP tasks, to learn numeracy, considering four numeracy tasks.
Have You Seen That Number? Investigating Extrapolation in Question Answering Models
TLDR
This work rigorously tests state-of-the-art models on DROP, a numerical MRC dataset, proposes the E-digit number form to alleviate models' poor extrapolation, and reveals the need to treat numbers differently from regular words in text.
Numerical reasoning in machine reading comprehension tasks: are we there yet?
TLDR
This paper presents a controlled study on some of the top-performing model architectures for the task of numerical reasoning and suggests that the standard metrics are incapable of measuring progress towards such tasks.
NumGPT: Improving Numeracy Ability of Generative Pre-trained Models
TLDR
The experiment results show that NumGPT outperforms baseline models (e.g., GPT and GPT with DICE) on a range of numerical reasoning tasks such as measurement estimation, number comparison, math word problems, and magnitude classification.
Innovations in Neural Data-to-text Generation
TLDR
This survey draws boundaries separating data-to-text generation (DTG) from the rest of the natural language generation (NLG) landscape, provides an up-to-date synthesis of the literature, and highlights the stages of technological adoption from within and outside the greater NLG umbrella.
Learning Numeracy: A Simple Yet Effective Number Embedding Approach Using Knowledge Graph
TLDR
This work proposes a simple, easy-to-implement number embedding approach based on a knowledge graph; experimental results on various numeracy-related NLP tasks demonstrate the effectiveness and efficiency of the method.
Masked Measurement Prediction: Learning to Jointly Predict Quantities and Units from Textual Context
TLDR
A novel task, Masked Measurement Prediction (MMP), is introduced, in which a model learns to reconstruct a number together with its associated unit given masked text; it is useful both for training new numerically informed models and for evaluating the numeracy of existing systems.
MWP-BERT: Numeracy-Augmented Pre-training for Math Word Problem Solving
TLDR
This work argues that injecting numerical properties into symbolic placeholders with a contextualized representation learning schema offers a way out of the number representation dilemma, and builds MWP-BERT, an effective contextual number representation PLM.
Survey of Hallucination in Natural Language Generation
TLDR
This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated text in NLG by providing a broad overview of the research progress and challenges in the hallucination problem in NLG.
...
...
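
The exponent embeddings mentioned in the first citation above can be sketched roughly as follows: a numeral is bucketed by the exponent of its scientific-notation form, and the bucket indexes a learned vector, so numbers of similar magnitude share a representation. The class name, bucket range, dimensionality, and random initialization below are illustrative assumptions, not the cited paper's exact formulation.

import math
import numpy as np

class ExponentEmbedding:
    # Illustrative sketch: embed a numeral by its base-10 exponent (order of magnitude).
    def __init__(self, min_exp: int = -4, max_exp: int = 12, dim: int = 16, seed: int = 0):
        self.min_exp, self.max_exp = min_exp, max_exp
        rng = np.random.default_rng(seed)
        # One vector per exponent bucket; in a real model these would be trained parameters.
        self.table = rng.normal(size=(max_exp - min_exp + 1, dim))

    def __call__(self, value: float) -> np.ndarray:
        exp = math.floor(math.log10(abs(value))) if value != 0 else self.min_exp
        exp = max(self.min_exp, min(self.max_exp, exp))  # clamp out-of-range magnitudes
        return self.table[exp - self.min_exp]

emb = ExponentEmbedding()
print(np.allclose(emb(72), emb(73)))     # True: 72 and 73 share the exponent-1 bucket
print(np.allclose(emb(50000), emb(72)))  # False: 50000 falls in the exponent-4 bucket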

References

Showing 1-10 of 77 references
Exploring Numeracy in Word Embeddings
TLDR
Inspired by cognitive studies on how humans perceive numbers, an analysis framework is developed to test how well word embeddings capture two essential properties of numbers: magnitude and numeration.
Do NLP Models Know Numbers? Probing Numeracy in Embeddings
TLDR
This work investigates the numerical reasoning capabilities of a state-of-the-art question answering model on the DROP dataset and finds this model excels on questions that require numerical reasoning, i.e., it already captures numeracy.
Probing for Multilingual Numerical Understanding in Transformer-Based Language Models
TLDR
Novel multilingual probing tasks tested on DistilBERT, XLM, and BERT find evidence that the information encoded in these pretrained models’ embeddings is sufficient for grammaticality judgments but generally not for value comparisons.
Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers
TLDR
This paper explores different strategies for modelling numerals with language models, such as memorisation and digit-by-digit composition, and proposes a novel neural architecture that uses a continuous probability density function to model numerals from an open vocabulary.
Visual sense of number vs. sense of magnitude in humans and machines
TLDR
Deep neural networks are tested on the same numerosity comparison task that was administered to human participants, using a stimulus space that allows precise measurement of the contribution of non-numerical features, suggesting that numerosity is a major, salient property of the visual environment.
NumNet: Machine Reading Comprehension with Numerical Reasoning
TLDR
A numerical MRC model named NumNet is proposed, which utilizes a numerically-aware graph neural network to incorporate comparison information and perform numerical reasoning over numbers in the question and passage, outperforming all existing machine reading comprehension models by considering the numerical relations among numbers.
Investigating the Limitations of the Transformers with Simple Arithmetic Tasks
TLDR
It is found that how a number is represented in its surface form has a strong influence on the model’s accuracy, and this result bolsters evidence that subword tokenizers and positional encodings are components in current transformer designs that might need improvement.
Birds Have Four Legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models
TLDR
This work investigates whether, and to what extent, numerical commonsense knowledge can be induced from pre-trained language models (PTLMs), as well as the robustness of this process, and finds that such induction may not work well for numerical commonsense knowledge.
Do Language Embeddings capture Scales?
TLDR
This work identifies contextual information in pre-training and numeracy as two key factors affecting model performance, and shows that a simple method of canonicalizing numbers can have a significant effect on the results (a rough sketch of such canonicalization appears at the end of this reference list).
Core systems of number
...
...
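
The number canonicalization mentioned in the scales-probing reference above can be approximated as rewriting every numeral into a fixed scientific-notation surface form before it reaches the model, so that magnitude is exposed explicitly in the token string. The regular expression and the three-significant-digit format below are assumptions for illustration; the cited paper's exact canonical form may differ.

import re

def canonicalize_number(token: str, sig_digits: int = 3) -> str:
    # Rewrite a numeral into scientific notation, e.g. '72,300' -> '7.23e+04'.
    try:
        value = float(token.replace(",", ""))  # tolerate thousands separators
    except ValueError:
        return token  # leave non-numeric tokens untouched
    return f"{value:.{sig_digits - 1}e}"

def canonicalize_text(text: str) -> str:
    # Apply the rewrite to every number-like token in a sentence.
    return re.sub(r"\d[\d,]*(?:\.\d+)?", lambda m: canonicalize_number(m.group(0)), text)

print(canonicalize_text("The stadium holds 72,300 people and opened in 1926."))
# -> The stadium holds 7.23e+04 people and opened in 1.93e+03.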