Corpus ID: 237513831

FORTAP: Using Formulae for Numerical-Reasoning-Aware Table Pretraining

Zhoujun Cheng, Haoyu Dong, Fan Cheng, Ran Jia, Pengfei Wu, Shi Han, Dongmei Zhang
Tables store rich numerical data, but numerical reasoning over tables is still a challenge. In this paper, we find that the spreadsheet formula, which performs calculations on numerical values in tables, naturally provides strong supervision for numerical reasoning. More importantly, large amounts of spreadsheets with expert-made formulae are available on the web and can be obtained easily. FORTAP is the first method for numerical-reasoning-aware table pretraining by leveraging a large corpus of…

Learning to Reason for Text Generation from Scientific Tables
SciGen is the first dataset that assesses the arithmetic reasoning capabilities of generation models on complex input structures, i.e., tables from scientific articles and their corresponding descriptions. One of the main bottlenecks for this task is the lack of proper automatic evaluation metrics.
Semantic Structure Extraction for Spreadsheet Tables with a Multi-task Learning Architecture
This work proposes a multi-task framework that jointly learns table regions, structural components, and cell types, and builds a large human-labeled dataset with broad coverage of table structures.
TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data
TaBERT is a pretrained LM that jointly learns representations for NL sentences and (semi-)structured tables. It achieves new best results on the challenging weakly-supervised semantic parsing benchmark WikiTableQuestions, while performing competitively on the text-to-SQL dataset Spider.
HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation
This work presents HiTab, a free and open dataset for the research community to study question answering (QA) and natural language generation (NLG) over hierarchical tables, and devises a novel hierarchy-aware logical form for symbolic reasoning over tables, which shows high effectiveness.
SpreadsheetCoder: Formula Prediction from Semi-structured Context
This work proposes SPREADSHEETCODER, a BERT-based model architecture to represent the tabular context in both row-based and column-based formats, and achieves a top-1 prediction accuracy of 42.51%, which is a considerable improvement over baselines that do not employ rich tabular context.
Melford: Using Neural Networks to Find Spreadsheet Errors
This paper shows that applying neural networks to spreadsheets allows us to find an important class of error with high precision, and uses a spatial abstraction of the cells around a particular cell to build a classifier that predicts whether a cell should contain a formula whenever it contains a number.
Active Learning for Spreadsheet Cell Classification
This paper investigates a semi-supervised approach called Active Learning (AL) that can be used to train classification models by selecting only the most informative examples from an unlabeled dataset, and implements an AL cycle for spreadsheet cell classification.
Neural-Guided Deductive Search for Real-Time Program Synthesis from Examples
This work proposes Neural Guided Deductive Search (NGDS), a hybrid synthesis technique that combines the best of both symbolic logic techniques and statistical models. It produces programs that satisfy the provided specifications by construction and generalize well on unseen examples, similar to data-driven systems.
A grammar for spreadsheet formulas evaluated on two large datasets
This paper presents a grammar for spreadsheet formulas that is compatible with the spreadsheet formula language, is compact enough to feasibly implement with a parser generator, and produces parse trees aimed at further manipulation and analysis.
Compositional Semantic Parsing on Semi-Structured Tables
This paper proposes a logical-form driven parsing algorithm guided by strong typing constraints and shows that it obtains significant improvements over natural baselines and is made publicly available.