# Training a First-Order Theorem Prover from Synthetic Data

```bibtex
@article{Firoiu2021TrainingAF,
  title   = {Training a First-Order Theorem Prover from Synthetic Data},
  author  = {Vlad Firoiu and Eser Aygun and Ankit Anand and Zafarali Ahmed and Xavier Glorot and Laurent Orseau and Lei Zhang and Doina Precup and Shibl Mourad},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2103.03798}
}
```

A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models. To tackle this problem, we propose an approach that relies on training purely with synthetically generated theorems, without any human data aside from axioms. We use these theorems to train a neurally-guided saturation-based prover. Our neural prover outperforms the state-of-the-art E prover on this synthetic data in…
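The neurally-guided saturation-based proving the abstract refers to can be illustrated with a standard given-clause loop in which a learned model scores which clause to process next. The sketch below is hypothetical and not the paper's actual system: the `score` function stands in for the neural network (here just preferring shorter clauses), and `resolvents` is a toy propositional resolution rule rather than full first-order inference.

```python
# Minimal sketch of a given-clause saturation loop with a learned clause
# scorer. Clauses are tuples of literals; "~p" is the negation of "p".
# The scorer and inference rule are stand-ins, not the paper's model.

def score(clause):
    # Stand-in for a neural scorer: prefer shorter clauses.
    return -len(clause)

def resolvents(given, processed):
    # Toy propositional resolution: resolve complementary literals.
    out = []
    for other in processed:
        for lit in given:
            neg = lit[1:] if lit.startswith("~") else "~" + lit
            if neg in other:
                new = tuple(sorted((set(given) | set(other)) - {lit, neg}))
                out.append(new)
    return out

def saturate(axioms, max_steps=1000):
    """Return True if the empty clause (a contradiction) is derived."""
    unprocessed = list(axioms)
    processed = []
    for _ in range(max_steps):
        if not unprocessed:
            return False  # saturated without finding a contradiction
        unprocessed.sort(key=score)
        given = unprocessed.pop()  # process the highest-scored clause
        if not given:
            return True  # empty clause derived: proof found
        processed.append(given)
        unprocessed.extend(resolvents(given, processed))
    return False

# Refuting {p, ~p ∨ q, ~q}: resolution derives the empty clause.
print(saturate([("p",), ("~p", "q"), ("~q",)]))  # → True
```

In a real saturation prover the scorer would be a trained network evaluated on clause features, and the inference step would include first-order resolution and paramodulation with unification.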

## 8 Citations

### Adversarial Learning to Reason in an Arbitrary Logic

- Computer Science, FLAIRS
- 2022

This work proposes Monte-Carlo simulations guided by reinforcement learning that can work in an arbitrarily specified logic, without any human knowledge or set of problems, and practically demonstrates the feasibility of the approach in multiple logical systems.

### The Role of Synthetic Data in Improving Neural Network Algorithms

- Computer Science, 2022 4th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA)
- 2022

Using examples, the important role of synthetic data in the improvement of neural network algorithms and the development of artificial intelligence is shown.

### Synthetic Proof Term Data Augmentation for Theorem Proving with Language Models

- Computer Science
- 2022

This work proposes using samples from trained language models in conjunction with the Lean kernel to generate novel training examples for proof term language modeling, using the kernel to identify type-correct proof term candidates and infer corresponding types.

### MS@IW at SemEval-2022 Task 4: Patronising and Condescending Language Detection with Synthetically Generated Data

- Computer Science, SemEval
- 2022

The generative power of state-of-the-art generative pretrained transformer models is leveraged to increase training set size and remedy class imbalance issues at SemEval-2022.

### Formal Mathematics Statement Curriculum Learning

- Computer Science, ArXiv
- 2022

It is shown that at the same compute budget, expert iteration, by which the authors mean proof search interleaved with learning, dramatically outperforms proof search alone and is capable of finding and solving a curriculum of increasingly difficult problems, without the need for associated ground-truth proofs.

### Proving Theorems using Incremental Learning and Hindsight Experience Replay

- Computer Science, ICML
- 2022

It is shown that provers trained in this way can outperform previous machine learning approaches and compete with the state-of-the-art heuristic-based theorem prover E in its best configuration, on the popular benchmarks MPTP2078, M2k and Mizar40.

### Towards the Automatic Mathematician

- Computer Science, CADE
- 2021

This extended abstract summarizes recent developments of machine learning in mathematical reasoning and the vision of the N2Formal group at Google Research to create an automatic mathematician.

### TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning

- Computer Science, NeurIPS
- 2021

A novel approach to interactive theorem-proving (ITP) using deep reinforcement learning that is able to prove theorems both end-to-end and from scratch (i.e., without relying on example proofs from human experts).

## References

Showing 1-10 of 33 references.

### Learning to Prove Theorems by Learning to Generate Theorems

- Computer Science, NeurIPS
- 2020

This work proposes to learn a neural generator that automatically synthesizes theorems and proofs for the purpose of training a theorem prover, and demonstrates that synthetic data from this approach improves the theorem prover and advances the state of the art of automated theorem proving in Metamath.

### Proof Artifact Co-training for Theorem Proving with Language Models

- Computer Science, ICLR
- 2022

PACT is proposed, a general methodology for extracting abundant self-supervised data from kernel-level proof terms for co-training alongside the usual tactic prediction objective and applied to Lean, an interactive proof assistant which hosts some of the most sophisticated formalized mathematics to date.

### Generative Language Modeling for Automated Theorem Proving

- Computer Science, ArXiv
- 2020

This work presents an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyzes its performance, finding new short proofs that were accepted into the main Metamath library, which is, to their knowledge, the first time a deep-learning based system has contributed proofs that are adopted by a formal mathematics community.

### HOList: An Environment for Machine Learning of Higher-Order Theorem Proving (extended version)

- Computer Science, ArXiv
- 2019

This work provides an open-source framework based on the HOL Light theorem prover that can be used as a reinforcement learning environment, and presents a deep-reinforcement-learning-driven automated theorem prover, DeepHOL, with strong initial results on this benchmark.

### Deep Network Guided Proof Search

- Computer Science, LPAR
- 2017

Experimental evidence is given that with a hybrid, two-phase approach, deep learning based guidance can significantly reduce the average number of proof search steps while increasing the number of theorems proved.

### Learning to Prove Theorems via Interacting with Proof Assistants

- Computer Science, ICML
- 2019

ASTactic, a deep learning-based model that generates tactics as programs in the form of abstract syntax trees (ASTs), can generate effective tactics and can be used to prove new theorems not previously provable by automated methods.

### Language Models are Few-Shot Learners

- Computer Science, NeurIPS
- 2020

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

### Attention is All you Need

- Computer Science, NIPS
- 2017

A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, having been applied successfully to English constituency parsing with both large and limited training data.

### Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

- Computer Science, NeurIPS
- 2020

This work proposes Unsupervised Environment Design (UED) as an alternative paradigm, where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.

### Automated curricula through setter-solver interactions

- Computer Science, ArXiv
- 2019

These results represent a substantial step towards applying automatic task curricula to learn complex, otherwise unlearnable goals, and are the first to demonstrate automated curriculum generation for goal-conditioned agents in environments where the possible goals vary between episodes.