• Corpus ID: 232134811

Training a First-Order Theorem Prover from Synthetic Data

Vlad Firoiu, Eser Aygun, Ankit Anand, Zafarali Ahmed, Xavier Glorot, Laurent Orseau, Lei Zhang, Doina Precup, Shibl Mourad

A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models. To tackle this problem, we propose an approach that relies on training purely with synthetically generated theorems, without any human data aside from axioms. We use these theorems to train a neurally-guided saturation-based prover. Our neural prover outperforms the state-of-the-art E-prover on this synthetic data in…

Adversarial Learning to Reason in an Arbitrary Logic

This work proposes Monte-Carlo simulations guided by reinforcement learning that can work in an arbitrarily specified logic, without any human knowledge or set of problems, and practically demonstrates the feasibility of the approach in multiple logical systems.

The Role of Synthetic Data in Improving Neural Network Algorithms

  • Andrey N. Rabchevsky, L. Yasnitsky
  • Computer Science
    2022 4th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA)
  • 2022
Using examples, the important role of synthetic data in the improvement of neural network algorithms and the development of artificial intelligence is shown.

Synthetic Proof Term Data Augmentation for Theorem Proving with Language Models

This work proposes using samples from trained language models in conjunction with the Lean kernel to generate novel training examples for proof term language modeling, and uses the Lean kernel to identify type-correct proof term candidates and infer corresponding types.

MS@IW at SemEval-2022 Task 4: Patronising and Condescending Language Detection with Synthetically Generated Data

The generative power of state-of-the-art generative pretrained transformer models is leveraged to increase training set size and remedy class imbalance issues at SemEval-2022.

Formal Mathematics Statement Curriculum Learning

It is shown that, at the same compute budget, expert iteration, by which the authors mean proof search interleaved with learning, dramatically outperforms proof search only and is capable of finding and solving a curriculum of increasingly difficult problems, without the need for associated ground-truth proofs.

Proving Theorems using Incremental Learning and Hindsight Experience Replay

It is shown that provers trained in this way can outperform previous machine learning approaches and compete with the state-of-the-art heuristic-based theorem prover E in its best configuration on the popular benchmarks MPTP2078, M2k and Mizar40.

Towards the Automatic Mathematician

This extended abstract summarizes recent developments of machine learning in mathematical reasoning and the vision of the N2Formal group at Google Research to create an automatic mathematician.

TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning

A novel approach to interactive theorem-proving (ITP) using deep reinforcement learning that is able to prove theorems both end-to-end and from scratch (i.e., without relying on example proofs from human experts).



Learning to Prove Theorems by Learning to Generate Theorems

This work proposes to learn a neural generator that automatically synthesizes theorems and proofs for the purpose of training a theorem prover, and demonstrates that synthetic data from this approach improves the theorem provers and advances the state of the art of automated theorem proving in Metamath.

Proof Artifact Co-training for Theorem Proving with Language Models

This work proposes PACT, a general methodology for extracting abundant self-supervised data from kernel-level proof terms for co-training alongside the usual tactic prediction objective, and applies it to Lean, an interactive proof assistant which hosts some of the most sophisticated formalized mathematics to date.

Generative Language Modeling for Automated Theorem Proving

This work presents an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyzes its performance, finding new short proofs that were accepted into the main Metamath library, which is, to the authors' knowledge, the first time a deep-learning based system has contributed proofs that are adopted by a formal mathematics community.

HOList: An Environment for Machine Learning of Higher-Order Theorem Proving (extended version)

This work provides an open-source framework based on the HOL Light theorem prover that can be used as a reinforcement learning environment and presents a deep reinforcement learning driven automated theorem prover, DeepHOL, with strong initial results on this benchmark.

Deep Network Guided Proof Search

Experimental evidence is given that with a hybrid, two-phase approach, deep learning based guidance can significantly reduce the average number of proof search steps while increasing the number of theorems proved.

Learning to Prove Theorems via Interacting with Proof Assistants

ASTactic, a deep learning-based model that generates tactics as programs in the form of abstract syntax trees (ASTs), can generate effective tactics and can be used to prove new theorems not previously provable by automated methods.

Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

Attention is All you Need

This work proposes a new, simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

This work proposes Unsupervised Environment Design (UED) as an alternative paradigm, where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.

Automated curricula through setter-solver interactions

These results represent a substantial step towards applying automatic task curricula to learn complex, otherwise unlearnable goals, and are the first to demonstrate automated curriculum generation for goal-conditioned agents in environments where the possible goals vary between episodes.