Corpus ID: 238198487

Learning to Superoptimize Real-world Programs

Alex Shypula, Pengcheng Yin, Jeremy Lacomis, Claire Le Goues, Edward N. Schwartz, Graham Neubig
Prior work on superoptimization has focused on domain-specific and/or synthetic program benchmarks. In this paper, we propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models. We created a dataset consisting of over 25K real-world x86-64 assembly functions mined from open-source projects and propose an approach, Self Imitation Learning for Optimization (SILO), that is easy to implement and outperforms a standard policy gradient learning approach on our dataset. Our method, SILO…
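The self-imitation idea in the abstract can be sketched as a simple loop: the model samples candidate rewrites, a checker verifies semantic equivalence, and verified improvements become new supervised training pairs. The sketch below is illustrative only; `sample_fn`, `is_equivalent`, and `cost` are hypothetical stand-ins, not the paper's actual API.

```python
# Hedged sketch of one self-imitation round (SILO-style), with toy
# stand-ins for the model, equivalence checker, and cost function.
def self_imitation_round(dataset, sample_fn, is_equivalent, cost, n_samples=8):
    new_examples = []
    for program in dataset:
        candidates = [sample_fn(program) for _ in range(n_samples)]
        # Keep only rewrites that are semantically equivalent AND cheaper.
        verified = [c for c in candidates
                    if is_equivalent(program, c) and cost(c) < cost(program)]
        if verified:
            # The best verified rewrite becomes a new training pair.
            new_examples.append((program, min(verified, key=cost)))
    return new_examples

# Toy demo: "programs" are arithmetic strings, cost is length, and two
# programs are equivalent if they evaluate to the same value.
pairs = self_imitation_round(
    dataset=["1+1+0", "2*3+0"],
    sample_fn=lambda p: p.replace("+0", ""),
    is_equivalent=lambda a, b: eval(a) == eval(b),
    cost=len,
)
# pairs now holds (original, optimized) pairs for supervised fine-tuning.
```

In the real setting the equivalence check would be an assembly-level test harness or verifier rather than `eval`, and fine-tuning on `pairs` would close the loop.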


Enabling Transformers to Understand Low-Level Programs

This work applies transfer learning to low-level (LLVM) programs and studies how low-level programs can be made more amenable to Transformer models through various techniques, including preprocessing, infix/prefix operators, and information deduplication.

Understanding High-Level Properties of Low-Level Programs Through Transformers

It is shown that Transformer models can translate C to LLVM-IR with high accuracy, by training on a parallel corpus of functions extracted from 1 million compilable, open-sourced C programs and their corresponding LLVM-IR after compiling with Clang.

Evaluating Large Language Models Trained on Code

It is found that repeated sampling from the GPT language model is a surprisingly effective strategy for producing working solutions to difficult prompts, and the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics are discussed.
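The repeated-sampling strategy above is evaluated with the pass@k metric; the paper gives an unbiased estimator, 1 − C(n−c, k)/C(n, k), for the probability that at least one of k samples drawn from n generations (c of which are correct) passes. A small sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: with n generated samples, c of them
    correct, return the probability that at least one of k randomly
    drawn samples passes, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill all k draws
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(200, 10, 1))  # with k=1, this is just c/n ≈ 0.05
```

Computing the estimator this way (rather than raising an empirical pass rate to the k-th power) avoids the bias that repeated sampling would otherwise introduce.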

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

A new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.

Sound Loop Superoptimization for Google Native Client

This work demonstrates that superoptimization can dramatically improve the performance of Google Native Client, an SFI system that ships inside the Google Chrome browser, and proposes a new architecture for superoptimization tools that incorporates both a fully sound verification technique and a bounded verification technique to guide the search to optimized code.

Stochastic superoptimization

This work formulates the loop-free binary superoptimization task as a stochastic search problem, and a Markov chain Monte Carlo sampler is used to rapidly explore the space of all possible programs to find one that is an optimization of a given target program.
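The stochastic search loop described here can be sketched as a toy Metropolis sampler. The mutation operator and cost function below are illustrative stand-ins for the paper's instruction-level rewrites and its combined correctness-plus-performance cost, not the actual system:

```python
import math
import random

def mcmc_search(init, mutate, cost, steps, beta, rng):
    """Metropolis search over candidate programs: always accept
    proposals that lower the cost, and accept regressions with
    probability exp(-beta * delta) to escape local minima."""
    current, best = init, init
    for _ in range(steps):
        proposal = mutate(current, rng)
        delta = cost(proposal) - cost(current)
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            current = proposal
            if cost(current) < cost(best):
                best = current
    return best

# Toy "program": a list of values from {1, 2, 3}. The cost penalizes
# incorrect behaviour (sum != 6) heavily and program length lightly,
# mirroring a correctness-plus-performance cost.
def mutate(p, rng):
    p, op = list(p), rng.randrange(3)
    if op == 0 and len(p) > 1:
        p.pop(rng.randrange(len(p)))                       # delete an "instruction"
    elif op == 1:
        p.insert(rng.randrange(len(p) + 1), rng.choice([1, 2, 3]))  # insert one
    else:
        p[rng.randrange(len(p))] = rng.choice([1, 2, 3])   # replace one
    return p

best = mcmc_search([1, 1, 1, 1, 1, 1], mutate,
                   cost=lambda p: 10 * abs(sum(p) - 6) + len(p),
                   steps=5000, beta=0.3, rng=random.Random(0))
# `best` is never worse than the input, so it still sums to 6, and is
# typically much shorter.
```

Because incorrect proposals are occasionally accepted, the sampler can pass through non-equivalent intermediate programs on its way to a shorter correct one; only the best verified candidate is returned.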

Semantic program alignment for equivalence checking

A robust semantics-driven technique for program equivalence checking is introduced and it is demonstrated that the algorithm is applicable to challenging equivalence problems beyond the scope of existing techniques.

Learning to superoptimize programs

Experiments on benchmarks comprising automatically generated as well as existing ("Hacker's Delight") programs show that the proposed method significantly outperforms state-of-the-art approaches for code superoptimization.

Competition-level code generation with AlphaCode

AlphaCode is introduced, a system for code generation that achieved an average ranking in the top 54.3% in simulated evaluations on recent programming competitions on the Codeforces platform, marking the first time an artificial intelligence system has performed competitively in programming competitions.

Machine Learning in Compilers: Past, Present and Future

A retrospective of machine learning in compiler optimisation from its earliest inception, through some of the works that set themselves apart, to today's deep learning, finishing with the vision of the field's future.

Deep Symbolic Superoptimization Without Human Knowledge

HISS is a reinforcement learning framework for symbolic superoptimization that keeps humans out of the loop; it can discover more simplification rules than existing human-dependent methods, and can learn meaningful embeddings for symbolic expressions that are indicative of equivalence.

Learning to optimize halide with tree search and random programs

This work presents a new algorithm to automatically schedule Halide programs for high-performance image processing and deep learning. It produces schedules that are on average almost twice as fast as those of the existing Halide autoscheduler without autotuning, and more than twice as fast with autotuning, and it is the first automatic scheduling algorithm to significantly outperform human experts on average.