• Corpus ID: 237091166

Augmenting Decompiler Output with Learned Variable Names and Types

@article{Chen2021AugmentingDO,
  title={Augmenting Decompiler Output with Learned Variable Names and Types},
  author={Qibin Chen and Jeremy Lacomis and Edward J. Schwartz and Claire Le Goues and Graham Neubig and Bogdan Vasilescu},
  journal={ArXiv},
  year={2021},
  volume={abs/2108.06363}
}
A common tool used by security professionals for reverse-engineering binaries found in the wild is the decompiler. A decompiler attempts to reverse compilation, transforming a binary to a higher-level language such as C. High-level languages ease reasoning about programs by providing useful abstractions such as loops, typed variables, and comments, but these abstractions are lost during compilation. Decompilers are able to deterministically reconstruct structural properties of code, but…

DIRE and its Data: Neural Decompiled Variable Renamings with Respect to Software Class

TLDR
This work investigates how data provenance and the quality of training data affect performance, and how well, if at all, trained models generalize across software domains, and evaluates DIRE’s overall performance without respect to data quality.

DECOMPERSON: How Humans Decompile and What We Can Learn From It

TLDR
It is shown how perfect decompilation allows programmatic analysis of such large datasets, providing new insights into the reverse engineering process.

References

DIRE: A Neural Approach to Decompiled Identifier Naming

TLDR
The Decompiled Identifier Renaming Engine (DIRE) is proposed, a novel probabilistic technique for variable name recovery that uses both lexical and structural information recovered by the decompiler.

Improving type information inferred by decompilers with supervised machine learning

TLDR
This article builds different classification models capable of inferring the high-level type returned by functions, with significantly higher accuracy than existing decompilers.

Towards Neural Decompilation

TLDR
This work addresses the problem of automatic decompilation, converting a program in low-level representation back to a higher-level human-readable programming language, by presenting a novel approach to decompilation based on neural machine translation.

DIRECT: A Transformer-based Model for Decompiled Identifier Renaming

TLDR
This paper proposes DIRECT, a novel transformer-based architecture customized specifically for the task at hand, evaluates the model on a dataset of decompiled functions, and finds that DIRECT outperforms the previous state-of-the-art model by up to 20%.

TypeMiner: Recovering Types in Binary Programs Using Machine Learning

TLDR
This paper builds on the assumption that types leave characteristic traits in compiled code that can be automatically identified using machine learning starting at usage locations determined by an analyst, and presents TypeMiner, a static method for recovering types in binary programs.

CATI: Context-Assisted Type Inference from Stripped Binaries

TLDR
This paper presents an efficient approach for inferring types and overcomes the challenge of scattered information provided by static analysis on stripped binaries by implementing a system called CATI, which locates variables from stripped binaries and infers 19 types from variables.

Static Single Assignment for Decompilation

TLDR
The goal of extending the state of the art of machine code decompilation has been achieved and the most promising areas for future research have been identified as range analysis and alias analysis.

TIE: Principled Reverse Engineering of Types in Binary Programs

TLDR
Novel techniques for reverse engineering data type abstractions from binary programs are developed and a novel type reconstruction system based upon binary code analysis is developed that is both more accurate and more precise at recovering high-level types than existing mechanisms.

No More Gotos: Decompilation Using Pattern-Independent Control-Flow Structuring and Semantic-Preserving Transformations

TLDR
DREAM is presented, the first decompiler to offer a goto-free output, which uses a novel pattern-independent control-flow structuring algorithm that can recover all control constructs in binary programs and produce structured decompiled code without any goto statements.

Coda: An End-to-End Neural Program Decompiler

TLDR
Coda is proposed, the first end-to-end neural-based framework for code decompilation and reveals the vulnerability of binary executables and imposes a new threat to the protection of Intellectual Property (IP) for software development.
...