NeuDep: Neural Binary Memory Dependence Analysis

Kexin Pei, Dongdong She, Michael Jin-Yi Wang, Scott Geng, Zhou Xuan, Yaniv David, Junfeng Yang, Suman Sekhar Jana, Baishakhi Ray

Published 4 October 2022 · Computer Science
Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Determining whether multiple instructions can access the same memory location is a critical task in binary analysis. It is challenging because statically computing precise alias information is undecidable in theory, and the problem is aggravated at the binary level by compiler optimizations and the absence of symbols and types. Existing approaches either produce many spurious dependencies due to conservative analysis or scale poorly to complex binaries. We present a new machine…
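To see why this is hard statically, consider a minimal illustrative sketch (not from the paper): two store instructions that write through different registers may or may not touch the same address, depending entirely on runtime values that a static analyzer cannot observe.

```python
# Hypothetical sketch of the memory-aliasing problem at the binary level.
# Model two x86-style stores, `mov [rax], 1` and `mov [rbx], 2`:
# whether they depend on each other is decided only by the runtime
# values of rax and rbx, not by anything visible in the instructions.

def stores_alias(rax: int, rbx: int) -> bool:
    """Do the two one-byte stores hit the same memory location?"""
    return rax == rbx

# The same instruction pair aliases under one input and not another,
# so a sound static analyzer must conservatively report a dependence.
print(stores_alias(0x7FFC1000, 0x7FFC1000))  # True  (dependence)
print(stores_alias(0x7FFC1000, 0x7FFC2000))  # False (independent)
```

Conservative analyses report a dependence whenever aliasing cannot be ruled out, which is the source of the spurious dependencies mentioned above.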


Neural Nets Can Learn Function Type Signatures From Binaries

EKLAVYA, a new system that trains a recurrent neural network to recover function type signatures from disassembled binary code, generalizes well across the compilers tested on two different instruction sets with various optimization levels, without any specialized prior knowledge of the instruction set, compiler, or optimization level.

RENN: Efficient Reverse Execution with Neural-Network-Assisted Alias Analysis

A new deep neural architecture is proposed that significantly improves memory alias resolution and greatly reduces the burden of hypothesis testing to track down non-alias relations in binary code.

BDA: practical dependence analysis for binary executables by unbiased whole-program path sampling and per-path abstract interpretation

Applying BDA to call graph generation and malware analysis shows that BDA substantially outperforms the commercial tool IDA in recovering indirect call targets and outperforms the state-of-the-art malware analysis tool Cuckoo by disclosing 3 times more hidden payloads.

Spindle: Informed Memory Access Monitoring

This work proposes Spindle, a novel memory access monitoring and analysis framework that uses static analysis to identify predictable memory access patterns and condense them into a compact program structure summary; Spindle is implemented in the popular LLVM compiler.

Debin: Predicting Debug Information in Stripped Binaries

An automated tool, called Debin, is implemented, which handles ELF binaries on three of the most popular architectures: x86, x64, and ARM. It is shown that Debin is helpful for inspecting real-world malware: it revealed suspicious library usage and behaviors such as DNS resolver reads.

StateFormer: fine-grained type recovery from binaries using generative state modeling

This work presents StateFormer, a new neural architecture that is adept at accurate and robust type inference and significantly outperforms state-of-the-art ML-based tools by 14.6% in recovering types for both function arguments and variables.

BiRD: Race Detection in Software Binaries under Relaxed Memory Models

BiRD, a prototype tool, is presented to dynamically detect harmful data races in x86 binaries under the relaxed memory models TSO and PSO; its comparison with state-of-the-art tools indicates BiRD's potential for effectively detecting data races in software binaries.

Improved Memory-Access Analysis for x86 Executables

This paper develops static-analysis methods to recover a good approximation of the variables and dynamically allocated memory objects of a stripped executable, and to track the flow of values through them.

SymLM: Predicting Function Names in Stripped Binaries via Context-Sensitive Execution-Aware Code Embeddings

SymLM, a function symbol name prediction and binary language modeling framework, learns comprehensive function semantics by jointly modeling the execution behavior of the calling context and instructions via a novel fusing encoder.

Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity

Trex, a transfer-learning-based framework, is presented to automate learning execution semantics explicitly from functions' micro-traces and to transfer the learned knowledge to match semantically similar functions.