The Strengths and Behavioral Quirks of Java Bytecode Decompilers

@inproceedings{Harrand2019TheSA,
  title={The Strengths and Behavioral Quirks of {Java} Bytecode Decompilers},
  author={Nicolas Harrand and C{\'e}sar Soto-Valero and Martin Monperrus and Beno{\^{\i}}t Baudry},
  booktitle={2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)},
  year={2019},
  pages={92--102}
}
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code are not symmetric. Consequently, the decompilation process, which aims at producing source code from bytecode, must establish strategies to reconstruct the information that has been lost. Modern Java decompilers tend to use distinct strategies to achieve proper decompilation. In this work, we hypothesize that the diverse ways in which…
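
One familiar instance of such information loss (our own illustration, not an example taken from the paper) is generic type erasure: the type arguments of a local `List<String>` and a local `List<Integer>` are not visible on the objects at run time, so a decompiler has to infer them from whatever other metadata the class file retains. The class `ErasureDemo` and its method names below are hypothetical, chosen only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: shows that generic type arguments of local
 * variables do not survive compilation as part of the runtime class.
 */
public class ErasureDemo {

    // True when a List<String> and a List<Integer> share one runtime class,
    // i.e. their type arguments were erased during compilation.
    static boolean typeArgumentsErased() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        Class<?> stringsClass = strings.getClass();
        Class<?> numbersClass = numbers.getClass();
        return stringsClass == numbersClass;
    }

    // The runtime class name observed for a List<String>: no type argument appears.
    static String runtimeClassName() {
        List<String> strings = new ArrayList<>();
        return strings.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("Same runtime class: " + typeArgumentsErased());
        System.out.println("Runtime class: " + runtimeClassName());
    }
}
```

Running `main` prints `Same runtime class: true` followed by `Runtime class: java.util.ArrayList`. Both lists are plain `ArrayList` instances at run time, which is one reason two decompilers may legitimately reconstruct different declarations for the same bytecode.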

Citations

Java Bytecode Control Flow Classification: Framework for Guiding Java Decompilation

TLDR
A methodology is proposed for guiding Java decompilers to deal with the problem of selection statement deficiency; the experimental results show a decompilation accuracy of 93.68 percent, which is clearly better than that of Java Decompiler.

A Large-Scale Empirical Study of Android App Decompilation

TLDR
The empirical evaluation, complemented with an in-depth manual analysis of a number of apps, indicates that code obfuscation is quite rarely encountered, even in malicious apps, suggesting that near-perfect Android decompilation is, at least in theory, achievable with implementation-level improvements to decompilation tools.

Elipmoc: advanced decompilation of Ethereum smart contracts

TLDR
Elipmoc is an evolution of Gigahorse, the leading research decompiler, improving dramatically over it and over other state-of-the-art tools by employing several high-precision techniques and making them scalable.
