doc - Design documents related to the decompilation pipeline.

  •        8

This repository contains design documents related to the decompilation pipeline of decomp/decomp. Poster summarizing the current capabilities of the decompilation pipeline.



Related Projects

decomp - Components of a decompilation pipeline.

  •    Go

The aim of this project is to implement a decompilation pipeline composed of independent components interacting through well-defined interfaces, as further described in the design documents of the project. From a high-level perspective, the components of the decompilation pipeline are conceptually grouped into three modules. Firstly, the front-end translates a source language (e.g. x86 assembly) into LLVM IR; a platform-independent low-level intermediate representation. Secondly, the middle-end structures the LLVM IR by identifying high-level control flow primitives (e.g. pre-test loops, 2-way conditionals). Lastly, the back-end translates the structured LLVM IR into a high-level target programming language (e.g. Go).

fcd - An optimizing decompiler

  •    C++

Fcd is an LLVM-based native program optimizing decompiler, released under an LLVM-style license. It started as a bachelor's degree senior project and carries forward its initial development philosophy of getting results fast. As such, it was architectured to have low coupling between distinct decompilation phases and to be highly hackable. Fcd uses a unique technique to reliably translate machine code to LLVM IR. Currently, it only supports x86_64. Disassembly uses Capstone. It implements pattern-independent structuring to provide a goto-free output.

mcsema - Framework for lifting x86, amd64, and aarch64 program binaries to LLVM bitcode

  •    C++

McSema is an executable lifter. It translates ("lifts") executable binaries from native machine code to LLVM bitcode. LLVM bitcode is an intermediate representation form of a program that was originally created for the retargetable LLVM compiler, but which is also very useful for performing program analysis methods that would not be possible to perform on an executable binary directly. McSema enables analysts to find and retroactively harden binary programs against security bugs, independently validate vendor source code, and generate application tests with high code coverage. McSema isn’t just for static analysis. The lifted LLVM bitcode can also be fuzzed with libFuzzer, an LLVM-based instrumented fuzzer that would otherwise require the target source code. The lifted bitcode can even be compiled back into a runnable program! This is a procedure known as static binary rewriting, binary translation, or binary recompilation.

panda - Platform for Architecture-Neutral Dynamic Analysis

  •    C

PANDA is an open-source Platform for Architecture-Neutral Dynamic Analysis. It is built upon the QEMU whole system emulator, and so analyses have access to all code executing in the guest and all data. PANDA adds the ability to record and replay executions, enabling iterative, deep, whole system analyses. Further, the replay log files are compact and shareable, allowing for repeatable experiments. A nine billion instruction boot of FreeBSD, e.g., is represented by only a few hundred MB. PANDA leverages QEMU's support of thirteen different CPU architectures to make analyses of those diverse instruction sets possible within the LLVM IR. In this way, PANDA can have a single dynamic taint analysis, for example, that precisely supports many CPUs. PANDA analyses are written in a simple plugin architecture which includes a mechanism to share functionality between plugins, increasing analysis code re-use and simplifying complex analysis development. It is currently being developed in collaboration with MIT Lincoln Laboratory, NYU, and Northeastern University.

simplify - Generic Android Deobfuscator

  •    Java

Simplify virtually executes an app to understand its behavior and then tries to optimize the code so that it behaves identically but is easier for a human to understand. Each optimization type is simple and generic, so it doesn't matter what the specific type of obfuscation is used. The code on the left is a decompilation of an obfuscated app, and the code on the right has been deobfuscated.

codechecker - CodeChecker is an analyzer tooling, defect database and viewer extension for the Clang Static Analyzer and Clang Tidy

  •    Python

CodeChecker is a static analysis infrastructure built on the LLVM/Clang Static Analyzer toolchain, replacing scan-build in a Linux or macOS (OS X) development environment. In OSX environment the intercept-build tool from scan-build is used to log the compiler invocations.

gocaml - :camel: Practical statically typed functional programming language implementation with Go and LLVM

  •    Go

GoCaml is subset of OCaml in Go based on MinCaml using LLVM. GoCaml adds many features to original MinCaml. MinCaml is a minimal subset of OCaml for educational purpose. It is statically-typed and compiled into a binary. This project aims incremental compiler development for my own programming language. Type inference, closure transform, mid-level IR are implemented.

llvm - LLVM IR library in pure Go (work in progress).

  •    LLVM

This project is a work in progress. The implementation is incomplete and subject to change. The documentation may be inaccurate. The aim of this project is to provide a pure Go library for interacting with LLVM IR.

ILSpy - .NET Decompiler

  •    CSharp

ILSpy is the open-source .NET assembly browser and decompiler. It supports decompilation to CSharp, VB.

retdec - RetDec is a retargetable machine-code decompiler based on LLVM.

  •    C++

RetDec is a retargetable machine-code decompiler based on LLVM. Currently, we support Windows (7 or later), Linux, macOS, and (experimentally) FreeBSD. An installed version of RetDec requires approximately 4 GB of free disk space.

phan - Phan is a static analyzer for PHP

  •    PHP

Phan is a static analyzer for PHP that prefers to minimize false-positives. Phan attempts to prove incorrectness rather than correctness. Phan looks for common issues and will verify type compatibility on various operations when type information is available or can be deduced. Phan has a good (but not comprehensive) understanding of flow control and does not attempt to track values.

llvmlite - A lightweight LLVM python binding for writing JIT compilers

  •    Python

The old llvmpy binding exposes a lot of LLVM APIs but the mapping of C++-style memory management to Python is error prone. Numba and many JIT compilers do not need a full LLVM API. Only the IR builder, optimizer, and JIT compiler APIs are necessary. The llvmlite.llvmpy namespace provides a minimal llvmpy compatibility layer.

souper - A superoptimizer for LLVM IR

  •    C++

A superoptimizer for LLVM IR

phan - Phan is a static analyzer for PHP

  •    PHP

Phan is a static analyzer for PHP that prefers to minimize false-positives. Phan attempts to prove incorrectness rather than correctness.Phan looks for common issues and will verify type compatibility on various operations when type information is available or can be deduced. Phan does not have a strong understanding of flow control and does not attempt to track values.

bap - Binary Analysis Platform

  •    OCaml

The Carnegie Mellon University Binary Analysis Platform (CMU BAP) is a reverse engineering and program analysis platform that works with binary code and doesn't require the source code. BAP supports multiple architectures: ARM, x86, x86-64, PowerPC, and MIPS. BAP disassembles and lifts binary code into the RISC-like BAP Instruction Language (BIL). Program analysis is performed using the BIL representation and is architecture independent in a sense that it will work equally well for all supported architectures. The platform comes with a set of tools, libraries, and plugins. The documentation and tutorial are also available. The main purpose of BAP is to provide a toolkit for implementing automated program analysis. BAP is written in OCaml and it is the preferred language to write analysis, we have bindings to C, Python and Rust. The Primus Framework also provide a Lisp-like DSL for writing program analysis tools. BAP is developed in CMU, Cylab and is sponsored by various grants from the United States Department of Defense, Siemens AG, and the Korea government, see sponsors for more information.

Iterative Flow Analysis

  •    C++

IFA, Iterative Flow Analysis is a combined data-flow and control flow analysis. It is capable of resolving the concrete types (as opposed to the nominal or declared types) and the interprocedural call graph for statically or dynamically typed programs.

JustDecompileEngine - The decompilation engine of JustDecompile

  •    CSharp

This is the engine of the popular .NET decompiler JustDecompile. C# is the only programming language used.Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

binnavi - BinNavi is a binary analysis IDE that allows to inspect, navigate, edit and annotate control flow graphs and call graphs of disassembled code

  •    Java

Copyright 2011-2016 Google Inc.BinNavi is a binary analysis IDE - an environment that allows users to inspect, navigate, edit, and annotate control-flow-graphs of disassembled code, do the same for the callgraph of the executable, collect and combine execution traces, and generally keep track of analysis results among a group of analysts.

pipelines - a language for scripting data flow

  •    Nim

Pipelines is a language and runtime for crafting massively parallel pipelines. Unlike other languages for defining data flow, the Pipeline language requires implementation of components to be defined separately in the Python scripting language. This allows the details of implementations to be separated from the structure of the pipeline, while providing access to thousands of active libraries for machine learning, data analysis and processing. Skip to Getting Started to install the Pipeline compiler. Running the Pipeline document would safely execute each component of the pipeline in parallel and output the expected result.