decomp - Components of a decompilation pipeline.

  •        8

The aim of this project is to implement a decompilation pipeline composed of independent components interacting through well-defined interfaces, as further described in the design documents of the project. From a high-level perspective, the components of the decompilation pipeline are conceptually grouped into three modules. Firstly, the front-end translates a source language (e.g. x86 assembly) into LLVM IR; a platform-independent low-level intermediate representation. Secondly, the middle-end structures the LLVM IR by identifying high-level control flow primitives (e.g. pre-test loops, 2-way conditionals). Lastly, the back-end translates the structured LLVM IR into a high-level target programming language (e.g. Go).

https://github.com/decomp/decomp

Tags
Implementation
License
Platform

   




Related Projects

fcd - An optimizing decompiler

  •    C++

Fcd is an LLVM-based native program optimizing decompiler, released under an LLVM-style license. It started as a bachelor's degree senior project and carries forward its initial development philosophy of getting results fast. As such, it was architectured to have low coupling between distinct decompilation phases and to be highly hackable. Fcd uses a unique technique to reliably translate machine code to LLVM IR. Currently, it only supports x86_64. Disassembly uses Capstone. It implements pattern-independent structuring to provide a goto-free output.

mcsema - Framework for lifting x86, amd64, and aarch64 program binaries to LLVM bitcode

  •    C++

McSema is an executable lifter. It translates ("lifts") executable binaries from native machine code to LLVM bitcode. LLVM bitcode is an intermediate representation form of a program that was originally created for the retargetable LLVM compiler, but which is also very useful for performing program analysis methods that would not be possible to perform on an executable binary directly. McSema enables analysts to find and retroactively harden binary programs against security bugs, independently validate vendor source code, and generate application tests with high code coverage. McSema isn’t just for static analysis. The lifted LLVM bitcode can also be fuzzed with libFuzzer, an LLVM-based instrumented fuzzer that would otherwise require the target source code. The lifted bitcode can even be compiled back into a runnable program! This is a procedure known as static binary rewriting, binary translation, or binary recompilation.

simplify - Generic Android Deobfuscator

  •    Java

Simplify virtually executes an app to understand its behavior and then tries to optimize the code so that it behaves identically but is easier for a human to understand. Each optimization type is simple and generic, so it doesn't matter what the specific type of obfuscation is used. The code on the left is a decompilation of an obfuscated app, and the code on the right has been deobfuscated.

ILSpy - .NET Decompiler

  •    CSharp

ILSpy is the open-source .NET assembly browser and decompiler. It supports decompilation to CSharp, VB.

panda - Platform for Architecture-Neutral Dynamic Analysis

  •    C

PANDA is an open-source Platform for Architecture-Neutral Dynamic Analysis. It is built upon the QEMU whole system emulator, and so analyses have access to all code executing in the guest and all data. PANDA adds the ability to record and replay executions, enabling iterative, deep, whole system analyses. Further, the replay log files are compact and shareable, allowing for repeatable experiments. A nine billion instruction boot of FreeBSD, e.g., is represented by only a few hundred MB. PANDA leverages QEMU's support of thirteen different CPU architectures to make analyses of those diverse instruction sets possible within the LLVM IR. In this way, PANDA can have a single dynamic taint analysis, for example, that precisely supports many CPUs. PANDA analyses are written in a simple plugin architecture which includes a mechanism to share functionality between plugins, increasing analysis code re-use and simplifying complex analysis development. It is currently being developed in collaboration with MIT Lincoln Laboratory, NYU, and Northeastern University.


JavaGuard

  •    Java

JavaGuard is a general purpose bytecode obfuscator, designed to fit effortlessly into your regular build and testing process, providing peace of mind that your valuable Java code is more secure against decompilation and other forms of reverse engineering

JustDecompileEngine - The decompilation engine of JustDecompile

  •    CSharp

This is the engine of the popular .NET decompiler JustDecompile. C# is the only programming language used.Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

barf-project - BARF : A multiplatform open source Binary Analysis and Reverse engineering Framework

  •    Python

The analysis of binary code is a crucial activity in many areas of the computer sciences and software engineering disciplines ranging from software security and program analysis to reverse engineering. Manual binary analysis is a difficult and time-consuming task and there are software tools that seek to automate or assist human analysts. However, most of these tools have several technical and commercial restrictions that limit access and use by a large portion of the academic and practitioner communities. BARF is an open source binary analysis framework that aims to support a wide range of binary code analysis tasks that are common in the information security discipline. It is a scriptable platform that supports instruction lifting from multiple architectures, binary translation to an intermediate representation, an extensible framework for code analysis plugins and interoperation with external tools such as debuggers, SMT solvers and instrumentation tools. The framework is designed primarily for human-assisted analysis but it can be fully automated. All packages were tested on Ubuntu 16.04 (x86_64).

bap - Binary Analysis Platform

  •    OCaml

The Carnegie Mellon University Binary Analysis Platform (CMU BAP) is a reverse engineering and program analysis platform that works with binary code and doesn't require the source code. BAP supports multiple architectures: ARM, x86, x86-64, PowerPC, and MIPS. BAP disassembles and lifts binary code into the RISC-like BAP Instruction Language (BIL). Program analysis is performed using the BIL representation and is architecture independent in a sense that it will work equally well for all supported architectures. The platform comes with a set of tools, libraries, and plugins. The documentation and tutorial are also available. The main purpose of BAP is to provide a toolkit for implementing automated program analysis. BAP is written in OCaml and it is the preferred language to write analysis, we have bindings to C, Python and Rust. The Primus Framework also provide a Lisp-like DSL for writing program analysis tools. BAP is developed in CMU, Cylab and is sponsored by various grants from the United States Department of Defense, Siemens AG, and the Korea government, see sponsors for more information.

Boomerang - Decompiler of Machine Code Programs

  •    C++

After a program has been thrown into the world in binary form, it can boomerang back as source code. The Boomerang reverse engineering framework is the first general native executable decompiler available to the public.

gocaml - :camel: Practical statically typed functional programming language implementation with Go and LLVM

  •    Go

GoCaml is subset of OCaml in Go based on MinCaml using LLVM. GoCaml adds many features to original MinCaml. MinCaml is a minimal subset of OCaml for educational purpose. It is statically-typed and compiled into a binary. This project aims incremental compiler development for my own programming language. Type inference, closure transform, mid-level IR are implemented.

llvm - LLVM IR library in pure Go (work in progress).

  •    LLVM

This project is a work in progress. The implementation is incomplete and subject to change. The documentation may be inaccurate. The aim of this project is to provide a pure Go library for interacting with LLVM IR.

Triton - Triton is a Dynamic Binary Analysis (DBA) framework

  •    C++

Triton is a dynamic binary analysis (DBA) framework. It provides internal components like a Dynamic Symbolic Execution (DSE) engine, a Taint engine, AST representations of the x86 and the x86-64 instructions set semantics, SMT simplification passes, an SMT Solver Interface and, the last but not least, Python bindings. Based on these components, you are able to build program analysis tools, automate reverse engineering and perform software verification. As Triton is still a young project, please, don't blame us if it is not yet reliable. Open issues or pull requests are always better than troll =).

retdec - RetDec is a retargetable machine-code decompiler based on LLVM.

  •    C++

RetDec is a retargetable machine-code decompiler based on LLVM. Currently, we support Windows (7 or later), Linux, macOS, and (experimentally) FreeBSD. An installed version of RetDec requires approximately 4 GB of free disk space.

Hikari - LLVM Obfuscator

  •    

English Documentation Hikari(Light in Japanese, name stolen from the Nintendo Switch game Xenoblade Chronicles 2) is my hackathon-ishtoy project for the 2017 Christmas to kill time.It's already stable enough to use in production environment. However, as initially planned, Hikari has been ported to LLVM 6.0 release version and no longer being actively maintained due to the time and effort it takes. You can find the history of its development at developer branch. Further enhancements include more features like Code-Intergrity Checking and a full anti-hook implementation. These are not open-source and will probably be released as a commercial product. If you know me close enough we can discuss the license model and pricing issue because I might not be able to provide real-time bug fix and stuff. Any undiscovered potential bugs affecting the obfuscated binary are fixed during obfuscation so you get a workable binary.

pharos - Automated static analysis tools for binary programs

  •    C++

The Pharos static binary analysis framework is a project of the Software Engineering Institute at Carnegie Mellon University. The framework is designed to facilitate the automated analysis of binary programs. It uses the ROSE compiler infrastructure developed by Lawrence Livermore National Laboratory for disassembly, control flow analysis, instruction semantics, and more. The current distribution is a substantial update to the previous version, and is part of an ongoing process to release more of the framework and tools publicly. This software is released under a BSD license. Carnegie Mellon University retains the copyright.

WAVM - WebAssembly Virtual Machine

  •    WebAssembly

This is a standalone VM for WebAssembly. It can load both the standard binary format, and the text format defined by the WebAssembly reference interpreter. For the text format, it can load both the standard stack machine syntax and the old-fashioned AST syntax used by the reference interpreter, and all of the testing commands. To build it, you'll need CMake and LLVM 6.0. If CMake can't find your LLVM directory, you can manually give it the location in the LLVM_DIR CMake configuration variable. Note that on Windows, you must compile LLVM from source, and manually point the LLVM_DIR configuration variable at <LLVM build directory>\lib\cmake\llvm.

WPF StylesExplorer

  •    WPF

Styles Explorer is project containing library for WPF baml decompilation and a tool to explorer baml resources.

llvmlite - A lightweight LLVM python binding for writing JIT compilers

  •    Python

The old llvmpy binding exposes a lot of LLVM APIs but the mapping of C++-style memory management to Python is error prone. Numba and many JIT compilers do not need a full LLVM API. Only the IR builder, optimizer, and JIT compiler APIs are necessary. The llvmlite.llvmpy namespace provides a minimal llvmpy compatibility layer.

souper - A superoptimizer for LLVM IR

  •    C++

A superoptimizer for LLVM IR