mcsema - Framework for lifting x86, amd64, and aarch64 program binaries to LLVM bitcode

  •        15

McSema is an executable lifter. It translates ("lifts") executable binaries from native machine code to LLVM bitcode. LLVM bitcode is an intermediate representation of a program that was originally created for the retargetable LLVM compiler, but it is also very useful for program analyses that cannot be performed on an executable binary directly. McSema enables analysts to find and retroactively harden binary programs against security bugs, independently validate vendor source code, and generate application tests with high code coverage. McSema isn’t just for static analysis. The lifted LLVM bitcode can also be fuzzed with libFuzzer, an LLVM-based instrumented fuzzer that would otherwise require the target source code. The lifted bitcode can even be compiled back into a runnable program! This procedure is known as static binary rewriting, binary translation, or binary recompilation.



Related Projects

fcd - An optimizing decompiler

  •    C++

Fcd is an LLVM-based optimizing decompiler for native programs, released under an LLVM-style license. It started as a bachelor's degree senior project and carries forward its initial development philosophy of getting results fast. As such, it was architected to have low coupling between distinct decompilation phases and to be highly hackable. Fcd uses a unique technique to reliably translate machine code to LLVM IR. Currently, it only supports x86_64. Disassembly uses Capstone. It implements pattern-independent structuring to provide goto-free output.

cargo-fuzz - Command line helpers for fuzzing

  •    Rust

Note: libFuzzer needs LLVM sanitizer support, so this only works on x86-64 Linux and x86-64 macOS for now. This also needs a nightly compiler, since it uses some unstable command-line flags. You'll also need a C++ compiler with C++11 support. This crate is currently under some churn -- in case stuff isn't working, please reinstall it (cargo install cargo-fuzz -f). Rerunning cargo fuzz init after moving your fuzz folder and updating this crate may get you a better generated fuzz/Cargo.toml. Expect this to settle down soon.

emscripten - Emscripten: An LLVM-to-JavaScript Compiler

  •    C

Emscripten is an LLVM-to-JavaScript compiler. It takes LLVM bitcode - which can be generated from C/C++, using llvm-gcc (DragonEgg) or clang, or any other language that can be converted into LLVM - and compiles that into JavaScript, which can be run on the web (or anywhere else JavaScript can run). Emscripten is available under 2 licenses, the MIT license and the University of Illinois/NCSA Open Source License.

elvm - EsoLangVM Compiler Infrastructure

  •    C

ELVM is similar to LLVM but dedicated to esoteric languages. The project consists of two components: a frontend and a backend. Currently, the only frontend is a modified version of 8cc, which translates C code to an internal representation format called ELVM IR (EIR). Unlike LLVM bitcode, EIR is designed to be extremely simple, so there is a better chance of writing a translator from EIR to an esoteric language. The supported backends target languages that are known to be difficult to program in, but with ELVM you can create programs in such languages. For example, you can easily create Brainfuck programs by writing C code. One of the interesting test cases in ELVM is a tiny Lisp interpreter. All of the language backends pass this test, which means you can run Lisp on those languages.

llvm - LLVM IR library in pure Go (work in progress).

  •    LLVM

This project is a work in progress. The implementation is incomplete and subject to change. The documentation may be inaccurate. The aim of this project is to provide a pure Go library for interacting with LLVM IR.

gocaml - :camel: Practical statically typed functional programming language implementation with Go and LLVM

  •    Go

GoCaml is a subset of OCaml implemented in Go, based on MinCaml and using LLVM. GoCaml adds many features to the original MinCaml. MinCaml is a minimal subset of OCaml for educational purposes. It is statically typed and compiled into a binary. This project aims at incremental compiler development for my own programming language. Type inference, closure transform, and a mid-level IR are implemented.

bap - Binary Analysis Platform

  •    OCaml

The Carnegie Mellon University Binary Analysis Platform (CMU BAP) is a reverse engineering and program analysis platform that works with binary code and doesn't require the source code. BAP supports multiple architectures: ARM, x86, x86-64, PowerPC, and MIPS. BAP disassembles and lifts binary code into the RISC-like BAP Instruction Language (BIL). Program analysis is performed on the BIL representation and is architecture-independent in the sense that it works equally well for all supported architectures. The platform comes with a set of tools, libraries, and plugins. Documentation and a tutorial are also available. The main purpose of BAP is to provide a toolkit for implementing automated program analysis. BAP is written in OCaml, which is the preferred language for writing analyses, and there are bindings to C, Python, and Rust. The Primus Framework also provides a Lisp-like DSL for writing program analysis tools. BAP is developed at CMU CyLab and is sponsored by various grants from the United States Department of Defense, Siemens AG, and the Korean government; see sponsors for more information.

llvm.js - LLVM compiled to JavaScript using Emscripten

  •    Javascript

You will hit errors when attempting to use tblgen and other tools: the build system is self-executing, but we generate bitcode that cannot be run. When the errors happen, copy in the file from a parallel native build, and edit the Makefile of the parent dir (for llvm-tblgen, utils/; for llvm-config, tools/ and ./) that generates that file so that it does not call it (otherwise, running make again will go back and overwrite the one you just copied in). Re-run make.


  •    Python

Import LLVM bitcode directly into Python and use it as an extension module. You'll need to have a pretty complete LLVM development environment installed on your machine. Bitey has been developed using LLVM/Clang-3.1. You might need to install it yourself.

Hikari - LLVM Obfuscator


Hikari ("light" in Japanese; name stolen from the Nintendo Switch game Xenoblade Chronicles 2) is my hackathon-ish toy project for Christmas 2017, written to kill time. It's already stable enough to use in a production environment. However, as initially planned, Hikari has been ported to the LLVM 6.0 release version and is no longer actively maintained due to the time and effort it takes. You can find the history of its development on the developer branch. Further enhancements include features such as Code-Integrity Checking and a full anti-hook implementation. These are not open source and will probably be released as a commercial product. If you know me well enough, we can discuss the license model and pricing, because I might not be able to provide real-time bug fixes. Any undiscovered potential bugs affecting the obfuscated binary are fixed during obfuscation so you get a workable binary.

WAVM - WebAssembly Virtual Machine

  •    WebAssembly

This is a standalone VM for WebAssembly. It can load both the standard binary format, and the text format defined by the WebAssembly reference interpreter. For the text format, it can load both the standard stack machine syntax and the old-fashioned AST syntax used by the reference interpreter, and all of the testing commands. To build it, you'll need CMake and LLVM 6.0. If CMake can't find your LLVM directory, you can manually give it the location in the LLVM_DIR CMake configuration variable. Note that on Windows, you must compile LLVM from source, and manually point the LLVM_DIR configuration variable at <LLVM build directory>\lib\cmake\llvm.
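As a rough illustration of the Windows note above, the following Python sketch assembles the value that LLVM_DIR would be pointed at and formats it as a CMake cache argument (`-D<variable>=<value>`). The helper name `llvm_dir_flag` and the build directory `C:\llvm-build` are hypothetical placeholders, not part of WAVM.

```python
from pathlib import PureWindowsPath


def llvm_dir_flag(llvm_build_dir):
    """Return a CMake -D argument that sets LLVM_DIR.

    Hypothetical helper: it only appends lib\\cmake\\llvm to the given
    LLVM build directory, as described in the text above.
    """
    llvm_dir = PureWindowsPath(llvm_build_dir) / "lib" / "cmake" / "llvm"
    return f"-DLLVM_DIR={llvm_dir}"


# C:\llvm-build is an assumed location; substitute your own build directory.
print(llvm_dir_flag(r"C:\llvm-build"))
```

The resulting string can be passed to cmake on the command line when configuring the build.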

sulong - Sulong, the LLVM bitcode implementation of Graal VM.

  •    Java

Sulong is a high-performance LLVM bitcode interpreter built on the GraalVM by Oracle Labs. Sulong is written in Java and uses the Truffle language implementation framework and Graal as a dynamic compiler.

Triton - Triton is a Dynamic Binary Analysis (DBA) framework

  •    C++

Triton is a dynamic binary analysis (DBA) framework. It provides internal components such as a Dynamic Symbolic Execution (DSE) engine, a taint engine, AST representations of the x86 and x86-64 instruction set semantics, SMT simplification passes, an SMT solver interface and, last but not least, Python bindings. Based on these components, you can build program analysis tools, automate reverse engineering, and perform software verification. As Triton is still a young project, please don't blame us if it is not yet reliable. Open issues or pull requests are always better than trolling =).


  •    C

SkyEye is a very fast full-system simulator that uses LLVM as the IR of its dynamic compilation framework. It can simulate a series of ARM, ColdFire, MIPS, PowerPC, SPARC, x86, and Blackfin DSP processors. It can also simulate multicore systems by using the multiple cores of the host.

Udis86 - Disassembler for x86 and x86-64

  •    C

Udis86 is an easy-to-use minimalistic disassembler library for the x86 and x86-64 instruction set architectures. The primary intent of the design and development of udis86 is to aid software development projects that entail binary code analysis.

echojs - an ahead of time compiler and runtime for ES6

  •    Javascript

The environment variable LLVM_SUFFIX can be set and its value will be appended to the names of all llvm executables (e.g. llvm-config-3.6 instead of llvm-config.) The default is -3.6. Change this if you have a different build of llvm you want to use. Homebrew installs llvm 3.6 executables without the suffix, thus export LLVM_SUFFIX=. As for MIN_OSX_VERSION: homebrew's formula for llvm (3.4, at least. haven't verified with 3.6) doesn't specify a -mmacosx-version-min= flag, so it builds to whatever you have on your machine. Node.js's gyp support in node-gyp, however, does put a -mmacosx-version-min=10.5 flag. A mismatch here causes the node-llvm binding to allocate llvm types using incorrect size calculations, and causes all manner of memory corruption. If you're either running 10.5 or 10.9, you can leave the variable unset. Otherwise, set it to the version of OSX you're running. Hopefully some discussion with the homebrew folks will get this fixed upstream.
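The LLVM_SUFFIX naming rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not code from the echojs build scripts; the helper name `llvm_tool_name` is hypothetical.

```python
import os

DEFAULT_LLVM_SUFFIX = "-3.6"  # default value stated in the text above


def llvm_tool_name(tool, env=None):
    """Return the LLVM executable name with LLVM_SUFFIX appended.

    Hypothetical helper that mirrors the described behaviour only.
    """
    env = os.environ if env is None else env
    # Unset variable: fall back to the default suffix.
    # Empty value (export LLVM_SUFFIX=): use the bare tool name.
    suffix = env.get("LLVM_SUFFIX", DEFAULT_LLVM_SUFFIX)
    return tool + suffix


print(llvm_tool_name("llvm-config", {}))                   # llvm-config-3.6
print(llvm_tool_name("llvm-config", {"LLVM_SUFFIX": ""}))  # llvm-config
```

Note the difference between an unset variable and an empty one: only the empty value reproduces the Homebrew case, where the executables carry no suffix.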

llvmlite - A lightweight LLVM python binding for writing JIT compilers

  •    Python

The old llvmpy binding exposes a lot of LLVM APIs, but the mapping of C++-style memory management to Python is error-prone. Numba and many JIT compilers do not need a full LLVM API. Only the IR builder, optimizer, and JIT compiler APIs are necessary. The llvmlite.llvmpy namespace provides a minimal llvmpy compatibility layer.

kaleidoscope - Haskell LLVM JIT Compiler Tutorial

  •    Haskell

A short guide to building a tiny programming language in Haskell with LLVM. You will need GHC 7.8 or newer as well as LLVM 4.0. For information on installing LLVM 4.0 (not 3.9 or earlier) on your platform of choice, take a look at the instructions posted by the llvm-hs maintainers.

souper - A superoptimizer for LLVM IR

  •    C++

A superoptimizer for LLVM IR

capstone - Capstone disassembly/disassembler framework: Core (Arm, Arm64, EVM, M68K, M680X, Mips, PPC, Sparc, SystemZ, TMS320C64x, X86, X86_64, XCore) + bindings (Python, Java, Ocaml, PowerShell, Visual Basic)

  •    C

Capstone is a disassembly framework with the goal of becoming the ultimate disassembly engine for binary analysis and reversing in the security community. It supports multiple hardware architectures: ARM, ARM64 (ARMv8), Ethereum VM, M68K, Mips, PPC, Sparc, SystemZ, TMS320C64X, M680X, XCore and X86 (including X86_64).
