mcsema - Framework for lifting x86, amd64, and aarch64 program binaries to LLVM bitcode


McSema is an executable lifter. It translates ("lifts") executable binaries from native machine code to LLVM bitcode. LLVM bitcode is an intermediate representation (IR) of a program that was originally created for the retargetable LLVM compiler, but it is also very useful for program analysis methods that would not be possible to perform on an executable binary directly. McSema enables analysts to find and retroactively harden binary programs against security bugs, independently validate vendor source code, and generate application tests with high code coverage. McSema isn’t just for static analysis: the lifted LLVM bitcode can also be fuzzed with libFuzzer, an LLVM-based instrumented fuzzer that would otherwise require the target's source code. The lifted bitcode can even be compiled back into a runnable program, a procedure known as static binary rewriting, binary translation, or binary recompilation.

https://www.trailofbits.com/research-and-development/mcsema/
https://github.com/trailofbits/mcsema

Related Projects

fcd - An optimizing decompiler


Fcd is an LLVM-based native program optimizing decompiler, released under an LLVM-style license. It started as a bachelor's degree senior project and carries forward its initial development philosophy of getting results fast. As such, it was architected to have low coupling between distinct decompilation phases and to be highly hackable. Fcd uses a unique technique to reliably translate machine code to LLVM IR. Currently, it only supports x86_64. Disassembly uses Capstone. It implements pattern-independent structuring to produce goto-free output.

emscripten - Emscripten: An LLVM-to-JavaScript Compiler


Emscripten is an LLVM-to-JavaScript compiler. It takes LLVM bitcode, which can be generated from C/C++ using llvm-gcc (DragonEgg) or clang, or from any other language that can be converted into LLVM, and compiles it into JavaScript that can be run on the web (or anywhere else JavaScript can run). Emscripten is available under two licenses: the MIT license and the University of Illinois/NCSA Open Source License.

elvm - EsoLangVM Compiler Infrastructure


ELVM is similar to LLVM but dedicated to esoteric languages. The project consists of two components: a frontend and a backend. Currently, the only frontend we have is a modified version of 8cc, which translates C code to an internal representation format called ELVM IR (EIR). Unlike LLVM bitcode, EIR is designed to be extremely simple, so there is a better chance of writing a translator from EIR to an esoteric language. ELVM's target languages are known to be difficult to program in, but with ELVM you can create programs in them; for example, you can easily create Brainfuck programs by writing C code. One of the more interesting test cases ELVM has is a tiny Lisp interpreter. All of the language backends pass this test, which means you can run Lisp in each of those languages.
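
To show what a backend such as Brainfuck has to target, here is a minimal Brainfuck interpreter in Python. It is not part of ELVM; it is a sketch assuming the common conventions of 8-bit wrapping cells, a 30,000-cell tape, and end-of-input reading as 0, and the function name run_bf is hypothetical. With only eight single-character commands, hand-writing anything nontrivial is painful, which is why compiling C through a simple IR like EIR is attractive.

```python
def run_bf(program: str, stdin: str = "") -> str:
    """Interpret a Brainfuck program and return everything it writes to output.

    Hypothetical helper, not part of ELVM. Assumes 8-bit wrapping cells,
    a 30,000-cell tape, and that reading past end of input yields 0.
    """
    # Pre-match brackets so each loop jump is a single dictionary lookup.
    jump = {}
    stack = []
    for i, ch in enumerate(program):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            if not stack:
                raise SyntaxError(f"unmatched ']' at position {i}")
            j = stack.pop()
            jump[i], jump[j] = j, i
    if stack:
        raise SyntaxError(f"unmatched '[' at position {stack[-1]}")

    tape = [0] * 30000
    ptr = pc = 0
    inp = iter(stdin)
    out = []
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == ",":
            tape[ptr] = ord(next(inp, "\0")) % 256
        elif ch == "[" and tape[ptr] == 0:
            pc = jump[pc]  # skip the loop body; pc += 1 below moves past ']'
        elif ch == "]" and tape[ptr] != 0:
            pc = jump[pc]  # repeat the loop; pc += 1 below moves past '['
        pc += 1  # any other character is treated as a comment
    return "".join(out)
```

For example, run_bf("++++++++[>++++++++<-]>+.") computes 8 × 8 + 1 = 65 in the second cell and prints "A".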

gocaml - Practical statically typed functional programming language implementation with Go and LLVM


GoCaml is a subset of OCaml implemented in Go, based on MinCaml and using LLVM. GoCaml adds many features to the original MinCaml, which is a minimal subset of OCaml for educational purposes. It is statically typed and compiled into a binary. This project aims at incremental compiler development for my own programming language. Type inference, closure transformation, and a mid-level IR are implemented.

llvm.js - LLVM compiled to JavaScript using Emscripten


You will hit errors when attempting to use tblgen and other tools: the build system is self-executing, but we generate bitcode that cannot be run. When an error happens, copy in the file from a parallel native build, and edit the Makefile in the parent directory that generates that file (utils/ for llvm-tblgen; tools/ and ./ for llvm-config) so that it no longer calls the tool (otherwise, running make again will overwrite the file you just copied in). Then re-run make.


sulong - Sulong, the LLVM bitcode implementation of Graal VM.


Sulong is a high-performance LLVM bitcode interpreter built on GraalVM by Oracle Labs. Sulong is written in Java and uses the Truffle language implementation framework and Graal as a dynamic compiler.

Triton - Triton is a Dynamic Binary Analysis (DBA) framework


Triton is a dynamic binary analysis (DBA) framework. It provides internal components such as a Dynamic Symbolic Execution (DSE) engine, a Taint engine, AST representations of the x86 and x86-64 instruction set semantics, SMT simplification passes, an SMT Solver Interface and, last but not least, Python bindings. Based on these components, you can build program analysis tools, automate reverse engineering, and perform software verification. As Triton is still a young project, please don't blame us if it is not yet reliable. Opening issues or pull requests is always better than trolling =).
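
To give a feel for what a taint engine tracks, here is a toy forward taint propagation over straight-line register moves in Python. It is not Triton's API or implementation: the (dst, srcs) instruction encoding and the function name propagate_taint are hypothetical, and a real engine such as Triton's also has to model memory, flags, and partial-register writes.

```python
def propagate_taint(instructions, tainted):
    """Toy forward taint propagation (hypothetical, not Triton's API).

    Each instruction is a (dst, srcs) pair meaning "dst is computed from srcs".
    Returns the set of tainted locations after all instructions execute.
    """
    tainted = set(tainted)  # copy so the caller's set is not mutated
    for dst, srcs in instructions:
        if any(src in tainted for src in srcs):
            tainted.add(dst)       # tainted data flows into dst
        else:
            tainted.discard(dst)   # dst is overwritten with untainted data
    return tainted
```

For instance, with rdi tainted, mov rax, rdi taints rax; an add into rbx from rax and rcx taints rbx; and a later mov rax, rcx clears rax again, leaving {rdi, rbx}.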

SkyEye


SkyEye is a very fast full-system simulator that uses LLVM as the IR of its dynamic compilation framework. It can simulate a range of ARM, Coldfire, MIPS, PowerPC, SPARC, x86, and Blackfin DSP processors. It can also simulate multicore systems by using the host's multiple cores.

echojs - an ahead of time compiler and runtime for ES6


The environment variable LLVM_SUFFIX can be set, and its value will be appended to the names of all LLVM executables (e.g. llvm-config-3.6 instead of llvm-config). The default is -3.6; change this if you have a different build of LLVM you want to use. Homebrew installs LLVM 3.6 executables without the suffix, so use export LLVM_SUFFIX= (an empty value). As for MIN_OSX_VERSION: Homebrew's formula for LLVM (3.4 at least; not yet verified with 3.6) doesn't specify a -mmacosx-version-min= flag, so it builds for whatever OS X version your machine is running. Node.js's gyp support in node-gyp, however, does pass a -mmacosx-version-min=10.5 flag. A mismatch here causes the node-llvm binding to allocate LLVM types using incorrect size calculations, which leads to all manner of memory corruption. If you're running either 10.5 or 10.9, you can leave the variable unset. Otherwise, set it to the version of OS X you're running. Hopefully some discussion with the Homebrew maintainers will get this fixed upstream.
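
The LLVM_SUFFIX rule can be sketched in Python as below. This is not echojs's build code; the helper name llvm_tool is hypothetical, and it only encodes the documented behaviour: the default suffix is -3.6, and an explicitly empty value (as with Homebrew) means no suffix at all, which is different from leaving the variable unset.

```python
import os

DEFAULT_LLVM_SUFFIX = "-3.6"  # documented default


def llvm_tool(name, env=None):
    """Return the executable name for an LLVM tool such as 'llvm-config'.

    Hypothetical helper mirroring the documented LLVM_SUFFIX behaviour.
    """
    env = os.environ if env is None else env
    # .get() only falls back to the default when LLVM_SUFFIX is unset;
    # an empty string (export LLVM_SUFFIX=) is kept and yields the bare name.
    suffix = env.get("LLVM_SUFFIX", DEFAULT_LLVM_SUFFIX)
    return name + suffix
```

Passing an explicit env mapping makes the lookup easy to check: llvm_tool("llvm-config", {}) gives "llvm-config-3.6", while llvm_tool("llvm-config", {"LLVM_SUFFIX": ""}) gives "llvm-config".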

Udis86 Disassembler for x86 and x86-64


Udis86 is an easy-to-use minimalistic disassembler library for the x86 and x86-64 instruction set architectures. The primary intent of the design and development of udis86 is to aid software development projects that entail binary code analysis.

llvmlite - A lightweight LLVM python binding for writing JIT compilers


The old llvmpy binding exposes a lot of LLVM APIs, but the mapping of C++-style memory management to Python is error-prone. Numba and many other JIT compilers do not need the full LLVM API; only the IR builder, optimizer, and JIT compiler APIs are necessary. The llvmlite.llvmpy namespace provides a minimal llvmpy compatibility layer.

kaleidoscope - Haskell LLVM JIT Compiler Tutorial


A short guide to building a tiny programming language in Haskell with LLVM. You will need GHC 7.8 or newer as well as LLVM 4.0. For information on installing LLVM 4.0 (not 3.9 or earlier) on your platform of choice, take a look at the instructions posted by the llvm-hs maintainers.

souper - A superoptimizer for LLVM IR


A superoptimizer for LLVM IR

llvm - Mirror of official llvm git repository located at http://llvm.org/git/llvm. Updated hourly.


Mirror of official llvm git repository located at http://llvm.org/git/llvm. Updated hourly.

llvm-clang-samples - Examples of using the LLVM and Clang compilation libraries and tools


A collection of samples for using LLVM and Clang as libraries. LLVM & Clang evolve rapidly and the C++ API is not stable. This means that code that links against LLVM & Clang as libraries in version X may very well not compile or work in version X+1.

cemu - Cheap EMUlator: lightweight multi-architecture assembly playground


Writing assembly is fun. Assembly is the lowest-level (human-readable) language available for communicating with computers, and it is crucial for understanding the internal mechanisms of any machine. Unfortunately, setting up an environment to write, compile, and run assembly for various architectures (x86, ARM, MIPS, SPARC) has always been painful. CEmu is an attempt to fix this by providing a bundled GUI application that lets users write assembly and test it by compiling it to bytecode and executing it in a QEMU-based emulator. CEmu combines the advantages of a basic assembly IDE with a compilation and execution environment, relying on the excellent Keystone, Unicorn, and Capstone engines in a Qt-powered GUI.

llvm-mirror


NOTE: The LLVM project now operates official Git mirrors as well: http://llvm.org/docs/GettingStarted.html#git_mirror -- An automated mirror of llvm/trunk from LLVM's SVN. Updates hourly. Release branches and tags are tracked manually. This mirror is *not* commit-ID compatible with the official Git mirrors.

ruby-llvm - LLVM bindings for Ruby


LLVM bindings for Ruby

checkedc - Checked C is an extension of C that adds bounds checking to C


Checked C is an extension to C that adds static and dynamic checking to detect or prevent common programming errors such as buffer overruns, out-of-bounds memory accesses, and incorrect type casts. This repo contains the specification for the extension, test code, and samples. For the latest version of the specification and the draft of the next version, see the Checked C releases page. We are creating a modified version of LLVM/clang that supports Checked C. The code for the modified version of LLVM/clang lives in the Checked C clang repo and the Checked C LLVM repo.

gef - GEF - GDB Enhanced Features for exploit devs & reversers


GEF is a kick-ass set of commands for x86, ARM, MIPS, PowerPC, and SPARC to make GDB cool again for exploit development. It is aimed mostly at exploit developers and reverse engineers, providing additional features to GDB through its Python API to assist with dynamic analysis and exploit development. It fully supports both Python 2 and Python 3 (as more and more distros ship GDB compiled with Python 3 support).