ArchAssembler


ArchAssembler is a .NET (C#) library that provides the functionality of an assembler. The target architecture is x86/x64 with Streaming SIMD Extensions (SSE). The target executable file format is Windows Portable Executable (PE).

http://archassembler.codeplex.com/





Related Projects

x86-ray-tracer - Ray tracer written in x86 assembler (with SSE/SSE2 extensions).



SIMD Array


Simple array class to use SSE and SSE2.

Cross-platform SIMD C Headers


A cross-platform, cross-compiler, cross-CPU C header library for programming with SIMD instruction sets. X86 (MMX/SSE/SSE2) GCC and MSVC, PPC Altivec GCC, WMMX ARM GCC, and software emulated SIMD are supported.

SIMDx86


This library is meant for high performance calculations for science or 3D games/rasterizers using SIMD instructions of x86 processors to allow an unparalleled level of optimization. This takes advantage of MMX, 3DNow!, 3DNow!+/MMX+, and SSE/SSE2/SSE3/SSSE3.

simd - SIMD abstraction layer for sse, altivec and spu





xbyak


A JIT assembler for x86 (IA-32) and x64 (AMD64, x86-64) with MMX/SSE/SSE2/SSE3/SSSE3/SSE4/FPU/AVX/AVX2 support, provided as a C++ header.

herumi-xbyak


A JIT assembler for x86 (IA-32) and x64 (AMD64, x86-64) with MMX/SSE/SSE2/SSE3/SSSE3/SSE4/FPU/AVX support, provided as a C++ header.

FluidSSE - SIMD using SSE 4



ssela - SIMD (SSE) Linear Algebra package.



sha256-simd - Pure Go implementation of SHA256 using SIMD instructions for Intel and ARM


Accelerate SHA256 computations in pure Go for both Intel (AVX2, AVX, SSE) and ARM (arm64) platforms. This package is designed as a drop-in replacement for crypto/sha256. For Intel CPUs it has three flavors (AVX2, AVX and SSE), and the fastest one is chosen automatically depending on CPU capabilities. For ARM CPUs with the Cryptography Extensions, the SHA2 instructions are used, resulting in a massive performance improvement.

SIMD Detector


This SIMD class helps developers detect which types of SIMD instructions are available on the user's processor. It supports Intel and AMD CPUs. It is written in C++.

SIMD-Math-Test


A performance comparison between different implementations of an SIMD vector using SSE2

blake2b-simd - Fast hashing using pure Go implementation of BLAKE2b with SIMD instructions


Pure Go implementation of BLAKE2b using SIMD optimizations. This package was initially based on the pure Go BLAKE2b implementation of Dmitry Chestnykh and merged with the (cgo dependent) AVX optimized BLAKE2 implementation (which in turn is based on the official implementation). It does so by using Go's Assembler for amd64 architectures with a Go-only fallback for other architectures.

despacer - C library to remove white space from strings as fast as possible


We want to remove the space (' ') and the line feed characters ('\n', '\r') from a string as fast as possible. To avoid unnecessary allocations, we wish to do the processing in-place. Note that clang seems to give better results than gcc.
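
The in-place idea described above can be illustrated with a minimal scalar sketch in C. This is not the despacer library's code or API: the function name despace_scalar is hypothetical, and the loop processes one byte at a time rather than using SIMD. It only shows how the filtered bytes can be written back into the same buffer so that no allocation is needed.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only (not part of despacer).
 * Removes ' ', '\n' and '\r' from the first len bytes of buf, in place.
 * Returns the number of bytes kept; the caller re-terminates the string
 * if it needs a C string. */
size_t despace_scalar(char *buf, size_t len) {
    size_t out = 0;                      /* next write position */
    for (size_t i = 0; i < len; i++) {   /* read position, always >= out */
        char c = buf[i];
        if (c != ' ' && c != '\n' && c != '\r') {
            buf[out++] = c;              /* keep this byte, compacting leftward */
        }
    }
    return out;
}
```

Because the write position never overtakes the read position, no byte is overwritten before it has been read. For example, calling the function on the 9-byte buffer "a b\r\nc d\n" leaves "abcd" in the first four bytes and returns 4.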

FastPFor - The FastPFOR C++ library: Fast integer compression


A research library with integer compression schemes. It is broadly applicable to the compression of arrays of 32-bit integers where most integers are small. The library seeks to exploit SIMD instructions (SSE) whenever possible. This library can decode at least 4 billion compressed integers per second on most desktop or laptop processors. That is, it can decompress data at a rate of 15 GB/s. This is significantly faster than generic codecs like gzip, LZO, Snappy or LZ4.

PE-Header - Displays the Microsoft Portable Executable headers from an executable file



libjpeg_turbo


Dependency library libjpeg_turbo for WebRTC engine (as used in Open Peer C++ library). libjpeg-turbo is a JPEG image codec that uses SIMD instructions (MMX, SSE2, NEON) to accelerate baseline JPEG compression and decompression on x86, x86-64, and ARM systems.

libjpeg-turbo - Main libjpeg-turbo repository


libjpeg-turbo is a JPEG image codec that uses SIMD instructions (MMX, SSE2, NEON, AltiVec) to accelerate baseline JPEG compression and decompression on x86, x86-64, ARM, and PowerPC systems. On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg, all else being equal. On other types of systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by virtue of its highly-optimized Huffman coding routines. In many cases, the performance of libjpeg-turbo rivals that of proprietary high-speed JPEG codecs. libjpeg-turbo implements both the traditional libjpeg API as well as the less powerful but more straightforward TurboJPEG API. libjpeg-turbo also features colorspace extensions that allow it to compress from/decompress to 32-bit and big-endian pixel buffers (RGBX, XBGR, etc.), as well as a full-featured Java interface.

LFMat


LFMat is an open source, fast, template-based C++ linear algebra library with storage-compatible and asm specializations for 3DNow!, SSE, SSE2 and Altivec (e.g. for solvers), taking cache into account. There's a wide variety of structure and storage styles...