Displaying 1 to 11 from 11 results

Vc - SIMD Vector Classes for C++

  •    C++

Recent generations of CPUs, and GPUs in particular, require data-parallel codes for full efficiency. Data parallelism requires that the same sequence of operations is applied to different input data. CPUs and GPUs can thus reduce the necessary hardware for instruction decoding and scheduling in favor of more arithmetic and logic units, which execute the same instructions synchronously. On CPU architectures this is implemented via SIMD registers and instructions. A single SIMD register can store N values and a single SIMD instruction can execute N operations on those values. On GPU architectures N threads run in perfect sync, fed by a single instruction decoder/scheduler. Each thread has local memory and a given index to calculate the offsets in memory for loads and stores. Current C++ compilers can do automatic transformation of scalar codes to SIMD instructions (auto-vectorization). However, the compiler must reconstruct an intrinsic property of the algorithm that was lost when the developer wrote a purely scalar implementation in C++. Consequently, C++ compilers cannot vectorize any given code to its most efficient data-parallel variant. Especially larger data-parallel loops, spanning over multiple functions or even translation units, will often not be transformed into efficient SIMD code.

Liam F

  •    DotNet

Basic mathematical functions implemented using AVX

aftermath

  •    

Provides sse/avx implementation for matrix storage, access and basic operations, probability distributions and fast ziggurat random number generators.




xsimd - Modern, portable C++ wrappers for SIMD intrinsics and parallelized, optimized math implementations

  •    C++

SIMD (Single Instruction, Multiple Data) is a feature of microprocessors that has been available for many years. SIMD instructions perform a single operation on a batch of values at once, and thus provide a way to significantly accelerate code execution. However, these instructions differ between microprocessor vendors and compilers. xsimd provides a unified means for using these features for library authors. Namely, it enables manipulation of batches of numbers with the same arithmetic operators as for single values. It also provides accelerated implementation of common mathematical functions operating on batches.

despacer - C library to remove white space from strings as fast as possible

  •    C

We want to remove the space (' ') and the line feeds characters ('\n', '\r') from a string as fast as possible. To avoid unnecessary allocations, we wish to do the processing in-place.Note that clang seems to give better results than gcc.

sha256-simd - Pure Go implementation of SHA256 using SIMD instructions for Intel and ARM

  •    Assembly

Accelerate SHA256 computations in pure Go for both Intel (AVX2, AVX, SSE) as well as ARM (arm64) platforms.This package is designed as a drop-in replacement for crypto/sha256. For Intel CPUs it has three flavors for AVX2, AVX and SSE whereby the fastest method is automatically chosen depending on CPU capabilities. For ARM CPUs with the Cryptography Extensions advantage is taken of the SHA2 instructions resulting in a massive performance improvement.

opal - SIMD C/C++ library for massive optimal sequence alignment (local/SW, infix, overlap, global)

  •    C++

This work has been supported in part by Croatian Science Fundation under the project UIP-11-2013-7353. Opal (ex Swimd) is SIMD C/C++ library for massive optimal sequence alignment. Opal is implemented mainly by Rognes's "Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation". Main difference is that Opal offers support for AVX2 and 4 alignment modes instead of just Smith-Waterman.


go-memset - An efficient memset implementation for Golang.

  •    Go

An efficient memset implementation for Golang. go-memset provides a Memset function which uses an assembly implementation on x86-64 and can provide performance equivalent to the optimised first loop.