SIMDxorshift - Fast random number generators: Vectorized (SIMD) version of xorshift128+

  •        19

Xorshift are a family of pseudo-random number generators (RNG) invented by Marsaglia.We present a vectorized version of xorshift128+, a popular random-number generator part of this family. It is written in C. The implementation uses Intel's SIMD instructions and is based on Vigna's original (pure C) implementation.

https://github.com/lemire/SIMDxorshift

Tags
Implementation
License
Platform

   




Related Projects

Vc - SIMD Vector Classes for C++

  •    C++

Recent generations of CPUs, and GPUs in particular, require data-parallel codes for full efficiency. Data parallelism requires that the same sequence of operations is applied to different input data. CPUs and GPUs can thus reduce the necessary hardware for instruction decoding and scheduling in favor of more arithmetic and logic units, which execute the same instructions synchronously. On CPU architectures this is implemented via SIMD registers and instructions. A single SIMD register can store N values and a single SIMD instruction can execute N operations on those values. On GPU architectures N threads run in perfect sync, fed by a single instruction decoder/scheduler. Each thread has local memory and a given index to calculate the offsets in memory for loads and stores. Current C++ compilers can do automatic transformation of scalar codes to SIMD instructions (auto-vectorization). However, the compiler must reconstruct an intrinsic property of the algorithm that was lost when the developer wrote a purely scalar implementation in C++. Consequently, C++ compilers cannot vectorize any given code to its most efficient data-parallel variant. Especially larger data-parallel loops, spanning over multiple functions or even translation units, will often not be transformed into efficient SIMD code.

xsimd - Modern, portable C++ wrappers for SIMD intrinsics and parallelized, optimized math implementations

  •    C++

SIMD (Single Instruction, Multiple Data) is a feature of microprocessors that has been available for many years. SIMD instructions perform a single operation on a batch of values at once, and thus provide a way to significantly accelerate code execution. However, these instructions differ between microprocessor vendors and compilers. xsimd provides a unified means for using these features for library authors. Namely, it enables manipulation of batches of numbers with the same arithmetic operators as for single values. It also provides accelerated implementation of common mathematical functions operating on batches.

simdjson-go - Golang port of simdjson: parsing gigabytes of JSON per second

  •    Go

This is a Golang port of simdjson, a high performance JSON parser developed by Daniel Lemire and Geoff Langdale. It makes extensive use of SIMD instructions to achieve parsing performance of gigabytes of JSON per second. Performance wise, simdjson-go runs on average at about 40% to 60% of the speed of simdjson. Compared to Golang's standard package encoding/json, simdjson-go is about 10x faster.

libsimdpp - Portable header-only zero-overhead C++ low level SIMD library

  •    C++

libsimdpp is a portable header-only zero-overhead C++ low level SIMD library. The library presents a single interface over SIMD instruction sets present in x86, ARM, PowerPC and MIPS architectures. On architectures that support different SIMD instruction sets the library allows the same source code files to be compiled for each SIMD instruction set and then hooked into an internal or third-party dynamic dispatch mechanism. This allows the capabilities of the processor to be queried on runtime and the most efficient implementation to be selected. The library sits somewhere in the middle between programming directly in SIMD intrinsics and even higher-level SIMD libraries. As much control as possible is given to the developer, so that it's possible to exactly predict what code the compiler will generate.

ecmascript_simd - SIMD numeric type for EcmaScript

  •    Javascript

SIMD.js has been taken out of active development in TC39 and removed from Stage 3, and is not being pursued by web browsers for implementation. SIMD operations exposed to the web are under active development within WebAssembly, with operations based on the SIMD.js operations. With WebAssembly in advanced development or shipping in multiple browsers, it seems like an adequate vehicle to subsume asm.js use cases, which are judged to be the broader cases. Although some developers have expressed interest in using SIMD.js outside of asm.js, implementers have found that implementing and optimizing for this case reliably creates a lot of complexity, and have made the decision to focus instead on delivering WebAssembly and SIMD instructions in WASM.


Simd - C++ image processing library with using of SIMD: SSE, SSE2, SSE3, SSSE3, SSE4

  •    C++

The Simd Library is a free open source image processing library, designed for C and C++ programmers. It provides many useful high performance algorithms for image processing such as: pixel format conversion, image scaling and filtration, extraction of statistic information from images, motion detection, object detection (HAAR and LBP classifier cascades) and classification, neural network. The algorithms are optimized with using of different SIMD CPU extensions. In particular the library supports following CPU extensions: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX-512 for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC (big-endian), NEON for ARM.

SIMD Detector

  •    DotNet

This SIMD class helps developers to detect the types of SIMD instruction available on users' processor. It supports Intel and AMD CPUs. It is written in C++.

FastPFor - The FastPFOR C++ library: Fast integer compression

  •    C++

A research library with integer compression schemes. It is broadly applicable to the compression of arrays of 32-bit integers where most integers are small. The library seeks to exploit SIMD instructions (SSE) whenever possible.This library can decode at least 4 billions of compressed integers per second on most desktop or laptop processors. That is, it can decompress data at a rate of 15 GB/s. This is significantly faster than generic codecs like gzip, LZO, Snappy or LZ4.

simdjson - Parsing gigabytes of JSON per second

  •    C++

JSON documents are everywhere on the Internet. Servers spend a lot of time parsing these documents. We want to accelerate the parsing of JSON per se using commonly available SIMD instructions as much as possible while doing full validation (including character encoding). A description of the design and implementation of simdjson appears at https://arxiv.org/abs/1902.08318 and an informal blog post providing some background and context is at https://branchfree.org/2019/02/25/paper-parsing-gigabytes-of-json-per-second/.

Cross-platform SIMD C Headers

  •    C

A cross-platform, cross-compiler, cross-CPU C header library for programming with SIMD instruction sets. X86 (MMX/SSE/SSE2) GCC and MSVC, PPC Altivec GCC, WMMX ARM GCC, and software emulated SIMD are supported.

pulpino - An open-source microcontroller system based on RISC-V

  •    C

PULPino is an open-source single-core microcontroller system, based on 32-bit RISC-V cores developed at ETH Zurich. PULPino is configurable to use either the RISCY or the zero-riscy core. RISCY is an in-order, single-issue core with 4 pipeline stages and it has an IPC close to 1, full support for the base integer instruction set (RV32I), compressed instructions (RV32C) and multiplication instruction set extension (RV32M). It can be configured to have single-precision floating-point instruction set extension (RV32F). It implements several ISA extensions such as: hardware loops, post-incrementing load and store instructions, bit-manipulation instructions, MAC operations, support fixed-point operations, packed-SIMD instructions and the dot product. It has been designed to increase the energy efficiency of in ultra-low-power signal processing applications. RISCY implementes a subset of the 1.9 privileged specification. Further informations can be found in http://ieeexplore.ieee.org/abstract/document/7864441/.

ArchAssembler

  •    CSharp

ArchAssembler is a .net (c#) library providing the functionalities of an assembler. Target architecture is x86/x64 with streaming SIMD extensions. Target executable file format is Windows Portable Executable (PE).

stdsimd - Experiments with adding SIMD support to Rust's standard library.

  •    Rust

stdsimd is now shipped with Rust's std library - its is part of libcore and libstd. The easiest way to use it is just to import it via use std::arch.

cgmath - A linear algebra and mathematics library for computer graphics.

  •    Rust

A linear algebra and mathematics library for computer graphics. Not all of the functionality has been implemented yet, and the existing code is not fully covered by the testsuite. If you encounter any mistakes or omissions please let me know by posting an issue, or even better: send me a pull request with a fix.

faster - SIMD for humans

  •    Rust

Easy, powerful, portable, absurdly fast numerical calculations. Includes static dispatch with inlining based on your platform and vector types, zero-allocation iteration, vectorized loading/storing, and support for uneven collections. The vector size is entirely determined by the machine you’re compiling for - it attempts to use the largest vector size supported by your machine, and works on any platform or architecture (see below for details).

stdarch - Rust's standard library vendor-specific APIs and run-time feature detection

  •    HTML

std_detect implements std::detect - Rust's standard library run-time CPU feature detection. The std::simd component now lives in the packed_simd crate.

SMVector3

  •    

Vector3 class implemented as float array or with SIMD instructions with the same interface so it is transparant whether you decide to use one version or another. You can also change version during the life cycle of the projects.

libjpeg-turbo - Main libjpeg-turbo repository

  •    C

libjpeg-turbo is a JPEG image codec that uses SIMD instructions (MMX, SSE2, NEON, AltiVec) to accelerate baseline JPEG compression and decompression on x86, x86-64, ARM, and PowerPC systems. On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg, all else being equal. On other types of systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by virtue of its highly-optimized Huffman coding routines. In many cases, the performance of libjpeg-turbo rivals that of proprietary high-speed JPEG codecs.libjpeg-turbo implements both the traditional libjpeg API as well as the less powerful but more straightforward TurboJPEG API. libjpeg-turbo also features colorspace extensions that allow it to compress from/decompress to 32-bit and big-endian pixel buffers (RGBX, XBGR, etc.), as well as a full-featured Java interface.






We have large collection of open source products. Follow the tags from Tag Cloud >>


Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.