Displaying 1 to 20 from 58 results

mkl-dnn - Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN)

  •    C++

Intel MKL-DNN repository migrated to https://github.com/intel/mkl-dnn. The old address will continue to be available and will redirect to the new repo. Please update your links. Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN) is an open source performance library for deep learning applications. The library accelerates deep learning applications and framework on Intel(R) architecture. Intel(R) MKL-DNN contains vectorized and threaded building blocks which you can use to implement deep neural networks (DNN) with C and C++ interfaces.

ncnn - ncnn is a high-performance neural network inference framework optimized for the mobile platform

  •    C

ncnn is a high-performance neural network inference computing framework optimized for mobile platforms. ncnn is deeply considerate about deployment and uses on mobile phones from the beginning of design. ncnn does not have third party dependencies. it is cross-platform, and runs faster than all known open source frameworks on mobile phone cpu. Developers can easily deploy deep learning algorithm models to the mobile platform by using efficient ncnn implementation, create intelligent APPs, and bring the artificial intelligence to your fingertips. ncnn is currently being used in many Tencent applications, such as QQ, Qzone, WeChat, Pitu and so on.

NNPACK - Acceleration package for neural networks on multi-core CPUs

  •    C

NNPACK is an acceleration package for neural network computations. NNPACK aims to provide high-performance implementations of convnet layers for multi-core CPUs. NNPACK is not intended to be directly used by machine learning researchers; instead it provides low-level performance primitives leveraged in leading deep learning frameworks, such as PyTorch, Caffe2, MXNet, tiny-dnn, Caffe, Torch, and Darknet.




js - turbo.js - perform massive parallel computations in your browser with GPGPU.

  •    Javascript

turbo.js is a small library that makes it easier to perform complex calculations that can be done in parallel. The actual calculation performed (the kernel executed) uses the GPU for execution. This enables you to work on an array of values all at once. turbo.js is compatible with all browsers (even IE when not using ES6 template strings) and most desktop and mobile GPUs.

simdjson - Parsing gigabytes of JSON per second

  •    C++

JSON documents are everywhere on the Internet. Servers spend a lot of time parsing these documents. We want to accelerate the parsing of JSON per se using commonly available SIMD instructions as much as possible while doing full validation (including character encoding). A description of the design and implementation of simdjson appears at https://arxiv.org/abs/1902.08318 and an informal blog post providing some background and context is at https://branchfree.org/2019/02/25/paper-parsing-gigabytes-of-json-per-second/.

faster - SIMD for humans

  •    Rust

Easy, powerful, portable, absurdly fast numerical calculations. Includes static dispatch with inlining based on your platform and vector types, zero-allocation iteration, vectorized loading/storing, and support for uneven collections. The vector size is entirely determined by the machine you’re compiling for - it attempts to use the largest vector size supported by your machine, and works on any platform or architecture (see below for details).

Simd - C++ image processing library with using of SIMD: SSE, SSE2, SSE3, SSSE3, SSE4

  •    C++

The Simd Library is a free open source image processing library, designed for C and C++ programmers. It provides many useful high performance algorithms for image processing such as: pixel format conversion, image scaling and filtration, extraction of statistic information from images, motion detection, object detection (HAAR and LBP classifier cascades) and classification, neural network. The algorithms are optimized with using of different SIMD CPU extensions. In particular the library supports following CPU extensions: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX-512 for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC (big-endian), NEON for ARM.


EntityComponentSystemSamples

  •    CSharp

Here you can find the resources required to start building with these new systems today. We have also provided a new forum where you can find more information and share your experiences with these new systems.

libsimdpp - Portable header-only zero-overhead C++ low level SIMD library

  •    C++

libsimdpp is a portable header-only zero-overhead C++ low level SIMD library. The library presents a single interface over SIMD instruction sets present in x86, ARM, PowerPC and MIPS architectures. On architectures that support different SIMD instruction sets the library allows the same source code files to be compiled for each SIMD instruction set and then hooked into an internal or third-party dynamic dispatch mechanism. This allows the capabilities of the processor to be queried on runtime and the most efficient implementation to be selected. The library sits somewhere in the middle between programming directly in SIMD intrinsics and even higher-level SIMD libraries. As much control as possible is given to the developer, so that it's possible to exactly predict what code the compiler will generate.

cgmath - A linear algebra and mathematics library for computer graphics.

  •    Rust

A linear algebra and mathematics library for computer graphics. Not all of the functionality has been implemented yet, and the existing code is not fully covered by the testsuite. If you encounter any mistakes or omissions please let me know by posting an issue, or even better: send me a pull request with a fix.

GLM - OpenGL Mathematics (GLM)

  •    C++

OpenGL Mathematics (GLM) is a header only C++ mathematics library for graphics software based on the OpenGL Shading Language (GLSL) specifications. GLM provides classes and functions designed and implemented with the same naming conventions and functionality than GLSL so that anyone who knows GLSL, can use GLM as well in C++.

Vc - SIMD Vector Classes for C++

  •    C++

Recent generations of CPUs, and GPUs in particular, require data-parallel codes for full efficiency. Data parallelism requires that the same sequence of operations is applied to different input data. CPUs and GPUs can thus reduce the necessary hardware for instruction decoding and scheduling in favor of more arithmetic and logic units, which execute the same instructions synchronously. On CPU architectures this is implemented via SIMD registers and instructions. A single SIMD register can store N values and a single SIMD instruction can execute N operations on those values. On GPU architectures N threads run in perfect sync, fed by a single instruction decoder/scheduler. Each thread has local memory and a given index to calculate the offsets in memory for loads and stores. Current C++ compilers can do automatic transformation of scalar codes to SIMD instructions (auto-vectorization). However, the compiler must reconstruct an intrinsic property of the algorithm that was lost when the developer wrote a purely scalar implementation in C++. Consequently, C++ compilers cannot vectorize any given code to its most efficient data-parallel variant. Especially larger data-parallel loops, spanning over multiple functions or even translation units, will often not be transformed into efficient SIMD code.

pikkr - JSON parser which picks up values directly without performing tokenization in Rust

  •    Rust

Pikkr is a JSON parser which picks up values directly without performing tokenization in Rust. This JSON parser is implemented based on Y. Li, N. R. Katsipoulakis, B. Chandramouli, J. Goldstein, and D. Kossmann. Mison: a fast JSON parser for data analytics. In VLDB, 2017. This JSON parser performs well when there are a limited number of different JSON structural variants in a JSON data stream or JSON collection, and that is a common case in data analytics field.

stdarch - Rust's standard library vendor-specific APIs and run-time feature detection

  •    HTML

std_detect implements std::detect - Rust's standard library run-time CPU feature detection. The std::simd component now lives in the packed_simd crate.

SIMD Detector

  •    DotNet

This SIMD class helps developers to detect the types of SIMD instruction available on users' processor. It supports Intel and AMD CPUs. It is written in C++.

ArchAssembler

  •    CSharp

ArchAssembler is a .net (c#) library providing the functionalities of an assembler. Target architecture is x86/x64 with streaming SIMD extensions. Target executable file format is Windows Portable Executable (PE).

stdsimd - Experiments with adding SIMD support to Rust's standard library.

  •    Rust

stdsimd is now shipped with Rust's std library - its is part of libcore and libstd. The easiest way to use it is just to import it via use std::arch.

TurboPFor - Fastest Integer Compression

  •    C

Generate and test (zipfian) skewed distribution (100.000.000 integers, Block size=128/256) Note: Unlike general purpose compression, a small fixed size (ex. 128 integers) is in general used in "integer compression". Large blocks involved, while processing queries (inverted index, search engines, databases, graphs, in memory computing,...) need to be entirely decoded. (*) codecs inefficient for small block sizes are tested with 64Ki integers/block.