
mkl-dnn - Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN)

  •    C++

The Intel MKL-DNN repository has migrated to https://github.com/intel/mkl-dnn. The old address will remain available and will redirect to the new repository. Please update your links. Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN) is an open source performance library for deep learning applications. The library accelerates deep learning applications and frameworks on Intel(R) architecture. Intel(R) MKL-DNN contains vectorized and threaded building blocks that you can use to implement deep neural networks (DNN) with C and C++ interfaces.

Simd - C++ image processing library using SIMD: SSE, SSE2, SSE3, SSSE3, SSE4

  •    C++

The Simd Library is a free open source image processing library designed for C and C++ programmers. It provides many useful high-performance algorithms for image processing, such as pixel format conversion, image scaling and filtration, extraction of statistical information from images, motion detection, object detection (HAAR and LBP classifier cascades) and classification, and neural networks. The algorithms are optimized using different SIMD CPU extensions. In particular, the library supports the following CPU extensions: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX-512 for x86/x64; VMX (Altivec) and VSX (Power7) for PowerPC (big-endian); and NEON for ARM.

libsimdpp - Portable header-only zero-overhead C++ low level SIMD library

  •    C++

libsimdpp is a portable header-only zero-overhead C++ low level SIMD library. The library presents a single interface over the SIMD instruction sets present in x86, ARM, PowerPC and MIPS architectures. On architectures that support different SIMD instruction sets, the library allows the same source code files to be compiled for each SIMD instruction set and then hooked into an internal or third-party dynamic dispatch mechanism. This allows the capabilities of the processor to be queried at runtime and the most efficient implementation to be selected. The library sits somewhere in the middle between programming directly in SIMD intrinsics and even higher-level SIMD libraries. As much control as possible is given to the developer, so that it's possible to predict exactly what code the compiler will generate.
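
The dynamic dispatch idea described above can be sketched in plain C++. This is only an illustration and not libsimdpp's API: the SimdLevel enum, sum_scalar, sum_wide4 and select_sum are hypothetical names, and sum_wide4 is a portable stand-in for a build of the same source targeting a wider instruction set.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical capability levels. A real program would derive the level from
// a CPU feature query at startup; that query is left out of this sketch.
enum class SimdLevel { Scalar, Wide4 };

using SumFn = std::int64_t (*)(const std::int32_t*, std::size_t);

// Baseline implementation: one element per iteration.
std::int64_t sum_scalar(const std::int32_t* data, std::size_t n) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Stand-in for a wider build: four independent accumulators, then a scalar tail.
std::int64_t sum_wide4(const std::int32_t* data, std::size_t n) {
    std::int64_t acc[4] = {0, 0, 0, 0};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (std::size_t lane = 0; lane < 4; ++lane) acc[lane] += data[i + lane];
    }
    std::int64_t total = acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < n; ++i) total += data[i];
    return total;
}

// Pick an implementation once; callers then go through the function pointer.
SumFn select_sum(SimdLevel level) {
    switch (level) {
        case SimdLevel::Wide4: return sum_wide4;
        case SimdLevel::Scalar: break;
    }
    return sum_scalar;
}
```

A caller would typically resolve the pointer once, for example `SumFn sum = select_sum(level);`, and reuse it for every call so the selection cost is paid only at startup.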

Vc - SIMD Vector Classes for C++

  •    C++

Recent generations of CPUs, and GPUs in particular, require data-parallel codes for full efficiency. Data parallelism requires that the same sequence of operations is applied to different input data. CPUs and GPUs can thus reduce the necessary hardware for instruction decoding and scheduling in favor of more arithmetic and logic units, which execute the same instructions synchronously. On CPU architectures this is implemented via SIMD registers and instructions. A single SIMD register can store N values and a single SIMD instruction can execute N operations on those values. On GPU architectures N threads run in perfect sync, fed by a single instruction decoder/scheduler. Each thread has local memory and a given index to calculate the offsets in memory for loads and stores. Current C++ compilers can do automatic transformation of scalar codes to SIMD instructions (auto-vectorization). However, the compiler must reconstruct an intrinsic property of the algorithm that was lost when the developer wrote a purely scalar implementation in C++. Consequently, C++ compilers cannot vectorize any given code to its most efficient data-parallel variant. Especially larger data-parallel loops, spanning over multiple functions or even translation units, will often not be transformed into efficient SIMD code.
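
To make "a single SIMD register can store N values and a single SIMD instruction can execute N operations" concrete, here is a small portable model. It does not use Vc or real SIMD registers: Float4, kLanes, add and add_arrays are hypothetical names, and the fixed-width array only imitates a register with N = 4 lanes.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4;           // assumed register width N
using Float4 = std::array<float, kLanes>;   // models one SIMD register

// Models one SIMD instruction: N additions applied lane by lane.
inline Float4 add(const Float4& a, const Float4& b) {
    Float4 r{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) r[lane] = a[lane] + b[lane];
    return r;
}

// out[i] = x[i] + y[i], processed in chunks of kLanes with a scalar tail.
std::vector<float> add_arrays(const std::vector<float>& x,
                              const std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    std::vector<float> out(n);
    std::size_t i = 0;
    for (; i + kLanes <= n; i += kLanes) {
        Float4 a{}, b{};
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            a[lane] = x[i + lane];          // "load" N values
            b[lane] = y[i + lane];
        }
        const Float4 r = add(a, b);         // one data-parallel operation
        for (std::size_t lane = 0; lane < kLanes; ++lane) out[i + lane] = r[lane];
    }
    for (; i < n; ++i) out[i] = x[i] + y[i];  // leftover elements
    return out;
}
```

Writing the loop in terms of an explicit vector type states the data parallelism directly, instead of leaving the compiler to rediscover it from scalar code.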

TurboPFor - Fastest Integer Compression

  •    C

Generate and test a skewed (Zipfian) distribution (100,000,000 integers, block size 128/256). Note: unlike general-purpose compression, "integer compression" generally uses a small fixed block size (e.g. 128 integers). Large blocks touched while processing queries (inverted indexes, search engines, databases, graphs, in-memory computing, ...) need to be decoded entirely. (*) Codecs that are inefficient for small block sizes are tested with 64Ki integers per block.

gorse - A High Performance Recommender System Package based on Collaborative Filtering for Go

  •    Go

More examples can be found in the example folder. All models are tested by 5-fold cross validation on a PC with an Intel(R) Core(TM) i5-4590 CPU (3.30GHz) and 16.0GB RAM. All scores are the best scores achieved by gorse so far.

fastbase64 - SIMD-accelerated base64 codecs

  •    C

We are investigating the possibility of SIMD-accelerated base64 codecs. We extend Nick Galbreath's base64 library (this high-performance library is used in Chromium).

highwayhash - Node.js implementation of HighwayHash, Google's fast and strong hash function

  •    Javascript

Node.js implementation of Google's HighwayHash. Based on SipHash, it is believed to be robust against hash flooding and timing attacks because memory accesses are sequential and the algorithm is branch-free.

argon2 - Implementation of argon2 (i, d, id) algorithms with CPU dispatching

  •    C++

There are also HashWithCustomMemory and VerifyWithCustomMemory methods, to which you can pass a memory area to use for computations and so save a little on memory allocation. The GetMemorySize method returns the size of the memory area that is required for a particular instance. The library uses constexpr to calculate some values at compile time. The mcost value is a template variable, so the library doesn't support arbitrary mcost values, only the predefined ones (in practice you usually don't need others).

ksim - The little simulator that could.

  •    C

The little simulator that could. ksim is a simulator for Intel Skylake GPUs. It grew out of two tools I wrote a while back for capturing and decoding the output of the Intel open source drivers. Once you're capturing and decoding the command stream output by the driver it's a small step to start interpreting the stream. Initially, it was exciting to see it fetch vertices, but it has now snowballed into a fairly competent and efficient software rasterizer. As of March 22nd, ksim runs all major GL/Vulkan shader stages (vertex, hull, domain, geometry and fragment shaders) as well as compute shaders. It JIT compiles the Intel GPU ISA to AVX2 code on-the-fly using its own IR and compiler.

TurboTranspose - Integer + Floating Point Compression Filter

  •    C

🆕 Download IcApp, a new benchmark for TurboPFor+TurboTranspose for testing almost all integer and floating point file types. Note: Lossy compression benchmark with icapp only.

base64simd - Base64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)

  •    C++

The repository contains code for encoding and decoding base64 using SIMD instructions. Depending on the CPU architecture, vectorized encoding is faster than the scalar versions by a factor of 2 to 4; decoding is 2 to 2.7 times faster. Daniel Lemire and I also wrote the paper Faster Base64 Encoding and Decoding Using AVX2 Instructions, which was published in ACM Transactions on the Web.
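
For reference, the scalar baseline that such vectorized codecs are compared against looks roughly like the sketch below. It is not taken from the repository: base64_encode is a hypothetical name, and the code simply implements the standard alphabet with '=' padding, taking 3 input bytes and emitting 4 output characters per step.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Scalar base64 encoder (standard alphabet, '=' padding).
std::string base64_encode(const std::string& input) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    // Read one input byte as an unsigned value, regardless of char signedness.
    auto byte_at = [&input](std::size_t i) {
        return static_cast<std::uint32_t>(static_cast<unsigned char>(input[i]));
    };

    std::string out;
    out.reserve((input.size() + 2) / 3 * 4);

    std::size_t i = 0;
    // Full groups: 3 bytes (24 bits) become four 6-bit indices.
    for (; i + 3 <= input.size(); i += 3) {
        const std::uint32_t v =
            (byte_at(i) << 16) | (byte_at(i + 1) << 8) | byte_at(i + 2);
        out += kAlphabet[(v >> 18) & 0x3F];
        out += kAlphabet[(v >> 12) & 0x3F];
        out += kAlphabet[(v >> 6) & 0x3F];
        out += kAlphabet[v & 0x3F];
    }

    // Tail: 1 or 2 leftover bytes, padded with '='.
    const std::size_t rest = input.size() - i;
    if (rest == 1) {
        const std::uint32_t v = byte_at(i) << 16;
        out += kAlphabet[(v >> 18) & 0x3F];
        out += kAlphabet[(v >> 12) & 0x3F];
        out += "==";
    } else if (rest == 2) {
        const std::uint32_t v = (byte_at(i) << 16) | (byte_at(i + 1) << 8);
        out += kAlphabet[(v >> 18) & 0x3F];
        out += kAlphabet[(v >> 12) & 0x3F];
        out += kAlphabet[(v >> 6) & 0x3F];
        out += '=';
    }
    return out;
}
```

The main loop handles every 3-byte group independently with the same shifts and table lookups, which is the part a SIMD version can apply to many groups at once.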

parsing-int-series - Parse multiple decimal integers separated by arbitrary number of delimiters

  •    C++

Parsers extract integer numbers from strings. A number can be preceded by a sign character. The numbers are separated by arbitrary sequences of separator characters. All other characters are invalid; the parsers detect them and raise an exception. Requires: a C++11 compiler (tested with GCC 7.3) and Python 2.7.
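
A minimal scalar sketch of the rules above is shown here. It is not the repository's code: parse_int_series and its separators parameter are hypothetical, the separator set is assumed to contain no digits or sign characters, and overflow checking is omitted for brevity.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Extracts optionally signed decimal integers from `text`.
// Any run of characters from `separators` may appear between numbers;
// every other character raises std::invalid_argument.
std::vector<long> parse_int_series(const std::string& text,
                                   const std::string& separators) {
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
    auto is_separator = [&separators](char c) {
        return separators.find(c) != std::string::npos;
    };

    std::vector<long> result;
    const std::size_t n = text.size();
    std::size_t i = 0;
    while (i < n) {
        if (is_separator(text[i])) {  // skip arbitrary runs of separators
            ++i;
            continue;
        }
        bool negative = false;
        if (text[i] == '+' || text[i] == '-') {
            negative = (text[i] == '-');
            ++i;
            if (i == n || !is_digit(text[i]))
                throw std::invalid_argument("sign not followed by a digit");
        } else if (!is_digit(text[i])) {
            throw std::invalid_argument("invalid character");
        }
        long value = 0;  // note: no overflow detection in this sketch
        while (i < n && is_digit(text[i])) {
            value = value * 10 + (text[i] - '0');
            ++i;
        }
        // A number must be followed by a separator or the end of input.
        if (i < n && !is_separator(text[i]))
            throw std::invalid_argument("invalid character after number");
        result.push_back(negative ? -value : value);
    }
    return result;
}
```

For example, `parse_int_series("12, -5;;+7", ",; ")` yields the values 12, -5 and 7, while `parse_int_series("1,x", ",")` throws.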

sse-popcount - SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html

  •    C++

Daniel Lemire, Nathan Kurz and I published an article Faster Population Counts using AVX2 Instructions. Subdirectory original contains code from 2008 --- it is 32-bit and GCC-centric. The root directory contains fresh C++11 code, written with intrinsics and tested on 64-bit machines.
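
For readers unfamiliar with the term, a population count is the number of bits set to 1 in a word. The sketch below is the well-known scalar bit-parallel (SWAR) method, shown only as a baseline; it is not the repository's SSE or AVX2 code, and popcount64 and popcount_buffer are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Counts set bits in a 64-bit word by summing adjacent bit fields in parallel.
inline unsigned popcount64(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit sums
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            // 8-bit sums
    // Multiplying by 0x0101... accumulates all byte sums into the top byte.
    return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);
}

// Total number of set bits in an array of 64-bit words.
std::size_t popcount_buffer(const std::uint64_t* words, std::size_t n) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += popcount64(words[i]);
    return total;
}
```

Vectorized variants apply the same kind of field-wise summation to wider registers, so several words are counted per instruction.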

sse4-strstr - SIMD (SWAR/SSE/SSE4/AVX2/AVX512F/ARM Neon) implementations of a modified Karp-Rabin algorithm

  •    C++

Sample programs for article "SIMD-friendly algorithms for substring searching" (http://0x80.pl/articles/simd-strfind.html). The root directory contains C++11 procedures implemented using intrinsics for SSE, SSE4, AVX2, AVX512F, AVX512BW and ARM Neon (both ARMv7 and ARMv8).
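
The description above does not spell out the algorithm, so the following scalar sketch rests on an assumption: that the Karp-Rabin rolling hash is replaced by a cheap filter comparing only the first and last characters of the needle, followed by an exact comparison of the interior. The name find_first_last is hypothetical and the code is not taken from the repository.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Returns the index of the first occurrence of `needle` in `haystack`,
// or std::string::npos if there is none.
std::size_t find_first_last(const std::string& haystack,
                            const std::string& needle) {
    const std::size_t n = haystack.size();
    const std::size_t k = needle.size();
    if (k == 0) return 0;
    if (k > n) return std::string::npos;

    const char first = needle[0];
    const char last = needle[k - 1];
    for (std::size_t i = 0; i + k <= n; ++i) {
        // Cheap filter: check only the two end characters of the window.
        if (haystack[i] != first || haystack[i + k - 1] != last) continue;
        // Exact check of the interior (empty when the needle has <= 2 chars).
        if (k <= 2 ||
            std::memcmp(haystack.data() + i + 1, needle.data() + 1, k - 2) == 0) {
            return i;
        }
    }
    return std::string::npos;
}
```

Under this assumption, the filter is the part that lends itself to SIMD: the two character comparisons can be evaluated for many window positions at once, and the exact comparison runs only on the surviving candidates.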
