Displaying 1 to 20 from 34 results

NyuziProcessor - GPGPU microprocessor architecture

  •    C++

Nyuzi is an experimental GPGPU processor hardware design focused on compute intensive tasks. It is optimized for use cases like blockchain mining, deep learning, and autonomous driving. This project includes a synthesizable hardware design written in System Verilog, an instruction set emulator, an LLVM based C/C++ compiler, software libraries, and tests. It can be used to experiment with microarchitectural and instruction set design tradeoffs.

emu - a language for programming GPUs, with a focus on ergonomics first and performance second

  •    Rust

⚠ Please note that while Emu 0.2.0 is quite usable, it suffers from 2 key issues. It firstly does nothing to minimize CPU-GPU data transfer and secondly it's compiler is not well-tested. These can be reasons not to use Emu 0.2.0. A new version of Emu is in the works, however, with significant improvements in the language, compiler, and compile-time checker. This new version of Emu should be released some time in Q4 of 2019. But unlike OpenCL/CUDA/Halide/Futhark, Emu is embedded in Rust. This lets it take advantage of the ecosystem in ways...

lingvo - Lingvo

  •    Python

Lingvo is a framework for building neural networks in Tensorflow, particularly sequence models. A list of publications using Lingvo can be found here.




accelerate - Embedded language for high-performance array computations

  •    Haskell

Data.Array.Accelerate defines an embedded language of array computations for high-performance computing in Haskell. Computations on multi-dimensional, regular arrays are expressed in the form of parameterised collective operations (such as maps, reductions, and permutations). These computations are online-compiled and executed on a range of architectures. Chapter 6 of Simon Marlow's book Parallel and Concurrent Programming in Haskell contains a tutorial introduction to Accelerate.

neanderthal - Fast Clojure Matrix Library

  •    Clojure

Neanderthal is a Clojure library for fast matrix and linear algebra computations based on the highly optimized native libraries of BLAS and LAPACK computation routines for both CPU and GPU.. Read the documentation at Neanderthal Web Site.

tf-quant-finance - High-performance TensorFlow library for quantitative finance.

  •    Jupyter

This library provides high-performance components leveraging the hardware acceleration support and automatic differentiation of TensorFlow. The library will provide TensorFlow support for foundational mathematical methods, mid-level methods, and specific pricing models. The coverage is being rapidly expanded over the next few months. Foundational methods. Core mathematical methods - optimisation, interpolation, root finders, linear algebra, random and quasi-random number generation, etc.

Kubernetes-GPU-Guide - This guide should help fellow researchers and hobbyists to easily automate and accelerate there deep leaning training with their own Kubernetes GPU cluster

  •    Shell

This guide should help fellow researchers and hobbyists to easily automate and accelerate there deep leaning training with their own Kubernetes GPU cluster. Therefore I will explain how to easily setup a GPU cluster on multiple Ubuntu 16.04 bare metal servers and provide some useful scripts and .yaml files that do the entire setup for you. By the way: If you need a Kubernetes GPU-cluster for other reasons, this guide might be helpful to you as well.


picongpu - Particle-in-cell simulations for the exascale era :sparkles:

  •    C++

PIConGPU is a fully relativistic, manycore, 3D3V particle-in-cell (PIC) code. The Particle-in-Cell algorithm is a central tool in plasma physics. It describes the dynamics of a plasma by computing the motion of electrons and ions in the plasma based on Maxwell's equations. As one of our supported compute platforms, GPUs provide a computational performance of several TFLOP/s at considerable lower invest and maintenance costs compared to multi CPU-based compute architectures of similar performance. The latest high-performance systems (TOP500) are enhanced by accelerator hardware that boost their peak performance up to the multi-PFLOP/s level. With its outstanding performance and scalability to more than 18'000 GPUs, PIConGPU was one of the finalists of the 2013 Gordon Bell Prize.

cuda-api-wrappers - Thin C++-flavored wrappers for the CUDA Runtime API

  •    C++

nVIDIA's Runtime API for CUDA is intended for use both in C and C++ code. As such, it uses a C-style API, the lowest common denominator (with a few notable exceptions of templated function overloads). This library of wrappers around the Runtime API is intended to allow us to embrace many of the features of C++ (including some C++11) for using the runtime API - but without reducing expressivity or increasing the level of abstraction (as in, e.g., the Thrust library). Using cuda-api-wrappers, you still have your devices, streams, events and so on - but they will be more convenient to work with in more C++-idiomatic ways.

Arraymancer - A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU, OpenCL and embedded devices

  •    Nim

Arraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, Cuda and OpenCL ndarray library on which to build a scientific computing and in particular a deep learning ecosystem. The library is inspired by Numpy and PyTorch. The library provides ergonomics very similar to Numpy, Julia and Matlab but is fully parallel and significantly faster than those libraries. It is also faster than C-based Torch.

FAST - Framework for Heterogeneous Medical Image Computing and Visualization

  •    C++

FAST (Framework for Heterogeneous Medical Image Computing and Visualization) is an open-source cross-platform framework with the main goal of making it easier to do processing and visualization of medical images on heterogeneous systems (CPU+GPU). A detailed description of the framework design can be found on the project wiki or in the research article: FAST: framework for heterogeneous medical image computing and visualization. Erik Smistad, Mohammadmehdi Bozorgi, Frank Lindseth. International Journal of Computer Assisted Radiology and Surgery. February 2015.

aer-engine - An OpenGL 4.3 / C++ 11 rendering engine oriented towards animation.

  •    C++

An OpenGL 4.3 / C++ 11 rendering engine oriented towards animation. The build was compiled against GCC 4.9.

accelerate-llvm - LLVM backend for Accelerate

  •    Haskell

This package compiles Accelerate code to LLVM IR, and executes that code on multicore CPUs as well as NVIDIA GPUs. This avoids the need to go through nvcc or clang. For details on Accelerate, refer to the main repository. We love all kinds of contributions, so feel free to open issues for missing features as well as report (or fix!) bugs on the issue tracker.

AnvilKit - AnvilKit tames Metal. Very much WIP.

  •    Swift

AnvilKit tames Metal. It's a collection of code that seems to come up in just about every project that everyone seems to roll themselves. Object that wraps MTLDevice and makes it into a singleton so that you don't need to pass it around.

gpuowl - GPU Mersenne primality test.

  •    C++

gpuOwl is a Mersenne (see http://mersenne.org/ ) primality tester implemented in OpenCL, that works well on AMD GPUs. gpuOwl implements the PRP test with a powerful self-validating algorithm that protects agains errors. gpuOwl uses FFT transforms of size 8M and 16M, and is best used with Mersenne exponents in the vicinity of 150M and 300M.

sushi2 - Matrix Library for JavaScript

  •    Javascript

This library is intended to be the fastest matrix library for JavaScript, with the power of GPU computing. To gain best performance, WebCL technology is used to access GPU from JavaScript. Since this project is written in TypeScript, transpiling to JavaScript is necessary.

Opt - Opt DSL

  •    Terra

Opt (optlang.org) is a new language in which a user simply writes energy functions over image- or graph-structured unknowns, and a compiler automatically generates state-of-the-art GPU optimization kernels. Real-world energy functions compile directly into highly optimized GPU solver implementations with performance competitive with the best published hand-tuned, application-specific GPU solvers. This is an alpha release of the software to get feedback on the expressiveness of the language. We are interested in seeing what problems can be expressed and what features will be necessary to support more problems.






We have large collection of open source products. Follow the tags from Tag Cloud >>


Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.