NyuziProcessor - GPGPU microprocessor architecture

  •        79

Nyuzi is an experimental GPGPU processor hardware design focused on compute intensive tasks. It is optimized for use cases like blockchain mining, deep learning, and autonomous driving. This project includes a synthesizable hardware design written in System Verilog, an instruction set emulator, an LLVM based C/C++ compiler, software libraries, and tests. It can be used to experiment with microarchitectural and instruction set design tradeoffs.




Related Projects

SystemMonitor - iOS application providing you all information about your device - hardware, operating system, processor, memory, GPU, network interface, storage and battery, including OpenGL powered visual representation in real time

  •    Objective-C

iOS application providing you all information about your device - hardware, operating system, processor, memory, GPU, network interface, storage and battery, including OpenGL powered visual representation in real time.

Starling Framework - The Cross Platform Game Engine

  •    ActionScript

The Starling Framework allows you to create hardware accelerated apps in ActionScript 3. The main target is the creation of 2D games, but Starling may be used for any graphical application. Thanks to Adobe AIR, Starling-based applications can be deployed to mobile devices (iOS, Android), the desktop (Windows, OS X), and to the browser (via the Flash plugin).

ROCm - ROCm - Open Source Platform for HPC and Ultrascale GPU Computing


The ROCm Platform brings a rich foundation to advanced computing by seamlessly integrating the CPU and GPU with the goal of solving real-world problems. This software enables the high-performance operation of AMD GPUs for computationally oriented tasks in the Linux operating system. ROCm is focused on using AMD GPUs to accelerate computational tasks such as machine learning, engineering workloads, and scientific computing. In order to focus our development efforts on these domains of interest, ROCm supports a targeted set of hardware configurations which are detailed further in this section.

picongpu - Particle-in-cell simulations for the exascale era :sparkles:

  •    C++

PIConGPU is a fully relativistic, manycore, 3D3V particle-in-cell (PIC) code. The Particle-in-Cell algorithm is a central tool in plasma physics. It describes the dynamics of a plasma by computing the motion of electrons and ions in the plasma based on Maxwell's equations. As one of our supported compute platforms, GPUs provide a computational performance of several TFLOP/s at considerable lower invest and maintenance costs compared to multi CPU-based compute architectures of similar performance. The latest high-performance systems (TOP500) are enhanced by accelerator hardware that boost their peak performance up to the multi-PFLOP/s level. With its outstanding performance and scalability to more than 18'000 GPUs, PIConGPU was one of the finalists of the 2013 Gordon Bell Prize.

tf-quant-finance - High-performance TensorFlow library for quantitative finance.

  •    Jupyter

This library provides high-performance components leveraging the hardware acceleration support and automatic differentiation of TensorFlow. The library will provide TensorFlow support for foundational mathematical methods, mid-level methods, and specific pricing models. The coverage is being rapidly expanded over the next few months. Foundational methods. Core mathematical methods - optimisation, interpolation, root finders, linear algebra, random and quasi-random number generation, etc.

MapD - The MapD Core database

  •    C++

MapD Core is an in-memory, column store, SQL relational database that was designed from the ground up to run on GPUs. MapD Core is the foundational element of a larger data exploration platform that emphasizes speed at scale. By taking advantage of the parallel processing power of the hardware, MapD Core can query billions of rows in milliseconds. Furthermore, by using the graphics pipelines of GPUs, MapD Core can render graphics directly from the server.

Kubernetes-GPU-Guide - This guide should help fellow researchers and hobbyists to easily automate and accelerate there deep leaning training with their own Kubernetes GPU cluster

  •    Shell

This guide should help fellow researchers and hobbyists to easily automate and accelerate there deep leaning training with their own Kubernetes GPU cluster. Therefore I will explain how to easily setup a GPU cluster on multiple Ubuntu 16.04 bare metal servers and provide some useful scripts and .yaml files that do the entire setup for you. By the way: If you need a Kubernetes GPU-cluster for other reasons, this guide might be helpful to you as well.

cgminer - ASIC / FPGA / GPU miner in c for bitcoin and litecoin

  •    bitcoin

This is a multi-threaded multi-pool GPU, FPGA and ASIC miner with ATI GPU monitoring, (over)clocking and fanspeed support for bitcoin and derivative coins. Do not use on multiple block chains at the same time!

gunrock - High-Performance Graph Primitives on GPUs

  •    Cuda

Gunrock is a CUDA library for graph-processing designed specifically for the GPU. It uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. For more details, please visit our website, read Why Gunrock, our TOPC 2017 paper Gunrock: GPU Graph Analytics, look at our results, and find more details in our publications. See Release Notes to keep up with the our latest changes.

emu - a language for programming GPUs, with a focus on ergonomics first and performance second

  •    Rust

⚠ Please note that while Emu 0.2.0 is quite usable, it suffers from 2 key issues. It firstly does nothing to minimize CPU-GPU data transfer and secondly it's compiler is not well-tested. These can be reasons not to use Emu 0.2.0. A new version of Emu is in the works, however, with significant improvements in the language, compiler, and compile-time checker. This new version of Emu should be released some time in Q4 of 2019. But unlike OpenCL/CUDA/Halide/Futhark, Emu is embedded in Rust. This lets it take advantage of the ecosystem in ways...

compute - A C++ GPU Computing Library for OpenCL

  •    C++

Boost.Compute is a GPU/parallel-computing library for C++ based on OpenCL. The core library is a thin C++ wrapper over the OpenCL API and provides access to compute devices, contexts, command queues and memory buffers.

neanderthal - Fast Clojure Matrix Library

  •    Clojure

Neanderthal is a Clojure library for fast matrix and linear algebra computations based on the highly optimized native libraries of BLAS and LAPACK computation routines for both CPU and GPU.. Read the documentation at Neanderthal Web Site.

Arraymancer - A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU, OpenCL and embedded devices

  •    Nim

Arraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, Cuda and OpenCL ndarray library on which to build a scientific computing and in particular a deep learning ecosystem. The library is inspired by Numpy and PyTorch. The library provides ergonomics very similar to Numpy, Julia and Matlab but is fully parallel and significantly faster than those libraries. It is also faster than C-based Torch.

Rapid HDL


Rapid Hardware Definition Language (Rapid HDL) is an object oriented C# software library in to script/generate/build synthesizable Verilog for FPGA hardware and software co-design in Visual Studio. It also integrates and automates Xilinx or Mentor Graphics build tools.

lyon - 2D graphics rendering experiments in rust.

  •    Rust

A path tessellation library written in rust for GPU-based 2D graphics rendering. For now the goal is to provide efficient SVG-compliant path tessellation tools to help with rendering vector graphics on the GPU. For now think of this library as a way to turn complex paths into triangles for use in your own rendering engine.

vulkan_minimal_compute - Minimal Example of Using Vulkan for Compute Operations. Only ~400LOC.

  •    C++

This is a simple demo that demonstrates how to use Vulkan for compute operations only. In other words, this demo does nothing related to graphics, and only uses Vulkan to execute some computation on the GPU. For this demo, Vulkan is used to render the Mandelbrot set on the GPU. The demo is very simple, and is only ~400LOC. The code is heavily commented, so it should be useful for people interested in learning Vulkan. The application launches a compute shader that renders the mandelbrot set, by rendering it into a storage buffer. The storage buffer is then read from the GPU, and saved as .png. Check the source code comments for further info.

OpenVIDIA : Parallel GPU Computer Vision

  •    C

OpenVIDIA projects implement computer vision algorithms running on on graphics hardware such as single or multiple graphics processing units(GPUs) using OpenGL, Cg and CUDA-C. Some samples will soon support OpenCL and Direct Compute API's also.

vulkan - Vulkan API bindings for Go programming language

  •    Go

Package vulkan provides Go bindings for Vulkan — a low-overhead, cross-platform 3D graphics and compute API. Updated October 13, 2018 — Vulkan 1.1.88. Vulkan API is the result of 18 months in an intense collaboration between leading hardware, game engine and platform vendors, built on significant contributions from multiple Khronos members. Vulkan is designed for portability across multiple platforms with desktop and mobile GPU architectures.

We have large collection of open source products. Follow the tags from Tag Cloud >>

Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.