managedCUDA makes the CUDA Driver API available in .NET applications written in C#, Visual Basic, or any other .NET language. It also includes classes for easy handling of and interop with CUDA, e.g. built-in CUDA types like float3.



Related Projects

ManagedCuda Galaxy Simulator

This project is a test of ManagedCuda and graphics interop to OpenTK to simulate a simple galaxy on the GPU.

vexcl - VexCL is a C++ vector expression template library for OpenCL/CUDA

VexCL is a vector expression template library for OpenCL/CUDA. It was created to ease GPGPU development with C++. VexCL strives to reduce the amount of boilerplate code needed to develop GPGPU applications. The library provides convenient and intuitive notation for vector arithmetic, reduction, sparse matrix-vector products, etc. Multi-device and even multi-platform computations are supported. The source code of the library is distributed under the very permissive MIT license.


A wrapper for NVIDIA's cuBLAS (Compute Unified Basic Linear Algebra Subprograms) for the CLR.


Optix.NET is a .NET wrapper for the NVIDIA OptiX GPU ray-tracing library.

scikit-cuda - Python interface to GPU-powered libraries

scikit-cuda provides Python interfaces to many of the functions in the CUDA device/runtime, CUBLAS, CUFFT, and CUSOLVER libraries distributed as part of NVIDIA's CUDA Programming Toolkit, as well as interfaces to select functions in the CULA Dense Toolkit. Both low-level wrapper functions similar to their C counterparts and high-level functions comparable to those in NumPy and SciPy are provided. Package documentation is available online. Many of the high-level functions have examples in their docstrings. More illustrations of how to use both the wrappers and high-level functions can be found in the demos/ and tests/ subdirectories.

NyuziProcessor - GPGPU microprocessor architecture

Nyuzi is an experimental GPGPU processor hardware design focused on compute-intensive tasks. It is optimized for use cases like deep learning and image processing. This project includes a synthesizable hardware design written in SystemVerilog, an instruction set emulator, an LLVM-based C/C++ compiler, software libraries, and tests. It can be used to experiment with microarchitectural and instruction set design tradeoffs.

mshadow - Matrix Shadow: Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning

MShadow is a lightweight CPU/GPU Matrix/Tensor Template Library in C++/CUDA. The goal of mshadow is to provide an efficient, device-invariant, and simple tensor library for machine learning projects that aim for maximum performance and control while also emphasizing simplicity. MShadow also provides an interface that allows writing multi-GPU and distributed deep learning programs in an easy and unified way.

coriander - Build NVIDIA® CUDA™ code for OpenCL™ 1.2 devices

Build applications written in NVIDIA® CUDA™ code for OpenCL™ 1.2 devices. You will need at least one OpenCL-enabled GPU with appropriate OpenCL drivers installed. Both Linux and Mac systems stand a reasonable chance of working; other systems should ideally work too.

regl-cnn - Digit recognition with Convolutional Neural Networks in WebGL

GPU-accelerated handwritten digit recognition with regl. Note that this network will probably be slower than the corresponding network implemented on the CPU. This is because of the overhead associated with transferring data to and from the GPU. In the future, however, we will attempt to implement more complex networks in the browser, such as Neural Style, where we expect to see a significant speedup compared to the CPU.
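The transfer-overhead argument can be made concrete with a rough cost model. This is only a sketch: the function name and every timing below are hypothetical illustrations, not measurements from regl-cnn.

```python
def gpu_is_faster(cpu_time_ms, gpu_compute_ms, transfer_ms):
    """Return True when GPU compute plus transfer overhead beats the CPU.

    All arguments are wall-clock estimates in milliseconds (hypothetical).
    """
    return gpu_compute_ms + transfer_ms < cpu_time_ms


# Hypothetical small network: the fixed transfer cost dominates.
small_network = gpu_is_faster(cpu_time_ms=2.0, gpu_compute_ms=0.5, transfer_ms=3.0)

# Hypothetical large network: compute dominates, so the transfer cost is amortized.
large_network = gpu_is_faster(cpu_time_ms=200.0, gpu_compute_ms=20.0, transfer_ms=6.0)
```

Under these assumed numbers the small network loses on the GPU and the large one wins, which is the pattern the description anticipates for more complex networks.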

vulkan_minimal_compute - Minimal Example of Using Vulkan for Compute Operations. Only ~400LOC.

This is a simple demo that shows how to use Vulkan for compute operations only. In other words, this demo does nothing related to graphics, and only uses Vulkan to execute some computation on the GPU. For this demo, Vulkan is used to render the Mandelbrot set on the GPU. The demo is very simple, and is only ~400 LOC. The code is heavily commented, so it should be useful for people interested in learning Vulkan. The application launches a compute shader that renders the Mandelbrot set into a storage buffer. The storage buffer is then read back from the GPU and saved as a .png file. Check the source code comments for further info.
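As a CPU reference for what such a compute shader evaluates per pixel, here is a minimal sketch in Python. It is not the demo's shader code: the function names, the flat-list buffer, and the chosen region of the complex plane are assumptions made for illustration.

```python
def mandelbrot_iterations(cx, cy, max_iter=64):
    """Return the iteration at which z = z*z + c leaves |z| <= 2,
    or max_iter if it never does within the budget."""
    zx, zy = 0.0, 0.0
    for i in range(max_iter):
        if zx * zx + zy * zy > 4.0:
            return i
        # Complex square plus c, written out in real and imaginary parts.
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
    return max_iter


def render_mandelbrot(width, height, max_iter=64):
    """Fill a flat buffer with one iteration count per pixel.

    The flat list plays the role of the storage buffer; on a GPU each
    (px, py) pair would be handled by its own shader invocation.
    Requires width > 1 and height > 1.
    """
    buffer = [0] * (width * height)
    for py in range(height):
        for px in range(width):
            # Map the pixel to the region [-2, 1] x [-1.5, 1.5] (assumed).
            cx = -2.0 + 3.0 * px / (width - 1)
            cy = -1.5 + 3.0 * py / (height - 1)
            buffer[py * width + px] = mandelbrot_iterations(cx, cy, max_iter)
    return buffer
```

Because every pixel is computed independently of the others, the two loops map naturally onto a grid of parallel invocations.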

Image Resizer GPGPU

Make images smaller by resizing and resampling with high performance, scalability, and ease, using features such as GPGPU processing and distributed computing.


The FsGPU project contains a library and samples to assist general-purpose GPU programming in F# for CUDA-enabled devices.

Permutations with CUDA and OpenCL

Finding massive permutations on GPU with CUDA and OpenCL

nnabla - Neural Network Libraries

Neural Network Libraries is a deep learning framework that is intended to be used for research, development and production. We aim to have it running everywhere: desktop PCs, HPC clusters, embedded devices and production servers. The default pip package installs the CPU version of Neural Network Libraries. GPU acceleration can be added by installing the CUDA extension with pip install nnabla-ext-cuda.


A simple program that displays information about CUDA-enabled devices. The program also includes a GPU performance test.

chainer - A flexible framework of neural networks for deep learning

Chainer is a Python-based deep learning framework aiming at flexibility. It provides automatic differentiation APIs based on the define-by-run approach (a.k.a. dynamic computational graphs) as well as object-oriented high-level APIs to build and train neural networks. It also supports CUDA/cuDNN using CuPy for high-performance training and inference. For more details, see the Chainer documents and resources, and join the community on the Forum, Slack, and Twitter. The current stable version of Chainer is maintained separately as v3.
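The define-by-run idea can be sketched in a few lines of plain Python: the graph is recorded while ordinary code executes, so loops and branches shape it directly. The Var class and the power function below are hypothetical helpers written for illustration and are not Chainer's API.

```python
class Var:
    """A scalar that records how it was computed (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent_var, local_gradient_of_self_wrt_parent).
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        """Propagate gradients through the recorded graph (chain rule)."""
        order, seen = [], set()

        def visit(v):
            if id(v) in seen:
                return
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)

        visit(self)
        self.grad = 1.0
        # Reverse topological order: each node's gradient is complete
        # before it is pushed to its parents.
        for v in reversed(order):
            for parent, local in v.parents:
                parent.grad += local * v.grad


def power(x, n):
    """x ** (n + 1) built with a Python loop; the loop count shapes the graph."""
    out = x
    for _ in range(n):
        out = out * x
    return out


x = Var(2.0)
y = power(x, 2)   # records x * x * x as the loop runs
y.backward()      # y.value is 8.0 and x.grad is 12.0 (3 * x**2 at x = 2)
```

No graph is declared ahead of time here. Calling power with a different n simply records a different graph on that run.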

GPU Flame Fractal Renderer

Renderer for flam3 cosmic recursive fractal flames, implemented on the GPU. Requires a CUDA-capable graphics card.

GPUVerify: A verifier for GPU kernels

GPUVerify is a tool for verifying race- and divergence-freedom of GPU kernels written in OpenCL and CUDA.

blocksparse - Efficient GPU kernels for block-sparse matrix multiplication and convolution

The blocksparse package contains TensorFlow Ops and corresponding GPU kernels for block-sparse matrix multiplication. Also included are related ops like edge bias, sparse weight norm and layer norm. To learn more, see the launch post on the OpenAI blog.
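To illustrate what block-sparse matrix multiplication computes, here is a minimal pure-Python sketch. It does not use the blocksparse package or TensorFlow. The function name and the dictionary-of-blocks layout are assumptions chosen for clarity, and real GPU kernels organize this work very differently.

```python
def block_sparse_matmul(blocks, block_size, n_block_rows, x):
    """Compute y = W @ x where W stores only its nonzero square blocks.

    blocks maps (block_row, block_col) to a block_size x block_size
    list of lists; any block not in the dict is treated as all zeros.
    x is a dense matrix given as a list of rows.
    """
    n_cols = len(x[0])
    y = [[0.0] * n_cols for _ in range(n_block_rows * block_size)]
    for (br, bc), block in blocks.items():
        for i in range(block_size):
            row = br * block_size + i
            for k in range(block_size):
                w = block[i][k]
                x_row = x[bc * block_size + k]
                for j in range(n_cols):
                    y[row][j] += w * x_row[j]
    return y


# A 4x4 matrix with two nonzero 2x2 blocks on the block diagonal.
blocks = {
    (0, 0): [[1.0, 2.0], [3.0, 4.0]],
    (1, 1): [[1.0, 0.0], [0.0, 1.0]],
}
x = [[1.0], [1.0], [2.0], [3.0]]
y = block_sparse_matmul(blocks, block_size=2, n_block_rows=2, x=x)
```

Only the stored blocks are visited, so the work scales with the number of nonzero blocks and not with the full matrix size.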


GPCompute is an older CUDA-like GPGPU library based on DirectX 8.1 (or later) for compatibility with almost any current video card. It is developed in C/C++ and offers a simple interface for array computations. Its limitations all stem from the DirectX version it is implemented on.