managedCUDA


managedCUDA makes the CUDA Driver API available in .NET applications written in C#, Visual Basic, or any other .NET language. It also includes classes for easy handling of and interop with CUDA, for example built-in CUDA types like float3.

http://managedcuda.codeplex.com/





Related Projects

ManagedCuda Galaxy Simulator


This project is a test of ManagedCuda and graphics interop to OpenTK to simulate a simple galaxy on the GPU.

vexcl - VexCL is a C++ vector expression template library for OpenCL/CUDA


VexCL is a vector expression template library for OpenCL/CUDA. It was created to ease GPGPU development with C++. VexCL strives to reduce the amount of boilerplate code needed to develop GPGPU applications. The library provides convenient and intuitive notation for vector arithmetic, reduction, sparse matrix-vector products, etc. Multi-device and even multi-platform computations are supported. The source code of the library is distributed under the very permissive MIT license.

CuBLAS.Net


A wrapper for NVidia's CuBLAS (Compute Unified Basic Linear Algebra Subprograms) for the CLR.

Optix.NET


Optix.NET is a .NET wrapper for the Nvidia Optix GPU ray-tracing library.

scikit-cuda - Python interface to GPU-powered libraries


scikit-cuda provides Python interfaces to many of the functions in the CUDA device/runtime, CUBLAS, CUFFT, and CUSOLVER libraries distributed as part of NVIDIA's CUDA Programming Toolkit, as well as interfaces to select functions in the CULA Dense Toolkit. Both low-level wrapper functions similar to their C counterparts and high-level functions comparable to those in NumPy and Scipy are provided. Package documentation is available at http://scikit-cuda.readthedocs.org/. Many of the high-level functions have examples in their docstrings. More illustrations of how to use both the wrappers and high-level functions can be found in the demos/ and tests/ subdirectories.


NyuziProcessor - GPGPU microprocessor architecture


Nyuzi is an experimental GPGPU processor hardware design focused on compute intensive tasks. It is optimized for use cases like blockchain mining, deep learning, and autonomous driving. This project includes a synthesizable hardware design written in System Verilog, an instruction set emulator, an LLVM based C/C++ compiler, software libraries, and tests. It can be used to experiment with microarchitectural and instruction set design tradeoffs.

mshadow - Matrix Shadow:Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning


MShadow is a lightweight CPU/GPU Matrix/Tensor Template Library in C++/CUDA. The goal of mshadow is to provide an efficient, device-invariant, and simple tensor library for machine learning projects that aim for maximum performance and control while also emphasizing simplicity. MShadow also provides an interface that allows writing multi-GPU and distributed deep learning programs in an easy and unified way.

coriander - Build NVIDIA® CUDA™ code for OpenCL™ 1.2 devices


Build applications written in NVIDIA® CUDA™ code for OpenCL™ 1.2 devices. Other systems should work too, ideally. You will need at least one OpenCL-enabled GPU, with appropriate OpenCL drivers installed for it. Both Linux and Mac systems stand a reasonable chance of working OK.

regl-cnn - Digit recognition with Convolutional Neural Networks in WebGL


GPU accelerated handwritten digit recognition with regl. Note that this network will probably be slower than the corresponding network implemented on the CPU. This is because of the overhead associated with transferring data to and from the GPU. But in the future we will attempt implementing more complex networks in the browser, such as Neural Style, and then we think that we will see a significant speedup compared to the CPU.

vulkan_minimal_compute - Minimal Example of Using Vulkan for Compute Operations. Only ~400LOC.


This is a simple demo that shows how to use Vulkan for compute operations only. In other words, this demo does nothing related to graphics, and only uses Vulkan to execute some computation on the GPU. For this demo, Vulkan is used to render the Mandelbrot set on the GPU. The demo is very simple, and is only ~400LOC. The code is heavily commented, so it should be useful for people interested in learning Vulkan. The application launches a compute shader that renders the Mandelbrot set into a storage buffer. The storage buffer is then read back from the GPU and saved as a .png. Check the source code comments for further info.
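The per-pixel work described above can be sketched on the CPU. This is a minimal illustration in Python, not the demo's Vulkan or shader code: the function name `mandelbrot_buffer`, the mapped region of the complex plane, and the iteration limit are all assumptions chosen for the example. The flat list stands in for the storage buffer, and each loop iteration corresponds to what a single shader invocation would compute.

```python
def mandelbrot_buffer(width, height, max_iter=50):
    """Return a flat, row-major list of escape counts, one per pixel.

    Hypothetical helper for illustration; the flat list plays the role
    of the GPU storage buffer mentioned in the description.
    """
    buffer = [0] * (width * height)
    for y in range(height):
        for x in range(width):
            # Map the pixel to the region [-2, 1] x [-1.5, 1.5] (assumed).
            c = complex(-2.0 + 3.0 * x / width, -1.5 + 3.0 * y / height)
            z = 0j
            n = 0
            # Iterate z -> z*z + c until it escapes or the limit is hit.
            while n < max_iter and abs(z) <= 2.0:
                z = z * z + c
                n += 1
            buffer[y * width + x] = n  # same indexing a shader would use
    return buffer


pixels = mandelbrot_buffer(32, 32)
```

Because every pixel is computed independently of the others, the two loops can be replaced by one GPU thread per pixel, which is what makes the problem a natural fit for a compute shader.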

Image Resizer GPGPU


Make images smaller, resizing and resampling with incredible performance, scalability and ease with features such as GPGPU processing and distributed computing.

FsGPU


FsGPU project contains library and samples to assist general purpose GPU programming in F# for CUDA enabled devices.

Permutations with CUDA and OpenCL


Finding massive permutations on GPU with CUDA and OpenCL

nnabla - Neural Network Libraries


Neural Network Libraries is a deep learning framework that is intended to be used for research, development and production. We aim to have it running everywhere: desktop PCs, HPC clusters, embedded devices and production servers. This installs the CPU version of Neural Network Libraries. GPU acceleration can be added by installing the CUDA extension with pip install nnabla-ext-cuda.

cuda-z


Simple program that displays information about CUDA-enabled devices. The program is equipped with GPU performance test.

chainer - A flexible framework of neural networks for deep learning


Chainer is a Python-based deep learning framework aiming at flexibility. It provides automatic differentiation APIs based on the define-by-run approach (a.k.a. dynamic computational graphs) as well as object-oriented high-level APIs to build and train neural networks. It also supports CUDA/cuDNN using CuPy for high performance training and inference. For more details of Chainer, see the documents and resources listed above and join the community in Forum, Slack, and Twitter. The current stable version of Chainer is maintained separately here: v3.
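The define-by-run idea can be illustrated with a tiny scalar autodiff sketch. This does not use Chainer and is not its API: the `Var` class and its methods are hypothetical names made up for this example. The point it shows is that the graph is recorded while ordinary Python code runs, so control flow such as an `if` decides which operations end up in the graph.

```python
class Var:
    """A scalar that records how it was computed (hypothetical class)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # pairs of (parent Var, local derivative)

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Collect nodes in topological order, then apply the chain rule
        # from the output back to the inputs.
        order, seen = [], set()

        def visit(v):
            if id(v) in seen:
                return
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for parent, local in v.parents:
                parent.grad += local * v.grad


x, w, b = Var(3.0), Var(2.0), Var(1.0)
y = w * x + b
if y.value > 0:  # plain Python control flow shapes the recorded graph
    y = y * y
y.backward()
```

Here the squaring step is only part of the graph because the condition held at run time; a define-and-run framework would need that branch to be expressed in the graph ahead of time.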

GPU Flame Fractal Renderer


Renderer for flam3 cosmic recursive fractal flames implemented on GPU. Requires a CUDA-capable graphics card.

GPUVerify: A verifier for GPU kernels


GPUVerify is a tool for verifying race- and divergence-freedom of GPU kernels written in OpenCL and CUDA.

blocksparse - Efficient GPU kernels for block-sparse matrix multiplication and convolution


The blocksparse package contains TensorFlow Ops and corresponding GPU kernels for block-sparse matrix multiplication. Also included are related ops like edge bias, sparse weight norm and layer norm. To learn more, see the launch post on the OpenAI blog.
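Block-sparse matrix multiplication can be sketched in plain Python to show the general technique. This is not the blocksparse package's API or its GPU kernels: the function `block_sparse_matmul`, the `layout` mask, and the dictionary of dense blocks are assumptions made for this illustration. The weight matrix is stored only as its nonzero blocks, and blocks marked zero in the layout are skipped entirely.

```python
def block_sparse_matmul(x, blocks, layout, block_size):
    """Compute y = x @ W, where W is given as dense nonzero blocks.

    x:      list of rows, each of length len(layout) * block_size
    layout: layout[i][j] == 1 if block (i, j) of W is nonzero
    blocks: dict mapping (i, j) to a block_size x block_size list of lists
    (All names here are hypothetical, chosen for the example.)
    """
    n = len(x)
    out_cols = len(layout[0]) * block_size
    y = [[0.0] * out_cols for _ in range(n)]
    for i, row in enumerate(layout):
        for j, present in enumerate(row):
            if not present:
                continue  # zero blocks cost no work and no storage
            blk = blocks[(i, j)]
            for r in range(n):
                for a in range(block_size):
                    xv = x[r][i * block_size + a]
                    for c in range(block_size):
                        y[r][j * block_size + c] += xv * blk[a][c]
    return y


layout = [[1, 0],
          [0, 1]]
blocks = {
    (0, 0): [[1.0, 2.0], [3.0, 4.0]],
    (1, 1): [[5.0, 6.0], [7.0, 8.0]],
}
x = [[1.0, 1.0, 1.0, 1.0]]
y = block_sparse_matmul(x, blocks, layout, 2)
```

With this layout only two of the four blocks are touched, so the work scales with the number of nonzero blocks rather than with the full size of the weight matrix.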

GPCompute


GPCompute is an older CUDA-like library, but based on DX81 (or later) for compatibility with almost any current video card. It is developed in C/C++ and offers a simple interface for array computations. Its limitations all come from the DX version it is implemented on.