gdrcopy - A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

A low-latency GPU memory copy library based on NVIDIA GPUDirect RDMA technology. While GPUDirect RDMA is meant for direct access to GPU memory from third-party devices, it is possible to use these same APIs to create perfectly valid CPU mappings of the GPU memory.
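
The mapping workflow is compact enough to sketch. The snippet below is a rough illustration of how GPU memory can be pinned and mapped into the CPU's address space with gdrcopy's C API (gdr_open, gdr_pin_buffer, gdr_map, gdr_copy_to_mapping); error handling and the device-pointer alignment requirements are omitted, so treat gdrapi.h in the repository as the authoritative reference.

    // Hedged sketch: create a CPU mapping of GPU memory via gdrcopy.
    // Error checks and device-pointer alignment handling are omitted;
    // see gdrapi.h for the exact API contract.
    #include <cuda_runtime.h>
    #include <gdrapi.h>

    int main() {
        const size_t size = 1 << 20;              // 1 MiB of GPU memory
        void* d_ptr = nullptr;
        cudaMalloc(&d_ptr, size);

        gdr_t g = gdr_open();                     // talks to the gdrdrv kernel module
        gdr_mh_t mh;
        gdr_pin_buffer(g, (unsigned long)d_ptr, size, 0, 0, &mh);

        void* map = nullptr;
        gdr_map(g, mh, &map, size);               // CPU-visible mapping of GPU memory

        char msg[] = "hello from the CPU";
        gdr_copy_to_mapping(mh, map, msg, sizeof(msg));  // low-latency CPU -> GPU write

        gdr_unmap(g, mh, map, size);
        gdr_unpin_buffer(g, mh);
        gdr_close(g);
        cudaFree(d_ptr);
        return 0;
    }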

https://github.com/NVIDIA/gdrcopy

Related Projects

TinyNvidiaUpdateChecker - Check for NVIDIA GPU driver updates!

  •    CSharp

The concept is simple: when launched, the application checks for new driver updates for your NVIDIA GPU, so you no longer need to waste time searching for new releases. HTML Agility Pack installs automatically when you attempt to debug the project (make sure you're running the latest version of VS2017), or you may install it manually: open the Package Manager Console and type Install-Package HtmlAgilityPack.

tensorflow-gpu-install-ubuntu-16

  •    

Before you begin, you may need to disable nouveau, the open-source NVIDIA driver that ships with Ubuntu. If the nouveau driver is still loaded, do not proceed with the installation guide; troubleshoot why it is still loaded first.

nvvl - A library that uses hardware acceleration to load sequences of video frames to facilitate machine learning training

  •    C++

NVVL (NVIDIA Video Loader) is a library to load random sequences of video frames from compressed video files to facilitate machine learning training. It uses FFmpeg's libraries to parse and read the compressed packets from video files and the video decoding hardware available on NVIDIA GPUs to off-load and accelerate the decoding of those packets, providing a ready-for-training tensor in GPU device memory. NVVL can additionally perform data augmentation while loading the frames. Frames can be scaled, cropped, and flipped horizontally using the GPU's dedicated texture mapping units. Output can be in RGB or YCbCr color space, normalized to [0, 1] or [0, 255], and in float, half, or uint8 tensors. Using compressed video files instead of individual frame image files significantly reduces the demands on the storage and I/O systems during training. Storing video datasets as video files consumes an order of magnitude less disk space, allowing larger datasets to fit in both system RAM and on local SSDs for fast access, and fewer bytes must be read from disk during loading. Fitting on smaller, faster storage and reading fewer bytes at load time alleviates the bottleneck of retrieving data from disk, which will only get worse as GPUs get faster. For the dataset used in our example project, H.264-compressed .mp4 files were nearly 40x smaller than storing frames as .png files.

coriander - Build NVIDIA® CUDA™ code for OpenCL™ 1.2 devices

  •    LLVM

Build applications written in NVIDIA® CUDA™ code for OpenCL™ 1.2 devices. Other systems should ideally work too. You will need at least one OpenCL-enabled GPU and the appropriate OpenCL drivers installed for it. Both Linux and Mac systems stand a reasonable chance of working.
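
For a concrete sense of the input, the toy kernel below is the kind of CUDA source coriander is meant to build; compiling it with the project's cocl driver instead of nvcc should target any OpenCL 1.2 device (the file name, launch configuration, and sizes here are illustrative, not taken from the repository).

    // Minimal CUDA source of the kind coriander can build for OpenCL 1.2 devices.
    // Launch configuration and sizes are illustrative only.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 16;
        float *x = nullptr, *y = nullptr;
        cudaMalloc((void**)&x, n * sizeof(float));
        cudaMalloc((void**)&y, n * sizeof(float));
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
        cudaDeviceSynchronize();
        std::printf("saxpy done\n");
        cudaFree(x);
        cudaFree(y);
        return 0;
    }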

CudaSift - A CUDA implementation of SIFT for NVidia GPUs (1.6 ms on a GTX 1060)

  •    Cuda

This is the fourth version of a SIFT (Scale Invariant Feature Transform) implementation using CUDA for GPUs from NVidia. The first version is from 2007 and GPUs have evolved since then. This version is slightly more precise and considerably faster than the previous versions and has been optimized for Kepler and later generations of GPUs. On a GTX 1060 GPU the code takes about 1.6 ms on a 1280x960 pixel image and 2.4 ms on a 1920x1080 pixel image. There is also code for brute-force matching of features that takes about 2.2 ms for two sets of around 1900 SIFT features each.


nvidia-docker - Build and run Docker containers leveraging NVIDIA GPUs

  •    Makefile

The full documentation and frequently asked questions are available on the repository wiki. An introduction to the NVIDIA Container Runtime is also covered in our blog post.

xmrig-nvidia - Monero (XMR) NVIDIA miner

  •    C++

⚠️ You must update miners to version 2.5 before April 6 due to the Monero PoW change. XMRig is a high-performance Monero (XMR) NVIDIA miner with official full Windows support.

persistent-rnn - Fast Recurrent Networks Library

  •    C++

A fast implementation of recurrent neural network layers in CUDA. For a GPU, the largest source of on-chip memory is distributed among the individual register files of thousands of threads. For example, the NVIDIA TitanX GPU has 6.3 MB of register file memory, which is enough to store a recurrent layer with approximately 1200 activations. Persistent kernels exploit this register file memory to cache recurrent weights and reuse them over multiple timesteps.
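
The ~1200-activation figure is consistent with a quick back-of-the-envelope check, assuming (my assumption, not a documented detail of the library) that the dominant cost is a square N x N recurrent weight matrix held in 32-bit floats:

    // Back-of-the-envelope: how large a square fp32 recurrent weight matrix
    // fits in ~6.3 MB of register file? N*N*4 bytes <= 6.3e6  =>  N <= sqrt(6.3e6/4).
    // The fp32/square-matrix assumptions are mine, not the library's documented layout.
    #include <cmath>
    #include <cstdio>

    int main() {
        const double register_file_bytes = 6.3e6;  // TitanX register file, per the text
        const double bytes_per_weight = 4.0;       // fp32
        const double n = std::sqrt(register_file_bytes / bytes_per_weight);
        std::printf("max activations N ~= %.0f\n", n);  // ~1255, in line with ~1200
        return 0;
    }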

nvptx - How to: Run Rust code on your NVIDIA GPU

  •    Rust

Since 2016-12-31, rustc can compile Rust code to PTX (Parallel Thread Execution) code, which is like GPU assembly, via --emit=asm and the right --target argument. This PTX code can then be loaded and executed on a GPU. However, a few days later 128-bit integer support landed in rustc and broke compilation of the core crate for NVPTX targets (LLVM assertions). Furthermore, there was no nightly release between these two events so it was not possible to use the NVPTX backend with a nightly compiler.

webdriver.sh - bash script for managing NVIDIA web drivers on macOS

  •    Shell

Bash script for managing NVIDIA's web drivers on macOS High Sierra and later with an option to set the required build number in NVDAStartupWeb.kext and NVDAEGPUSupport.kext. Installs/updates to the latest available NVIDIA web drivers for your current version of macOS.

NVIDIA PerfGraph

  •    C++

A simple, cross-platform performance monitoring application specifically designed to be used with nVidia's instrumented driver and the NVPerfSDK to give a graphical representation of internal GPU counters. Support for non-GPU counters is also available.

marvin - Marvin: A Minimalist GPU-only N-Dimensional ConvNets Framework

  •    C++

Marvin is a GPU-only neural network framework made with simplicity, hackability, speed, memory consumption, and high dimensional data in mind. Download CUDA 7.5 and cuDNN 5.1. You will need to register with NVIDIA. Below are some additional steps to set up cuDNN 5.1. NOTE We highly recommend that you install different versions of cuDNN to different directories (e.g., /usr/local/cudnn/vXX) because different software packages may require different versions.

jetson-inference - Guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson

  •    C++

Welcome to our training guide for the inference and deep vision runtime library for NVIDIA DIGITS and Jetson Xavier/TX1/TX2. This repo uses NVIDIA TensorRT to deploy neural networks efficiently onto the embedded platform, improving performance and power efficiency using graph optimizations, kernel fusion, and half-precision FP16 on the Jetson.
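
For orientation only, enabling FP16 when building an engine with TensorRT's C++ builder looks roughly like the sketch below. It is written against an older TensorRT API generation and elides network population entirely, so take it as the shape of the workflow rather than this repository's actual code; the builder calls have changed across TensorRT releases.

    // Rough sketch: build an FP16 TensorRT engine (older TensorRT C++ API; exact
    // calls vary by version, and populating the network via a parser is elided).
    #include <NvInfer.h>
    #include <iostream>

    class Logger : public nvinfer1::ILogger {
        void log(Severity severity, const char* msg) override {
            if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
        }
    } gLogger;

    int main() {
        nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
        nvinfer1::INetworkDefinition* network = builder->createNetwork();
        // ... populate `network` here, e.g. via the Caffe or ONNX parser ...

        builder->setMaxBatchSize(1);
        builder->setMaxWorkspaceSize(16 << 20);  // 16 MiB of scratch space
        builder->setFp16Mode(true);              // request half-precision kernels

        nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);
        nvinfer1::IExecutionContext* context = engine->createExecutionContext();
        // context->execute(batchSize, bindings) would run inference here.

        context->destroy();
        engine->destroy();
        network->destroy();
        builder->destroy();
        return 0;
    }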

wdbgark - WinDBG Anti-RootKit Extension

  •    C++

WDBGARK is an extension (dynamic library) for the Microsoft Debugging Tools for Windows. Its main purpose is to view and analyze anomalies in the Windows kernel using a kernel debugger. It can display various system callbacks, system tables, object types, and so on. For a more user-friendly view, the extension uses DML. Most commands require a kernel-mode connection. Feel free to use the extension with live kernel-mode debugging or with kernel-mode crash dump analysis (some commands will not work). Public symbols are required, so use them, force a reload, ignore checksum problems, and prepare them before analysis, and you'll be happy. Windows BETA/RC is supported by design, but read a few notes. First, I don't care about checked builds. Second, I don't care if you don't have symbols (public or private). IA64/ARM is unsupported (and will not be).

gnome-shell-system-monitor-applet - Display system information in the GNOME Shell status bar, such as memory usage, CPU usage, network rates…

  •    Javascript

This extension requires GNOME Shell v3.4 or later. Please see the alternate branches gnome-3.0 and gnome-3.2 if you are using an older version of GNOME Shell (check with gnome-shell --version). Additionally, if you have an Nvidia graphics card, and want to monitor its memory usage, you'll need to install nvidia-smi.

mc-cnn - Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches

  •    Cuda

An NVIDIA GPU with at least 6 GB of memory is required to run on the KITTI data set and 12 GB to run on the Middlebury data set. We tested the code on GTX Titan (KITTI only), K80, and GTX Titan X. The code is released under the BSD 2-Clause license. Please cite our paper if you use code from this repository in your work. Install Torch, OpenCV 2.4, and png++.

gloo - Collective communications library with various primitives for multi-machine training.

  •    C++

Gloo is a collective communications library. It comes with a number of collective algorithms useful for machine learning applications. These include a barrier, broadcast, and allreduce. Transport of data between participating machines is abstracted so that IP can be used at all times, or InfiniBand (or RoCE) when available. In the latter case, if the InfiniBand transport is used, GPUDirect can be used to accelerate cross-machine GPU-to-GPU memory transfers.
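
For a feel of the C++ API, a minimal ring allreduce over the TCP transport looks roughly like the sketch below, modeled on the pattern used in gloo's benchmarks; the header paths, FileStore rendezvous, interface name, and rank/size values are assumptions to check against the current tree.

    // Rough sketch of a gloo ring allreduce over TCP. Rank, world size, the
    // network interface, and the rendezvous path are placeholders.
    #include <gloo/allreduce_ring.h>
    #include <gloo/rendezvous/context.h>
    #include <gloo/rendezvous/file_store.h>
    #include <gloo/transport/tcp/device.h>
    #include <cstdio>
    #include <memory>
    #include <vector>

    int main() {
        const int rank = 0;  // placeholder: this process's rank
        const int size = 2;  // placeholder: number of participating processes

        gloo::transport::tcp::attr attr;
        attr.iface = "eth0";  // placeholder network interface
        auto device = gloo::transport::tcp::CreateDevice(attr);

        auto context = std::make_shared<gloo::rendezvous::Context>(rank, size);
        gloo::rendezvous::FileStore store("/tmp/gloo");  // shared-filesystem rendezvous
        context->connectFullMesh(store, device);

        std::vector<float> data(4, static_cast<float>(rank));
        gloo::AllreduceRing<float> allreduce(context, {data.data()},
                                             static_cast<int>(data.size()));
        allreduce.run();  // data now holds the elementwise sum across all ranks

        std::printf("data[0] = %f\n", data[0]);
        return 0;
    }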

jetson-reinforcement - Deep reinforcement learning GPU libraries for NVIDIA Jetson with PyTorch, OpenAI Gym, and Gazebo robotics simulator

  •    C++

In this tutorial, we'll be creating artificially intelligent agents that learn from interacting with their environment, gathering experience, and a system of rewards with deep reinforcement learning (deep RL). Using end-to-end neural networks that translate raw pixels into actions, RL-trained agents are capable of exhibiting intuitive behaviors and performing complex tasks. Ultimately, our aim will be to train reinforcement learning agents from virtual robotic simulation in 3D and transfer the agent to a real-world robot. Reinforcement learners choose the best action for the agent to perform based on environmental state (like camera inputs) and rewards that provide feedback to the agent about its performance. Reinforcement learning can learn to behave optimally in its environment given a policy, or task - like obtaining the reward.

TV-Out for NVidia cards

  •    C

This is a tool to enable TV-Out on Linux for NVidia cards. It does not require kernel support and works with multiple TV encoder chips. You may use all the features of the chip, down to direct register access, and all the resolutions and sizes the chip supports.

nvidia-update - Install nVidia drivers on macOS the easy way.

  •    Shell

The simplest way to install nVidia drivers on macOS. This script installs the best (not necessarily the latest) official nVidia web drivers for your system.