CudaSift - A CUDA implementation of SIFT for NVidia GPUs (1.6 ms on a GTX 1060)

This is the fourth version of a SIFT (Scale Invariant Feature Transform) implementation using CUDA for GPUs from NVidia. The first version is from 2007 and GPUs have evolved since then. This version is slightly more precise and considerably faster than the previous versions and has been optimized for Kepler and later generations of GPUs. On a GTX 1060 GPU the code takes about 1.6 ms on a 1280x960 pixel image and 2.4 ms on a 1920x1080 pixel image. There is also code for brute-force matching of features that takes about 2.2 ms for two sets of around 1900 SIFT features each.
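The brute-force matching mentioned above compares every feature in one set against every feature in the other and keeps the best candidate. The following is a minimal CPU sketch of that idea in plain Python, not CudaSift's GPU kernel: the function names are hypothetical, and the use of cosine similarity between normalized descriptors is an assumption made for illustration.

```python
import math


def normalize(vec):
    """Scale a descriptor to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def brute_force_match(set_a, set_b):
    """For each descriptor in set_a, return (best index in set_b, similarity).

    Every pair is compared, so the cost is len(set_a) * len(set_b)
    dot products. Similarity is the cosine of the angle between descriptors.
    """
    b_unit = [normalize(d) for d in set_b]
    matches = []
    for desc in set_a:
        a_unit = normalize(desc)
        best_index, best_score = -1, -1.0
        for j, candidate in enumerate(b_unit):
            score = sum(x * y for x, y in zip(a_unit, candidate))
            if score > best_score:
                best_index, best_score = j, score
        matches.append((best_index, best_score))
    return matches
```

With two sets of around 1900 features each, this loop performs roughly 3.6 million descriptor comparisons, which is why the work maps well onto a GPU.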



Related Projects

coriander - Build NVIDIA® CUDA™ code for OpenCL™ 1.2 devices

  •    LLVM

Build applications written in NVIDIA® CUDA™ code for OpenCL™ 1.2 devices. Other systems should work too, ideally. You will need at least one OpenCL-enabled GPU, with appropriate OpenCL drivers installed for it. Both Linux and Mac systems stand a reasonable chance of working.

nvidia-docker - Build and run Docker containers leveraging NVIDIA GPUs

  •    Makefile

The full documentation and frequently asked questions are available on the repository wiki. An introduction to the NVIDIA Container Runtime is also covered in our blog post.

xmrig-nvidia - Monero (XMR) NVIDIA miner

  •    C++

⚠️ You must update miners to version 2.5 before April 6 due to the Monero PoW change. XMRig is a high-performance Monero (XMR) NVIDIA miner with official full Windows support.

kmcuda - Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

  •    Jupyter

K-means implementation is based on "Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup". While it introduces some overhead and many conditional clauses which are bad for CUDA, it still shows 1.6-2x speedup against the Lloyd algorithm. K-nearest neighbors employ the same triangle inequality idea and require precalculated centroids and cluster assignments, similar to the flattened ball tree. Technically, this project is a shared library which exports two functions defined in kmcuda.h: kmeans_cuda and knn_cuda. It has built-in Python3 and R native extension support, so you can from libKMCUDA import kmeans_cuda or dyn.load("").
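The triangle inequality idea mentioned above can be sketched on the CPU as follows. This is not kmcuda's code and not the full Yinyang algorithm, which maintains group-level bounds across iterations; it is the simpler single-pass bound, shown under that assumption. The function names are hypothetical. The key fact is that if the distance between the current best centroid and another centroid is at least twice the distance from the point to the current best, the other centroid cannot be closer, so its distance need not be computed.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points, centroids):
    """Assign each point to its nearest centroid, skipping provably worse ones.

    Returns (assignments, skipped), where skipped counts the point-to-centroid
    distance computations avoided by the triangle inequality.
    """
    k = len(centroids)
    # Centroid-to-centroid distances are computed once and reused for all points.
    cc = [[dist(centroids[i], centroids[j]) for j in range(k)] for i in range(k)]
    assignments, skipped = [], 0
    for p in points:
        best, best_d = 0, dist(p, centroids[0])
        for j in range(1, k):
            # d(p, c_j) >= d(c_best, c_j) - d(p, c_best), so if
            # d(c_best, c_j) >= 2 * d(p, c_best) then c_j cannot be closer.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = dist(p, centroids[j])
            if d < best_d:
                best, best_d = j, d
        assignments.append(best)
    return assignments, skipped
```

The extra branch per centroid is the kind of conditional clause the description calls bad for CUDA; the saving comes from the distance computations that are never performed.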

Deep-Learning-Boot-Camp - A community run, 5-day PyTorch Deep Learning Bootcamp

  •    Jupyter

Tel-Aviv Deep Learning Bootcamp is an intensive (and free!) 5-day program intended to teach you all about deep learning. It is a nonprofit focused on advancing data science education and fostering entrepreneurship. The Bootcamp is a prominent venue for graduate students, researchers, and data science professionals. It offers a chance to study the essential and innovative aspects of deep learning. Participation is via a donation to the A.L.S ASSOCIATION for promoting research of the Amyotrophic Lateral Sclerosis (ALS) disease.

persistent-rnn - Fast Recurrent Networks Library

  •    C++

A fast implementation of recurrent neural network layers in CUDA. For a GPU, the largest source of on-chip memory is distributed among the individual register files of thousands of threads. For example, the NVIDIA TitanX GPU has 6.3 MB of register file memory, which is enough to store a recurrent layer with approximately 1200 activations. Persistent kernels exploit this register file memory to cache recurrent weights and reuse them over multiple timesteps.
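The figure of roughly 1200 activations can be checked with a short calculation. The sketch below assumes a square recurrent weight matrix stored as 32-bit floats and treats 6.3 MB as 6,300,000 bytes; both are assumptions for illustration, and the names are hypothetical.

```python
import math

REGISTER_FILE_BYTES = 6_300_000  # 6.3 MB from the description (decimal MB assumed)
BYTES_PER_WEIGHT = 4             # assumption: weights stored as 32-bit floats


def recurrent_weight_bytes(n, bytes_per_weight=BYTES_PER_WEIGHT):
    """Bytes needed for an n x n recurrent weight matrix."""
    return n * n * bytes_per_weight


def max_layer_size(budget_bytes, bytes_per_weight=BYTES_PER_WEIGHT):
    """Largest n whose n x n weight matrix fits in budget_bytes."""
    return math.isqrt(budget_bytes // bytes_per_weight)


if __name__ == "__main__":
    print(recurrent_weight_bytes(1200))         # 5760000 bytes, i.e. 5.76 MB
    print(max_layer_size(REGISTER_FILE_BYTES))  # 1254
```

Under these assumptions a 1200 x 1200 matrix needs 5.76 MB, which leaves some registers free for the activations and intermediate values that each thread must also hold.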

marvin - Marvin: A Minimalist GPU-only N-Dimensional ConvNets Framework

  •    C++

Marvin is a GPU-only neural network framework made with simplicity, hackability, speed, memory consumption, and high dimensional data in mind. Download CUDA 7.5 and cuDNN 5.1. You will need to register with NVIDIA. Below are some additional steps to set up cuDNN 5.1. NOTE: We highly recommend that you install different versions of cuDNN to different directories (e.g., /usr/local/cudnn/vXX) because different software packages may require different versions.

DetectAndTrack - The implementation of an algorithm presented in the CVPR18 paper: "Detect-and-Track: Efficient Pose Estimation in Videos"

  •    Python

R. Girdhar, G. Gkioxari, L. Torresani, M. Paluri and D. Tran. Detect-and-Track: Efficient Pose Estimation in Videos. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018. This code was developed and tested on NVIDIA P100 (16GB), M40 (12GB) and 1080Ti (11GB) GPUs. Training requires at least 4 GPUs for most configurations, and some were trained with 8 GPUs. It might be possible to train on a single GPU by scaling down the learning rate and scaling up the iteration schedule, but we have not tested all possible setups. Testing can be done on a single GPU. Unfortunately it is currently not possible to run this on a CPU as some ops do not have CPU implementations.

gunrock - High-Performance Graph Primitives on GPUs

  •    Cuda

Gunrock is a CUDA library for graph processing designed specifically for the GPU. It uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. For more details, please visit our website, read Why Gunrock, our TOPC 2017 paper Gunrock: GPU Graph Analytics, look at our results, and find more details in our publications. See Release Notes to keep up with our latest changes.
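To make the frontier-centric, bulk-synchronous idea concrete, here is a minimal sequential sketch of breadth-first search written as repeated operations on a vertex frontier. It does not use Gunrock's API; the function name and the "advance" and "filter" labels in the comments are chosen for this illustration, and on a GPU each step would run in parallel over the whole frontier.

```python
def bfs_frontier(adjacency, source):
    """Breadth-first search expressed as operations on a vertex frontier.

    adjacency maps each vertex to a list of its neighbors.
    Returns a dict mapping each reachable vertex to its depth from source.
    """
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # Advance: gather the neighbors of every vertex in the current frontier.
        candidates = [v for u in frontier for v in adjacency.get(u, [])]
        # Filter: keep each unvisited vertex once; it joins the next frontier.
        next_frontier = []
        for v in candidates:
            if v not in depth:
                depth[v] = level
                next_frontier.append(v)
        # Bulk-synchronous step: the whole frontier is replaced at once.
        frontier = next_frontier
    return depth
```

Other graph primitives fit the same loop by changing what is computed during the advance and which vertices survive the filter.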

TinyNvidiaUpdateChecker - Check for NVIDIA GPU driver updates!

  •    CSharp

This application has a simple concept: when launched, it checks for new driver updates for your NVIDIA GPU. With this you no longer need to waste time searching for new releases. HTML Agility Pack will automatically install when attempting to debug the project (make sure you're running the latest version of VS2017), or you may install it manually: open your Package Manager Console and type Install-Package HtmlAgilityPack.

OpenVIDIA : Parallel GPU Computer Vision

  •    C

OpenVIDIA projects implement computer vision algorithms running on graphics hardware such as single or multiple graphics processing units (GPUs) using OpenGL, Cg and CUDA-C. Some samples will soon support the OpenCL and Direct Compute APIs as well.

jetson-inference - Guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson

  •    C++

Welcome to our training guide for the inference and deep vision runtime library for NVIDIA DIGITS and Jetson Xavier/TX1/TX2. This repo uses NVIDIA TensorRT to deploy neural networks efficiently onto the embedded platform, improving performance and power efficiency using graph optimizations, kernel fusion, and half-precision FP16 on the Jetson.

nvptx - How to: Run Rust code on your NVIDIA GPU

  •    Rust

Since 2016-12-31, rustc can compile Rust code to PTX (Parallel Thread Execution) code, which is like GPU assembly, via --emit=asm and the right --target argument. This PTX code can then be loaded and executed on a GPU. However, a few days later 128-bit integer support landed in rustc and broke compilation of the core crate for NVPTX targets (LLVM assertions). Furthermore, there was no nightly release between these two events so it was not possible to use the NVPTX backend with a nightly compiler.

scikit-cuda - Python interface to GPU-powered libraries

  •    Python

scikit-cuda provides Python interfaces to many of the functions in the CUDA device/runtime, CUBLAS, CUFFT, and CUSOLVER libraries distributed as part of NVIDIA's CUDA Programming Toolkit, as well as interfaces to select functions in the CULA Dense Toolkit. Both low-level wrapper functions similar to their C counterparts and high-level functions comparable to those in NumPy and SciPy are provided. Package documentation is available online. Many of the high-level functions have examples in their docstrings. More illustrations of how to use both the wrappers and high-level functions can be found in the demos/ and tests/ subdirectories.

node-cuda - NVIDIA CUDA™ bindings for Node.js

  •    C++

NVIDIA CUDA™ bindings for Node.js



ttgLib is a C++ library for creating parallel, resource-intensive programs for hybrid architectures such as CPU+GPU. The library provides the ttg::pipeline parallel primitive, which distributes load across different computing APIs such as OpenMP, Intel TBB, NVIDIA CUDA and OpenCL.


  •    DotNet

Optix.NET is a .NET wrapper for the NVIDIA OptiX GPU ray-tracing library.

excavator - NiceHash's proprietary low-level CUDA miner

  •    HTML

Excavator is a GPU miner by NiceHash for mining various altcoins. It is being actively developed by djeZo, dropky and voidstar. The miner uses a custom-built code base with a modern approach and supports modern NVIDIA video cards. First, make sure you have the Visual C++ 2017 redistributable (x64) installed.

mc-cnn - Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches

  •    Cuda

An NVIDIA GPU with at least 6 GB of memory is required to run on the KITTI data set and 12 GB to run on the Middlebury data set. We tested the code on GTX Titan (KITTI only), K80, and GTX Titan X. The code is released under the BSD 2-Clause license. Please cite our paper if you use code from this repository in your work. Install Torch, OpenCV 2.4, and png++.
