The following code snippet demonstrates the general workflow of nnvm compiler.Licensed under an Apache-2.0 license.
computation-graph deep-learning optimization deployment nnvm tvm cuda opencl rocm metalThe full documentation and frequently asked questions are available on the repository wiki. An introduction to the NVIDIA Container Runtime is also covered in our blog post.
nvidia-docker docker cuda gpuChainer is a Python-based deep learning framework aiming at flexibility. It provides automatic differentiation APIs based on the define-by-run approach (a.k.a. dynamic computational graphs) as well as object-oriented high-level APIs to build and train neural networks. It also supports CUDA/cuDNN using CuPy for high performance training and inference. For more details of Chainer, see the documents and resources listed above and join the community in Forum, Slack, and Twitter. The stable version of current Chainer is separated in here: v3.
deep-learning neural-networks machine-learning gpu cuda cudnn numpy cupy chainer neural-networkCuPy is an implementation of NumPy-compatible multi-dimensional array on CUDA. CuPy consists of the core multi-dimensional array class, cupy.ndarray, and many functions on it. It supports a subset of numpy.ndarray interface. For detailed instructions on installing CuPy, see the installation guide.
cuda cudnn cublas cusolver nccl numpy cupy curand cusparseTel-Aviv Deep Learning Bootcamp is an intensive (and free!) 5-day program intended to teach you all about deep learning. It is nonprofit focused on advancing data science education and fostering entrepreneurship. The Bootcamp is a prominent venue for graduate students, researchers, and data science professionals. It offers a chance to study the essential and innovative aspects of deep learning. Participation is via a donation to the A.L.S ASSOCIATION for promoting research of the Amyotrophic Lateral Sclerosis (ALS) disease.
gpu nvidia docker-image machine-learning deep-learning data-science cuda-kernels kaggle-competition cuda pytorch pytorch-tutorials pytorch-tutorial bootcamp meetup kaggle kaggle-scripts pycudaDiscussion of the results at https://ajolicoeur.wordpress.com/cats.
deep-learning cat cuda gan pictureA realtime CPU/GPU profiler hosted in a single C file with a viewer that runs in a web browser. Windows (MSVC) - add lib/Remotery.c and lib/Remotery.h to your program. Set include directories to add Remotery/lib path. The required library ws2_32.lib should be picked up through the use of the #pragma comment(lib, "ws2_32.lib") directive in Remotery.c.
profiler gpu cpu d3d11 opengl cuda metalNOTE: For the latest stable README.md ensure you are on the main branch. Built based on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
anaconda gpu arrow machine-learning-algorithms h2o cuda pandas python-api mapd gpu-dataframe rapids cudfcuML is a suite of libraries that implement machine learning algorithms and mathematical primitives functions that share compatible APIs with other RAPIDS projects. cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the API from scikit-learn.
machine-learning gpu machine-learning-algorithms cuda nvidiaOpen3D is an open-source library that supports rapid development of software that deals with 3D data. The Open3D frontend exposes a set of carefully selected data structures and algorithms in both C++ and Python. The backend is highly optimized and is set up for parallelization. We welcome contributions from the open-source community. For more, please visit the Open3D documentation.
visualization machine-learning arm gui opengl cpp tensorflow gpu rendering computer-graphics cuda pytorch registration reconstruction 3d odometry pointcloud mesh-processing 3d-perceptionVkFFT is an efficient GPU-accelerated multidimensional Fast Fourier Transform library for Vulkan/CUDA/HIP/OpenCL projects. VkFFT aims to provide the community with an open-source alternative to Nvidia's cuFFT library while achieving better performance. VkFFT is written in C language and supports Vulkan, CUDA, HIP and OpenCL as backends. Vulkan version: Include the vkFFT.h file and glslang compiler. Provide the library with correctly chosen VKFFT_BACKEND definition (VKFFT_BACKEND=0 for Vulkan). Sample CMakeLists.txt file configures project based on Vulkan_FFT.cpp file, which contains examples on how to use VkFFT to perform FFT, iFFT and convolution calculations, use zero padding, multiple feature/batch convolutions, C2C FFTs of big systems, R2C/C2R transforms, R2R DCT-II, III and IV, double precision FFTs, half precision FFTs. For single and double precision, Vulkan 1.0 is required. For half precision, Vulkan 1.1 is required.
hpc vulkan opencl cuda convolution fft hip dct r2c r2r vulkan-fft c2rscikit-cuda provides Python interfaces to many of the functions in the CUDA device/runtime, CUBLAS, CUFFT, and CUSOLVER libraries distributed as part of NVIDIA's CUDA Programming Toolkit, as well as interfaces to select functions in the CULA Dense Toolkit. Both low-level wrapper functions similar to their C counterparts and high-level functions comparable to those in NumPy and Scipy are provided. Package documentation is available at http://scikit-cuda.readthedocs.org/. Many of the high-level functions have examples in their docstrings. More illustrations of how to use both the wrappers and high-level functions can be found in the demos/ and tests/ subdirectories.
gpu cuda blas lapack numericalData.Array.Accelerate defines an embedded language of array computations for high-performance computing in Haskell. Computations on multi-dimensional, regular arrays are expressed in the form of parameterised collective operations (such as maps, reductions, and permutations). These computations are online-compiled and executed on a range of architectures. Chapter 6 of Simon Marlow's book Parallel and Concurrent Programming in Haskell contains a tutorial introduction to Accelerate.
haskell accelerate llvm cuda parallel-computing gpu-computingArrayFire is a high performance software library for parallel computing with an easy-to-use API. Its array based function set makes parallel programming simple. ArrayFire's multiple backends (CUDA, OpenCL and native CPU) make it platform independent and highly portable. A few lines of code in ArrayFire can replace dozens of lines of parallel computing code, saving you valuable time and lowering development costs.
parallel-computing parallel cuda libraryNeanderthal is a Clojure library for fast matrix and linear algebra computations based on the highly optimized native libraries of BLAS and LAPACK computation routines for both CPU and GPU.. Read the documentation at Neanderthal Web Site.
clojure-library matrix gpu gpu-computing gpgpu opencl cuda high-performance-computing vectorization api matrix-factorization matrix-multiplication matrix-functions matrix-calculationsRustaCUDA helps you bring GPU-acceleration to your projects by providing a flexible, easy-to-use interface to the CUDA GPU computing toolkit. RustaCUDA makes it easy to manage GPU memory, transfer data to and from the GPU, and load and launch compute kernels written in any language. RustaCUDA is intended to provide a programmer-friendly library for working with the host-side CUDA Driver API. It is not intended to assist in compiling Rust code to CUDA kernels (though see rust-ptx-builder for that) or to provide device-side utilities to be used within the kernels themselves.
gpu cuda cuda-apiVexCL is a vector expression template library for OpenCL/CUDA. It has been created for ease of GPGPU development with C++. VexCL strives to reduce amount of boilerplate code needed to develop GPGPU applications. The library provides convenient and intuitive notation for vector arithmetic, reduction, sparse matrix-vector products, etc. Multi-device and even multi-platform computations are supported. The source code of the library is distributed under very permissive MIT license.
opencl cuda c-plus-plus gpgpu scientific-computing cpp11K-means implementation is based on "Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup". While it introduces some overhead and many conditional clauses which are bad for CUDA, it still shows 1.6-2x speedup against the Lloyd algorithm. K-nearest neighbors employ the same triangle inequality idea and require precalculated centroids and cluster assignments, similar to the flattened ball tree. Technically, this project is a shared library which exports two functions defined in kmcuda.h: kmeans_cuda and knn_cuda. It has built-in Python3 and R native extension support, so you can from libKMCUDA import kmeans_cuda or dyn.load("libKMCUDA.so").
cuda kmeans yinyang knn-search machine-learning afk-mc2Gunrock is a CUDA library for graph-processing designed specifically for the GPU. It uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. For more details, please visit our website, read Why Gunrock, our TOPC 2017 paper Gunrock: GPU Graph Analytics, look at our results, and find more details in our publications. See Release Notes to keep up with the our latest changes.
gunrock cuda graph-processing graph-analytics gpu graph-primitives⚠️ You must update miners to version 2.5 before April 6 due Monero PoW change. XMRig is high performance Monero (XMR) NVIDIA miner, with the official full Windows support.
monero xmr gpu-mining nvidia-miner aeon xmrig cuda electroneum sumokoin cryptonight
We have large collection of open source products. Follow the tags from
Tag Cloud >>
Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.
Add Projects.