Nyuzi is an experimental GPGPU processor hardware design focused on compute-intensive tasks. It is optimized for use cases such as blockchain mining, deep learning, and autonomous driving. This project includes a synthesizable hardware design written in SystemVerilog, an instruction set emulator, an LLVM-based C/C++ compiler, software libraries, and tests. It can be used to experiment with microarchitectural and instruction set design tradeoffs.
fpga gpu-computing gpu verilog hardware microprocessor graphics processor-architecture
CatBoost is a machine learning method based on gradient boosting over decision trees. All CatBoost documentation is available here.
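The following is a minimal sketch of "gradient boosting over decision trees" in general, for squared-error regression. It is written in plain Python and uses depth-1 trees (stumps) as the weak learners. It is not CatBoost's algorithm or API: CatBoost builds deeper trees and adds its own techniques, for example for categorical features. The names `fit_stump`, `fit_gbdt` and `predict_gbdt` are hypothetical.

```python
from typing import List, Tuple

# (feature index, threshold, value if row[j] <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Pick the single split that minimises the squared error on the residuals."""
    best = None
    best_err = float("inf")
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a split must send rows both ways
            lv = sum(left) / len(left)
            rv = sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best_err, best = err, (j, threshold, lv, rv)
    if best is None:  # no valid split: fall back to a constant leaf
        mean = sum(residuals) / len(residuals)
        best = (0, float("inf"), mean, mean)
    return best


def predict_stump(stump: Stump, row: List[float]) -> float:
    j, threshold, lv, rv = stump
    return lv if row[j] <= threshold else rv


def fit_gbdt(X, y, n_trees=50, learning_rate=0.1):
    """Squared-loss boosting: each stump fits the residuals y - F, which are the
    negative gradient of 0.5 * (y - F)**2 with respect to the prediction F."""
    base = sum(y) / len(y)  # constant prediction that minimises squared loss
    preds = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        trees.append(stump)
        # shrink each tree's contribution by the learning rate
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, trees


def predict_gbdt(model, row: List[float]) -> float:
    base, learning_rate, trees = model
    return base + learning_rate * sum(predict_stump(t, row) for t in trees)


if __name__ == "__main__":
    X = [[float(i)] for i in range(10)]
    y = [0.0] * 5 + [1.0] * 5
    model = fit_gbdt(X, y, n_trees=100, learning_rate=0.3)
    print(predict_gbdt(model, [2.0]), predict_gbdt(model, [7.0]))
```

Each round adds a tree that corrects what the ensemble so far gets wrong. A smaller `learning_rate` needs more trees but usually generalises better.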
machine-learning decision-trees gradient-boosting gbm gbdt r kaggle gpu-computing catboost tutorial categorical-features distributed gpu coreml opensource data-science big-data
⚠ Please note that while Emu 0.2.0 is quite usable, it suffers from two key issues. First, it does nothing to minimize CPU-GPU data transfer. Second, its compiler is not well tested. These may be reasons not to use Emu 0.2.0. However, a new version of Emu is in the works, with significant improvements to the language, compiler, and compile-time checker. This new version should be released sometime in Q4 2019. But unlike OpenCL/CUDA/Halide/Futhark, Emu is embedded in Rust. This lets it take advantage of the ecosystem in ways...
emu gpu gpgpu gpu-computing gpu-acceleration gpu-programming
Lingvo is a framework for building neural networks in TensorFlow, particularly sequence models. A list of publications using Lingvo can be found here.
nlp research translation tensorflow machine-translation speech distributed tts speech-synthesis mnist speech-recognition lm seq2seq speech-to-text gpu-computing language-model asr
Data.Array.Accelerate defines an embedded language of array computations for high-performance computing in Haskell. Computations on multi-dimensional, regular arrays are expressed as parameterised collective operations (such as maps, reductions, and permutations). These computations are compiled online and executed on a range of architectures. Chapter 6 of Simon Marlow's book Parallel and Concurrent Programming in Haskell contains a tutorial introduction to Accelerate.
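The collective operations named above can be illustrated with a small sequential Python model of their semantics. This is only a sketch under simplifying assumptions. Accelerate itself is a Haskell library whose combinators (such as `map`, `zipWith`, `fold` and `permute`) build `Acc` array computations that are then compiled for the target architecture. The helper names `acc_map`, `acc_zip_with`, `acc_fold` and `acc_permute` are hypothetical, and the arrays here are one-dimensional Python lists.

```python
from functools import reduce
from typing import Callable, List, Optional, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def acc_map(f: Callable[[A], B], xs: Sequence[A]) -> List[B]:
    """Apply f independently to every element (embarrassingly parallel)."""
    return [f(x) for x in xs]


def acc_zip_with(f: Callable[[A, B], C], xs: Sequence[A], ys: Sequence[B]) -> List[C]:
    """Combine two arrays elementwise; this model truncates to the shorter one."""
    return [f(x, y) for x, y in zip(xs, ys)]


def acc_fold(f: Callable[[A, A], A], z: A, xs: Sequence[A]) -> A:
    """Reduce to a single value. A parallel backend may regroup the work,
    so f should be associative; this model simply folds left to right."""
    return reduce(f, xs, z)


def acc_permute(combine: Callable[[A, A], A], defaults: Sequence[A],
                index_of: Callable[[int], Optional[int]], xs: Sequence[A]) -> List[A]:
    """Scatter: element i of xs is merged into position index_of(i) of a copy of
    defaults using combine; None drops the element. On a parallel backend the
    order in which colliding elements are combined is not fixed."""
    out = list(defaults)
    for i, x in enumerate(xs):
        j = index_of(i)
        if j is not None:
            out[j] = combine(x, out[j])
    return out


def dot(xs: Sequence[float], ys: Sequence[float]) -> float:
    # zipWith (*) followed by fold (+) 0
    return acc_fold(lambda a, b: a + b, 0, acc_zip_with(lambda x, y: x * y, xs, ys))


def histogram(values: Sequence[int], n_bins: int) -> List[int]:
    # permute (+) over an array of ones, using each value as its target bin
    ones = acc_map(lambda _: 1, values)
    return acc_permute(lambda a, b: a + b, [0] * n_bins, lambda i: values[i], ones)


if __name__ == "__main__":
    print(acc_map(lambda x: x * x, [1, 2, 3]))  # [1, 4, 9]
    print(dot([1, 2, 3], [4, 5, 6]))           # 32
    print(histogram([0, 1, 1, 3], 4))          # [1, 2, 0, 1]
```

The dot product (a zip followed by a reduction) and the histogram (a scatter with a combining function) are typical of the collective style. Neither example loops over indices explicitly, which is what leaves a backend free to parallelise them.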
haskell accelerate llvm cuda parallel-computing gpu-computing
Neanderthal is a Clojure library for fast matrix and linear algebra computations based on the highly optimized native BLAS and LAPACK routines for both CPU and GPU. Read the documentation on the Neanderthal website.
clojure-library matrix gpu gpu-computing gpgpu opencl cuda high-performance-computing vectorization api matrix-factorization matrix-multiplication matrix-functions matrix-calculations
This library provides high-performance components that leverage TensorFlow's hardware acceleration support and automatic differentiation. It will provide TensorFlow support for foundational mathematical methods, mid-level methods, and specific pricing models, and its coverage is being rapidly expanded over the next few months. Foundational methods: core mathematical methods such as optimisation, interpolation, root finders, linear algebra, and random and quasi-random number generation.
tensorflow quantitative-finance finance numerical-methods numerical-optimization numerical-integration high-performance high-performance-computing gpu gpu-computing quantlib
This guide should help fellow researchers and hobbyists easily automate and accelerate their deep learning training with their own Kubernetes GPU cluster. To that end, I explain how to set up a GPU cluster on multiple Ubuntu 16.04 bare-metal servers and provide some useful scripts and .yaml files that do the entire setup for you. If you need a Kubernetes GPU cluster for other reasons, this guide may be helpful to you as well.
kubernetes kubernetes-cluster kubernetes-setup deep-learning gpu-computing distributed-systems guide kubernetes-gpu-cluster cluster gpu worker-nodes
PIConGPU is a fully relativistic, manycore, 3D3V particle-in-cell (PIC) code. The particle-in-cell algorithm is a central tool in plasma physics. It describes the dynamics of a plasma by computing the motion of electrons and ions in the plasma based on Maxwell's equations. GPUs are one of PIConGPU's supported compute platforms, and they provide several TFLOP/s of computational performance at considerably lower investment and maintenance costs than multi-CPU compute architectures of similar performance. The latest high-performance systems (TOP500) are enhanced by accelerator hardware that boosts their peak performance up to the multi-PFLOP/s level. With its outstanding performance and scalability to more than 18,000 GPUs, PIConGPU was one of the finalists for the 2013 Gordon Bell Prize.
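The particle-in-cell cycle can be sketched in a few lines. Each step deposits charge on a grid, solves for the field, interpolates the field back to the particles, and pushes the particles. This sketch is deliberately simplified: it is a 1D electrostatic model in normalised units (ε0 = 1) with a periodic grid and a uniform neutralising background. It is not PIConGPU's method, which is fully relativistic, 3D3V and based on Maxwell's equations rather than on Gauss's law alone. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def deposit(xs: List[float], q: float, n_cells: int, length: float) -> List[float]:
    """Cloud-in-cell charge deposition onto a periodic grid of n_cells nodes."""
    dx = length / n_cells
    rho = [0.0] * n_cells
    for x in xs:
        s = x / dx
        i = math.floor(s)
        w = s - i  # fractional distance past node i
        rho[i % n_cells] += q * (1.0 - w) / dx
        rho[(i + 1) % n_cells] += q * w / dx
    return rho


def solve_field(rho: List[float], length: float) -> List[float]:
    """Integrate 1D Gauss's law dE/dx = rho - mean(rho) with the trapezoid rule.
    Subtracting mean(rho) models the neutralising background; subtracting the
    mean field removes the free integration constant on a periodic grid."""
    n = len(rho)
    dx = length / n
    mean_rho = sum(rho) / n
    e = [0.0] * n
    for i in range(n - 1):
        e[i + 1] = e[i] + 0.5 * dx * (rho[i] + rho[i + 1] - 2.0 * mean_rho)
    mean_e = sum(e) / n
    return [ei - mean_e for ei in e]


def gather(e: List[float], x: float, length: float) -> float:
    """Interpolate the node field to position x with the same linear weights."""
    n = len(e)
    s = x / (length / n)
    i = math.floor(s)
    w = s - i
    return (1.0 - w) * e[i % n] + w * e[(i + 1) % n]


def pic_step(xs: List[float], vs: List[float], q: float, m: float,
             n_cells: int, length: float, dt: float) -> Tuple[List[float], List[float]]:
    """One leapfrog PIC cycle: deposit, field solve, gather, push."""
    rho = deposit(xs, q, n_cells, length)
    e = solve_field(rho, length)
    new_vs = [v + (q / m) * gather(e, x, length) * dt for x, v in zip(xs, vs)]
    new_xs = [(x + v * dt) % length for x, v in zip(xs, new_vs)]  # periodic wrap
    return new_xs, new_vs


if __name__ == "__main__":
    xs, vs = [0.1, 0.35, 0.62, 0.9], [0.0] * 4
    for _ in range(10):
        xs, vs = pic_step(xs, vs, q=-1.0, m=1.0, n_cells=8, length=1.0, dt=0.01)
    print(xs, vs)
```

Cloud-in-cell weighting splits each particle's charge linearly between its two neighbouring nodes, so the deposited charge always sums to the particles' total charge. Reusing the same weights when gathering the field avoids a spurious self-force in this simple scheme.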
laser plasma physics gpu physics-simulation gpu-computing particle-accelerator particle-in-cell pic research
NVIDIA's Runtime API for CUDA is intended for use in both C and C++ code. As such, it uses a C-style API, the lowest common denominator (with a few notable exceptions, such as templated function overloads). This library of wrappers around the Runtime API is intended to let us embrace many C++ features (including some C++11) when using the Runtime API, but without reducing expressivity or increasing the level of abstraction (as, e.g., the Thrust library does). Using cuda-api-wrappers, you still have your devices, streams, events and so on, but they are more convenient to work with in more C++-idiomatic ways.
wrapper gpu modern-cpp cuda nvidia gpgpu api-wrapper gpu-memory gpu-computing cuda-toolkit cuda-device cuda-runtime-api gpgpu-computing cuda-api-wrappers
Arraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, CUDA and OpenCL ndarray library on which to build a scientific computing ecosystem, and in particular a deep learning ecosystem. The library is inspired by NumPy and PyTorch. It provides ergonomics very similar to NumPy, Julia and Matlab, but is fully parallel and significantly faster than those libraries. It is also faster than C-based Torch.
tensor nim multidimensional-arrays cuda deep-learning machine-learning cudnn high-performance-computing gpu-computing matrix-library neural-networks parallel-computing openmp linear-algebra ndarray opencl gpgpu iot automatic-differentiation autograd
A Clojure library for Bayesian data analysis and machine learning on the GPU. Distributed under the Eclipse Public License, either version 1.0 or (at your option) any later version.
bayesian-inference bayesian-data-analysis gpu-computing gpu-acceleration statistics machine-learning clojure-library bayesian opencl cuda high-performance-computing gpu mcmc markov-chain-monte-carlo
An embedded language for GPU kernel programming.
haskell-library gpu gpu-computing gpu-acceleration edsl haskell
FAST (Framework for Heterogeneous Medical Image Computing and Visualization) is an open-source, cross-platform framework whose main goal is to make it easier to process and visualize medical images on heterogeneous systems (CPU+GPU). A detailed description of the framework design can be found on the project wiki or in the research article: FAST: framework for heterogeneous medical image computing and visualization. Erik Smistad, Mohammadmehdi Bozorgi, Frank Lindseth. International Journal of Computer Assisted Radiology and Surgery. February 2015.
opencl visualization parallel-computing medical-imaging gpu-computing
An OpenGL 4.3 / C++11 rendering engine oriented towards animation. The build was compiled with GCC 4.9.
cplusplus-11 opengl animation engine computergraphics gpu-computing
This package compiles Accelerate code to LLVM IR and executes that code on multicore CPUs as well as NVIDIA GPUs. This avoids the need to go through nvcc or clang. For details on Accelerate, refer to the main repository. We love all kinds of contributions, so feel free to open issues for missing features as well as report (or fix!) bugs on the issue tracker.
haskell accelerate llvm cuda parallel-computing gpu-computing
AnvilKit tames Metal. It's a collection of code that comes up in just about every project and that everyone seems to roll themselves. It includes an object that wraps MTLDevice and makes it into a singleton so that you don't need to pass it around.
metal metalkit gpu gpu-acceleration gpu-computing
gpuOwl is a Mersenne-number (see http://mersenne.org/) primality tester implemented in OpenCL that works well on AMD GPUs. gpuOwl implements the PRP (probable prime) test with a powerful self-validating algorithm that protects against errors. gpuOwl uses FFTs of size 8M and 16M and is best used with Mersenne exponents in the vicinity of 150M and 300M.
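A PRP test checks Fermat's little theorem for a Mersenne number M_p = 2^p - 1: if M_p is prime, then 3^(M_p - 1) ≡ 1 (mod M_p). The sketch below uses Python's built-in three-argument `pow` on tiny exponents. gpuOwl instead performs the huge modular squarings with GPU FFTs and adds the self-validating error check mentioned above; neither is reproduced here. The function name `mersenne_prp` is hypothetical, and the default base 3 is an assumption of this sketch.

```python
def mersenne_prp(p: int, base: int = 3) -> bool:
    """Fermat probable-prime test of M_p = 2**p - 1 to the given base.
    True means M_p is a probable prime (or a pseudoprime to this base)."""
    if p < 3:
        raise ValueError("this sketch expects an exponent p >= 3")
    m = (1 << p) - 1
    # pow with a modulus does fast modular exponentiation on Python ints
    return pow(base, m - 1, m) == 1


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        print(p, mersenne_prp(p))
```

Base 2 would be useless here. For an odd prime p, 2^p ≡ 1 (mod M_p) and p divides M_p - 1, so every such M_p passes a base-2 test. The composite M_11 = 2047 = 23 × 89 illustrates this: it passes with base 2 but fails with base 3.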
opencl gpu-computing gpgpu lucas-lehmer mersenne-numbers
This library is intended to be the fastest matrix library for JavaScript, using the power of GPU computing. For the best performance, it uses WebCL to access the GPU from JavaScript. Since the project is written in TypeScript, it must be transpiled to JavaScript.
gpu-computing matrix-library webcl matrix
Opt (optlang.org) is a new language in which a user simply writes energy functions over image- or graph-structured unknowns, and a compiler automatically generates state-of-the-art GPU optimization kernels. Real-world energy functions compile directly into highly optimized GPU solver implementations whose performance is competitive with the best published hand-tuned, application-specific GPU solvers. This is an alpha release of the software, intended to gather feedback on the expressiveness of the language. We are interested in seeing what problems can be expressed and what features will be necessary to support more problems.
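Assume the energy is a sum of squared residuals, E(x) = Σ r_i(x)^2. Gauss-Newton then minimises it by repeatedly linearising the residuals and solving the normal equations (JᵀJ) δ = -Jᵀr. The CPU sketch below, in plain Python, illustrates that iteration on a two-parameter curve fit. It is not Opt's language or its generated code. The names `gauss_newton` and `solve` and the exponential model are assumptions for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def solve(a: List[List[float]], b: Vector) -> Vector:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gauss_newton(residuals: Callable[[Vector], Vector],
                 jacobian: Callable[[Vector], List[Vector]],
                 x0: Sequence[float], iterations: int = 20) -> Vector:
    """Minimise E(x) = sum(r_i(x)**2) by solving (J^T J) delta = -J^T r each step.
    No damping is applied, so this can break down if J^T J is (near-)singular."""
    x = list(x0)
    n = len(x)
    for _ in range(iterations):
        r = residuals(x)
        jac = jacobian(x)  # one row per residual, n columns
        jtj = [[sum(row[i] * row[k] for row in jac) for k in range(n)] for i in range(n)]
        jtr = [sum(row[i] * ri for row, ri in zip(jac, r)) for i in range(n)]
        delta = solve(jtj, [-g for g in jtr])
        x = [xi + di for xi, di in zip(x, delta)]
    return x


# Hypothetical example: fit y = a * exp(b * t) to noise-free samples with a=2, b=0.5.
T = [0.0, 0.5, 1.0, 1.5, 2.0]
Y = [2.0 * math.exp(0.5 * t) for t in T]


def model_residuals(p: Vector) -> Vector:
    a, b = p
    return [a * math.exp(b * t) - y for t, y in zip(T, Y)]


def model_jacobian(p: Vector) -> List[Vector]:
    a, b = p
    # d r_i / d a = exp(b t_i),  d r_i / d b = a t_i exp(b t_i)
    return [[math.exp(b * t), a * t * math.exp(b * t)] for t in T]


if __name__ == "__main__":
    print(gauss_newton(model_residuals, model_jacobian, [1.8, 0.45]))
```

Levenberg-Marquardt would add a damping term λI to JᵀJ, which makes each step more robust far from the solution. This noise-free example starts close enough that plain Gauss-Newton converges.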
non-linear-optimization least-squares gauss-newton levenberg-marquardt gpu-computing