- 53

Neanderthal is a Clojure library for fast matrix and linear algebra computations based on the highly optimized native libraries of BLAS and LAPACK computation routines for both CPU and GPU.. Read the documentation at Neanderthal Web Site.

http://neanderthal.uncomplicate.orghttps://github.com/uncomplicate/neanderthal

Tags | clojure-library matrix gpu gpu-computing gpgpu opencl cuda high-performance-computing vectorization api matrix-factorization matrix-multiplication matrix-functions matrix-calculations |

Implementation | Clojure |

License | creative-commons |

Platform | OS-Independent |

Arraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, Cuda and OpenCL ndarray library on which to build a scientific computing and in particular a deep learning ecosystem. The library is inspired by Numpy and PyTorch. The library provides ergonomics very similar to Numpy, Julia and Matlab but is fully parallel and significantly faster than those libraries. It is also faster than C-based Torch.

tensor nim multidimensional-arrays cuda deep-learning machine-learning cudnn high-performance-computing gpu-computing matrix-library neural-networks parallel-computing openmp linear-algebra ndarray opencl gpgpu iot automatic-differentiation autogradArmadillo: fast C++ library for linear algebra & scientific computing - http://arma.sourceforge.net

linear-algebra matrix matrix-functions linear-algebra-library statistics matlab blas lapack hpc scientific-computing mkl machine-learning armadillo openmp gaussian-mixture-models cpp11 vector sparse-matrix expression-template matrix-factorizationMShadow is a lightweight CPU/GPU Matrix/Tensor Template Library in C++/CUDA. The goal of mshadow is to support efficient, device invariant and simple tensor library for machine learning project that aims for maximum performance and control, while also emphasize simplicity.MShadow also provides interface that allows writing Multi-GPU and distributed deep learning programs in an easy and unified way.

VexCL is a vector expression template library for OpenCL/CUDA. It has been created for ease of GPGPU development with C++. VexCL strives to reduce amount of boilerplate code needed to develop GPGPU applications. The library provides convenient and intuitive notation for vector arithmetic, reduction, sparse matrix-vector products, etc. Multi-device and even multi-platform computations are supported. The source code of the library is distributed under very permissive MIT license.

opencl cuda c-plus-plus gpgpu scientific-computing cpp11BLIS is a portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries. The framework was designed to isolate essential kernels of computation that, when optimized, immediately enable optimized implementations of most of its commonly used and computationally intensive operations. BLIS is written in ISO C99 and available under a new/modified/3-clause BSD license. While BLIS exports a new BLAS-like API, it also includes a BLAS compatibility layer which gives application developers access to BLIS implementations via traditional BLAS routine calls. An object-based API unique to BLIS is also available. For a thorough presentation of our framework, please read our journal article, "BLIS: A Framework for Rapidly Instantiating BLAS Functionality". For those who just want an executive summary, please see the next section.

blis blas linear-algebra linear-algebra-library matrix-multiplication matrix-calculations matrix-libraryThe blocksparse package contains TensorFlow Ops and corresponding GPU kernels for block-sparse matrix multiplication. Also included are related ops like edge bias, sparse weight norm and layer norm. To learn more, see the launch post on the OpenAI blog.

A Clojure Library for Bayesian Data Analysis and Machine Learning on the GPU. Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.

bayesian-inference bayesian-data-analysis gpu-computing gpu-acceleration statistics machine-learning clojure-library bayesian opencl cuda high-performance-computing gpu mcmc markov-chain-monte-carloSurge is a Swift library that uses the Accelerate framework to provide high-performance functions for matrix math, digital signal processing, and image manipulation. Accelerate exposes SIMD instructions available in modern CPUs to significantly improve performance of certain calculations. Because of its relative obscurity and inconvenient APIs, Accelerate is not commonly used by developers, which is a shame, since many applications could benefit from these performance optimizations.

Owl is an emerging numerical library for scientific computing and engineering. The library is developed in the OCaml language and inherits all its powerful features such as static type checking, powerful module system, and superior runtime efficiency. Owl allows you to write succinct type-safe numerical applications in functional language without sacrificing performance, significantly reduces the cost from prototype to production use. Owl's documentation contains a lot of learning materials to help you start. The full documentation consists of two parts: Tutorial Book and API Reference. Both are perfectly synchronised with the code in the repository by the automatic building system. You can access both parts with the following link.

matrix linear-algebra ndarray statistical-functions topic-modeling regression maths gsl plotting sparse-linear-systems scientific-computing numerical-calculations statistics mcmc optimization autograd algorithmic-differentation automatic-differentiation machine-learning neural-networkVisit matrixmultiplication.xyz. This question bothered me a few times until I studied math in the university. There, I had in total four linear algebra courses, so matrix multiplication became my bread-and-butter. One day it just snapped in my mind how the number of rows of the first matrix has to match the number of columns in the second matrix, which means they must perfectly align when the second matrix is rotated by 90°. From there, the second matrix trickles down, "combing" the values in the first matrix. The values are multiplied and added together. In my head, I called this the "waterfall method", and used it to perform my calculations in the university courses. It worked.

An implementation of linear algebra numerical structures and methods for the CLR. NPack is unique in that it uses generics for matrix element definitions, and a set of matrix operations via an interface, allowing a CLR-based operations engine as well as the opportunity to use ...

math matrix npack algorithms blas gpgpu gpuBoost.Compute is a GPU/parallel-computing library for C++ based on OpenCL. The core library is a thin C++ wrapper over the OpenCL API and provides access to compute devices, contexts, command queues and memory buffers.

opencl boost c-plus-plus cpp compute gpu gpgpu performance hpcNNPACK is an acceleration package for neural network computations. NNPACK aims to provide high-performance implementations of convnet layers for multi-core CPUs. NNPACK is not intended to be directly used by machine learning researchers; instead it provides low-level performance primitives leveraged in leading deep learning frameworks, such as PyTorch, Caffe2, MXNet, tiny-dnn, Caffe, Torch, and Darknet.

neural-network neural-networks convolutional-layers inference high-performance high-performance-computing simd cpu multithreading fast-fourier-transform winograd-transform matrix-multiplicationJavascript Matrix and Vector library for High Performance WebGL apps

CUTLASS 1.0 is a collection of CUDA C++ template abstractions for implementing high-performance matrix-multiplication (GEMM) at all levels and scales within CUDA. It incorporates strategies for hierarchical decomposition and data movement similar to those used to implement cuBLAS. CUTLASS decomposes these "moving parts" into reusable, modular software components abstracted by C++ template classes. These thread-wide, warp-wide, block-wide, and device-wide primitives can be specialized and tuned via custom tiling sizes, data types, and other algorithmic policy. The resulting flexibility simplifies their use as building blocks within custom kernels and applications. To support a wide variety of applications, CUTLASS provides extensive support for mixed-precision computations, providing specialized data-movement and multiply-accumulate abstractions for 8-bit integer, half-precision floating point (FP16), single-precision floating point (FP32), and double-precision floating point (FP64) types. Furthermore, CUTLASS demonstrates CUDA's WMMA API for targeting the programmable, high-throughput Tensor Cores provided by NVIDIA's Volta architecture and beyond.

Gunrock is a CUDA library for graph-processing designed specifically for the GPU. It uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. For more details, please visit our website, read Why Gunrock, our TOPC 2017 paper Gunrock: GPU Graph Analytics, look at our results, and find more details in our publications. See Release Notes to keep up with the our latest changes.

gunrock cuda graph-processing graph-analytics gpu graph-primitivesThe Meta.Numerics math and statistics library supports scientific computing on the .NET platform. It offers an object-oriented API for matrix algebra, advanced functions of real and complex numbers, signal processing, and data analysis.

math numerics ironpython maths matrix numerical-algorithms scienceMaztrix is a matrix library and program (written in ANSI C++) for computing matrix calculations. Maztrix can find determinants, row reduce, and much more.

Low-Rank and Sparse tools for Background Modeling and Subtraction in Videos. The LRSLibrary provides a collection of low-rank and sparse decomposition algorithms in MATLAB. The library was designed for motion segmentation in videos, but it can be also used (or adapted) for other computer vision problems (for more information, please see this page). Currently the LRSLibrary offers more than 100 algorithms based on matrix and tensor methods. The LRSLibrary was tested successfully in several MATLAB versions (e.g. R2014, R2015, R2016, R2017, on both x86 and x64 versions). It requires minimum R2014b.

rpca matrix-factorization matrix-completion tensor-decomposition tensor matlab matrix subspace-tracking subspace-learning⚠ Please note that while Emu 0.2.0 is quite usable, it suffers from 2 key issues. It firstly does nothing to minimize CPU-GPU data transfer and secondly it's compiler is not well-tested. These can be reasons not to use Emu 0.2.0. A new version of Emu is in the works, however, with significant improvements in the language, compiler, and compile-time checker. This new version of Emu should be released some time in Q4 of 2019. But unlike OpenCL/CUDA/Halide/Futhark, Emu is embedded in Rust. This lets it take advantage of the ecosystem in ways...

emu gpu gpgpu gpu-computing gpu-acceleration gpu-programming
We have large collection of open source products. Follow the tags from
Tag Cloud >>

Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.
**Add Projects.**