neanderthal - Fast Clojure Matrix Library

  •        53

Neanderthal is a Clojure library for fast matrix and linear algebra computations, based on the highly optimized native BLAS and LAPACK computation routines for both CPU and GPU. Read the documentation at Neanderthal Web Site.



Related Projects

Arraymancer - A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU, OpenCL and embedded devices

  •    Nim

Arraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, Cuda and OpenCL ndarray library on which to build a scientific computing and in particular a deep learning ecosystem. The library is inspired by Numpy and PyTorch. The library provides ergonomics very similar to Numpy, Julia and Matlab but is fully parallel and significantly faster than those libraries. It is also faster than C-based Torch.

mshadow - Matrix Shadow:Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning

  •    C++

MShadow is a lightweight CPU/GPU Matrix/Tensor Template Library in C++/CUDA. The goal of mshadow is to provide an efficient, device-invariant and simple tensor library for machine learning projects that aim for maximum performance and control while also emphasizing simplicity. MShadow also provides an interface that allows writing multi-GPU and distributed deep learning programs in an easy and unified way.

vexcl - VexCL is a C++ vector expression template library for OpenCL/CUDA

  •    C++

VexCL is a vector expression template library for OpenCL/CUDA. It was created to ease GPGPU development with C++. VexCL strives to reduce the amount of boilerplate code needed to develop GPGPU applications. The library provides convenient and intuitive notation for vector arithmetic, reduction, sparse matrix-vector products, etc. Multi-device and even multi-platform computations are supported. The source code of the library is distributed under the very permissive MIT license.

blis - BLAS-like Library Instantiation Software Framework

  •    C

BLIS is a portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries. The framework was designed to isolate essential kernels of computation that, when optimized, immediately enable optimized implementations of most of its commonly used and computationally intensive operations. BLIS is written in ISO C99 and available under a new/modified/3-clause BSD license. While BLIS exports a new BLAS-like API, it also includes a BLAS compatibility layer which gives application developers access to BLIS implementations via traditional BLAS routine calls. An object-based API unique to BLIS is also available. For a thorough presentation of our framework, please read our journal article, "BLIS: A Framework for Rapidly Instantiating BLAS Functionality". For those who just want an executive summary, please see the next section.

blocksparse - Efficient GPU kernels for block-sparse matrix multiplication and convolution

  •    Cuda

The blocksparse package contains TensorFlow Ops and corresponding GPU kernels for block-sparse matrix multiplication. Also included are related ops like edge bias, sparse weight norm and layer norm. To learn more, see the launch post on the OpenAI blog.

Surge - A Swift library that uses the Accelerate framework to provide high-performance functions for matrix math, digital signal processing, and image manipulation

  •    Swift

Surge is a Swift library that uses the Accelerate framework to provide high-performance functions for matrix math, digital signal processing, and image manipulation. Accelerate exposes SIMD instructions available in modern CPUs to significantly improve performance of certain calculations. Because of its relative obscurity and inconvenient APIs, Accelerate is not commonly used by developers, which is a shame, since many applications could benefit from these performance optimizations.

owl - Owl is an OCaml library for scientific and engineering computing.

  •    OCaml

Owl is an emerging numerical library for scientific computing and engineering. The library is developed in the OCaml language and inherits all its powerful features, such as static type checking, a powerful module system, and superior runtime efficiency. Owl allows you to write succinct, type-safe numerical applications in a functional language without sacrificing performance, and significantly reduces the cost of moving from prototype to production use. Owl's documentation contains a lot of learning material to help you get started. The full documentation consists of two parts: Tutorial Book and API Reference. Both are kept synchronised with the code in the repository by the automatic build system. You can access both parts with the following link.

  •    TypeScript

This question bothered me a few times until I studied math at university. There, I took four linear algebra courses in total, so matrix multiplication became my bread and butter. One day it just clicked in my mind how the number of columns of the first matrix has to match the number of rows of the second matrix, which means they must perfectly align when the second matrix is rotated by 90°. From there, the second matrix trickles down, "combing" the values in the first matrix. The values are multiplied and added together. In my head, I called this the "waterfall method", and used it to perform my calculations in the university courses. It worked.
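The "waterfall" picture above can be sketched in a few lines of Python. This is only an illustration, not code from the project: the function name `waterfall_multiply` is hypothetical, matrices are assumed to be plain lists of rows, and a transpose stands in for the 90° rotation.

```python
def waterfall_multiply(a, b):
    """Multiply a (m x n) by b (n x p); both are lists of rows."""
    if not a or not b or any(len(row) != len(b) for row in a):
        raise ValueError("columns of the first matrix must equal rows of the second")
    # Turn the second matrix on its side: each of its columns becomes a strip
    # that lines up, value by value, with a row of the first matrix.
    strips = list(zip(*b))
    # Each strip "falls" past every row of the first matrix. Where a strip
    # meets a row, aligned values are multiplied and the products are summed.
    return [[sum(x * y for x, y in zip(row, strip)) for strip in strips]
            for row in a]


if __name__ == "__main__":
    print(waterfall_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # prints [[19, 22], [43, 50]]
```

Each output cell is one meeting of a row and a strip, so the result has as many rows as the first matrix and as many columns as the second.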



An implementation of linear algebra numerical structures and methods for the CLR. NPack is unique in that it uses generics for matrix element definitions, and a set of matrix operations via an interface, allowing a CLR-based operations engine as well as the opportunity to use ...

compute - A C++ GPU Computing Library for OpenCL

  •    C++

Boost.Compute is a GPU/parallel-computing library for C++ based on OpenCL. The core library is a thin C++ wrapper over the OpenCL API and provides access to compute devices, contexts, command queues and memory buffers.

NNPACK - Acceleration package for neural networks on multi-core CPUs

  •    C

NNPACK is an acceleration package for neural network computations. NNPACK aims to provide high-performance implementations of convnet layers for multi-core CPUs. NNPACK is not intended to be directly used by machine learning researchers; instead it provides low-level performance primitives leveraged in leading deep learning frameworks, such as PyTorch, Caffe2, MXNet, tiny-dnn, Caffe, Torch, and Darknet.

gl-matrix - Javascript Matrix and Vector library for High Performance WebGL apps

  •    Javascript

Javascript Matrix and Vector library for High Performance WebGL apps

cutlass - CUDA Templates for Linear Algebra Subroutines

  •    C++

CUTLASS 1.0 is a collection of CUDA C++ template abstractions for implementing high-performance matrix-multiplication (GEMM) at all levels and scales within CUDA. It incorporates strategies for hierarchical decomposition and data movement similar to those used to implement cuBLAS. CUTLASS decomposes these "moving parts" into reusable, modular software components abstracted by C++ template classes. These thread-wide, warp-wide, block-wide, and device-wide primitives can be specialized and tuned via custom tiling sizes, data types, and other algorithmic policy. The resulting flexibility simplifies their use as building blocks within custom kernels and applications. To support a wide variety of applications, CUTLASS provides extensive support for mixed-precision computations, providing specialized data-movement and multiply-accumulate abstractions for 8-bit integer, half-precision floating point (FP16), single-precision floating point (FP32), and double-precision floating point (FP64) types. Furthermore, CUTLASS demonstrates CUDA's WMMA API for targeting the programmable, high-throughput Tensor Cores provided by NVIDIA's Volta architecture and beyond.
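To make the idea of decomposing GEMM into tiles concrete, here is a minimal CPU-only sketch in Python. It does not use or imitate the CUTLASS API; the function name `tiled_gemm` and the `tile` parameter are assumptions made for illustration, and a real GPU kernel would map tiles onto thread blocks, warps, and threads instead of nested loops.

```python
def tiled_gemm(a, b, tile=2):
    """Compute c = a @ b one tile at a time; a is m x k, b is k x n."""
    if not a or not b or any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions of a and b must match")
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    # Outer loops walk over tiles of the output and of the shared dimension.
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # Inner loops accumulate one tile's partial products.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    print(tiled_gemm([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
    # prints [[58, 64], [139, 154]]
```

The tile size changes only the order in which partial products are accumulated, not the result, which is why it can be tuned for a given memory hierarchy.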

gunrock - High-Performance Graph Primitives on GPUs

  •    Cuda

Gunrock is a CUDA library for graph processing designed specifically for the GPU. It uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. For more details, please visit our website, read Why Gunrock, our TOPC 2017 paper Gunrock: GPU Graph Analytics, look at our results, and find more details in our publications. See Release Notes to keep up with our latest changes.
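As a rough illustration of what "operations on a vertex frontier" means, the sketch below runs a breadth-first search one frontier at a time in plain Python. It is not Gunrock code and does not use its API; the function name `bfs_frontier` and the dictionary-based adjacency list are assumptions chosen for brevity.

```python
def bfs_frontier(adjacency, source):
    """Return the BFS depth of every vertex reachable from source."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        # Expand step: visit the neighbours of every vertex in the frontier.
        for v in frontier:
            for w in adjacency.get(v, ()):
                # Filter step: keep only vertices not seen before.
                if w not in depth:
                    depth[w] = level
                    next_frontier.append(w)
        # One superstep is done; the new frontier replaces the old one.
        frontier = next_frontier
    return depth


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(bfs_frontier(graph, 0))
    # prints {0: 0, 1: 1, 2: 1, 3: 2}
```

Within one pass of the `while` loop, each frontier vertex could in principle be expanded independently, which is what makes this formulation a natural fit for a GPU.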

Meta Numerics


The Meta.Numerics math and statistics library supports scientific computing on the .NET platform. It offers an object-oriented API for matrix algebra, advanced functions of real and complex numbers, signal processing, and data analysis.


  •    C++

Maztrix is a matrix library and program (written in ANSI C++) for performing matrix calculations. Maztrix can find determinants, row reduce, and much more.

lrslibrary - Low-Rank and Sparse Tools for Background Modeling and Subtraction in Videos

  •    Matlab

Low-Rank and Sparse tools for Background Modeling and Subtraction in Videos. The LRSLibrary provides a collection of low-rank and sparse decomposition algorithms in MATLAB. The library was designed for motion segmentation in videos, but it can also be used (or adapted) for other computer vision problems (for more information, please see this page). Currently the LRSLibrary offers more than 100 algorithms based on matrix and tensor methods. The LRSLibrary was tested successfully in several MATLAB versions (e.g. R2014, R2015, R2016, R2017, in both x86 and x64 versions). It requires at least R2014b.

emu - a language for programming GPUs, with a focus on ergonomics first and performance second

  •    Rust

⚠ Please note that while Emu 0.2.0 is quite usable, it suffers from 2 key issues. First, it does nothing to minimize CPU-GPU data transfer, and second, its compiler is not well-tested. These can be reasons not to use Emu 0.2.0. A new version of Emu is in the works, however, with significant improvements in the language, compiler, and compile-time checker. This new version of Emu should be released some time in Q4 of 2019. But unlike OpenCL/CUDA/Halide/Futhark, Emu is embedded in Rust. This lets it take advantage of the ecosystem in ways...