
NNPACK - Acceleration package for neural networks on multi-core CPUs

  •    C

NNPACK is an acceleration package for neural network computations. NNPACK aims to provide high-performance implementations of convnet layers for multi-core CPUs. NNPACK is not intended to be directly used by machine learning researchers; instead it provides low-level performance primitives leveraged in leading deep learning frameworks, such as PyTorch, Caffe2, MXNet, tiny-dnn, Caffe, Torch, and Darknet.

neanderthal - Fast Clojure Matrix Library

  •    Clojure

Neanderthal is a Clojure library for fast matrix and linear algebra computations based on the highly optimized native libraries of BLAS and LAPACK computation routines for both CPU and GPU. Read the documentation at the Neanderthal web site.

blis - BLAS-like Library Instantiation Software Framework

  •    C

BLIS is a portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries. The framework was designed to isolate essential kernels of computation that, when optimized, immediately enable optimized implementations of most of its commonly used and computationally intensive operations. BLIS is written in ISO C99 and available under a new/modified/3-clause BSD license. While BLIS exports a new BLAS-like API, it also includes a BLAS compatibility layer which gives application developers access to BLIS implementations via traditional BLAS routine calls. An object-based API unique to BLIS is also available. For a thorough presentation of our framework, please read our journal article, "BLIS: A Framework for Rapidly Instantiating BLAS Functionality". For those who just want an executive summary, please see the next section.

m4ri - MIRROR: M4RI is a library for fast arithmetic with dense matrices over GF(2)

  •    C

M4RI is a library for fast arithmetic with dense matrices over F2. The name M4RI comes from the first implemented algorithm: the “Method of the Four Russians” inversion algorithm published by Gregory Bard. This algorithm is in turn named after the “Method of the Four Russians” multiplication algorithm, which is probably better referred to as Kronrod's method. M4RI is available under the General Public License Version 2 or later (GPLv2+) and supports Linux, Solaris, and OS X (GCC).
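To make "arithmetic over F2" concrete, here is a minimal C++ sketch of a naive bit-packed matrix product, in which addition is XOR and multiplication is AND. It is not M4RI's API and does not implement the Four Russians technique; the `Gf2Matrix` type and `gf2_multiply` function are hypothetical names chosen for illustration, and the sketch assumes matrices with at most 64 columns so that each row fits in one machine word.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration: a matrix over GF(2) with at most 64
// columns. Each row is packed into one 64-bit word, and bit j of rows[i]
// holds entry (i, j).
using Gf2Matrix = std::vector<std::uint64_t>;

// Naive product C = A * B over GF(2).
// Addition is XOR and multiplication is AND, so row i of C is the XOR of
// every row k of B for which A(i, k) == 1.
// Assumes b.size() <= 64 (the shift below is undefined otherwise).
Gf2Matrix gf2_multiply(const Gf2Matrix& a, const Gf2Matrix& b) {
    Gf2Matrix c(a.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t k = 0; k < b.size(); ++k) {
            if ((a[i] >> k) & 1u) {
                c[i] ^= b[k];  // add (XOR) row k of B into row i of C
            }
        }
    }
    return c;
}
```

Packing rows into words lets a single XOR add 64 entries at once, which is the basic reason dense F2 arithmetic can be made fast; a library such as M4RI layers further algorithmic improvements on top of this idea.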




PP-MM-A03 - Parallel Processing - Matrix Multiplication (Cannon, DNS, LUdecomp)

  •    TeX

This repository started out as a class project for the Parallel Processing course at UIC with Professor Kshemkalyani. Several matrix multiplication algorithms are implemented in C, their timings are recorded, and the results are written up in a report.

dbcsr - DBCSR: Distributed Block Compressed Sparse Row matrix library

  •    Fortran

DBCSR is a library designed to efficiently perform sparse matrix-matrix multiplication, among other operations. It is parallelized with MPI and OpenMP and can exploit GPUs via CUDA. Optionally, you can install libxsmm.

blislab - BLISlab: A Sandbox for Optimizing GEMM

  •    C

Matrix-matrix multiplication is a fundamental operation of great importance to scientific computing and, increasingly, machine learning. It is a simple enough concept to be introduced in a typical high school algebra course, yet in practice important enough that its implementation on computers continues to be an active research topic. This note describes a set of exercises that use this operation to illustrate how high performance can be attained on modern CPUs with hierarchical memories (multiple caches). It does so by building on the insights that underlie the BLAS-like Library Instantiation Software (BLIS) framework by exposing a simplified “sandbox” that mimics the implementation in BLIS. As such, it also becomes a vehicle for the “crowd sourcing” of the optimization of BLIS. We call this set of exercises BLISlab. Check the tutorial for more details.
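As a rough illustration of why hierarchical memories matter for GEMM, the following C++ sketch tiles the three loops so that each tile of the operands can be reused while it is still in cache. This is only loop blocking under stated assumptions: it is not the BLISlab or BLIS implementation, which additionally packs panels and uses hand-tuned micro-kernels. The function name `gemm_blocked` and the default tile size of 64 are hypothetical choices for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Computes C += A * B for row-major matrices stored in flat vectors:
// A is m x k, B is k x n, C is m x n.
// `block` is a hypothetical tile size (must be > 0); a real library would
// tune it to the cache sizes of the target CPU.
void gemm_blocked(std::size_t m, std::size_t n, std::size_t k,
                  const std::vector<double>& a,
                  const std::vector<double>& b,
                  std::vector<double>& c,
                  std::size_t block = 64) {
    for (std::size_t i0 = 0; i0 < m; i0 += block) {
        const std::size_t i1 = std::min(i0 + block, m);
        for (std::size_t p0 = 0; p0 < k; p0 += block) {
            const std::size_t p1 = std::min(p0 + block, k);
            for (std::size_t j0 = 0; j0 < n; j0 += block) {
                const std::size_t j1 = std::min(j0 + block, n);
                // Multiply one tile of A by one tile of B.
                for (std::size_t i = i0; i < i1; ++i) {
                    for (std::size_t p = p0; p < p1; ++p) {
                        const double a_ip = a[i * k + p];
                        for (std::size_t j = j0; j < j1; ++j) {
                            c[i * n + j] += a_ip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The result is the same as the textbook triple loop; only the order in which memory is visited changes. Timing this against the unblocked version for large matrices is a simple way to observe the cache effect the exercises are about.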

how-to-optimize-gemm

  •    C

Copyright by Prof. Robert van de Geijn (rvdg@cs.utexas.edu). Adapted to Github Markdown Wiki by Jianyu Huang (jianyu@cs.utexas.edu).


mir-glas - [Experimental] LLVM-accelerated Generic Linear Algebra Subprograms

  •    D

GLAS is a C library written in Dlang. It requires no C++ or D runtime, only libc, which is available everywhere. A CBLAS API can be provided by linking with Netlib's CBLAS library.

cython-blis - 💥 Fast matrix-multiplication as a self-contained Python library – no system dependencies!

  •    C

This repository provides the Blis linear algebra routines as a self-contained Python C-extension. Clearly the Dell's numpy+OpenBLAS performance is the outlier, so it's likely something has gone wrong in the compilation and architecture detection.

matrix.h - A collection of matrix manipulation algorithms

  •    C

For more information on how to use this header file, please consult the documentation. If you got this file from a matrix package, then the documentation is available in the docs/ directory.

matrix-multiplication-threading - Matrix multiplication using C++11 threads

  •    C++

I decided to do this simple project in order to get used to the new thread class in C++11. The idea is to take two matrices and multiply them using different threads. I want to see how the implementation differs, what problems may arise, and how the execution time scales with the number of threads and the size of the matrices. It's as easy as that. One thing to note here is that I am using a two-dimensional array of pointers instead of just floats. This has a reason, and it has to do with threads.
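The general idea of multiplying two matrices with several `std::thread` workers can be sketched as below. This is not the repository's code: it assumes a simple row-wise partition and uses nested `std::vector<float>` storage rather than the two-dimensional array of pointers the author mentions, and the names `multiply_rows` and `multiply_threaded` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Computes rows [row_begin, row_end) of c = a * b.
// Each worker writes only to its own rows of c, so no locking is needed.
void multiply_rows(const Matrix& a, const Matrix& b, Matrix& c,
                   std::size_t row_begin, std::size_t row_end) {
    const std::size_t inner = b.size();
    const std::size_t cols = b.empty() ? 0 : b[0].size();
    for (std::size_t i = row_begin; i < row_end; ++i) {
        for (std::size_t p = 0; p < inner; ++p) {
            for (std::size_t j = 0; j < cols; ++j) {
                c[i][j] += a[i][p] * b[p][j];
            }
        }
    }
}

// Splits the rows of the result into contiguous chunks, one per thread.
Matrix multiply_threaded(const Matrix& a, const Matrix& b,
                         std::size_t num_threads) {
    const std::size_t rows = a.size();
    const std::size_t cols = b.empty() ? 0 : b[0].size();
    Matrix c(rows, std::vector<float>(cols, 0.0f));
    if (num_threads == 0) num_threads = 1;
    const std::size_t chunk = (rows + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < rows; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, rows);
        workers.emplace_back(multiply_rows, std::cref(a), std::cref(b),
                             std::ref(c), begin, end);
    }
    for (auto& worker : workers) {
        worker.join();  // wait for every chunk before returning the result
    }
    return c;
}
```

Calling `multiply_threaded(a, b, n)` with different values of `n` and different matrix sizes is one way to run the kind of scaling experiment described above. On some toolchains the program must be linked with a threading flag such as `-pthread`.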