NNPACK is an acceleration package for neural network computations. NNPACK aims to provide high-performance implementations of convnet layers for multi-core CPUs. NNPACK is not intended to be directly used by machine learning researchers; instead it provides low-level performance primitives leveraged in leading deep learning frameworks, such as PyTorch, Caffe2, MXNet, tiny-dnn, Caffe, Torch, and Darknet.
neural-network neural-networks convolutional-layers inference high-performance high-performance-computing simd cpu multithreading fast-fourier-transform winograd-transform matrix-multiplication
Neanderthal is a Clojure library for fast matrix and linear algebra computations, based on the highly optimized native BLAS and LAPACK computation routines for both CPU and GPU. Read the documentation at the Neanderthal Web Site.
clojure-library matrix gpu gpu-computing gpgpu opencl cuda high-performance-computing vectorization api matrix-factorization matrix-multiplication matrix-functions matrix-calculations
BLIS is a portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries. The framework was designed to isolate essential kernels of computation that, when optimized, immediately enable optimized implementations of most of its commonly used and computationally intensive operations. BLIS is written in ISO C99 and available under a new/modified/3-clause BSD license. While BLIS exports a new BLAS-like API, it also includes a BLAS compatibility layer which gives application developers access to BLIS implementations via traditional BLAS routine calls. An object-based API unique to BLIS is also available. For a thorough presentation of our framework, please read our journal article, "BLIS: A Framework for Rapidly Instantiating BLAS Functionality". For those who just want an executive summary, please see the next section.
blis blas linear-algebra linear-algebra-library matrix-multiplication matrix-calculations matrix-library
M4RI is a library for fast arithmetic with dense matrices over F2. The name M4RI comes from the first implemented algorithm: the “Method of the Four Russians” inversion algorithm published by Gregory Bard. This algorithm in turn is named after the “Method of the Four Russians” multiplication algorithm, which is probably better referred to as Kronrod's method. M4RI is available under the General Public License Version 2 or later (GPLv2+) and supports Linux, Solaris, and OS X (GCC).
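Over F2, addition is XOR and multiplication is AND, which is why a dense matrix can be stored one bit per entry. The following is a minimal Python sketch of that idea only. It is not M4RI's API and it does not implement the Method of the Four Russians; the function names `pack_row`, `unpack_row`, and `f2_matmul` are hypothetical.

```python
def pack_row(bits):
    """Pack a list of 0/1 entries into an int; entry j is stored in bit j."""
    value = 0
    for j, bit in enumerate(bits):
        if bit & 1:
            value |= 1 << j
    return value


def unpack_row(value, ncols):
    """Expand a packed row back into a list of ncols 0/1 entries."""
    return [(value >> j) & 1 for j in range(ncols)]


def f2_matmul(a_rows, b_rows):
    """Multiply A by B over F2, both given as lists of packed rows.

    Row i of the product is the XOR of every row k of B for which
    A[i][k] == 1, so a whole row is accumulated per XOR.
    """
    result = []
    for a in a_rows:
        acc = 0
        k = 0
        while a:
            if a & 1:
                acc ^= b_rows[k]
            a >>= 1
            k += 1
        result.append(acc)
    return result


# Small usage example (values chosen for illustration).
A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
C = f2_matmul([pack_row(r) for r in A], [pack_row(r) for r in B])
C_bits = [unpack_row(r, 2) for r in C]
```

Packing rows into machine words is what lets a single XOR combine many matrix entries at once; the sketch uses Python integers as a stand-in for those words.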
linear-algebra matrix-multiplication matrix-factorization
This repository started out as a class project for the Parallel Processing course at UIC with Professor Kshemkalyani. Several matrix multiplication algorithms are implemented in C, their timings were recorded, and a report was written.
mpi matrix-multiplication parallel-processing
DBCSR is a library designed to efficiently perform sparse matrix-matrix multiplication, among other operations. It is parallelized with MPI and OpenMP and can exploit GPUs via CUDA. Optionally, you can install libxsmm.
cp2k blas matrix-multiplication gemm cuda sparse-matrix openmp-parallelization mpi
Matrix-matrix multiplication is a fundamental operation of great importance to scientific computing and, increasingly, machine learning. It is a simple enough concept to be introduced in a typical high school algebra course, yet in practice important enough that its implementation on computers continues to be an active research topic. This note describes a set of exercises that use this operation to illustrate how high performance can be attained on modern CPUs with hierarchical memories (multiple caches). It does so by building on the insights that underlie the BLAS-like Library Instantiation Software (BLIS) framework by exposing a simplified “sandbox” that mimics the implementation in BLIS. As such, it also becomes a vehicle for the “crowd sourcing” of the optimization of BLIS. We call this set of exercises BLISlab. Check the tutorial for more details.
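One standard way to exploit a cache hierarchy in matrix-matrix multiplication is loop blocking (tiling): the loops are split so that small sub-blocks of the operands are reused while they are still in cache. The Python sketch below only illustrates that loop structure. It is not taken from BLISlab or BLIS, the block size is an arbitrary assumption, and the function names `matmul_naive` and `matmul_blocked` are hypothetical.

```python
def matmul_naive(a, b):
    """Reference triple loop: C = A * B for lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            aip = a[i][p]
            for j in range(n):
                c[i][j] += aip * b[p][j]
    return c


def matmul_blocked(a, b, block=2):
    """Same computation, with each loop split into blocks of size `block`.

    The three outer loops pick a block of C, A, and B; the three inner
    loops do the multiply-accumulate restricted to that block.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, block):
        for p0 in range(0, k, block):
            for j0 in range(0, n, block):
                for i in range(i0, min(i0 + block, m)):
                    for p in range(p0, min(p0 + block, k)):
                        aip = a[i][p]
                        for j in range(j0, min(j0 + block, n)):
                            c[i][j] += aip * b[p][j]
    return c


# Small usage example (values chosen for illustration).
product = matmul_blocked([[1, 2], [3, 4]], [[5, 6], [7, 8]], block=1)
```

In pure Python the blocked version is not faster; the point is only that both functions compute the same result while visiting memory in a different order, which is what matters in a compiled implementation.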
gemm matrix-multiplication code-optimization blis
Copyright by Prof. Robert van de Geijn (rvdg@cs.utexas.edu). Adapted to Github Markdown Wiki by Jianyu Huang (jianyu@cs.utexas.edu).
gemm matrix-multiplication gotoblas blis code-optimization
GLAS is a C library written in Dlang. No C++ or D runtime is required, only libc, which is available everywhere. A CBLAS API can be provided by linking with Netlib's CBLAS library.
blas glas linear-algebra-subprograms algebra matrix-multiplication matrix lapack simd
This repository provides the Blis linear algebra routines as a self-contained Python C-extension. Clearly the Dell's numpy+OpenBLAS performance is the outlier, so it's likely something has gone wrong in the compilation and architecture detection.
cython blis blas blas-libraries openblas linear-algebra matrix-multiplication numpy neural-networks neural-network
A semi-compliant D3DX implementation for vectors, matrices, and quaternions. Please refer to the original D3DX manual.
3d-math linear-algebra matrix vector graphics-library computer-graphics matrix-calculations matrix-library matrix-functions vector-math matrix-math matrix-multiplication math-library 3d-graphics
I decided to do this simple project in order to get used to the new thread class in C++11. The idea is to take two matrices and multiply them using different threads. I want to see how the implementation differs, the problems that may arise, and how the execution time scales with the number of threads and the size of the matrices. It's as easy as that. One thing to note here is that I am using a two-dimensional array of pointers instead of just floats. This has a reason and it has to do with threads.
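One common way to split a matrix product across threads is to give each thread a contiguous range of output rows, so that no two threads write to the same entry and no locking is needed. The sketch below shows that partitioning in Python with `concurrent.futures.ThreadPoolExecutor`; it is an assumption about how such a split can look, not the project's C++11 `std::thread` code, and the names `multiply_rows` and `threaded_matmul` are hypothetical. In CPython the global interpreter lock prevents a real speedup for pure-Python arithmetic, so this illustrates only the work division.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a, b, row_start, row_stop):
    """Compute rows [row_start, row_stop) of the product A * B."""
    k, n = len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(row_start, row_stop)
    ]


def threaded_matmul(a, b, num_threads=4):
    """Split the rows of A into contiguous chunks, one task per chunk."""
    m = len(a)
    if m == 0:
        return []
    num_threads = max(1, min(num_threads, m))
    chunk = (m + num_threads - 1) // num_threads  # ceiling division
    bounds = [(start, min(start + chunk, m)) for start in range(0, m, chunk)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map returns results in the order of `bounds`,
        # so the row blocks can be concatenated directly.
        parts = pool.map(lambda rng: multiply_rows(a, b, *rng), bounds)
        return [row for part in parts for row in part]


# Small usage example (values chosen for illustration).
result = threaded_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], num_threads=2)
```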
threading matrix-multiplication
For more information on how to use this header file, please consult the documentation. If you got this file from a matrix package, then the documentation is available in the docs/ directory.
matrix-algorithms matrix-multiplication matrix
It is the only fully implemented generic tensor library for C#. It lets you work with tensors of custom types. A tensor is an extension of a matrix: an N-dimensional array. Soon you will find here all the common functions that are defined for matrices and vectors. To support custom types, the Tensor class is generic, which means that you can use not only built-in types like int and float, but also your own types.
performance vector matrix generic matrix-multiplication tensor tensors custom-type