
NNPACK is an acceleration package for neural network computations. NNPACK aims to provide high-performance implementations of convnet layers for multi-core CPUs. NNPACK is not intended to be directly used by machine learning researchers; instead it provides low-level performance primitives leveraged in leading deep learning frameworks, such as PyTorch, Caffe2, MXNet, tiny-dnn, Caffe, Torch, and Darknet.

https://github.com/Maratyszcza/NNPACK

Tags | neural-network neural-networks convolutional-layers inference high-performance high-performance-computing simd cpu multithreading fast-fourier-transform winograd-transform matrix-multiplication |

Implementation | C |

License | Public |

Platform |

ncnn is a high-performance neural network inference framework optimized for mobile platforms. It was designed from the start with deployment and use on mobile phones in mind. ncnn has no third-party dependencies. It is cross-platform and, according to its authors, runs faster than all known open source frameworks on mobile phone CPUs. Developers can use ncnn to deploy deep learning models to mobile platforms, build intelligent apps, and bring artificial intelligence to users' fingertips. ncnn is currently used in many Tencent applications, such as QQ, Qzone, WeChat, and Pitu.

Tags | nerual-network inference high-preformance simd arm-neon deep-learning artificial-intelligence android ios |

FeatherCNN, developed by Tencent TEG AI Platform, is a high-performance, lightweight CNN inference library. It currently targets ARM CPUs and can be extended to other devices in the future. Highly performant: FeatherCNN delivers state-of-the-art inference performance on a wide range of devices, including mobile phones (iOS/Android), embedded devices (Linux), and ARM-based servers (Linux).

Tags | convolutional-neural-networks inference-engine caffe android ios arm-neon |

We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron (MLP) to learn high-frequency functions in low-dimensional problem domains. These results shed light on recent advances in computer vision and graphics that achieve state-of-the-art results by using MLPs to represent complex 3D objects and scenes. Using tools from the neural tangent kernel (NTK) literature, we show that a standard MLP fails to learn high frequencies both in theory and in practice. To overcome this spectral bias, we use a Fourier feature mapping to transform the effective NTK into a stationary kernel with a tunable bandwidth. We suggest an approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities. We provide a demo IPython notebook as a simple reference for the core idea. The scripts used to generate the paper plots and tables are located in the Experiments directory.
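
The Fourier feature mapping described above can be sketched in a few lines of plain Python. This is only an illustration, assuming the Gaussian random-feature form gamma(v) = [cos(2*pi*B*v), sin(2*pi*B*v)] with the entries of B drawn from a normal distribution of standard deviation `sigma`. The names `make_fourier_mapping`, `num_features`, and `sigma` are hypothetical and are not taken from the project's notebook.

```python
import math
import random


def make_fourier_mapping(input_dim, num_features, sigma, seed=0):
    """Build gamma(v) = [cos(2*pi*B v), sin(2*pi*B v)] (assumed Gaussian form)."""
    rng = random.Random(seed)
    # B is a fixed random matrix of shape (num_features, input_dim).
    # sigma plays the role of the tunable bandwidth: larger values
    # produce higher-frequency features.
    B = [[rng.gauss(0.0, sigma) for _ in range(input_dim)]
         for _ in range(num_features)]

    def gamma(v):
        if len(v) != input_dim:
            raise ValueError("expected a point with %d coordinates" % input_dim)
        # Project the input point onto each random frequency vector.
        projections = [2.0 * math.pi * sum(b * x for b, x in zip(row, v))
                       for row in B]
        # Concatenate the cosine and sine features.
        return ([math.cos(p) for p in projections]
                + [math.sin(p) for p in projections])

    return gamma


# Map a 2D point to 8 features (4 cosines followed by 4 sines).
gamma = make_fourier_mapping(input_dim=2, num_features=4, sigma=10.0)
features = gamma([0.25, 0.75])
```

The resulting `features` list would be fed to the MLP in place of the raw coordinates. Choosing `sigma` per problem corresponds to the bandwidth selection the description mentions.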

Arraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, Cuda and OpenCL ndarray library on which to build a scientific computing and in particular a deep learning ecosystem. The library is inspired by Numpy and PyTorch. The library provides ergonomics very similar to Numpy, Julia and Matlab but is fully parallel and significantly faster than those libraries. It is also faster than C-based Torch.

Tags | tensor nim multidimensional-arrays cuda deep-learning machine-learning cudnn high-performance-computing gpu-computing matrix-library neural-networks parallel-computing openmp linear-algebra ndarray opencl gpgpu iot automatic-differentiation autograd |

Deep learning is a group of exciting new technologies for neural networks. Through a combination of advanced training techniques and neural network architectural components, it is now possible to create neural networks of much greater complexity. Deep learning allows a neural network to learn hierarchies of information in a way that is like the function of the human brain. This course will introduce the student to computer vision with Convolutional Neural Networks (CNN), time series analysis with Long Short-Term Memory (LSTM), classic neural network structures, and applications to computer security. High Performance Computing (HPC) aspects will demonstrate how deep learning can be leveraged both on graphical processing units (GPUs) and on grids. The focus is primarily on the application of deep learning to problems, with some introduction to the mathematical foundations. Students will use the Python programming language to implement deep learning using Google TensorFlow and Keras. It is not necessary to know Python prior to this course; however, familiarity with at least one programming language is assumed. This course will be delivered in a hybrid format that includes both classroom and online instruction. The syllabus presents the expected class schedule, due dates, and reading assignments.

Tags | neural-network machine-learning tensorflow keras deeplearning |

Chainer is a Python-based deep learning framework aiming at flexibility. It provides automatic differentiation APIs based on the define-by-run approach (a.k.a. dynamic computational graphs) as well as object-oriented high-level APIs to build and train neural networks. It also supports CUDA/cuDNN using CuPy for high-performance training and inference. For more details, see the Chainer documentation and resources, and join the community on the Forum, Slack, and Twitter. The current stable version of Chainer is maintained separately: v3.

Tags | deep-learning neural-networks machine-learning gpu cuda cudnn numpy cupy chainer neural-network |

NiftyNet is a consortium of research organisations (BMEIS -- School of Biomedical Engineering and Imaging Sciences, King's College London; WEISS -- Wellcome EPSRC Centre for Interventional and Surgical Sciences, UCL; CMIC -- Centre for Medical Image Computing, UCL; HIG -- High-dimensional Imaging Group, UCL), where BMEIS acts as the consortium lead. NiftyNet is not intended for clinical use.

Tags | tensorflow distributed ml neural-network python2 python3 pip deep-neural-networks deep-learning convolutional-neural-networks medical-imaging medical-image-computing medical-image-processing medical-images segmentation gan autoencoder medical-image-analysis image-guided-therapy |

Welcome to the open-source repository for the Intel® nGraph™ Library. Our code base provides a Compiler and runtime suite of tools (APIs) designed to give developers maximum flexibility for their software design, allowing them to create or customize a scalable solution using any framework while also avoiding device-level hardware lock-in that is so common with many AI vendors. A neural network model compiled with nGraph can run on any of our currently-supported backends, and it will be able to run on any backends we support in the future with minimal disruption to your model. With nGraph, you can co-evolve your software and hardware's capabilities to stay at the forefront of your industry.

The nGraph Compiler is Intel's graph compiler for Artificial Neural Networks. Documentation in this repo describes how you can program any framework to run training and inference computations on a variety of Backends including Intel® Architecture Processors (CPUs), Intel® Nervana™ Neural Network Processors (NNPs), cuDNN-compatible graphics cards (GPUs), custom VPUs like Movidius, and many others. The default CPU Backend also provides an interactive Interpreter mode that can be used to zero in on a DL model and create custom nGraph optimizations that can be used to further accelerate training or inference, in whatever scenario you need.

Tags | ngraph tensorflow mxnet deep-learning compiler performance onnx paddlepaddle neural-network deep-neural-networks pytorch caffe2 |

Neanderthal is a Clojure library for fast matrix and linear algebra computations based on the highly optimized native libraries of BLAS and LAPACK computation routines for both CPU and GPU. Read the documentation at the Neanderthal website.

Tags | clojure-library matrix gpu gpu-computing gpgpu opencl cuda high-performance-computing vectorization api matrix-factorization matrix-multiplication matrix-functions matrix-calculations |

The Simd Library is a free, open source image processing library designed for C and C++ programmers. It provides many useful high-performance algorithms for image processing, such as pixel format conversion, image scaling and filtration, extraction of statistical information from images, motion detection, object detection (HAAR and LBP classifier cascades) and classification, and neural networks. The algorithms are optimized using various SIMD CPU extensions. In particular, the library supports the following CPU extensions: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX-512 for x86/x64; VMX (Altivec) and VSX (Power7) for PowerPC (big-endian); and NEON for ARM.

Tags | simd sse avx neon image-processing altivec c-plus-plus vsx sse2 avx2 ssse3 simd-library sse41 arm powerpc lbp haar-cascade avx512 |

GPU-accelerated handwritten digit recognition with regl. Note that this network will probably be slower than the corresponding network implemented on the CPU, because of the overhead of transferring data to and from the GPU. In the future we will attempt to implement more complex networks in the browser, such as Neural Style, and we expect to see a significant speedup over the CPU for those.

Tags | regl cnn digit-recognition demo gpu webgl convolutional-neural-networks gpgpu deep-learning glsl digit recognition mnist convolutional neural network networks |

PyTorch is a flexible deep learning framework that allows automatic differentiation through dynamic neural networks (i.e., networks that utilise dynamic control flow like if statements and while loops). It supports GPU acceleration, distributed training, various optimisations, and plenty more neat features. These are some notes on how I think about using PyTorch, and don't encompass all parts of the library or every best practice, but may be helpful to others.

Neural networks are a subclass of computation graphs. Computation graphs receive input data, and data is routed to and possibly transformed by nodes which perform processing on the data. In deep learning, the neurons (nodes) in neural networks typically transform data with parameters and differentiable functions, such that the parameters can be optimised to minimise a loss via gradient descent. More broadly, the functions can be stochastic, and the structure of the graph can be dynamic. So while neural networks may be a good fit for dataflow programming, PyTorch's API has instead centred around imperative programming, which is a more common way for thinking about programs. This makes it easier to read code and reason about complex programs, without necessarily sacrificing much performance; PyTorch is actually pretty fast, with plenty of optimisations that you can safely forget about as an end user (but you can dig in if you really want to).
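
To make the idea of a dynamic computation graph concrete, here is a minimal sketch in plain Python. It does not use PyTorch and is not PyTorch's API: the `Value` class, `model`, and `fit` are hypothetical names invented for illustration. It records a graph while ordinary imperative code runs (including an `if` statement), applies the chain rule in reverse, and uses the resulting gradient for gradient descent.

```python
class Value:
    """A scalar node in a computation graph that is built as the code runs."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents      # nodes this value was computed from
        self._grad_fns = grad_fns    # maps this node's grad to each parent's share

    def __sub__(self, other):
        return Value(self.data - other.data, (self, other),
                     (lambda g: g, lambda g: -g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Order the graph so parents come before children, then walk it in
        # reverse and accumulate gradients with the chain rule.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, grad_fn in zip(node._parents, node._grad_fns):
                parent.grad += grad_fn(node.grad)


def model(w, x):
    out = w * x
    # Ordinary Python control flow decides the graph's shape on each call.
    if out.data < 0.0:
        out = out * Value(0.1)   # extra node, added only for negative outputs
    return out


def fit(x, y, steps=25, lr=0.1):
    """Fit w so that model(w, x) is close to y, using squared error."""
    w = Value(0.0)
    for _ in range(steps):
        diff = model(w, Value(x)) - Value(y)
        loss = diff * diff
        loss.backward()
        # Gradient descent step; a fresh leaf node resets the gradient to zero.
        w = Value(w.data - lr * w.grad)
    return w


w_fit = fit(x=2.0, y=6.0)
```

Because the graph is rebuilt on every call to `model`, the branch taken can differ from one input to the next, which is the property the notes attribute to imperative, define-by-run frameworks.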

Tags | deep-learning |

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

Tags | deep-learning tensorflow theano neural-networks machine-learning data-science |

DLL is a library that aims to provide a C++ implementation of Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN), as well as their convolutional versions. It also supports some more standard neural networks. Note: when you clone the library, you need to clone the submodules as well, using the --recursive option.

Tags | c-plus-plus cpp cpp11 cpp14 performance machine-learning deep-learning artificial-neural-networks gpu rbm cpu convolutional-neural-networks |

Compute Library for Deep Neural Networks (clDNN) is an open source performance library for Deep Learning (DL) applications intended for acceleration of DL Inference on Intel® Processor Graphics – including HD Graphics and Iris® Graphics. clDNN includes highly optimized building blocks for implementation of convolutional neural networks (CNN) with C and C++ interfaces. We created this project to enable the DL community to innovate on Intel® processors. Usages supported: image recognition, image detection, and image segmentation.

Tags | deep-neural-networks deep-learning intel intel-hd-graphics cldnn |

Gosl is a Go library for developing Artificial Intelligence and High-Performance Scientific Computations. The library tries to be as general and easy to use as possible. Gosl supports both Go concurrency routines and parallel computing using the message passing interface (MPI). Gosl has several modules (sub-packages) for a variety of tasks in scientific computing, image analysis, and data post-processing.

Tags | scientific-computing visualization linear-algebra differential-equations sparse-systems plotting mkl parallel-computations computational-geometry graph-theory tensor-algebra fast-fourier-transform eigenvalues eigenvectors hacktoberfest machine-learning artificial-intelligence optimization optimization-algorithms linear-programming |

Grenade is a composable, dependently typed, practical, and fast recurrent neural network library for concise and precise specifications of complex networks in Haskell. And that's it. Because the types are so rich, there's no specific term level code required to construct this network; although it is of course possible and easy to construct and deconstruct the networks and layers explicitly oneself.

Tags | machine-learning deep-neural-networks haskell deep-learning generative-adversarial-networks convolutional-neural-networks |

'Openpose', a human pose estimation algorithm, has been implemented using TensorFlow. The project also provides several variants with modified network structures for real-time processing on the CPU or on low-power embedded devices.

2018.5.21: The post-processing part is implemented in C++ and must be compiled before use. See: https://github.com/ildoonet/tf-pose-estimation/tree/master/src/pafprocess

2018.2.7: Arguments in the run.py script changed. Dynamic input size is now supported.

Tags | deep-learning openpose tensorflow mobilenet pose-estimation convolutional-neural-networks neural-network image-processing human-pose-estimation embedded realtime cnn mobile ros robotics catkin |

Multi-platform, high-performance deep learning inference engine for PaddlePaddle (飞桨).

Tags | arm mobile embedded fpga deep-learning neural-network mali baidu mdl mobile-deep-learning |

The Intel MKL-DNN repository has migrated to https://github.com/intel/mkl-dnn. The old address will continue to be available and will redirect to the new repo. Please update your links.

Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN) is an open source performance library for deep learning applications. The library accelerates deep learning applications and frameworks on Intel(R) architecture. Intel(R) MKL-DNN contains vectorized and threaded building blocks which you can use to implement deep neural networks (DNN) with C and C++ interfaces.

Tags | intel mkl-dnn deep-learning deep-neural-networks cnn rnn lstm c-plus-plus intel-architecture xeon xeon-phi atom core simd sse42 avx2 avx512 avx512-vnni performance |