- 120

NNPACK is an acceleration package for neural network computations. NNPACK aims to provide high-performance implementations of convnet layers for multi-core CPUs. NNPACK is not intended to be directly used by machine learning researchers; instead it provides low-level performance primitives leveraged in leading deep learning frameworks, such as PyTorch, Caffe2, MXNet, tiny-dnn, Caffe, Torch, and Darknet.

https://github.com/Maratyszcza/NNPACKTags | neural-network neural-networks convolutional-layers inference high-performance high-performance-computing simd cpu multithreading fast-fourier-transform winograd-transform matrix-multiplication |

Implementation | C |

License | Public |

Platform |

ncnn is a high-performance neural network inference computing framework optimized for mobile platforms. ncnn is deeply considerate about deployment and uses on mobile phones from the beginning of design. ncnn does not have third party dependencies. it is cross-platform, and runs faster than all known open source frameworks on mobile phone cpu. Developers can easily deploy deep learning algorithm models to the mobile platform by using efficient ncnn implementation, create intelligent APPs, and bring the artificial intelligence to your fingertips. ncnn is currently being used in many Tencent applications, such as QQ, Qzone, WeChat, Pitu and so on.

nerual-network inference high-preformance simd arm-neon deep-learning artificial-intelligence android iosFeatherCNN, developed by Tencent TEG AI Platform, is a high-performance lightweight CNN inference library. FeatherCNN is currently targeting at ARM CPUs, and is capable to extend to other devices in the future. Highly Performant FeatherCNN delivers state-of-the-art inference computing performance on a wide range of devices, including mobile phones (iOS/Android), embedded devices (Linux) as well as ARM-based servers (Linux).

convolutional-neural-networks inference-engine caffe android ios arm-neonArraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, Cuda and OpenCL ndarray library on which to build a scientific computing and in particular a deep learning ecosystem. The library is inspired by Numpy and PyTorch. The library provides ergonomics very similar to Numpy, Julia and Matlab but is fully parallel and significantly faster than those libraries. It is also faster than C-based Torch.

tensor nim multidimensional-arrays cuda deep-learning machine-learning cudnn high-performance-computing gpu-computing matrix-library neural-networks parallel-computing openmp linear-algebra ndarray opencl gpgpu iot automatic-differentiation autogradDeep learning is a group of exciting new technologies for neural networks. Through a combination of advanced training techniques and neural network architectural components, it is now possible to create neural networks of much greater complexity. Deep learning allows a neural network to learn hierarchies of information in a way that is like the function of the human brain. This course will introduce the student to computer vision with Convolution Neural Networks (CNN), time series analysis with Long Short-Term Memory (LSTM), classic neural network structures and application to computer security. High Performance Computing (HPC) aspects will demonstrate how deep learning can be leveraged both on graphical processing units (GPUs), as well as grids. Focus is primarily upon the application of deep learning to problems, with some introduction mathematical foundations. Students will use the Python programming language to implement deep learning using Google TensorFlow and Keras. It is not necessary to know Python prior to this course; however, familiarity of at least one programming language is assumed. This course will be delivered in a hybrid format that includes both classroom and online instruction. This syllabus presents the expected class schedule, due dates, and reading assignments. Download current syllabus.

neural-network machine-learning tensorflow keras deeplearningChainer is a Python-based deep learning framework aiming at flexibility. It provides automatic differentiation APIs based on the define-by-run approach (a.k.a. dynamic computational graphs) as well as object-oriented high-level APIs to build and train neural networks. It also supports CUDA/cuDNN using CuPy for high performance training and inference. For more details of Chainer, see the documents and resources listed above and join the community in Forum, Slack, and Twitter. The stable version of current Chainer is separated in here: v3.

deep-learning neural-networks machine-learning gpu cuda cudnn numpy cupy chainer neural-networkNiftyNet is a consortium of research organisations (BMEIS -- School of Biomedical Engineering and Imaging Sciences, King's College London; WEISS -- Wellcome EPSRC Centre for Interventional and Surgical Sciences, UCL; CMIC -- Centre for Medical Image Computing, UCL; HIG -- High-dimensional Imaging Group, UCL), where BMEIS acts as the consortium lead. NiftyNet is not intended for clinical use.

tensorflow distributed ml neural-network python2 python3 pip deep-neural-networks deep-learning convolutional-neural-networks medical-imaging medical-image-computing medical-image-processing medical-images segmentation gan autoencoder medical-image-analysis image-guided-therapyWelcome to the open-source repository for the Intel® nGraph™ Library. Our code base provides a Compiler and runtime suite of tools (APIs) designed to give developers maximum flexibility for their software design, allowing them to create or customize a scalable solution using any framework while also avoiding device-level hardware lock-in that is so common with many AI vendors. A neural network model compiled with nGraph can run on any of our currently-supported backends, and it will be able to run on any backends we support in the future with minimal disruption to your model. With nGraph, you can co-evolve your software and hardware's capabilities to stay at the forefront of your industry. The nGraph Compiler is Intel's graph compiler for Artificial Neural Networks. Documentation in this repo describes how you can program any framework to run training and inference computations on a variety of Backends including Intel® Architecture Processors (CPUs), Intel® Nervana™ Neural Network Processors (NNPs), cuDNN-compatible graphics cards (GPUs), custom VPUs like Movidius, and many others. The default CPU Backend also provides an interactive Interpreter mode that can be used to zero in on a DL model and create custom nGraph optimizations that can be used to further accelerate training or inference, in whatever scenario you need.

ngraph tensorflow mxnet deep-learning compiler performance onnx paddlepaddle neural-network deep-neural-networks pytorch caffe2The Simd Library is a free open source image processing library, designed for C and C++ programmers. It provides many useful high performance algorithms for image processing such as: pixel format conversion, image scaling and filtration, extraction of statistic information from images, motion detection, object detection (HAAR and LBP classifier cascades) and classification, neural network. The algorithms are optimized with using of different SIMD CPU extensions. In particular the library supports following CPU extensions: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX-512 for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC (big-endian), NEON for ARM.

simd sse avx neon image-processing altivec c-plus-plus vsx sse2 avx2 ssse3 simd-library sse41 arm powerpc lbp haar-cascade avx512Neanderthal is a Clojure library for fast matrix and linear algebra computations based on the highly optimized native libraries of BLAS and LAPACK computation routines for both CPU and GPU.. Read the documentation at Neanderthal Web Site.

clojure-library matrix gpu gpu-computing gpgpu opencl cuda high-performance-computing vectorization api matrix-factorization matrix-multiplication matrix-functions matrix-calculationsGPU accelerated handwritten digit recognition with regl. Note that this network will probably be slower than the corresponding network implemented on the CPU. This is because of the overhead associated with transferring data to and from the GPU. But in the future we will attempt implementing more complex networks in the browser, such as Neural Style, and then we think that we will see a significant speedup compared to the CPU.

regl cnn digit-recognition demo gpu webgl convolutional-neural-networks gpgpu deep-learning glsl digit recognition mnist convolutional neural network networksPyTorch is a flexible deep learning framework that allows automatic differentiation through dynamic neural networks (i.e., networks that utilise dynamic control flow like if statements and while loops). It supports GPU acceleration, distributed training, various optimisations, and plenty more neat features. These are some notes on how I think about using PyTorch, and don't encompass all parts of the library or every best practice, but may be helpful to others. Neural networks are a subclass of computation graphs. Computation graphs receive input data, and data is routed to and possibly transformed by nodes which perform processing on the data. In deep learning, the neurons (nodes) in neural networks typically transform data with parameters and differentiable functions, such that the parameters can be optimised to minimise a loss via gradient descent. More broadly, the functions can be stochastic, and the structure of the graph can be dynamic. So while neural networks may be a good fit for dataflow programming, PyTorch's API has instead centred around imperative programming, which is a more common way for thinking about programs. This makes it easier to read code and reason about complex programs, without necessarily sacrificing much performance; PyTorch is actually pretty fast, with plenty of optimisations that you can safely forget about as an end user (but you can dig in if you really want to).

deep-learningKeras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

deep-learning tensorflow theano neural-networks machine-learning data-scienceDLL is a library that aims to provide a C++ implementation of Restricted Boltzmann Machine (RBM) and Deep Belief Network (DBN) and their convolution versions as well. It also has support for some more standard neural networks. Note: When you clone the library, you need to clone the sub modules as well, using the --recursive option.

c-plus-plus cpp cpp11 cpp14 performance machine-learning deep-learning artificial-neural-networks gpu rbm cpu convolutional-neural-networksCompute Library for Deep Neural Networks (clDNN) is an open source performance library for Deep Learning (DL) applications intended for acceleration of DL Inference on Intel® Processor Graphics – including HD Graphics and Iris® Graphics. clDNN includes highly optimized building blocks for implementation of convolutional neural networks (CNN) with C and C++ interfaces. We created this project to enable the DL community to innovate on Intel® processors. Usages supported: Image recognition, image detection, and image segmentation.

deep-neural-networks deep-learning intel intel-hd-graphics cldnnGrenade is a composable, dependently typed, practical, and fast recurrent neural network library for concise and precise specifications of complex networks in Haskell. And that's it. Because the types are so rich, there's no specific term level code required to construct this network; although it is of course possible and easy to construct and deconstruct the networks and layers explicitly oneself.

machine-learning deep-neural-networks haskell deep-learning generative-adversarial-networks convolutional-neural-networks'Openpose' for human pose estimation have been implemented using Tensorflow. It also provides several variants that have made some changes to the network structure for real-time processing on the CPU or low-power embedded devices. 2018.5.21 Post-processing part is implemented in c++. It is required compiling the part. See: https://github.com/ildoonet/tf-pose-estimation/tree/master/src/pafprocess 2018.2.7 Arguments in run.py script changed. Support dynamic input size.

deep-learning openpose tensorflow mobilenet pose-estimation convolutional-neural-networks neural-network image-processing human-pose-estimation embedded realtime cnn mobile ros robotics catkinGosl is a Go library to develop Artificial Intelligence and High-Performance Scientific Computations. The library tries to be as general and easy as possible. Gosl considers the use of both Go concurrency routines and parallel computing using the message passing interface (MPI). Gosl has several modules (sub-packages) for a variety of tasks in scientific computing, image analysis, and data post-processing.

scientific-computing visualization linear-algebra differential-equations sparse-systems plotting mkl parallel-computations computational-geometry graph-theory tensor-algebra fast-fourier-transform eigenvalues eigenvectors hacktoberfest machine-learning artificial-intelligence optimization optimization-algorithms linear-programmingIntel MKL-DNN repository migrated to https://github.com/intel/mkl-dnn. The old address will continue to be available and will redirect to the new repo. Please update your links. Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN) is an open source performance library for deep learning applications. The library accelerates deep learning applications and framework on Intel(R) architecture. Intel(R) MKL-DNN contains vectorized and threaded building blocks which you can use to implement deep neural networks (DNN) with C and C++ interfaces.

intel mkl-dnn deep-learning deep-neural-networks cnn rnn lstm c-plus-plus intel-architecture xeon xeon-phi atom core simd sse42 avx2 avx512 avx512-vnni performanceThis repository contains the implementation of a convolutional neural network used to segment blood vessels in retina fundus images. This is a binary classification task: the neural network predicts if each pixel in the fundus image is either a vessel or not. The neural network structure is derived from the U-Net architecture, described in this paper. The performance of this neural network is tested on the DRIVE database, and it achieves the best score in terms of area under the ROC curve in comparison to the other methods published so far. Also on the STARE datasets, this method reports one of the best performances. The training of the neural network is performed on sub-images (patches) of the pre-processed full images. Each patch, of dimension 48x48, is obtained by randomly selecting its center inside the full image. Also the patches partially or completely outside the Field Of View (FOV) are selected, in this way the neural network learns how to discriminate the FOV border from blood vessels. A set of 190000 patches is obtained by randomly extracting 9500 patches in each of the 20 DRIVE training images. Although the patches overlap, i.e. different patches may contain same part of the original images, no further data augmentation is performed. The first 90% of the dataset is used for training (171000 patches), while the last 10% is used for validation (19000 patches).

Cellular Neural Networks (CNN) [wikipedia] [paper] are a parallel computing paradigm that was first proposed in 1988. Cellular neural networks are similar to neural networks, with the difference that communication is allowed only between neighboring units. Image Processing is one of its applications. CNN processors were designed to perform image processing; specifically, the original application of CNN processors was to perform real-time ultra-high frame-rate (>10,000 frame/s) processing unachievable by digital processors. This python library is the implementation of CNN for the application of Image Processing.

cellular neural-network cnn image-processing cnn-processors paper edge-detection corner-detection library cross-platform feedback computer-vision computer-science control
We have large collection of open source products. Follow the tags from
Tag Cloud >>

Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.
**Add Projects.**