mediapipe - MediaPipe is a cross-platform framework for building multimodal applied machine learning pipelines

  •        70

MediaPipe is a framework for building multimodal (eg. video, audio, any time series data) applied ML pipelines. With MediaPipe, a perception pipeline can be built as a graph of modular components, including, for instance, inference models (e.g., TensorFlow, TFLite) and media processing functions. Follow these instructions.



Related Projects

Accord.NET - Machine learning, Computer vision, Statistics and general scientific computing for .NET

  •    CSharp

The Accord.NET project provides machine learning, statistics, artificial intelligence, computer vision and image processing methods to .NET. It can be used on Microsoft Windows, Xamarin, Unity3D, Windows Store applications, Linux or mobile.

opencv - Open Source Computer Vision Library

  •    C++

Please read the contribution guidelines before starting work on a pull request.

sod - An Embedded Computer Vision & Machine Learning Library (CPU Optimized & IoT Capable)

  •    C

SOD is an embedded, modern cross-platform computer vision and machine learning software library that expose a set of APIs for deep-learning, advanced media analysis & processing including real-time, multi-class object detection and model training on embedded systems with limited computational resource and IoT devices. SOD was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in open source as well commercial products.


  •    Javascript

This is a library for processing images/video in pure JavaScript using HTML5 features like Canvas, WebWorkers, WebGL and SVG (in progress) or analogs in Node.js. Some filters code has been adapted from open source libraries (mostly c, java and flash, plus a couple from javascript libraries), see the comments in the code for details.


  •    CSharp

LibLab is a C# Library, Networking, Camera, Image Processing, Audio Processing, Video Processing and Computer Vision

vpp - Video++, a C++14 high performance video and image processing library.

  •    C++

The generic container imageNd<V, N> represents a dense N-dimensional rectangle set of pixels with values of type V. For convenience, image1d, image2d, image3d are respectively aliases to imageNd<V, 1>, imageNd<V, 2>, and imageNd<V, 3>. These types provide accesses to the pixel buffer and to other piece of information useful to process the image. In contrast to std::vector, assigning an image to the other does not copy the data, but share them so no accidental expensive deep copy happen.

SerpentAI - Game Agent Framework. Helping you create AIs / Bots to play any game you own!

  •    Jupyter

The framework features a large assortment of supporting modules that provide solutions to commonly encountered scenarios when using video games as environments as well as CLI tools to accelerate development. It provides some useful conventions but is absolutely NOT opiniated about what you put in your agents: Want to use the latest, cutting-edge deep reinforcement learning algorithm? ALLOWED. Want to use computer vision techniques, image processing and trigonometry? ALLOWED. Want to randomly press the Left or Right buttons? sigh ALLOWED. To top it all off, Serpent.AI was designed to be entirely plugin-based (for both game support and game agents) so your experiments are actually portable and distributable to your peers and random strangers on the Internet. You'll also be glad to hear that all 3 major OSes are supported: Linux, Windows & macOS.

essentia - C++ library for audio and music analysis, description and synthesis, including Python bindings

  •    Jupyter

Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval released under the Affero GPL license. It contains an extensive collection of reusable algorithms which implement audio input/output functionality, standard digital signal processing blocks, statistical characterization of data, and a large set of spectral, temporal, tonal and high-level music descriptors. The library is also wrapped in Python and includes a number of predefined executable extractors for the available music descriptors, which facilitates its use for fast prototyping and allows setting up research experiments very rapidly. Furthermore, it includes a Vamp plugin to be used with Sonic Visualiser for visualization purposes. Essentia is designed with a focus on the robustness of the provided music descriptors and is optimized in terms of the computational cost of the algorithms. The provided functionality, specifically the music descriptors included in-the-box and signal processing algorithms, is easily expandable and allows for both research experiments and development of large-scale industrial applications. If you use example extractors (located in src/examples), or your own code employing Essentia algorithms to compute descriptors, you should be aware of possible incompatibilities when using different versions of Essentia.

t81_558_deep_learning - Washington University (in St

  •    Jupyter

Deep learning is a group of exciting new technologies for neural networks. Through a combination of advanced training techniques and neural network architectural components, it is now possible to create neural networks of much greater complexity. Deep learning allows a neural network to learn hierarchies of information in a way that is like the function of the human brain. This course will introduce the student to computer vision with Convolution Neural Networks (CNN), time series analysis with Long Short-Term Memory (LSTM), classic neural network structures and application to computer security. High Performance Computing (HPC) aspects will demonstrate how deep learning can be leveraged both on graphical processing units (GPUs), as well as grids. Focus is primarily upon the application of deep learning to problems, with some introduction mathematical foundations. Students will use the Python programming language to implement deep learning using Google TensorFlow and Keras. It is not necessary to know Python prior to this course; however, familiarity of at least one programming language is assumed. This course will be delivered in a hybrid format that includes both classroom and online instruction. This syllabus presents the expected class schedule, due dates, and reading assignments. Download current syllabus.

eos - A lightweight 3D Morphable Face Model fitting library in modern C++11/14

  •    C++

eos is a lightweight 3D Morphable Face Model fitting library that provides basic functionality to use face models, as well as camera and shape fitting functionality. It's written in modern C++11/14. An experimental model viewer to visualise 3D Morphable Models and blendshapes is available here.

dlib - A toolkit for making real world machine learning and data analysis applications in C++

  •    C++

Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems. See for the main project documentation and API reference. Doing so will make some things run faster.

BotSharp - The Open Source AI Chatbot Platform Builder in 100% C# Running in

  •    CSharp

BotSharp is an open source machine learning framework for AI Bot platform builder. This project involves natural language understanding, computer vision and audio processing technologies, and aims to promote the development and application of intelligent robot assistants in information systems. Out-of-the-box machine learning algorithms allow ordinary programmers to develop artificial intelligence applications faster and easier. It's witten in C# running on .Net Core that is full cross-platform framework. C# is a enterprise grade programming language which is widely used to code business logic in information management related system. More friendly to corporate developers. BotSharp adopts machine learning algrithm in C# directly. That will facilitate the feature of the typed language C#, and be more easier when refactoring code in system scope.


  •    Objective-C

MediaPipe is a flexible framework to manipulate medias. It allows building customized decoding/filtering/encoding pipelines. This project contains the MediaPipe SDK along with some sample pipes.

DeepLearn - Implementation of research papers on Deep Learning+ NLP+ CV in Python using Keras, Tensorflow and Scikit Learn

  •    Python

Implementation of research papers on Deep Learning+ NLP+ CV in Python using Keras, Tensorflow and Scikit Learn.

pyannote-audio - Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding

  •    Python

Open Phd/postdoc positions at LIMSI combining machine learning, NLP, speech processing, and computer vision. If you use in your research, please use the following citations.

3dmatch-toolbox - 3DMatch - a 3D ConvNet-based local geometric descriptor for aligning 3D meshes and point clouds

  •    C++

Matching local geometric features on real-world depth images is a challenging task due to the noisy, low-resolution, and incomplete nature of 3D scan data. These difficulties limit the performance of current state-of-art methods, which are typically based on histograms over geometric properties. In this paper, we present 3DMatch, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data. To amass training data for our model, we propose an unsupervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions. Experiments show that our descriptor is not only able to match local geometry in new scenes for reconstruction, but also generalize to different tasks and spatial scales (e.g. instance-level object model alignment for the Amazon Picking Challenge, and mesh surface correspondence). Results show that 3DMatch consistently outperforms other state-of-the-art approaches by a significant margin. This code is released under the Simplified BSD License (refer to the LICENSE file for details).

practical-machine-learning-with-python - Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system

  •    Jupyter

"Data is the new oil" is a saying which you must have heard by now along with the huge interest building up around Big Data and Machine Learning in the recent past along with Artificial Intelligence and Deep Learning. Besides this, data scientists have been termed as having "The sexiest job in the 21st Century" which makes it all the more worthwhile to build up some valuable expertise in these areas. Getting started with machine learning in the real world can be overwhelming with the vast amount of resources out there on the web. "Practical Machine Learning with Python" follows a structured and comprehensive three-tiered approach packed with concepts, methodologies, hands-on examples, and code. This book is packed with over 500 pages of useful information which helps its readers master the essential skills needed to recognize and solve complex problems with Machine Learning and Deep Learning by following a data-driven mindset. By using real-world case studies that leverage the popular Python Machine Learning ecosystem, this book is your perfect companion for learning the art and science of Machine Learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute Machine Learning systems and projects successfully.

patchfield - Audio infrastructure for Android

  •    Java

Patchfield is an audio infrastructure for Android that provides a simple, callback-driven API for implementing audio modules (such as synthesizers and effects), a graph-based API for connecting audio modules, plus support for inter-app audio routing. It is inspired by JACK, the JACK audio connection kit.Patchfield consists of a remote or local service that acts as a virtual patchbay that audio apps can connect to, plus a number of sample apps that illustrate how to implement audio modules and how to manipulate the signal processing graph. The implementation resides entirely in userland and works on many stock consumer devices, such as Nexus 7 and 10.

pcl - Point Cloud Library (PCL)

  •    C++

The Point Cloud Library (PCL) is a standalone, large scale, open project for 2D/3D image and point cloud processing. PCL is released under the terms of the BSD license, and thus free for commercial and research use. We are financially supported by a consortium of commercial companies, with our own non-profit organization, Open Perception. We would also like to thank individual donors and contributors that have been helping the project.