Dataset_Synthesizer - NVIDIA Deep learning Dataset Synthesizer (NDDS)


NDDS is an Unreal Engine 4 (UE4) plugin from NVIDIA that enables computer vision researchers to export high-quality synthetic images with metadata. NDDS supports images, segmentation, depth, object pose, bounding boxes, keypoints, and custom stencils. In addition to the exporter, the plugin includes various components for generating highly randomized images. This randomization covers lighting, objects, camera position, poses, textures, and distractors, as well as camera path following. Together, these components allow researchers to easily create randomized scenes for training deep neural networks. (Figure: an example image generated using NDDS, along with ground-truth segmentation, depth, and object poses.) For utilities to help visualize annotation data associated with synthesized images, see the NVIDIA dataset utilities (NVDU).
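NDDS exports object pose, keypoints, and bounding boxes side by side. As a rough illustration of how these annotations relate geometrically, the sketch below projects the eight corners of an object's 3D cuboid into the image with a plain pinhole camera and takes their extent as a 2D box. The intrinsics (`fx`, `fy`, `cx`, `cy`), the helper names, and the axis-aligned cuboid are hypothetical choices for illustration; this is not NDDS exporter code or its output format, and any conversion from the engine's coordinate frame is not shown.

```python
from itertools import product
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def cuboid_corners(center: Point3, size: Point3) -> List[Point3]:
    """Eight corners of an axis-aligned cuboid in camera coordinates (hypothetical helper)."""
    half = [s / 2.0 for s in size]
    return [
        tuple(c + sign * h for c, sign, h in zip(center, signs, half))
        for signs in product((-1.0, 1.0), repeat=3)
    ]


def project_points(points: Sequence[Point3], fx: float, fy: float,
                   cx: float, cy: float) -> List[Pixel]:
    """Pinhole projection u = fx*x/z + cx, v = fy*y/z + cy (assumes +z points forward)."""
    pixels = []
    for x, y, z in points:
        if z <= 0.0:
            raise ValueError("point lies behind the camera")
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


def bounding_box(pixels: Sequence[Pixel]) -> Tuple[float, float, float, float]:
    """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the projected points."""
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    return min(us), min(vs), max(us), max(vs)


if __name__ == "__main__":
    # A 20 cm cube centered 2 m in front of a hypothetical 640x480 camera.
    corners = cuboid_corners((0.0, 0.0, 2.0), (0.2, 0.2, 0.2))
    box = bounding_box(project_points(corners, 500.0, 500.0, 320.0, 240.0))
    print([round(b, 2) for b in box])  # prints [293.68, 213.68, 346.32, 266.32]
```

A box built from projected cuboid corners encloses the whole cuboid, so it can be looser than a tight box around the visible object pixels.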



Related Projects

AlphaPose - Multi-Person Pose Estimation System

  •    Jupyter

AlphaPose is an accurate multi-person pose estimator and the first open-source system to achieve 70+ mAP (72.3 mAP) on the COCO dataset and 80+ mAP (82.1 mAP) on the MPII dataset. To match poses that correspond to the same person across frames, we also provide an efficient online pose tracker called Pose Flow, the first open-source online pose tracker to achieve both 60+ mAP (66.5 mAP) and 50+ MOTA (58.3 MOTA) on the PoseTrack Challenge dataset. Note: please read PoseFlow/ for details.
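The tracking figure above is reported as MOTA (multiple object tracking accuracy). A minimal sketch of the standard CLEAR MOT definition, MOTA = 1 − (ΣFN + ΣFP + ΣIDSW) / ΣGT with counts summed over all frames, is shown here; the per-frame tuple layout is an assumption for illustration, and the PoseTrack benchmark's own keypoint-level matching, which produces these counts, is not reproduced.

```python
from typing import Iterable, Tuple

# (false_negatives, false_positives, id_switches, num_ground_truth) for one frame.
FrameCounts = Tuple[int, int, int, int]


def mota(frames: Iterable[FrameCounts]) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, accumulated over all frames."""
    total_fn = total_fp = total_idsw = total_gt = 0
    for fn, fp, idsw, gt in frames:
        total_fn += fn
        total_fp += fp
        total_idsw += idsw
        total_gt += gt
    if total_gt == 0:
        raise ValueError("no ground-truth objects to evaluate against")
    # Can be negative when errors outnumber ground-truth objects.
    return 1.0 - (total_fn + total_fp + total_idsw) / total_gt
```

Reported scores such as "58.3 MOTA" correspond to this ratio expressed as a percentage.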

ImageAI - A Python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities

  •    Python

A Python library built to empower developers to build applications and systems with self-contained Deep Learning and Computer Vision capabilities using only a few lines of code. Built with simplicity in mind, ImageAI supports a list of state-of-the-art Machine Learning algorithms for image prediction, custom image prediction, object detection, video detection, video object tracking, and training of image prediction models. ImageAI currently supports image prediction and training using 4 different Machine Learning algorithms trained on the ImageNet-1000 dataset. ImageAI also supports object detection, video detection, and object tracking using RetinaNet, YOLOv3, and TinyYOLOv3 trained on the COCO dataset. Eventually, ImageAI will provide support for a wider range of more specialized Computer Vision tasks, including but not limited to image recognition in special environments and special fields.

AdaptSegNet - Learning to Adapt Structured Output Space for Semantic Segmentation, CVPR 2018 (spotlight)

  •    Python

PyTorch implementation of our method for adapting semantic segmentation from the synthetic dataset (source domain) to the real dataset (target domain). Based on this implementation, our result ranked 3rd in the VisDA Challenge. Paper: Learning to Adapt Structured Output Space for Semantic Segmentation. Yi-Hsuan Tsai*, Wei-Chih Hung*, Samuel Schulter, Kihyuk Sohn, Ming-Hsuan Yang and Manmohan Chandraker. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 (spotlight) (* indicates equal contribution).

deep-head-pose - Deep Learning Head Pose Estimation using PyTorch.

  •    Python

Hopenet is an accurate and easy-to-use head pose estimation network. Models have been trained on the 300W-LP dataset and tested on real data with good qualitative performance. For details about the method and quantitative results, please check the paper.

jetson-inference - Guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson

  •    C++

Welcome to our guide to training with NVIDIA DIGITS and to the inference and deep vision runtime library for Jetson Xavier/TX1/TX2. This repo uses NVIDIA TensorRT to efficiently deploy neural networks onto the embedded Jetson platform, improving performance and power efficiency through graph optimizations, kernel fusion, and half-precision (FP16) arithmetic.

OpenFace - A state-of-the-art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation

  •    C++

Over the past few years, there has been increased interest in automatic facial behavior analysis and understanding. We present OpenFace, a tool intended for computer vision and machine learning researchers, the affective computing community, and people interested in building interactive applications based on facial behavior analysis. OpenFace is the first toolkit capable of facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation with available source code for both running and training the models. The computer vision algorithms at the core of OpenFace demonstrate state-of-the-art results in all of the above-mentioned tasks. Furthermore, our tool is capable of real-time performance and can run from a simple webcam without any specialist hardware. OpenFace is an implementation of a number of research papers from the Multicomp group, Language Technologies Institute at Carnegie Mellon University, and the Rainbow Group, Computer Laboratory, University of Cambridge. The founder of the project and main developer is Tadas Baltrušaitis.

sod - An Embedded Computer Vision & Machine Learning Library (CPU Optimized & IoT Capable)

  •    C

SOD is an embedded, modern, cross-platform computer vision and machine learning software library that exposes a set of APIs for deep learning and advanced media analysis and processing, including real-time multi-class object detection and model training, on embedded systems and IoT devices with limited computational resources. SOD was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in open-source as well as commercial products.

deepgaze - Computer Vision library for human-computer interaction

  •    Python

Update 04/06/2017: The article "Head pose estimation in the wild using Convolutional Neural Networks and adaptive gradient methods" has been accepted for publication in Pattern Recognition (Elsevier). The Deepgaze CNN head pose estimator module is based on this work. Update 22/03/2017: Fixed a bug in and almost completed a more robust version of the CNN head pose estimator.

raster-vision - deep learning for aerial/satellite imagery

  •    Python

Note: this project is under development and may be difficult to use at the moment. The overall goal of Raster Vision is to make it easy to train and run deep learning models over aerial and satellite imagery. At the moment, it includes functionality for making training data, training models, making predictions, and evaluating models for the task of object detection, implemented via the TensorFlow Object Detection API. It also supports running experimental workflows using AWS Batch. The library is designed to be easy to extend to new data sources, machine learning tasks, and machine learning implementations.

openpose - OpenPose: Real-time multi-person keypoint detection library for body, face, and hands estimation

  •    C++

OpenPose represents the first real-time multi-person system to jointly detect human body, hand, and facial keypoints (in total 135 keypoints) on single images. For further details, check all released features and release notes.

luminoth - Deep Learning toolkit for Computer Vision

  •    Python

Luminoth is an open-source toolkit for computer vision. Currently, we support object detection, but we are aiming for much more. It is built in Python, using TensorFlow and Sonnet. Read the full documentation for details.

DetectAndTrack - The implementation of an algorithm presented in the CVPR18 paper: "Detect-and-Track: Efficient Pose Estimation in Videos"

  •    Python

R. Girdhar, G. Gkioxari, L. Torresani, M. Paluri and D. Tran. Detect-and-Track: Efficient Pose Estimation in Videos. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018. This code was developed and tested on NVIDIA P100 (16GB), M40 (12GB) and 1080Ti (11GB) GPUs. Training requires at least 4 GPUs for most configurations, and some were trained with 8 GPUs. It might be possible to train on a single GPU by scaling down the learning rate and scaling up the iteration schedule, but we have not tested all possible setups. Testing can be done on a single GPU. Unfortunately, it is currently not possible to run this on a CPU, as some ops do not have CPU implementations.
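The single-GPU suggestion above (lower learning rate, longer iteration schedule) can be sketched with the common linear scaling heuristic: with a fixed per-GPU batch size, the learning rate is scaled in proportion to the number of GPUs and the iteration counts in inverse proportion. This is an assumption about how one might apply the advice, not a procedure the authors have tested, and the function name and example numbers are hypothetical rather than taken from the project's configs.

```python
from typing import List, Tuple


def rescale_schedule(base_lr: float, base_max_iter: int, base_steps: List[int],
                     base_gpus: int, new_gpus: int) -> Tuple[float, int, List[int]]:
    """Rescale an SGD schedule when the GPU count (and so the total batch) changes.

    Assumes a fixed per-GPU batch size and the linear scaling heuristic:
    learning rate proportional to total batch, iterations inversely proportional.
    """
    if base_gpus <= 0 or new_gpus <= 0:
        raise ValueError("GPU counts must be positive")
    factor = new_gpus / base_gpus
    new_lr = base_lr * factor
    new_max_iter = int(round(base_max_iter / factor))
    # Learning-rate decay steps move with the lengthened schedule.
    new_steps = [int(round(step / factor)) for step in base_steps]
    return new_lr, new_max_iter, new_steps
```

For example, a hypothetical 8-GPU schedule with learning rate 0.01, 10,000 iterations, and decay at 6,000 and 8,000 becomes learning rate 0.00125 with 80,000 iterations and decay at 48,000 and 64,000 on one GPU, so the model still sees roughly the same number of training examples.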

hed - code for Holistically-Nested Edge Detection

  •    C++

We develop a new edge detection algorithm, holistically-nested edge detection (HED), which performs image-to-image prediction by means of a deep learning model that leverages fully convolutional neural networks and deeply-supervised nets. HED automatically learns rich hierarchical representations (guided by deep supervision on side responses) that are important for resolving the challenging ambiguity in edge and object boundary detection. We significantly advance the state of the art on the BSD500 dataset (ODS F-score of .790) and the NYU Depth dataset (ODS F-score of .746), and do so with improved speed (0.4 s per image). A detailed description of the system can be found in our paper. If you have downloaded the previous version (testing code) of HED, please note that we have updated the code base to the new version of Caffe and uploaded a new pretrained model with better performance. We adopted the Python interface written for the FCN paper instead of our own implementation for training and testing. The evaluation protocol doesn't change.
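The ODS (optimal dataset scale) F-score quoted above uses a single edge threshold for the entire dataset: at each threshold, boundary counts are summed over all images before computing precision and recall, and the best resulting F-measure is reported. A minimal sketch follows, assuming the tolerance-based pixel matching of the benchmark has already produced per-image counts; the tuple layout and function names are hypothetical, not the BSDS benchmark's own code.

```python
from typing import Dict, List, Optional, Tuple

# Per-image counts: (matched_gt, total_gt, matched_pred, total_pred) boundary pixels.
Counts = Tuple[int, int, int, int]


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def ods_f_score(counts: Dict[float, List[Counts]]) -> Tuple[float, Optional[float]]:
    """Return (best F, threshold) when one threshold is shared by every image."""
    best_f, best_t = 0.0, None
    for threshold in sorted(counts):
        per_image = counts[threshold]
        matched_gt = sum(c[0] for c in per_image)
        total_gt = sum(c[1] for c in per_image)
        matched_pred = sum(c[2] for c in per_image)
        total_pred = sum(c[3] for c in per_image)
        recall = matched_gt / total_gt if total_gt else 0.0
        precision = matched_pred / total_pred if total_pred else 0.0
        f = f_measure(precision, recall)
        if best_t is None or f > best_f:
            best_f, best_t = f, threshold
    return best_f, best_t
```

Aggregating counts before dividing (rather than averaging per-image F-scores) is what distinguishes ODS from a per-image optimum, where each image would get its own best threshold.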

fashion-mnist - An MNIST-like fashion product database and benchmark

  •    Python

Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and the same structure of training and testing splits.
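Because Fashion-MNIST is meant as a drop-in replacement, its files follow MNIST's gzip-compressed IDX layout: a big-endian header (magic number, item count, and for images the row and column counts) followed by one unsigned byte per pixel or label. The sketch below is a minimal standard-library reader under that assumption; the function names are hypothetical, and file names such as `train-images-idx3-ubyte.gz` should be checked against the actual download.

```python
import gzip
import struct
from typing import List, Tuple

IMAGE_MAGIC = 2051  # 0x00000803: unsigned-byte data, 3 dimensions (count, rows, cols)
LABEL_MAGIC = 2049  # 0x00000801: unsigned-byte data, 1 dimension (count)


def parse_idx_images(raw: bytes) -> Tuple[int, int, int, List[List[List[int]]]]:
    """Parse decompressed IDX image bytes into (count, rows, cols, images)."""
    magic, count, rows, cols = struct.unpack_from(">IIII", raw, 0)
    if magic != IMAGE_MAGIC:
        raise ValueError(f"unexpected image magic number {magic}")
    size = rows * cols
    pixels = raw[16:16 + count * size]
    if len(pixels) != count * size:
        raise ValueError("truncated image data")
    images = []
    for i in range(count):
        start = i * size
        # Each image is stored row by row, one byte (0-255) per pixel.
        images.append([list(pixels[start + r * cols:start + (r + 1) * cols])
                       for r in range(rows)])
    return count, rows, cols, images


def parse_idx_labels(raw: bytes) -> List[int]:
    """Parse decompressed IDX label bytes into a list of class indices."""
    magic, count = struct.unpack_from(">II", raw, 0)
    if magic != LABEL_MAGIC:
        raise ValueError(f"unexpected label magic number {magic}")
    labels = list(raw[8:8 + count])
    if len(labels) != count:
        raise ValueError("truncated label data")
    return labels


def load_split(image_path: str, label_path: str) -> Tuple[List[List[List[int]]], List[int]]:
    """Load one gzip-compressed split (e.g. training or test) from disk."""
    with gzip.open(image_path, "rb") as f:
        _, _, _, images = parse_idx_images(f.read())
    with gzip.open(label_path, "rb") as f:
        labels = parse_idx_labels(f.read())
    if len(images) != len(labels):
        raise ValueError("image and label counts differ")
    return images, labels
```

For the real training split, one would expect 60,000 images of 28x28 pixels and labels in the range 0 to 9, matching the description above.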

Realtime_Multi-Person_Pose_Estimation - Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)

  •    Jupyter

By Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh. Code repo for winning 2016 MSCOCO Keypoints Challenge, 2016 ECCV Best Demo Award, and 2017 CVPR Oral paper.

T-CNN - ImageNet 2015 Object Detection from Video (VID)

  •    Python

The T-CNN framework is a deep learning framework for object detection in videos. It was originally designed for the ImageNet VID challenge in ILSVRC2015. If you use the T-CNN code in your project, please cite the associated papers.

tf-pose-estimation - Deep Pose Estimation implemented using Tensorflow with Custom Architectures for fast inference

  •    Python

OpenPose for human pose estimation has been implemented using TensorFlow. The project also provides several variants with changes to the network structure for real-time processing on CPUs or low-power embedded devices. 2018.5.21: the post-processing part is implemented in C++ and must be compiled. 2018.2.7: arguments in the script changed; dynamic input size is supported.

SiaNet - An easy to use C# deep learning library with CUDA/OpenCL support

  •    CSharp

A C# wrapper, under development, that helps developers easily create and train deep neural network models. The project's classification example on the Titanic dataset is able to reach 75% accuracy within 10 epochs.

robot-surgery-segmentation - Winning solution and its improvement for MICCAI 2017 Robotic Instrument Segmentation Sub-Challenge

  •    Jupyter

Here we present our winning solution for the MICCAI 2017 Endoscopic Vision Sub-Challenge: Robotic Instrument Segmentation, and demonstrate further improvement over that result. Our approach is originally based on the U-Net network architecture, which we improved using state-of-the-art semantic segmentation neural networks known as LinkNet and TernausNet. Our results show superior performance for binary as well as multi-class robotic instrument segmentation. We believe that our methods can lay a good foundation for tracking and pose estimation in the vicinity of surgical scenes.