Dataset_Utilities - NVIDIA Dataset Utilities (NVDU)

  •        58

This project is a collection of Python scripts to help work with datasets for deep learning. For example, visualizing annotation data associated with captured sensor images generated by NVIDIA Deep learning Dataset Synthesizer (NDDS) This module depends on OpenCV-python which currently doesn't work with Python 3.7.



Related Projects

AlphaPose - Multi-Person Pose Estimation System

  •    Jupyter

Alpha Pose is an accurate multi-person pose estimator, which is the first open-source system that achieves 70+ mAP (72.3 mAP) on COCO dataset and 80+ mAP (82.1 mAP) on MPII dataset. To match poses that correspond to the same person across frames, we also provide an efficient online pose tracker called Pose Flow. It is the first open-source online pose tracker that achieves both 60+ mAP (66.5 mAP) and 50+ MOTA (58.3 MOTA) on PoseTrack Challenge dataset. Note: Please read PoseFlow/ for details.

deep-head-pose - :fire::fire: Deep Learning Head Pose Estimation using PyTorch.

  •    Python

Hopenet is an accurate and easy to use head pose estimation network. Models have been trained on the 300W-LP dataset and have been tested on real data with good qualitative performance. For details about the method and quantitative results please check the paper.

tf-pose-estimation - Deep Pose Estimation implemented using Tensorflow with Custom Architectures for fast inference

  •    PureBasic

'Openpose' for human pose estimation have been implemented using Tensorflow. It also provides several variants that have made some changes to the network structure for real-time processing on the CPU or low-power embedded devices. 2018.5.21 Post-processing part is implemented in c++. It is required compiling the part. See: 2018.2.7 Arguments in script changed. Support dynamic input size.

AdaptSegNet - Learning to Adapt Structured Output Space for Semantic Segmentation, CVPR 2018 (spotlight)

  •    Python

Pytorch implementation of our method for adapting semantic segmentation from the synthetic dataset (source domain) to the real dataset (target domain). Based on this implementation, our result is ranked 3rd in the VisDA Challenge. Learning to Adapt Structured Output Space for Semantic Segmentation Yi-Hsuan Tsai*, Wei-Chih Hung*, Samuel Schulter, Kihyuk Sohn, Ming-Hsuan Yang and Manmohan Chandraker IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 (spotlight) (* indicates equal contribution).

DeepLabCut - Markerless tracking of user-defined features with deep learning

  •    Python

Welcome to the DeepLabCut repository, a toolbox for markerless tracking of body parts of animals in lab settings performing various tasks, like trail tracking, reaching in mice and various Drosophila behaviors during egg-laying (see Mathis et al. for details). There is, however, nothing specific that makes the toolbox only applicable to these tasks and/or species. The toolbox has also already been successfully applied to rats, humans, various fish species, bacteria, leeches, various robots, and race horses. Please check out for video demonstrations of automated tracking. This work utilizes the feature detectors (ResNet + readout layers) of one of the state-of-the-art algorithms for human pose estimation by Insafutdinov et al., called DeeperCut, which inspired the name for our toolbox (see references below).

T2F - T2F: text to face generation using Deep Learning

  •    Python

Text-to-Face generation using Deep Learning. This project combines two of the recent architectures StackGAN and ProGAN for synthesizing faces from textual descriptions. The project uses Face2Text dataset which contains 400 facial images and textual captions for each of them. The data can be obtained by contacting either the RIVAL group or the authors of the aforementioned paper. The code is present in the implementation/ subdirectory. The implementation is done using the PyTorch framework. So, for running this code, please install PyTorch version 0.4.0 before continuing.

LSTM-Human-Activity-Recognition - Human Activity Recognition example using TensorFlow on smartphone sensors dataset and an LSTM RNN (Deep Learning algo)

  •    Jupyter

Compared to a classical approach, using a Recurrent Neural Networks (RNN) with Long Short-Term Memory cells (LSTMs) require no or almost no feature engineering. Data can be fed directly into the neural network who acts like a black box, modeling the problem correctly. Other research on the activity recognition dataset can use a big amount of feature engineering, which is rather a signal processing approach combined with classical data science techniques. The approach here is rather very simple in terms of how much was the data preprocessed. Let's use Google's neat Deep Learning library, TensorFlow, demonstrating the usage of an LSTM, a type of Artificial Neural Network that can process sequential data / time series.

DeepNeuralClassifier - Deep neural network using rectified linear units to classify hand written symbols from the MNIST dataset

  •    Julia

Example scripts for a deep, feed-forward neural network have been written from scratch. No machine learning packages are used, providing an example of how to implement the underlying algorithms of an artificial neural network. The code is written in the Julia, a programming language with a syntax similar to Matlab. The neural network is trained on the MNIST dataset of handwritten digits. On the test dataset, the neural network correctly classifies 98.42 % of the handwritten digits. The results are pretty good for a fully connected neural network that does not contain a priori knowledge about the geometric invariances of the dataset like a Convolutional Neural Network would.

DetectAndTrack - The implementation of an algorithm presented in the CVPR18 paper: "Detect-and-Track: Efficient Pose Estimation in Videos"

  •    Python

R. Girdhar, G. Gkioxari, L. Torresani, M. Paluri and D. Tran. Detect-and-Track: Efficient Pose Estimation in Videos. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018. This code was developed and tested on NVIDIA P100 (16GB), M40 (12GB) and 1080Ti (11GB) GPUs. Training requires at least 4 GPUs for most configurations, and some were trained with 8 GPUs. It might be possible to train on a single GPU by scaling down the learning rate and scaling up the iteration schedule, but we have not tested all possible setups. Testing can be done on a single GPU. Unfortunately it is currently not possible to run this on a CPU as some ops do not have CPU implementations.

textgenrnn - Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code

  •    Python

Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code, or quickly train on a text using a pretrained model. The included model can easily be trained on new texts, and can generate appropriate text even after a single pass of the input data.

fashion-mnist - A MNIST-like fashion product database. Benchmark :point_right:

  •    Python

Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.

Realtime_Multi-Person_Pose_Estimation - Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)

  •    Jupyter

By Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh. Code repo for winning 2016 MSCOCO Keypoints Challenge, 2016 ECCV Best Demo Award, and 2017 CVPR Oral paper.

Everybody_Dance_Now - This is the code for "Everybody Dance Now!" By Siraj Raval on Youtube

  •    Python

This is a pix2pix demo that learns from pose and translates this into a human. A webcam-enabled application is also provided that translates your pose to the trained pose. If you want to download my dataset, here is also the video file that I used and the generated training dataset (1427 images already split into training and validation).

ImageAI - A python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities

  •    Python

A python library built to empower developers to build applications and systems with self-contained Deep Learning and Computer Vision capabilities using simple and few lines of code. Built with simplicity in mind, ImageAI supports a list of state-of-the-art Machine Learning algorithms for image prediction, custom image prediction, object detection, video detection, video object tracking and image predictions trainings. ImageAI currently supports image prediction and training using 4 different Machine Learning algorithms trained on the ImageNet-1000 dataset. ImageAI also supports object detection, video detection and object tracking using RetinaNet, YOLOv3 and TinyYOLOv3 trained on COCO dataset. Eventually, ImageAI will provide support for a wider and more specialized aspects of Computer Vision including and not limited to image recognition in special environments and special fields.

openpose - OpenPose: Real-time multi-person keypoint detection library for body, face, and hands estimation

  •    C++

OpenPose represents the first real-time multi-person system to jointly detect human body, hand, and facial keypoints (in total 135 keypoints) on single images. For further details, check all released features and release notes.

fma - FMA: A Dataset For Music Analysis

  •    Jupyter

Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2. The dataset is a dump of the Free Music Archive (FMA), an interactive library of high-quality, legal audio downloads. Below the abstract from the paper.

sphereface - Implementation for <SphereFace: Deep Hypersphere Embedding for Face Recognition> in CVPR'17

  •    Jupyter

SphereFace is released under the MIT License (refer to the LICENSE file for details). 2018.8.14: We recommand an interesting ECCV 2018 paper that comprehensively evaluates SphereFace (A-Softmax) on current widely used face datasets and their proposed noise-controlled IMDb-Face dataset. Interested users can try to train SphereFace on their IMDb-Face dataset. Take a look here.

hed - code for Holistically-Nested Edge Detection

  •    C++

We develop a new edge detection algorithm, holistically-nested edge detection (HED), which performs image-to-image prediction by means of a deep learning model that leverages fully convolutional neural networks and deeply-supervised nets. HED automatically learns rich hierarchical representations (guided by deep supervision on side responses) that are important in order to resolve the challenging ambiguity in edge and object boundary detection. We significantly advance the state-of-the-art on the BSD500 dataset (ODS F-score of .790) and the NYU Depth dataset (ODS F-score of .746), and do so with an improved speed (0.4s per image). Detailed description of the system can be found in our paper. If you have downloaded the previous version (testing code) of HED, please note that we updated the code base to the new version of Caffe. We uploaded a new pretrained model with better performance. We adopted the python interface written for the FCN paper instead of our own implementation for training and testing. The evaluation protocol doesn't change.

SiaNet - An easy to use C# deep learning library with CUDA/OpenCL support

  •    CSharp

Developing a C# wrapper to help developer easily create and train deep neural network models. The below is a classification example with Titanic dataset. Able to reach 75% accuracy within 10 epoch.