3d-pose-baseline - A simple baseline for 3d human pose estimation in tensorflow

  •        163

Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little. A simple yet effective baseline for 3d human pose estimation. In ICCV, 2017. https://arxiv.org/pdf/1705.03098.pdf. The code in this repository was mostly written by Julieta Martinez, Rayat Hossain and Javier Romero.




Related Projects

Realtime_Multi-Person_Pose_Estimation - Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)

  •    Jupyter

By Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh. Code repo for winning 2016 MSCOCO Keypoints Challenge, 2016 ECCV Best Demo Award, and 2017 CVPR Oral paper.

siamfc-tf - SiamFC tracking in TensorFlow.

  •    Python

TensorFlow port of the tracking method described in the paper Fully-Convolutional Siamese nets for object tracking. In particular, it is the improved version presented as baseline in End-to-end representation learning for Correlation Filter based tracking, which achieves state-of-the-art performance at high framerate. The other methods presented in the paper (similar performance, shallower network) haven't been ported yet.

AliceVision - Photogrammetric Computer Vision Framework

  •    C++

AliceVision is a Photogrammetric Computer Vision Framework which provides a 3D Reconstruction and Camera Tracking algorithms. AliceVision aims to provide strong software basis with state-of-the-art computer vision algorithms that can be tested, analyzed and reused. The project is a result of collaboration between academia and industry to provide cutting-edge algorithms with the robustness and the quality required for production usage. Learn more details about the pipeline and tools based on it on AliceVision website.

deepgaze - Computer Vision library for human-computer interaction

  •    Python

Update 04/06/2017 Article "Head pose estimation in the wild using Convolutional Neural Networks and adaptive gradient methods" have been accepted for publication in Pattern Recogntion (Elsevier). The Deepgaze CNN head pose estimator module is based on this work. Update 22/03/2017 Fixed a bug in mask_analysis.py and almost completed a more robust version of the CNN head pose estimator.

OpenCV - Open Source Computer Vision

  •    C++

OpenCV (Open Source Computer Vision) is a library of programming functions for real time computer vision. The library has more than 500 optimized algorithms. It is used to interactive art, to mine inspection, stitching maps on the web on through advanced robotics.

Sophus - C++ implementation of Lie Groups using Eigen.

  •    C++

This is a c++ implementation of Lie groups commonly used for 2d and 3d geometric problems (i.e. for Computer Vision or Robotics applications). Among others, this package includes the special orthogonal groups SO(2) and SO(3) to present rotations in 2d and 3d as well as the special Euclidean group SE(2) and SE(3) to represent rigid body transformations (i.e. rotations and translations) in 2d and 3d. Sophus compiles with clang and gcc on Linux and OS X as well as msvc on Windows. The specific compiler and operating system versions which are supported are the ones which are used in the Continuous Integration (CI): See TravisCI and AppVeyor for details.

video-classification-3d-cnn-pytorch - Video classification tools using 3D ResNet

  •    Python

This is a pytorch code for video (action) classification using 3D ResNet trained by this code. The 3D ResNet is trained on the Kinetics dataset, which includes 400 action classes. This code uses videos as inputs and outputs class names and predicted class scores for each 16 frames in the score mode. In the feature mode, this code outputs features of 512 dims (after global average pooling) for each 16 frames. Torch (Lua) version of this code is available here.

robot-surgery-segmentation - Wining solution and its improvement for MICCAI 2017 Robotic Instrument Segmentation Sub-Challenge

  •    Jupyter

Here we present our wining solution and its improvement for MICCAI 2017 Robotic Instrument Segmentation Sub-Challenge. In this work, we describe our winning solution for MICCAI 2017 Endoscopic Vision Sub-Challenge: Robotic Instrument Segmentation and demonstrate further improvement over that result. Our approach is originally based on U-Net network architecture that we improved using state-of-the-art semantic segmentation neural networks known as LinkNet and TernausNet. Our results shows superior performance for a binary as well as for multi-class robotic instrument segmentation. We believe that our methods can lay a good foundation for the tracking and pose estimation in the vicinity of surgical scenes.

3dmatch-toolbox - 3DMatch - a 3D ConvNet-based local geometric descriptor for aligning 3D meshes and point clouds

  •    C++

Matching local geometric features on real-world depth images is a challenging task due to the noisy, low-resolution, and incomplete nature of 3D scan data. These difficulties limit the performance of current state-of-art methods, which are typically based on histograms over geometric properties. In this paper, we present 3DMatch, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data. To amass training data for our model, we propose an unsupervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions. Experiments show that our descriptor is not only able to match local geometry in new scenes for reconstruction, but also generalize to different tasks and spatial scales (e.g. instance-level object model alignment for the Amazon Picking Challenge, and mesh surface correspondence). Results show that 3DMatch consistently outperforms other state-of-the-art approaches by a significant margin. This code is released under the Simplified BSD License (refer to the LICENSE file for details).

lip-reading-deeplearning - :unlock: Lip Reading - Cross Audio-Visual Recognition using 3D Architectures

  •    Python

The input pipeline must be prepared by the users. This code is aimed to provide the implementation for Coupled 3D Convolutional Neural Networks for audio-visual matching. Lip-reading can be a specific application for this work. Audio-visual recognition (AVR) has been considered as a solution for speech recognition tasks when the audio is corrupted, as well as a visual recognition method used for speaker verification in multi-speaker scenarios. The approach of AVR systems is to leverage the extracted information from one modality to improve the recognition ability of the other modality by complementing the missing information.

crfasrnn_keras - CRF-RNN Keras/Tensorflow version

  •    Python

This repository contains Keras/Tensorflow code for the "CRF-RNN" semantic image segmentation method, published in the ICCV 2015 paper Conditional Random Fields as Recurrent Neural Networks. This paper was initially described in an arXiv tech report. The online demo of this project won the Best Demo Prize at ICCV 2015. Original Caffe-based code of this project can be found here. Results produced with this Keras/Tensorflow code are almost identical to that with the Caffe-based version. The root directory of the clone will be referred to as crfasrnn_keras hereafter.

ihog - Visualizing Object Detection Features. ICCV 2013

  •    C++

This software package contains tools to invert and visualize HOG features. It implements the Paired Dictionary Learning algorithm described in our paper "HOGgles: Visualizing Object Detection Features" [1]. If you run into trouble compiling the SPAMS code, you might try opening the file /path/to/ihog/spams/compile.m and adjusting the settings for your computer.

gvnn - gvnn: Geometric Vision with Neural Networks

  •    Lua

gvnn is primarily intended for self-supervised learning using low-level vision. It is inspired by the Spatial Transformer Networks (STN) paper that appeared in NIPS in 2015 and its open source code made available by Maxime Oquab. The code is self contained i.e. the original implementation of STN by Maxime is also within the repository. STs were mainly limited to applying only 2D transformations to the input. We added a new set of transformations often needed for manipulating data in 3D geometric computer vision. These include the 3D counterparts of what were used in original STN together with a lot more new transformations and different M-estimators.

cvat - Computer Vision Annotation Tool (CVAT) is a web-based tool which helps to annotate video and images for Computer Vision algorithms

  •    Javascript

CVAT is completely re-designed and re-implemented version of Video Annotation Tool from Irvine, California tool. It is free, online, interactive video and image annotation tool for computer vision. It is being used by our team to annotate million of objects with different properties. Many UI and UX decisions are based on feedbacks from professional data annotation team. Code released under the MIT License.

Mask_RCNN - Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

  •    Python

This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone. The code is documented and designed to be easy to extend. If you use it in your research, please consider citing this repository (bibtex below). If you work on 3D vision, you might find our recently released Matterport3D dataset useful as well. This dataset was created from 3D-reconstructed spaces captured by our customers who agreed to make them publicly available for academic use. You can see more examples here.

3D-Machine-Learning - A resource repository for 3D machine learning


In recent years, tremendous amount of progress is being made in the field of 3D Machine Learning, which is an interdisciplinary field that fuses computer vision, computer graphics and machine learning. This repo is derived from my study notes and will be used as a place for triaging new research papers. To contribute to this Repo, you may add content through pull requests or open an issue to let me know.

luminoth - Deep Learning toolkit for Computer Vision

  •    Python

Luminoth is an open source toolkit for computer vision. Currently, we support object detection, but we are aiming for much more. It is built in Python, using TensorFlow and Sonnet. Read the full documentation here.

OpenFace - OpenFace – a state-of-the art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation

  •    C++

Over the past few years, there has been an increased interest in automatic facial behavior analysis and understanding. We present OpenFace – a tool intended for computer vision and machine learning researchers, affective computing community and people interested in building interactive applications based on facial behavior analysis. OpenFace is the first toolkit capable of facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation with available source code for both running and training the models. The computer vision algorithms which represent the core of OpenFace demonstrate state-of-the-art results in all of the above mentioned tasks. Furthermore, our tool is capable of real-time performance and is able to run from a simple webcam without any specialist hardware. OpenFace is an implementation of a number of research papers from the Multicomp group, Language Technologies Institute at the Carnegie Mellon University and Rainbow Group, Computer Laboratory, University of Cambridge. The founder of the project and main developer is Tadas Baltrušaitis.

VideoMan Library

  •    C++

Library for capturing video from cameras, 3d sensors, frame-grabbers, video files and image sequences. It can also display multiple images using OpenGL with different layouts. Easy integration with OpenCV, CUDA... Perfect for computer vision. Keywords: video capture, computer vision, machine vision, opencv, opengl, cameras, video input devices, firewire, usb, gige

2D-and-3D-face-alignment - This repository implements a demo of the networks described in "How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)" paper

  •    Lua

This repository implements a demo of the networks described in "How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)" paper. Please visit our webpage or read bellow for instructions on how to run the code and access the dataset. Note: If you are interested in a binarized version, capable of running on devices with limited resources please also check https://github.com/1adrianb/binary-face-alignment for a demo.