3D-ResNets-PyTorch - 3D ResNets for Action Recognition (CVPR 2018)

  •    Python

Our paper "Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?" was accepted to CVPR 2018! We have updated the paper information and uploaded some fine-tuned models for UCF-101 and HMDB-51.

temporal-segment-networks - Code & Models for Temporal Segment Networks (TSN) in ECCV 2016

  •    Python

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool, ECCV 2016, Amsterdam, Netherlands. Sep. 8, 2017 - We released TSN models trained on the Kinetics dataset with 76.6% single model top-1 accuracy. Find the model weights and transfer learning experiment results on the website.

action-detection - temporal action detection with SSN

  •    Python

Temporal Action Detection with Structured Segment Networks Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin, ICCV 2017, Venice, Italy. A Pursuit of Temporal Accuracy in General Activity Detection Yuanjun Xiong, Yue Zhao, Limin Wang, Dahua Lin, and Xiaoou Tang, arXiv:1703.02716.

tsn-pytorch - Temporal Segment Networks (TSN) in PyTorch

  •    Python

Now in experimental release; suggestions welcome. Note: always use git clone --recursive https://github.com/yjxiong/tsn-pytorch to clone this project. Otherwise you will not be able to use the Inception-series CNN architectures.

video-classification-3d-cnn-pytorch - Video classification tools using 3D ResNet

  •    Python

This is PyTorch code for video (action) classification using a 3D ResNet trained by this code. The 3D ResNet is trained on the Kinetics dataset, which includes 400 action classes. This code takes videos as input; in score mode it outputs class names and predicted class scores for every 16 frames, and in feature mode it outputs 512-dimensional features (after global average pooling) for every 16 frames. A Torch (Lua) version of this code is available here.
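The feature mode described above, one 512-dimensional vector per 16-frame clip, can be sketched as follows. This is a minimal illustration with NumPy: the array shapes and the `extract_clip_features` helper are assumptions standing in for the tool's real pipeline, which computes conv feature maps with the trained 3D ResNet.

```python
import numpy as np

def extract_clip_features(frames, clip_len=16):
    """One pooled feature vector per non-overlapping clip of clip_len frames.

    `frames` is a (num_frames, channels, h, w) array standing in for the
    3D ResNet's last conv feature maps. Global average pooling over time
    and space collapses each clip to a single channels-dim vector.
    """
    clips = []
    for start in range(0, len(frames) - clip_len + 1, clip_len):
        clip = frames[start:start + clip_len]     # (16, 512, h, w)
        clips.append(clip.mean(axis=(0, 2, 3)))   # global average pool
    return np.stack(clips)                        # (num_clips, 512)

# 48 frames -> 3 non-overlapping 16-frame clips, each a 512-d feature
dummy = np.random.rand(48, 512, 4, 4).astype(np.float32)
feats = extract_clip_features(dummy)
print(feats.shape)  # (3, 512)
```

In the real tool the 512 dimensions come from the final residual block's channel count; here it is just a parameter of the dummy data.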

temporal-augmentation - Temporal augmentation with two-stream ConvNet features on human action recognition

  •    Lua

The two-stream ConvNet is recognized as one of the most successful deep ConvNets for video understanding, specifically human action recognition. However, it suffers from insufficient temporal data for training. This repository aims to implement the temporal segments RNN for training on videos with temporal augmentation. The implementation is based on example code from fb.resnet.torch, and was largely modified in order to work with frame-level features.

realtime-action-detection - This repository hosts the code for our real-time action detection paper

  •    Matlab

An implementation of our work (Online Real-time Multiple Spatiotemporal Action Localisation and Prediction) published in ICCV 2017. Originally, we used a Caffe implementation of SSD-V2 for the publication. I have forked the version of SSD-CAFFE that I used to generate the paper's results; try that if you want to use Caffe, otherwise I would recommend using this version. This implementation differs slightly from the original work: it performs slightly better at lower IoU thresholds and slightly worse at higher ones. In particular, I found this SSD implementation to be slightly worse at IoU greater than or equal to 0.5 on the UCF24 dataset. The tube generation part is the same as in the original implementation.
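The IoU thresholds compared above are the standard intersection-over-union overlap criterion between a detected and a ground-truth box. A minimal sketch (the `[x1, y1, x2, y2]` box format here is an assumption for illustration, not this repository's API):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1) = 1/7
print(iou([0, 0, 2, 2], [1, 1, 3, 3]))
```

A detection counts as correct at threshold t when its IoU with a ground-truth box is at least t, which is why an implementation can trade accuracy between low and high thresholds.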

action-recognition-using-3d-resnet - Use 3D ResNet to extract features of UCF101 and HMDB51 and then classify them

  •    Python

Use 3D ResNet to extract features of UCF101 and HMDB51 and then classify them. You can also download my extracted features of UCF101 and HMDB51 at here and here. Remember to put the first one in data/jsons/ucf101 before you download the second one; otherwise the first one will be overwritten.
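The extract-then-classify pipeline above reduces to fitting a simple classifier on the pooled 3D ResNet features. A minimal nearest-centroid sketch on synthetic stand-in features (the repository trains its own classifier; the feature dimensions, class counts, and separation here are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for 512-d 3D ResNet clip features from two action classes.
n_per_class, dim = 50, 512
feats = np.vstack([
    rng.normal(0.0, 1.0, (n_per_class, dim)),
    rng.normal(3.0, 1.0, (n_per_class, dim)),
])
labels = np.repeat([0, 1], n_per_class)

# Fit: one centroid (mean feature vector) per class.
centroids = np.stack([feats[labels == c].mean(axis=0) for c in (0, 1)])

# Predict: nearest centroid in Euclidean distance.
dists = np.linalg.norm(feats[:, None, :] - centroids[None], axis=2)
preds = dists.argmin(axis=1)
accuracy = (preds == labels).mean()
print(accuracy)  # well-separated synthetic classes -> 1.0
```

Any off-the-shelf classifier (linear SVM, logistic regression) would slot in the same way, taking the 512-d features as input and action labels as targets.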

3D-ResNets - 3D ResNets for Action Recognition

  •    Lua

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh, "Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?", arXiv preprint, arXiv:1711.09577, 2017. Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh, "Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition", Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, 2017.

video-classification-3d-cnn - Video classification tools using 3D ResNet

  •    Lua

This is Torch code for video (action) classification using a 3D ResNet trained by this code. The 3D ResNet is trained on the Kinetics dataset, which includes 400 action classes. This code takes videos as input and outputs class names and predicted class scores for every 16 frames. A PyTorch (Python) version of this code is available here.

adascan-public - Code for AdaScan: Adaptive Scan Pooling (CVPR 2017)

  •    Python

This repository contains the source code for the paper Adascan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos, Amlan Kar* (IIT Kanpur), Nishant Rai* (IIT Kanpur), Karan Sikka (UCSD and SRI), Gaurav Sharma (IIT Kanpur), with support for multi-GPU training and testing. These models have been trained on UCF-101. We will be releasing the updated models soon.

ActionVLAD - ActionVLAD for video action classification (CVPR 2017)

  •    Python

Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic and Bryan Russell. ActionVLAD: Learning spatio-temporal aggregation for action classification. In Conference on Computer Vision and Pattern Recognition (CVPR), 2017. Note: Be careful to re-organize them given our filename and class ordering.

AttentionalPoolingAction - Code/Model release for NIPS 2017 paper "Attentional Pooling for Action Recognition"

  •    Python

Rohit Girdhar and Deva Ramanan. Attentional Pooling for Action Recognition. Advances in Neural Information Processing Systems (NIPS), 2017. Convert the MPII data into tfrecords. The system can also read from individual JPEG files, but that needs a slightly different initial setup.