vatic - Efficiently Scaling Up Video Annotation with Crowdsourced Marketplaces. IJCV 2012


VATIC is an online video annotation tool for computer vision research that crowdsources work to Amazon's Mechanical Turk. Our tool makes it easy to build massive, affordable video data sets. Note: VATIC has only been tested on Ubuntu with the Apache 2.2 HTTP server and a MySQL server. This document describes installation on this platform; however, it should work on any operating system and with any server.

http://mit.edu/vondrick/vatic/
https://github.com/cvondrick/vatic

Related Projects

cvat - Computer Vision Annotation Tool (CVAT) is a web-based tool which helps to annotate video and images for Computer Vision algorithms

  •    Javascript

CVAT is a completely redesigned and reimplemented version of the Video Annotation Tool from Irvine, California. It is a free, online, interactive video and image annotation tool for computer vision. Our team uses it to annotate millions of objects with different properties. Many UI and UX decisions are based on feedback from a professional data annotation team. The code is released under the MIT License.

ImageAI - A python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities

  •    Python

A Python library built to empower developers to build applications and systems with self-contained Deep Learning and Computer Vision capabilities using a few simple lines of code. Built with simplicity in mind, ImageAI supports a list of state-of-the-art Machine Learning algorithms for image prediction, custom image prediction, object detection, video detection, video object tracking and image prediction training. ImageAI currently supports image prediction and training using 4 different Machine Learning algorithms trained on the ImageNet-1000 dataset. ImageAI also supports object detection, video detection and object tracking using RetinaNet, YOLOv3 and TinyYOLOv3 trained on the COCO dataset. Eventually, ImageAI will provide support for wider and more specialized aspects of Computer Vision, including but not limited to image recognition in special environments and special fields.

VideoMan Library

  •    C++

Library for capturing video from cameras, 3d sensors, frame-grabbers, video files and image sequences. It can also display multiple images using OpenGL with different layouts. Easy integration with OpenCV, CUDA... Perfect for computer vision. Keywords: video capture, computer vision, machine vision, opencv, opengl, cameras, video input devices, firewire, usb, gige

labelme - Image Polygonal Annotation with Python (polygon, rectangle, line, point and image-level flag annotation)

  •    Python

Labelme is a graphical image annotation tool inspired by http://labelme.csail.mit.edu. It is written in Python and uses Qt for its graphical interface.

video-classification-3d-cnn-pytorch - Video classification tools using 3D ResNet

  •    Python

This is PyTorch code for video (action) classification using a 3D ResNet trained by this code. The 3D ResNet is trained on the Kinetics dataset, which includes 400 action classes. The code takes videos as input. In score mode, it outputs class names and predicted class scores for every 16 frames. In feature mode, it outputs 512-dimensional features (after global average pooling) for every 16 frames. A Torch (Lua) version of this code is available here.
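
The "every 16 frames" grouping described above can be illustrated with a minimal sketch. It is not taken from the repository: the function name `split_into_clips` is hypothetical, and it assumes clips are consecutive and non-overlapping and that a trailing group shorter than 16 frames is dropped, which the description does not specify.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_clips(frames: Sequence[T], clip_len: int = 16) -> List[Sequence[T]]:
    """Split frames into consecutive, non-overlapping clips of clip_len frames.

    Assumption (not confirmed by the project description): a trailing
    group with fewer than clip_len frames is dropped.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    n_full = len(frames) // clip_len  # number of complete clips
    return [frames[i * clip_len:(i + 1) * clip_len] for i in range(n_full)]


# Stand-in for decoded video frames: 40 frame indices.
frames = list(range(40))
clips = split_into_clips(frames)
print(len(clips))   # 2 complete clips; the last 8 frames are dropped
print(clips[1][0])  # 16, the first frame index of the second clip
```

Under these assumptions, a classifier would produce one score vector (or one 512-dimensional feature vector) per returned clip.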


LiBLaB

  •    CSharp

LibLab is a C# library for networking, camera access, image processing, audio processing, video processing and computer vision.

sloth - Sloth is a tool for labeling image and video data for computer vision research.

  •    Python

sloth is a tool for labeling image and video data for computer vision research. The documentation can be found at http://sloth.readthedocs.org/.

SerpentAI - Game Agent Framework. Helping you create AIs / Bots to play any game you own!

  •    Jupyter

The framework features a large assortment of supporting modules that provide solutions to commonly encountered scenarios when using video games as environments, as well as CLI tools to accelerate development. It provides some useful conventions but is absolutely NOT opinionated about what you put in your agents: Want to use the latest, cutting-edge deep reinforcement learning algorithm? ALLOWED. Want to use computer vision techniques, image processing and trigonometry? ALLOWED. Want to randomly press the Left or Right buttons? sigh ALLOWED. To top it all off, Serpent.AI was designed to be entirely plugin-based (for both game support and game agents) so your experiments are actually portable and distributable to your peers and random strangers on the Internet. You'll also be glad to hear that all 3 major OSes are supported: Linux, Windows & macOS.

soundnet - SoundNet: Learning Sound Representations from Unlabeled Video. NIPS 2016

  •    Lua

We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild. We leverage the natural synchronization between vision and sound to learn an acoustic representation using two million unlabeled videos. We propose a student-teacher training procedure which transfers discriminative visual knowledge from well-established visual models (e.g. ImageNet and PlacesCNN) into the sound modality using unlabeled video as a bridge. We provide pre-trained models that are trained over 2,000,000 unlabeled videos. You can download the 8-layer and 5-layer models here. We recommend the 8-layer network.

ViPER

  •    Java

The Video Processing Evaluation Resource: A toolkit for evaluating computer vision algorithms on video, and a corresponding tool for annotating video streams with spatial metadata.

QVision: Computer Vision Library for Qt

  •    C++

Computer vision and image processing library for Qt.

tracking.js - A modern approach for Computer Vision on the web

  •    Javascript

The tracking.js library brings different computer vision algorithms and techniques into the browser environment. By using modern HTML5 specifications, we enable you to do real-time color tracking, face detection and much more — all that with a lightweight core (~7 KB) and intuitive interface. You can plug tracking.js into some well supported HTML elements such as <canvas>, <video> and <img>.

CamStudio - Desktop Screen Recorder

  •    C++

CamStudio can record all screen and audio activity on your computer and create industry-standard AVI video files. Using its built-in SWF Producer, it can turn those AVIs into lean, mean, bandwidth-friendly streaming Flash videos (SWFs).

videogan - Generating Videos with Scene Dynamics. NIPS 2016.

  •    Lua

This repository contains an implementation of Generating Videos with Scene Dynamics by Carl Vondrick, Hamed Pirsiavash, Antonio Torralba, to appear at NIPS 2016. The model learns to generate tiny videos using adversarial networks. Below are some selected videos that are generated by our model. These videos are not real; they are hallucinated by a generative video model. While they are not photo-realistic, the motions are fairly reasonable for the scene category they are trained on.

T-CNN - ImageNet 2015 Object Detection from Video (VID)

  •    Python

The T-CNN framework is a deep learning framework for object detection in videos. This framework was originally designed for the ImageNet VID challenge in ILSVRC2015. If you are using the T-CNN code in your project, please cite the following works.

OpenIMAJ - Open Intelligent Multimedia Analysis for Java

  •    Java

OpenIMAJ is an award-winning set of libraries and tools for multimedia (images, text, video, audio, etc.) content analysis and content generation. OpenIMAJ is very broad and contains everything from state-of-the-art computer vision (e.g. SIFT descriptors, salient region detection, face detection, etc.) and advanced data clustering, through to software that performs analysis on the content, layout and structure of webpages.

FILTER

  •    Javascript

This is a library for processing images/video in pure JavaScript using HTML5 features like Canvas, WebWorkers, WebGL and SVG (in progress) or analogs in Node.js. Some of the filter code has been adapted from open source libraries (mostly C, Java and Flash, plus a couple from JavaScript libraries); see the comments in the code for details.

4dface - Real-time 3D face tracking and reconstruction from 2D video

  •    C++

This is a demo app showing face tracking and 3D Morphable Model fitting on live webcams and videos. It builds upon the 3D face model library eos and the landmark detection and optimisation library superviseddescent. Clone with submodules: git clone --recursive git://github.com/patrikhuber/4dface.git, or, if you've already cloned it, get the submodules with git submodule update --init --recursive inside the 4dface directory.

Video Annotation and Reference System

  •    Lua

The Video Annotation and Reference System (VARS) is a software interface and database system that provides tools for describing, cataloging, retrieving, and viewing the visual, descriptive, and quantitative data associated with video.