ava_downloader - Script to download AVA database from the website (A Large-Scale Database for Aesthetic Visual Analysis)


The script is deprecated (your IP would be blocked by dpchallenge.com); please check the links below to download the dataset. 📝 The entire dataset has been split into 64 7z files. Download all of the 7z files, extract the first one, and it should work. The dataset is about 32GB and contains 255,500 picture files.




Related Projects

neural-image-assessment - Implementation of NIMA: Neural Image Assessment in Keras

  •    Python

Implementation of NIMA: Neural Image Assessment in Keras + Tensorflow, with weights for a MobileNet model trained on the AVA dataset. NIMA assigns a Mean + Standard Deviation score to images, and can be used as a tool to automatically inspect the quality of images or as a loss function to further improve the quality of generated images.
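As a rough illustration of what a Mean + Standard Deviation score means, the sketch below derives both values from a predicted rating distribution. It assumes the model outputs one probability per score bucket, with buckets numbered 1 to N (10 buckets is an assumption here, not something stated above); `score_stats` is a hypothetical helper name and not part of the project.

```python
import math


def score_stats(probs):
    """Return (mean, std) of a rating distribution over scores 1..len(probs).

    Assumption: probs[i] is the predicted probability of score i + 1.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalize so slightly unnormalized model outputs still work.
    p = [x / total for x in probs]
    scores = range(1, len(p) + 1)
    mean = sum(s * q for s, q in zip(scores, p))
    variance = sum(((s - mean) ** 2) * q for s, q in zip(scores, p))
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical prediction: most of the mass sits around scores 5 to 7.
    example = [0.0, 0.01, 0.04, 0.10, 0.25, 0.30, 0.20, 0.07, 0.02, 0.01]
    mean, std = score_stats(example)
    print(f"mean={mean:.2f} std={std:.2f}")
```

A high mean with a low standard deviation suggests raters would agree the image is good, while a high standard deviation suggests opinions would be split.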

ava - Futuristic JavaScript test runner

  •    Javascript

Even though JavaScript is single-threaded, IO in Node.js can happen in parallel due to its async nature. AVA takes advantage of this and runs your tests concurrently, which is especially beneficial for IO heavy tests. In addition, test files are run in parallel as separate processes, giving you even better performance and an isolated environment for each test file. Switching from Mocha to AVA in Pageres brought the test time down from 31 to 11 seconds. Having tests run concurrently forces you to write atomic tests, meaning tests don't depend on global state or the state of other tests, which is a great thing!

ImageAI - A python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities

  •    Python

A Python library built to empower developers to build applications and systems with self-contained Deep Learning and Computer Vision capabilities using a few simple lines of code. Built with simplicity in mind, ImageAI supports a list of state-of-the-art Machine Learning algorithms for image prediction, custom image prediction, object detection, video detection, video object tracking and image prediction training. ImageAI currently supports image prediction and training using 4 different Machine Learning algorithms trained on the ImageNet-1000 dataset. ImageAI also supports object detection, video detection and object tracking using RetinaNet, YOLOv3 and TinyYOLOv3 trained on the COCO dataset. Eventually, ImageAI will provide support for wider and more specialized aspects of Computer Vision, including but not limited to image recognition in special environments and special fields.

sod - An Embedded Computer Vision & Machine Learning Library (CPU Optimized & IoT Capable)

  •    C

SOD is an embedded, modern, cross-platform computer vision and machine learning software library that exposes a set of APIs for deep learning and advanced media analysis and processing, including real-time, multi-class object detection and model training on embedded systems with limited computational resources and on IoT devices. SOD was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in open source as well as commercial products.

CatPapers - Cool vision, learning, and graphics papers on Cats!

  •    HTML

As reported by Cisco, 90% of net traffic will be visual, and indeed, most of the visual data are cat photos and videos. Thus, understanding, modeling and synthesizing our feline friends becomes a more and more important research problem these days, especially for our cat lovers. Cat Paper Collection is an academic paper collection that includes computer graphics, computer vision, machine learning and human-computer interaction papers that produce experimental results related to cats. If you want to add/remove a paper, please send an email to Jun-Yan Zhu (junyanz at berkeley dot edu).

AdaptSegNet - Learning to Adapt Structured Output Space for Semantic Segmentation, CVPR 2018 (spotlight)

  •    Python

PyTorch implementation of our method for adapting semantic segmentation from the synthetic dataset (source domain) to the real dataset (target domain). Based on this implementation, our result is ranked 3rd in the VisDA Challenge. Paper: Learning to Adapt Structured Output Space for Semantic Segmentation. Yi-Hsuan Tsai*, Wei-Chih Hung*, Samuel Schulter, Kihyuk Sohn, Ming-Hsuan Yang and Manmohan Chandraker. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 (spotlight) (* indicates equal contribution).

OpenCV - Open Source Computer Vision

  •    C++

OpenCV (Open Source Computer Vision) is a library of programming functions for real-time computer vision. The library has more than 500 optimized algorithms. Its uses range from interactive art to mine inspection, stitching maps on the web, and advanced robotics.

OpenFace - A state-of-the-art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation

  •    C++

Over the past few years, there has been an increased interest in automatic facial behavior analysis and understanding. We present OpenFace – a tool intended for computer vision and machine learning researchers, the affective computing community and people interested in building interactive applications based on facial behavior analysis. OpenFace is the first toolkit capable of facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation with available source code for both running and training the models. The computer vision algorithms which represent the core of OpenFace demonstrate state-of-the-art results in all of the above-mentioned tasks. Furthermore, our tool is capable of real-time performance and is able to run from a simple webcam without any specialist hardware. OpenFace is an implementation of a number of research papers from the Multicomp group, Language Technologies Institute at Carnegie Mellon University and the Rainbow Group, Computer Laboratory, University of Cambridge. The founder of the project and main developer is Tadas Baltrušaitis.

Accord.NET - Machine learning, Computer vision, Statistics and general scientific computing for .NET

  •    CSharp

The Accord.NET project provides machine learning, statistics, artificial intelligence, computer vision and image processing methods to .NET. It can be used on Microsoft Windows, Xamarin, Unity3D, Windows Store applications, Linux or mobile.

PyVision - Computer Vision Toolkit

  •    Python

PyVision is an object-oriented Computer Vision Toolkit for researchers that contains vision and machine learning algorithms and algorithm analysis, and easily interfaces with scipy/numpy, PIL, opencv and other computer vision and machine learning libraries.

video-classification-3d-cnn-pytorch - Video classification tools using 3D ResNet

  •    Python

This is PyTorch code for video (action) classification using a 3D ResNet trained by this code. The 3D ResNet is trained on the Kinetics dataset, which includes 400 action classes. The code takes videos as input. In score mode, it outputs class names and predicted class scores for every 16 frames. In feature mode, it outputs 512-dimensional features (after global average pooling) for every 16 frames. A Torch (Lua) version of this code is available here.
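To make the "every 16 frames" granularity concrete, here is a minimal sketch of how a video's frame indices could be grouped into consecutive 16-frame clips. It assumes non-overlapping clips and that a trailing partial clip is dropped; both are assumptions, since the description above does not say how the project handles leftovers. `clip_ranges` is a hypothetical helper, not a function from the repository.

```python
def clip_ranges(num_frames, clip_len=16):
    """Return (start, end) frame index pairs for consecutive clips.

    end is exclusive. Assumption: clips do not overlap and a trailing
    partial clip (fewer than clip_len frames) is dropped.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    last_start = num_frames - clip_len
    return [(s, s + clip_len) for s in range(0, last_start + 1, clip_len)]


if __name__ == "__main__":
    # A 40-frame video yields two full clips; the last 8 frames are unused.
    for start, end in clip_ranges(40):
        print(f"clip covers frames {start}..{end - 1}")
```

Under these assumptions, each returned range would correspond to one set of class scores in score mode or one 512-dimensional feature vector in feature mode.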

AlphaPose - Multi-Person Pose Estimation System

  •    Jupyter

Alpha Pose is an accurate multi-person pose estimator, which is the first open-source system that achieves 70+ mAP (72.3 mAP) on COCO dataset and 80+ mAP (82.1 mAP) on MPII dataset. To match poses that correspond to the same person across frames, we also provide an efficient online pose tracker called Pose Flow. It is the first open-source online pose tracker that achieves both 60+ mAP (66.5 mAP) and 50+ MOTA (58.3 MOTA) on PoseTrack Challenge dataset. Note: Please read PoseFlow/README.md for details.

2D-and-3D-face-alignment - This repository implements a demo of the networks described in "How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)" paper

  •    Lua

This repository implements a demo of the networks described in "How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)" paper. Please visit our webpage or read below for instructions on how to run the code and access the dataset. Note: If you are interested in a binarized version, capable of running on devices with limited resources, please also check https://github.com/1adrianb/binary-face-alignment for a demo.

fashion-mnist - A MNIST-like fashion product database

  •    Python

Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.
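Because the dataset is meant as a drop-in replacement for MNIST, a reader written for MNIST's IDX image files should also work here. The sketch below parses such a file from raw bytes, assuming the standard IDX image layout (a big-endian header of magic number 2051, image count, rows, columns, followed by one unsigned byte per pixel). `parse_idx_images` is a hypothetical helper name, and the demo feeds it synthetic data rather than a real download.

```python
import struct


def parse_idx_images(buf):
    """Parse IDX image bytes into (images, rows, cols).

    Assumption: buf follows the MNIST IDX image layout, i.e. four
    big-endian uint32 header fields followed by raw pixel bytes.
    Each returned image is a bytes object of rows * cols pixels.
    """
    if len(buf) < 16:
        raise ValueError("buffer too short for an IDX image header")
    magic, count, rows, cols = struct.unpack(">IIII", buf[:16])
    if magic != 2051:
        raise ValueError(f"unexpected magic number {magic}")
    size = rows * cols
    pixels = buf[16:]
    if len(pixels) != count * size:
        raise ValueError("pixel data length does not match header")
    images = [pixels[i * size:(i + 1) * size] for i in range(count)]
    return images, rows, cols


if __name__ == "__main__":
    # Synthetic two-image file: one all-black and one all-white 28x28 image.
    fake = struct.pack(">IIII", 2051, 2, 28, 28) + bytes(784) + bytes([255]) * 784
    images, rows, cols = parse_idx_images(fake)
    print(len(images), rows, cols)
```

If the downloaded files are gzip-compressed, they would need to be decompressed first (for example with Python's `gzip` module) before being passed to a parser like this.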

soundnet - SoundNet: Learning Sound Representations from Unlabeled Video. NIPS 2016

  •    Lua

We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild. We leverage the natural synchronization between vision and sound to learn an acoustic representation using two million unlabeled videos. We propose a student-teacher training procedure which transfers discriminative visual knowledge from well-established visual models (e.g. ImageNet and PlacesCNN) into the sound modality using unlabeled video as a bridge. We provide pre-trained models that are trained over 2,000,000 unlabeled videos. You can download the 8-layer and 5-layer models here. We recommend the 8-layer network.

piwise - Pixel-wise segmentation on VOC2012 dataset using pytorch.

  •    Python

Pixel-wise segmentation on the VOC2012 dataset using PyTorch. For a more complete implementation of segmentation networks, check out semseg.

cvat - Computer Vision Annotation Tool (CVAT) is a web-based tool which helps to annotate video and images for Computer Vision algorithms

  •    Javascript

CVAT is a completely re-designed and re-implemented version of the Video Annotation Tool from Irvine, California. It is a free, online, interactive video and image annotation tool for computer vision. It is being used by our team to annotate millions of objects with different properties. Many UI and UX decisions are based on feedback from a professional data annotation team. Code released under the MIT License.

CoreAR - AR (augmented reality) framework for iOS, based on a visual code like ARToolKit

  •    C

CoreAR.framework is an open source AR framework. You can use it to make an AR application based on a visual code like ARToolKit. CoreAR.framework does not depend on other computer vision libraries such as OpenCV. For portability, the framework is written only in C or C++. The pixel array of an image containing a visual code is passed to CoreAR.framework, which then returns the visual code's identification number along with its rotation and translation matrix. The image processing speed of this framework is about 15 fps on iPhone4. Note that CoreAR.framework depends on Quartz Help Library and Real time image processing framework for iOS. You have to download these libraries and put them at the path where CoreAR.framework has been installed.

lip-reading-deeplearning - Lip Reading - Cross Audio-Visual Recognition using 3D Architectures

  •    Python

The input pipeline must be prepared by the users. This code aims to provide an implementation of Coupled 3D Convolutional Neural Networks for audio-visual matching. Lip-reading can be a specific application for this work. Audio-visual recognition (AVR) has been considered as a solution for speech recognition tasks when the audio is corrupted, as well as a visual recognition method used for speaker verification in multi-speaker scenarios. The approach of AVR systems is to leverage the extracted information from one modality to improve the recognition ability of the other modality by complementing the missing information.

OpenIMAJ - Open Intelligent Multimedia Analysis for Java

  •    Java

OpenIMAJ is an award-winning set of libraries and tools for multimedia (images, text, video, audio, etc.) content analysis and content generation. OpenIMAJ is very broad and contains everything from state-of-the-art computer vision (e.g. SIFT descriptors, salient region detection, face detection, etc.) and advanced data clustering, through to software that performs analysis on the content, layout and structure of webpages.