show-attend-and-tell - TensorFlow Implementation of "Show, Attend and Tell"


Update (December 2, 2016) TensorFlow implementation of "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", which introduces an attention-based image caption generator. The model shifts its attention to the relevant part of the image as it generates each word. First, clone this repo and pycocoevalcap into the same directory.

https://github.com/yunjey/show-attend-and-tell
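The idea of attending to the relevant part of the image at each word can be illustrated with a minimal sketch of soft attention: relevance scores over image regions are turned into weights with a softmax, and the weights are used to average the region features into one context vector. This is not the repository's code; the function name `soft_attention`, the plain-list representation, and the assumption that the scores are already computed are all hypothetical simplifications.

```python
import math


def soft_attention(scores, features):
    """Return (weights, context) for one decoding step.

    scores:   one relevance score per image region (assumed given here).
    features: one feature vector (list of floats) per image region.
    """
    # Softmax over the scores, shifted by the max for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted average of the region features.
    dim = len(features[0])
    context = [
        sum(w * region[d] for w, region in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context
```

With equal scores every region gets the same weight, so the context vector is the plain average of the region features; a much larger score for one region pulls the context toward that region's features.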

Related Projects

neuraltalk2 - Efficient Image Captioning code in Torch, runs on GPU

  •    Jupyter

Update (September 22, 2016): The Google Brain team has released the image captioning model of Vinyals et al. (2015). The core model is very similar to NeuralTalk2 (a CNN followed by an RNN), but the Google release should work significantly better as a result of a better CNN, some tricks, and more careful engineering. Find it under the im2txt repo in tensorflow. I'll leave this code base up for educational purposes and as a Torch implementation. Recurrent Neural Network captions your images. Now much faster and better than the original NeuralTalk. Compared to the original NeuralTalk, this implementation is batched, uses Torch, runs on a GPU, and supports CNN finetuning. Together these give a large increase in training speed for the Language Model (~100x), but the overall speedup is smaller because a VGGNet also has to be run forward. Even so, very good models can be trained in 2-3 days, and they perform much better.

sketch-code - Keras model to generate HTML code from hand-drawn website mockups

  •    Python

SketchCode is a deep learning model that takes hand-drawn web mockups and converts them into working HTML code. It uses an image captioning architecture to generate its HTML markup from hand-drawn website wireframes. This project builds on the synthetically generated dataset and model architecture from pix2code by Tony Beltramelli and the Design Mockups project from Emil Wallner.

self-critical

  •    Python

This repository includes an unofficial implementation of Self-critical Sequence Training for Image Captioning and Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. (Skip this if you are using bottom-up features.) If you want to use ResNet to extract image features, you need to download a pretrained ResNet model for both training and evaluation. The models can be downloaded from here, and should be placed in data/imagenet_weights.

neural-image-assessment - Implementation of NIMA: Neural Image Assessment in Keras

  •    Python

Implementation of NIMA: Neural Image Assessment in Keras + Tensorflow with weights for MobileNet model trained on AVA dataset. NIMA assigns a Mean + Standard Deviation score to images, and can be used as a tool to automatically inspect quality of images or as a loss function to further improve the quality of generated images.
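As a rough illustration of the "Mean + Standard Deviation score" mentioned above, the sketch below computes both values from a predicted distribution over score buckets. It assumes the model outputs one probability per bucket and that buckets are numbered from 1 upward (for example 1 to 10); the function name `mean_and_std` is hypothetical and is not taken from the project.

```python
import math


def mean_and_std(probabilities):
    """Return (mean, std) of a distribution over score buckets 1..N."""
    # Normalise defensively in case the inputs do not sum exactly to 1.
    total = sum(probabilities)
    probs = [p / total for p in probabilities]

    # Bucket i (1-based) carries the score value i.
    scores = range(1, len(probs) + 1)
    mean = sum(s * p for s, p in zip(scores, probs))
    variance = sum(p * (s - mean) ** 2 for s, p in zip(scores, probs))
    return mean, math.sqrt(variance)
```

A distribution concentrated on a single bucket gives a standard deviation of zero, while a distribution spread across many buckets gives a larger one, which can be read as disagreement about the image's quality.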

Tello - 🐣 A simple and delightful way to track and manage TV shows.

  •    Javascript

I created Tello because I was sick of hunting for TV shows. I wanted a tool that would show me which of my favourite shows had new episodes. There are a lot of things Tello doesn't do. It doesn't tell you how to find the TV show, nor whether it's available on Netflix or Hulu. It doesn't recommend similar shows you may enjoy. It doesn't tell you what your friends are watching, or offer social integrations so you can discuss what you're watching.


tf-image-segmentation - Image Segmentation framework based on Tensorflow and TF-Slim library

  •    Python

So far, the framework contains an implementation of the FCN models (training and evaluation) in Tensorflow and TF-Slim library with training routine, reported accuracy, trained models for PASCAL VOC 2012 dataset. To train these models on your data, convert your dataset to tfrecords and follow the instructions below. The end goal is to provide utilities to convert other datasets, report accuracies on them and provide models.

KShowmail

  •    C++

New maintainer wanted! I'm looking for a new maintainer for this project because I'm not able to attend to it anymore. If you want to help, please write to kuddel-fl. KShowmail is a POP3 mail checker for KDE with these features: show the number, size, and other information about mails on POP3 servers in a list view; show the mail headers or complete mails; delete unwanted mail from the server using configurable filters.

neuralmonkey - An open-source tool for sequence learning in NLP built on TensorFlow.

  •    Python

The Neural Monkey package provides a higher level abstraction for sequential neural network models, most prominently in Natural Language Processing (NLP). It is built on TensorFlow. It can be used for fast prototyping of sequential models in NLP which can be used e.g. for neural machine translation or sentence classification. The higher-level API brings together a collection of standard building blocks (RNN encoder and decoder, multi-layer perceptron) and a simple way of adding new building blocks implemented directly in TensorFlow.

Mask_RCNN - Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

  •    Python

This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone. The code is documented and designed to be easy to extend. If you use it in your research, please consider citing this repository (bibtex below). If you work on 3D vision, you might find our recently released Matterport3D dataset useful as well. This dataset was created from 3D-reconstructed spaces captured by our customers who agreed to make them publicly available for academic use. You can see more examples here.

TensorBox - Object detection in TensorFlow. Brought to you by Kuprel Industries.

  •    Python

TensorBox is a project for training neural networks to detect objects in images. Training requires a json file (e.g. here) containing a list of images and the bounding boxes in each image. The basic model implements the simple and robust GoogLeNet-OverFeat algorithm with attention. Note that running on your own dataset should only require modifying the hypes/overfeat_rezoom.json file.

transformer - A TensorFlow Implementation of the Transformer: Attention Is All You Need

  •    Python

I tried to implement the idea in Attention Is All You Need. The authors claimed that their model, the Transformer, outperformed the state-of-the-art one in machine translation with only attention, no CNNs, no RNNs. How cool is that! At the end of the paper, they promise to make their code available soon, but apparently it is not available yet. I have two goals with this project. One is to gain a full understanding of the paper; it's often hard for me to get a good grasp of an idea before writing some code for it. The other is to share my code with people who are interested in this model before the official code is unveiled. I got a BLEU score of 17.14. (Recall that I trained with a small dataset and a limited vocabulary.) Some of the evaluation results are as follows. Details are available in the results folder.

tf-rnn-attention - Tensorflow implementation of attention mechanism for text classification tasks.

  •    Python

Tensorflow implementation of attention mechanism for text classification tasks. Inspired by "Hierarchical Attention Networks for Document Classification", Zichao Yang et al. (http://www.aclweb.org/anthology/N16-1174).

seq2seq - A general-purpose encoder-decoder framework for Tensorflow

  •    Python

A general-purpose encoder-decoder framework for Tensorflow that can be used for Machine Translation, Text Summarization, Conversational Modeling, Image Captioning, and more. The official code used for the Massive Exploration of Neural Machine Translation Architectures paper.

tensorflow-image-detection - A generic image detection program that uses Google's Machine Learning library, Tensorflow and a pre-trained Deep Learning Convolutional Neural Network model called Inception

  •    Python

A generic image detection program that uses Google's Machine Learning library, Tensorflow and a pre-trained Deep Learning Convolutional Neural Network model called Inception. This model has been pre-trained for the ImageNet Large Visual Recognition Challenge using the data from 2012, and it can differentiate between 1,000 different classes, like Dalmatian, dishwasher etc. The program applies Transfer Learning to this existing model and re-trains it to classify a new set of images.

ImageAI - A python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities

  •    Python

A Python library built to empower developers to build applications and systems with self-contained Deep Learning and Computer Vision capabilities using a few simple lines of code. Built with simplicity in mind, ImageAI supports a list of state-of-the-art Machine Learning algorithms for image prediction, custom image prediction, object detection, video detection, video object tracking, and image prediction training. ImageAI currently supports image prediction and training using 4 different Machine Learning algorithms trained on the ImageNet-1000 dataset. ImageAI also supports object detection, video detection, and object tracking using RetinaNet, YOLOv3, and TinyYOLOv3 trained on the COCO dataset. Eventually, ImageAI will provide support for wider and more specialized aspects of Computer Vision, including but not limited to image recognition in special environments and special fields.

ImSter - Image Steganographer

  •    Java

ImSter is a tool that lets you hide and view encrypted text inside images securely. Text is password encrypted using 256-bit AES and encoded into the pixels of the image themselves rather than any metadata. It is impossible for anyone to tell by eye that there is hidden content within an image.

DensePose - A real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body

  •    Jupyter

Dense human pose estimation aims at mapping all human pixels of an RGB image to the 3D surface of the human body. DensePose-RCNN is implemented in the Detectron framework and is powered by Caffe2. In this repository, we provide the code to train and evaluate DensePose-RCNN. We also provide notebooks to visualize the collected DensePose-COCO dataset and show the correspondences to the SMPL model.

moa - An image download extension of the image view written in Swift for iOS, tvOS and macOS.

  •    Swift

Moa is an image download library written in Swift. It lets you download and show an image in an image view by setting its moa.url property. 'Hunting Moa' drawing by Joseph Smit (1836-1929). File source: Wikimedia Commons.

Image viewer cum editor

  •    

This is a project on image viewing and editing. The project has the following features. Viewer: albums, password security for albums, a built-in browser, a mailing system, and basic image display options such as a slide show. Editor: various types of filters, image compression, and steganography.