efficient_softmax - BlackOut and Adaptive Softmax for language models by Chainer

  •        3

Implementations of BlackOut and Adaptive Softmax for efficiently computing the word distribution in language models with very large vocabularies. The LSTM language models are derived from rnnlm_chainer.




Related Projects

adaptive-softmax - Implements an efficient softmax approximation as described in the paper "Efficient softmax approximation for GPUs" (http://arxiv

  •    Lua

The adaptive-softmax project is a Torch implementation of the efficient softmax approximation for graphical processing units (GPU), described in the paper "Efficient softmax approximation for GPUs" (http://arxiv.org/abs/1609.04309). This method is useful for training language models with large vocabularies. We provide a script to train large recurrent neural network language models, in order to reproduce the results of the paper.

faster-rnnlm - Faster Recurrent Neural Network Language Modeling Toolkit with Noise Contrastive Estimation and Hierarchical Softmax

  •    C++

In a nutshell, the goal of this project is to create an rnnlm implementation that can be trained on huge datasets (several billion words) and very large vocabularies (several hundred thousand words) and used in real-world ASR and MT problems. In addition, to achieve better results this implementation supports such praised setups as ReLU+DiagonalInitialization [1], GRU [2], NCE [3], and RMSProp [4]. How fast is it? Well, on the One Billion Word Benchmark [8] and a 3.3GHz CPU, the program with standard parameters (sigmoid hidden layer of size 256 and hierarchical softmax) processes more than 250k words per second in 8 threads, i.e. 15 million words per minute. As a result, an epoch takes less than one hour. Check the Experiments section for more numbers and figures.

LargeMargin_Softmax_Loss - Implementation for <Large-Margin Softmax Loss for Convolutional Neural Networks> in ICML'16

  •    C++

We introduce a large-margin softmax (L-Softmax) loss for convolutional neural networks. L-Softmax loss can greatly improve the generalization ability of CNNs, so it is very suitable for general classification, feature embedding and biometrics (e.g. face) verification. We give the 2D feature visualization on MNIST to illustrate our L-Softmax loss. The paper is published in ICML 2016 and also available at arXiv.

keras-quora-question-pairs - A Keras model that addresses the Quora Question Pairs dyadic prediction task

  •    Jupyter

A Keras model that addresses the Quora Question Pairs [1] dyadic prediction task. The model architecture is based on the Stanford Natural Language Inference [2] benchmark model developed by Stephen Merity [3], specifically the version using a simple summation of GloVe word embeddings [4] to represent each question in the pair. A difference between this and the Merity SNLI benchmark is that our final layer is Dense with sigmoid activation, as opposed to softmax. Another key difference is that we are using the max operator as opposed to sum to combine word embeddings into a question representation. We use binary cross-entropy as a loss function and Adam for optimization.
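
The pipeline described above (max over word embeddings, a sigmoid output, binary cross-entropy) can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the toy 2-dimensional embeddings and the weights are hypothetical, and the sketch omits any hidden layers the actual Keras model may have.

```python
import math

def max_pool(embeddings):
    """Element-wise max over a list of equal-length word vectors."""
    return [max(column) for column in zip(*embeddings)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_duplicate(q1_vectors, q2_vectors, weights, bias):
    """Probability that two questions are duplicates (single dense unit)."""
    # Each question becomes one vector; the two vectors are concatenated.
    features = max_pool(q1_vectors) + max_pool(q2_vectors)
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(logit)

def binary_cross_entropy(label, prob, eps=1e-12):
    """Loss for one example; prob is clipped to avoid log(0)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

# Toy data (hypothetical): two words per question, 2-dimensional embeddings.
q1 = [[0.1, 0.9], [0.4, 0.2]]
q2 = [[0.3, 0.8], [0.5, 0.1]]
weights = [0.5, -0.25, 0.5, -0.25]
prob = predict_duplicate(q1, q2, weights, bias=0.0)
loss = binary_cross_entropy(1, prob)
print(prob, loss)
```

Using max instead of sum means each dimension of the question vector reflects the single strongest word for that feature, regardless of question length.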

sphereface - Implementation for <SphereFace: Deep Hypersphere Embedding for Face Recognition> in CVPR'17

  •    Jupyter

SphereFace is released under the MIT License (refer to the LICENSE file for details). 2018.8.14: We recommend an interesting ECCV 2018 paper that comprehensively evaluates SphereFace (A-Softmax) on current widely used face datasets and their proposed noise-controlled IMDb-Face dataset. Interested users can try to train SphereFace on their IMDb-Face dataset. Take a look here.

grt - gesture recognition toolkit

  •    C++

The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source, C++ machine learning library designed for real-time gesture recognition. Classification: Adaboost, Decision Tree, Dynamic Time Warping, Gaussian Mixture Models, Hidden Markov Models, k-nearest neighbor, Naive Bayes, Random Forests, Support Vector Machine, Softmax, and more...

char-rnn - Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch

  •    Lua

This code implements multi-layer Recurrent Neural Network (RNN, LSTM, and GRU) for training/sampling from character-level language models. In other words the model takes one text file as input and trains a Recurrent Neural Network that learns to predict the next character in a sequence. The RNN can then be used to generate text character by character that will look like the original training data. The context of this code base is described in detail in my blog post. If you are new to Torch/Lua/Neural Nets, it might be helpful to know that this code is really just a slightly more fancy version of this 100-line gist that I wrote in Python/numpy. The code in this repo additionally: allows for multiple layers, uses an LSTM instead of a vanilla RNN, has more supporting code for model checkpointing, and is of course much more efficient since it uses mini-batches and can run on a GPU.

ConvNetJS - Javascript implementation of Neural networks

  •    Javascript

ConvNetJS is a Javascript implementation of neural networks. It currently supports common neural network modules, classification (SVM/Softmax) and regression (L2) cost functions, a MagicNet class for fully automatic neural network learning (automatic hyperparameter search and cross-validation), the ability to specify and train convolutional networks that process images, and an experimental reinforcement learning module based on Deep Q Learning.

recurrentjs - Deep Recurrent Neural Networks and LSTMs in Javascript

  •    HTML

You'll notice that the Softmax and so on isn't folded very neatly into the library yet and you have to understand backpropagation. I'll fix this soon. This code works fine, but it's a bit rough around the edges - you have to understand Neural Nets well if you want to use it and it isn't beautifully modularized. I thought I would still make the code available now and work on polishing it further later, since I hope that even in this state it can be useful to others who may want to browse around and get their feet wet with training these models or learning about them.

convnet-benchmarks - Easy benchmarking of all publicly accessible implementations of convnets

  •    Python

Easy benchmarking of all public open-source implementations of convnets. A summary is provided in the section below. I pick some popular imagenet models, and I clock the time for a full forward + backward pass. I average my times over 10 runs. I ignored dropout and softmax layers.
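
The timing procedure described above (clock a full forward + backward pass, average over 10 runs) might look roughly like the following sketch. The `average_time` helper and the `fake_forward_backward` workload are hypothetical stand-ins, not code from the convnet-benchmarks repository.

```python
import time

def average_time(step, runs=10):
    """Return the mean wall-clock seconds of calling step() `runs` times."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        step()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

def fake_forward_backward():
    # Hypothetical stand-in for a model's forward + backward pass.
    sum(i * i for i in range(10_000))

mean_seconds = average_time(fake_forward_backward, runs=10)
print(f"mean over 10 runs: {mean_seconds:.6f} s")
```

When the work runs on a GPU, the device generally has to be synchronized before each clock reading, otherwise only the kernel launch time is measured.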

semi-supervised-pytorch - Implementations of different VAE-based semi-supervised and generative models in PyTorch

  •    Python

A PyTorch-based package containing useful models for modern deep semi-supervised learning and deep generative models. Want to jump right into it? Look into the notebooks. 2018.04.17 - The Gumbel softmax notebook has been added to show how you can use discrete latent variables in VAEs. 2018.02.28 - The β-VAE notebook was added to show how VAEs can learn disentangled representations.
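
For readers unfamiliar with the Gumbel softmax mentioned above, here is a minimal standard-library sketch of drawing one relaxed one-hot sample. It is not taken from the package's notebook; the function name, the default temperature and the example logits are assumptions made for illustration.

```python
import math
import random

def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Draw one relaxed one-hot sample from unnormalized log-probabilities."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)) with u ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

sample = gumbel_softmax_sample([1.0, 0.5, -1.0], temperature=0.5,
                               rng=random.Random(0))
print(sample)
```

Lower temperatures push the sample closer to a one-hot vector, while higher temperatures make it closer to uniform.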

Person_reID_baseline_pytorch - Pytorch implement of Person re-identification baseline

  •    Python

Baseline Code (with bottleneck) for Person-reID (pytorch). It is consistent with the new baseline result in Beyond Part Models: Person Retrieval with Refined Part Pooling and Camera Style Adaptation for Person Re-identification. We reached Rank@1=88.24% and mAP=70.68% with softmax loss alone.

AMSoftmax - A simple yet effective loss function for face verification.

  •    Matlab

The paper is available as a technical report at arXiv. In this work, we design a new loss function which merges the merits of both NormFace and SphereFace. It is much easier to understand and train, and outperforms the previous state-of-the-art loss function (SphereFace) by 2-5% on MegaFace.

Seq2Seq-PyTorch - Sequence to Sequence Models with PyTorch

  •    Python

A vanilla sequence to sequence model, as presented in https://arxiv.org/abs/1409.3215 and https://arxiv.org/abs/1406.1078, consists of using a recurrent neural network such as an LSTM (http://dl.acm.org/citation.cfm?id=1246450) or GRU (https://arxiv.org/abs/1412.3555) to encode a sequence of words or characters in a source language into a fixed-length vector representation and then decoding from that representation using another RNN in the target language. An extension of sequence to sequence models that incorporates an attention mechanism was presented in https://arxiv.org/abs/1409.0473; it uses information from the RNN hidden states in the source language at each time step in the decoder RNN. This attention mechanism significantly improves performance on tasks like machine translation. A few variants of the attention model for the task of machine translation have been presented in https://arxiv.org/abs/1508.04025.
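
To make the attention step concrete, the sketch below scores each encoder hidden state against the current decoder state and returns a weighted sum as the context vector. It uses simple dot-product scoring as an assumption for illustration; it is not the repository's code, and the function names and toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return (attention weights, context vector) for one decoder step."""
    # Dot-product score between the decoder state and each encoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, h))
              for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy example: three source positions with 2-dimensional hidden states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([2.0, 0.0], encoder_states)
print(weights, context)
```

The context vector is recomputed at every decoder time step, so the decoder is no longer limited to a single fixed-length summary of the source sequence.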

neuraltalk2 - Efficient Image Captioning code in Torch, runs on GPU

  •    Jupyter

Update (September 22, 2016): The Google Brain team has released the image captioning model of Vinyals et al. (2015). The core model is very similar to NeuralTalk2 (a CNN followed by RNN), but the Google release should work significantly better as a result of better CNN, some tricks, and more careful engineering. Find it under im2txt repo in tensorflow. I'll leave this code base up for educational purposes and as a Torch implementation. Recurrent Neural Network captions your images. Now much faster and better than the original NeuralTalk. Compared to the original NeuralTalk this implementation is batched, uses Torch, runs on a GPU, and supports CNN finetuning. All of these together result in quite a large increase in training speed for the Language Model (~100x), but overall not as much because we also have to forward a VGGNet. However, overall very good models can be trained in 2-3 days, and they show a much better performance.

torch-rnn - Efficient, reusable RNNs and LSTMs for torch

  •    Lua

torch-rnn provides high-performance, reusable RNN and LSTM modules for torch7, and uses these modules for character-level language modeling similar to char-rnn. You can find documentation for the RNN and LSTM modules here; they have no dependencies other than torch and nn, so they should be easy to integrate into existing projects.

rnn-tutorial-rnnlm - Recurrent Neural Network Tutorial, Part 2 - Implementing a RNN in Python and Theano

  •    Jupyter

To start a public notebook server that is accessible over the network you can follow the official instructions.

rwa - Machine Learning on Sequential Data Using a Recurrent Weighted Average

  •    Python

This repository holds the code to a new kind of RNN model for processing sequential data. The model computes a recurrent weighted average (RWA) over every previous processing step. With this approach, the model can form direct connections anywhere along a sequence. This stands in contrast to traditional RNN architectures that only use the previous processing step. A detailed description of the RWA model has been published in a manuscript at https://arxiv.org/pdf/1703.01253.pdf. Because the RWA can be computed as a running average, it does not need to be completely recomputed with each processing step. The numerator and denominator can be saved from the previous step. Consequently, the model scales like that of other RNN models such as the LSTM model.
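
The running-average argument above can be illustrated with a simplified scalar sketch: the numerator and denominator are carried from step to step, so each new step costs constant work. This is an assumption-laden toy, not the RWA model itself; it leaves out the learned weights, the activation function and any numerical stabilization, and the function names are hypothetical.

```python
import math

def running_weighted_average(values, scores):
    """Weighted average after every step, using saved numerator/denominator."""
    numerator = 0.0
    denominator = 0.0
    averages = []
    for z, a in zip(values, scores):
        weight = math.exp(a)          # unnormalized attention weight
        numerator += z * weight       # updated in O(1) from the previous step
        denominator += weight
        averages.append(numerator / denominator)
    return averages

def recomputed_weighted_average(values, scores, t):
    """Same quantity at step t, recomputed from scratch for comparison."""
    weights = [math.exp(a) for a in scores[: t + 1]]
    return sum(z * w for z, w in zip(values, weights)) / sum(weights)

values = [1.0, 2.0, 4.0]
scores = [0.0, 0.0, math.log(2.0)]
averages = running_weighted_average(values, scores)
print(averages)
```

Both functions give the same result at every step; only the incremental version avoids revisiting earlier steps.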

word-rnn-tensorflow - Multi-layer Recurrent Neural Networks (LSTM, RNN) for word-level language models in Python using TensorFlow

  •    Python

Multi-layer Recurrent Neural Networks (LSTM, RNN) for word-level language models in Python using TensorFlow. Mostly reused code from https://github.com/sherjilozair/char-rnn-tensorflow, which was inspired by Andrej Karpathy's char-rnn.