awd-lstm-lm - LSTM and QRNN Language Model Toolkit for PyTorch

The model can be composed of an LSTM or a Quasi-Recurrent Neural Network (QRNN) which is two or more times faster than the cuDNN LSTM in this setup while achieving equivalent or better accuracy. The codebase is now PyTorch 0.4 compatible for most use cases (a big shoutout to https://github.com/shawntan for a fairly comprehensive PR https://github.com/salesforce/awd-lstm-lm/pull/43). Mild readjustments to hyperparameters may be necessary to obtain quoted performance. If you desire exact reproducibility (or wish to run on PyTorch 0.3 or lower), we suggest using an older commit of this repository. We are still working on pointer, finetune and generate functionalities.
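
For readers new to language modelling, the following is a minimal, generic PyTorch sketch of a word-level LSTM language model. It only illustrates the kind of model this toolkit trains; it is not the repository's actual RNNModel class, and the layer sizes are placeholder values.

```python
import torch
import torch.nn as nn

class WordLSTMLanguageModel(nn.Module):
    """Minimal word-level LSTM language model: embed -> LSTM -> project to vocabulary."""
    def __init__(self, vocab_size, emb_dim=400, hidden_dim=1150, num_layers=3, dropout=0.4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers, dropout=dropout, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, hidden=None):
        # tokens: (batch, seq_len) word indices
        emb = self.embed(tokens)
        output, hidden = self.lstm(emb, hidden)
        logits = self.decoder(output)          # (batch, seq_len, vocab_size)
        return logits, hidden

# Usage: next-word cross-entropy on shifted targets.
model = WordLSTMLanguageModel(vocab_size=10000)
tokens = torch.randint(0, 10000, (8, 70))
logits, _ = model(tokens[:, :-1])
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 10000), tokens[:, 1:].reshape(-1))
```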

https://github.com/salesforce/awd-lstm-lm

Related Projects

pytorch-qrnn - PyTorch implementation of the Quasi-Recurrent Neural Network - up to 16 times faster than NVIDIA's cuDNN LSTM

  •    Python

Updated to support multi-GPU environments via DataParallel - see the multigpu_dataparallel.py example. This repository contains a PyTorch implementation of Salesforce Research's Quasi-Recurrent Neural Networks paper.
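
A rough sketch of the idea, assuming a QRNN module importable from torchqrnn with an nn.LSTM-like (seq_len, batch, features) interface; this is a hypothetical example, not the repository's multigpu_dataparallel.py.

```python
import torch
import torch.nn as nn
from torchqrnn import QRNN  # assumed import path and signature, per the repository README

class QRNNTagger(nn.Module):
    """Toy model: a QRNN encoder followed by a linear output layer."""
    def __init__(self, input_size=128, hidden_size=256, num_classes=5):
        super().__init__()
        # QRNN is assumed to mirror nn.LSTM's (seq_len, batch, features) interface.
        self.qrnn = QRNN(input_size, hidden_size, num_layers=2, dropout=0.4)
        self.out = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        output, _hidden = self.qrnn(x)
        return self.out(output)

# Replicate the model across visible GPUs; inputs are split along the batch dimension.
model = QRNNTagger().cuda()
model = nn.DataParallel(model, dim=1)  # dim=1 because the input layout is (seq_len, batch, features)
x = torch.randn(35, 64, 128).cuda()    # (seq_len, batch, input_size); requires a CUDA GPU
logits = model(x)
```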

Seq2Seq-PyTorch - Sequence to Sequence Models with PyTorch

  •    Python

A vanilla sequence to sequence model presented in https://arxiv.org/abs/1409.3215, https://arxiv.org/abs/1406.1078 consists of using a recurrent neural network such as an LSTM (http://dl.acm.org/citation.cfm?id=1246450) or GRU (https://arxiv.org/abs/1412.3555) to encode a sequence of words or characters in a source language into a fixed-length vector representation, and then decoding from that representation using another RNN in the target language. An extension of sequence to sequence models that incorporates an attention mechanism was presented in https://arxiv.org/abs/1409.0473; it uses information from the RNN hidden states in the source language at each time step of the decoder RNN. This attention mechanism significantly improves performance on tasks like machine translation. A few variants of the attention model for the task of machine translation have been presented in https://arxiv.org/abs/1508.04025.
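
As a concrete illustration of the encoder-decoder-with-attention idea, here is a simplified dot-product attention variant written in PyTorch; it is only a sketch, not this repository's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Encode a source token sequence into per-step hidden states plus a final summary."""
    def __init__(self, vocab, emb=256, hid=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.GRU(emb, hid, batch_first=True)

    def forward(self, src):
        states, last = self.rnn(self.embed(src))   # states: (B, S, H)
        return states, last

class AttnDecoder(nn.Module):
    """Decode target tokens, attending over encoder states with dot-product attention."""
    def __init__(self, vocab, emb=256, hid=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(2 * hid, vocab)

    def forward(self, tgt, enc_states, hidden):
        dec_states, hidden = self.rnn(self.embed(tgt), hidden)        # (B, T, H)
        scores = torch.bmm(dec_states, enc_states.transpose(1, 2))    # (B, T, S)
        weights = F.softmax(scores, dim=-1)                           # attention over source steps
        context = torch.bmm(weights, enc_states)                      # (B, T, H)
        logits = self.out(torch.cat([dec_states, context], dim=-1))   # (B, T, V)
        return logits, hidden

# Teacher-forced forward pass on random data.
enc, dec = Encoder(1000), AttnDecoder(1200)
src = torch.randint(0, 1000, (4, 12))
tgt = torch.randint(0, 1200, (4, 9))
enc_states, last = enc(src)
logits, _ = dec(tgt, enc_states, last)
```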

NCRFpp - NCRF++, an Open-source Neural Sequence Labeling Toolkit

  •    Python

Sequence labeling models are quite popular in many NLP tasks, such as Named Entity Recognition (NER), part-of-speech (POS) tagging and word segmentation. State-of-the-art sequence labeling models mostly use a CRF structure on top of input word features. An LSTM (or bidirectional LSTM) is a popular deep-learning-based feature extractor for sequence labeling, and a CNN can also be used for faster computation. In addition, sub-word features are useful for representing words; these can be captured by a character LSTM, a character CNN, or hand-crafted neural features. NCRF++ is a PyTorch-based framework with flexible choices of input features and output structures. The design of neural sequence labeling models with NCRF++ is fully configurable through a configuration file and requires no code changes. NCRF++ is a neural version of CRF++, a well-known statistical CRF framework.
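
To make the feature pipeline concrete, here is an illustrative PyTorch sketch (not NCRF++ code) of a character CNN combined with word embeddings feeding a bidirectional LSTM; in NCRF++ a CRF layer would normally sit on top of the per-token scores.

```python
import torch
import torch.nn as nn

class CharCNNWordBiLSTM(nn.Module):
    """Illustrative feature extractor: a character CNN builds sub-word features that are
    concatenated with word embeddings and fed to a bidirectional LSTM."""
    def __init__(self, word_vocab, char_vocab, num_labels,
                 word_dim=100, char_dim=30, char_filters=50, hidden=200):
        super().__init__()
        self.word_embed = nn.Embedding(word_vocab, word_dim)
        self.char_embed = nn.Embedding(char_vocab, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(word_dim + char_filters, hidden, bidirectional=True, batch_first=True)
        self.emission = nn.Linear(2 * hidden, num_labels)

    def forward(self, words, chars):
        # words: (B, T); chars: (B, T, L) character indices per token
        B, T, L = chars.shape
        ch = self.char_embed(chars).view(B * T, L, -1).transpose(1, 2)     # (B*T, char_dim, L)
        ch = torch.relu(self.char_cnn(ch)).max(dim=2).values.view(B, T, -1)
        feats = torch.cat([self.word_embed(words), ch], dim=-1)
        out, _ = self.bilstm(feats)
        return self.emission(out)   # per-token label scores, usable as CRF emissions

scores = CharCNNWordBiLSTM(5000, 80, 9)(torch.randint(0, 5000, (2, 15)),
                                        torch.randint(0, 80, (2, 15, 12)))
```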

char-rnn - Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch

  •    Lua

This code implements multi-layer Recurrent Neural Networks (RNN, LSTM, and GRU) for training/sampling from character-level language models. In other words, the model takes one text file as input and trains a Recurrent Neural Network that learns to predict the next character in a sequence. The RNN can then be used to generate text character by character that will look like the original training data. The context of this code base is described in detail in my blog post. If you are new to Torch/Lua/Neural Nets, it might be helpful to know that this code is really just a slightly fancier version of this 100-line gist that I wrote in Python/numpy. The code in this repo additionally: allows for multiple layers, uses an LSTM instead of a vanilla RNN, has more supporting code for model checkpointing, and is of course much more efficient since it uses mini-batches and can run on a GPU.
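
The core sampling idea can be sketched in a few lines of Python/numpy; this is a toy illustration with untrained weights, not the gist or this repository's Torch code.

```python
import numpy as np

# A vanilla RNN with random weights that emits one character index at a time.
vocab_size, hidden_size = 65, 128
rng = np.random.default_rng(0)
Wxh = rng.normal(0, 0.01, (hidden_size, vocab_size))
Whh = rng.normal(0, 0.01, (hidden_size, hidden_size))
Why = rng.normal(0, 0.01, (vocab_size, hidden_size))
bh, by = np.zeros(hidden_size), np.zeros(vocab_size)

def sample(seed_ix, n):
    """Feed the previous character back in and sample the next one, n times."""
    h = np.zeros(hidden_size)
    x = np.zeros(vocab_size); x[seed_ix] = 1
    out = []
    for _ in range(n):
        h = np.tanh(Wxh @ x + Whh @ h + bh)        # recurrent state update
        p = np.exp(Why @ h + by); p /= p.sum()      # softmax over characters
        ix = rng.choice(vocab_size, p=p)
        x = np.zeros(vocab_size); x[ix] = 1
        out.append(ix)
    return out

print(sample(seed_ix=0, n=20))   # with trained weights these indices map back to characters
```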

pytorch-openai-transformer-lm - A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI

  •    Python

This is a PyTorch implementation of the TensorFlow code provided with OpenAI's paper "Improving Language Understanding by Generative Pre-Training" by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. The implementation includes a script to load into the PyTorch model the weights pre-trained by the authors with the TensorFlow implementation.


torch-light - Deep-learning by using Pytorch

  •    Python

This repository includes basic and advanced examples for deep learning with PyTorch. The basics (simple networks such as logistic regression, CNN, RNN, and LSTM) are implemented in a few lines of code, while the advanced examples use more complex models. It is best to finish the official PyTorch tutorial before working through this.

theano_lstm - :microscope: Nano size Theano LSTM module

  •    Python

Implements most of the great things that came out in 2014 concerning recurrent neural networks, along with some good optimizers for these types of networks. The module also contains the SGD, AdaGrad, and AdaDelta gradient descent methods, which are constructed from an objective function and a set of Theano variables and return an updates dictionary to pass to a Theano function.
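
The updates-dictionary pattern referred to above is the standard Theano idiom shown below (plain SGD here; the module's own optimizers construct richer update rules, and the exact theano_lstm API may differ).

```python
import numpy as np
import theano
import theano.tensor as T

# Symbolic inputs and shared parameters for a tiny linear regression.
x = T.matrix('x')
y = T.vector('y')
w = theano.shared(np.zeros(3, dtype=theano.config.floatX), name='w')
b = theano.shared(np.asarray(0.0, dtype=theano.config.floatX), name='b')

prediction = T.dot(x, w) + b
cost = T.mean((prediction - y) ** 2)

lr = 0.01
params = [w, b]
# Each entry maps a parameter to its new value; optimizers return such a collection.
updates = [(p, p - lr * T.grad(cost, p)) for p in params]

# The updates are handed to theano.function; each call applies one SGD step.
train_step = theano.function(inputs=[x, y], outputs=cost, updates=updates)
```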

word-rnn-tensorflow - Multi-layer Recurrent Neural Networks (LSTM, RNN) for word-level language models in Python using TensorFlow

  •    Python

Multi-layer Recurrent Neural Networks (LSTM, RNN) for word-level language models in Python using TensorFlow. Mostly reuses code from https://github.com/sherjilozair/char-rnn-tensorflow, which was inspired by Andrej Karpathy's char-rnn.

clstm - A small C++ implementation of LSTM networks, focused on OCR.

  •    Jupyter

CLSTM is an implementation of the LSTM recurrent neural network model in C++, using the Eigen library for numerical computations. CLSTM is mainly in maintenance mode now. It was created at a time when there weren't a lot of good LSTM implementations around, but several good options have become available over the last year. Nevertheless, if you need a small library for text line recognition with few dependencies, CLSTM is still a good option.

paraphrase-id-tensorflow - Various models and code (Manhattan LSTM, Siamese LSTM + Matching Layer, BiMPM) for the paraphrase identification task, specifically with the Quora Question Pairs dataset

  •    Python

Various models and code for paraphrase identification implemented in Tensorflow (1.1.0), including a basic Siamese LSTM baseline loosely based on the model in Mueller, Jonas and Aditya Thyagarajan, "Siamese Recurrent Architectures for Learning Sentence Similarity," AAAI (2016).
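
The Manhattan-LSTM idea behind that baseline can be sketched as follows (illustrative PyTorch code, not the repository's TensorFlow implementation): a single shared LSTM encodes both sentences and similarity is exp(-||h1 - h2||_1), which lies in [0, 1].

```python
import torch
import torch.nn as nn

class SiameseLSTM(nn.Module):
    """Sketch of the Manhattan-LSTM baseline: one shared LSTM encodes both inputs."""
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)  # weights shared across both inputs

    def encode(self, tokens):
        _, (h, _) = self.lstm(self.embed(tokens))
        return h[-1]                                   # final hidden state, (B, hidden_dim)

    def forward(self, sent_a, sent_b):
        h_a, h_b = self.encode(sent_a), self.encode(sent_b)
        return torch.exp(-torch.sum(torch.abs(h_a - h_b), dim=1))   # Manhattan similarity

model = SiameseLSTM(vocab_size=20000)
a = torch.randint(0, 20000, (4, 25))
b = torch.randint(0, 20000, (4, 25))
similarity = model(a, b)   # one similarity score per question pair
```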

fairseq - Facebook AI Research Sequence-to-Sequence Toolkit

  •    Lua

This is fairseq, a sequence-to-sequence learning toolkit for Torch from Facebook AI Research tailored to Neural Machine Translation (NMT). It implements the convolutional NMT models proposed in Convolutional Sequence to Sequence Learning and A Convolutional Encoder Model for Neural Machine Translation, as well as a standard LSTM-based model. It features multi-GPU training on a single machine as well as fast beam search generation on both CPU and GPU. We provide pre-trained models for English to French, English to German and English to Romanian translation. Note that there is now a PyTorch version of this toolkit, fairseq-py, and new development efforts will focus on it.

tensorflow-lstm-regression - Sequence prediction using recurrent neural networks(LSTM) with TensorFlow

  •    Jupyter

The objective is to predict continuous values, sin and cos functions in this example, based on previous observations using the LSTM architecture. This example has been updated with a new version compatible with tensorflow-1.1.0. The new version uses the polyaxon library, which provides an API to create deep learning models and experiments based on TensorFlow.
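
The underlying task can be sketched without polyaxon as follows: a hypothetical plain-PyTorch illustration of windowing a sine series and regressing the next value, not the repository's code.

```python
import numpy as np
import torch
import torch.nn as nn

def make_windows(series, window=25):
    """Turn a 1-D series into (previous window, next value) training pairs."""
    xs = np.stack([series[i:i + window] for i in range(len(series) - window)])
    ys = series[window:]
    return (torch.tensor(xs, dtype=torch.float32).unsqueeze(-1),   # (N, window, 1)
            torch.tensor(ys, dtype=torch.float32).unsqueeze(-1))   # (N, 1)

class LSTMRegressor(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1])    # regress from the last time step

series = np.sin(np.linspace(0, 30 * np.pi, 3000))
x, y = make_windows(series)
model = LSTMRegressor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):                       # a few full-batch demonstration steps
    opt.zero_grad()
    loss = nn.MSELoss()(model(x), y)
    loss.backward()
    opt.step()
```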

seq2seq-attn - Sequence-to-sequence model with LSTM encoder/decoders and attention

  •    Lua

UPDATE: Check out the beta release of OpenNMT, a fully supported, feature-complete rewrite of seq2seq-attn. Seq2seq-attn will remain supported, but new features and optimizations will focus on the new codebase. This is a Torch implementation of a standard sequence-to-sequence model with (optional) attention where the encoder and decoder are LSTMs. The encoder can be a bidirectional LSTM. It additionally has the option to use characters (instead of input word embeddings) by running a convolutional neural network followed by a highway network over character embeddings to use as inputs.
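
The character-aware input path it describes can be sketched as follows (illustrative PyTorch code, not the repository's Torch/Lua implementation): a CNN over character embeddings, max-pooled per word, followed by a highway layer.

```python
import torch
import torch.nn as nn

class CharCNNHighway(nn.Module):
    """Sketch of a character-aware word input: char CNN + max-pooling + one highway layer."""
    def __init__(self, char_vocab, char_dim=15, num_filters=100, kernel=3):
        super().__init__()
        self.char_embed = nn.Embedding(char_vocab, char_dim)
        self.conv = nn.Conv1d(char_dim, num_filters, kernel, padding=1)
        self.transform = nn.Linear(num_filters, num_filters)
        self.gate = nn.Linear(num_filters, num_filters)

    def forward(self, chars):
        # chars: (batch, word_len) character indices, one word per row
        x = self.char_embed(chars).transpose(1, 2)          # (B, char_dim, word_len)
        x = torch.relu(self.conv(x)).max(dim=2).values      # (B, num_filters)
        t = torch.sigmoid(self.gate(x))                     # highway gate
        return t * torch.relu(self.transform(x)) + (1 - t) * x

word_vec = CharCNNHighway(char_vocab=100)(torch.randint(0, 100, (32, 20)))
```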

sketch-rnn - Multilayer LSTM and Mixture Density Network for modelling path-level SVG Vector Graphics data in TensorFlow

  •    Python

This version of sketch-rnn has been deprecated. Please see the updated version of sketch-rnn, which is a full generative model for vector drawings. This is an implementation of a multi-layer recurrent neural network (RNN, LSTM, GRU) used to model and generate sketches stored in .svg vector graphic files. The methodology is to combine Mixture Density Networks with an RNN, along with modelling dynamic end-of-stroke and end-of-content probabilities learned from a large corpus of similar .svg files, to generate drawings that are similar to the vector training data.
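
The mixture-density output idea can be sketched as follows: a simplified, hypothetical PyTorch head that turns an RNN hidden state into mixture parameters plus a stroke-end probability, not the repository's exact parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDNHead(nn.Module):
    """From an RNN hidden state, predict K mixture weights, 2-D means and standard
    deviations for the next pen offset, plus an end-of-stroke probability."""
    def __init__(self, hidden_dim=256, num_mixtures=20):
        super().__init__()
        self.K = num_mixtures
        # pi, mu_x, mu_y, log_sigma_x, log_sigma_y per mixture, plus one end-of-stroke logit
        self.params = nn.Linear(hidden_dim, num_mixtures * 5 + 1)

    def forward(self, h):
        out = self.params(h)
        pi, mu_x, mu_y, log_sx, log_sy = torch.split(out[:, :-1], self.K, dim=1)
        return (F.softmax(pi, dim=1),                   # mixture weights
                mu_x, mu_y,                             # component means
                torch.exp(log_sx), torch.exp(log_sy),   # positive standard deviations
                torch.sigmoid(out[:, -1]))              # end-of-stroke probability

pi, mu_x, mu_y, sx, sy, eos = MDNHead()(torch.randn(8, 256))
```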

char-rnn-tensorflow - Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow

  •    Python

Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow. Inspired by Andrej Karpathy's char-rnn.

practical6 - Practical 6: LSTM language models

  •    Lua

In this practical, we train an LSTM for character-level language modelling. Since this is the last week for practicals, it is extremely short and does not require writing code; it is due by the end of the Friday session (regardless of whether you are from the Wednesday or Friday session). See the PDF for details.

LSTM-Human-Activity-Recognition - Human Activity Recognition example using TensorFlow on smartphone sensors dataset and an LSTM RNN (Deep Learning algo)

  •    Jupyter

Compared to a classical approach, using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells requires little or no feature engineering. Data can be fed directly into the neural network, which acts like a black box that models the problem correctly. Other research on this activity recognition dataset uses a large amount of feature engineering, which is more of a signal processing approach combined with classical data science techniques. The approach here is very simple in terms of how much the data was preprocessed. Let's use Google's neat deep learning library, TensorFlow, to demonstrate the usage of an LSTM, a type of artificial neural network that can process sequential data / time series.

LatticeLSTM - Chinese NER using Lattice LSTM. Code for ACL 2018 paper.

  •    Python

Lattice LSTM for Chinese NER: a character-based LSTM with lattice embeddings as input. Models and results can be found in our ACL 2018 paper Chinese NER Using Lattice LSTM. It achieves a 93.18% F1-value on the MSRA dataset, which is the state-of-the-art result on the Chinese NER task.

caffe-lstm - LSTM implementation on Caffe

  •    C++

Note that the master branch of Caffe now supports LSTM (Jeff Donahue's implementation has been merged), and this repo is no longer maintained. Jeff's code is more modularized, whereas this code is optimized for LSTM: it computes the gradient w.r.t. the recurrent weights with a single matrix computation.

nlpcaffe - natural language processing with Caffe

  •    C++

NLP-Caffe is a pull request [1] on the Caffe framework developed by Yangqing Jia and Evan Shelhamer, among other members of the BVLC lab at Berkeley and a large number of independent online contributors. This fork makes it easier for NLP users to get started without merging C++ code. The current example constructs a language model for a small subset of Google's Billion Word corpus. It uses a two-layer LSTM architecture that processes in excess of 15,000 words per second [2] and achieves a perplexity of 79. More examples for Machine Translation using the encoder-decoder model and for character-level RNNs are in the works. This code will eventually be merged into the Caffe master branch. This work was funded by the Stanford NLP Group, under the guidance of Chris Manning, and with the invaluable expertise of Thang Luong.