
awd-lstm-lm - LSTM and QRNN Language Model Toolkit for PyTorch

  •    Python

The model can be composed of an LSTM or a Quasi-Recurrent Neural Network (QRNN) which is two or more times faster than the cuDNN LSTM in this setup while achieving equivalent or better accuracy. The codebase is now PyTorch 0.4 compatible for most use cases (a big shoutout to https://github.com/shawntan for a fairly comprehensive PR https://github.com/salesforce/awd-lstm-lm/pull/43). Mild readjustments to hyperparameters may be necessary to obtain quoted performance. If you desire exact reproducibility (or wish to run on PyTorch 0.3 or lower), we suggest using an older commit of this repository. We are still working on pointer, finetune and generate functionalities.
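
For orientation, here is a minimal sketch of the embedding -> stacked LSTM -> tied decoder shape such a word-level language model is built around. It is plain PyTorch for illustration only, not the repository's model, which adds weight-dropped LSTMs, embedding dropout, the QRNN option and the other AWD regularization tricks.

    # Illustrative stacked-LSTM language model skeleton (not the AWD-LSTM code itself).
    import torch
    import torch.nn as nn

    class TinyRNNLM(nn.Module):
        def __init__(self, vocab_size, emb_dim=400, hidden_dim=400, layers=3):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.LSTM(emb_dim, hidden_dim, num_layers=layers, batch_first=True)
            self.decoder = nn.Linear(hidden_dim, vocab_size)
            self.decoder.weight = self.embed.weight   # weight tying, which the toolkit also supports

        def forward(self, tokens, hidden=None):
            output, hidden = self.rnn(self.embed(tokens), hidden)
            return self.decoder(output), hidden       # logits over the vocabulary

    model = TinyRNNLM(vocab_size=10000)
    logits, _ = model(torch.randint(0, 10000, (1, 35)))   # (batch, seq, vocab)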

transformers - 🤗Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX

  •    Python

🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation and more in over 100 languages. Its aim is to make cutting-edge NLP easier to use for everyone. 🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our model hub. At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
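
As a quick illustration of the API described above, the pipeline helper downloads a pretrained model and runs it on raw text in a few lines (the default model it picks can change between library versions):

    # Minimal pipeline usage; the task name and example text are illustrative.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")
    print(classifier("Transformers makes state-of-the-art NLP easy to use."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]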

spago - Self-contained Machine Learning and Natural Language Processing library in Go

  •    Go

A Machine Learning library written in pure Go, designed to support relevant neural architectures in Natural Language Processing. spaGO is self-contained, in that it uses its own lightweight computational graph framework for both training and inference, which makes it easy to understand from start to finish.




Haystack - Build a natural language interface for your data

  •    Python

Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want to perform question answering or semantic document search, you can use state-of-the-art NLP models in Haystack to provide unique search experiences and allow your users to query in natural language. Haystack is built in a modular fashion so that you can combine the best technology from other open-source projects, such as Hugging Face's Transformers, Elasticsearch, or Milvus.
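
A rough sketch of an extractive question answering pipeline follows, assuming the Haystack 1.x-era API; module paths, class names and options (such as the BM25 flag) have changed across releases, so treat this as the general shape rather than an exact recipe.

    # Hedged sketch of a retriever + reader pipeline (Haystack 1.x names assumed).
    from haystack.document_stores import InMemoryDocumentStore
    from haystack.nodes import BM25Retriever, FARMReader
    from haystack.pipelines import ExtractiveQAPipeline

    document_store = InMemoryDocumentStore(use_bm25=True)
    document_store.write_documents([{"content": "Haystack is built in a modular fashion."}])

    retriever = BM25Retriever(document_store=document_store)
    reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
    pipe = ExtractiveQAPipeline(reader, retriever)

    result = pipe.run(query="How is Haystack built?",
                      params={"Retriever": {"top_k": 3}, "Reader": {"top_k": 1}})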

lingvo - Lingvo

  •    Python

Lingvo is a framework for building neural networks in TensorFlow, particularly sequence models. A list of publications using Lingvo can be found in the repository.

spacy-transformers - 🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

  •    Python

This package provides spaCy components and architectures to use transformer models via Hugging Face's transformers in spaCy. The result is convenient access to state-of-the-art transformer architectures, such as BERT, GPT-2, XLNet, etc. This release requires spaCy v3. For the previous version of this library, see the v0.6.x branch.
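
A minimal usage sketch: load a spaCy v3 transformer pipeline and inspect both the standard annotations and the transformer output the package attaches. This assumes the en_core_web_trf model has already been downloaded (python -m spacy download en_core_web_trf); the extension attribute name follows the v1.x releases.

    # Transformer-backed spaCy pipeline (en_core_web_trf assumed to be installed).
    import spacy

    nlp = spacy.load("en_core_web_trf")
    doc = nlp("Apple is looking at buying a U.K. startup.")
    print([(ent.text, ent.label_) for ent in doc.ents])
    print(doc._.trf_data)   # raw transformer tensors attached by spacy-transformers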

pytorch-openai-transformer-lm - A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI

  •    Python

This is a PyTorch implementation of the TensorFlow code provided with OpenAI's paper "Improving Language Understanding by Generative Pre-Training" by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. The implementation includes a script to load the weights pre-trained by the authors with the TensorFlow implementation into the PyTorch model.
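
The snippet below shows how the pre-trained weights are typically loaded with this repository's helpers; the module and function names are recalled from its README and may differ in the current code, so verify them before use.

    # Loading OpenAI's released weights into the PyTorch model (names assumed from the README).
    from model_pytorch import TransformerModel, load_openai_pretrained_model, DEFAULT_CONFIG

    args = DEFAULT_CONFIG
    model = TransformerModel(args)
    load_openai_pretrained_model(model)   # copies the TensorFlow-released weights into the model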


bluebert - BlueBERT, pre-trained on PubMed abstracts and clinical notes (MIMIC-III).

  •    Python

This repository provides the code and models for BlueBERT, pre-trained on PubMed abstracts and clinical notes (MIMIC-III). The preprocessed PubMed texts used to pre-train the BlueBERT models have also been uploaded. Please refer to our paper, Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets, for more details.
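
Since the released checkpoints are standard BERT weights, one common way to use them is through Hugging Face transformers. The model identifier below is an assumption for illustration; check the repository or the model hub for the exact checkpoint names.

    # Loading a BlueBERT checkpoint with transformers (the model ID is hypothetical).
    from transformers import AutoTokenizer, AutoModel

    name = "bionlp/bluebert_pubmed_mimic_uncased_L-12_H-768_A-12"  # hypothetical ID
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)

    inputs = tokenizer("Metformin is used to treat type 2 diabetes.", return_tensors="pt")
    outputs = model(**inputs)   # contextual embeddings for downstream biomedical tasks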

lingo - package lingo provides the data structures and algorithms required for natural language processing

  •    Go

Package lingo provides the data structures and algorithms required for natural language processing. Specifically, it provides a POS tagger (lingo/pos), a dependency parser (lingo/dep), and a basic tokenizer (lingo/lexer) for English. It also provides data structures for holding corpora (lingo/corpus) and treebanks (lingo/treebank).

getlang - Natural language detection package in pure Go

  •    Go

getlang provides fast natural language detection in Go.

tying-wv-and-wc - Implementation for "Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling"

  •    Python

This paper uses the diversity of word meaning to train a deep neural network. In language modeling (predicting the next word in a sequence), we want to express that diversity: when predicting the word after "Banana is delicious ___", the answer is "fruit", but "sweets" or "food" would also be acceptable. Ordinary one-hot targets cannot capture this, because every word other than the exact answer, including close synonyms, is treated as equally wrong.
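
The sketch below illustrates the soft-target idea in plain PyTorch: build a distribution over the vocabulary from embedding similarity to the true word, so near-synonyms are penalized less than unrelated words. It is an illustration of the paper's augmented-loss term, not this repository's exact implementation.

    # Soft targets from embedding similarity instead of a pure one-hot target.
    import torch
    import torch.nn.functional as F

    def augmented_loss(logits, targets, embedding_weight, temperature=1.0):
        # similarity of every vocabulary word to the true target word
        sims = embedding_weight[targets] @ embedding_weight.t() / temperature
        soft_targets = F.softmax(sims, dim=-1)
        log_probs = F.log_softmax(logits, dim=-1)
        return F.kl_div(log_probs, soft_targets, reduction="batchmean")

    vocab, dim = 1000, 64
    emb = torch.randn(vocab, dim)
    loss = augmented_loss(torch.randn(8, vocab), torch.randint(0, vocab, (8,)), emb)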

rnn-theano - RNN(LSTM, GRU) in Theano with mini-batch training; character-level language models in Theano

  •    Python

RNN(LSTM, GRU) in Theano with mini-batch training; character-level language models in Theano

pytorch-char-rnnlm

  •    Python

This is a character-level RNN language model in PyTorch. The code is based on the examples in https://github.com/pytorch/examples/tree/master/word_language_model. It can handle any Unicode corpus. All configurations and hyperparameters are centralized in a JSON file (hps/penn.json is an example for PTB); see that file for the parameters that can be specified.
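
On the "any Unicode corpus" point, character-level preprocessing amounts to mapping every distinct character to an integer id; the toy snippet below is illustrative only, not this repository's data pipeline.

    # Character-to-id mapping over arbitrary Unicode text (illustrative).
    text = "언어 모델 language model"
    vocab = sorted(set(text))
    char2id = {ch: i for i, ch in enumerate(vocab)}
    ids = [char2id[ch] for ch in text]
    print(len(vocab), ids[:10])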

zamia-speech - Open tools and data for cloudless automatic speech recognition

  •    Python

Important: Please note that these scripts are in no way a complete application ready for end-user consumption. However, if you are a developer interested in natural language processing, you may find some of them useful. Contributions, patches and pull requests are very welcome. At the time of this writing, the scripts here are focused on building the English and German VoxForge models, but there is no reason why they couldn't be used to build models for other languages as well; feel free to contribute support for those.

zeroth - Kaldi-based Korean ASR (한국어 음성인식) open-source project

  •    Shell

Zeroth is an open source project for Korean speech recognition implemented using the Kaldi toolkit. This project was developed as part of Atlas’s (https://www.goodatlas.com) Language AI platform, which enables enterprises to add intelligence to their B2C communications.

spacy_kenlm - :game_die: KenLM extension for spaCy 2.0.

  •    Python

This package adds KenLM support as a spaCy 2.0 extension. Train a KenLM language model first (or use the test model from test.arpa).
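
For reference, scoring text with a trained KenLM model looks like the following via the kenlm Python bindings; the spaCy extension wraps this kind of scoring, but its exact attribute names are not shown here.

    # Direct KenLM scoring (the extension exposes similar scores on spaCy objects).
    import kenlm

    model = kenlm.Model("test.arpa")    # the test model mentioned above
    print(model.score("this is a sentence", bos=True, eos=True))  # log10 probability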

pyVHDLParser - Streaming based VHDL parser.

  •    Python

This is a token-stream based parser for VHDL-2008.





