
gensim - Topic Modelling for Humans

  •    Python

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

wordvectors - Pre-trained word vectors of 30+ languages

  •    Python

This project has two purposes. First of all, I'd like to share some of my experience in NLP tasks such as segmentation or word vectors. The other, which is more important, is that some people are probably searching for pre-trained word vector models for non-English languages. Alas! English has gained much more attention than any other language. Check this to see how easily you can get a variety of pre-trained English word vectors without effort. I think it's time to turn our eyes to a multilingual version of this. Nearing the end of the work, I happened to learn that there is already a similar project named polyglot. I strongly encourage you to check out this great project. How embarrassing! Nevertheless, I decided to open this project. You will see that my work has its own flavor, after all.

PyTorchText - 1st Place Solution for Zhihu Machine Learning Challenge

  •    Python

This is the solution for the Zhihu Machine Learning Challenge 2017. We won first place out of 963 teams. You may need tf.contrib.keras.preprocessing.sequence.pad_sequences for data preprocessing.
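For readers who do not have that TensorFlow helper available, the preprocessing step it performs can be sketched in plain Python. This is only an illustration, not code from the PyTorchText repository: the function name `pad_token_ids` is hypothetical, and it assumes the Keras defaults of padding and truncating at the front of each sequence with zeros.

```python
def pad_token_ids(sequences, maxlen=None, value=0):
    """Left-pad (and left-truncate) lists of token ids to a common length.

    Hypothetical helper that mimics the default behaviour of Keras'
    pad_sequences: padding and truncation both happen at the front.
    """
    if maxlen is None:
        # Default to the length of the longest sequence.
        maxlen = max((len(seq) for seq in sequences), default=0)
    padded = []
    for seq in sequences:
        # Keep only the last `maxlen` ids when the sequence is too long.
        kept = list(seq)[-maxlen:] if maxlen > 0 else []
        # Fill the front with the padding value when it is too short.
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


batch = pad_token_ids([[1, 2, 3], [4]], maxlen=2)
print(batch)
```

Every row in the result has the same length, which is what a batched text classifier typically expects as input.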

magnitude - A fast, efficient universal vector embedding utility package.

  •    Python

A feature-packed Python package and vector storage file format for utilizing vector embeddings in machine learning models in a fast, efficient, and simple manner developed by Plasticity. It is primarily intended to be a simpler / faster alternative to Gensim, but can be used as a generic key-vector store for domains outside NLP. Vector space embedding models have become increasingly common in machine learning and traditionally have been popular for natural language processing applications. A fast, lightweight tool to consume these large vector space embedding models efficiently is lacking.

fasttextjs - JavaScript implementation of the FastText prediction algorithm

  •    JavaScript

The goal is to provide predict and predict-prob commands compatible with the C++ version of FastText for use in Node.js. FastText is a project out of Facebook Research. The primary implementation can be found at https://github.com/facebookresearch/fastText. That is the source code used to create this version.

go-fasttext - Facebook fastText database in SQLite with Go API

  •    Go

This package provides a Go API for Facebook's fastText dataset for word embeddings, with data stored in a persistent SQLite database.


  •    Jupyter

Also, check out this link to download the final .bin model and the preprocessed dataset.

ai_law - all kinds of baseline models for long text classification (text categorization)

  •    Python

Update: a joint model for law case prediction has been released. Run python HAN_train.py to train the model to predict the accusation, relevant articles and term of imprisonment.

convai-bot-1337 - Skill-based Conversational Agent for NIPS Conversational Intelligence Challenge 2017

  •    Python

Skill-based Conversational Agent that took 1st place at the 2017 NIPS Conversational Intelligence Challenge (http://convai.io). We still update our Conversational Agent, and the latest version can be found in the master branch.

fastrtext - R wrapper for fastText

  •    C++

R wrapper for fastText C++ code from Facebook. fastText is a library for efficient learning of word representations and sentence classification.

tensorflow_fasttext - Simple embedding based text classifier inspired by fastText, implemented in TensorFlow

  •    Python

This project is based on the ideas in Facebook's fastText but implemented in TensorFlow. However, it is not an exact replica of fastText. Classification is done by embedding each word, taking the mean embedding over the full text and classifying that using a linear classifier. The embedding is trained with the classifier. You can also specify to use 2+ character ngrams. These ngrams get hashed and then embedded in a similar manner to the original words. Note that ngrams make training much slower but yield only marginal improvements in performance, at least in English.
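The forward pass described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project's TensorFlow code: the names `char_ngrams`, `feature_ids` and `predict_proba` are hypothetical, the sizes are arbitrary, the weights are random rather than trained, and for brevity whole words are hashed into the same bucket table as the character ngrams instead of being looked up in a vocabulary.

```python
import math
import random
import zlib

NUM_BUCKETS = 1000  # assumed size of the hashed feature table
EMBED_DIM = 4       # assumed embedding width
NUM_CLASSES = 2     # assumed number of labels

rng = random.Random(0)
# Untrained, randomly initialised parameters (illustration only).
embeddings = [[rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
              for _ in range(NUM_BUCKETS)]
weights = [[rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
           for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES


def char_ngrams(word, n=2):
    """Return the character n-grams of a word, e.g. 'cat' -> ['ca', 'at']."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def feature_ids(text, n=2):
    """Hash every word and each of its character n-grams to a bucket id."""
    ids = []
    for word in text.lower().split():
        for feature in [word] + char_ngrams(word, n):
            ids.append(zlib.crc32(feature.encode("utf-8")) % NUM_BUCKETS)
    return ids


def predict_proba(text):
    """Average the feature embeddings, then apply a linear softmax layer."""
    ids = feature_ids(text)
    if ids:
        mean = [sum(embeddings[i][d] for i in ids) / len(ids)
                for d in range(EMBED_DIM)]
    else:
        mean = [0.0] * EMBED_DIM
    logits = [sum(w * x for w, x in zip(row, mean)) + b
              for row, b in zip(weights, bias)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(predict_proba("a simple example sentence"))
```

In a real model the embedding table and the linear layer would be trained jointly by gradient descent on a classification loss. The extra ngram features multiply the number of lookups per document, which is consistent with the slower training noted above.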

pytorch-sentiment-analysis - Tutorials on getting started with PyTorch and TorchText for sentiment analysis

  •    Jupyter

This repo contains tutorials covering how to do sentiment analysis with PyTorch 0.4 and TorchText 0.2.3 on Python 3.6. The first two tutorials cover getting started with the de facto approach to sentiment analysis: recurrent neural networks (RNNs). The third notebook covers the FastText model and the final one covers a convolutional neural network (CNN) model.

whatthelang - Lightning Fast Language Prediction 🚀

  •    Python

Supports 176 languages. The ISO codes for the corresponding languages are listed below.

sentence-classification - Sentence Classifications with Neural Networks

  •    Python

Each of the above broad sentence categories can be expanded and made more in-depth. The networks and scripts are designed so that it should be possible to extend them to classify other sentence types, provided the data is available. This was developed for applications at Metacortex and is accompanied by a guide on building practical/applied neural networks on austingwalters.com.