
gensim - Topic Modelling for Humans

  •    Python

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

wordvectors - Pre-trained word vectors of 30+ languages

  •    Python

This project has two purposes. First, I'd like to share some of my experience with NLP tasks such as segmentation and word vectors. Second, and more important, some people are probably searching for pre-trained word vector models for non-English languages. Alas! English has received far more attention than any other language. Check this to see how easily you can get a variety of pre-trained English word vectors. I think it's time to turn our eyes to a multi-language version. Near the end of the work, I learned that there is already a similar project named polyglot. I strongly encourage you to check out this great project. How embarrassing! Nevertheless, I decided to release this project. You will see that it has its own flavor, after all.

PyTorchText - 1st Place Solution for Zhihu Machine Learning Challenge

  •    Python

This is the solution for the Zhihu Machine Learning Challenge 2017. We took first place out of 963 teams. You may need tf.contrib.keras.preprocessing.sequence.pad_sequences for data preprocessing.

magnitude - A fast, efficient universal vector embedding utility package.

  •    Python

A feature-packed Python package and vector storage file format, developed by Plasticity, for utilizing vector embeddings in machine learning models in a fast, efficient, and simple manner. It is primarily intended to be a simpler / faster alternative to Gensim, but can be used as a generic key-vector store for domains outside NLP. Vector space embedding models have become increasingly common in machine learning and traditionally have been popular for natural language processing applications. A fast, lightweight tool to consume these large vector space embedding models efficiently is lacking.

BioSentVec - BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences

  •    Jupyter

We created biomedical word and sentence embeddings using PubMed and the clinical notes from MIMIC-III Clinical Database. Both PubMed and MIMIC-III texts were split and tokenized using NLTK. We also lowercased all the words. The statistics of the two corpora are shown below. We applied fastText to compute 200-dimensional word embeddings. We set the window size to be 20, learning rate 0.05, sampling threshold 1e-4, and negative examples 10. Both the word vectors and the model with hyperparameters are available for download below. The model file can be used to compute word vectors that are not in the dictionary (i.e. out-of-vocabulary terms). This work extends the original BioWordVec which provides fastText word embeddings trained using PubMed and MeSH. We used the same parameters as the original BioWordVec which has been thoroughly evaluated in a range of applications.
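The hyperparameters listed above map onto flags of the fastText command-line tool. The sketch below only builds the argument list and does not run anything. The `skipgram` model choice, the file names, and the helper name `fasttext_command` are assumptions for illustration; the description does not state which model variant was used.

```python
def fasttext_command(corpus_path, output_prefix, model="skipgram"):
    """Build a fastText CLI argument list for the settings described above.

    The model variant ("skipgram") is an assumption; the description only
    gives the dimension, window size, learning rate, sampling threshold,
    and number of negative examples.
    """
    params = {
        "dim": 200,  # embedding dimension
        "ws": 20,    # context window size
        "lr": 0.05,  # learning rate
        "t": 1e-4,   # sampling threshold
        "neg": 10,   # number of negative examples
    }
    cmd = ["fasttext", model, "-input", corpus_path, "-output", output_prefix]
    for flag, value in params.items():
        cmd += [f"-{flag}", str(value)]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(fasttext_command("corpus.txt", "bio_vectors")))
```

The resulting list can be passed to `subprocess.run` on a machine where the `fasttext` binary is installed.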

fasttextjs - JavaScript implementation of the FastText prediction algorithm

  •    Javascript

The goal is to provide a compatible predict and predict-prob with the C++ version of FastText for use in Node.js. FastText is a project out of Facebook Research. The primary implementation can be found at https://github.com/facebookresearch/fastText. That is the source code used to create this version.

go-fasttext - Facebook fastText database in SQLite with Go API

  •    Go

This package provides a Go API for Facebook's fastText word-embedding dataset, with data stored in a persistent SQLite database.


  •    Jupyter

Also, check out this link to download the final .bin model and the preprocessed dataset.

ai_law - all kinds of baseline models for long text classification (text categorization)

  •    Python

Update: A joint model for law case prediction has been released. Run python HAN_train.py to train the model to predict the accusation, relevant articles, and term of imprisonment.

convai-bot-1337 - Skill-based Conversational Agent for NIPS Conversational Intelligence Challenge 2017

  •    Python

Skill-based Conversational Agent that took 1st place at the 2017 NIPS Conversational Intelligence Challenge (http://convai.io). We are still updating our Conversational Agent, and the latest version can be found in the master branch.

fastrtext - R wrapper for fastText

  •    C++

R wrapper for fastText C++ code from Facebook. fastText is a library for efficient learning of word representations and sentence classification.

tensorflow_fasttext - Simple embedding based text classifier inspired by fastText, implemented in tensorflow

  •    Python

This project is based on the ideas in Facebook's FastText but implemented in TensorFlow. However, it is not an exact replica of fastText. Classification is done by embedding each word, taking the mean embedding over the full text and classifying that using a linear classifier. The embedding is trained with the classifier. You can also choose to use 2+ character ngrams. These ngrams get hashed and then embedded in a similar manner to the original words. Note that ngrams make training much slower but yield only marginal improvements in performance, at least in English.
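The forward pass described above (embed each word, average, apply a linear classifier, with hashed character n-grams as extra features) can be sketched in plain Python. This is not the project's TensorFlow code: the function names, the use of `zlib.crc32` as the hash, and the layout with n-gram buckets placed after the word rows are all assumptions for illustration, and training of the embeddings and classifier is omitted.

```python
import zlib


def char_ngrams(word, n=2):
    """Return the character n-grams of a word (empty if the word is shorter than n)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def feature_ids(tokens, vocab, num_buckets=0, n=2):
    """Map known words to vocabulary ids and n-grams to hashed bucket ids."""
    ids = [vocab[t] for t in tokens if t in vocab]
    if num_buckets > 0:
        offset = len(vocab)  # assumed layout: bucket rows follow the word rows
        for t in tokens:
            for gram in char_ngrams(t, n):
                bucket = zlib.crc32(gram.encode("utf-8")) % num_buckets
                ids.append(offset + bucket)
    return ids


def mean_embedding(ids, embeddings):
    """Average the embedding rows selected by ids (zero vector if none)."""
    dim = len(embeddings[0])
    if not ids:
        return [0.0] * dim
    return [sum(embeddings[i][d] for i in ids) / len(ids) for d in range(dim)]


def classify(tokens, vocab, embeddings, weights, biases, num_buckets=0):
    """Return the index of the highest-scoring class under a linear classifier."""
    mean = mean_embedding(feature_ids(tokens, vocab, num_buckets), embeddings)
    scores = [
        sum(w * x for w, x in zip(row, mean)) + b
        for row, b in zip(weights, biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

With `num_buckets=0` the sketch uses word embeddings only. Setting it above zero requires `embeddings` to have `len(vocab) + num_buckets` rows, which is one way to see why n-grams add parameters and slow training.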

pytorch-sentiment-analysis - Tutorials on getting started with PyTorch and TorchText for sentiment analysis

  •    Jupyter

This repo contains tutorials covering how to do sentiment analysis using PyTorch 0.4 and TorchText 0.2.3 using Python 3.6. The first 2 tutorials will cover getting started with the de facto approach to sentiment analysis: recurrent neural networks (RNNs). The third notebook covers the FastText model and the final covers a convolutional neural network (CNN) model.

sentence-classification - Sentence Classifications with Neural Networks

  •    Python

Each of the above broad sentence categories can be expanded and made more in-depth. The way these networks and scripts are designed, it should be possible to extend them to classify other sentence types, provided the data is available. This was developed for applications at Metacortex and is accompanied by a guide on building practical/applied neural networks on austingwalters.com.

whatthelang - Lightning Fast Language Prediction 🚀

  •    Python

Supports 176 languages. The ISO codes for the corresponding languages are as below.

Text-Classification - PyTorch implementation of some text classification models (HAN, fastText, BiLSTM-Attention, TextCNN, Transformer) | 文本分类

  •    Python

PyTorch re-implementation of some text classification models. Train the following models by editing the model_name item in the config files (here are some example config files). Click the link of each for details.

pure-predict - Machine learning prediction in pure Python

  •    Python

pure-predict speeds up and slims down machine learning prediction applications. It is a foundational tool for serverless inference or small batch prediction with popular machine learning frameworks like scikit-learn and fasttext. It implements the predict methods of these frameworks in pure Python. In this scenario, a container service with a large dependency footprint can be overkill for a microservice, particularly if the access patterns favor the pricing model of a serverless application. Additionally, for smaller models and single record predictions per request, the numpy and scipy functionality in the prediction methods of popular machine learning frameworks works against the application in terms of latency, underperforming pure Python in some cases.
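To illustrate the general idea of prediction without numpy or scipy, here is a minimal sketch of a binary logistic-regression predict step written with only the standard library. It is not pure-predict's API: the function names and the plain-list representation of the fitted coefficients are assumptions for illustration.

```python
import math


def predict_proba(features, coefficients, intercept):
    """Probability of the positive class for one record, using plain lists."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, coefficients, intercept, threshold=0.5):
    """Return 1 if the positive-class probability reaches the threshold, else 0."""
    return int(predict_proba(features, coefficients, intercept) >= threshold)
```

For a single record per request, a loop like this avoids importing and dispatching through array libraries, which is the latency argument made above.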

actions-suggest-related-links - A GitHub Action to suggest related or similar issues, documents, and links

  •    TypeScript

A GitHub Action to suggest related or similar issues, documents, and links. Based on the power of NLP and fastText. Create your YAML workflow file as follows.

fasttext-tuning - 📈 Find your fasttext hyperparameters quickly and easily.

  •    Python

📈 Find your fasttext hyperparameters quickly and easily.
