fastText - Library for fast text representation and classification.


fastText is a library for efficient learning of word representations and sentence classification. You can find answers to frequently asked questions on our website.

https://github.com/facebookresearch/fastText
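A minimal, hedged sketch of how the official Python bindings are typically used for supervised classification; the file names and hyperparameters below are placeholders, and the training file is assumed to be in fastText's `__label__` format:

```python
# Minimal sketch using the official fastText Python bindings (pip install fasttext).
# Assumes train.txt contains one example per line in fastText format, e.g.:
#   __label__positive I really enjoyed this movie
import fasttext

# Train a supervised text classifier (file name and settings are placeholders).
model = fasttext.train_supervised(input="train.txt", epoch=5, lr=0.5, wordNgrams=2)

# Predict the top label for a new sentence.
labels, probabilities = model.predict("what a great film", k=1)
print(labels, probabilities)

# Word representations can be trained similarly with train_unsupervised:
# vec_model = fasttext.train_unsupervised("corpus.txt", model="skipgram")
```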

Related Projects

text-classification-models-tf - Tensorflow implementations of Text Classification Models.

  •    Python

Tensorflow implementation of Text Classification Models. Semi-supervised text classification (transfer learning) models are implemented at [dongjun-Lee/transfer-learning-text-tf].

cnn-text-classification-tf-chinese - CNN for Chinese Text Classification in Tensorflow

  •    Python

This code belongs to the "Implementing a CNN for Text Classification in Tensorflow" blog post. It is a slightly simplified implementation of Kim's Convolutional Neural Networks for Sentence Classification paper in Tensorflow.

klassify - Bayesian Text classification service based on Redis. Built on top of Tornado and React.js

  •    Javascript

Redis-based text classification service with a real-time web interface. Text classification (also called document classification or document categorization) is a problem in library science, information science, and computer science: the task is to assign a document to one or more classes or categories.
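klassify itself is a Redis/Tornado service, but the Bayesian classification idea it builds on can be sketched in a few lines of Python with scikit-learn; the toy documents and labels below are illustrative only and are not klassify's API:

```python
# Illustrative Naive Bayes text classifier with scikit-learn; NOT klassify's API,
# just a sketch of the Bayesian document-classification technique it is based on.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["cheap pills buy now", "meeting agenda attached", "limited offer click here"]
labels = ["spam", "ham", "spam"]  # toy labels for illustration

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(docs, labels)
print(clf.predict(["please review the attached agenda"]))  # expected: ['ham']
```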

cnn-text-classification-tf - Convolutional Neural Network for Text Classification in Tensorflow

  •    Python

This code belongs to the "Implementing a CNN for Text Classification in Tensorflow" blog post. It is a slightly simplified implementation of Kim's Convolutional Neural Networks for Sentence Classification paper in Tensorflow.
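Not the repository's code, but a hedged tf.keras sketch of the architecture both CNN entries above describe: parallel convolutions of several filter widths over an embedded sentence, max-over-time pooling, and a softmax output. Vocabulary size, sequence length, and filter settings are illustrative placeholders.

```python
# Illustrative Kim-style CNN for sentence classification in tf.keras.
# Hyperparameters (vocab size, sequence length, filter sizes) are placeholders.
import tensorflow as tf

vocab_size, seq_len, embed_dim, num_classes = 20000, 56, 128, 2

inputs = tf.keras.Input(shape=(seq_len,), dtype="int32")
x = tf.keras.layers.Embedding(vocab_size, embed_dim)(inputs)

# One Conv1D + max-over-time pooling branch per filter width, as in Kim (2014).
pooled = []
for width in (3, 4, 5):
    c = tf.keras.layers.Conv1D(filters=100, kernel_size=width, activation="relu")(x)
    pooled.append(tf.keras.layers.GlobalMaxPooling1D()(c))

h = tf.keras.layers.Concatenate()(pooled)
h = tf.keras.layers.Dropout(0.5)(h)
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(h)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```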


tf-rnn-attention - Tensorflow implementation of attention mechanism for text classification tasks.

  •    Python

Tensorflow implementation of an attention mechanism for text classification tasks. Inspired by "Hierarchical Attention Networks for Document Classification", Zichao Yang et al. (http://www.aclweb.org/anthology/N16-1174).
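As a hedged illustration of the Yang et al. attention pooling described above (not the repository's code), the step can be written in a few lines of NumPy: a learned context vector scores each RNN hidden state, the scores are softmax-normalized, and the output is the weighted sum of the states.

```python
# Illustrative word-level attention pooling (Yang et al., 2016) in NumPy.
# rnn_outputs, W, b and u_context stand in for learned quantities.
import numpy as np

def attention_pool(rnn_outputs, W, b, u_context):
    """rnn_outputs: (seq_len, hidden); returns a (hidden,) attention-weighted summary."""
    u = np.tanh(rnn_outputs @ W + b)          # (seq_len, attn_dim) hidden representation
    scores = u @ u_context                    # (seq_len,) similarity with context vector
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()                    # softmax attention weights
    return alphas @ rnn_outputs               # weighted sum of hidden states

# Toy shapes: 5 timesteps, hidden size 8, attention size 4.
rnn_outputs = np.random.randn(5, 8)
W, b, u_context = np.random.randn(8, 4), np.zeros(4), np.random.randn(4)
print(attention_pool(rnn_outputs, W, b, u_context).shape)  # (8,)
```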

limdu - Machine-learning for Node.js

  •    Javascript

Limdu is a machine-learning framework for Node.js. It supports multi-label classification, online learning, and real-time classification; it is therefore especially suited for natural language understanding in dialog systems and chat-bots. Limdu is in an "alpha" state - some parts are working (see this readme), but some parts are missing or not tested. Contributions are welcome.

NTextCat

  •    C#

NTextCat is a text classification utility whose primary target is language identification: it helps you recognize (identify) the language of a text (or binary) snippet. It is a pure .NET application (C#).
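NTextCat is a .NET library, so the sketch below is not its API; it is only a hedged Python illustration of the character n-gram profile idea that n-gram language identifiers are typically built on, with tiny toy profiles.

```python
# Illustrative character-trigram language identification in Python.
# This is NOT NTextCat's API; profiles and texts are toy examples.
from collections import Counter

def trigram_profile(text):
    text = f"  {text.lower()}  "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def similarity(profile_a, profile_b):
    # Simple overlap score between two trigram count profiles.
    shared = set(profile_a) & set(profile_b)
    return sum(profile_a[g] * profile_b[g] for g in shared)

profiles = {
    "en": trigram_profile("the quick brown fox jumps over the lazy dog"),
    "de": trigram_profile("der schnelle braune fuchs springt ueber den faulen hund"),
}

snippet = "the dog jumps"
print(max(profiles, key=lambda lang: similarity(trigram_profile(snippet), profiles[lang])))
```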

textkit4j

  •    Java

Provides a set of tools for processing text, such as text extraction and classification. Planned classification implementations include Bayesian and statistical (N-gram) approaches.

snips-nlu - Snips Python library to extract meaning from text

  •    Python

Snips NLU (Natural Language Understanding) is a Python library that allows you to parse sentences written in natural language and extract structured information. To find out how to use Snips NLU, please refer to our documentation; it provides a step-by-step guide on how to set up and use the library.
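A hedged sketch of the library's basic documented Python usage; `dataset.json` is a placeholder for a training dataset in the format described in the Snips NLU documentation.

```python
# Minimal Snips NLU usage sketch (pip install snips-nlu); dataset.json is a placeholder
# for a training dataset in Snips NLU's JSON format.
import io
import json

from snips_nlu import SnipsNLUEngine

with io.open("dataset.json", encoding="utf-8") as f:
    dataset = json.load(f)

engine = SnipsNLUEngine()
engine.fit(dataset)

# Parse a sentence and get back structured intent + slot information.
parsing = engine.parse("Turn the lights on in the kitchen")
print(json.dumps(parsing, indent=2))
```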

zhihu-text-classification - [2017 Zhihu "Kanshan Cup" multi-label text classification] Team ye's solution (6th place)

  •    Jupyter

Same as creat_batch_data.py, except that the content is additionally split into sentences, for use with hierarchical models. With sentence splitting: wd_title_len = 30, wd_sent_len = 30, wd_doc_len = 10 (i.e., the content is split into 10 sentences of 30 words each); ch_title_len = 52, ch_sent_len = 52, ch_doc_len = 10. Without sentence splitting: wd_title_len = 30, wd_content_len = 150; ch_title_len = 52, ch_content_len = 300.
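As a hedged illustration of the splitting described above (not the repository's code; the pad id and function name are hypothetical), the content can be cut into wd_doc_len sentences of wd_sent_len word ids each, padding or truncating as needed:

```python
# Illustrative padding/splitting helper; NOT code from the repository.
# PAD and split_into_sentences are hypothetical names used only for this sketch.
PAD = 0  # id of the padding token

def split_into_sentences(word_ids, sent_len=30, doc_len=10):
    """Cut a list of word ids into doc_len sentences of sent_len ids, padding/truncating."""
    sentences = []
    for i in range(doc_len):
        chunk = word_ids[i * sent_len:(i + 1) * sent_len]
        sentences.append(chunk + [PAD] * (sent_len - len(chunk)))
    return sentences  # shape: doc_len x sent_len

doc = list(range(1, 76))                      # 75 word ids as a toy "content"
matrix = split_into_sentences(doc)
print(len(matrix), len(matrix[0]))            # 10 30
```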

text-analytics-with-python - Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer

  •    Python

Derive useful insights from your data using Python. Learn the techniques related to natural language processing and text analytics, and gain the skills to know which technique is best suited to solve a particular problem. A structured and comprehensive approach is followed in this book so that readers with little or no experience do not find themselves overwhelmed. You will start with the basics of natural language and Python and move on to advanced analytical and machine learning concepts. You will look at each technique and algorithm from both a bird's-eye view, to understand how it can be used, and a microscopic view, to understand the mathematical concepts and implement them to solve your own problems.

Word Vector Tool

  •    Java

The Word Vector Tool is a simple but flexible Java library to create word vector representations of text documents. Word vectors can be used for various text processing tasks, such as text classification, text clustering, or information retrieval.
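The Word Vector Tool itself is a Java library; the hedged scikit-learn sketch below only illustrates the same idea in Python, turning documents into weighted word-vector (here TF-IDF) representations that can feed a classifier or clustering step.

```python
# Illustrative document-to-vector step with scikit-learn; not the Word Vector Tool's API.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "word vectors for text classification",
    "clustering documents by topic",
    "information retrieval with vector space models",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)            # sparse matrix: documents x vocabulary
print(X.shape, len(vectorizer.get_feature_names_out()))
```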

mahout - Mirror of Apache Mahout

  •    Java

Mahout's goal is to build scalable machine learning libraries. By scalable we mean: scalable to reasonably large data sets (the core algorithms for clustering, classification and batch-based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm, though contributions that run on a single node or on a non-Hadoop cluster are welcome as well, and the core libraries are highly optimized to give good performance for non-distributed algorithms too); scalable to support your business case (Mahout is distributed under the commercially friendly Apache Software license); and a scalable community (the goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases; come to the mailing lists to find out more). Currently Mahout mainly supports four use cases: recommendation mining takes users' behavior and from that tries to find items users might like; clustering takes e.g. text documents and groups them into sets of topically related documents; classification learns from existing categorized documents what documents of a specific category look like and can assign unlabelled documents to the (hopefully) correct category; and frequent itemset mining takes a set of item groups (terms in a query session, shopping-cart contents) and identifies which individual items usually appear together.

MMLSpark - Microsoft Machine Learning for Apache Spark

  •    Scala

MMLSpark provides a number of deep learning and data science tools for Apache Spark, including seamless integration of Spark Machine Learning pipelines with the Microsoft Cognitive Toolkit (CNTK) and OpenCV, enabling you to quickly create powerful, highly scalable predictive and analytical models for large image and text datasets. MMLSpark requires Scala 2.11, Spark 2.1+, and either Python 2.7 or Python 3.5+. See the API documentation for Scala and for PySpark.

Sequence-Semantic-Embedding - Tools and recipes to train deep learning models and build services for NLP tasks such as text classification, semantic search ranking and recall fetching, cross-lingual information retrieval, and question answering etc

  •    Python

SSE (Sequence Semantic Embedding) is an encoder framework toolkit for natural language processing tasks. It is implemented in TensorFlow, leveraging TF's convenient deep learning blocks such as DNN/CNN/LSTM. Depending on the specific task, "similar semantic meaning" has different definitions. In the category classification task, it means that for each correct pair of (listing-title, category), the SSE of the listing title is close to the SSE of the corresponding category. In the information retrieval task, it means that for each relevant pair of (query, document), the SSE of the query is close to the SSE of the relevant document. In the question answering task, the SSE of a question is close to the SSE of its correct answers.
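A hedged NumPy sketch of the idea described above (not SSE's actual code): once two sequences are encoded into vectors, "similar semantic meaning" reduces to closeness, e.g. cosine similarity, between their embeddings.

```python
# Illustrative cosine-similarity ranking over sequence embeddings; not SSE's API.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for encoder outputs: a query embedding and candidate document embeddings.
query_emb = np.random.randn(128)
doc_embs = {"doc_a": np.random.randn(128), "doc_b": np.random.randn(128)}

ranked = sorted(doc_embs, key=lambda d: cosine(query_emb, doc_embs[d]), reverse=True)
print(ranked)  # candidates ordered by semantic closeness to the query
```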

Crepe - Character-level Convolutional Networks for Text Classification

  •    Lua

Note: An early version of this work entitled “Text Understanding from Scratch” was posted in Feb 2015 as arXiv:1502.01710. The present paper above has considerably more experimental results and a rewritten introduction. Note: See also the example implementation in NVIDIA DIGITS. Using CuDNN is 17 times faster than the code here.
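The input encoding used by character-level CNNs like Crepe can be sketched as follows (hedged: the alphabet and maximum length below are illustrative placeholders, not necessarily the paper's exact configuration). Each character is one-hot encoded over a fixed alphabet, producing an alphabet-size by length matrix per text.

```python
# Illustrative character quantization for a character-level CNN; the alphabet and
# max_len below are placeholders rather than the paper's exact settings.
import numpy as np

alphabet = "abcdefghijklmnopqrstuvwxyz0123456789 .,;:!?'\""
char_index = {c: i for i, c in enumerate(alphabet)}

def quantize(text, max_len=128):
    """One-hot encode text into a (len(alphabet), max_len) matrix; unknown chars stay zero."""
    matrix = np.zeros((len(alphabet), max_len), dtype=np.float32)
    for pos, char in enumerate(text.lower()[:max_len]):
        idx = char_index.get(char)
        if idx is not None:
            matrix[idx, pos] = 1.0
    return matrix

print(quantize("Character-level CNNs read raw text.").shape)
```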

sent-conv-torch - Text classification using a convolutional neural network.

  •    Lua

This code implements Kim's (2014) sentence convolution code in Torch with GPUs. It replicates the results on existing datasets, and allows training of models on arbitrary other text datasets. Results are timestamped and saved to the results/ directory.

food-101-keras - Food Classification with Deep Learning in Keras / Tensorflow

  •    Jupyter

If you are reading this on GitHub, the demo looks like this. Please follow the link below to view the live demo on my blog. Convolutional Neural Networks (CNN), a technique within the broader Deep Learning field, have been a revolutionary force in Computer Vision applications, especially in the past half-decade or so. One main use-case is that of image classification, e.g. determining whether a picture is that of a dog or cat.
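Not the project's code, but a hedged tf.keras sketch of the kind of small CNN image classifier this paragraph describes; the input size and layer sizes are placeholders (Food-101 itself has 101 categories).

```python
# Illustrative image-classification CNN in tf.keras; layer sizes and input shape are
# placeholders, not the food-101-keras project's actual architecture.
import tensorflow as tf

num_classes = 101  # Food-101 has 101 categories

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```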