prose - :book: A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction

  •        44

prose is Go library for text (primarily English at the moment) processing that supports tokenization, part-of-speech tagging, named-entity extraction, and more. The library's functionality is split into subpackages designed for modular use.See the GoDoc documentation for more information.



Related Projects

TextTeaser - Automatic Summarization Algorithm

TextTeaser is an automatic summarization algorithm that combines the power of natural language processing and machine learning to produce good results. It can provide provide a gist of an article, Better previews in news readers.

PHP classes for NLP

A set of classes for Natural Language Processing in PHP for: 1. Part of speech Tagging - Brill, n-gram, HMM 2. Princeton Wordnet querying and access 3. Document summarization 4. Document classification - EM, Bayes 5. Stemming - Porter, Lancaster

nlpnet - A neural network architecture for NLP tasks, inspired in the SENNA system

Gitter is chat room for developers. nlpnet is a Python library for Natural Language Processing tasks based on neural networks. Currently, it performs part-of-speech tagging, semantic role labeling and dependency parsing. Most of the architecture is language independent, but some functions were specially tailored for working with Portuguese. This system was inspired by SENNA.

pytextrank - Python implementation of TextRank for text document NLP parsing and summarization

Python implementation of TextRank, based on the Mihalcea 2004 paper. The results produced by this implementation are intended more for use as feature vectors in machine learning, not as academic paper summaries.

gensim - Topic Modelling for Humans

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

pynlpl - PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language model. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotatation). The library is a divided into several packages and modules. It works on Python 2.7, as well as Python 3.

The OpenNLP Grok Library

Grok is a library of natural language processing components, including support for parsing with categorial grammars and various preprocessing tasks such as part-of-speech tagging, sentence detection, and tokenization.

nlp-with-ruby - Practical Natural Language Processing done in Ruby.

This curated list comprises awesome resources, libraries, information sources about computational processing of texts in human languages with the Ruby programming language. That field is often referred to as NLP, Computational Linguistics, HLT (Human Language Technology) and can be brought in conjunction with Artificial Intelligence, Machine Learning, Information Retrieval, Text Mining, Knowledge Extraction and other related disciplines. This list comes from our day to day work on Language Models and NLP Tools. Read why this list is awesome. Our FAQ describes the important decisions and useful answers you may be interested in.

PyTorch-NLP - Supporting Rapid Prototyping with a Toolkit (incl. Datasets and Neural Network Layers)

PyTorch-NLP, or torchnlp for short, is a library of neural network layers, text processing modules and datasets designed to accelerate Natural Language Processing (NLP) research. Join our community, add datasets and neural network layers! Chat with us on Gitter and join the Google Group, we're eager to collaborate with you.

ark-tweet-nlp - CMU ARK Twitter Part-of-Speech Tagger

CMU ARK Twitter Part-of-Speech Tagger

ruby-nlp - A collection of links to Ruby Natural Language Processing (NLP) libraries, tools and software

A collection of Natural Language Processing (NLP) Ruby libraries, tools and software. Suggestions and contributions are welcome. Client libraries to various 3rd party NLP API services.

deepnl - Deep Learning for Natural Language Processing

deepnl is a Python library for Natural Language Processing tasks based on a Deep Learning neural network architecture. The library currently provides tools for performing part-of-speech tagging, Named Entity tagging and Semantic Role Labeling.

budou - Budou is an automatic organizer tool for beautiful line breaking in CJK (Chinese, Japanese, and Korean)

English uses spacing and hyphenation as cues to allow for beautiful and legible line breaks. Certain CJK languages have none of these, and are notoriously more difficult. Breaks occur randomly, usually in the middle of a word. This is a long standing issue in typography on web, and results in degradation of readability.Budou automatically translates CJK sentences into organized HTML code with lexical chunks wrapped in non-breaking markup so as to semantically control line breaks. Budou uses Google Cloud Natural Language API (NL API) to analyze the input sentence, and it concatenates proper words in order to produce meaningful chunks utilizing part-of-speech (pos) tagging and syntactic information. Processed chunks are wrapped with SPAN tag, so semantic units will no longer be split at the end of a line by specifying their display property as inline-block in CSS.

nlp - Extract values from strings and fill your structs with nlp.

You will always begin by creating a NL type calling nlp.New(), the NL type is a Natural Language Processor that owns 3 funcs, RegisterModel(), Learn() and P().RegisterModel takes 3 parameters, an empty struct, a set of samples and some options for the model.


PROSE is a system that performs controlled, systematic, and efficient modification of the code of running Java applications without requiring them to be shut down. PROSE is an infrastructure that supports software adaptation by extending apps at runtime.

Gate - General Architecture for Text Engineering

GATE excels at text analysis of all shapes and sizes. It provides support for diverse language processing tasks such as parsers, morphology, tagging, Information Retrieval tools, Information Extraction components for various languages, and many others. It provides support to measure, evaluate, model and persist the data structure. It could analyze text or speech. It has built-in support for machine learning and also adds support for different implementation of machine learning via plugin.

JTextPro: A Java Text Processing Toolkit

JTextPro: A Java-based Text Processing tool that includes sentence boundary detection (using maximum entropy classifier), word tokenization (following Penn conventions), part-of-speech tagging (using CRFTagger), and phrase chunking (using CRFChunker).


Prose is an playground for an experimental JavaScript like language compiler. Eventually it will implement 0-CFA, CFA2, and a Tracing JIT

The OpenNLP Maximum Entropy Package

Maximum entropy is a powerful method for constructing statistical models of classification tasks, such as part of speech tagging in Natural Language Processing. Several example applications using maxent can be found in the OpenNLP Tools Library.