Displaying 1 to 20 from 57 results

jieba - 结巴中文分词


"Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best Python Chinese word segmentation module.

sling - SLING - A natural language frame semantics parser


SLING is a parser for annotating text with frame semantic annotations. It is trained on an annotated corpus using Tensorflow and Dragnn.The parser is a general transition-based frame semantic parser using bi-directional LSTMs for input encoding and a Transition Based Recurrent Unit (TBRU) for output decoding. It is a jointly trained model using only the text tokens as input and the transition system has been designed to output frame graphs directly without any intervening symbolic representation.

Smile - Statistical Machine Intelligence & Learning Engine


Smile (Statistical Machine Intelligence and Learning Engine) is a fast and comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system in Java and Scala. With advanced data structures and algorithms, Smile delivers state-of-art performance.Smile covers every aspect of machine learning, including classification, regression, clustering, association rule mining, feature selection, manifold learning, multidimensional scaling, genetic algorithms, missing value imputation, efficient nearest neighbor search, etc.

nlp-with-ruby - Practical Natural Language Processing done in Ruby.


This curated list comprises awesome resources, libraries, information sources about computational processing of texts in human languages with the Ruby programming language. That field is often referred to as NLP, Computational Linguistics, HLT (Human Language Technology) and can be brought in conjunction with Artificial Intelligence, Machine Learning, Information Retrieval, Text Mining, Knowledge Extraction and other related disciplines. This list comes from our day to day work on Language Models and NLP Tools. Read why this list is awesome. Our FAQ describes the important decisions and useful answers you may be interested in.




snips-nlu - Snips Python library to extract meaning from text


Snips NLU (Natural Language Understanding) is a Python library that allows to parse sentences written in natural language and extracts structured information. To find out how to use Snips NLU please refer to our documentation, it will provide you with a step-by-step guide on how to use and setup our library.

pytextrank - Python implementation of TextRank for text document NLP parsing and summarization


Python implementation of TextRank, based on the Mihalcea 2004 paper. The results produced by this implementation are intended more for use as feature vectors in machine learning, not as academic paper summaries.

gensim - Topic Modelling for Humans


Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

go-i18n - Translate your Go program into multiple languages with templates and CLDR plural support.


go-i18n is a Go package and a command that helps you translate Go programs into multiple languages.The i18n package provides runtime APIs for fetching translated strings.



budou - Budou is an automatic organizer tool for beautiful line breaking in CJK (Chinese, Japanese, and Korean)


English uses spacing and hyphenation as cues to allow for beautiful and legible line breaks. Certain CJK languages have none of these, and are notoriously more difficult. Breaks occur randomly, usually in the middle of a word. This is a long standing issue in typography on web, and results in degradation of readability.Budou automatically translates CJK sentences into organized HTML code with lexical chunks wrapped in non-breaking markup so as to semantically control line breaks. Budou uses Google Cloud Natural Language API (NL API) to analyze the input sentence, and it concatenates proper words in order to produce meaningful chunks utilizing part-of-speech (pos) tagging and syntactic information. Processed chunks are wrapped with SPAN tag, so semantic units will no longer be split at the end of a line by specifying their display property as inline-block in CSS.

sentencepiece - Unsupervised text tokenizer for Neural Network-based text generation.


SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabulary size is predetermined prior to the neural model training. SentencePiece implements sub-word units (also known as wordpieces [Wu et al.] [Schuster et al.] and byte-pair-encoding (BPE) [Sennrich et al.]) with the extension of direct training from raw sentences. SentencePiece allows us to make a purely end-to-end system that does not depend on language-specific pre/postprocessing.This is not an official Google product.

Language Detection - Language Detection Library in Java


This is a language detection library implemented in plain Java. It detects language of a text using naive Bayesian filter. It is 99% over precision for 53 languages.

prose - :book: A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction


prose is Go library for text (primarily English at the moment) processing that supports tokenization, part-of-speech tagging, named-entity extraction, and more. The library's functionality is split into subpackages designed for modular use.See the GoDoc documentation for more information.

PyTorch-NLP - Supporting Rapid Prototyping with a Toolkit (incl. Datasets and Neural Network Layers)


PyTorch-NLP, or torchnlp for short, is a library of neural network layers, text processing modules and datasets designed to accelerate Natural Language Processing (NLP) research. Join our community, add datasets and neural network layers! Chat with us on Gitter and join the Google Group, we're eager to collaborate with you.

nlp - Extract values from strings and fill your structs with nlp.


You will always begin by creating a NL type calling nlp.New(), the NL type is a Natural Language Processor that owns 3 funcs, RegisterModel(), Learn() and P().RegisterModel takes 3 parameters, an empty struct, a set of samples and some options for the model.

Sherlock - Natural-language event parser for Javascript


Sherlock parses events written in plain English, and returns an object defining a basic event. Try demo. Just like Sherlock the detective finds the clues needed to solve a mystery by looking at a crime scene, Sherlock.js finds the components needed to define an event by looking at a sentence. But Sherlock doesn't work alone...

nlpnet - A neural network architecture for NLP tasks, inspired in the SENNA system


Gitter is chat room for developers. nlpnet is a Python library for Natural Language Processing tasks based on neural networks. Currently, it performs part-of-speech tagging, semantic role labeling and dependency parsing. Most of the architecture is language independent, but some functions were specially tailored for working with Portuguese. This system was inspired by SENNA.

recurrent-entity-networks - TensorFlow implementation of "Tracking the World State with Recurrent Entity Networks"


This repository contains an independent TensorFlow implementation of recurrent entity networks from Tracking the World State with Recurrent Entity Networks. This paper introduces the first method to solve all of the bAbI tasks using 10k training examples. The author's original Torch implementation is now available here. Percent error for each task, comparing those in the paper to the implementation contained in this repository.

pynlpl - PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing


PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language model. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotatation). The library is a divided into several packages and modules. It works on Python 2.7, as well as Python 3.

whatlanggo - Natural language detection library for Go


Natural language detection for Go.Thanks to greyblake Potapov Sergey for creating whatlang-rs from where I got the idea and logic.