Displaying 1 to 10 from 10 results

NCRFpp - NCRF++, an Open-source Neural Sequence Labeling Toolkit

  •    Python

Sequence labeling models are quite popular in many NLP tasks, such as Named Entity Recognition (NER), part-of-speech (POS) tagging and word segmentation. State-of-the-art sequence labeling models mostly utilize the CRF structure with input word features. LSTM (or bidirectional LSTM) is a popular deep learning based feature extractor in sequence labeling task. And CNN can also be used due to faster computation. Besides, features within word are also useful to represent word, which can be captured by character LSTM or character CNN structure or human-defined neural features. NCRF++ is a PyTorch based framework with flexiable choices of input features and output structures. The design of neural sequence labeling models with NCRF++ is fully configurable through a configuration file, which does not require any code work. NCRF++ is a neural version of CRF++, which is a famous statistical CRF framework.

prose - :book: A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction

  •    Go

prose is Go library for text (primarily English at the moment) processing that supports tokenization, part-of-speech tagging, named-entity extraction, and more. The library's functionality is split into subpackages designed for modular use.See the GoDoc documentation for more information.

spark-nlp - Natural Language Understanding Library for Apache Spark.

  •    Jupyter

John Snow Labs Spark-NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines, that scale easily in a distributed environment. This library has been uploaded to the spark-packages repository https://spark-packages.org/package/JohnSnowLabs/spark-nlp .




lingo - package lingo provides the data structures and algorithms required for natural language processing

  •    Go

package lingo provides the data structures and algorithms required for natural language processing.Specifically, it provides a POS Tagger (lingo/pos), a Dependency Parser (lingo/dep), and a basic tokenizer (lingo/lexer) for English. It also provides data structures for holding corpuses (lingo/corpus), and treebanks (lingo/treebank).


iparser - Yet another dependency parser, integrated with tokenizer, tagger and visualization tool.

  •    Python

Yet another multilingual dependency parser, integrated with tokenizer, part-of-speech tagger and visualization tool. IParser can parse raw sentence to dependency tree in CoNLL format, and is able to visualize trees in your browser. Currently, iparser is in a prototype state. It makes no warranty and may not be ready for practical usage.

Mimick - Code for Mimicking Word Embeddings using Subword RNNs (EMNLP 2017)

  •    Python

Code for Mimicking Word Embeddings using Subword RNNs (EMNLP 2017) and subsequent experiments. I'm adding details to this documentation as I go. When I'm through, this comment will be gone.