
spaCy - 💫 Industrial-strength Natural Language Processing (NLP) with Python and Cython

  •    Python

spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 20+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing, and named entity recognition, and easy deep learning integration. It's commercial open-source software, released under the MIT license. 💫 Version 2.0 is out now.
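A minimal sketch of spaCy's tokenization (assuming spaCy is installed; `spacy.blank` builds a tokenizer-only pipeline, so no statistical model download is needed):

```python
import spacy

# A blank English pipeline: tokenizer only, no pre-trained model required.
nlp = spacy.blank("en")

doc = nlp("spaCy comes with pre-trained models.")
print([token.text for token in doc])
```

Loading a full pre-trained model (e.g. with `spacy.load`) additionally enables the tagging, parsing, and named entity recognition mentioned above.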

transformers - 🤗Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX

  •    Python

🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation and more in over 100 languages. Its aim is to make cutting-edge NLP easier to use for everyone. 🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our model hub. At the same time, each python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
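The quick-download-and-use API described above is exposed through `pipeline`; a minimal sketch for sentiment classification (this downloads a default pretrained model on first use, so it needs network access):

```python
from transformers import pipeline

# Build a sentiment-analysis pipeline with the library's default model.
classifier = pipeline("sentiment-analysis")

result = classifier("Transformers makes cutting-edge NLP easy to use.")[0]
print(result["label"], round(result["score"], 3))
```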

kagome - Self-contained Japanese Morphological Analyzer written in pure Go

  •    Go

Kagome is an open-source Japanese morphological analyzer written in pure Go. The MeCab-IPADIC and UniDic (unidic-mecab) dictionary/statistical models are packaged into the Kagome binary. Kagome also provides a segmentation mode for search, similar to Kuromoji.

pynlpl - PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing

  •    Python

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL), as well as clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation). The library is divided into several packages and modules. It works on Python 2.7 as well as Python 3.
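The basic tasks mentioned above, extracting n-grams and building a frequency list, can be sketched in plain Python (stdlib only; this illustrates the idea, not PyNLPl's own API):

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of successive n-grams from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "to be or not to be".split()
bigrams = ngrams(tokens, 2)

# A frequency list is just a count over the extracted n-grams.
freq = Counter(bigrams)
print(freq.most_common(2))
```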

lingo - package lingo provides the data structures and algorithms required for natural language processing

  •    Go

Package lingo provides the data structures and algorithms required for natural language processing. Specifically, it provides a POS tagger (lingo/pos), a dependency parser (lingo/dep), and a basic tokenizer (lingo/lexer) for English. It also provides data structures for holding corpora (lingo/corpus) and treebanks (lingo/treebank).

rwordnet - A pure Ruby interface to the WordNet database

  •    Ruby

This library implements a pure Ruby interface to the WordNet lexical/semantic database. Unlike existing Ruby bindings, this one doesn't require you to convert the original WordNet database into a new database format; instead, it works directly on the database that comes with WordNet. If you're doing something data-intensive, you will achieve much better performance with Michael Granger's Ruby-WordNet, since it converts the WordNet database into a BerkeleyDB file for quicker access. rwordnet has a much smaller footprint, with no gem or native dependencies, and requires about a third of the disk space of Ruby-WordNet + DB. In writing rwordnet, I've focused more on usability and ease of installation (gem install rwordnet) at the expense of some performance. Use at your own risk, etc.

punkt-segmenter - Ruby port of the NLTK Punkt sentence segmentation algorithm

  •    Ruby

This code is a Ruby 1.9.x port of the Punkt sentence tokenizer algorithm implemented by the NLTK Project (http://www.nltk.org/). Punkt is a language-independent, unsupervised approach to sentence boundary detection. It is based on the assumption that a large number of ambiguities in the determination of sentence boundaries can be eliminated once abbreviations have been identified. I simply did the Ruby port and made some API changes.
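The core assumption, that a period is a sentence boundary unless it follows an abbreviation, can be illustrated with a toy Python sketch (the abbreviation list here is a hand-made stand-in; actual Punkt learns abbreviations from the corpus without supervision):

```python
import re

# Hypothetical seed list; real Punkt discovers abbreviations unsupervised.
ABBREVIATIONS = {"dr", "mr", "mrs", "st", "etc"}

def naive_sentences(text):
    """Split on sentence-final punctuation unless it follows a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+", text):
        words = text[start:match.start()].split()
        last = words[-1].rstrip(".").lower() if words else ""
        if last in ABBREVIATIONS:
            continue  # probably an abbreviation such as "Dr.", not a boundary
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    if start < len(text):
        sentences.append(text[start:].strip())
    return sentences

print(naive_sentences("Dr. Smith arrived. He was late."))
```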

python-ucto - This is a Python binding to the tokenizer Ucto

  •    Python

This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (https://languagemachines.github.io/ucto). Advanced note: if the Ucto libraries and includes are installed in a non-standard location, you can set the environment variables INCLUDE_DIRS and LIBRARY_DIRS to point to them prior to invoking setup.py install.

SudachiPy - Python version of Sudachi, a Japanese morphological analyzer.

  •    Python

SudachiPy is a Python version of Sudachi, a Japanese morphological analyzer. Sudachi & SudachiPy are developed in WAP Tokushima Laboratory of AI and NLP, an institute under Works Applications that focuses on Natural Language Processing (NLP).

clj-duckling - Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings

  •    Clojure

As of May 1st, 2017, the Duckling team deprecated the Clojure version in favor of the new Duckling; see their blog post announcement. My intention is to continue the Clojure development of the Duckling project, so I forked it into this new project.

atr4s - Toolkit with state-of-the-art Automatic Terms Recognition methods in Scala

  •    Scala

An open-source library for Automatic Term Recognition written in Scala. Reference: N. Astrakhantsev, "ATR4S: Toolkit with State-of-the-art Automatic Terms Recognition Methods in Scala," arXiv preprint arXiv:1611.07804, 2016.

quick-nlp - Pytorch NLP library based on FastAI

  •    Python

Installation of the fast.ai library is required; please install it using the official instructions. It is important that the latest version of fast.ai is used and not the pip version, which is not up to date. The main goal of quick-nlp is to provide the easy interface of the fast.ai library for seq2seq models.

lingua - The most accurate natural language detection library for Java and other JVM languages, suitable for long and short text alike

  •    Kotlin

Lingua is a language detection library for Java and other JVM languages, suitable for long and short text alike. Its task is simple: It tells you which language some provided textual data is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages.
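Statistical detectors like Lingua typically compare character n-gram frequencies of the input against per-language profiles; a toy Python sketch of that idea (the trigram profiles here are tiny hand-made stand-ins, and a real detector trains them on large corpora):

```python
from collections import Counter

# Hypothetical miniature trigram profiles with hand-picked weights.
PROFILES = {
    "english": Counter({"the": 5, "ing": 4, "and": 3, "ion": 2}),
    "german":  Counter({"der": 5, "sch": 4, "ein": 3, "und": 3}),
}

def trigrams(text):
    """Count overlapping character trigrams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def detect(text):
    """Score each language by weighted overlap with its trigram profile."""
    grams = trigrams(text)
    scores = {lang: sum(grams[g] * w for g, w in profile.items())
              for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get)

print(detect("the singing and the dancing"))
```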

react-nlp-annotate - Interface for making NLP annotations.

  •    Javascript

If you just want to edit NLP data, it's easier to use the Universal Data Tool (MIT). This library is a module of the Universal Data Tool for use in custom React applications.