spacy-nlp - Expose spaCy NLP text parsing to Node.js (and other languages) via Socket.IO


Note that Python 3 is preferred. If you use Python 2, set the environment variable USE_PY2=true on each run. Because the module uses poly-socketio, there is only one IO server and one global.client (internal to this module) per process, no matter how many times poly-socketio is called. This avoids conflicts when several projects use it in the same process.
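The "one server and one client per process" behaviour described above can be illustrated with a minimal sketch. This is not poly-socketio's code: `SharedIO`, `get_shared_io` and the port number 8000 are hypothetical names and values chosen for illustration only.

```python
import threading

_lock = threading.Lock()
_shared = None  # process-wide instance, created at most once


class SharedIO:
    """Hypothetical stand-in for the single IO server plus its one client."""

    def __init__(self, port):
        self.port = port


def get_shared_io(port=8000):
    """Return the process-wide SharedIO, creating it only on the first call."""
    global _shared
    with _lock:
        if _shared is None:
            _shared = SharedIO(port)
        return _shared


first = get_shared_io()
second = get_shared_io(port=9000)  # port is ignored: an instance already exists
print(first is second)  # True
```

Because later calls return the existing instance, two projects loaded into the same process end up sharing one server instead of competing for a port.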


bluebird : ^3.4.6
lodash : ^4.16.4
poly-socketio : ^1.1.1
portscanner : ^1.0.0
winston : ^2.2.0
snyk : ^1.41.1



Related Projects

spaCy - 💫 Industrial-strength Natural Language Processing (NLP) with Python and Cython

  •    Python

spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 20+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition, and easy deep learning integration. It's commercial open-source software, released under the MIT license. 💫 Version 2.0 out now!

textacy - NLP, before and after spaCy

  •    Python

textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals (tokenization, part-of-speech tagging, dependency parsing, etc.) delegated to another library, textacy focuses on the tasks that come before and follow after. Note: Docs used to be hosted on ReadTheDocs, but the builds stopped working many months ago, and now those docs are out-of-date. This is unfortunate, especially since ReadTheDocs allows for multiple versions while GitHub Pages does not. I'll keep trying on ReadTheDocs; if the build issues ever get resolved, I'll switch the docs back.

displacy - 💥 displaCy.js: An open-source NLP visualiser for the modern web

  •    Javascript

⚠️ As of v2.0.0, the displaCy visualizers are now integrated into the core library. See here for more details on how to visualize a Doc object from within spaCy. We're also working on a new suite of tools for serving and testing spaCy models. The code of the standalone visualizers will still be available on GitHub, just not actively maintained. displaCy.js is a modern and service-independent visualisation library. We hope this makes it easy to compare different services, and explore your own in-house models. If you're using spaCy's syntactic parser, displaCy should be part of your regular workflow. Because spaCy's parser is statistical, it's often hard to predict how it will analyse a given sentence. Using displaCy, you can simply try and see. You can also share the page for discussion with your team, or save the SVG to use elsewhere. If you're developing your own model, you can run the service yourself — it's 100% open source.

neuralcoref - ✨Fast Coreference Resolution in spaCy with Neural Networks

  •    Python

NeuralCoref is a pipeline extension for spaCy 2.0 that annotates and resolves coreference clusters using a neural network. NeuralCoref is production-ready, integrated in spaCy's NLP pipeline and easily extensible to new training datasets. For a brief introduction to coreference resolution and NeuralCoref, please refer to our blog post. NeuralCoref is written in Python/Cython and comes with pre-trained statistical models for English. It can be trained in other languages. NeuralCoref is accompanied by a visualization client NeuralCoref-Viz, a web interface powered by a REST server that can be tried online. NeuralCoref is released under the MIT license.

thinc - 🔮 spaCy's Machine Learning library for NLP in Python

  •    Assembly

Thinc is the machine learning library powering spaCy. It features a battle-tested linear model designed for large sparse learning problems, and a flexible neural network model under development for spaCy v2.0. Thinc is a practical toolkit for implementing models that follow the "Embed, encode, attend, predict" architecture. It's designed to be easy to install, efficient for CPU usage and optimised for NLP and deep learning with text – in particular, hierarchically structured input and variable-length sequences.

sense2vec - 🦆 Use NLP to go beyond vanilla word2vec

  •    C++

sense2vec (Trask et al., 2015) is a nice twist on word2vec that lets you learn more interesting, detailed and context-sensitive word vectors. For an interactive example of the technology, see our sense2vec demo that lets you explore semantic similarities across all Reddit comments of 2015. This library is a simple Python/Cython implementation for loading and querying sense2vec models. While it's best used in combination with spaCy, the sense2vec library itself is very lightweight and can also be used as a standalone module.

deepnlp - Deep Learning NLP Pipeline implemented on Tensorflow

  •    Python

Deep Learning NLP Pipeline implemented on Tensorflow. Following the 'simplicity' rule, this project aims to use the Tensorflow deep learning library to implement a new NLP pipeline. You can extend the project to train models with your own corpus/languages. Pretrained models for a Chinese corpus are distributed. Free RESTful NLP APIs are also provided. Visit for details. Download pre-trained models: if you install deepnlp via pip, the pre-trained models are not distributed due to size restrictions. You can download the full models for 'Segment', 'POS' en and zh, 'NER' zh, zh_entertainment, zh_o2o, 'Textsum' by calling the download function.

flair - A very simple framework for state-of-the-art NLP

  •    Python

A very simple framework for state-of-the-art NLP. Developed by Zalando Research. A powerful syntactic-semantic tagger / classifier. Flair allows you to apply our state-of-the-art models for named entity recognition (NER), part-of-speech tagging (PoS), frame sense disambiguation, chunking and classification to your text.

NCRFpp - NCRF++, an Open-source Neural Sequence Labeling Toolkit

  •    Python

Sequence labeling models are popular in many NLP tasks, such as Named Entity Recognition (NER), part-of-speech (POS) tagging and word segmentation. State-of-the-art sequence labeling models mostly use the CRF structure with input word features. LSTM (or bidirectional LSTM) is a popular deep learning based feature extractor in sequence labeling tasks, and CNN can also be used because it is faster to compute. Features within a word are also useful for representing it; they can be captured by a character LSTM, a character CNN, or human-defined neural features. NCRF++ is a PyTorch based framework with flexible choices of input features and output structures. The design of neural sequence labeling models with NCRF++ is fully configurable through a configuration file and requires no code changes. NCRF++ is a neural version of CRF++, a well-known statistical CRF framework.
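To make the role of the CRF structure concrete, here is a minimal sketch of Viterbi decoding, the standard way to pick the best label sequence from per-position scores plus label-to-label transition scores. It is plain Python written for illustration, not code from NCRF++; the function name `viterbi` and the toy scores are assumptions.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring label sequence as a list of label indices.

    emissions[t][y]   : score of label y at position t (e.g. from an LSTM or CNN)
    transitions[p][y] : score of moving from label p to label y
    """
    n_labels = len(emissions[0])
    best = list(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for scores in emissions[1:]:
        step, pointers = [], []
        for y in range(n_labels):
            # Best previous label for arriving at label y.
            prev = max(range(n_labels), key=lambda p: best[p] + transitions[p][y])
            pointers.append(prev)
            step.append(best[prev] + transitions[prev][y] + scores[y])
        best = step
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    label = max(range(n_labels), key=lambda y: best[y])
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    return path[::-1]


# Toy scores: switching labels is heavily penalised.
emissions = [[3, 0], [0, 1], [0, 1]]
transitions = [[0, -5], [-5, 0]]
print(viterbi(emissions, transitions))  # [0, 0, 0]
```

Picking the best label at each position independently would give [0, 1, 1] here; the transition scores are what make the decoder prefer a consistent sequence.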

nlpnet - A neural network architecture for NLP tasks, inspired by the SENNA system

  •    Python

nlpnet is a Python library for Natural Language Processing tasks based on neural networks. Currently, it performs part-of-speech tagging, semantic role labeling and dependency parsing. Most of the architecture is language independent, but some functions were specially tailored for working with Portuguese. This system was inspired by SENNA.

text-analytics-with-python - Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer

  •    Python

Derive useful insights from your data using Python. Learn the techniques related to natural language processing and text analytics, and gain the skills to know which technique is best suited to solve a particular problem. A structured and comprehensive approach is followed in this book so that readers with little or no experience do not find themselves overwhelmed. You will start with the basics of natural language and Python and move on to advanced analytical and machine learning concepts. You will look at each technique and algorithm with both a bird's eye view to understand how it can be used as well as with a microscopic view to understand the mathematical concepts and to implement them to solve your own problems.

Rasa - Create chatbots and voice assistants

  •    Python

Rasa is an open source machine learning framework to automate text- and voice-based conversations. With Rasa, you can build chatbots on Facebook, Slack, Microsoft Bot Framework, Rocket.Chat, Mattermost, Telegram, etc. Rasa's primary purpose is to help you build contextual, layered conversations with lots of back-and-forth. To have a real conversation, you need to have some memory and build on things that were said earlier. Rasa lets you do that in a scalable way.

snips-nlu - Snips Python library to extract meaning from text

  •    Python

Snips NLU (Natural Language Understanding) is a Python library that parses sentences written in natural language and extracts structured information. To find out how to use Snips NLU, please refer to our documentation; it provides a step-by-step guide on how to set up and use our library.

nlp-with-ruby - Practical Natural Language Processing done in Ruby.

  •    Ruby

This curated list comprises awesome resources, libraries and information sources about computational processing of texts in human languages with the Ruby programming language. That field is often referred to as NLP, Computational Linguistics or HLT (Human Language Technology) and can be brought in conjunction with Artificial Intelligence, Machine Learning, Information Retrieval, Text Mining, Knowledge Extraction and other related disciplines. This list comes from our day-to-day work on Language Models and NLP Tools. Read why this list is awesome. Our FAQ describes the important decisions and useful answers you may be interested in.

NLP-progress - Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks

  •    HTML

This document aims to track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets. It aims to cover both traditional and core NLP tasks such as dependency parsing and part-of-speech tagging as well as more recent ones such as reading comprehension and natural language inference. The main objective is to provide the reader with a quick overview of benchmark datasets and the state-of-the-art for their task of interest, which serves as a stepping stone for further research. To this end, if there is a place where results for a task are already published and regularly maintained, such as a public leaderboard, the reader will be pointed there.

pytextrank - Python implementation of TextRank for text document NLP parsing and summarization

  •    Jupyter

Python implementation of TextRank, based on the Mihalcea 2004 paper. The results produced by this implementation are intended for use as feature vectors in machine learning rather than as academic paper summaries.
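As a rough illustration of the TextRank idea, the sketch below builds an undirected word co-occurrence graph and runs a PageRank-style iteration over it. It is a simplified stand-in, not pytextrank's implementation: the function name `textrank_keywords`, the window size and the iteration count are assumptions, and the part-of-speech filtering used in the paper is skipped.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50):
    """Rank tokens by a PageRank-style score over a co-occurrence graph."""
    # Link every token to the tokens that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Each word repeatedly receives an equal share of its neighbours' scores.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)


tokens = "graph ranking graph model graph words".split()
print(textrank_keywords(tokens)[0])  # graph
```

The per-word scores computed inside the loop are the kind of numeric feature the description above refers to; a real pipeline would first tokenize and filter the text with an NLP library.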

OpenNLP - Machine learning based toolkit for the processing of natural language text

  •    Java

The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.