spaCy - 💫 Industrial-strength Natural Language Processing (NLP) with Python and Cython

spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 20+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition, and easy deep learning integration. It's commercial open-source software, released under the MIT license. 💫 Version 2.0 out now! Check out the new features here.



Related Projects

thinc - 🔮 spaCy's Machine Learning library for NLP in Python

  •    Assembly

Thinc is the machine learning library powering spaCy. It features a battle-tested linear model designed for large sparse learning problems, and a flexible neural network model under development for spaCy v2.0. Thinc is a practical toolkit for implementing models that follow the "Embed, encode, attend, predict" architecture. It's designed to be easy to install, efficient for CPU usage and optimised for NLP and deep learning with text – in particular, hierarchically structured input and variable-length sequences.
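The "Embed, encode, attend, predict" architecture named above can be illustrated with a toy sketch. This is not Thinc's API: the function names, the vector width `DIM`, the averaging window and the random weights are all assumptions chosen for illustration, and only the Python standard library is used. It shows how a variable-length token sequence is reduced to a fixed-size class distribution.

```python
import math
import random

random.seed(0)
DIM = 4  # toy embedding width (assumption, not a Thinc default)


def embed(tokens, table):
    """Embed: map each token to a vector, creating a random one on first sight."""
    for tok in tokens:
        if tok not in table:
            table[tok] = [random.uniform(-1, 1) for _ in range(DIM)]
    return [table[tok] for tok in tokens]


def encode(vectors):
    """Encode: average each vector with its neighbours so it reflects context."""
    out = []
    for i in range(len(vectors)):
        window = vectors[max(0, i - 1): i + 2]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def attend(vectors, query):
    """Attend: softmax-weighted sum that reduces the sequence to one vector."""
    scores = [sum(v * q for v, q in zip(vec, query)) for vec in vectors]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(DIM)]


def predict(vector, class_weights):
    """Predict: linear layer followed by a softmax over the classes."""
    logits = [sum(w * x for w, x in zip(row, vector)) for row in class_weights]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


table = {}
query = [0.5] * DIM  # hypothetical fixed attention query
class_weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(2)]
tokens = "spaCy parses text quickly".split()
probs = predict(attend(encode(embed(tokens, table)), query), class_weights)
print(probs)
```

A real model would learn the embedding table, the query and the class weights from data; here they are random, so only the shapes and the flow between the four stages are meaningful.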

practical-machine-learning-with-python - Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system

  •    Jupyter

You have probably heard the saying "Data is the new oil", along with the growing interest in Big Data, Machine Learning, Artificial Intelligence and Deep Learning. Data scientists have also been described as having "the sexiest job in the 21st century", which makes it all the more worthwhile to build expertise in these areas. Getting started with machine learning in the real world can be overwhelming given the vast amount of resources on the web. "Practical Machine Learning with Python" follows a structured and comprehensive three-tiered approach packed with concepts, methodologies, hands-on examples, and code. Over more than 500 pages, the book helps readers master the essential skills needed to recognize and solve complex problems with Machine Learning and Deep Learning by following a data-driven mindset. Using real-world case studies that leverage the popular Python Machine Learning ecosystem, it is a companion for learning the art and science of Machine Learning and becoming a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think about, design, build, and execute Machine Learning systems and projects successfully.

neuralcoref - ✨Fast Coreference Resolution in spaCy with Neural Networks

  •    Python

NeuralCoref is a pipeline extension for spaCy 2.0 that annotates and resolves coreference clusters using a neural network. NeuralCoref is production-ready, integrated in spaCy's NLP pipeline and easily extensible to new training datasets. For a brief introduction to coreference resolution and NeuralCoref, please refer to our blog post. NeuralCoref is written in Python/Cython and comes with pre-trained statistical models for English. It can be trained in other languages. NeuralCoref is accompanied by a visualization client NeuralCoref-Viz, a web interface powered by a REST server that can be tried online. NeuralCoref is released under the MIT license.
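To make "resolving coreference clusters" concrete, here is a minimal sketch of the idea using plain Python data structures. It does not use NeuralCoref's API: the `resolve` function, the span format and the hand-written clusters are hypothetical, and a real system would predict the clusters with a neural network rather than receive them as input.

```python
def resolve(tokens, clusters):
    """Replace every later mention in a cluster with the cluster's first mention.

    clusters is a list of clusters; each cluster is a list of (start, end)
    token spans, and the first span is treated as the main mention.
    """
    replacements = {}
    for spans in clusters:
        main_start, main_end = spans[0]
        main = tokens[main_start:main_end]
        for start, end in spans[1:]:
            replacements[start] = (end, main)

    out, i = [], 0
    while i < len(tokens):
        if i in replacements:
            end, main = replacements[i]
            out.extend(main)  # substitute the main mention
            i = end           # skip the tokens of the replaced mention
        else:
            out.append(tokens[i])
            i += 1
    return out


tokens = "My sister has a dog . She loves him .".split()
# Hand-annotated clusters (assumption): "My sister" <- "She", "a dog" <- "him"
clusters = [[(0, 2), (6, 7)], [(3, 5), (8, 9)]]
resolved = " ".join(resolve(tokens, clusters))
print(resolved)  # My sister has a dog . My sister loves a dog .
```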

PyTorch-NLP - Supporting Rapid Prototyping with a Toolkit (incl. Datasets and Neural Network Layers)

  •    Python

PyTorch-NLP, or torchnlp for short, is a library of neural network layers, text processing modules and datasets designed to accelerate Natural Language Processing (NLP) research. Join our community, add datasets and neural network layers! Chat with us on Gitter and join the Google Group; we're eager to collaborate with you.

lectures - Oxford Deep NLP 2017 course


This repository contains the lecture slides and course description for the Deep Natural Language Processing course offered in Hilary Term 2017 at the University of Oxford. This is an applied course focussing on recent advances in analysing and generating speech and text using recurrent neural networks. We introduce the mathematical definitions of the relevant machine learning models and derive their associated optimisation algorithms. The course covers a range of applications of neural networks in NLP including analysing latent dimensions in text, transcribing speech to text, translating between languages, and answering questions. These topics are organised into three high level themes forming a progression from understanding the use of neural networks for sequential language modelling, to understanding their use as conditional language models for transduction tasks, and finally to approaches employing these techniques in combination with other mechanisms for advanced applications. Throughout the course the practical implementation of such models on CPU and GPU hardware is also discussed.

LSTM-Human-Activity-Recognition - Human Activity Recognition example using TensorFlow on smartphone sensors dataset and an LSTM RNN (Deep Learning algo)

  •    Jupyter

Compared to a classical approach, using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells requires little or no feature engineering. Data can be fed directly into the neural network, which acts like a black box and models the problem correctly. Other research on the activity recognition dataset can involve a large amount of feature engineering, which is more of a signal processing approach combined with classical data science techniques. The approach here is very simple in terms of how much the data was preprocessed. Let's use Google's neat Deep Learning library, TensorFlow, to demonstrate the usage of an LSTM, a type of Artificial Neural Network that can process sequential data / time series.
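As a rough illustration of what an LSTM cell computes on a time series, the sketch below runs the standard gate equations step by step in plain Python. It is not the repository's TensorFlow code: the scalar state size, the shared toy weights in `params` and the made-up `signal` values are assumptions chosen to keep the example short.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step with a scalar input and a scalar hidden state."""
    pre = {}
    for name in ("i", "f", "o", "g"):
        w_x, w_h, b = params[name]
        pre[name] = w_x * x + w_h * h_prev + b
    i = sigmoid(pre["i"])      # input gate: how much new information to write
    f = sigmoid(pre["f"])      # forget gate: how much old memory to keep
    o = sigmoid(pre["o"])      # output gate: how much memory to expose
    g = math.tanh(pre["g"])    # candidate value for the cell memory
    c = f * c_prev + i * g     # updated cell memory
    h = o * math.tanh(c)       # updated hidden state
    return h, c


# Toy weights (w_x, w_h, b) shared by all gates; a trained model learns these.
params = {name: (0.5, 0.1, 0.0) for name in ("i", "f", "o", "g")}
signal = [0.0, 0.3, 0.9, 0.4, -0.2]  # made-up readings, one per time step

h, c = 0.0, 0.0
for x in signal:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The raw readings are fed in one step at a time with no hand-crafted features, and the final hidden state `h` is what a classifier layer would consume.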

tensorlayer-tricks - How to use TensorLayer


While research in Deep Learning continues to improve the world, we use a bunch of tricks to implement algorithms with TensorLayer day to day. Here is a summary of those tricks. If you find a trick that is particularly useful in practice, please open a Pull Request to add it to the document. If we find it to be reasonable and verified, we will merge it in.

deep-learning-book - Repository for "Introduction to Artificial Neural Networks and Deep Learning: A Practical Guide with Applications in Python"

  •    Jupyter

Repository for the book Introduction to Artificial Neural Networks and Deep Learning: A Practical Guide with Applications in Python. Deep learning is not just the talk of the town among tech folks. Deep learning allows us to tackle complex problems, training artificial neural networks to recognize complex patterns for image and speech recognition. In this book, we'll continue where we left off in Python Machine Learning and implement deep learning algorithms in PyTorch.

neuralmonkey - An open-source tool for sequence learning in NLP built on TensorFlow.

  •    Python

The Neural Monkey package provides a higher level abstraction for sequential neural network models, most prominently in Natural Language Processing (NLP). It is built on TensorFlow. It can be used for fast prototyping of sequential models in NLP which can be used e.g. for neural machine translation or sentence classification. The higher-level API brings together a collection of standard building blocks (RNN encoder and decoder, multi-layer perceptron) and a simple way of adding new building blocks implemented directly in TensorFlow.

AIDL-Series - 📚 Series of Artificial Intelligence & Deep Learning, including Mathematics Fundamentals, Python Practices, NLP Application, etc


📚 A series on Artificial Intelligence & Deep Learning, including Mathematics Fundamentals, Python Practices, NLP Applications, etc. 💫 Artificial Intelligence and Deep Learning in Practice: Machine Learning volume | TensorFlow volume

sense2vec - 🦆 Use NLP to go beyond vanilla word2vec

  •    C++

sense2vec (Trask et al., 2015) is a nice twist on word2vec that lets you learn more interesting, detailed and context-sensitive word vectors. For an interactive example of the technology, see our sense2vec demo that lets you explore semantic similarities across all Reddit comments of 2015. This library is a simple Python/Cython implementation for loading and querying sense2vec models. While it's best used in combination with spaCy, the sense2vec library itself is very lightweight and can also be used as a standalone module. See below for usage details.

gensim - Topic Modelling for Humans

  •    Python

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.
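For readers new to the Vector Space Model mentioned above, the following sketch shows the core idea behind similarity retrieval: documents become bag-of-words vectors and are ranked by cosine similarity to a query. It uses only the Python standard library and is not Gensim's API; the helper names `bow`, `cosine` and `rank`, and the three-document corpus, are assumptions made for illustration.

```python
import math
from collections import Counter


def bow(text):
    """Turn a text into a sparse bag-of-words vector (token -> count)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return document indices ordered from most to least similar."""
    q = bow(query)
    scores = [cosine(q, bow(doc)) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


corpus = [
    "human machine interface for lab computer applications",
    "graph of trees and minors",
    "user opinion of computer system response time",
]
order = rank("computer interface for humans", corpus)
print(order)  # [0, 2, 1]
```

Libraries such as Gensim build on this idea with weighting schemes and topic models, and stream the corpus so it does not have to fit in memory.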

tensorlayer - Deep Learning and Reinforcement Learning Library for Developers and Scientists

  •    Python

TensorLayer is a novel TensorFlow-based deep learning and reinforcement learning library designed for researchers and engineers. It provides a large collection of customizable neural layers / functions that are key to building real-world AI applications. TensorLayer was awarded the 2017 Best Open Source Software by the ACM Multimedia Society. Simplicity: TensorLayer lifts the low-level dataflow interface of TensorFlow to high-level layers / models. It is very easy to learn through the rich example code contributed by a wide community.

lightnet - 🌓 Bringing pjreddie's DarkNet out of the shadows #yolo

  •    C

LightNet provides a simple and efficient Python interface to DarkNet, a neural network library written by Joseph Redmon that's well known for its state-of-the-art object detection models, YOLO and YOLOv2. LightNet's main purpose for now is to power Prodigy's upcoming object detection and image segmentation features. However, it may be useful to anyone interested in the DarkNet library. Once you've downloaded LightNet, you can install a model using the lightnet download command. This will save the models in the lightnet/data directory. If you've installed LightNet system-wide, make sure to run the command as administrator.

Accord.NET - Machine learning, Computer vision, Statistics and general scientific computing for .NET

  •    CSharp

The Accord.NET project provides machine learning, statistics, artificial intelligence, computer vision and image processing methods to .NET. It can be used on Microsoft Windows, Xamarin, Unity3D, Windows Store applications, Linux or mobile.

H2O - Fast Scalable Machine Learning API For Smarter Applications

  •    Java

H2O is for data scientists and application developers who need fast, in-memory scalable machine learning for smarter applications. H2O is an open source parallel processing engine for machine learning. Unlike traditional analytics tools, H2O provides a combination of extraordinary math, a high performance parallel architecture, and unrivaled ease of use.