spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 20+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition, and easy deep learning integration. It's commercial open-source software, released under the MIT license. 💫 Version 2.0 out now!
natural-language-processing data-science big-data machine-learning cython nlp artificial-intelligence ai spacy nlp-library neural-network neural-networks deep-learning

A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
pytorch machine-learning deep-learning tutorials papers awesome awesome-list pytorch-tutorials data-science nlp nlp-library cv computer-vision natural-language-processing facebook probabilistic-programming utility-library neural-network pytorch-model

🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation and more in over 100 languages. Its aim is to make cutting-edge NLP easier to use for everyone. 🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our model hub. At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
nlp natural-language-processing tensorflow pytorch transformer speech-recognition seq2seq flax gpt pretrained-models language-models natural-language-generation nlp-library language-model bert natural-language-understanding jax xlnet pytorch-transformers model-hub

A Python port of the Apache Tika library that makes Tika available using the Tika REST Server. This makes Apache Tika available as a Python library, installable via Setuptools, Pip and Easy Install.
tika-server tika-python tika-server-jar parser-interface parse translation-interface usc text-extraction mime buffer memex text-recognition detection recognition nlp nlp-machine-learning nlp-library

Kagome is an open source Japanese morphological analyzer written in pure Go. The MeCab-IPADIC and UniDic (unidic-mecab) dictionaries and statistical models are packaged in the Kagome binary. Like Kuromoji, Kagome has a segmentation mode for search.
japanese tokenizer nlp-library japanese-language pos-tagging segmentation morphological-analysis

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation). The library is divided into several packages and modules. It works on Python 2.7, as well as Python 3.
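To make "extraction of n-grams and frequency lists" concrete, here is a minimal sketch using only the Python standard library. It does not use PyNLPl; the function names `ngrams` and `frequency_list` are hypothetical and are not PyNLPl's API.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous n-gram of the token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_list(items: Iterable[Hashable]) -> List[Tuple[Hashable, int]]:
    """Count items and return (item, count) pairs, most frequent first.

    Items with equal counts keep the order in which they were first seen.
    """
    return Counter(items).most_common()


tokens = "to be or not to be".split()
print(ngrams(tokens, 2))
print(frequency_list(ngrams(tokens, 2)))
```

A frequency list over bigrams like this is also the raw material for a simple bigram language model.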
nlp computational-linguistics linguistics library folia machine-learning language-modelling search-algorithms evaluation-metrics text-processing nlp-library natural-language-processing

Package lingo provides the data structures and algorithms required for natural language processing. Specifically, it provides a POS tagger (lingo/pos), a dependency parser (lingo/dep), and a basic tokenizer (lingo/lexer) for English. It also provides data structures for holding corpora (lingo/corpus) and treebanks (lingo/treebank).
natural-language-processing nlp nlp-library nlp-parsing nlp-dependency-parsing nlp-machine-learning language-model part-of-speech-tagger part-of-speech inflection conll-u

This library implements a pure Ruby interface to the WordNet lexical/semantic database. Unlike existing Ruby bindings, this one doesn't require you to convert the original WordNet database into a new database format; instead it can work directly on the database that comes with WordNet. If you're doing something data-intensive you will achieve much better performance with Michael Granger's Ruby-WordNet, since it converts the WordNet database into a Berkeley DB file for quicker access. rwordnet has a much smaller footprint, with no gem or native dependencies, and requires about a third of the disk space of Ruby-WordNet + DB. In writing rwordnet, I've focused more on usability and ease of installation (gem install rwordnet) at the expense of some performance. Use at your own risk, etc.
wordnet wordnet-tags nlp-library

This code is a Ruby 1.9.x port of the Punkt sentence tokenizer algorithm implemented by the NLTK Project (http://www.nltk.org/). Punkt is a language-independent, unsupervised approach to sentence boundary detection. It is based on the assumption that a large number of ambiguities in the determination of sentence boundaries can be eliminated once abbreviations have been identified. I simply did the Ruby port and made some API changes.
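Punkt learns abbreviations from raw text without supervision. The Python sketch below skips that learning step entirely and takes a hand-supplied abbreviation set, only to show why knowing the abbreviations resolves most boundary ambiguities. `split_sentences` and the abbreviation set are hypothetical; they are neither NLTK's API nor that of the Ruby port.

```python
import re
from typing import List, Set


def split_sentences(text: str, abbreviations: Set[str]) -> List[str]:
    """Split at '.', '!' or '?' followed by whitespace, unless a '.' ends a known abbreviation.

    Abbreviations are given lowercase and without their final period (e.g. 'dr', 'p.m').
    """
    sentences: List[str] = []
    start = 0
    # Candidate boundaries: terminal punctuation followed by whitespace.
    for match in re.finditer(r"[.!?]\s+", text):
        end = match.start() + 1  # position just past the punctuation mark
        words = text[start:end].split()
        last_word = words[-1].lower().rstrip(".") if words else ""
        # A period after a known abbreviation is not treated as a boundary.
        if text[match.start()] == "." and last_word in abbreviations:
            continue
        sentences.append(text[start:end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


text = "Dr. Smith arrived at 5 p.m. on Monday. He left early!"
print(split_sentences(text, {"dr", "p.m"}))  # abbreviations known: 2 sentences
print(split_sentences(text, set()))          # no abbreviations: over-splits into 4 pieces
```

Comparing the two calls shows the ambiguity Punkt targets: without abbreviation knowledge, every "Dr." and "p.m." looks like a sentence end.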
rubynlp sentence-tokenizer sentence-boundaries tokenized-sentences punkt-segmenter ruby-port nltk nlp-library

This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears to be. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (https://languagemachines.github.io/ucto). Advanced note: If the Ucto libraries and includes are installed in a non-standard location, you can set the environment variables INCLUDE_DIRS and LIBRARY_DIRS to point to them before running setup.py install.
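Ucto's actual rule files are not reproduced here. As a rough illustration of what "regular-expression based" means for a tokeniser, the hypothetical Python sketch below uses an ordered set of alternatives (URLs, numbers with separators, words with internal hyphens or apostrophes, then single punctuation marks). At each position the first alternative that matches wins. The pattern and the name `tokenize` are assumptions, not Ucto's rules or the binding's interface.

```python
import re
from typing import List

# Hypothetical rule set, tried in order at each position: earlier alternatives win.
TOKEN_PATTERN = re.compile(
    r"""
      https?://\S*[^\s.,!?]     # URL, without trailing sentence punctuation
    | \d+(?:[.,]\d+)+           # number with separators, e.g. 3.14 or 1,000
    | \w+(?:[-']\w+)*           # word, keeping internal hyphens and apostrophes
    | [^\w\s]                   # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize(text: str) -> List[str]:
    """Return all tokens in text; whitespace matches no rule and is skipped."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Don't panic: it costs 1,000.50 at https://example.org."))
```

Rule order matters: putting the number rule before the word rule keeps "1,000.50" together, because `\w` alone would stop at the comma.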
nlp computational-linguistics tokenizer text-processing folia nlp-library

Japanese morphological analyzer
morphological-analysis segmentation nlp-library pos-tagging

SudachiPy is a Python version of Sudachi, a Japanese morphological analyzer. Sudachi & SudachiPy are developed at WAP Tokushima Laboratory of AI and NLP, an institute under Works Applications that focuses on Natural Language Processing (NLP).
nlp-library morphological-analysis segmentation pos-tagging

As of May 1st, 2017, the Duckling team deprecated the Clojure version in favor of the new Duckling. See their blog post announcement. My intention is to continue the Clojure development of the Duckling project, so I forked it into this new project.
nlp nlp-library

An open-source library for Automatic Term Recognition written in Scala. N. Astrakhantsev. ATR4S: Toolkit with State-of-the-art Automatic Terms Recognition Methods in Scala. arXiv preprint arXiv:1611.07804, 2016.
terminology-extraction nlp-library nlp-keywords-extraction

Installation of the fast.ai library is required. Please install it using the fast.ai installation instructions. It is important to use the latest version of fast.ai rather than the pip version, which is not up to date. The main goal of quick-nlp is to provide the easy-to-use interface of the fast.ai library for seq2seq models.
pytorch fastai nlp-library seq2seq

Lingua is a language detection library for Java and other JVM languages, suitable for long and short text alike. Its task is simple: It tells you which language some provided textual data is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages.
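Lingua's own models and API are not reproduced here. As a hypothetical illustration of the general idea behind n-gram-based language detection (scoring a text against per-language character statistics), the Python sketch below builds add-one-smoothed character trigram profiles from one toy sentence per language and returns the language under which the input has the highest log-probability. The training sentences, `trigram_profile`, and `detect` are all assumptions; a usable detector needs far more training text.

```python
import math
from collections import Counter
from typing import Dict


def trigram_profile(text: str) -> Counter:
    """Count character trigrams, padding with spaces so word edges are captured."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def detect(text: str, profiles: Dict[str, Counter]) -> str:
    """Return the language whose trigram profile gives the text the highest log-probability."""
    grams = trigram_profile(text)

    def log_prob(profile: Counter) -> float:
        total = sum(profile.values())
        vocab = len(profile) + 1  # +1 reserves probability mass for unseen trigrams
        # Add-one (Laplace) smoothing: unseen trigrams get count 1 instead of 0.
        return sum(
            count * math.log((profile[g] + 1) / (total + vocab))
            for g, count in grams.items()
        )

    return max(profiles, key=lambda lang: log_prob(profiles[lang]))


# Toy training data (hypothetical); real profiles need much larger corpora.
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and then the dog sleeps",
    "de": "der schnelle braune fuchs springt über den faulen hund und dann schläft der hund",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et puis le chien dort",
}
PROFILES = {lang: trigram_profile(text) for lang, text in TRAINING.items()}

print(detect("le chien dort", PROFILES))
```

Smoothing matters here because short inputs almost always contain trigrams missing from a profile; without it, a single unseen trigram would drive a language's score to negative infinity.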
language-detection language-model language-modeling language-processing languages kotlin kotlin-library java-library nlp-library nlp nlp-machine-learning natural-language natural-language-processing language-detection-library android android-library language-identification language-classification language-recognition

The classic sentiment corpus: 2,000 movie reviews already gathered by NLTK. CrowdFlower hosts a number of Twitter corpora that have already been graded for sentiment by panels of humans.
nlp-sentiment-classifier nlp nlp-machine-learning nlp-library sentiment-classification sentiment-predictions sentiment-classifier sentiment machine-learning machinelearning batteries-included automated-machine-learning

If you just want to edit NLP data, it's easier to use the Universal Data Tool (MIT). This library is a module of the Universal Data Tool for use in custom React applications.
nlp text-mining text-classification text entity classification nlp-library hacktoberfest nlp-machine-learning text-entities text-entity-analysis entity-relation-labeling