"Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best Python Chinese word segmentation module.
natural-language-processing nlp
SLING is a parser for annotating text with frame-semantic annotations. It is trained on an annotated corpus using TensorFlow and DRAGNN. The parser is a general transition-based frame-semantic parser that uses bidirectional LSTMs for input encoding and a Transition Based Recurrent Unit (TBRU) for output decoding. It is a jointly trained model that takes only the text tokens as input, and its transition system is designed to output frame graphs directly, without any intervening symbolic representation.
natural-language-understanding natural-language-processing neural-network machine-learning jit-compiler frame-semantic-parsing nlp
Smile (Statistical Machine Intelligence and Learning Engine) is a fast and comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system in Java and Scala. With advanced data structures and algorithms, Smile delivers state-of-the-art performance. Smile covers every aspect of machine learning, including classification, regression, clustering, association rule mining, feature selection, manifold learning, multidimensional scaling, genetic algorithms, missing value imputation, efficient nearest neighbor search, etc.
machine-learning nlp linear-algebra natural-language-processing
Addresses and the locations they represent are essential for any application dealing with maps (place search, transportation, on-demand/delivery services, check-ins, reviews). Yet even the simplest addresses are packed with local conventions, abbreviations and context, making them difficult to index/query effectively with traditional full-text search engines. This library helps convert the free-form addresses that humans use into clean normalized forms suitable for machine comparison and full-text indexing. Though libpostal is not itself a full geocoder, it can be used as a preprocessing step to make any geocoding application smarter, simpler, and more consistent internationally. The core library is written in pure C. Language bindings for Python, Ruby, Go, Java, PHP, and NodeJS are officially supported and it's easy to write bindings in other languages.
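To make "clean normalized forms suitable for machine comparison" concrete, here is a deliberately tiny normalizer that lowercases an address, strips punctuation, and expands a few abbreviations. This is a swapped-in toy technique, not libpostal's algorithm or API. The `ABBREVIATIONS` table is hypothetical. The sketch also ignores ambiguity (for example, "St" can mean Street or Saint), which a real normalizer has to handle.

```python
import re

# Hypothetical abbreviation table; libpostal's real dictionaries are far larger and multilingual.
ABBREVIATIONS = {
    "st": "street",
    "ave": "avenue",
    "rd": "road",
    "n": "north",
    "s": "south",
    "apt": "apartment",
}


def normalize_address(raw):
    """Lowercase, drop punctuation, and expand known abbreviations token by token."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)
```

After normalization, `"123 N. Main St., Apt 4"` and `"123 North Main Street Apartment 4"` both become `"123 north main street apartment 4"`, so a plain string comparison or full-text index treats them as the same address.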
address-parser machine-learning nlp address international deduplication record-linkage deduping natural-language-processing
Python implementation of TextRank, based on the Mihalcea 2004 paper. The results produced by this implementation are intended more for use as feature vectors in machine learning than as academic paper summaries.
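TextRank ranks words by running PageRank over a graph in which words are linked when they co-occur within a small window. The sketch below shows that core loop for keyword extraction in plain Python. It is not the code of the implementation described above. It assumes the input tokens are already filtered (for example, stopwords removed), and the `window`, `damping`, and `iterations` defaults are illustrative choices.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_n=3):
    """Rank words with PageRank over an undirected co-occurrence graph."""
    # Link each word to the words that follow it within the window.
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    # Power iteration: each word shares its score equally among its neighbours.
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(scores[n] / len(graph[n]) for n in graph[w])
            for w in graph
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


tokens = ["ranking", "graph", "text", "graph", "keyword", "graph", "summary"]
print(textrank_keywords(tokens, top_n=1))
```

Here "graph" co-occurs with every other word, so it becomes the hub of the graph and receives the highest score. The script therefore prints `['graph']`.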
textrank summarization natural-language-processing text-analytics nlp nlp-parsing machine-learning graph-algorithms
Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.
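In the Vector Space Model mentioned above, each document becomes a weighted term vector, and similarity retrieval compares those vectors, typically with cosine similarity. The stdlib-only sketch below uses TF-IDF weighting (raw term count times log(N/df)) on a hypothetical three-document corpus. It illustrates the model and does not use Gensim's API.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenised documents into sparse TF-IDF dicts: term count * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus, already tokenised.
docs = [
    "human computer interaction".split(),
    "graph of trees".split(),
    "computer system response time".split(),
]
vecs = tfidf_vectors(docs)
# Rank the remaining documents by similarity to document 0.
ranking = sorted(range(1, len(docs)), key=lambda i: cosine(vecs[0], vecs[i]), reverse=True)
```

Document 2 shares the term "computer" with document 0, while document 1 shares nothing, so `ranking` is `[2, 1]`.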
gensim topic-modeling information-retrieval machine-learning natural-language-processing nlp data-science data-mining word2vec word-embeddings text-summarization neural-network document-similarity word-similarity fasttext
Extract text from any document. No muss. No fuss.
natural-language-processing data-mining text-mining
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both.
nlp nltk pattern python-3 python-2 natural-language-processing
Course materials for General Assembly's Data Science course in Washington, DC (8/18/15 - 10/29/15).
data-science machine-learning scikit-learn data-analysis pandas jupyter-notebook course linear-regression logistic-regression model-evaluation naive-bayes natural-language-processing decision-trees ensemble-learning clustering regular-expressions web-scraping data-visualization data-cleaning
Please read the contribution guidelines before contributing. Please feel free to create pull requests.
natural-language-processing deep-learning machine-learning language awesome awesome-list nlp text-mining
An Apache 2.0 NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks. If you need pointers on setting up an appropriate Python environment or would like to install AllenNLP using a different method, see below.
pytorch nlp natural-language-processing deep-learning data-science
sense2vec (Trask et al., 2015) is a nice twist on word2vec that lets you learn more interesting, detailed and context-sensitive word vectors. For an interactive example of the technology, see our sense2vec demo that lets you explore semantic similarities across all Reddit comments of 2015. This library is a simple Python/Cython implementation for loading and querying sense2vec models. While it's best used in combination with spaCy, the sense2vec library itself is very lightweight and can also be used as a standalone module. See below for usage details.
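The key idea behind sense2vec's context-sensitive vectors is that the same word gets separate entries per sense. For example, a word can be keyed together with its part-of-speech tag in a `word|SENSE` form, so "duck" as a noun and "duck" as a verb get different neighbours. The sketch below queries a hypothetical table of 3-dimensional toy vectors in that key format. The vectors are invented for illustration only, and nothing here uses the sense2vec library's API.

```python
import math

# Hypothetical toy vectors keyed as "word|SENSE"; real sense2vec vectors are learned from tagged text.
VECTORS = {
    "duck|NOUN": [0.9, 0.1, 0.0],
    "goose|NOUN": [0.85, 0.15, 0.05],
    "duck|VERB": [0.0, 0.2, 0.95],
    "dodge|VERB": [0.05, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(key):
    """Return the other key whose vector is closest to `key`'s vector."""
    return max(
        (k for k in VECTORS if k != key),
        key=lambda k: cosine(VECTORS[key], VECTORS[k]),
    )
```

With these toy values, `most_similar("duck|NOUN")` returns `"goose|NOUN"` while `most_similar("duck|VERB")` returns `"dodge|VERB"`. A single plain "duck" vector could not separate these two neighbourhoods.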
spacy nlp natural-language-processing word2vec sense2vec gensim gensim-word2vec machine-learning
spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 20+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition and easy deep learning integration. It's commercial open-source software, released under the MIT license. 💫 Version 2.0 out now! Check out the new features here.
natural-language-processing data-science big-data machine-learning cython nlp artificial-intelligence ai spacy nlp-library neural-network neural-networks deep-learning
Thinc is the machine learning library powering spaCy. It features a battle-tested linear model designed for large sparse learning problems, and a flexible neural network model under development for spaCy v2.0. Thinc is a practical toolkit for implementing models that follow the "Embed, encode, attend, predict" architecture. It's designed to be easy to install, efficient for CPU usage and optimised for NLP and deep learning with text – in particular, hierarchically structured input and variable-length sequences.
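The "Embed, encode, attend, predict" pattern chains four stages:
- Embed turns tokens into vectors.
- Encode mixes in context from neighbouring tokens.
- Attend reduces the variable-length sequence to one vector.
- Predict maps that vector to an output.

The pure-Python forward pass below shows the four stages with hypothetical hand-set parameters for a toy sentiment classifier. It does not use Thinc's API, and the window-averaging encoder stands in for the learned encoders a real model would use.

```python
import math

# Hypothetical toy parameters; a real model learns these from data.
EMBED = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "movie": [0.0, 1.0]}
ATTN_QUERY = [1.0, 0.0]   # scores how relevant each encoded token is
OUT_WEIGHTS = [2.0, 0.0]  # linear layer producing one sentiment logit


def embed(tokens):
    """Embed: map each token to a dense vector (unknown words -> zeros)."""
    return [EMBED.get(t, [0.0, 0.0]) for t in tokens]


def encode(vectors):
    """Encode: add context by averaging each vector with its immediate neighbours."""
    out = []
    for i in range(len(vectors)):
        window = vectors[max(0, i - 1) : i + 2]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def attend(vectors):
    """Attend: a softmax-weighted sum reduces the sequence to a single vector."""
    scores = [sum(q * x for q, x in zip(ATTN_QUERY, v)) for v in vectors]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dims = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) / total for d in range(dims)]


def predict(vector):
    """Predict: a logistic output giving P(positive)."""
    logit = sum(w * x for w, x in zip(OUT_WEIGHTS, vector))
    return 1.0 / (1.0 + math.exp(-logit))


def classify(tokens):
    return predict(attend(encode(embed(tokens))))
```

`classify(["good", "movie"])` comes out above 0.5 and `classify(["bad", "movie"])` below it. Each stage accepts a variable-length sequence, which is why this structure suits text.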
machine-learning deep-learning artificial-intelligence ai nlp natural-language-processing spacy machine-learning-library
The Natural Language Decathlon is a multitask challenge that spans ten tasks: question answering (SQuAD), machine translation (IWSLT), summarization (CNN/DM), natural language inference (MNLI), sentiment analysis (SST), semantic role labeling (QA-SRL), zero-shot relation extraction (QA-ZRE), goal-oriented dialogue (WOZ), semantic parsing (WikiSQL), and commonsense reasoning (MWSC). Each task is cast as question answering, which makes it possible to use our new Multitask Question Answering Network (MQAN). This model jointly learns all tasks in decaNLP without any task-specific modules or parameters in the multitask setting. For a more thorough introduction to decaNLP and the tasks, see the main website, our blog post, or the paper. While the research direction associated with this repository focused on multitask learning, the framework itself is designed in a way that should make single-task training, transfer learning, and zero-shot evaluation simple. Similarly, the paper focused on multitask learning as a form of question answering, but this framework can easily be adapted to different approaches to single-task or multitask learning.
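Casting every task as question answering means every example, whatever its original task, ends up as the same (question, context, answer) triple. A single model can then consume all of them. The sketch below shows that recasting for three of the tasks. The question wordings and the example texts are illustrative assumptions, and the exact templates decaNLP uses may differ.

```python
from typing import NamedTuple


class QAExample(NamedTuple):
    """One example in the shared question / context / answer format."""
    question: str
    context: str
    answer: str


# Hypothetical question templates; decaNLP's own wording may differ.
def sentiment_as_qa(review, label):
    return QAExample("Is this review negative or positive?", review, label)


def translation_as_qa(english, german):
    return QAExample("What is the translation from English to German?", english, german)


def summarization_as_qa(article, summary):
    return QAExample("What is the summary?", article, summary)


examples = [
    sentiment_as_qa("A warm, funny film.", "positive"),
    translation_as_qa("Thank you.", "Danke."),
    summarization_as_qa(
        "The council met on Monday and approved the new budget.",
        "Council approves budget.",
    ),
]
for ex in examples:
    print(f"Q: {ex.question} | C: {ex.context} | A: {ex.answer}")
```

Because the task identity lives in the question text rather than in task-specific output layers, adding a task amounts to choosing how to phrase its question.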
deep-learning natural-language-processing multitask-learning
Chinese language processing package
nlp natural-language-processing hanlp crf hmm trie textrank doublearraytrie neural-network chinese-word-segmentation text-mining pos-tagging dependency-parser text-classification word2vec perceptron named-entity-recognition text-clustering
This curated list comprises awesome resources, libraries, and information sources about computational processing of texts in human languages with the Ruby programming language. That field is often referred to as NLP, Computational Linguistics, or HLT (Human Language Technology), and it overlaps with Artificial Intelligence, Machine Learning, Information Retrieval, Text Mining, Knowledge Extraction and other related disciplines. This list comes from our day-to-day work on Language Models and NLP Tools. Read why this list is awesome. Our FAQ describes the important decisions and useful answers you may be interested in.
machine-learning natural-language-processing nlp sentiment-analysis pos-tag awesome awesome-list list computational-linguistics rubynlp rubyml
Snips NLU (Natural Language Understanding) is a Python library that lets you parse sentences written in natural language and extract structured information. To learn how to use Snips NLU, refer to our documentation, which provides a step-by-step guide to setting up and using the library.
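"Extracting structured information" from a sentence usually means two things. The parser identifies the intent (what the user wants) and fills slots (the parameters of that request). Snips NLU learns this from training data. The sketch below swaps in a deliberately simple regex matcher so that the input/output shape is visible. The intent names, slot names, patterns, and result dictionary format are all hypothetical and are not Snips NLU's API.

```python
import re

# Hypothetical intents: each name maps to a regex whose named groups are the slots.
INTENTS = {
    "turnLightOn": re.compile(r"turn on the lights? in the (?P<room>\w+)", re.I),
    "setTemperature": re.compile(r"set the temperature to (?P<degrees>\d+)", re.I),
}


def parse(sentence):
    """Return the first matching intent with its slot values (intent None if nothing matches)."""
    for name, pattern in INTENTS.items():
        match = pattern.search(sentence)
        if match:
            return {"intent": name, "slots": match.groupdict()}
    return {"intent": None, "slots": {}}
```

For example, `parse("Please turn on the lights in the kitchen")` yields `{"intent": "turnLightOn", "slots": {"room": "kitchen"}}`. A trained model replaces the hand-written patterns but produces the same kind of structure.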
nlp nlu machine-learning data-science text-classification intent-classification ner named-entity-recognition slot-filling intent-parser information-extraction snips natural-language-processing
Mycroft is artificial intelligence for everyone. It uses open software to process natural language, determine your intent, and take action. It can integrate a host of professional functions, such as controlling scenes to conserve power or granting office access with your voice. It can control all of your media and devices with the sound of your voice: adjust your thermostat, turn on your lights, water your lawn, play your favorite movie, and a lot more.
ai nlp voice-assistant raspberry-pi natural-language-processing natural-language voice-as-an-interface artificial-intelligence personal-assistant speech-recognition
Ludwig is a toolbox built on top of TensorFlow that lets you train and test deep learning models without writing code. All you need to provide is a CSV file containing your data, a list of columns to use as inputs, and a list of columns to use as outputs; Ludwig will do the rest. Simple commands can be used to train models both locally and in a distributed way, and to use them to predict on new data.
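The data contract described above consists of a CSV file, a list of input columns, and a list of output columns. The stdlib sketch below illustrates that contract by splitting each CSV row into input and output values according to a declaration. The `CONFIG` dict is modeled on the `input_features` / `output_features` lists in Ludwig's model configuration, but the column names, data, and the `split_rows` helper are hypothetical. The sketch neither calls Ludwig nor trains anything.

```python
import csv
import io

# Hypothetical declaration in the style of Ludwig's input_features / output_features.
CONFIG = {
    "input_features": [{"name": "review", "type": "text"}],
    "output_features": [{"name": "sentiment", "type": "category"}],
}

# Hypothetical CSV data; columns not declared in CONFIG (here "id") are ignored.
DATA = "review,sentiment,id\nGreat film,positive,1\nDull plot,negative,2\n"


def split_rows(csv_text, config):
    """Split each CSV row into (inputs, outputs) dicts using the declared column names."""
    inputs = [f["name"] for f in config["input_features"]]
    outputs = [f["name"] for f in config["output_features"]]
    reader = csv.DictReader(io.StringIO(csv_text))
    return [({c: row[c] for c in inputs}, {c: row[c] for c in outputs}) for row in reader]


rows = split_rows(DATA, CONFIG)
```

Each entry of `rows` pairs the declared inputs with the declared outputs, for example `({'review': 'Great film'}, {'sentiment': 'positive'})`. That pairing is what a declarative tool needs to build and train a model without user code.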
deep-learning deeplearning deep-neural-networks deep learning machine-learning machinelearning machine natural-language-processing natural-language-understanding natural-language natural-language-generation computer-vision python3