natural - general natural language facilities for node

  •        433

"Natural" is a general natural language facility for nodejs. Tokenizing, stemming, classification, phonetics, tf-idf, WordNet, string similarity, and some inflections are currently supported.



Related Projects

PHP classes for NLP

  •    PHP

A set of classes for Natural Language Processing in PHP for: 1. Part of speech Tagging - Brill, n-gram, HMM 2. Princeton Wordnet querying and access 3. Document summarization 4. Document classification - EM, Bayes 5. Stemming - Porter, Lancaster

jellyfish - 🎐 a python library for doing approximate and phonetic matching of strings.

  •    Python

Jellyfish is a python library for doing approximate and phonetic matching of strings. Written by James Turk <> and Michael Stephens.

spark-nlp - Natural Language Understanding Library for Apache Spark.

  •    Jupyter

John Snow Labs Spark-NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines, that scale easily in a distributed environment. This library has been uploaded to the spark-packages repository .

pattern - Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization

  •    Python

It is well documented, thoroughly tested with 350+ unit tests and comes bundled with 50+ examples. The source code is licensed under BSD and available from This example trains a classifier on adjectives mined from Twitter using Python 3. First, tweets that contain hashtag #win or #fail are collected. For example: "$20 tip off a sweet little old lady today #win". The word part-of-speech tags are then parsed, keeping only adjectives. Each tweet is transformed to a vector, a dictionary of adjective → count items, labeled WIN or FAIL. The classifier uses the vectors to learn which other tweets look more like WIN or more like FAIL.

moviebox - Machine learning movie recommending system

  •    Python

Moviebox is a content based machine learning recommending system build with the powers of tf-idf and cosine similarities. Initially, a natural number, that corresponds to the ID of a unique movie title, is accepted as input from the user. Through tf-idf the plot summaries of 5000 different movies that reside in the dataset, are analyzed and vectorized. Next, a number of movies is chosen as recommendations based on their cosine similarity with the vectorized input movie. Specifically, the cosine value of the angle between any two non-zero vectors, resulting from their inner product, is used as the primary measure of similarity. Thus, only movies whose story and meaning are as close as possible to the initial one, are displayed to the user as recommendations.

OpenNLP - Machine learning based toolkit for the processing of natural language text

  •    Java

The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.

MLphone - Phonetic algorithm for indexing Malayalam words by their pronounciation, like Metaphone for English

  •    PHP

MLphone is a phonetic algorithm for indexing Malayalam words by their pronunciation, like Metaphone for English. The algorithm generates three Romanized phonetic keys (hashes) of varying phonetic affinities for a given Malayalam word. The algorithm takes into account the context sensitivity of sounds, syntactic and phonetic gemination, compounding, modifiers, and other known exceptions to produce Romanized phonetic hashes of increasing phonetic affinity that are very faithful to the pronunciation of the original Malayalam word.

limdu - Machine-learning for Node.js

  •    Javascript

Limdu is a machine-learning framework for Node.js. It supports multi-label classification, online learning, and real-time classification. Therefore, it is especially suited for natural language understanding in dialog systems and chat-bots.Limdu is in an "alpha" state - some parts are working (see this readme), but some parts are missing or not tested. Contributions are welcome.

levi - Stream based full-text search for Node.js and browsers. Built on LevelDB.

  •    Javascript

Stream based full-text search for Node.js and browsers. Using LevelDB as storage backend. Full-text search using TF-IDF and cosine similarity plus query-time field boost options. Provided with configurable text processing pipeline: Tokenizer, Porter Stemmer and Stopwords filter.

walrus - Lightweight Python utilities for working with Redis

  •    Python

Lightweight Python utilities for working with Redis. The purpose of walrus is to make working with Redis in Python a little easier by wrapping rich objects in Pythonic containers. It consist of wrappers for the Redis object types like Hash, List, Set, Sorted Set, HyperLogLog, Array.

rumale - Rumale is a machine learning library in Ruby

  •    Ruby

Rumale (Ruby machine learning) is a machine learning library in Ruby. Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python. Rumale supports Linear / Kernel Support Vector Machine, Logistic Regression, Linear Regression, Ridge, Lasso, Kernel Ridge, Factorization Machine, Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor classifier, K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering, Mutidimensional Scaling, t-SNE, Principal Component Analysis, Kernel PCA and Non-negative Matrix Factorization. This project was formerly known as "SVMKit". If you are using SVMKit, please install Rumale and replace SVMKit constants with Rumale.

wordpos - Part-of-speech utilities for node.js based on the WordNet database.

  •    Javascript

wordpos is a set of fast part-of-speech (POS) utilities for Node.js using fast lookup in the WordNet database. Version 1.x is a major update with no direct dependence on natural's WordNet module, with support for Promises, and roughly 5x speed improvement over previous version.

WikiSQL - A large annotated semantic parsing corpus for developing natural language interfaces.

  •    HTML

A large crowd-sourced dataset for developing natural language interfaces for relational databases. WikiSQL is the dataset released along with our work Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning.

We have large collection of open source products. Follow the tags from Tag Cloud >>

Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.