
natural - general natural language facilities for node


"Natural" is a general natural language facility for Node.js. Tokenizing, stemming, classification, phonetics, tf-idf, WordNet, string similarity, and some inflections are currently supported. It's still in the early stages, so we're very interested in bug reports, contributions and the like.
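String similarity is one of the facilities listed above. The snippet below is not natural's API (natural is a JavaScript library). It is a minimal Python sketch of one common string-distance measure, Levenshtein edit distance, computed with the standard dynamic-programming recurrence. The function name `levenshtein` is our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    # At the start of iteration i, prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory grows with the length of `b` rather than with the product of both lengths.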

nlp - Selected Machine Learning algorithms for basic natural language processing in Golang


An implementation of selected machine learning algorithms for basic natural language processing in Golang. The initial focus for this project is Latent Semantic Analysis, to allow retrieval/searching, clustering and classification of text documents based upon semantic content. It is built upon the gonum/gonum matrix library, with some inspiration taken from Python's scikit-learn.
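Latent Semantic Analysis factors the term-document matrix A with a truncated SVD, A ≈ U_k S_k V_k^T, and compares documents and queries in the resulting k-dimensional space. The following is a dependency-free Python sketch of that idea, not the Go package's API. It finds the top right singular vectors as eigenvectors of A^T A by power iteration with deflation (a simple stand-in for a proper SVD routine such as the ones gonum provides), projects each bag of words onto the left singular vectors, and ranks documents by cosine similarity. All function names and the toy corpus are hypothetical.

```python
import math
import random


def _matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def top_eigenpairs(m, k, iters=200, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    n = len(m)
    m = [row[:] for row in m]
    pairs = []
    for _ in range(k):
        v = _normalize([rng.random() + 0.1 for _ in range(n)])
        for _ in range(iters):
            v = _normalize(_matvec(m, v))
        lam = sum(x * y for x, y in zip(v, _matvec(m, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflation: remove the found component so the next run finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def build_lsa(docs, k=2):
    """Project tokenised documents into a k-dimensional latent space."""
    vocab = sorted({w for d in docs for w in d})
    # Term-document count matrix A: rows are terms, columns are documents.
    a = [[d.count(t) for d in docs] for t in vocab]
    n = len(docs)
    # A^T A (docs x docs); its eigenvectors are the right singular vectors of A.
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    left = []  # left singular vectors u_i = A v_i / sigma_i
    for lam, v in top_eigenpairs(ata, k):
        if lam > 1e-12:
            sigma = math.sqrt(lam)
            left.append([sum(row[j] * v[j] for j in range(n)) / sigma for row in a])

    def project(tokens):
        """U_k^T q: latent coordinates of a bag of words (unknown words ignored)."""
        counts = [tokens.count(t) for t in vocab]
        return [sum(u[t] * counts[t] for t in range(len(vocab))) for u in left]

    return project, [project(d) for d in docs]


def cosine(x, y, eps=1e-9):
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    if nx < eps or ny < eps:  # (numerically) zero vector: no latent representation
        return 0.0
    return sum(p * q for p, q in zip(x, y)) / (nx * ny)


docs = [
    "cat sat on the mat".split(),
    "dog sat on the log".split(),
    "cats and dogs are pets".split(),
    "stock market prices fell".split(),
    "market traders sold stock".split(),
]
project, doc_vecs = build_lsa(docs, k=2)
query = project(["prices"])
ranking = sorted(range(len(docs)), key=lambda j: -cosine(query, doc_vecs[j]))
print(sorted(ranking[:2]))  # [3, 4]
```

Document 4 never mentions "prices", yet it scores as highly as document 3 because both load on the same latent "stock market" dimension, which plain keyword matching would miss. For real corpora, a library SVD is far faster and more numerically robust than power iteration.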

moviebox - 🎥 Machine learning movie recommender


Moviebox is a content-based machine learning recommender system built with tf-idf and cosine similarity. First, the user enters a natural number that corresponds to the ID of a unique movie title. Using tf-idf, the plot summaries of the 5000 movies in the dataset are analyzed and vectorized. Next, a number of movies are chosen as recommendations based on the cosine similarity between their vectors and the vector of the input movie. Specifically, the primary measure of similarity is the cosine of the angle between two non-zero vectors, obtained from their inner product divided by the product of their norms. As a result, only the movies whose story and meaning are closest to the input movie are displayed to the user as recommendations.
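A minimal sketch of the pipeline described above, assuming movie IDs are simply list indices and plots are lowercased and split on whitespace. It uses one common tf-idf variant (term frequency normalised by document length, idf = log(N / df)); Moviebox's actual weighting and preprocessing may differ. The helper names and the four toy plots are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenised documents into sparse tf-idf dicts (term -> weight)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors: (u . v) / (|u| |v|)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(plots, movie_id, top_n=2):
    """Rank all other movies by cosine similarity to the plot of `movie_id`."""
    vecs = tfidf_vectors([p.lower().split() for p in plots])
    scores = [(cosine(vecs[movie_id], vecs[j]), j)
              for j in range(len(plots)) if j != movie_id]
    scores.sort(reverse=True)
    return [j for _, j in scores[:top_n]]


plots = [
    "a young wizard attends a school of magic",
    "a wizard and his friends fight a dark lord at a magic school",
    "two detectives hunt a serial killer in a rainy city",
    "a detective investigates a murder in the city",
]
print(recommend(plots, 0, top_n=1))  # [1]
```

Because "a" occurs in every plot, its idf is log(1) = 0 and it contributes nothing; the wizard plots match through "wizard", "school" and "magic".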

python-tf-idf - An extremely simple Python library to perform TF-IDF document comparison.


The simplest TF-IDF library imaginable. Add your documents as two-element lists [doc_name, [list_of_words_in_the_document]] with addDocument(doc_name, list_of_words).
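To illustrate the add-documents-then-compare workflow, here is a hypothetical pure-Python table that stores each document as a two-element [doc_name, ...] list. It is not the library's code: `TfIdfTable`, `add_document` and `similarities` are illustrative names, and the scoring rule (the sum of tf × idf over the query words) is an assumption.

```python
import math


class TfIdfTable:
    """Hypothetical minimal tf-idf table (not the python-tf-idf source)."""

    def __init__(self):
        # Each entry is a two-element list: [doc_name, {word: term_frequency}].
        self.documents = []

    def add_document(self, doc_name, list_of_words):
        counts = {}
        for word in list_of_words:
            counts[word] = counts.get(word, 0) + 1
        # Normalised term frequency: occurrences / document length.
        tf = {w: c / len(list_of_words) for w, c in counts.items()}
        self.documents.append([doc_name, tf])

    def idf(self, word):
        # log(N / df); words that appear in no document get weight 0.
        df = sum(1 for _, tf in self.documents if word in tf)
        return math.log(len(self.documents) / df) if df else 0.0

    def similarities(self, list_of_words):
        """Return [doc_name, score] pairs, score = sum of tf * idf over query words."""
        return [
            [name, sum(tf.get(w, 0.0) * self.idf(w) for w in list_of_words)]
            for name, tf in self.documents
        ]


table = TfIdfTable()
table.add_document("foo", ["alpha", "bravo", "charlie", "delta"])
table.add_document("bar", ["alpha", "bravo", "echo"])
table.add_document("baz", ["golf", "hotel"])
scores = table.similarities(["alpha", "echo"])
best = max(scores, key=lambda pair: pair[1])[0]
print(best)  # bar
```

"bar" wins because it contains both query words, including "echo", which is rarer (df = 1) and therefore weighted more heavily than "alpha" (df = 2).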

DocumentFeatureSelection - A set of metrics for feature selection from text data


Feature selection is also useful when you are exploring your text data: it shows which features actually contribute to specific labels. Please visit the project page on GitHub.
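Pointwise mutual information (PMI) is one widely used metric for this kind of word-label analysis. The sketch below estimates it from document-level counts as PMI(w, l) = log(P(w, l) / (P(w) P(l))). It is an illustration, not DocumentFeatureSelection's API; the function name and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def pmi_by_label(labelled_docs):
    """PMI(word, label) = log( P(word, label) / (P(word) * P(label)) ),
    with probabilities estimated from document-level occurrence counts."""
    n = len(labelled_docs)
    label_count = Counter(label for label, _ in labelled_docs)
    word_count = Counter()
    joint = Counter()
    for label, words in labelled_docs:
        for w in set(words):  # count each word at most once per document
            word_count[w] += 1
            joint[(w, label)] += 1
    return {
        (w, label): math.log((c / n) / ((word_count[w] / n) * (label_count[label] / n)))
        for (w, label), c in joint.items()
    }


docs = [
    ("sports", "the team won the match".split()),
    ("sports", "the coach praised the team".split()),
    ("politics", "the senate passed the bill".split()),
    ("politics", "the minister defended the bill".split()),
]
scores = pmi_by_label(docs)
print(round(scores[("team", "sports")], 3))  # 0.693
print(round(scores[("the", "sports")], 3))   # 0.0
```

"team" scores log 2 against "sports" because it appears only in sports documents, while "the" scores 0 because it appears everywhere and says nothing about the label.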

pke - Python Keyphrase Extraction module


pke is an open-source, Python-based keyphrase extraction toolkit. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new approaches. pke also allows for easy benchmarking of state-of-the-art keyphrase extraction approaches, and ships with supervised models trained on the SemEval-2010 dataset. pke works only with Python 2.x at the moment.
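As an example of one component such a pipeline might contain, here is a minimal TextRank-style candidate ranking: non-stopword tokens are linked when they co-occur within a small window, and PageRank scores the resulting graph. This is a simplified sketch under stated assumptions (whitespace tokens, a caller-supplied stopword set, single words rather than multi-word phrases), not pke's implementation or API.

```python
from collections import defaultdict


def textrank_keywords(tokens, stopwords, window=2, damping=0.85, iters=50, top_n=3):
    """TextRank-style keyword ranking: PageRank over a word co-occurrence graph."""
    words = [t.lower() for t in tokens]
    graph = defaultdict(set)
    for i, w in enumerate(words):
        if w in stopwords:
            continue
        # Link w to every non-stopword among the next `window - 1` tokens.
        for other in words[i + 1 : i + window]:
            if other not in stopwords and other != w:
                graph[w].add(other)
                graph[other].add(w)
    score = {w: 1.0 for w in graph}
    for _ in range(iters):
        # Each word shares its score equally among its neighbours.
        score = {
            w: (1 - damping) + damping * sum(score[u] / len(graph[u]) for u in graph[w])
            for w in graph
        }
    return sorted(score, key=score.get, reverse=True)[:top_n]


text = "neural model training ; language model evaluation ; model compression methods"
print(textrank_keywords(text.split(), stopwords={";"}, top_n=2))  # ['model', 'compression']
```

"model" ranks first because it co-occurs with five different words. A fuller extractor would also merge adjacent top-ranked words into multi-word keyphrases.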

clusterix - Visual exploration of clustered data.


This command will run Clusterix on http://127.0.0.1:5000 where you will be able to use the interface to upload data files, and select the algorithms/options that you want.