n2 - TOROS N2 - lightweight approximate Nearest Neighbor library which runs fast even with large datasets

  •    C++

N2 is an approximate nearest neighbor algorithm library written in C++ (with Python and Go bindings). N2 provides much faster search than other implementations when working with large datasets. N2 also supports multi-core CPUs for index building. For more detail, see the installation instructions on how to build N2 from source.

GloVe-experiments - GloVe word vector embedding experiments (similar to Word2Vec)

  •    Python

This repository contains a few brief experiments with Stanford NLP's GloVe, an unsupervised learning algorithm for obtaining vector representations for words. Similar to Word2Vec, GloVe creates a continuous N-dimensional representation of a word that is learned from its surrounding context words in a training corpus. Trained on a large corpus of text, these co-occurrence statistics (captured as an N-dimensional vector embedding) cause semantically similar words to appear near each other in the resulting N-dimensional embedding space (e.g. "dog" and "cat" may appear near a region of other pet-related words in the embedding space because the context words that surround both "dog" and "cat" in the training corpus are similar). All three scripts use the GloVe.6B pre-trained word embeddings created from the combined Wikipedia 2014 and Gigaword 5 datasets. They were trained on 6 billion tokens and contain 400,000 unique lowercase words. Trained embeddings are provided in 50, 100, 200, and 300 dimensions (822 MB download).
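To make the "similar words end up near each other" idea concrete, here is a minimal sketch of loading GloVe-style text vectors and ranking words by cosine similarity. It is not taken from the repository's scripts: the helper names (`load_embeddings`, `cosine_similarity`, `nearest_words`) are hypothetical, and the four 3-dimensional vectors are made up so the example runs without the 822 MB download. The pre-trained files use the same plain-text layout (a word followed by its space-separated components on each line).

```python
import io
import math

# Toy stand-in for a GloVe text file. The values are invented for
# illustration; real files such as glove.6B.50d.txt use the same layout.
TOY_GLOVE = """\
dog 0.9 0.8 0.1
cat 0.8 0.9 0.2
car 0.1 0.2 0.9
truck 0.2 0.1 0.8
"""


def load_embeddings(handle):
    """Parse 'word v1 v2 ...' lines into a {word: [floats]} dictionary."""
    embeddings = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_words(embeddings, word, k=3):
    """Return the k words most similar to `word`, most similar first."""
    query = embeddings[word]
    scored = [
        (cosine_similarity(query, vector), other)
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


embeddings = load_embeddings(io.StringIO(TOY_GLOVE))
print(nearest_words(embeddings, "dog", k=1))
print(nearest_words(embeddings, "car", k=1))
```

With the real embeddings, the `io.StringIO` object would be replaced by an opened file handle; scanning all 400,000 words per query this way is a linear pass, which is fine for experiments but slow for repeated lookups.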


  •    CSharp

.NET library for fast approximate nearest neighbours search. Exact k nearest neighbours search algorithms tend to perform poorly in high-dimensional spaces. To overcome the curse of dimensionality, approximate nearest neighbour (ANN) algorithms are used instead. This library implements one such algorithm, described in the article "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs". It provides a simple API for building nearest neighbours graphs, (de)serializing them and running k-NN search queries.
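The trade-off between exact and approximate search can be illustrated with a small sketch. This is not the library's API and not full HNSW: it assumes a single-layer proximity graph built by brute force and a best-first search with a bounded result list, which is only the core idea that the hierarchical algorithm builds on. All names (`build_graph`, `graph_search`, `ef`, `entry`) are hypothetical choices for this illustration.

```python
import heapq
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(points, query, k):
    """Brute-force baseline: rank every point by distance to the query."""
    order = sorted(range(len(points)),
                   key=lambda i: squared_distance(points[i], query))
    return order[:k]


def build_graph(points, m):
    """Link each point to its m closest points (undirected edges)."""
    neighbours = {i: set() for i in range(len(points))}
    for i, point in enumerate(points):
        for j in exact_knn(points, point, m + 1):  # m + 1: result includes i
            if j != i:
                neighbours[i].add(j)
                neighbours[j].add(i)
    return neighbours


def graph_search(points, neighbours, query, k, entry, ef):
    """Best-first graph walk that keeps the ef closest nodes seen so far."""
    d0 = squared_distance(points[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: nodes still to expand
    best = [(-d0, entry)]       # max-heap (negated): current results
    while candidates:
        dist, node = heapq.heappop(candidates)
        # Stop once the closest unexpanded node is worse than every result.
        if len(best) >= ef and dist > -best[0][0]:
            break
        for nb in neighbours[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = squared_distance(points[nb], query)
            if len(best) < ef or d < -best[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(best, (-d, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst result
    ranked = sorted((-neg_d, i) for neg_d, i in best)
    return [i for _, i in ranked[:k]]


rng = random.Random(0)
points = [[rng.random() for _ in range(16)] for _ in range(300)]
graph = build_graph(points, m=8)
query = [rng.random() for _ in range(16)]

approx = graph_search(points, graph, query, k=5, entry=0, ef=40)
exact = exact_knn(points, query, k=5)
recall = len(set(approx) & set(exact)) / 5
print(f"recall@5 = {recall:.2f}")
```

The graph walk only computes distances for nodes it reaches, so it can miss true neighbours; raising `ef` widens the search and generally improves recall at the cost of more distance computations. A real implementation would also build the graph incrementally rather than by brute force.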

