
annoy - Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

  •    C++

Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mmapped into memory so that many processes may share the same data. To install, run sudo pip install annoy to pull down the latest version from PyPI.

mlpack - A scalable C++ machine learning library

  •    C++

mlpack is an intuitive, fast, and flexible C++ machine learning library with bindings to other languages. It is meant to be a machine learning analog to LAPACK, and aims to implement a wide array of machine learning methods and functions as a "swiss army knife" for machine learning researchers. In addition to its powerful C++ interface, mlpack also provides command-line programs and Python bindings.

lopq - Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high dimensional data in Python and Spark

  •    Python

This is Python training and testing code for Locally Optimized Product Quantization (LOPQ) models, as well as Spark scripts to scale training to hundreds of millions of vectors. The resulting model can be used in Python with code provided here or deployed via a Protobuf format to, e.g., search backends for high performance approximate nearest neighbor search. Locally Optimized Product Quantization (LOPQ) [1] is a hierarchical quantization algorithm that produces codes of configurable length for data points. These codes are efficient representations of the original vector and can be used in a variety of ways depending on the application: as hashes that preserve locality, as a compressed vector from which an approximate vector in the data space can be reconstructed, and as a representation from which to compute an approximation of the Euclidean distance between points.
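The three uses of quantization codes listed above can be illustrated with a minimal sketch of plain product quantization. This is not the lopq API and it leaves out what makes LOPQ "locally optimized" (the coarse quantizer and the per-cell fitted models); the function names and the tiny hand-written codebooks are hypothetical, chosen only to keep the example readable.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vec, m):
    """Split vec into m equal-length subvectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def encode(vec, codebooks):
    """Return one centroid index per subspace (the quantization code)."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda k: sq_dist(sub, cb[k]))
            for sub, cb in zip(subs, codebooks)]


def reconstruct(code, codebooks):
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    out = []
    for k, cb in zip(code, codebooks):
        out.extend(cb[k])
    return out


def approx_dist(query, code, codebooks):
    """Approximate Euclidean distance from an exact query to a coded point."""
    subs = split(query, len(codebooks))
    return math.sqrt(sum(sq_dist(sub, cb[k])
                         for sub, k, cb in zip(subs, code, codebooks)))


# Hypothetical codebooks: 2 subspaces, 2 centroids each (normally learned with k-means).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
point = [0.9, 1.1, 0.1, 0.8]
code = encode(point, codebooks)            # compact code, usable as a locality-preserving hash
approx_point = reconstruct(code, codebooks)  # compressed vector, decoded
distance = approx_dist([1.0, 1.0, 0.0, 1.0], code, codebooks)
```

Points that fall near the same centroids receive the same code, which is why the code can double as a bucket key.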

n2 - TOROS N2 - lightweight approximate Nearest Neighbor library which runs faster even with large datasets

  •    C++

N2 is an approximate nearest neighbor library written in C++, with Python and Go bindings. According to the project, N2 provides much faster search than other implementations on large datasets, and it supports multi-core CPUs for index building. For more detail, see the installation instructions on how to build N2 from source.




spark-knn-graphs - Spark algorithms for building k-nn graphs

  •    HTML

Spark algorithms for building and processing k-nn graphs. All algorithms support custom classes as value. See an example with custom class as value.

scanns - A scalable nearest neighbor search library in Apache Spark

  •    Scala

ScANNS is a nearest neighbor search library for Apache Spark originally developed by Namit Katariya from the LinkedIn Machine Learning Algorithms team. It enables nearest neighbor search in a batch offline context within the cosine, Jaccard and Euclidean distance spaces. This library has been tested to scale to hundreds of millions to low billions of data points.
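For readers unfamiliar with the three distance spaces named above, here is a small sketch of their usual textbook definitions, with a brute-force exact search for comparison. It is not ScANNS code (ScANNS is Scala on Spark and its exact conventions may differ); the function names are hypothetical, and Jaccard distance is assumed to operate on sets.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def euclidean_distance(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, items, distance, k=1):
    """Exact k nearest neighbors by scanning every item."""
    return sorted(items, key=lambda item: distance(query, item))[:k]


items = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
closest = nearest((0.9, 1.2), items, euclidean_distance, k=2)
```

The linear scan costs one distance computation per item per query, which is the cost that approximate libraries try to avoid at scale.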

TarsosLSH - A Java library implementing practical nearest neighbour search algorithm for multidimensional vectors that operates in sublinear time

  •    Java

TarsosLSH is a Java library implementing sub-linear nearest neighbour search algorithms. It contains both an approximate and an exact search algorithm. The first, Locality-sensitive Hashing (LSH), is a randomized approximate search algorithm for a number of search spaces: a practical nearest neighbour search algorithm for multidimensional vectors that operates in sublinear time. The second, multi-index hashing, is an exact nearest neighbour search algorithm which is limited to Hamming space. The library supports several LSH families: the Euclidean hash family (L2), the city block hash family (L1) and the cosine hash family. It tries to hit the sweet spot between being capable enough to get real tasks done, and compact enough to serve as a demonstration of how LSH works.
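As a rough illustration of the cosine hash family mentioned above, the following sketch hashes vectors by the side of several random hyperplanes they fall on and looks up only the matching buckets. It is written in Python for brevity and is not the TarsosLSH API; the class name, parameters, and defaults are hypothetical.

```python
import random
from collections import defaultdict


class CosineLSH:
    """Random-hyperplane LSH: vectors at a small angle tend to share buckets."""

    def __init__(self, dim, num_bits=8, num_tables=4, seed=0):
        rng = random.Random(seed)
        # One set of num_bits random hyperplanes per hash table.
        self.planes = [[[rng.gauss(0, 1) for _ in range(dim)]
                        for _ in range(num_bits)]
                       for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    @staticmethod
    def _key(planes, vec):
        # Each bit records which side of a hyperplane the vector lies on.
        return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0
                     for plane in planes)

    def add(self, label, vec):
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(label)

    def candidates(self, vec):
        """Labels sharing a bucket with vec in at least one table."""
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found.update(table.get(self._key(planes, vec), []))
        return found


index = CosineLSH(dim=3)
index.add("a", [1.0, 2.0, 3.0])
same_direction = index.candidates([2.0, 4.0, 6.0])
opposite_direction = index.candidates([-1.0, -2.0, -3.0])
```

Candidates still need to be ranked by their true distance to the query; more bits per table make buckets more selective, while more tables reduce the chance of missing a true neighbour.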


soundfingerprinting - The project aims to study the audio signal in terms of its perceptual characteristics, resulting in an algorithm that will be able to detect (map) unknown audio snippets from a large database of known songs

  •    CSharp

soundfingerprinting is a C# framework designed for developers, enthusiasts, and researchers in the fields of audio and digital signal processing, data mining, and audio recognition. It implements an efficient algorithm that provides fast insertion and retrieval of acoustic fingerprints with high precision and recall. Acoustic fingerprints are extracted from an audio file and later used as identifiers to recognize an unknown audio query. These sub-fingerprints (or fingerprints; the two terms are used interchangeably) are stored in a configurable backend. The interfaces for fingerprinting and querying audio files are implemented as fluent interfaces.

lshensemble - LSH index for approximate set containment search

  •    Go

Presentation slides @ VLDB 2016, New Delhi. We used two datasets for evaluation. The datasets are all from public domains and can be downloaded directly from the original publisher.

pynndescent - A Python nearest neighbor descent for approximate nearest neighbors

  •    Python

PyNNDescent implements nearest neighbor descent for approximate nearest neighbor search, following Dong, Wei, Moses Charikar, and Kai Li. "Efficient k-nearest neighbor graph construction for generic similarity measures." Proceedings of the 20th International Conference on World Wide Web. ACM, 2011. This library supplements that approach with the use of random projection trees for initialisation. This can be particularly useful for the metrics that are amenable to such approaches (euclidean, minkowski, angular, cosine, etc.).
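The core idea of nearest neighbor descent is that a neighbor of a neighbor is likely to be a neighbor, so a rough k-nn graph can be improved by repeatedly checking such two-hop candidates. Below is a deliberately simplified sketch of that idea, not the pynndescent implementation: it starts from random neighbors rather than random projection trees, skips the sampling and bookkeeping of the published algorithm, and uses hypothetical names throughout.

```python
import random


def sq_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nn_descent(points, k, dist, rng, max_iters=10):
    """Return an approximate k-nn list per point (requires len(points) > k)."""
    n = len(points)
    # Random start: k arbitrary neighbors per point, kept as sorted (distance, index) pairs.
    graph = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        graph.append(sorted((dist(points[i], points[j]), j)
                            for j in rng.sample(others, k)))
    for _step in range(max_iters):
        # Treat edges as undirected so reverse neighbors are explored too.
        linked = [{j for _d, j in graph[i]} for i in range(n)]
        for i in range(n):
            for _d, j in graph[i]:
                linked[j].add(i)
        updates = 0
        for i in range(n):
            current = {j for _d, j in graph[i]}
            candidates = set()
            for j in linked[i]:
                candidates |= linked[j]  # neighbors of neighbors
            candidates -= current
            candidates.discard(i)
            for c in candidates:
                d = dist(points[i], points[c])
                if d < graph[i][-1][0]:
                    graph[i][-1] = (d, c)  # replace the current worst neighbor
                    graph[i].sort()
                    updates += 1
        if updates == 0:  # converged: no list changed in this pass
            break
    return [[j for _d, j in neighbors] for neighbors in graph]


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(60)]
knn = nn_descent(points, 5, sq_euclidean, rng)
```

Because the procedure only ever calls `dist`, it works with any similarity measure, which is the "generic similarity measures" aspect of the cited paper.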

gann - gann(go-approximate-nearest-neighbor) is a library for Approximate Nearest Neighbor Search written in Go

  •    Go

gann (go-approximate-nearest-neighbor) is a library for approximate nearest neighbor search written purely in Go. The implemented algorithm is inspired by Annoy (https://github.com/spotify/annoy).

knn-matting - Source Code for KNN Matting, CVPR 2012 / TPAMI 2013

  •    Matlab

Run "bash install.sh" to download all the required libraries and data. This takes from several minutes to tens of minutes, depending on the network connection. We have been running our code since Matlab R2011b. The latest version of the code is tested on Matlab R2015a. Please let us know if you run into problems.

bisec-tree - Bisector tree implementation in OCaml

  •    OCaml

Bisector tree implementation in OCaml. A bisector tree allows fast and exact nearest neighbor searches in any space, provided that you have a metric (function) to measure the distance between any two points in that space.

vp-tree - Vantage point tree implementation in OCaml

  •    OCaml

A vantage point tree implementation in OCaml. Cf. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.41.4193&rep=rep1&type=pdf for details.
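To make the data structure concrete, here is a minimal sketch of a vantage point tree with exact nearest neighbor search. It is written in Python rather than OCaml and does not mirror this library's interface; all names are hypothetical. The only requirement on `dist` is that it is a metric, since pruning relies on the triangle inequality.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Node:
    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance <= radius
        self.outside = outside  # subtree of points with distance > radius


def build(points, dist, rng):
    if not points:
        return None
    points = list(points)
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return Node(vantage, 0.0, None, None)
    dists = [dist(vantage, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d <= radius]
    outside = [p for p, d in zip(points, dists) if d > radius]
    return Node(vantage, radius,
                build(inside, dist, rng), build(outside, dist, rng))


def nearest(node, query, dist, best=None):
    """Return (distance, point) of the exact nearest neighbor of query."""
    if node is None:
        return best
    d = dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    if d <= node.radius:
        best = nearest(node.inside, query, dist, best)
        # Visit the other side only if the search ball crosses the boundary.
        if d + best[0] > node.radius:
            best = nearest(node.outside, query, dist, best)
    else:
        best = nearest(node.outside, query, dist, best)
        if d - best[0] < node.radius:
            best = nearest(node.inside, query, dist, best)
    return best


rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(200)]
tree = build(points, euclidean, rng)
query = (0.5, 0.5)
found_dist, found_point = nearest(tree, query, euclidean)
```

The search is exact: a subtree is skipped only when the triangle inequality guarantees that it cannot contain a closer point than the best one found so far.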

go-ann - Approximate Nearest Neighbor using the MRPT algorithm

  •    Go

Pure Go implementation of Approximate k-Nearest-Neighbor search. Wrap an ANNer with a MappedANNer to associate values with vectors.