
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mmapped into memory so that many processes may share the same data. To install, simply do sudo pip install annoy to pull down the latest version from PyPI.

https://github.com/spotify/annoy

Tags | c-plus-plus nearest-neighbor-search locality-sensitive-hashing approximate-nearest-neighbor-search |

Implementation | C++ |

License | Apache |

Platform |
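A minimal usage sketch following Annoy's documented Python API (the dimensionality, metric, and tree count below are illustrative):

```python
import random
from annoy import AnnoyIndex

f = 40  # vector dimensionality
index = AnnoyIndex(f, 'angular')
for i in range(1000):
    index.add_item(i, [random.gauss(0, 1) for _ in range(f)])

index.build(10)           # 10 trees; more trees trade build time for accuracy
index.save('vectors.ann')

# The saved index is mmapped, so many processes can share the same data.
u = AnnoyIndex(f, 'angular')
u.load('vectors.ann')
print(u.get_nns_by_item(0, 10))  # the 10 nearest neighbors of item 0
```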

This is Python training and testing code for Locally Optimized Product Quantization (LOPQ) models, as well as Spark scripts to scale training to hundreds of millions of vectors. The resulting model can be used in Python with code provided here or deployed via a Protobuf format to, e.g., search backends for high-performance approximate nearest neighbor search. Locally Optimized Product Quantization (LOPQ) [1] is a hierarchical quantization algorithm that produces codes of configurable length for data points. These codes are efficient representations of the original vector and can be used in a variety of ways depending on the application, including as hashes that preserve locality, as a compressed vector from which an approximate vector in the data space can be reconstructed, and as a representation from which to compute an approximation of the Euclidean distance between points.
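A short sketch of how these codes might be used from Python, assuming the lopq package's LOPQModel/LOPQSearcher interface as described in its README (parameter values here are illustrative):

```python
import numpy as np
from lopq import LOPQModel, LOPQSearcher  # assumes the lopq package

data = np.random.randn(10000, 128).astype(np.float32)  # toy vectors

# V coarse clusters, M subquantizers; values are illustrative.
model = LOPQModel(V=8, M=4)
model.fit(data)

code = model.predict(data[0])   # compact LOPQ code for one vector

searcher = LOPQSearcher(model)  # in-memory ANN search over the codes
searcher.add_data(data)
results, visited = searcher.search(data[0])
```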

nearest-neighbor-search product-quantization lopq clustering spark

N2 is an approximate nearest neighbor algorithm library written in C++ (with Python and Go bindings). N2 provides much faster search speeds than other implementations when modeling large datasets. N2 also supports multi-core CPUs for index building. For more detail, see the installation instructions on how to build N2 from source.
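A minimal build-and-query sketch, assuming N2's HnswIndex Python binding (names and parameters follow its README; treat them as illustrative):

```python
import random
from n2 import HnswIndex  # assumes the n2 Python binding

f = 40  # vector dimensionality
index = HnswIndex(f)  # angular metric by default
for _ in range(1000):
    index.add_data([random.gauss(0, 1) for _ in range(f)])

index.build(m=5, n_threads=4)  # multi-core index building
index.save('index.n2')

u = HnswIndex(f, "angular")
u.load('index.n2')
print(u.search_by_vector([random.gauss(0, 1) for _ in range(f)], 10))
```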

ml knn machine-learning approximate k-nearest-neighbors nearest-neighbor-search approximate-nearest-neighbor-search

Approximate Nearest Neighbor Search for Sparse Data in Python! This library is well suited to finding nearest neighbors in sparse, high-dimensional spaces (like text documents). Out of the box, PySparNN supports Cosine Distance (i.e. 1 - cosine_similarity).
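A small sketch of sparse text search with PySparNN's MultiClusterIndex, following its README (the TF-IDF featurization via scikit-learn is an assumption for this example):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
import pysparnn.cluster_index as ci  # assumes the pysparnn package

docs = ['hello world', 'oh hello there', 'play it', 'play it again sam']

tv = TfidfVectorizer()
features = tv.fit_transform(docs)  # sparse matrix, one row per document

index = ci.MultiClusterIndex(features, docs)

# Nearest document under cosine distance (1 - cosine_similarity).
print(index.search(tv.transform(['oh there']), k=1, return_distance=False))
```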

Doing fast searching of nearest neighbors in high-dimensional spaces is an increasingly important problem, but so far there have not been many empirical attempts at comparing approaches in an objective way. This project contains some tools to benchmark various implementations of approximate nearest neighbor (ANN) search for different metrics. We have pre-generated datasets (in HDF5 format) and Docker containers for each algorithm. There's a test suite that makes sure every algorithm works.

nearest-neighbors benchmark docker

nanoflann is a C++11 header-only library for building KD-trees of datasets with different topologies: R2, R3 (point clouds), SO(2) and SO(3) (2D and 3D rotation groups). No support for approximate NN is provided. nanoflann does not require compiling or installing; you just need to #include <nanoflann.hpp> in your code. This library is a fork of the flann library by Marius Muja and David G. Lowe, and was born as a child project of MRPT. Following the original license terms, nanoflann is distributed under the BSD license. For bugs, please use the issues button, or fork and open a pull request.

c-plus-plus kd-trees point-clouds cpp nanoflann

NearPy is a Python framework for fast (approximated) nearest neighbour search in high-dimensional vector spaces using different locality-sensitive hashing methods. It lets you experiment with and evaluate new methods, but is also production-ready. It comes with a Redis storage adapter.
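A minimal LSH engine sketch following NearPy's README (the hash name and hyperplane count are illustrative):

```python
import numpy as np
from nearpy import Engine
from nearpy.hashes import RandomBinaryProjections

dim = 100
rbp = RandomBinaryProjections('rbp', 10)  # 10 random hyperplanes
engine = Engine(dim, lshashes=[rbp])      # in-memory storage by default

for i in range(1000):
    engine.store_vector(np.random.randn(dim), 'data_%d' % i)

# Candidates from the matching hash bucket, ranked by distance.
neighbours = engine.neighbours(np.random.randn(dim))
```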

mlpack is an intuitive, fast, and flexible C++ machine learning library with bindings to other languages. It is meant to be a machine learning analog to LAPACK, and aims to implement a wide array of machine learning methods and functions as a "swiss army knife" for machine learning researchers. In addition to its powerful C++ interface, mlpack also provides command-line programs and Python bindings.

machine-learning-library c-plus-plus deep-learning nearest-neighbor-search regression machine-learning

Non-Metric Space Library (NMSLIB) is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The core library does not have any third-party dependencies. It has been gaining popularity recently; in particular, it has become a part of Amazon Elasticsearch Service.
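A short HNSW example via NMSLIB's Python bindings (method and space names follow its documentation; parameters are illustrative):

```python
import numpy as np
import nmslib  # assumes the nmslib Python package

data = np.random.randn(10000, 100).astype(np.float32)

index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(data)
index.createIndex({'post': 2})  # index-time parameters

ids, distances = index.knnQuery(data[0], k=10)  # 10 nearest neighbors
```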

search search-library similarity-search algorithm knn-search non-metric neighborhood-graphs k-nn-graphs proximity-graphs lsh locality-sensitive-hashing

Open Distro for Elasticsearch is an Apache 2.0-licensed distribution of Elasticsearch enhanced with Enterprise Security, Alerting, SQL, Index Management, k-Nearest Neighbor Search, Performance Analyzer, and more.

elastic-search search-engine searchengine aggregations real-time enterprise-search

MUSE is a library for multilingual word embeddings. It includes two methods: one supervised, which uses a bilingual dictionary or identical character strings, and one unsupervised, which does not use any parallel data (see Word Translation without Parallel Data for more details). MUSE is available on CPU or GPU, in Python 2 or 3. Faiss is optional for GPU users - though Faiss-GPU will greatly speed up nearest neighbor search - and highly recommended for CPU users. Faiss can be installed using "conda install faiss-cpu -c pytorch" or "conda install faiss-gpu -c pytorch".
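Faiss itself is easy to try independently of MUSE; a minimal exact-search sketch, with synthetic vectors standing in for embeddings:

```python
import numpy as np
import faiss  # conda install faiss-cpu -c pytorch

d = 300                                           # embedding dimensionality
xb = np.random.randn(10000, d).astype('float32')  # database vectors
xq = np.random.randn(5, d).astype('float32')      # query vectors

index = faiss.IndexFlatL2(d)  # exact L2 search; GPU variants also exist
index.add(xb)
D, I = index.search(xq, 10)   # distances and ids of the 10 NNs per query
```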

Jubatus is a distributed processing framework and streaming machine learning library. Jubatus includes the following functionality: an online machine learning library (classification, regression, recommendation via nearest neighbor search, graph mining, anomaly detection, clustering), a feature vector converter (fv_converter) for data preprocessing and feature extraction, and a framework for distributed online machine learning with fault tolerance.

machine-learning machine-learning-framework distributed

When it comes to building a classification algorithm, analysts have a broad range of open source options to choose from. For time series classification, however, there are fewer out-of-the-box solutions. I began researching the domain of time series classification and was intrigued by a recommended technique called K Nearest Neighbors and Dynamic Time Warping. A meta-analysis completed by Mitsa (2010) suggests that for time series classification, 1 Nearest Neighbor (K=1) with Dynamic Time Warping is very difficult to beat [1].
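A from-scratch sketch of the 1-NN + DTW technique described above (naive O(nm) DTW; practical implementations add warping-window constraints and lower bounds):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic programming DTW between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible alignments.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def one_nn_dtw(train_X, train_y, query):
    """Label a query series with the class of its DTW-nearest training series."""
    dists = [dtw_distance(query, x) for x in train_X]
    return train_y[int(np.argmin(dists))]
```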

machine-learning timeseries classification-algorithm human-activity-recognition nearest-neighbors dynamic-programming dynamic-time-warping

A library for fast computation of Gauss transforms in multiple dimensions, using the Improved Fast Gauss Transform and Approximate Nearest Neighbor searching. This library is useful for efficient Kernel Density Estimation (KDE) using a Gaussian kernel.
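For reference, a numpy sketch of the direct (naive) Gauss transform that IFGT-based libraries approximate; it is quadratic in the number of points, which is exactly the cost such libraries avoid:

```python
import numpy as np

def direct_gauss_transform(sources, targets, h):
    """Naive O(N*M) Gaussian kernel sums: sum_j exp(-||t_i - s_j||^2 / h^2).
    IFGT-based libraries compute an epsilon-accurate approximation of these
    sums in roughly linear time instead."""
    diffs = targets[:, None, :] - sources[None, :, :]  # (M, N, d)
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / h ** 2).sum(axis=1)

# A Gaussian KDE at the targets is a normalized version of these sums.
```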

Smile (Statistical Machine Intelligence and Learning Engine) is a fast and comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system in Java and Scala. With advanced data structures and algorithms, Smile delivers state-of-the-art performance. Smile covers every aspect of machine learning, including classification, regression, clustering, association rule mining, feature selection, manifold learning, multidimensional scaling, genetic algorithms, missing value imputation, efficient nearest neighbor search, etc.

machine-learning nlp linear-algebra natural-language-processing

This repository contains implementations of basic machine learning algorithms in plain Python (Python Version 3.6+). All algorithms are implemented from scratch without using additional machine learning libraries. The intention of these notebooks is to provide a basic understanding of the algorithms and their underlying structure, not to provide the most efficient implementations. After several requests I started preparing notebooks on how to preprocess datasets for machine learning. Within the next months I will add one notebook for each kind of dataset (text, images, ...). As before, the intention of these notebooks is to provide a basic understanding of the preprocessing steps, not to provide the most efficient implementations.

machine-learning logistic-regression ipynb machine-learning-algorithms linear-regression perceptron python-implementations kmeans algorithm python3 neural-network k-nearest-neighbours k-nearest-neighbor k-nn neural-networks

HLearn is a high-performance machine learning library written in Haskell. For example, it currently has the fastest nearest neighbor implementation for arbitrary metric spaces (see this blog post). HLearn is also a research project. The research goal is to discover the "best possible" interface for machine learning. This involves two competing demands: the library should be as fast as low-level libraries written in C/C++/Fortran/Assembly, but as flexible as libraries written in high-level languages like Python/R/Matlab. Julia is making amazing progress in this direction, but HLearn is more ambitious. In particular, HLearn's goal is to be faster than the low-level languages and more flexible than the high-level languages.

The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source, C++ machine learning library designed for real-time gesture recognition. Classification: Adaboost, Decision Tree, Dynamic Time Warping, Gaussian Mixture Models, Hidden Markov Models, k-nearest neighbor, Naive Bayes, Random Forests, Support Vector Machine, Softmax, and more...

gesture-recognition grt machine-learning gesture-recognition-toolkit support-vector-machine random-forest kmeans dynamic-time-warping softmax linear-regression

Propagation Engine is a C# .NET propagation framework designed to simulate various types of entities in a linked nearest-neighbor node-edge network environment using a specified set (or series) of rules and entities. Propagation Engine is CLS-compliant.

rules-engine

NPatternRecognizer is a fast machine learning algorithm library written in C#. It contains support vector machine, neural networks, bayes, boost, k-nearest neighbor, decision tree, ..., etc.

LargeVis: this is the official implementation of the LargeVis model by the original authors (Tang, Liu, Zhang and Mei), which is used to visualize large-scale and high-dimensional data. It now supports visualizing both high-dimensional feature vectors and networks. The package also contains a very efficient algorithm for constructing a K-nearest neighbor graph (K-NNG). Contact person: Jian Tang, tangjianpku@gmail.com. This work was done while the author was at Microsoft Research Asia.
