Non-Metric Space Library (NMSLIB) - An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.

  •        927

Non-Metric Space Library (NMSLIB) is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The core-library does not have any third-party dependencies. It has been gaining popularity recently. In particular, it has become a part of Amazon Elasticsearch Service.

The goal of the project is to create an effective and comprehensive toolkit for searching in generic and non-metric spaces. Even though the library contains a variety of metric-space access methods, our main focus is on generic and approximate search methods, in particular, on methods for non-metric spaces. NMSLIB is possibly the first library with a principled support for non-metric space searching.

https://github.com/nmslib/nmslib

Tags
Implementation
License
Platform

   




Related Projects

Qdrant - Neural Search Engine, Vector Similarity Search Engine with extended filtering support

  •    Rust

Qdrant ( quadrant ) is a vector similarity search engine. It provides a production-ready service with a convenient API to store, search, and manage points - vectors with an additional payload. Qdrant is tailored to extended filtering support. It makes it useful for all sorts of neural-network or semantic-based matching, faceted search, and other applications. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more.

SwiftGraph - A Graph Data Structure in Pure Swift

  •    Swift

SwiftGraph is a pure Swift (no Cocoa) implementation of a graph data structure, appropriate for use on all platforms Swift supports (iOS, macOS, Linux, etc.). It includes support for weighted, unweighted, directed, and undirected graphs. It uses generics to abstract away both the type of the vertices, and the type of the weights. It includes copious in-source documentation, unit tests, as well as search functions for doing things like breadth-first search, depth-first search, and Dijkstra's algorithm. Further, it includes utility functions for topological sort, Jarnik's algorithm to find a minimum-spanning tree, detecting a DAG (directed-acyclic-graph), and enumerating all cycles.

datasketch - MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++

  •    Python

datasketch gives you probabilistic data structures that can process and search very large amount of data super fast, with little loss of accuracy. datasketch must be used with Python 2.7 or above and NumPy 1.11 or above. Scipy is optional, but with it the LSH initialization can be much faster.

kmcuda - Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

  •    Jupyter

K-means implementation is based on "Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup". While it introduces some overhead and many conditional clauses which are bad for CUDA, it still shows 1.6-2x speedup against the Lloyd algorithm. K-nearest neighbors employ the same triangle inequality idea and require precalculated centroids and cluster assignments, similar to the flattened ball tree. Technically, this project is a shared library which exports two functions defined in kmcuda.h: kmeans_cuda and knn_cuda. It has built-in Python3 and R native extension support, so you can from libKMCUDA import kmeans_cuda or dyn.load("libKMCUDA.so").

annoy - Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

  •    C++

Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mmapped into memory so that many processes may share the same data.To install, simply do sudo pip install annoy to pull down the latest version from PyPI.


SetSimilaritySearch - All-pair set similarity search on millions of sets in Python and on a laptop (faster than MinHash LSH)

  •    Python

Efficient set similarity search algorithms in Python. For even better performance see the Go Implementation. A popular way to measure the similarity between two sets is Jaccard similarity, which gives a fractional score between 0 and 1.0.

NearPy - Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive hashes

  •    Python

NearPy is a Python framework for fast (approximated) nearest neighbour search in high dimensional vector spaces using different locality-sensitive hashing methods. It allows to experiment and to evaluate new methods but is also production-ready. It comes with a redis storage adapter.

n2 - TOROS N2 - lightweight approximate Nearest Neighbor library which runs faster even with large datasets

  •    C++

For more detail, see the installation for instruction on how to build N2 from source. N2 is an approximate nearest neighborhoods algorithm library written in C++ (including Python/Go bindings). N2 provides a much faster search speed than other implementations when modeling large dataset. Also, N2 supports multi-core CPUs for index building.

similarity - TensorFlow Similarity is a python package focused on making similarity learning quick and easy

  •    Python

TensorFlow Similarity is a TensorFlow library for similarity learning also known as metric learning and contrastive learning. TensorFlow Similarity is still in beta.

GraphDash - A web-based dashboard built on graphs and their metadata.

  •    Python

You may then start the graph dashboard. You will get a nice web interface displaying your graphs, and a search box with autocompletion. You can easily navigate and share your graphs. For user-space installation, make sure your $PATH includes ~/.local/bin.

Milvus - An open-source vector database for embedding similarity search and AI applications

  •    Go

Milvus is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment. Milvus 2.0 is a cloud-native vector database with storage and computation separated by design. All components in this refactored version of Milvus are stateless to enhance elasticity and flexibility.

translate - Translate - a PyTorch Language Library

  •    Python

Translate is a library for machine translation written in PyTorch. It provides training for sequence-to-sequence models. Translate relies on fairseq, a general sequence-to-sequence library, which means that models implemented in both Translate and Fairseq can be trained. Translate also provides the ability to export some models to Caffe2 graphs via ONNX and to load and run these models from C++ for production purposes. Currently, we export components (encoder, decoder) to Caffe2 separately and beam search is implemented in C++. In the near future, we will be able to export the beam search as well. We also plan to add export support to more models. Provided you have CUDA installed you should be good to go.

Xapian - Search Engine Library

  •    C++

Xapian is an Open Source Search Engine Library. It is written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C# and Ruby. Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.

Faiss - A library for efficient similarity search and clustering of dense vectors.

  •    C++

Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed by Facebook AI Research.

bootcamp - Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc

  •    Python

Embed everything, thanks to AI, we can use neural networks to extract feature vectors from unstructured data, such as image, audio and vide etc. Then analyse the unstructured data by calculating the feature vectors, for example calculating the Euclidean or Cosine distance of the vectors to get the similarity. Milvus Bootcamp is designed to expose users to both the simplicity and depth of the Milvus vector database. Discover how to run benchmark tests as well as build similarity search applications like chatbots, recommender systems, reverse image search, molecular search, video search, audio search, and more.

Java ADT and Algorithm Library

  •    Java

This project provides a library of standard data types (lists, trees, graphs, semaphores, locks, points, vectors, matrices, shapes, etc.) and standard algorithms (sorting, depth first search, shortest path problem etc.)

Sphinix - Search server

  •    C++

Sphinix is free open-source SQL full-text search engine. How do you implement full-text search for that 10+ million row table, keep up with the load, and stay relevant? Sphinx is good at those kinds of riddles.

Lucene - A high-performance, full-featured text search engine library

  •    Java

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

LSHKIT

  •    C++

This library implements several locality sensitive hashing(LSH) based algorithms, including indexing data structure for high dimensional spaces and metric spaces, sketch constructions and set embedding algorithms.

OpenSearch - Open source distributed and RESTful search engine

  •    Java

OpenSearch is a community-driven, open source search and analytics suite derived from Apache 2.0 licensed Elasticsearch 7.10.2 & Kibana 7.10.2. It consists of a search engine daemon, OpenSearch, and a visualization and user interface, OpenSearch Dashboards. OpenSearch enables people to easily ingest, secure, search, aggregate, view, and analyze data. These capabilities are popular for use cases such as application search, log analytics, and more.






We have large collection of open source products. Follow the tags from Tag Cloud >>


Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.