Non-Metric Space Library (NMSLIB) is an efficient cross-platform similarity search library and a toolkit for evaluating similarity search methods. The core library does not have any third-party dependencies. It has been gaining popularity recently. In particular, it has become a part of Amazon Elasticsearch Service.
The goal of the project is to create an effective and comprehensive toolkit for searching in generic and non-metric spaces. Even though the library contains a variety of metric-space access methods, our main focus is on generic and approximate search methods, in particular on methods for non-metric spaces. NMSLIB is possibly the first library with principled support for non-metric space searching.
|Tags||search search-library similarity-search algorithm knn-search non-metric neighborhood-graphs k-nn-graphs proximity-graphs lsh locality-sensitive-hashing|
Qdrant (read: quadrant) is a vector similarity search engine. It provides a production-ready service with a convenient API to store, search, and manage points, that is, vectors with an additional payload. Qdrant is tailored to extended filtering support. This makes it useful for all sorts of neural-network or semantic-based matching, faceted search, and other applications. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more.
|Tags||search-engine elasticsearch neural-network matching filter saas nearest-neighbor-search image-search recommender-system vectors approximate-nearest-neighbor-search knn-algorithm hnsw vector-search vector-search-engine embeddings-similarity semantic-search|
SwiftGraph is a pure Swift (no Cocoa) implementation of a graph data structure, appropriate for use on all platforms Swift supports (iOS, macOS, Linux, etc.). It includes support for weighted, unweighted, directed, and undirected graphs. It uses generics to abstract away both the type of the vertices and the type of the weights. It includes copious in-source documentation, unit tests, as well as search functions for doing things like breadth-first search, depth-first search, and Dijkstra's algorithm. Further, it includes utility functions for topological sort, Jarnik's algorithm to find a minimum-spanning tree, detecting a DAG (directed-acyclic-graph), and enumerating all cycles.
|Tags||graph data-structure graph-algorithms dijkstra-algorithm topological-sort breadth-first-search depth-first-search prims-algorithm|
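As an illustration of the kind of search function mentioned above, here is a minimal sketch of Dijkstra's algorithm. It is written in Python rather than Swift and does not use or mirror SwiftGraph's API; the adjacency-list layout and the function name `dijkstra` are assumptions made for this example.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source in a graph with non-negative weights.

    graph is a hypothetical adjacency list: {vertex: [(neighbor, weight), ...]}.
    """
    distances = {source: 0}
    heap = [(0, source)]  # (distance so far, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > distances.get(u, float("inf")):
            continue  # stale heap entry: a shorter path to u was already found
        for v, w in graph.get(u, []):
            candidate = d + w
            if candidate < distances.get(v, float("inf")):
                distances[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return distances

example_graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(dijkstra(example_graph, "a"))
```

Vertices that cannot be reached from the source are simply absent from the returned dictionary.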
datasketch gives you probabilistic data structures that can process and search very large amounts of data very quickly, with little loss of accuracy. datasketch must be used with Python 2.7 or above and NumPy 1.11 or above. SciPy is optional, but with it the LSH initialization can be much faster.
|Tags||bbit-minhash lsh-forest jaccard-similarity hyperloglog lsh minhash weighted-quantiles top-k search data-sketches data-summary|
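To give a feel for how such a probabilistic structure trades a little accuracy for speed, the following is a rough standard-library sketch of the MinHash idea. It is not datasketch's implementation or API; the function names, the `num_hashes` parameter, and the use of salted BLAKE2b as a family of hash functions are all assumptions made for illustration.

```python
import hashlib

def minhash_signature(items, num_hashes=128):
    """Return one minimum hash value per salted hash function.

    items must be a non-empty collection of strings. Each salt stands in
    for one random permutation of the universe of items.
    """
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature

def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where two signatures agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)

set_a = {"w%d" % i for i in range(0, 30)}
set_b = {"w%d" % i for i in range(10, 40)}  # true Jaccard similarity is 20/40
print(estimate_jaccard(minhash_signature(set_a), minhash_signature(set_b)))
```

The signatures have a fixed length regardless of the size of the sets, which is what makes comparing very large sets cheap. The estimate gets tighter as `num_hashes` grows.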
K-means implementation is based on "Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup". While it introduces some overhead and many conditional clauses which are bad for CUDA, it still shows 1.6-2x speedup against the Lloyd algorithm. K-nearest neighbors employ the same triangle inequality idea and require precalculated centroids and cluster assignments, similar to the flattened ball tree. Technically, this project is a shared library which exports two functions defined in kmcuda.h: kmeans_cuda and knn_cuda. It has built-in Python3 and R native extension support, so you can from libKMCUDA import kmeans_cuda or dyn.load("libKMCUDA.so").
|Tags||cuda kmeans yinyang knn-search machine-learning afk-mc2|
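The triangle inequality idea behind the nearest-neighbor search can be sketched on the CPU as follows. This is not the project's CUDA kernel and does not call kmeans_cuda or knn_cuda; the function `nearest_neighbor` and its arguments are hypothetical. The assumption is that each point already has a cluster assignment and each cluster a centroid, so a whole cluster can be skipped when even its closest possible member cannot beat the best distance found so far.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(query, points, centroids, assignments):
    """Return (index, distance, clusters_skipped) for the nearest point."""
    # Group point indices by their precalculated cluster assignment.
    clusters = {}
    for idx, cluster_id in enumerate(assignments):
        clusters.setdefault(cluster_id, []).append(idx)
    # Radius of a cluster: largest distance from its centroid to a member.
    radii = {
        cid: max(dist(points[i], centroids[cid]) for i in members)
        for cid, members in clusters.items()
    }
    # Visit clusters starting with the one whose centroid is closest.
    order = sorted(clusters, key=lambda cid: dist(query, centroids[cid]))
    best_idx, best_dist, skipped = None, math.inf, 0
    for cid in order:
        # Triangle inequality: every member is at least this far from the query.
        lower_bound = dist(query, centroids[cid]) - radii[cid]
        if lower_bound >= best_dist:
            skipped += 1
            continue
        for i in clusters[cid]:
            d = dist(query, points[i])
            if d < best_dist:
                best_idx, best_dist = i, d
    return best_idx, best_dist, skipped

points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centroids = [(1 / 3, 1 / 3), (31 / 3, 31 / 3)]
assignments = [0, 0, 0, 1, 1, 1]
print(nearest_neighbor((0.9, 0.1), points, centroids, assignments))
```

In this toy data set the far cluster is never scanned point by point, which is where the savings come from on larger inputs.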
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mmapped into memory so that many processes may share the same data. To install, simply do sudo pip install annoy to pull down the latest version from PyPI.
|Tags||c-plus-plus nearest-neighbor-search locality-sensitive-hashing approximate-nearest-neighbor-search|
Efficient set similarity search algorithms in Python. For even better performance see the Go Implementation. A popular way to measure the similarity between two sets is Jaccard similarity, which gives a fractional score between 0 and 1.0.
|Tags||similarity-search set-similarity-search all-pairs|
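For reference, Jaccard similarity is the size of the intersection divided by the size of the union. The short sketch below computes it directly with Python sets; the function name `jaccard` is chosen for this example and is not part of the library's API, and treating two empty sets as identical is a convention assumed here.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two iterables treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)

# Two of the four distinct elements are shared.
print(jaccard({1, 2, 3}, {2, 3, 4}))
```

Computing this for every pair of sets is quadratic in the number of sets, which is the cost that dedicated set similarity search algorithms aim to avoid.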
NearPy is a Python framework for fast (approximate) nearest neighbour search in high-dimensional vector spaces using different locality-sensitive hashing methods. It lets you experiment with and evaluate new methods, but is also production-ready. It comes with a Redis storage adapter.
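One common locality-sensitive hashing scheme uses random hyperplanes: each hyperplane contributes one bit depending on which side of it a vector falls, and vectors pointing in similar directions tend to share a bucket key. The sketch below illustrates that idea with the standard library only. It is not NearPy's API; `make_hyperplanes` and `lsh_key` are hypothetical names.

```python
import random

def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_key(vector, hyperplanes):
    """Bucket key: one bit per hyperplane, set by the sign of the dot product."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)

planes = make_hyperplanes(dim=3, num_bits=8)
buckets = {}
for name, vec in {"a": [1.0, 0.2, 0.0], "b": [2.0, 0.4, 0.0], "c": [-1.0, -0.2, 0.0]}.items():
    buckets.setdefault(lsh_key(vec, planes), []).append(name)
print(buckets)
```

A query is hashed the same way and compared only against the vectors in its own bucket, which is what makes the search approximate but fast. Vectors "a" and "b" point in the same direction and therefore always land in the same bucket.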
N2 is an approximate nearest neighbor algorithm library written in C++ (including Python/Go bindings). N2 provides a much faster search speed than other implementations when modeling large datasets. N2 also supports multi-core CPUs for index building. For more detail, see the installation instructions on how to build N2 from source.
|Tags||ml knn machine-learning approximate k-nearest-neighbors nearest-neighbor-search approximate-nearest-neighbor-search|
TensorFlow Similarity is a TensorFlow library for similarity learning, also known as metric learning and contrastive learning. TensorFlow Similarity is still in beta.
|Tags||deep-learning tensorflow nearest-neighbor-search metric-learning nearest-neighbors similarity-search similarity-learning contrastive-learning|
You may then start the graph dashboard. You will get a nice web interface displaying your graphs, and a search box with autocompletion. You can easily navigate and share your graphs. For user-space installation, make sure your $PATH includes ~/.local/bin.
|Tags||graphs dashboard web|
Milvus is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment. Milvus 2.0 is a cloud-native vector database with storage and computation separated by design. All components in this refactored version of Milvus are stateless to enhance elasticity and flexibility.
|Tags||database ai vector nearest-neighbor-search cloud-native image-search approximate-nearest-neighbor-search hacktoberfest embedding similarity-search video-search faiss anns hnsw vector-search milvus vector-database embeddings-similarity artificial-intelligence|
Translate is a library for machine translation written in PyTorch. It provides training for sequence-to-sequence models. Translate relies on fairseq, a general sequence-to-sequence library, which means that models implemented in both Translate and fairseq can be trained. Translate also provides the ability to export some models to Caffe2 graphs via ONNX and to load and run these models from C++ for production purposes. Currently, we export components (encoder, decoder) to Caffe2 separately and beam search is implemented in C++. In the near future, we will be able to export the beam search as well. We also plan to add export support to more models. Provided you have CUDA installed you should be good to go.
|Tags||artificial-intelligence machine-learning onnx pytorch|
Xapian is an Open Source Search Engine Library. It is written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C# and Ruby. Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.
|Tags||searchengine search-engine full-text-search lucene-alternative|
Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed by Facebook AI Research.
|Tags||clustering similarity-search artificial-intelligence gpu|
Embed everything: thanks to AI, we can use neural networks to extract feature vectors from unstructured data such as images, audio, and video. The unstructured data can then be analysed by computing on those feature vectors, for example by calculating the Euclidean or cosine distance between vectors to measure their similarity. Milvus Bootcamp is designed to expose users to both the simplicity and depth of the Milvus vector database. Discover how to run benchmark tests as well as build similarity search applications like chatbots, recommender systems, reverse image search, molecular search, video search, audio search, and more.
|Tags||nlp deep-learning question-answering image-classification image-recognition image-search hacktoberfest unstructured-data audio-search milvus benchmark-testing|
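The two measures named above can be written out in a few lines. This is a plain-Python sketch for small vectors, not code from Milvus or the Bootcamp; the function names are chosen for this example, and the cosine function assumes neither vector is all zeros.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

query = [0.0, 0.0]
candidate = [3.0, 4.0]
print(euclidean_distance(query, candidate))
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

Euclidean distance is sensitive to vector length while cosine similarity only looks at direction, so the better choice depends on whether the embedding model produces normalized vectors.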
This project provides a library of standard data types (lists, trees, graphs, semaphores, locks, points, vectors, matrices, shapes, etc.) and standard algorithms (sorting, depth-first search, shortest path, etc.).
Sphinx is a free open-source SQL full-text search engine. How do you implement full-text search for that 10+ million row table, keep up with the load, and stay relevant? Sphinx is good at those kinds of riddles.
|Tags||searchengine search-engine full-text-search standalone|
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
|Tags||search-engine searchengine full-text-search|
This library implements several locality-sensitive hashing (LSH) based algorithms, including indexing data structures for high-dimensional spaces and metric spaces, sketch constructions, and set embedding algorithms.
OpenSearch is a community-driven, open source search and analytics suite derived from Apache 2.0 licensed Elasticsearch 7.10.2 & Kibana 7.10.2. It consists of a search engine daemon, OpenSearch, and a visualization and user interface, OpenSearch Dashboards. OpenSearch enables people to easily ingest, secure, search, aggregate, view, and analyze data. These capabilities are popular for use cases such as application search, log analytics, and more.
|Tags||search-engine searchengine full-text-search realtime-analytics analytics log-aggregation aggregation clickstream-analytics|