kdtree-rs - K-dimensional tree in Rust for fast geospatial indexing and lookup

  •        127

Thanks Eh2406 for various fixes and perf improvements. at your option.




Related Projects

n2 - TOROS N2 - lightweight approximate Nearest Neighbor library which runs faster even with large datasets

  •    C++

For more detail, see the installation for instruction on how to build N2 from source. N2 is an approximate nearest neighborhoods algorithm library written in C++ (including Python/Go bindings). N2 provides a much faster search speed than other implementations when modeling large dataset. Also, N2 supports multi-core CPUs for index building.

lopq - Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high dimensional data in Python and Spark

  •    Python

This is Python training and testing code for Locally Optimized Product Quantization (LOPQ) models, as well as Spark scripts to scale training to hundreds of millions of vectors. The resulting model can be used in Python with code provided here or deployed via a Protobuf format to, e.g., search backends for high performance approximate nearest neighbor search.Locally Optimized Product Quantization (LOPQ) [1] is a hierarchical quantization algorithm that produces codes of configurable length for data points. These codes are efficient representations of the original vector and can be used in a variety of ways depending on the application, including as hashes that preserve locality, as a compressed vector from which an approximate vector in the data space can be reconstructed, and as a representation from which to compute an approximation of the Euclidean distance between points.

Open Distro for Elasticsearch - Elasticsearch enhanced with enterprise security, alerting, SQL, and more

  •    Java

Open Distro for Elasticsearch is an Apache 2.0-licensed distribution of Elasticsearch enhanced with Enterprise Security, Alerting, SQL, Index Management, k-Nearest Neighbor Search, Performance Analyzer and more.

pysparnn - Approximate Nearest Neighbor Search for Sparse Data in Python!

  •    Python

Approximate Nearest Neighbor Search for Sparse Data in Python! This library is well suited to finding nearest neighbors in sparse, high dimensional spaces (like text documents). Out of the box, PySparNN supports Cosine Distance (i.e. 1 - cosine_similarity).

ann-benchmarks - Benchmarks of approximate nearest neighbor libraries in Python

  •    Python

Doing fast searching of nearest neighbors in high dimensional spaces is an increasingly important problem, but so far there has not been a lot of empirical attempts at comparing approaches in an objective way. This project contains some tools to benchmark various implementations of approximate nearest neighbor (ANN) search for different metrics. We have pregenerated datasets (in HDF5) formats and we also have Docker containers for each algorithm. There's a test suite that makes sure every algorithm works.

rbush - RBush — a high-performance JavaScript R-tree-based 2D spatial index for points and rectangles

  •    Javascript

RBush is a high-performance JavaScript library for 2D spatial indexing of points and rectangles. It's based on an optimized R-tree data structure with bulk insertion support. Spatial index is a special data structure for points and rectangles that allows you to perform queries like "all items within this bounding box" very efficiently (e.g. hundreds of times faster than looping over all items). It's most commonly used in maps and data visualizations.

Milvus - An open-source vector database for embedding similarity search and AI applications

  •    Go

Milvus is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment. Milvus 2.0 is a cloud-native vector database with storage and computation separated by design. All components in this refactored version of Milvus are stateless to enhance elasticity and flexibility.

flatbush - A very fast static spatial index for 2D points and rectangles in JavaScript

  •    Javascript

A really fast static spatial index for 2D points and rectangles in JavaScript. An efficient implementation of the packed Hilbert R-tree algorithm. Enables fast spatial queries on a very large number of objects (e.g. millions), which is very useful in maps, data visualizations and computational geometry algorithms.

rtreego - an R-Tree library for Go

  •    Go

A library for efficiently storing and querying spatial data in the Go programming language. The R-tree is a popular data structure for efficiently storing and querying spatial objects; one common use is implementing geospatial indexes in database management systems. Both bounding-box queries and k-nearest-neighbor queries are supported.

annoy - Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

  •    C++

Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mmapped into memory so that many processes may share the same data.To install, simply do sudo pip install annoy to pull down the latest version from PyPI.

Tile38 - Geospatial database, spatial index, and realtime geofence

  •    Go

Tile38 is a in-memory geolocation data store, spatial index, and realtime geofence. It supports a variety of object types including lat/lon points, bounding boxes, XYZ tiles, Geohashes, and GeoJSON. It supports spatial index with search methods such as Nearby, Within, and Intersects, Realtime geofencing through persistent sockets or webhooks and lot more.

Spatial Solr Plugin for Lucene and Solr

  •    Java

With the continuous efforts of adjusting search results to focused target audieces, there's an increasing demand for incorporating geographical location information into the standard search functionality. Spatial Solr Plugin (SSP) is a free, standalone plug-in which enables Geo / Location Based Search, and is built on top of the open source projects Apache Solr and Apache Lucene.


  •    C++

#LargeVis This is the official implementation of the LargeVis model by the original authors, which is used to visualize large-scale and high-dimensional data (Tang, Liu, Zhang and Mei). It now supports visualizing both high-dimensional feature vectors and networks. The package also contains a very efficient algorithm for constructing K-nearest neighbor graph (K-NNG). Contact person: Jian Tang, tangjianpku@gmail.com. This work is done when the author is in Microsoft Research Asia.

kmcuda - Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

  •    Jupyter

K-means implementation is based on "Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup". While it introduces some overhead and many conditional clauses which are bad for CUDA, it still shows 1.6-2x speedup against the Lloyd algorithm. K-nearest neighbors employ the same triangle inequality idea and require precalculated centroids and cluster assignments, similar to the flattened ball tree. Technically, this project is a shared library which exports two functions defined in kmcuda.h: kmeans_cuda and knn_cuda. It has built-in Python3 and R native extension support, so you can from libKMCUDA import kmeans_cuda or dyn.load("libKMCUDA.so").

rumale - Rumale is a machine learning library in Ruby

  •    Ruby

Rumale (Ruby machine learning) is a machine learning library in Ruby. Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python. Rumale supports Linear / Kernel Support Vector Machine, Logistic Regression, Linear Regression, Ridge, Lasso, Kernel Ridge, Factorization Machine, Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor classifier, K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering, Mutidimensional Scaling, t-SNE, Principal Component Analysis, Kernel PCA and Non-negative Matrix Factorization. This project was formerly known as "SVMKit". If you are using SVMKit, please install Rumale and replace SVMKit constants with Rumale.

NearPy - Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive hashes

  •    Python

NearPy is a Python framework for fast (approximated) nearest neighbour search in high dimensional vector spaces using different locality-sensitive hashing methods. It allows to experiment and to evaluate new methods but is also production-ready. It comes with a redis storage adapter.

K-Nearest-Neighbors-with-Dynamic-Time-Warping - Python implementation of KNN and DTW classification algorithm

  •    Jupyter

When it comes to building a classification algorithm, analysts have a broad range of open source options to choose from. However, for time series classification, there are less out-of-the box solutions. I began researching the domain of time series classification and was intrigued by a recommended technique called K Nearest Neighbors and Dynamic Time Warping. A meta analysis completed by Mitsa (2010) suggests that when it comes to timeseries classification, 1 Nearest Neighbor (K=1) and Dynamic Timewarping is very difficult to beat [1].

similarity - TensorFlow Similarity is a python package focused on making similarity learning quick and easy

  •    Python

TensorFlow Similarity is a TensorFlow library for similarity learning also known as metric learning and contrastive learning. TensorFlow Similarity is still in beta.

OpenSearch - Open source distributed and RESTful search engine

  •    Java

OpenSearch is a community-driven, open source search and analytics suite derived from Apache 2.0 licensed Elasticsearch 7.10.2 & Kibana 7.10.2. It consists of a search engine daemon, OpenSearch, and a visualization and user interface, OpenSearch Dashboards. OpenSearch enables people to easily ingest, secure, search, aggregate, view, and analyze data. These capabilities are popular for use cases such as application search, log analytics, and more.

tntsearch - A fully featured full text search engine written in PHP

  •    PHP

We created also some demo pages that show tolerant retrieval with n-grams in action. The package has bunch of helper functions like jaro-winkler and cosine similarity for distance calculations. It supports stemming for English, Croatian, Arabic, Italian, Russian, Portuguese and Ukrainian. If the built in stemmers aren't enough, the engine lets you easily plugin any compatible snowball stemmer. Some forks of the package even support Chinese. Unlike many other engines, the index can be easily updated without doing a reindex or using deltas.

We have large collection of open source products. Follow the tags from Tag Cloud >>

Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.