kdtree - Absolute balanced kdtree for fast kNN search.

  •        20

This is a (nearly absolute) balanced kdtree for fast kNN search with bad performance for dynamic addition and removal. In fact we adopt quick sort to rebuild the whole tree after changes of the nodes. We cache the added or the deleted nodes which will not be actually mapped into the tree until the rebuild method to be invoked. The good thing is we can always keep the tree balanced, and the bad thing is we have to wait some time for the finish of tree rebuild. Moreover duplicated samples are allowed to be added with the tree still kept balanced. The thought of the implementation is posted here.




Related Projects

kmcuda - Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

  •    Jupyter

K-means implementation is based on "Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup". While it introduces some overhead and many conditional clauses which are bad for CUDA, it still shows 1.6-2x speedup against the Lloyd algorithm. K-nearest neighbors employ the same triangle inequality idea and require precalculated centroids and cluster assignments, similar to the flattened ball tree. Technically, this project is a shared library which exports two functions defined in kmcuda.h: kmeans_cuda and knn_cuda. It has built-in Python3 and R native extension support, so you can from libKMCUDA import kmeans_cuda or dyn.load("libKMCUDA.so").

machine_learning_basics - Plain python implementations of basic machine learning algorithms

  •    Jupyter

This repository contains implementations of basic machine learning algorithms in plain Python (Python Version 3.6+). All algorithms are implemented from scratch without using additional machine learning libraries. The intention of these notebooks is to provide a basic understanding of the algorithms and their underlying structure, not to provide the most efficient implementations. After several requests I started preparing notebooks on how to preprocess datasets for machine learning. Within the next months I will add one notebook for each kind of dataset (text, images, ...). As before, the intention of these notebooks is to provide a basic understanding of the preprocessing steps, not to provide the most efficient implementations.

K-Nearest-Neighbors-with-Dynamic-Time-Warping - Python implementation of KNN and DTW classification algorithm

  •    Jupyter

When it comes to building a classification algorithm, analysts have a broad range of open source options to choose from. However, for time series classification, there are less out-of-the box solutions. I began researching the domain of time series classification and was intrigued by a recommended technique called K Nearest Neighbors and Dynamic Time Warping. A meta analysis completed by Mitsa (2010) suggests that when it comes to timeseries classification, 1 Nearest Neighbor (K=1) and Dynamic Timewarping is very difficult to beat [1].

kd-tree-javascript - JavaScript k-d Tree Implementation

  •    Javascript

A basic but super fast JavaScript implementation of the k-dimensional tree data structure. As of version 1.01, the library is defined as an UMD module (based on https://github.com/umdjs/umd/blob/master/commonjsStrict.js).

n2 - TOROS N2 - lightweight approximate Nearest Neighbor library which runs faster even with large datasets

  •    C++

For more detail, see the installation for instruction on how to build N2 from source. N2 is an approximate nearest neighborhoods algorithm library written in C++ (including Python/Go bindings). N2 provides a much faster search speed than other implementations when modeling large dataset. Also, N2 supports multi-core CPUs for index building.

rumale - Rumale is a machine learning library in Ruby

  •    Ruby

Rumale (Ruby machine learning) is a machine learning library in Ruby. Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python. Rumale supports Linear / Kernel Support Vector Machine, Logistic Regression, Linear Regression, Ridge, Lasso, Kernel Ridge, Factorization Machine, Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor classifier, K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering, Mutidimensional Scaling, t-SNE, Principal Component Analysis, Kernel PCA and Non-negative Matrix Factorization. This project was formerly known as "SVMKit". If you are using SVMKit, please install Rumale and replace SVMKit constants with Rumale.

nanoflann - nanoflann: a C++11 header-only library for Nearest Neighbor (NN) search with KD-trees

  •    C++

nanoflann is a C++11 header-only library for building KD-Trees of datasets with different topologies: R2, R3 (point clouds), SO(2) and SO(3) (2D and 3D rotation groups). No support for approximate NN is provided. nanoflann does not require compiling or installing. You just need to #include <nanoflann.hpp> in your code. This library is a fork of the flann library by Marius Muja and David G. Lowe, and born as a child project of MRPT. Following the original license terms, nanoflann is distributed under the BSD license. Please, for bugs use the issues button or fork and open a pull request.

Non-Metric Space Library (NMSLIB) - An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.

  •    C++

Non-Metric Space Library (NMSLIB) is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The core-library does not have any third-party dependencies. It has been gaining popularity recently. In particular, it has become a part of Amazon Elasticsearch Service.


  •    C++

a C++ template container implementation of k-dimensional space sorting based on the kd-tree data structure. THIS PROJECT HAS MOVED TO ALIOTH.DEBIAN.ORG

teachable-machine-boilerplate - Boilerplate code for Teachable Machine

  •    Javascript

This is a small boilerplate project that demonstrates how to use tensorflow.js to create projects like Teachable Machine. The code shows how you can create a KNN classifier that can be trained live in the browser on a webcam image. It is intentionally kept very simple so it can provide a starting point for new projects. Behind the scenes, the image from the webcam is being processed by an activation of MobileNet. This network is trained to recognize all sorts of classes from the imagenet dataset, and is optimized to be really small, making it useable in the browser. Instead of reading the prediction values from the MobileNet network, we instead take the second to last layer in the neural network and feed it into a KNN (k-nearest neighbors) classifier that allows you to train your own classes.

twss.js - A node.js "that's what she said" classifier

  •    Javascript

This is a node.js module that classifies if a sentence can be replied with "that's what she said". You change algorithm from the default naive bayes classifier (nbc) to a k-nearest neighbor algorithm (knn).

binarytree - Python Library for Studying Binary Trees

  •    Python

Binarytree is a Python library which provides a simple API to generate, visualize, inspect and manipulate binary trees. It allows you to skip the tedious work of setting up test data, and dive straight into practising your algorithms. Heaps and BSTs (binary search trees) are also supported. You may need to use sudo depending on your environment.

grt - gesture recognition toolkit

  •    C++

The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source, C++ machine learning library designed for real-time gesture recognition. Classification: Adaboost, Decision Tree, Dynamic Time Warping, Gaussian Mixture Models, Hidden Markov Models, k-nearest neighbor, Naive Bayes, Random Forests, Support Vector Machine, Softmax, and more...

rtreego - an R-Tree library for Go

  •    Go

A library for efficiently storing and querying spatial data in the Go programming language. The R-tree is a popular data structure for efficiently storing and querying spatial objects; one common use is implementing geospatial indexes in database management systems. Both bounding-box queries and k-nearest-neighbor queries are supported.


  •    Python

K-tree provides a scalable approach to clustering inspired by the B+-tree and k-means algorithms. Clustering can be used to solve problems in signal processing, machine learning and other contexts.

K-Mindmap idea collecting tool

  •    C++

K-Mindmap lets you create tree graphs; where each branch contains space for text. The structure of the tree graph is intended to reflect contextual relations between the text fields. A physical simulation moves the graph to keep it looking nice.

t-digest - A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means

  •    Java

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means. The t-digest algorithm is also very parallel friendly making it useful in map-reduce and parallel streaming applications. The t-digest construction algorithm uses a variant of 1-dimensional k-means clustering to produce a data structure that is related to the Q-digest. This t-digest data structure can be used to estimate quantiles or compute other rank statistics. The advantage of the t-digest over the Q-digest is that the t-digest can handle floating point values while the Q-digest is limited to integers. With small changes, the t-digest can handle any values from any ordered set that has something akin to a mean. The accuracy of quantile estimates produced by t-digests can be orders of magnitude more accurate than those produced by Q-digests in spite of the fact that t-digests are more compact when stored on disk.