Clustering - Implements "Clustering a Million Faces by Identity"


This repository contains an implementation of the clustering algorithm from the paper "Clustering a Million Faces by Identity".



Related Projects

lopq - Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high dimensional data in Python and Spark

  •    Python

This is Python training and testing code for Locally Optimized Product Quantization (LOPQ) models, as well as Spark scripts to scale training to hundreds of millions of vectors. The resulting model can be used in Python with code provided here or deployed via a Protobuf format to, e.g., search backends for high performance approximate nearest neighbor search. Locally Optimized Product Quantization (LOPQ) [1] is a hierarchical quantization algorithm that produces codes of configurable length for data points. These codes are efficient representations of the original vector and can be used in a variety of ways depending on the application, including as hashes that preserve locality, as a compressed vector from which an approximate vector in the data space can be reconstructed, and as a representation from which to compute an approximation of the Euclidean distance between points.
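As a rough illustration of how short codes can stand in for the original vectors, the sketch below uses plain product quantization rather than LOPQ itself. The codebooks are hand-written, hypothetical values (a real model would learn them from data), and the function names `encode`, `decode` and `approx_distance` are illustrative only; they are not taken from the lopq package.

```python
import math

# Hypothetical codebooks, one per 2-D subspace of a 4-D vector.
# A trained model would learn these centroids; here they are made up.
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0)],  # subspace 0 covers dimensions 0-1
    [(0.0, 1.0), (1.0, 0.0)],  # subspace 1 covers dimensions 2-3
]

def split(vec):
    """Cut a 4-D vector into consecutive 2-D subvectors."""
    return [tuple(vec[i:i + 2]) for i in range(0, len(vec), 2)]

def encode(vec):
    """Replace each subvector by the index of its nearest codeword."""
    return tuple(
        min(range(len(book)), key=lambda j: math.dist(sub, book[j]))
        for sub, book in zip(split(vec), CODEBOOKS)
    )

def decode(code):
    """Reconstruct an approximate vector by concatenating the chosen codewords."""
    return tuple(x for j, book in zip(code, CODEBOOKS) for x in book[j])

def approx_distance(query, code):
    """Approximate Euclidean distance between an exact query and an encoded point."""
    return math.dist(query, decode(code))

code = encode((0.9, 1.1, 0.1, 0.8))
print(code, decode(code), approx_distance((0.9, 1.1, 0.1, 0.8), code))
```

The code is a pair of small integers, so it can serve as a locality-preserving hash, as a compressed form of the vector, or as input to an approximate distance computation, which are the three uses the description mentions.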

supercluster - A crazy fast geospatial point clustering library for browsers and Node.

  •    Javascript

A very fast JavaScript library for geospatial point clustering for browsers and Node. It loads an array of GeoJSON Feature objects; each feature's geometry must be a GeoJSON Point. Once loaded, the index is immutable.

hdbscan - A high performance implementation of HDBSCAN clustering.

  •    Jupyter

HDBSCAN - Hierarchical Density-Based Spatial Clustering of Applications with Noise. Performs DBSCAN over varying epsilon values and integrates the result to find a clustering that gives the best stability over epsilon. This allows HDBSCAN to find clusters of varying densities (unlike DBSCAN), and be more robust to parameter selection.
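To see why a single epsilon struggles with clusters of different densities, here is a minimal sketch of plain DBSCAN (not HDBSCAN, and not the hdbscan package's API) run on made-up one-dimensional data. The function name `dbscan`, its parameters and the sample points are all assumptions chosen for illustration.

```python
import math

def dbscan(points, eps, min_samples):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point, so keep expanding
    return labels

# Two dense groups close together and one sparse group far away.
points = [(x,) for x in (0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3, 10.0, 11.0, 12.0, 13.0)]
print(dbscan(points, eps=0.15, min_samples=3))  # sparse group is labelled noise
print(dbscan(points, eps=1.5, min_samples=3))   # the two dense groups merge
```

With a small epsilon the sparse group is discarded as noise, and with a large one the two dense groups merge, so no single value recovers all three groups. Integrating over a range of epsilon values, as the description says HDBSCAN does, is aimed at exactly this situation.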

tomcat clustering, load balancing

  •    Java

A Tomcat clustering setup providing failover and load balancing, including the Apache and Tomcat web servers. It needs to be copied to the server and run there.

Clustering Demo in Silverlight using K-Means Algorithm


This project explains clustering and the k-means algorithm using a live demo in Silverlight. The demo can be used to understand how the k-means algorithm works on user-defined data points.
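For readers who want to follow the algorithm outside the demo, the following is a minimal sketch of k-means (Lloyd's algorithm) on two-dimensional points. It is not the Silverlight project's code; the function name `kmeans`, the fixed iteration count and the choice to start from the first k points are simplifying assumptions made here.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    # Simplifying assumption: start from the first k points instead of a random pick.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# User-defined data points: two visibly separate groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Even though both starting centroids lie in the same group here, the alternating assignment and update steps pull one centroid toward the second group within a few iterations.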


  •    Python

clusterviz clusters three-dimensional data and visualizes the clustering process using OpenGL. It implements the k-means family of clustering algorithms, including mixture models.

node-facenet - Solve face verification, recognition and clustering problems: A TensorFlow backed FaceNet implementation for Node

  •    TypeScript

A TensorFlow backed FaceNet implementation for Node.js, which can solve face verification, recognition and clustering problems. FaceNet is a deep convolutional network designed by Google, trained to solve face verification, recognition and clustering problems efficiently at scale.

t-digest - A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means

  •    Java

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means. The t-digest algorithm is also very parallel friendly, making it useful in map-reduce and parallel streaming applications. The t-digest construction algorithm uses a variant of 1-dimensional k-means clustering to produce a data structure that is related to the Q-digest. This t-digest data structure can be used to estimate quantiles or compute other rank statistics. The advantage of the t-digest over the Q-digest is that the t-digest can handle floating point values while the Q-digest is limited to integers. With small changes, the t-digest can handle any values from any ordered set that has something akin to a mean. Quantile estimates produced by t-digests can be orders of magnitude more accurate than those produced by Q-digests, even though t-digests are more compact when stored on disk.

DominantColor - Finding dominant colors of an image using k-means clustering

  •    Swift

Finding the dominant colors of an image using the CIELAB color space and the k-means clustering algorithm. The RGB color space does not take human perception into account, so the CIELAB color space is used instead, which is designed to approximate human vision [1].
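The color-space step can be sketched as follows. This is not DominantColor's Swift code; it is a small Python illustration that assumes 8-bit sRGB input and the D65 white point, using the standard sRGB-to-XYZ matrix and the CIELAB formulas. The function names `srgb_to_lab` and `lab_distance` are made up for this example.

```python
import math

def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB components to CIELAB (assumes the D65 white point)."""
    def linear(c):
        # Undo the sRGB gamma curve to get linear light.
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linear(r), linear(g), linear(b)
    # Linear RGB to CIE XYZ.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        # Cube root with a linear segment near zero, as defined for CIELAB.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalise by the D65 reference white before applying f.
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def lab_distance(rgb1, rgb2):
    """Euclidean distance in CIELAB, a rough proxy for perceived difference."""
    return math.dist(srgb_to_lab(*rgb1), srgb_to_lab(*rgb2))

print(srgb_to_lab(255, 255, 255))
print(lab_distance((255, 0, 0), (0, 0, 255)))
```

A k-means implementation would then cluster the converted pixels using `lab_distance` (or plain Euclidean distance on the Lab triples) instead of distances between raw RGB values.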

ClusterKit - An iOS map clustering framework targeting MapKit, Google Maps and Mapbox.

  •    Objective-C

ClusterKit is an elegant and efficient clustering controller for maps. Its flexible architecture makes it very customizable: you can use your own algorithm and even your own map provider. To try the examples, clone the repo and run pod install from the Examples directory first.

Virtual Earth Clustering

  •    Javascript

The project provides code for Microsoft Virtual Earth to cluster data and provide high performance for large data sets over the web.

CCHMapClusterController - High-performance map clustering with MapKit for iOS and OS X

  •    Objective-C

CCHMapClusterController solves the problem of displaying many annotations on an MKMapView and is available under the MIT license. Note: With iOS 11, Apple introduced map clustering support in MapKit. You can continue using CCHMapClusterController on iOS 11, but for new projects, I suggest checking whether the built-in functionality matches your needs. I will still accept PRs for bug fixes and small enhancements but won't otherwise implement any new functionality.

metrictank - Cassandra-backed, metrics2

  •    Go

Metrictank is a multi-tenant timeseries engine for Graphite and friends. It provides long term storage, high availability, efficient storage, retrieval and processing for large scale environments. GrafanaLabs has been running metrictank in production since December 2015. It currently requires an external datastore like Cassandra, and we highly recommend using Kafka to support clustering, as well as a clustering manager like Kubernetes. This makes it non-trivial to operate, though GrafanaLabs has an on-premise product that makes this process much easier.

Clustering demo

  •    Java

This is educational software whose purpose is to clarify the workings of different clustering algorithms by visualising the clustering process in 2D space. This makes many potential problems and advantages very clear.

clusterfck - [UNMAINTAINED] K-means and hierarchical clustering

  •    Javascript

A JavaScript cluster analysis library. Includes hierarchical (agglomerative) clustering and k-means clustering. For classification, instantiate a new Kmeans() object.

Leaflet.markercluster - Marker Clustering plugin for Leaflet

  •    Javascript

Provides Beautiful Animated Marker Clustering functionality for Leaflet, a JS library for interactive maps. See the included examples for usage.

Apache Mahout - Scalable machine learning library

  •    Java

Apache Mahout has implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative filtering and frequent pattern mining.

Swarm - Docker native clustering system

  •    Go

Docker Swarm is native clustering for Docker. It turns a pool of Docker hosts into a single, virtual host. It was Docker's first container orchestration project, begun in 2014. Combined with Docker Compose, it's a very convenient tool to schedule containers.