Redis - Advanced key-value store

  •    C

Redis is an advanced key-value store. It is similar to memcached, but the dataset is not volatile, and values can be strings, exactly as in memcached, but also lists, sets, and ordered sets. All these data types can be manipulated with atomic operations to push/pop elements, add/remove elements, perform server-side union, intersection, and difference between sets, and so forth. Redis also supports different kinds of sorting.

datasketch - MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++

  •    Python

datasketch gives you probabilistic data structures that can process and search very large amounts of data very quickly, with little loss of accuracy. datasketch must be used with Python 2.7 or above and NumPy 1.11 or above. SciPy is optional, but with it the LSH initialization can be much faster.
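
To illustrate the kind of trade-off MinHash makes, here is a minimal standard-library sketch that estimates the Jaccard similarity of two sets from fixed-size signatures. It is not datasketch's API or implementation: the function names, the use of salted BLAKE2b as the hash family, and the default of 128 hash functions are all assumptions made for this example.

```python
import hashlib


def minhash_signature(items, num_perm=128):
    """Return a MinHash signature for a non-empty iterable of strings.

    For each of num_perm salted hash functions (hypothetical choice:
    BLAKE2b with a per-function salt), keep the minimum hash value
    seen over all items.
    """
    items = list(items)
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"),
                                digest_size=8, salt=salt).digest(),
                "big")
            for item in items
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


if __name__ == "__main__":
    a = {"w%d" % i for i in range(0, 1000)}
    b = {"w%d" % i for i in range(500, 1500)}
    # True Jaccard similarity is 500 / 1500; the estimate is approximate.
    print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

The signature size is fixed by num_perm no matter how large the sets are, which is where the "little loss of accuracy" comes from: more hash functions give a tighter estimate at the cost of more memory and hashing work.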

logswan - Fast Web log analyzer using probabilistic data structures

  •    C

Logswan is a fast Web log analyzer using probabilistic data structures. It is targeted at very large log files, typically API logs. It has constant memory usage regardless of the log file size, and takes approximately 4MB of RAM. Unique visitor counting is performed using two HyperLogLog counters (one for IPv4 and another for IPv6), providing a relative accuracy of 0.10%. String representations of IP addresses are used and preferred as they offer better precision.
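
The idea of counting unique visitors in constant memory can be sketched with a small HyperLogLog in Python. This is not Logswan's code: the class name, the BLAKE2b hash, and the precision p = 12 (4096 registers) are assumptions chosen for illustration, and the accuracy of this sketch differs from the figure quoted above. The standard error of HyperLogLog is roughly 1.04 / sqrt(m) for m registers.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog counter (illustrative only; requires p >= 7)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # Hash the string to 64 bits (hypothetical choice of hash).
        h = int.from_bytes(
            hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest(),
            "big")
        index = h >> (64 - self.p)                 # first p bits
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remainder.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)      # valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    # One counter per address family, keyed on the string form of the IP.
    ipv4, ipv6 = HyperLogLog(), HyperLogLog()
    for address in ["192.0.2.1", "192.0.2.2", "2001:db8::1", "192.0.2.1"]:
        (ipv6 if ":" in address else ipv4).add(address)
    print(round(ipv4.count()), round(ipv6.count()))
```

Memory use depends only on the number of registers, not on the number of addresses seen, which is what makes this approach attractive for very large log files.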

sketchy - Sketching Algorithms for Clojure (bloom filter, min-hash, hyper-loglog, count-min sketch)

  •    Clojure

sketchy is available as a Maven artifact from Clojars. This library contains various sketching/hash-based algorithms useful for building compact summaries of large datasets.

flajolet - Probabilistic data structures for OCaml

  •    OCaml

Flajolet is an OCaml library providing streaming data structures in the vein of the popular streamlib library for Java. Flajolet is named for INRIA professor Philippe Flajolet, inventor of the HyperLogLog data structure.

cpp-HyperLogLog - C++ implementation of HyperLogLog

  •    C++

C++ implementation of the HyperLogLog algorithm and the HIP (Historic Inverse Probability) estimator. It is a header-only library, so you just need to include "hyperloglog.hpp" and "murmur3.h" to use it. Two classes are provided: the standard HyperLogLog counter (hll::HyperLogLog) and a HyperLogLog counter with the HIP estimator (hll::HyperLogLogHIP).

node-streamcount - Provides implementations of "sketch" algorithms for real-time counting of stream data

  •    Javascript

Provides implementations of "sketch" algorithms for real-time counting of stream data. For an overview of the type of problems these algorithms solve, read The Britney Spears Problem and Wikipedia's article on Streaming algorithm.

ntCard - Estimating k-mer coverage histogram of genomics data

  •    C++

ntCard is a streaming algorithm for cardinality estimation in genomics datasets. As input it takes one or more files in FASTA, FASTQ, SAM, or BAM format and computes the total number of distinct k-mers, F0, as well as the k-mer coverage frequency histogram, fi, i>=1.
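
To make the quantities F0 and fi concrete, the following Python sketch computes them exactly for a small in-memory input. This is not ntCard's streaming estimator: it counts every k-mer exactly, so its memory grows with the number of distinct k-mers. The function name is hypothetical, and the sketch assumes that fi means the number of distinct k-mers occurring exactly i times and that strands are not canonicalized.

```python
from collections import Counter


def kmer_histogram(sequences, k):
    """Return (F0, histogram) for the k-mers in the given sequences.

    F0 is the number of distinct k-mers; histogram[i] is the number of
    distinct k-mers that occur exactly i times (f_i, i >= 1).
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of length k across the sequence.
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    f0 = len(counts)
    histogram = dict(Counter(counts.values()))
    return f0, histogram


if __name__ == "__main__":
    f0, hist = kmer_histogram(["ACGTACGT"], k=4)
    print(f0, hist)
```

For the single read ACGTACGT with k = 4, the 4-mers are ACGT (twice), CGTA, GTAC, and TACG, so F0 = 4, f1 = 3, and f2 = 1. A streaming estimator approximates these values without holding every k-mer in memory.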

go-hll - HyperLogLog in golang

  •    Go

A Go implementation of the HyperLogLog data structure with a twist. See the paper "HyperLogLog in Practice" by Stefan Heule, Marc Nunkesser, and Alex Hall. There is no need to serialize or deserialize the HLL: everything is stored in a byte slice, which can be memory mapped, passed over the network as is, etc.

visigo - Unique site visits counter in Go

  •    Go

Visigo is HTTP middleware for counting unique page visits. It uses HyperLogLog as a counter, so it's pretty fast. Warning: Visigo stores its HyperLogLog++ counters in a map, so this implementation should be used only on smaller sites.

HyperMinHash-java - Union, intersection, and set cardinality in loglog space

  •    Java

HyperMinHash is a probabilistic data structure that can approximate union, intersection, and set cardinalities as well as Jaccard indices of very large sets with high accuracy, in loglog space, and in a streaming fashion.


  •    CSharp

.NET Redis client library based on StackExchange.Redis that adds features such as an extensible serialization strategy, a tagging mechanism to group keys, hash fields, and set members, and a fetching mechanism to support atomic add/get operations, all of which are cluster-compatible. The constructor parameter must be a valid StackExchange.Redis connection string. See the StackExchange.Redis documentation for more information about its configuration options.