BoomFilters - Probabilistic data structures for processing continuous, unbounded streams.

  •    Go

Boom Filters are probabilistic data structures for processing continuous, unbounded streams. These include Stable Bloom Filters, Scalable Bloom Filters, Counting Bloom Filters, Inverse Bloom Filters, Cuckoo Filters, several variants of traditional Bloom filters, HyperLogLog, Count-Min Sketch, and MinHash. Classic Bloom filters generally require a priori knowledge of the data set in order to allocate an appropriately sized bit array. This works well for offline processing, but online processing typically involves unbounded data streams. With enough data, a traditional Bloom filter "fills up", after which it has a false-positive probability of 1.
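The "fills up" behaviour can be illustrated with a minimal sketch of a classic Bloom filter. This is not the BoomFilters API: the `Bloom` type, its methods, the double-hashing scheme built on FNV, and the deliberately tiny 101-bit array are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a hypothetical, minimal classic Bloom filter
// (illustrative only; not the BoomFilters API).
type Bloom struct {
	bits []bool // fixed-size bit array, sized up front
	k    int    // number of positions set per item
}

func NewBloom(m, k int) *Bloom {
	return &Bloom{bits: make([]bool, m), k: k}
}

// positions derives k indexes from two FNV hashes (double hashing).
func (b *Bloom) positions(item string) []int {
	h1 := fnv.New64a()
	h1.Write([]byte(item))
	h2 := fnv.New64()
	h2.Write([]byte(item))
	a, c := h1.Sum64(), h2.Sum64()|1
	m := uint64(len(b.bits))
	pos := make([]int, b.k)
	for i := range pos {
		pos[i] = int((a + uint64(i)*c) % m)
	}
	return pos
}

// Add sets the item's k bits.
func (b *Bloom) Add(item string) {
	for _, p := range b.positions(item) {
		b.bits[p] = true
	}
}

// Test reports true if all k bits are set: "possibly present".
func (b *Bloom) Test(item string) bool {
	for _, p := range b.positions(item) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

// FillRatio is the fraction of bits currently set.
func (b *Bloom) FillRatio() float64 {
	set := 0
	for _, bit := range b.bits {
		if bit {
			set++
		}
	}
	return float64(set) / float64(len(b.bits))
}

func main() {
	b := NewBloom(101, 3) // far too small for the stream below, on purpose
	for n := 1; n <= 1000; n++ {
		b.Add(fmt.Sprintf("item-%d", n))
		if n == 10 || n == 100 || n == 1000 {
			fmt.Printf("after %4d items: fill=%.2f, unseen item reported present: %v\n",
				n, b.FillRatio(), b.Test("never-added"))
		}
	}
}
```

Because the bit array never grows and bits are never cleared, the fill ratio can only rise as the stream continues. Once every bit is set, `Test` returns true for any input, which is the false-positive probability of 1 described above.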

khmer - In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more

  •    Python

The official source code repository is at https://github.com/dib-lab/khmer and project documentation is available online at http://khmer.readthedocs.io. See http://khmer.readthedocs.io/en/stable/introduction.html for an overview of the khmer project. khmer is research software, so you should cite us when you use it in scientific publications! Please see the CITATION file for citation information.

sketchy - Sketching Algorithms for Clojure (bloom filter, min-hash, hyper-loglog, count-min sketch)

  •    Clojure

sketchy is available as a Maven artifact from Clojars. This library contains various sketching/hash-based algorithms useful for building compact summaries of large datasets.

countminsketch - An implementation of Count-Min Sketch in Golang

  •    Go

An implementation of Count-Min Sketch in Golang. The Count-Min Sketch (or CM sketch) is a probabilistic, sublinear-space streaming algorithm that can be used to summarize a data stream in many different ways. The algorithm was invented in 2003 by Graham Cormode and S. Muthu Muthukrishnan.
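As a rough illustration of how a Count-Min Sketch summarizes a stream, the following sketch keeps a small table of counters and answers frequency queries with the minimum over one counter per row. It does not reproduce the countminsketch package's API: the `CountMin` type, its methods, and the per-row seeding of an FNV hash are assumptions chosen for brevity.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// CountMin is a hypothetical, minimal Count-Min Sketch
// (illustrative only; not the countminsketch package API).
type CountMin struct {
	depth, width int
	table        [][]uint64 // depth rows of width counters
}

func NewCountMin(depth, width int) *CountMin {
	t := make([][]uint64, depth)
	for i := range t {
		t[i] = make([]uint64, width)
	}
	return &CountMin{depth: depth, width: width, table: t}
}

// column hashes the item with the row index as a seed byte,
// giving each row its own (assumed independent enough) hash function.
func (c *CountMin) column(row int, item string) int {
	h := fnv.New64a()
	h.Write([]byte{byte(row)})
	h.Write([]byte(item))
	return int(h.Sum64() % uint64(c.width))
}

// Add increments one counter per row by n.
func (c *CountMin) Add(item string, n uint64) {
	for r := 0; r < c.depth; r++ {
		c.table[r][c.column(r, item)] += n
	}
}

// Estimate returns the minimum counter across rows. Collisions can only
// inflate counters, so the result never underestimates the true count.
func (c *CountMin) Estimate(item string) uint64 {
	est := c.table[0][c.column(0, item)]
	for r := 1; r < c.depth; r++ {
		if v := c.table[r][c.column(r, item)]; v < est {
			est = v
		}
	}
	return est
}

func main() {
	cms := NewCountMin(4, 64)
	for _, word := range []string{"a", "b", "a", "c", "a", "b"} {
		cms.Add(word, 1)
	}
	fmt.Println("estimate for a:", cms.Estimate("a"))
	fmt.Println("estimate for b:", cms.Estimate("b"))
}
```

Memory use is fixed at depth times width counters regardless of how many distinct items appear, which is where the sublinear space comes from. The trade-off is that estimates may be too high when items collide in every row, but they are never too low.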