A Bloom filter is a space-efficient probabilistic data structure used to determine whether an element belongs to a set or not. The Bloom filter allows false positives (maybe in the set) but never false negatives (definitely not in set) . If you are new to bloomfilters, give Bloom Filters by Example a read. The bloomfilter package is suitable for caching filtering, decentralized aggregation, search large chemical structure databases and many other applications. More specifically, we use this package in production with KrakenD to distributedly reject JWT tokens as it allows us to perform massive rejections with very little memory consumption. For instance, 100 million tokens of any size consume around 0.5GB RAM (with a rate of false positives of 1 in 999,925,224 tokens), and lookups are completed in constant time (k number of hashes). These numbers are impossible to get with a key-value or a relational database.
https://github.com/devopsfaith/bloomfilterTags | bloom-filter probabilistic-data-structures rpc distributed-computing crdt g-set |
Implementation | Go |
License | Apache |
Platform | Windows MacOS Linux |
Boom Filters are probabilistic data structures for processing continuous, unbounded streams. This includes Stable Bloom Filters, Scalable Bloom Filters, Counting Bloom Filters, Inverse Bloom Filters, Cuckoo Filters, several variants of traditional Bloom filters, HyperLogLog, Count-Min Sketch, and MinHash.Classic Bloom filters generally require a priori knowledge of the data set in order to allocate an appropriately sized bit array. This works well for offline processing, but online processing typically involves unbounded data streams. With enough data, a traditional Bloom filter "fills up", after which it has a false-positive probability of 1.
bloom-filter stable-bloom-filters cuckoo-filter probabilistic-programming counting-bloom-filters scalable-bloom-filters count-min-sketch data-stream filter data-structure collections go-collection"A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not. In other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed," says Wikipedia. Warning: These are synthetic benchmarks in isolated environment. Usually the difference in throughput and latency is bigger in production system because it will stress the GC, lead to slow allocation paths and higher latencies, trigger the GC, etc.
bloom-filter probabilistic high-performance datastructuresCuckoo filter is a Bloom filter replacement for approximated set-membership queries. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters. For applications that store many items and target moderately low false positive rates, cuckoo filters have lower space overhead than space-optimized Bloom filters. Some possible use-cases that depend on approximated set-membership queries would be databases, caches, routers, and storage systems where it is used to decide if a given item is in a (usually large) set, with some small false positive probability. Alternatively, given it is designed to be a viable replacement to Bloom filters, it can also be used to reduce the space required in probabilistic routing tables, speed longest-prefix matching for IP addresses, improve network state management and monitoring, and encode multicast forwarding information in packets, among many other applications. Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%).
cuckoo-filter bloom-filter filterThis is a collection of different data structures and utilities, implemented in JavaScript. Its written and tested using Node.js which is also the target platform.CircularBuffer - A data structure that uses a single, fixed-size buffer as if it were connected end-to-end. When the buffer is filled, new data is written starting at the beginning of the buffer and overwriting the old.
algorithms datastructures ds sort-map data-structures avl avltree trie ternarysearchtrie multiwaytrie skiplist btree binarysearch binarysearchtree sortedmap lru cache eviction maximize least-recently-used loading guava async delay-queue delayqueue priorityqueue heap binaryheap bitset bitarray bitvector circularbuffer ringbuffer graph directed undirected shortest cycle path topological bloomfilter bloom filterRedisson - distributed Java objects and services (Set, Multimap, SortedSet, Map, List, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Executor service, Tomcat Session Manager, Scheduler service, JCache API) on top of Redis server. Rich Redis client.
cache distributed-caching distributed-locks redis-client redis-cluster collections java-collections hashmap set queueCuckoo filter is a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space. Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%).
P. Almeida, C.Baquero, N. Preguiça, D. Hutchison, Scalable Bloom Filters, (GLOBECOM 2007), IEEE, 2007. Bloom filters are great if you understand what amount of bits you need to set aside early to store your entire set. Scalable Bloom Filters allow your bloom filter bits to grow as a function of false positive probability and size.
A Bloom filter is a representation of a set of n items, where the main requirement is to make membership queries; i.e., whether an item is a member of a set.A Bloom filter has two parameters: m, a maximum size (typically a reasonably large multiple of the cardinality of the set to represent) and k, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by a BitSet; a key is represented in the filter by setting the bits at each value of the hashing functions (modulo m). Set membership is done by testing whether the bits at each value of the hashing functions (again, modulo m) are set. If so, the item is in the set. If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose k and m correctly.
bloom bloom-filters bloom-filter data-structure collections go-collectionFast bloom filter in JavaScript.
bloom-filter probabilistic-data-structureHazelcast Jet is a distributed computing platform built for high-performance stream processing and fast batch processing. It embeds Hazelcast In Memory Data Grid (IMDG) to provide a lightweight package of a processor and a scalable in-memory storage. It supports distributed java.util.stream API support for Hazelcast data structures such as IMap and IList, Distributed implementations of java.util.{Queue, Set, List, Map} data structures highly optimized to be used for the processing
data-grid data-processing data-streaming in-memory batch-processing stream-processinginbloom - a cross language Bloom filter implementation (https://en.wikipedia.org/wiki/Bloom_filter). A Bloom filter is a probabalistic data structure which provides an extremely space-efficient method of representing large sets. It can have false positives but never false negatives which means a query returns either "possibly in set" or "definitely not in set".
A stand-alone Bloom filter implementation written in Java
A Bloom filter is a data structure that may report it contains an item that it does not (a false positive), but is guaranteed to report correctly if it contains the item ("no false negatives"). The opposite of a Bloom filter is a data structure that may report a false negative, but can never report a false positive. That is, it may claim that it has not seen an item when it has, but will never claim to have seen an item it has not.This repository contains thread-safe implementations of "the opposite of a Bloom filter" in Java and Go.
Roshi implements a time-series event storage via a LWW-element-set CRDT with limited inline garbage collection. Roshi is a stateless, distributed layer on top of Redis and is implemented in Go. It is partition tolerant, highly available and eventually consistent. Roshi stores a sharded copy of your dataset in multiple independent Redis instances, called a cluster. Roshi provides fault tolerance by duplicating clusters; multiple identical clusters, normally at least 3, form a farm. Roshi leverages CRDT semantics to ensure consistency without explicit consensus.
Algorithm is a library of tools that is used to create intelligent applications. Embedded frameworks require a minimum deployment target of iOS 8 or OS X Mavericks (10.9).
algorithm algorithms data-structures data-structure data-structures-algorithms probability probabilistic-programming set multiset dictionary multidictionary conditional-probability redblacktree redblacktrees swift-3 cosmicmind sorted-sets doublylinkedlist stackEdward is a Python library for probabilistic modeling, inference, and criticism. It is a testbed for fast experimentation and research with probabilistic models, ranging from classical hierarchical models on small data sets to complex deep probabilistic models on large data sets. Edward fuses three fields: Bayesian statistics and machine learning, deep learning, and probabilistic programming. Edward is built on top of TensorFlow. It enables features such as computational graphs, distributed training, CPU/GPU integration, automatic differentiation, and visualization with TensorBoard.
bayesian-methods deep-learning machine-learning data-science tensorflow neural-networks statistics probabilistic-programminga CRDT is a data type designed so that operations on it commute - give the same result indepent of the order in which they are applied.CRDTs give you eventual consistency for free. it is not necessary to track concurrent changes and use complicated merge algorithms. this module is useful for collaborative/distributed/peer2peer (same things) applications.
Redisson - Distributed and Scalable Java data structures (Set, SortedSet, Map, ConcurrentMap, List, Queue, Deque, Lock, AtomicLong, CountDownLatch, Publish / Subscribe, HyperLogLog) on top of Redis server. Advanced redis java client. It supports over 28+ data structures and services, Synchronous / asynchronous / reactive interfaces and lot more.
distributed collections locks distributed-locks cachehuststore is a open source high performance distributed database system. It not only provides key-value storage service with extremely high performance, up to 100 thousand QPS, but also supports data structures like hash, set, sorted set, etc. Also, it can store binary data as value from a key-value pair, and thus can be used as an alternative of Redis.In addtion, huststore implements a distributed message queue by integrating a special HA module, features including message Push Stream, and message Publish-SubScribe, these features can be used as replacements of the corresponding features in rabbitmq and gearman.
architecture benchmark redis database-engine nginx message-queue distributed-storageLogswan is a fast Web log analyzer using probabilistic data structures. It is targeted at very large log files, typically APIs logs. It has constant memory usage regardless of the log file size, and takes approximatively 4MB of RAM.Unique visitors counting is performed using two HyperLogLog counters (one for IPv4, and another one for IPv6), providing a relative accuracy of 0.10%. String representations of IP addresses are used and preferred as they offer better precision.
web log analyzer hyperloglog
We have large collection of open source products. Follow the tags from
Tag Cloud >>
Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.
Add Projects.