Displaying 1 to 7 from 7 results

ps - Problem Solving

  •    

ps is a collection of various computing problems and their corresponding solutions that I could think of. The problems range in various categories such as design, algorithms, performance optimizations, and more. Feel free to add a Github issue for any discrepancy, typo, alternatives, or a different solution that I might not have considered. Also feel free to add issues for problems that you would like me to work upon - though it may take me a few days to research and jot my notes here.

bloom - Go package implementing Bloom filters

  •    Go

A Bloom filter is a representation of a set of n items, where the main requirement is to make membership queries; i.e., whether an item is a member of a set.A Bloom filter has two parameters: m, a maximum size (typically a reasonably large multiple of the cardinality of the set to represent) and k, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by a BitSet; a key is represented in the filter by setting the bits at each value of the hashing functions (modulo m). Set membership is done by testing whether the bits at each value of the hashing functions (again, modulo m) are set. If so, the item is in the set. If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose k and m correctly.

bloofi - Bloofi: A java implementation of multidimensional Bloom filters

  •    Java

Bloom filters are probabilistic data structures commonly used for approximate membership problems in many areas of Computer Science (networking, distributed systems, databases, etc.). With the increase in data size and distribution of data, problems arise where a large number of Bloom filters are available, and all them need to be searched for potential matches. As an example, in a federated cloud environment, each cloud provider could encode the information using Bloom filters and share the Bloom filters with a central coordinator. The problem of interest is not only whether a given element is in any of the sets represented by the Bloom filters, but which of the existing sets contain the given element. This problem cannot be solved by just constructing a Bloom filter on the union of all the sets. Instead, we effectively have a multidimensional Bloom filter problem: given an element, we wish to receive a list of candidate sets where the element might be. To solve this problem, we consider 3 alternatives. Firstly, we can naively check many Bloom filters. Secondly, we propose to organize the Bloom filters in a hierarchical index structure akin to a B+ tree, that we call Bloofi. Finally, we propose another data structure that packs the Bloom filters in such a way as to exploit bit-level parallelism, which we call Flat-Bloofi. Our theoretical and experimental results show that Bloofi and Flat-Bloofi provide scalable and efficient solutions alternatives to search through a large number of Bloom filters.We build on an existing Bloom filter library (https://github.com/magnuss/java-bloomfilter) by Magnus Skjegstad which we embedded and modified. We also use junit and Hamcrest which we include as jar files for your convenience.




fuggetaboutit - implementations of a counting bloom, a timing bloom and a scaling timing bloom

  •    Python

What does this mean? Well... it means you can have a rolling window view on unique items in a stream (using the TimingBloomFilter object) and also have it rescale itself when the number of unique items increases beyond what you had anticipated (using the ScalingTimingBloomFilter). And, since this is built on bloom filters, the number of bits per entry is generally EXCEEDINGLY small letting you keep track of many items using a small amount of resources while still having very tight bounds on error. Assuming you have a tornado.ioloop running, this will automatically forget old values for you and only print if the phone number has been seen in the last 24hours. (NOTE: If you do not have an IOLoop running, don't worry... just call the TimingBloomFilter.decay() method every half a decay interval or every 12 hours in this example).

bloomfilter - bloomfilter.js ported to Go

  •    Go

The ability to build a bloom filter on a server in Go and evaluate that filter on a client in Javascript can have immense value for comparing application state in distributed single page applications with offline read/write capabilities and large data sets. There are a lot of open source bloom filter implementations available on the internet. These implementations are mostly incompatible. Every repo adds its own special sauce to its hashing algorithms and hash derivation methods. After scouring the internet for a multi-language bloom filter implemenation, none appeared that fit the requirements.

Olivia - Go: A distributed, in-memory key-value storage.

  •    Go

Olivia is essentially a distributed hash table that I built to test out some weird ideas I've had about Go and distributed programming that I've had. Olivia is essentially just a distributed hash table with added goodies. From early on, I wanted to use bloomfilters to enhance lookup speed between different remote nodes and I'm fairly happy with how that's turned out. Bloom filters allow us to prioritize which nodes we'll request keys from, rather than blindly sending out requests to all known nodes.






We have large collection of open source products. Follow the tags from Tag Cloud >>


Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.