hash_table - Fast, data-oriented, stdlib-style hash table

  •        13

This repo contains the implementation of a fast (both for lookup and insertion) hash table, with an interface similar to that of C++ standard library collection types. The algorithm is inspired by what's found in Apple's Objective-C runtime: rather than using the classic "tombstone" or "dummy value" approach for deleted key-value pairs, this variant always compares even deleted values in a collision sequence, however the global maximum of collision sequence lengths is stored and lookup terminates once a sequence of this length has been traversed without finding a match.




Related Projects

sparsepp - A fast, memory efficient hash map for C++

  •    C++

We believe Sparsepp provides an unparalleled combination of performance and memory usage, and will outperform your compiler's unordered_map on both counts. Only Google's dense_hash_map is consistently faster, at the cost of much greater memory usage (especially when the final size of the map is not known in advance, and insertions cause a resizing). This hash map. like Google's dense_hash_map, uses open addressing (meaning that the hash table entries are conceptually stored in a big array, and collisions are not resolved using a linked list, but by probing). A big drawback of open addressing is that when the array needs to be resized larger, the high mark for memory usage is typically 3 times the current map size (as we allocate a new array twice as big as the current one, and copy from the old array to the new one). Sparsepp's implementation resolves this memory usage issue, and the memory usage high mark shows only a small bump when resizing.

sharedhashfile - Share Hash Tables Stored In Memory Mapped Files Between Arbitrary Processes & Threads

  •    C

SharedHashFile is a lightweight NoSQL key value store / hash table, a zero-copy IPC queue, & a multiplexed IPC logging library written in C for Linux. There is no server process. Data is read and written directly from/to shared memory or SSD; no sockets are used between SharedHashFile and the application program. APIs for C, C++, & nodejs. Data is kept in shared memory by default, making all the data accessible to separate processes and/or threads. Up to 4 billion keys can be stored in a single SharedHashFile hash table which is limited in size only by available RAM.

Capsule - The Capsule Hash Trie Collections Library

  •    Java

Capsule aims to become a full-fledged (immutable) collections library for Java 8+ that is solely built around persistent tries. The library is designed for standalone use and for being embedded in domain-specific languages. Capsule still has to undergo some incubation before it can ship as a well-rounded collection library. Nevertheless, the code is stable and performance is solid.

TomP2P - A P2P-based high performance key-value pair storage library

  •    Java

TomP2P is a P2P library and a distributed hash table (DHT) implementation which provides a decentralized key-value infrastructure for distributed applications. Each peer has a table that can be configured either to be disk-based or memory-based to store its values. TomP2P stores key-value pairs in a distributed manner. To find the peers to store the data in the distributed hash table, TomP2P uses an iterative routing to find the closest peers. Since TomP2P uses non-blocking communication, a future object is required to keep track of future results. This key concept is used for all the communication (iterative routing and DHT operations, such as storing a value on multiple peers) in TomP2P and it is also exposed in the API. Thus, an operation such as get or put will return immediately and the user can either block the operation to wait for the completion or add a listener that gets notified when the operation completes.

xxHash - Extremely fast non-cryptographic hash algorithm

  •    C

xxHash is an Extremely fast Hash algorithm, running at RAM speed limits. It successfully completes the SMHasher test suite which evaluates collision, dispersion and randomness qualities of hash functions. Code is highly portable, and hashes are identical on all platforms (little / big endian).Q.Score is a measure of quality of the hash function. It depends on successfully passing SMHasher test set. 10 is a perfect score. Algorithms with a score < 5 are not listed on this table.

opendht - A C++11 Distributed Hash Table implementation

  •    C++

A lightweight C++11 Distributed Hash Table implementation. OpenDHT provides an easy to use distributed in-memory data store. Every node in the network can read and write values to the store. Values are distributed over the network, with redundancy.

Essentials - General purpose utilities and hash functions for Android and Java (aka java-common)

  •    Java

Essentials are a collection of general-purpose classes we found useful in many occasions. This project is bare bones compared to a rich menu offered by Guava or Apache Commons. Essentials is not a framework, it's rather a small set of utilities to make Java standard approaches more convenient or more efficient.

Hash-Buster - Crack hashes in seconds.

  •    Python

Note: Hash Buster isn't compatible with python2, run it with python3 instead. Also, Hash-Buster uses some APIs for hash lookups, check the source code if you are paranoid. After the installation, you will be able to access it with buster command.

Hypertable - A high performance, scalable, distributed storage and processing system for structured

  •    C++

Hypertable is based on Google's Bigtable Design, which is a proven scalable design that powers hundreds of Google services. Many of the current scalable NoSQL database offerings are based on a hash table design which means that the data they manage is not kept physically ordered. Hypertable keeps data physically sorted by a primary key and it is well suited for Analytics.

imagehash - 🌄 Perceptual image hashing for PHP

  •    PHP

A perceptual hash is a fingerprint of a multimedia file derived from various features from its content. Unlike cryptographic hash functions which rely on the avalanche effect of small changes in input leading to drastic changes in the output, perceptual hashes are "close" to one another if the features are similar. Perceptual hashes are a different concept compared to cryptographic hash functions like MD5 and SHA1. With cryptographic hashes, the hash values are random. The data used to generate the hash acts like a random seed, so the same data will generate the same result, but different data will create different results. Comparing two SHA1 hash values really only tells you two things. If the hashes are different, then the data is different. And if the hashes are the same, then the data is likely the same. In contrast, perceptual hashes can be compared -- giving you a sense of similarity between the two data sets.

kad - peer-to-peer application framework implementing the kademlia distributed hash table for node

  •    Javascript

Peer-to-peer application framework implementing the Kademlia distributed hash table for Node.js and the browser.Install kad as a dependency of your package using NPM.

wendy - A pure Go implementation of the Pastry Distributed Hash Table

  •    Go

An open source, pure-Go implementation of the Pastry Distributed Hash Table. Beta1: Wendy is still in active development. It should not be used for mission-critical software. It has been beat on a little, but there are probably still bugs we haven't found or fixed.

cjdns - An encrypted IPv6 network using public-key cryptography for address allocation and a distributed hash table for routing

  •    Assembly

Cjdns implements an encrypted IPv6 network using public-key cryptography for address allocation and a distributed hash table for routing. This provides near-zero-configuration networking, and prevents many of the security and scalability issues that plague existing networks. The cjdns developers.

libcuckoo - A high-performance, concurrent hash table

  •    C++

libcuckoo provides a high-performance, compact hash table that allows multiple concurrent reader and writer threads. The Doxygen-generated documentation is available at the project page.

kissdb - (Keep It) Simple Stupid Database

  •    C

KISSDB is about the simplest key/value store you'll ever see, anywhere. It's written in plain vanilla C using only the standard string and FILE I/O functions, and should port to just about anything with a disk or something that acts like one. It stores keys and values of fixed length in a stupid-simple file format based on fixed-size hash tables. If a hash collision occurrs, a new "page" of hash table is appended to the database. The format is append-only. There is no delete. Puts that replace an existing value, however, will not grow the file as they will overwrite the existing entry.


  •    C++

Cuckoo filter is a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space. Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%).

C++ Hash Container Benchmark


C++ Hash Container Benchmark for STL map, C++0x unordered map, Boost unordered map, ATL map and ATL hash map for STL wide string and ATL CString.

casync - Content-Addressable Data Synchronization Tool

  •    C

Encoding: Let's take a large linear data stream, split it into variable-sized chunks (the size of each being a function of the chunk's contents), and store these chunks in individual, compressed files in some directory, each file named after a strong hash value of its contents, so that the hash value may be used to as key for retrieving the full chunk data. Let's call this directory a "chunk store". At the same time, generate a "chunk index" file that lists these chunk hash values plus their respective chunk sizes in a simple linear array. The chunking algorithm is supposed to create variable, but similarly sized chunks from the data stream, and do so in a way that the same data results in the same chunks even if placed at varying offsets. For more information see this blog story. Decoding: Let's take the chunk index file, and reassemble the large linear data stream by concatenating the uncompressed chunks retrieved from the chunk store, keyed by the listed chunk hash values.

Open Chord

  •    Java

OpenChord is an open source implementation of the Chord distributed hash table as described in the paper by Ion Stoica et al. quot;Chord: A scalable peer-to-peer lookup service for internet applicationsquot;. It is available under GNU GPL.


  •    CSharp

NChord is a C# implementation of the Chord distributed hash table. The project provides a library containing the routing, lookup, and maintenance routines specified in the MIT Chord paper, and is quite stable including under heavy load and churn.