array-hash - C++ implementation of a fast and memory efficient hash map and hash set specialized for strings

  •        11

Cache conscious hash map and hash set for strings based on the "Cache-conscious collision resolution in string hash tables." (Askitis Nikolas and Justin Zobel, 2005) paper. You can find some details regarding the structure here. Thanks to its cache friendliness, the structure provides fast lookups while keeping a low memory usage. The main drawback is the rehash process which is a bit slow and need some spare memory to copy the strings from the old hash table to the new hash table (it can’t use std::move as the other hash tables using std::string as key).



Related Projects

sparsepp - A fast, memory efficient hash map for C++

  •    C++

We believe Sparsepp provides an unparalleled combination of performance and memory usage, and will outperform your compiler's unordered_map on both counts. Only Google's dense_hash_map is consistently faster, at the cost of much greater memory usage (especially when the final size of the map is not known in advance, and insertions cause a resizing). This hash map. like Google's dense_hash_map, uses open addressing (meaning that the hash table entries are conceptually stored in a big array, and collisions are not resolved using a linked list, but by probing). A big drawback of open addressing is that when the array needs to be resized larger, the high mark for memory usage is typically 3 times the current map size (as we allocate a new array twice as big as the current one, and copy from the old array to the new one). Sparsepp's implementation resolves this memory usage issue, and the memory usage high mark shows only a small bump when resizing.

casync - Content-Addressable Data Synchronization Tool

  •    C

Encoding: Let's take a large linear data stream, split it into variable-sized chunks (the size of each being a function of the chunk's contents), and store these chunks in individual, compressed files in some directory, each file named after a strong hash value of its contents, so that the hash value may be used to as key for retrieving the full chunk data. Let's call this directory a "chunk store". At the same time, generate a "chunk index" file that lists these chunk hash values plus their respective chunk sizes in a simple linear array. The chunking algorithm is supposed to create variable, but similarly sized chunks from the data stream, and do so in a way that the same data results in the same chunks even if placed at varying offsets. For more information see this blog story. Decoding: Let's take the chunk index file, and reassemble the large linear data stream by concatenating the uncompressed chunks retrieved from the chunk store, keyed by the listed chunk hash values.

Capsule - The Capsule Hash Trie Collections Library

  •    Java

Capsule aims to become a full-fledged (immutable) collections library for Java 8+ that is solely built around persistent tries. The library is designed for standalone use and for being embedded in domain-specific languages. Capsule still has to undergo some incubation before it can ship as a well-rounded collection library. Nevertheless, the code is stable and performance is solid.

hat-trie - C++ implementation of a fast and memory efficient HAT-trie

  •    C++

Trie implementation based on the "HAT-trie: A Cache-conscious Trie-based Data Structure for Strings." (Askitis Nikolas and Sinha Ranjan, 2007) paper. For now, only the pure HAT-trie has been implemented, the hybrid version may arrive later. Details regarding the HAT-trie data structure can be found here. The library provides an efficient and compact way to store a set or a map of strings by compressing the common prefixes. It also allows to search for keys that match a prefix. Note though that the default parameters of the structure are geared toward optimizing exact searches, if you do a lot of prefix searches you may want to reduce the burst threshold through the burst_threshold method.

sharedhashfile - Share Hash Tables Stored In Memory Mapped Files Between Arbitrary Processes & Threads

  •    C

SharedHashFile is a lightweight NoSQL key value store / hash table, a zero-copy IPC queue, & a multiplexed IPC logging library written in C for Linux. There is no server process. Data is read and written directly from/to shared memory or SSD; no sockets are used between SharedHashFile and the application program. APIs for C, C++, & nodejs. Data is kept in shared memory by default, making all the data accessible to separate processes and/or threads. Up to 4 billion keys can be stored in a single SharedHashFile hash table which is limited in size only by available RAM.

Judy - General purpose dynamic array

  •    C

Judy is a general purpose dynamic array implemented as a C callable library. Judy's speed and memory usage are typically better than other data storage models and improves with very large data sets. Judy's key benefits are scalability, high performance, and memory efficiency. A Judy array is extensible and can scale up to a very large number of elements, bounded only by machine memory. Since Judy is designed as an unbounded array, the size of a Judy array is not pre-allocated but grows and shrinks dynamically with the array population.

get-in - Functions for for hash map (assoc array) traversal.

  •    PHP

Functions for hash map (assoc array) traversal. When dealing with nested associative structures, traversing them can become quite a pain. Mostly because of the amount of isset checking that is necessary.

algorithms_and_data_structures - 180+ Algorithm & Data Structure Problems using C++

  •    C++

Note: Some of the code here is old and was written when I was learning C++. It might be possible that code is not safe or making wrong assumptions. Please use with caution. Pull requests are always welcome. Include contains single header implementation of data structures and some algorithms.

imagehash - 🌄 Perceptual image hashing for PHP

  •    PHP

A perceptual hash is a fingerprint of a multimedia file derived from various features from its content. Unlike cryptographic hash functions which rely on the avalanche effect of small changes in input leading to drastic changes in the output, perceptual hashes are "close" to one another if the features are similar. Perceptual hashes are a different concept compared to cryptographic hash functions like MD5 and SHA1. With cryptographic hashes, the hash values are random. The data used to generate the hash acts like a random seed, so the same data will generate the same result, but different data will create different results. Comparing two SHA1 hash values really only tells you two things. If the hashes are different, then the data is different. And if the hashes are the same, then the data is likely the same. In contrast, perceptual hashes can be compared -- giving you a sense of similarity between the two data sets.

torus - Torus - distributed storage system

  •    Go

Torus provides a resource pool and basic file primitives from a set of daemons running atop multiple nodes. These primitives are made consistent by being append-only and coordinated by [etcd]. From these primitives, a Torus server can support multiple types of volumes, the semantics of which can be broken into subprojects. It ships with a simple block-device volume plugin, but is extensible to more. Sharding is done via a consistent hash function, controlled in the simple case by a hash ring algorithm, but fully extensible to arbitrary maps, rack-awareness, and other nice features. The project name comes from this: a hash 'ring' plus a 'volume' is a torus.

frugally-deep - Header-only library for using Keras models in C++.

  •    C++

Would you like to build/train a model using Keras/Python? And would you like run the prediction (forward pass) on your model in C++ without linking your application against TensorFlow? Then frugally-deep is exactly for you. Layer types typically used in image recognition/generation are supported, making many popular model architectures possible (see Performance section).

nameof - Nameof operator for modern C++, simply obtain the name of a variable, type, function, macro, and enum

  •    C++

Header-only C++17 library provides nameof macros and functions to simply obtain the name of a variable, type, function, macro, and enum. Nameof returns std::string_view. If argument does not have name, returns empty string.

kissdb - (Keep It) Simple Stupid Database

  •    C

KISSDB is about the simplest key/value store you'll ever see, anywhere. It's written in plain vanilla C using only the standard string and FILE I/O functions, and should port to just about anything with a disk or something that acts like one. It stores keys and values of fixed length in a stupid-simple file format based on fixed-size hash tables. If a hash collision occurrs, a new "page" of hash table is appended to the database. The format is append-only. There is no delete. Puts that replace an existing value, however, will not grow the file as they will overwrite the existing entry.

FunctionalPlus - Functional Programming Library for C++. Write concise and readable C++ code.

  •    C++

helps you write concise and readable C++ code. Great code should mostly be self-documenting, but while using C++ in reality you can find yourself dealing with low-level stuff like iterators or hand-written loops that distract from the actual essence of your code.

xxHash - Extremely fast non-cryptographic hash algorithm

  •    C

xxHash is an Extremely fast Hash algorithm, running at RAM speed limits. It successfully completes the SMHasher test suite which evaluates collision, dispersion and randomness qualities of hash functions. Code is highly portable, and hashes are identical on all platforms (little / big endian).Q.Score is a measure of quality of the hash function. It depends on successfully passing SMHasher test set. 10 is a perfect score. Algorithms with a score < 5 are not listed on this table.


  •    C++

Cuckoo filter is a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space. Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%).

Essentials - General purpose utilities and hash functions for Android and Java (aka java-common)

  •    Java

Essentials are a collection of general-purpose classes we found useful in many occasions. This project is bare bones compared to a rich menu offered by Guava or Apache Commons. Essentials is not a framework, it's rather a small set of utilities to make Java standard approaches more convenient or more efficient.

C++ Hash Container Benchmark


C++ Hash Container Benchmark for STL map, C++0x unordered map, Boost unordered map, ATL map and ATL hash map for STL wide string and ATL CString.