Disk-backed-map - A simple Java disk-backed map


A small library that provides a disk-backed map implementation for storing a large number of key/value pairs. The standard in-memory map implementations (HashMap, Hashtable) max out at around 3-4 million keys per GB of memory for very simple key/value pairs, and in most cases the limit is much lower. The disk-backed map, on the other hand, can store between 16 million (64-bit JVM) and 20 million (32-bit JVM) keys per GB, regardless of the size of the key/value pairs.

https://github.com/aloksingh/disk-backed-map
http://code.google.com/p/disk-backed-map
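
The general idea behind a disk-backed map can be sketched in a few lines of Java. This is a minimal illustration and not the library's actual API or storage format: the class name `SimpleDiskMap` and all of its methods are hypothetical. It assumes values are appended to a single file while only a small (offset, length) record per key stays on the heap, so heap usage does not grow with value size. Unlike the library described above, this sketch still keeps the keys themselves in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: values live in a file, only (offset, length) stays on the heap. */
final class SimpleDiskMap implements AutoCloseable {
    // In-memory index: key -> {offset, length} of the latest value in the file.
    private final Map<String, long[]> index = new HashMap<>();
    private final RandomAccessFile file;

    SimpleDiskMap(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "rw");
    }

    /** Appends the value to the end of the file and records where it went. */
    void put(String key, byte[] value) throws IOException {
        long offset = file.length();
        file.seek(offset);
        file.write(value);
        index.put(key, new long[] {offset, value.length});
    }

    /** Reads the value back from disk, or returns null if the key is absent. */
    byte[] get(String key) throws IOException {
        long[] loc = index.get(key);
        if (loc == null) {
            return null;
        }
        byte[] out = new byte[(int) loc[1]];
        file.seek(loc[0]);
        file.readFully(out);
        return out;
    }

    int size() {
        return index.size();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("disk-map", ".dat");
        try (SimpleDiskMap map = new SimpleDiskMap(tmp)) {
            map.put("greeting", "hello".getBytes(StandardCharsets.UTF_8));
            // Overwriting appends a new record; the index now points at it.
            map.put("greeting", "hello again".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(map.get("greeting"), StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The sketch trades space for simplicity: overwritten values leave stale bytes in the file, and the index is lost when the process exits. A real implementation would need compaction and a persistent index, and would likely keep only a hash of each key in memory.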

Related Projects

Agrona - Library to build high-performance applications in Java and C++

  •    Java

Agrona provides a library of data structures and utility methods that are commonly needed when building high-performance applications in Java and C++. It supports Buffers, Maps, Sets, Caches, Queues and a lot more.

MapDB - Embedded Database Engine

  •    Java

MapDB is an embedded database engine. It provides Maps and other collections backed by disk or memory storage. It offers excellent performance comparable to java collections, but is not limited by GC overhead. It is also a full database engine with storage backends, transactions, cache algorithms, expiration and many other options. MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap memory.

t-digest - A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means

  •    Java

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means. The t-digest algorithm is also very parallel-friendly, making it useful in map-reduce and parallel streaming applications. The t-digest construction algorithm uses a variant of 1-dimensional k-means clustering to produce a data structure that is related to the Q-digest. This t-digest data structure can be used to estimate quantiles or compute other rank statistics. The advantage of the t-digest over the Q-digest is that the t-digest can handle floating point values while the Q-digest is limited to integers. With small changes, the t-digest can handle any values from any ordered set that has something akin to a mean. Quantile estimates produced by t-digests can be orders of magnitude more accurate than those produced by Q-digests, even though t-digests are more compact when stored on disk.

AwesomeCache - Delightful on-disk cache (written in Swift)

  •    Swift

Delightful on-disk cache (written in Swift). Backed by NSCache for maximum performance and support for expiry of single objects. AwesomeCache >= 3.0 is designed to have a sync API, making it easy to reason about the actual contents of the cache. This decision has been made based on feedback from the community, to keep the API of AwesomeCache small and easy to use.

PINCache - Fast, non-deadlocking parallel object cache for iOS, tvOS and OS X

  •    Objective-C

PINCache is a fork of TMCache re-architected to fix issues with deadlocking caused by heavy use. It is a key/value store designed for persisting temporary objects that are expensive to reproduce, such as downloaded data or the results of slow processing. It is comprised of two self-similar stores, one in memory (PINMemoryCache) and one on disk (PINDiskCache), all backed by GCD and safe to access from multiple threads simultaneously. On iOS, PINMemoryCache will clear itself when the app receives a memory warning or goes into the background. Objects stored in PINDiskCache remain until you trim the cache yourself, either manually or by setting a byte or age limit. Both PINMemoryCache and PINDiskCache use locks to protect reads and writes. PINCache coordinates them so that objects added to memory are available immediately to other threads while being written to disk safely in the background. Both caches are public properties of PINCache, so it's easy to manipulate one or the other separately if necessary.


python-diskcache - Python disk backed cache (Django-compatible).

  •    Python

DiskCache is an Apache2 licensed disk and file backed cache library, written in pure-Python, and compatible with Django. Note: Micro-benchmarks have their place but are not a substitute for real measurements. DiskCache offers cache benchmarks to defend its performance claims. Micro-optimizations are avoided but your mileage may vary.

Chronicle Map - High performance, off-heap, key-value, in memory, persisted data store

  •    Java

Chronicle Map is a high performance, off-heap, key-value, in memory, persisted data store. It works like a standard Java map, yet it automatically distributes data between processes. These processes can be on the same server or spread across your network. In other words, it's a low latency, huge data key-value store that can store terabytes of data locally to your process.

barefoot - Java library for integrating the map into software and services with state-of-the-art online and offline map matching that can be used stand-alone and in the cloud

  •    Java

An open source Java library for online and offline map matching with OpenStreetMap. Together with its extensive set of geometric and spatial functions, an in-memory map data structure and basic machine learning functions, it is a versatile basis for scalable location-based services and spatio-temporal data analysis on the map. It is designed for use in parallel and distributed systems and, hence, includes a stand-alone map matching server and can be used in distributed systems for map matching services in the cloud. Barefoot consists of a software library and a (Docker-based) map server that provides access to street map data from OpenStreetMap and is flexible to be used in distributed cloud infrastructures as map data server or side-by-side with Barefoot's stand-alone servers for offline (matcher server) and online map matching (tracker server), or other applications built with Barefoot library. Access to map data is provided with a fast and flexible in-memory map data structure. Together with GeographicLib [1] and ESRI's geometry API [2], it provides an extensive set of geographic and geometric operations for spatial data analysis on the map.

JDBM3 - Embedded Key Value Java Database

  •    Java

JDBM provides TreeMap, HashMap and other collections backed by disk storage. Now you can handle billions of items without ever running out of memory. JDBM is probably the fastest and the simplest pure Java database. JDBM is tiny (160KB nodeps jar), but packed with features such as transactions, instance cache and space-efficient serialization. It also has outstanding performance with 1 million inserts per second and 10 million fetches per second (disk based!!). It is tightly optimized and has minimal overhead. It scales well from Android phones to multi-terabyte data sets.

OHC - Java large off heap cache

  •    Java

Off-heap concurrent hash map intended to store GBs of serialized data. It offers an optional per-entry or default TTL/expireAt, entry eviction and expiration without a separate thread, and the ability to maintain huge amounts of cache memory. The chunked implementation is suitable for tiny/small entries with low overhead.

diskv - A disk-backed key-value store.

  •    Go

Diskv (disk-vee) is a simple, persistent key-value store written in the Go language. It starts with an incredibly simple API for storing arbitrary data on a filesystem by key, and builds several layers of performance-enhancing abstraction on top. The end result is a conceptually simple, but highly performant, disk-backed storage system. More complex examples can be found in the "examples" subdirectory.

BleachBit - Clean Your System and Free Disk Space

  •    Python

BleachBit deletes unnecessary files to free valuable disk space, maintain privacy, and remove junk. Rid your system of old clutter including cache, Internet history, temporary files, cookies, and broken shortcuts. Designed for Linux and Windows systems, it wipes clean Google Chrome, Firefox, Adobe Flash, and more.

durable-queue - A disk-backed queue for Clojure

  •    Clojure

This library implements a disk-backed task queue, allowing for queues that can survive processes dying, and whose size is bounded by available disk rather than memory. It is a small, purely-Clojure implementation focused entirely on the in-process use case, meaning that it is both simpler and more easily embedded than network-aware queue implementations such as Kafka and ActiveMQ. Notice that the task has a value describing its progress, and a value describing the task itself. We can get the task descriptor by dereferencing the returned task. Note that since the task is persisted to disk and anything on disk may be corrupted, this involves a checksum which may fail and throw an IOException. Any software which wants to be robust to all failure modes should always dereference within a try/catch clause.

Feather cache

  •    Java

featherCache is a lightweight Java cache that implements the Map interface and stores objects on disk. Multiple sections and expiration times are supported. No third-party components, simple integration.

TMCache - Fast parallel object cache for iOS and OS X.

  •    Objective-C

⚠️ TMCache is no longer being actively maintained. TMCache is a key/value store designed for persisting temporary objects that are expensive to reproduce, such as downloaded data or the results of slow processing. It is comprised of two self-similar stores, one in memory (TMMemoryCache) and one on disk (TMDiskCache), all backed by GCD and safe to access from multiple threads simultaneously. On iOS, TMMemoryCache will clear itself when the app receives a memory warning or goes into the background. Objects stored in TMDiskCache remain until you trim the cache yourself, either manually or by setting a byte or age limit.

Redisson - Redis based In-Memory Data Grid for Java

  •    Java

Redisson - distributed Java objects and services (Set, Multimap, SortedSet, Map, List, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Executor service, Tomcat Session Manager, Scheduler service, JCache API) on top of Redis server. Rich Redis client.

SDURLCache - URLCache subclass with on-disk cache support on iPhone/iPad

  •    Objective-C

On iPhone OS, Apple removed on-disk cache support for unknown reasons. Some will say it's to save flash-drive life, others will argue it's to save disk capacity. As explained in the NSURLCacheStoragePolicy documentation, the NSURLCacheStorageAllowed constant is always treated as NSURLCacheStorageAllowedInMemoryOnly and there is no way to force it back; the code is certainly gone on this platform. Whatever the reason Apple removed this feature, you may be interested in having on-disk HTTP request caching in your application. SDURLCache gives this feature back to iPhone OS for you. To use it, you just have to create an instance and replace the default shared NSURLCache with it, and that's it: you instantly give on-disk HTTP request caching capability to your application.

LucidDB - RDBMS built entirely for Data Warehousing and Business Intelligence

  •    Java

LucidDB is the RDBMS built entirely for data warehousing and business intelligence. It is based on architectural cornerstones such as column-store, bitmap indexing, hash join/aggregation, and page-level multi versioning. Every component of LucidDB was designed with the requirements of flexible, high-performance data integration and sophisticated query processing in mind.

MegaMap

  •    Java

A Map (or HashTable) for Java that is backed by the filesystem to provide unbounded size. Additionally, MegaMaps may remain on disk between VM invocations, providing persistent hashtables.

hat-trie - C++ implementation of a fast and memory efficient HAT-trie

  •    C++

Trie implementation based on the "HAT-trie: A Cache-conscious Trie-based Data Structure for Strings." (Askitis Nikolas and Sinha Ranjan, 2007) paper. For now, only the pure HAT-trie has been implemented; the hybrid version may arrive later. Details regarding the HAT-trie data structure can be found here. The library provides an efficient and compact way to store a set or a map of strings by compressing the common prefixes. It also allows searching for keys that match a prefix. Note, though, that the default parameters of the structure are geared toward optimizing exact searches. If you do a lot of prefix searches, you may want to reduce the burst threshold through the burst_threshold method.