Disk-backed-map - A simple Java disk-backed map


A small library that provides a disk-backed map implementation for storing a large number of key/value pairs. The standard map implementations (HashMap, Hashtable) max out at around 3-4 million keys per GB of memory for very simple key/value pairs, and in most cases the limit is much lower. Disk-backed-map, on the other hand, can store between 16 million (64-bit JVM) and 20 million (32-bit JVM) keys per GB, regardless of the size of the key/value pairs.

https://github.com/aloksingh/disk-backed-map
http://code.google.com/p/disk-backed-map
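A minimal sketch of one common way a disk-backed map keeps its heap footprint small: values are appended to a data file, and only a key-to-offset index stays in memory. This design is an assumption for illustration. The class `SimpleDiskMap` and its methods are hypothetical and are not the disk-backed-map library's API. Unlike the library, whose on-disk layout is not described here, this sketch still holds every key on the heap.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical append-only disk-backed map (not the disk-backed-map API).
 * Values live in a data file; only key -> file offset is kept on the heap.
 */
public class SimpleDiskMap implements AutoCloseable {
    private final RandomAccessFile file;
    // In-memory index: key -> offset of the latest value record in the file.
    private final Map<String, Long> index = new HashMap<>();

    public SimpleDiskMap(Path dataFile) throws IOException {
        this.file = new RandomAccessFile(dataFile.toFile(), "rw");
    }

    /** Appends a length-prefixed value to the end of the file and records its offset. */
    public void put(String key, String value) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        long offset = file.length();
        file.seek(offset);
        file.writeInt(bytes.length);
        file.write(bytes);
        index.put(key, offset); // any older record for this key becomes garbage
    }

    /** Seeks to the recorded offset and reads the value back, or returns null if absent. */
    public String get(String key) throws IOException {
        Long offset = index.get(key);
        if (offset == null) return null;
        file.seek(offset);
        byte[] bytes = new byte[file.readInt()];
        file.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public int size() {
        return index.size();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("diskmap", ".dat");
        try (SimpleDiskMap map = new SimpleDiskMap(tmp)) {
            map.put("alpha", "first");
            map.put("beta", "second");
            map.put("alpha", "updated");
            System.out.println(map.get("alpha")); // prints updated
            System.out.println(map.size());       // prints 2
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because overwrites only append, stale records accumulate in the file. A production implementation would also need compaction and a way to rebuild the index on restart.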

Related Projects

Agrona - Library to build high-performance applications in Java and C++


Agrona provides a library of data structures and utility methods that are commonly needed when building high-performance applications in Java and C++. It supports buffers, maps, sets, caches, queues and much more.

MapDB - Embedded Database Engine


MapDB is an embedded database engine. It provides Maps and other collections backed by disk or memory storage. It offers excellent performance comparable to Java collections, but is not limited by GC overhead. It is also a full database engine with storage backends, transactions, cache algorithms, expiration and many other options. MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap memory.

t-digest - A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means


A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means. The t-digest algorithm is also very parallel friendly making it useful in map-reduce and parallel streaming applications. The t-digest construction algorithm uses a variant of 1-dimensional k-means clustering to produce a data structure that is related to the Q-digest. This t-digest data structure can be used to estimate quantiles or compute other rank statistics. The advantage of the t-digest over the Q-digest is that the t-digest can handle floating point values while the Q-digest is limited to integers. With small changes, the t-digest can handle any values from any ordered set that has something akin to a mean. The accuracy of quantile estimates produced by t-digests can be orders of magnitude more accurate than those produced by Q-digests in spite of the fact that t-digests are more compact when stored on disk.

PINCache - Fast, non-deadlocking parallel object cache for iOS, tvOS and OS X


PINCache is a fork of TMCache re-architected to fix deadlocking issues caused by heavy use. It is a key/value store designed for persisting temporary objects that are expensive to reproduce, such as downloaded data or the results of slow processing. It consists of two self-similar stores, one in memory (PINMemoryCache) and one on disk (PINDiskCache), both backed by GCD and safe to access from multiple threads simultaneously. On iOS, PINMemoryCache will clear itself when the app receives a memory warning or goes into the background. Objects stored in PINDiskCache remain until you trim the cache yourself, either manually or by setting a byte or age limit. Both PINMemoryCache and PINDiskCache use locks to protect reads and writes. PINCache coordinates them so that objects added to memory are available immediately to other threads while being written to disk safely in the background. Both caches are public properties of PINCache, so it's easy to manipulate one or the other separately if necessary.

python-diskcache - Python disk backed cache (Django-compatible).


DiskCache is an Apache2 licensed disk and file backed cache library, written in pure-Python, and compatible with Django. Note: Micro-benchmarks have their place but are not a substitute for real measurements. DiskCache offers cache benchmarks to defend its performance claims. Micro-optimizations are avoided but your mileage may vary.


Chronicle Map - High performance, off-heap, key-value, in memory, persisted data store


Chronicle Map is a high-performance, off-heap, key-value, in-memory, persisted data store. It works like a standard Java map, yet it automatically distributes data between processes, which can be either on the same server or across your network. In other words, it's a low-latency, huge-data key-value store that can store terabytes of data locally to your process.

JDBM3 - Embedded Key Value Java Database


JDBM provides TreeMap, HashMap and other collections backed by disk storage. Now you can handle billions of items without ever running out of memory. JDBM is probably the fastest and simplest pure Java database. JDBM is tiny (a 160KB jar with no dependencies), but packed with features such as transactions, an instance cache and space-efficient serialization. It also has outstanding performance, with 1 million inserts per second and 10 million fetches per second (disk based!). It is tightly optimized and has minimal overhead. It scales well from Android phones to multi-terabyte data sets.

diskv - A disk-backed key-value store.


Diskv (disk-vee) is a simple, persistent key-value store written in the Go language. It starts with an incredibly simple API for storing arbitrary data on a filesystem by key, and builds several layers of performance-enhancing abstraction on top. The end result is a conceptually simple, but highly performant, disk-backed storage system. More complex examples can be found in the "examples" subdirectory.

BleachBit - Clean Your System and Free Disk Space


BleachBit deletes unnecessary files to free valuable disk space, maintain privacy, and remove junk. Rid your system of old clutter including cache, Internet history, temporary files, cookies, and broken shortcuts. Designed for Linux and Windows systems, it wipes clean Google Chrome, Firefox, Adobe Flash, and more.

durable-queue - a disk-backed queue for clojure


This library implements a disk-backed task queue, allowing for queues that can survive processes dying, and whose size is bounded by available disk rather than memory. It is a small, purely-Clojure implementation focused entirely on the in-process use case, meaning that it is both simpler and more easily embedded than network-aware queue implementations such as Kafka and ActiveMQ. Notice that the task has a value describing its progress, and a value describing the task itself. We can get the task descriptor by dereferencing the returned task. Note that since the task is persisted to disk and anything on disk may be corrupted, this involves a checksum which may fail and throw an IOException. Any software which wants to be robust to all failure modes should always dereference within a try/catch clause.
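The checksum behaviour described above can be illustrated with a short Java sketch. Each record is stored with a CRC32 of its payload, and decoding recomputes the checksum and throws an `IOException` on mismatch, so the caller must handle corruption in a try/catch. The record layout and the `ChecksummedRecord` class are hypothetical. durable-queue itself is written in Clojure, and its on-disk format is not described here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch of a checksummed task record (not durable-queue's format). */
public class ChecksummedRecord {
    private static final int HEADER_BYTES = Integer.BYTES + Long.BYTES; // length + CRC32

    /** Encodes a payload as [length:int][crc32:long][payload bytes]. */
    public static byte[] encode(String payload) {
        byte[] data = payload.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(data);
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + data.length);
        buf.putInt(data.length).putLong(crc.getValue()).put(data);
        return buf.array();
    }

    /** Decodes a record, throwing IOException if it is truncated or its checksum does not match. */
    public static String decode(byte[] record) throws IOException {
        if (record.length < HEADER_BYTES) {
            throw new IOException("truncated record");
        }
        ByteBuffer buf = ByteBuffer.wrap(record);
        int length = buf.getInt();
        long expected = buf.getLong();
        if (length < 0 || length > buf.remaining()) {
            throw new IOException("corrupted length field");
        }
        byte[] data = new byte[length];
        buf.get(data);
        CRC32 crc = new CRC32();
        crc.update(data);
        if (crc.getValue() != expected) {
            throw new IOException("checksum mismatch");
        }
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] record = encode("send-email:42");
        record[record.length - 1] ^= 0x01; // simulate a single flipped bit on disk
        try {
            System.out.println(decode(record));
        } catch (IOException e) {
            // Callers that must survive every failure mode handle corruption explicitly.
            System.out.println("skipping corrupt task: " + e.getMessage());
        }
    }
}
```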

Feather cache


featherCache is a lightweight Java cache that implements the Map interface and stores objects on disk. Multiple sections and expiration times are supported. No third-party components; simple integration.
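Per-section expiration of the kind featherCache advertises can be sketched in Java as follows. This is a hypothetical illustration, not featherCache's API. The `ExpiringSections` class, its methods and the injectable clock are assumptions, and entries are kept in memory rather than on disk to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of per-section expiration (not featherCache's API).
 * Entries are held in memory here; a disk-backed cache would persist them.
 */
public class ExpiringSections {
    /** A stored value together with the time (ms) at which it expires. */
    private static final class Entry {
        final Object value;
        final long expiresAtMillis;

        Entry(Object value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Map<String, Entry>> sections = new HashMap<>();
    private final Map<String, Long> ttlMillis = new HashMap<>();
    private final LongSupplier clock; // time source in ms, injectable for testing

    public ExpiringSections(LongSupplier clock) {
        this.clock = clock;
    }

    /** Declares a section whose entries expire ttl milliseconds after being written. */
    public void defineSection(String name, long ttl) {
        ttlMillis.put(name, ttl);
        sections.put(name, new HashMap<>());
    }

    public void put(String section, String key, Object value) {
        Map<String, Entry> entries = entries(section); // fails fast on unknown sections
        long expiresAt = clock.getAsLong() + ttlMillis.get(section);
        entries.put(key, new Entry(value, expiresAt));
    }

    /** Returns the value, or null if it is absent or expired (expired entries are evicted). */
    public Object get(String section, String key) {
        Map<String, Entry> entries = entries(section);
        Entry e = entries.get(key);
        if (e == null) return null;
        if (clock.getAsLong() >= e.expiresAtMillis) {
            entries.remove(key);
            return null;
        }
        return e.value;
    }

    private Map<String, Entry> entries(String section) {
        Map<String, Entry> entries = sections.get(section);
        if (entries == null) throw new IllegalArgumentException("unknown section: " + section);
        return entries;
    }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock for a deterministic demo
        ExpiringSections cache = new ExpiringSections(() -> now[0]);
        cache.defineSection("sessions", 1_000); // entries live for 1 second
        cache.put("sessions", "user-1", "token-abc");
        now[0] = 500;
        System.out.println(cache.get("sessions", "user-1")); // prints token-abc
        now[0] = 1_000;
        System.out.println(cache.get("sessions", "user-1")); // prints null
    }
}
```

Expired entries are evicted lazily on read. A real cache would also sweep them periodically so that unread entries do not accumulate.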

Redisson - Redis based In-Memory Data Grid for Java


Redisson - distributed Java objects and services (Set, Multimap, SortedSet, Map, List, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Executor service, Tomcat Session Manager, Scheduler service, JCache API) on top of Redis server. Rich Redis client.

SDURLCache - URLCache subclass with on-disk cache support on iPhone/iPad


On iPhone OS, Apple removed on-disk cache support for unknown reasons. Some say it was to save flash-drive life; others argue it was to save disk capacity. As explained in the NSURLCacheStoragePolicy documentation, the NSURLCacheStorageAllowed constant is always treated as NSURLCacheStorageAllowedInMemoryOnly and there is no way to force it back; the code is certainly gone on this platform. Whatever the reason Apple removed this feature, you may be interested in having on-disk HTTP request caching in your application. SDURLCache brings this feature back to iPhone OS. To use it, you just have to create an instance and replace the default shared NSURLCache with it. That's it: your application instantly gains on-disk HTTP request caching.

LucidDB - RDBMS built entirely for Data Warehousing and Business Intelligence


LucidDB is the RDBMS built entirely for data warehousing and business intelligence. It is based on architectural cornerstones such as column-store, bitmap indexing, hash join/aggregation, and page-level multi versioning. Every component of LucidDB was designed with the requirements of flexible, high-performance data integration and sophisticated query processing in mind.

MegaMap


A Map (or HashTable) for Java that is backed by the filesystem to provide unbounded size. Additionally, MegaMaps may remain on disk between VM invocations, providing persistent hashtables.

Sophia - Advanced transactional MVCC key-value/row storage library


Sophia is a RAM-disk hybrid storage engine. It is designed to provide the best possible on-disk performance without degradation over time. It has guaranteed O(1) worst-case complexity for read, write and range scan operations. Features include full ACID compliance, an MVCC engine, optimistic non-blocking concurrency with N writers and M readers, prefix search, automatic key expiration and more. It is implemented as a small C library with zero dependencies.

fatcache - Memcache on SSD


fatcache is memcache on SSD. Think of fatcache as a cache for your big data. There are two ways to think of SSDs in system design. One is to think of SSD as an extension of disk, where it plays the role of making disks fast; the other is to think of it as an extension of memory, where it plays the role of making memory fat. The latter makes sense when persistence (non-volatility) is unnecessary and data is accessed over the network. Even though memory is a thousand times faster than SSD, network-connected SSD-backed memory makes sense if we design the system so that network latencies dominate SSD latencies by a large factor.

Spark - Fast Cluster Computing


Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.

mapstructure - Go library for decoding generic map values into native Go structures.


mapstructure is a Go library for decoding generic map values to structures and vice versa, while providing helpful error handling. This library is most useful when decoding values from some data stream (JSON, Gob, etc.) where you don't quite know the structure of the underlying data until you read a part of it. You can therefore read a map[string]interface{} and use this library to decode it into the proper underlying native Go structure.