Hypertable - A high performance, scalable, distributed storage and processing system for structured

  •        0

Hypertable is based on Google's Bigtable Design, which is a proven scalable design that powers hundreds of Google services. Many of the current scalable NoSQL database offerings are based on a hash table design which means that the data they manage is not kept physically ordered. Hypertable keeps data physically sorted by a primary key and it is well suited for Analytics.

This project is for the design and implementation of a high performance, scalable, distributed storage and processing system for structured and unstructured data. It is designed to manage the storage and processing of information on a large cluster of commodity servers, providing resilience to machine and component failures. Data is represented in the system as a multi-dimensional table of information. The data in a table can be transformed and organized at high speed by performing computations in parallel, pushing them to where the data is physically stored.




comments powered by Disqus

Related Projects

Cassandra - Scalable Distributed Database

The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. Cassandra is suitable for applications that can't afford to lose data. Data is automatically replicated to multiple nodes for fault-tolerance.

HBase - Hadoop database

HBase provides support to handle BigTable - billions of rows X millions of columns. It is a scalable, distributed, versioned, column-oriented store modeled after Google's Bigtable and runs on top of HDFS (Hadoop Distributed Filesystem). It features compression, in-memory operation per-column. Data could be replicated between the nodes. HBase is used in Facebook and Twitter.

Project-voldemort - A distributed database, Clone of Amazon's Dynamo

Voldemort is a distributed key-value storage system. Data is automatically replicated over multiple servers. Data is automatically partitioned so each server contains only a subset of the total data. Server failure is handled transparently. It is used at LinkedIn for certain high-scalability storage problems where simple functional partitioning is not sufficient.

HyperGraphDB - Database for Storing Strongly-Typed Hypergraphs

HyperGraphDB is a general purpose, open-source data storage mechanism based on a powerful knowledge management formalism known as directed hypergraphs. While a persistent memory model designed mostly for Knowledge management, Artificial Intelligence and Semantic web projects, it can also be used as an embedded object-oriented database for Java projects of all sizes. It could also be used as graph database or as (non-SQL) relational database.


Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.


Ehcache is an open source, standards-based cache used to boost performance, offload the database and simplify scalability. Ehcache is robust, proven and full-featured and this has made it the most widely-used Java-based cache.

Infinispan - Key value NOSQL data store and data grid

Infinispan is an extremely scalable, highly available key/value NoSQL datastore and distributed data grid platform. The purpose of Infinispan is to expose a data structure that is highly concurrent, designed ground-up to make the most of modern multi-processor/multi-core architectures while at the same time providing distributed cache capabilities. Infinispan offers enterprise features such as efficient eviction algorithms to control memory usage as well as JTA compatibility.

Memcached - distributed object caching system

Memcached is high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load. Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

Terrastore - Scalable, elastic, consistent document store.

Terrastore is a modern document store which provides advanced scalability and elasticity features without sacrificing consistency. Terrastore is based on Terracotta, so it relies on an industry-proven, fast (and cool) clustering technology. Terrastore is accessed through the universally supported HTTP protocol. Terrastore is a distributed document store supporting single-cluster and multi-cluster deployments.


Sphinix is free open-source SQL full-text search engine. How do you implement full-text search for that 10+ million row table, keep up with the load, and stay relevant? Sphinx is good at those kinds of riddles.

Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.

Tag Cloud >>