Cloudata - Structured Data Storage implementing Google's Bigtable.

  •        0

Cloudata is Distributed Large scale Structured Data Storage, and open source project implementing Google's Bigtable. It's DBMS(Database Management System), but not Relational DBMS. It can store more than Peta bytes.



Related Projects

Hypertable - A high performance, scalable, distributed storage and processing system for structured

Hypertable is based on Google's Bigtable Design, which is a proven scalable design that powers hundreds of Google services. Many of the current scalable NoSQL database offerings are based on a hash table design which means that the data they manage is not kept physically ordered. Hypertable keeps data physically sorted by a primary key and it is well suited for Analytics.

HBase - Hadoop database

HBase provides support to handle BigTable - billions of rows X millions of columns. It is a scalable, distributed, versioned, column-oriented store modeled after Google's Bigtable and runs on top of HDFS (Hadoop Distributed Filesystem). It features compression, in-memory operation per-column. Data could be replicated between the nodes. HBase is used in Facebook and Twitter.

Cassandra - Scalable Distributed Database

The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. Cassandra is suitable for applications that can't afford to lose data. Data is automatically replicated to multiple nodes for fault-tolerance.

cloudata - Distributed Structured Data Store, NoSQL, Bigtable

Distributed Structured Data Store, NoSQL, Bigtable

BigchainDB - The Scalable Blockchain Database

BigchainDB allows developers and enterprise to deploy blockchain proof-of-concepts, platforms and applications with a scalable blockchain database, supporting a wide range of industries and use cases. It is a decentralization ecosystem: a decentralized database, at scale. It can perform 1 million writes per second throughput, store petabytes of data, and sub-second latency.

Apache Accumulo - Key Value Store based on Google BigTable

The Apache Accumulo sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system. Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift. Apache Accumulo features a few novel improvements on the BigTable design in the form of cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process.

RethinkDB - Distributed JSON database

RethinkDB is built to store JSON documents, and scale to multiple machines with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by, and is easy to setup and learn. It supports JSON data model, Distributed joins, subqueries, aggregation, atomic updates, Hadoop-style map/reduce.

BangDB - NoSQL for Real Time Performance

Bangdb is pure vanilla key value nosql data store. The goal of bangdb is to be fast, reliable, robust, scalable and easy to use data store for various data management services required by applications. Bangdb comes in flavors like Embedded In memory, Network, Distributed data grid/ Elastic Cache. The bangdb is highly concurrent and runs parallel operations as much as possible.

ScyllaDB - NoSQL Column Store Database compatible with Cassandra

Scylladb is a Cassandra compatible NoSQL column store which can do 1MM transactions/sec per server. It scales up linearly with number of cores.

FlockDB - A distributed, fault-tolerant graph database from Twitter

FlockDB is much simpler than other graph databases such as neo4j because it tries to solve fewer problems. It scales horizontally and is designed for on-line, low-latency, high throughput environments such as web-sites. Twitter uses FlockDB to store social graphs (who follows whom, who blocks whom) and secondary indices. As of April 2010, the Twitter FlockDB cluster stores 13+ billion edges and sustains peak traffic of 20k writes/second and 100k reads/second.

Bagri - XML/Document DB on top of distributed cache

Bagri is a Document Database built on top of distributed cache solution like Hazelcast or Coherence. The system allows to process semi-structured schema-less documents and perform distributed queries on them in real-time. It scales horizontally very well with use of data sharding, when all documents are distributed evenly between distributed cache partitions.

Usergrid - The BaaS Framework you run

Usergrid is an open-source Backend-as-a-Service (“BaaS” or “mBaaS”) composed of an integrated distributed NoSQL database, application layer and client tier with SDKs for developers looking to rapidly build web and/or mobile applications. It provides elementary services (user registration & management, data storage, file storage, queues) and retrieval features (full text search, geolocation search, joins) to power common app features.

Pinot - A realtime distributed OLAP datastore

Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.

Kairosdb - Fast distributed scalable time series database written on top of Cassandra

KairosDB is a fast distributed scalable time series database written on top of Cassandra. Data can be pushed in KairosDB via multiple protocols : Telnet, Rest, Graphite. KairosDB stores time series in Cassandra, the popular and performant NoSQL datastore. It supports aggregators which can perform an operation on data points and down samples. Standard functions like min, max, sum, count, mean etc.

OpenTSDB - A scalable, distributed Time Series Database.

OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable.

havalo - Non Distributed NoSQL Key Value Store

A zero configuration, non-distributed NoSQL key-value store that runs in any Servlet 3.0 compatible container. With Havalo, simply drop havalo.war into your favorite Servlet 3.0 compatible container and with almost no configuration you'll have access to a fast and lightweight K,V store backed by any local mount point for persistent storage.

SenseiDB - Distributed, Realtime, Semi-Structured Database from LinkedIn

Sensei is a distributed data system that was built to support many product initiatives at LinkedIn, including the real-time faceted search in LinkedIn Signal and the news feed and tabs on the Homepage. Sensei is both a search engine and a database. It is designed to query and navigate through documents that consist of unstructured text and well-formed and structured metadata. Sensei is both a search engine and a database.

Crate - The fast, scalable, easy to use SQL database with native full text search

Crate is an open source, highly scalable, shared-nothing distributed SQL database. Crate offers the scalability and performance of a modern No-SQL database with the power of Standard SQL. Crate’s distributed SQL query engine lets you use the same syntax that already exists in your applications or integrations, and have queries seamlessly executed across the crate cluster, including any aggregations, if needed.

VoltDB - Fast Scalable SQL DBMS with ACID

VoltDB was specifically designed for contemporary software applications that are pushed beyond their limits by high volume data sources. VoltDB provides the ability to capture, store and process incoming data at millions of read/write operations per second. And VoltDB’s relational model opens that data to be analyzed in real-time, using familiar Business Intelligence tools, to identify data patterns and trends, spot anomalies, or perform tracking and alerting.

Titan - Scalable Graph Database

Titan is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. Titan is a transactional database that can support thousands of concurrent users executing complex graph traversals. It is a native Blueprints enabled graph database and as such, it supports the full TinkerPop stack of technologies.