Cassandra - Scalable Distributed Database

The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. Cassandra is suitable for applications that can't afford to lose data. Data is automatically replicated to multiple nodes for fault-tolerance.
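The replication idea can be illustrated with a small sketch of a Dynamo-style hash ring, in which each key is stored on several consecutive nodes. This is an illustration only, not Cassandra's implementation: the `Ring` class, the `token` helper, the MD5 hash, and the node names are all assumptions made for the example, and Cassandra's real partitioners and replication strategies are more elaborate.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (hypothetical helper, MD5 chosen for the sketch)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: one token per node, replicas are the next nodes clockwise."""

    def __init__(self, nodes, replication_factor=3):
        if not nodes:
            raise ValueError("at least one node is required")
        # Never ask for more replicas than there are nodes.
        self.replication_factor = min(replication_factor, len(nodes))
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str):
        """Return the nodes that would hold copies of `key`."""
        # First node whose token is greater than the key's token, wrapping around.
        start = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))
```

With a replication factor of 3, losing any single node still leaves two copies of every key, which is the sense in which replication gives fault tolerance.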

Cassandra supports scale-out, load balancing, cluster growth, a flexible schema, and key-oriented queries, and its design reflects the trade-offs of the CAP theorem (Consistency, Availability, Partition tolerance). It is in use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, and other companies.

http://cassandra.apache.org/





Related Projects

Hypertable - A high performance, scalable, distributed storage and processing system for structured

  •    C++

Hypertable is based on Google's Bigtable design, a proven scalable design that powers hundreds of Google services. Many of the current scalable NoSQL database offerings are based on a hash table design, which means that the data they manage is not kept physically ordered. Hypertable keeps data physically sorted by a primary key, which makes it well suited to analytics.
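To see why physical ordering by primary key matters, here is a minimal in-memory sketch of a sorted table that answers range scans with two binary searches, which a plain hash table cannot do without visiting every key. The `SortedTable` class and its methods are hypothetical names for this example and are not Hypertable's API.

```python
import bisect


class SortedTable:
    """Toy table that keeps its keys in sorted order (illustration only)."""

    def __init__(self):
        self._keys = []    # sorted list of primary keys
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys ordered on insert
        self._values[key] = value

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


table = SortedTable()
for day, hits in [("2010-04-03", 7), ("2010-04-01", 12), ("2010-04-02", 9)]:
    table.put(day, hits)

# Range scan over a contiguous slice of keys.
print(table.scan("2010-04-01", "2010-04-03"))
```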

ScyllaDB - NoSQL Column Store Database compatible with Cassandra

  •    C++

ScyllaDB is a Cassandra-compatible NoSQL column store that can handle 1 million transactions per second per server. It scales up linearly with the number of cores.

Cloudata - Structured Data Storage implementing Google's Bigtable.

  •    Java

Cloudata is a distributed, large-scale structured data store and an open source project implementing Google's Bigtable. It is a DBMS (Database Management System), but not a relational one. It can store more than a petabyte of data.

YugaByte Database - Transactional, high-performance database for building internet-scale, globally-distributed applications

  •    C++

A cloud-native database for building mission-critical applications. This repository contains the Community Edition of the YugaByte Database. YugaByte offers both SQL and NoSQL in a single, unified database. It is meant to be a system-of-record/authoritative database that applications can rely on for correctness and availability. It allows applications to easily scale up and scale down in the cloud, on-premises, or across hybrid environments without creating operational complexity or increasing the risk of outages.

BigchainDB - The Scalable Blockchain Database

  •    Python

BigchainDB allows developers and enterprises to deploy blockchain proof-of-concepts, platforms, and applications with a scalable blockchain database, supporting a wide range of industries and use cases. It is a decentralization ecosystem: a decentralized database, at scale. It can sustain a throughput of 1 million writes per second, store petabytes of data, and respond with sub-second latency.


Kairosdb - Fast distributed scalable time series database written on top of Cassandra

  •    Java

KairosDB is a fast, distributed, scalable time series database written on top of Cassandra. Data can be pushed into KairosDB via multiple protocols: Telnet, REST, and Graphite. KairosDB stores time series in Cassandra, the popular and performant NoSQL datastore. It supports aggregators, which perform an operation on data points and downsample them, including standard functions such as min, max, sum, count, and mean.
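What an aggregator with downsampling does can be sketched in a few lines: data points are grouped into fixed-width time buckets and one function is applied per bucket. This is a generic illustration and not KairosDB's query API; the `downsample` function, the `AGGREGATORS` table, and the sample points are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Aggregator names taken from the description above; the mapping itself is hypothetical.
AGGREGATORS = {"min": min, "max": max, "sum": sum, "count": len, "mean": mean}


def downsample(points, bucket_seconds, aggregator):
    """Reduce (timestamp, value) points to one value per time bucket."""
    fn = AGGREGATORS[aggregator]
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, fn(values)) for start, values in sorted(buckets.items())]


points = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]
print(downsample(points, 60, "mean"))  # one averaged point per 60-second bucket
```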

HBase - Hadoop database

  •    Java

HBase is designed to handle Bigtable-scale tables: billions of rows by millions of columns. It is a scalable, distributed, versioned, column-oriented store modeled after Google's Bigtable, and it runs on top of HDFS (Hadoop Distributed Filesystem). It features compression and in-memory operation on a per-column basis. Data can be replicated between nodes. HBase is used at Facebook and Twitter.

RethinkDB - Distributed JSON database

  •    C++

RethinkDB is built to store JSON documents and to scale to multiple machines with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by, and it is easy to set up and learn. It supports a JSON data model, distributed joins, subqueries, aggregation, atomic updates, and Hadoop-style map/reduce.

Aerospike Database Server – Flash-optimized, in-memory, nosql database

  •    C

Aerospike is a distributed, scalable NoSQL database. It provides a high-performance, scalable platform that meets the needs of today's web-scale applications, with the operational efficiency, robustness, and reliability expected from traditional databases.

Pinot - A realtime distributed OLAP datastore

  •    Java

Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.

BangDB - NoSQL for Real Time Performance

  •    C++

BangDB is a pure vanilla key-value NoSQL data store. The goal of BangDB is to be a fast, reliable, robust, scalable, and easy-to-use data store for the various data management services required by applications. BangDB comes in flavors such as embedded in-memory, network, and distributed data grid/elastic cache. BangDB is highly concurrent and runs operations in parallel as much as possible.

FlockDB - A distributed, fault-tolerant graph database from Twitter

  •    Scala

FlockDB is much simpler than other graph databases such as Neo4j because it tries to solve fewer problems. It scales horizontally and is designed for online, low-latency, high-throughput environments such as websites. Twitter uses FlockDB to store social graphs (who follows whom, who blocks whom) and secondary indices. As of April 2010, the Twitter FlockDB cluster stores 13+ billion edges and sustains peak traffic of 20k writes/second and 100k reads/second.

CockroachDB - Cloud-native SQL database.

  •    Go

CockroachDB is a cloud-native SQL database for building global, scalable cloud services that survive disasters. CockroachDB is a distributed SQL database built on a transactional and strongly-consistent key-value store. It scales horizontally; survives disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention; supports strongly-consistent ACID transactions; and provides a familiar SQL API for structuring, manipulating, and querying data.

radon - RadonDB is an open source, cloud-native MySQL database for building global, scalable cloud services

  •    Go

RadonDB is an open source, cloud-native MySQL database for unlimited scalability and performance. It is based on MySQL and architected as a fully distributed cluster that enables unlimited scalability (scale-out), capacity, and performance. It supports distributed transactions that ensure high data consistency, and it uses MySQL as the storage engine for trusted data reliability. RadonDB is compatible with the MySQL protocol and supports automatic table sharding, as well as a batch of automation features that simplify maintenance and operation workflows.

Elassandra - Elasticsearch + Apache Cassandra

  •    Java

Elassandra is a fork of Elasticsearch modified to run as a plugin for Apache Cassandra in a scalable and resilient peer-to-peer architecture. Elasticsearch code is embedded in Cassandra nodes, providing advanced search features on Cassandra tables, and Cassandra serves as the Elasticsearch data and configuration store. It supports Cassandra vnodes and scales horizontally by adding more nodes.

TiDB - Distributed NewSQL database compatible with MySQL protocol

  •    Go

TiDB is a distributed SQL database. Inspired by the design of Google F1 and Google Spanner, TiDB supports the best features of both traditional RDBMS and NoSQL. It is horizontally scalable, so TiDB can grow as your business grows: you increase capacity simply by adding more machines.

EventQL - The database for large-scale event analytics

  •    C++

EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs super-fast SQL and MapReduce queries. Its features include automatic partitioning, columnar storage, standard SQL support, scaling to petabytes, time series and relational data, fast range scans, and a lot more.

tidis - Distributed transactional NoSQL database, Redis protocol compatible using tikv as backend

  •    Go

Tidis is a distributed NoSQL database, written in Go, that provides a Redis protocol API (string, list, hash, set, sorted set). Like the TiDB layer, Tidis handles protocol translation and data structure computation, and it is powered by the TiKV distributed storage backend, which uses Raft for data replication and 2PC for distributed transactions.

Bagri - XML/Document DB on top of distributed cache

  •    Java

Bagri is a document database built on top of a distributed cache solution such as Hazelcast or Coherence. The system can process semi-structured, schema-less documents and perform distributed queries on them in real time. It scales horizontally very well through data sharding, in which all documents are distributed evenly between distributed cache partitions.

Usergrid - The BaaS Framework you run

  •    Java

Usergrid is an open-source Backend-as-a-Service (“BaaS” or “mBaaS”) composed of an integrated distributed NoSQL database, application layer and client tier with SDKs for developers looking to rapidly build web and/or mobile applications. It provides elementary services (user registration & management, data storage, file storage, queues) and retrieval features (full text search, geolocation search, joins) to power common app features.