HBase - Hadoop database


HBase is built to host very large tables: billions of rows by millions of columns. It is a scalable, distributed, versioned, column-oriented store modeled after Google's Bigtable, and it runs on top of HDFS (Hadoop Distributed Filesystem). It features compression and in-memory operation on a per-column basis. Data can be replicated between nodes. HBase is used at Facebook and Twitter.
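As a rough illustration of what "versioned, column-oriented" means, here is a minimal in-memory sketch of a Bigtable-style table. It is not the HBase API: the class `ToyVersionedTable`, its methods, and the `max_versions` parameter are hypothetical names chosen for this example, and a real store persists and distributes the data.

```python
from collections import defaultdict


class ToyVersionedTable:
    """Toy model of a Bigtable-style table (hypothetical, not the HBase API)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self._rows = defaultdict(dict)

    def put(self, row, column, value, ts):
        """Store a new version of a cell and drop versions beyond max_versions."""
        versions = self._rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]

    def get(self, row, column, ts=None):
        """Return the newest value at or before ts (the latest if ts is None)."""
        for cell_ts, value in self._rows.get(row, {}).get(column, []):
            if ts is None or cell_ts <= ts:
                return value
        return None

    def scan(self, start_row, stop_row):
        """Yield (row, latest cell values) for start_row <= row < stop_row."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row, {c: v[0][1] for c, v in self._rows[row].items()}
```

Each cell is addressed by row key, column, and timestamp, so a read can ask for the latest value or for the value as of an earlier time.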

http://hbase.apache.org/
https://github.com/apache/hbase

Related Projects

Hypertable - A high performance, scalable, distributed storage and processing system for structured


Hypertable is based on Google's Bigtable design, a proven scalable design that powers hundreds of Google services. Many current scalable NoSQL databases are based on a hash table design, which means that the data they manage is not kept physically ordered. Hypertable keeps data physically sorted by primary key, which makes it well suited for analytics.
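The practical difference between a hash table layout and a key-sorted layout shows up in range queries. The sketch below is a toy in-memory example, not Hypertable code; `SortedKeyStore` and its methods are hypothetical names. Because keys are kept in order, a range scan is two binary searches plus a contiguous slice, whereas a hash table would have to inspect every key.

```python
import bisect


class SortedKeyStore:
    """Toy key-value store that keeps keys sorted (hypothetical example)."""

    def __init__(self):
        self._keys = []    # keys in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        # Insert the key at its sorted position only the first time it is seen.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range_scan(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]
```

With time-prefixed keys such as `2024-01-03/click`, a query for one day or one month becomes a single contiguous scan, which is why sorted storage suits analytical workloads.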

EventQL - The database for large-scale event analytics


EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs fast SQL and MapReduce queries. Its features include automatic partitioning, columnar storage, standard SQL support, scaling to petabytes, support for timeseries and relational data, fast range scans, and a lot more.

ActorDB - Distributed SQL database with linear scalability


ActorDB is ideal as a server-side database for apps. Think of running a large mail service, Dropbox, Evernote, etc. They all require server-side storage for user data, but the vast majority of queries are scoped to a single user. With many users, the server-side database can get very large. With ActorDB you can keep a full relational database for every user and not be forced into painful scaling strategies that require you to throw away everything that makes relational databases good.

Crate - The fast, scalable, easy to use SQL database with native full text search


Crate is an open source, highly scalable, shared-nothing distributed SQL database. Crate offers the scalability and performance of a modern NoSQL database with the power of standard SQL. Crate's distributed SQL query engine lets you use the same syntax that already exists in your applications or integrations and have queries, including any aggregations, executed seamlessly across the Crate cluster.

Pinot - A realtime distributed OLAP datastore


Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.


rqlite - The lightweight, distributed relational database built on SQLite.


rqlite is a distributed relational database that uses SQLite as its storage engine. rqlite uses Raft to achieve consensus across all instances of the SQLite databases, ensuring that every change made to the system is made to a quorum of SQLite databases, or to none at all. It also gracefully handles leader elections and tolerates machine failures, including failure of the leader. rqlite is available for Linux, OS X, and Microsoft Windows. rqlite gives you the functionality of a rock-solid, fault-tolerant, replicated relational database, but with very easy installation, deployment, and operation. With it you have a lightweight and reliable distributed relational data store. Think etcd or Consul, but with relational data modelling also available.
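The "quorum or none at all" rule can be illustrated with a few lines of majority arithmetic. This is a deliberately simplified sketch, not Raft and not rqlite's code: it leaves out leaders, terms, and log repair, and the names `ToyNode`, `quorum_size`, and `commit_change` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Stand-in for one replica; 'log' holds the changes it has accepted."""
    reachable: bool = True
    log: list = field(default_factory=list)


def quorum_size(cluster_size):
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def commit_change(change, nodes):
    """Apply the change to all reachable nodes only if they form a majority.

    Returns True if the change was committed, False if it was rejected
    and no node was modified.
    """
    reachable = [n for n in nodes if n.reachable]
    if len(reachable) < quorum_size(len(nodes)):
        return False
    for node in reachable:
        node.log.append(change)
    return True
```

A three-node cluster needs two acknowledgements, so it keeps accepting writes with one node down and refuses them with two down.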

citus - Scalable PostgreSQL for multi-tenant and real-time workloads


Citus is a distributed database that scales across commodity servers using transparent sharding and replication. Citus extends the underlying database rather than forking it, giving developers and enterprises the power and familiarity of a relational database. As an extension, Citus supports new PostgreSQL releases, and allows you to benefit from new features while maintaining compatibility with existing PostgreSQL tools. Multi-tenant database: Most B2B applications already have the notion of a tenant / customer / account built into their data model. Citus allows you to scale out your transactional relational database to 100K+ tenants with minimal changes to your application.
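To make the multi-tenant idea concrete, here is a minimal sketch of routing rows to shards by tenant. It only illustrates the general technique of hashing a tenant identifier; it is not Citus's implementation, and the function names, the `tenant_id` field, and the use of CRC32 with a modulo are assumptions made for this example.

```python
import zlib


def shard_for_tenant(tenant_id, shard_count):
    """Map a tenant to a shard with a stable hash (illustrative choice: CRC32)."""
    return zlib.crc32(str(tenant_id).encode("utf-8")) % shard_count


def route_rows(rows, shard_count):
    """Group rows by shard so that all rows of one tenant land together."""
    shards = {i: [] for i in range(shard_count)}
    for row in rows:
        shards[shard_for_tenant(row["tenant_id"], shard_count)].append(row)
    return shards
```

Because every row of a tenant maps to the same shard, a query filtered by tenant can be answered by a single node, which is what lets a per-tenant data model scale out with few application changes.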

VoltDB - Fast Scalable SQL DBMS with ACID


VoltDB was specifically designed for contemporary software applications that are pushed beyond their limits by high volume data sources. VoltDB provides the ability to capture, store and process incoming data at millions of read/write operations per second. And VoltDB’s relational model opens that data to be analyzed in real-time, using familiar Business Intelligence tools, to identify data patterns and trends, spot anomalies, or perform tracking and alerting.

HyperGraphDB - Database for Storing Strongly-Typed Hypergraphs


HyperGraphDB is a general purpose, open-source data storage mechanism based on a powerful knowledge management formalism known as directed hypergraphs. While it is a persistent memory model designed mostly for knowledge management, artificial intelligence, and Semantic Web projects, it can also be used as an embedded object-oriented database for Java projects of all sizes. It can also be used as a graph database or as a (non-SQL) relational database.

OpenTSDB - A scalable, distributed Time Series Database.


OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable.

CockroachDB - Cloud-native SQL database.


CockroachDB is a cloud-native SQL database for building global, scalable cloud services that survive disasters. CockroachDB is a distributed SQL database built on a transactional and strongly-consistent key-value store. It scales horizontally; survives disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention; supports strongly-consistent ACID transactions; and provides a familiar SQL API for structuring, manipulating, and querying data.

Cassandra - Scalable Distributed Database


The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. Cassandra is suitable for applications that can't afford to lose data. Data is automatically replicated to multiple nodes for fault-tolerance.

MapD - The MapD Core database


MapD Core is an in-memory, column store, SQL relational database that was designed from the ground up to run on GPUs. MapD Core is the foundational element of a larger data exploration platform that emphasizes speed at scale. By taking advantage of the parallel processing power of the hardware, MapD Core can query billions of rows in milliseconds. Furthermore, by using the graphics pipelines of GPUs, MapD Core can render graphics directly from the server.

Cloudata - Structured Data Storage implementing Google's Bigtable.


Cloudata is a distributed, large-scale structured data store and an open source project implementing Google's Bigtable. It is a DBMS (Database Management System), but not a relational DBMS. It can store more than a petabyte of data.

Infobright - The Database for Analytics


Infobright combines a columnar database with its Knowledge Grid architecture to deliver a self-managing, self-tuning database optimized for analytics. Infobright eliminates the need to create indexes, partition data, or do any manual tuning to achieve fast response for queries and reports.

Coolstorage - ORM library for .NET


The main strength of Vici CoolStorage is its ease of use. Most ORM tools still require a lot of unneeded code to accomplish basic data persistence tasks, but Vici CoolStorage is designed to relieve the programmer of these tedious and error-prone tasks, making it very intuitive to use.

Apache Trafodion - Webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop.


Apache Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop. Trafodion builds on the scalability, elasticity, and flexibility of Hadoop. Trafodion extends Hadoop to provide guaranteed transactional integrity, enabling new kinds of big data applications to run on Hadoop.

BigchainDB - The Scalable Blockchain Database


BigchainDB allows developers and enterprises to deploy blockchain proof-of-concepts, platforms, and applications with a scalable blockchain database, supporting a wide range of industries and use cases. It is a decentralization ecosystem: a decentralized database, at scale. It is designed for a throughput of 1 million writes per second, petabytes of stored data, and sub-second latency.

JanusGraph - Distributed graph database


JanusGraph is a highly scalable graph database optimized for storing and querying large graphs with billions of vertices and edges distributed across a multi-machine cluster. JanusGraph is a transactional database that can support thousands of concurrent users, complex traversals, and analytic graph queries.

FiloDB - Distributed. Columnar. Versioned. Streaming. SQL.


High-performance distributed analytical database + Spark SQL queries + built for streaming. Columnar, versioned layers of data wrapped in a yummy high-performance analytical database engine.