Displaying 1 to 20 from 20 results

Cassandra - Scalable Distributed Database

  •    Java

The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. Cassandra is suitable for applications that can't afford to lose data. Data is automatically replicated to multiple nodes for fault-tolerance.

Hypertable - A high performance, scalable, distributed storage and processing system for structured

  •    C++

Hypertable is based on Google's Bigtable Design, which is a proven scalable design that powers hundreds of Google services. Many of the current scalable NoSQL database offerings are based on a hash table design which means that the data they manage is not kept physically ordered. Hypertable keeps data physically sorted by a primary key and it is well suited for Analytics.

MapD - The MapD Core database

  •    C++

MapD Core is an in-memory, column store, SQL relational database that was designed from the ground up to run on GPUs. MapD Core is the foundational element of a larger data exploration platform that emphasizes speed at scale. By taking advantage of the parallel processing power of the hardware, MapD Core can query billions of rows in milliseconds. Furthermore, by using the graphics pipelines of GPUs, MapD Core can render graphics directly from the server.

bcolz - A columnar data container that can be compressed.

  •    C

bcolz provides columnar, chunked data containers that can be compressed either in-memory and on-disk. Column storage allows for efficiently querying tables, as well as for cheap column addition and removal. It is based on NumPy, and uses it as the standard data container to communicate with bcolz objects, but it also comes with support for import/export facilities to/from HDF5/PyTables tables and pandas dataframes. bcolz objects are compressed by default not only for reducing memory/disk storage, but also to improve I/O speed. The compression process is carried out internally by Blosc, a high-performance, multithreaded meta-compressor that is optimized for binary data (although it works with text data just fine too).




FiloDB - Distributed. Columnar. Versioned. Streaming. SQL.

  •    Scala

High-performance distributed analytical database + Spark SQL queries + built for streaming. Columnar, versioned layers of data wrapped in a yummy high-performance analytical database engine.

Druid IO - Real Time Exploratory Analytics on Large Datasets

  •    Java

Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations. Druid can load both streaming and batch data.

InfiniDB - Scale-up analytics database engine for data warehousing and business intelligence

  •    C++

InfiniDB Community Edition is a scale-up, column-oriented database for data warehousing, analytics, business intelligence and read-intensive applications. InfiniDB's data warehouse columnar engine is multi-terabyte capable and accessed via MySQL.

LucidDB - RDBMS built entirely for Data Warehousing and Business Intelligence

  •    Java

LucidDB is the RDBMS built entirely for data warehousing and business intelligence. It is based on architectural cornerstones such as column-store, bitmap indexing, hash join/aggregation, and page-level multi versioning. Every component of LucidDB was designed with the requirements of flexible, high-performance data integration and sophisticated query processing in mind.


ScyllaDB - NoSQL Column Store Database compatible with Cassandra

  •    C++

Scylladb is a Cassandra compatible NoSQL column store which can do 1MM transactions/sec per server. It scales up linearly with number of cores.

Pinot - A realtime distributed OLAP datastore

  •    Java

Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.

EventQL - The database for large-scale event analytics

  •    C++

EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs super-fast SQL and MapReduce queries. Its features include Automatic partitioning, Columnar storage, Standard SQL support, Scales to petabytes, Timeseries and relational data, Fast range scans and lot more.

Metakit - embedded database library with small footprint

  •    C++

Metakit is an embedded database library with a small footprint. It fills the gap between flat-file, relational, object-oriented, and tree-structured databases, supporting relational joins, serialization, nested structures, and instant schema evolution

MonetDB

  •    C

MonetDB is a high-performance SQL- and XQuery- column-store database management system with automatic index management, flexible optimizer infrastructure, and programmable backend functionality.

Cloudata - Structured Data Storage implementing Google's Bigtable.

  •    Java

Cloudata is Distributed Large scale Structured Data Storage, and open source project implementing Google's Bigtable. It's DBMS(Database Management System), but not Relational DBMS. It can store more than Peta bytes.

Skytable - Your next NoSQL database

  •    Rust

Skytable is an effort to provide the best of key/value stores, document stores and columnar databases, that is, simplicity, flexibility and queryability at scale. The name 'Skytable' exemplifies our vision to create a database that has limitless possibilities. It is natively multithreaded and scales to millions of queries per second per node with no optimizations left off the table. The database server doesn't need more than 1MB to run.

kdi - Distributed structured data interface inspired by Google's BigTable

  •    C++

Distributed structured data interface inspired by Google's BigTable

MonetDBLite - MonetDB reconfigured as a library

  •    

MonetDBLite is an embedded analytical SQL database that runs a variety of environments and does not require the installation of any external software. MonetDBLite is derived from free and open-source MonetDB, a product of the Centrum Wiskunde & Informatica.

MonetDBLite-R - MonetDB reconfigured as an R package - See below for an introduction. Edit

  •    C

MonetDBLite for R is a SQL database that runs inside the R environment for statistical computing and does not require the installation of any external software. MonetDBLite is based on free and open-source MonetDB, a product of the Centrum Wiskunde & Informatica. MonetDBLite is similar in functionality to RSQLite, but typically completes queries blazingly fast due to its columnar storage architecture and bulk query processing model. Since both of these embedded SQL options rely on the the R DBI interface, the conversion of legacy RSQLite project syntax over to MonetDBLite code should be a cinch.

rainbow - A data layout optimization framework for wide tables stored on HDFS

  •    Java

Rainbow is a tool that helps improve the I/O performance of wide tables stored in columnar formats on HDFS. More information in our project main page.

talaria - TalariaDB is a distributed, highly available, and low latency time-series database for Presto

  •    Go

This repository contains a fork of TalariaDB, a distributed, highly available, and low latency time-series database for Big Data systems. It was originally designed and implemented in Grab, where millions and millions of transactions and connections take place every day , which requires a platform scalable data-driven decision making. TalariaDB helped us to overcome the challenge of retrieving and acting upon the information from large amounts of data. It addressed our need to query at least 2-3 terabytes of data per hour with predictable low query latency and low cost. Most importantly, it plays very nicely with the different tools’ ecosystems and lets us query data using SQL.






We have large collection of open source products. Follow the tags from Tag Cloud >>


Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.