Druid IO - Real Time Exploratory Analytics on Large Datasets


Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. It excels as a data warehousing solution for fast aggregate queries on petabyte-sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations, and it can load both streaming and batch data.

http://druid.io/
https://github.com/druid-io/druid/





Related Projects

KairosDB - Fast distributed scalable time series database written on top of Cassandra


KairosDB is a fast distributed scalable time series database written on top of Cassandra. Data can be pushed into KairosDB via multiple protocols: Telnet, REST, and Graphite. KairosDB stores time series in Cassandra, the popular and performant NoSQL datastore. It supports aggregators that perform an operation on data points and downsample them, using standard functions such as min, max, sum, count, and mean.
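As a rough illustration of what an aggregator with downsampling does, the sketch below groups data points into fixed-width time buckets and applies one standard function per bucket. It is not KairosDB's API: the `AGGREGATORS` table, the `downsample` function, and the `(timestamp, value)` tuple layout are assumptions made only for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical aggregator table for illustration; not KairosDB's actual API.
AGGREGATORS = {"min": min, "max": max, "sum": sum, "count": len, "mean": mean}


def downsample(points, bucket_seconds, aggregator):
    """Group (timestamp, value) points into fixed-width buckets and aggregate each one."""
    agg = AGGREGATORS[aggregator]
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, agg(values)) for start, values in sorted(buckets.items())]


points = [(0, 1.0), (20, 3.0), (59, 5.0), (60, 10.0), (90, 20.0)]
print(downsample(points, 60, "mean"))  # [(0, 3.0), (60, 15.0)]
```

Five raw points collapse into two one-minute points here; swapping the aggregator name changes the operation without touching the bucketing.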

InfluxDB - Distributed Time Series Database


InfluxDB is an open-source, distributed, time series database with no external dependencies. It's useful for recording metrics, events, and performing analytics. Everything in InfluxDB is a time series that you can perform standard functions on like min, max, sum, count, mean, median, percentiles, and more. Collect your data on any interval and compute rollups on the fly later.

DalmatinerDB - Fast distributed metrics database in Erlang


DalmatinerDB is a metric database written in pure Erlang. It takes advantage of some special properties of metrics to make some tradeoffs. Its goal is to make a store for metric data (the time and value of a metric) that is fast, has a low overhead, and is easy to query and manage. DalmatinerDB allows for metric input in second or even sub-second precision. It will interpolate missing values to the best of its abilities. This is usually acceptable for aggregated data.
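To make the idea of interpolating missing values concrete, here is a minimal sketch that fills gaps in a fixed-step series. Linear interpolation is an assumption chosen for the example; the description above does not say which method DalmatinerDB uses, and `fill_gaps` is a hypothetical helper, not part of the project.

```python
def fill_gaps(samples, step):
    """Return a gap-free series at a fixed step, linearly interpolating missing points.

    samples: list of (timestamp, value) sorted by timestamp, with timestamps
    that are multiples of step. Linear interpolation is an assumption here.
    """
    if not samples:
        return []
    filled = [samples[0]]
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        t = t0 + step
        while t < t1:
            # Weight the two known neighbours by distance in time.
            fraction = (t - t0) / (t1 - t0)
            filled.append((t, v0 + fraction * (v1 - v0)))
            t += step
        filled.append((t1, v1))
    return filled


series = [(0, 10.0), (4, 50.0), (5, 60.0)]
print(fill_gaps(series, 1))
# [(0, 10.0), (1, 20.0), (2, 30.0), (3, 40.0), (4, 50.0), (5, 60.0)]
```

The filled points are estimates, which is why this kind of gap filling tends to be acceptable for aggregated views rather than for exact per-sample values.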

SiriDB - Highly-scalable, robust and super fast time series database


SiriDB is a highly-scalable, robust and super fast time series database. Built from the ground up, SiriDB uses a unique mechanism to operate without indexes and allows server resources to be added on the fly. SiriDB's unique query language includes dynamic grouping of time series for easy and super fast analysis over large amounts of time series.

Pinot - A realtime distributed OLAP datastore


Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.



OpenTSDB - A scalable, distributed Time Series Database.


OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable.

InfiniDB - Scale-up analytics database engine for data warehousing and business intelligence


InfiniDB Community Edition is a scale-up, column-oriented database for data warehousing, analytics, business intelligence and read-intensive applications. InfiniDB's data warehouse columnar engine is multi-terabyte capable and accessed via MySQL.

InfluxDB.Net - .NET client for InfluxDB distributed time series database.


InfluxDB is an open-source distributed time series database with no external dependencies. It is the new home for all of your metrics, events, and analytics.

widebase - Column-oriented database for large time series


Column-oriented database for large time series

Cubism.js - Time Series Visualization


Cubism.js is a D3 plugin for visualizing time series. Use Cubism to construct better realtime dashboards, pulling data from Graphite, Cube and other sources. Cubism fetches time series data incrementally: after the initial display, Cubism reduces server load by polling only the most recent values. Cubism renders incrementally, too, using Canvas to shift charts one pixel to the left.
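The incremental fetching described above can be sketched as follows. This is not Cubism's code (Cubism is a JavaScript library); it is a small Python model of the idea, and `IncrementalSeries`, `refresh`, and the `fetch(start, stop)` callable are hypothetical names introduced for the example.

```python
from collections import deque


class IncrementalSeries:
    """Fixed-width window of values that only requests data it does not already hold."""

    def __init__(self, fetch, width):
        # fetch is a hypothetical callable: fetch(start, stop) -> one value per step.
        self.fetch = fetch
        self.width = width
        self.window = deque(maxlen=width)  # old values fall off the left edge
        self.next_start = None

    def refresh(self, now):
        # The first call loads the whole window; later calls load only new steps.
        oldest_visible = now - self.width
        start = oldest_visible if self.next_start is None else max(self.next_start, oldest_visible)
        if start < now:
            self.window.extend(self.fetch(start, now))
        self.next_start = now
        return list(self.window)


calls = []


def fake_fetch(start, stop):
    """Stand-in data source that records each request and returns one value per step."""
    calls.append((start, stop))
    return list(range(start, stop))


series = IncrementalSeries(fake_fetch, width=5)
first = series.refresh(10)   # requests steps 5..9
second = series.refresh(12)  # requests only steps 10..11
print(first, second, calls)
```

The bounded deque plays the role of the chart shifting left: each new value pushes the oldest one out, so the second refresh costs two values instead of five.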

AnalyticsDb - A MongoDb powered time-series analytics library for PHP.


AnalyticsDb is a component that enables you to store and query different time-series (numerical) data. A simple use case would be tracking the number of visitors to your website within a given date/time range, or tracking e-commerce revenue for a given quarter.

ceres - Distributable time-series database (not actively maintained)


Ceres is not actively maintained. Ceres is a time-series database format intended to replace Whisper as the default storage format for Graphite. In contrast with Whisper, Ceres is not a fixed-size database and is designed to better support sparse data of arbitrary fixed-size resolutions. This allows Graphite to distribute individual time-series across multiple servers or mounts.

Timelion - Time series composer for Elasticsearch and beyond


Timelion, pronounced "Timeline", brings together totally independent data sources into a single interface, driven by a simple, one-line expression language combining data retrieval, time series combination and transformation, plus visualization. Every Timelion expression starts with a data source function.

Graphite - A highly scalable real-time graphing system


Graphite is an enterprise-scale monitoring tool that runs well on cheap hardware. It is a highly scalable real-time graphing system. It stores numeric time-series data and renders graphs of this data on demand.

Grafana - The leading graph and dashboard builder for visualizing time series metrics


Grafana is an open source, feature-rich metrics dashboard and graph editor for Graphite, Elasticsearch, OpenTSDB, Prometheus and InfluxDB. It is most commonly used for visualizing time series data for Internet infrastructure and application analytics, but many use it in other domains, including industrial sensors, home automation, weather, and process control.

Heroic - The Time Series Database


Heroic is a scalable time series database based on Bigtable, Cassandra, and Elasticsearch. It is an open-source monitoring system originally built at Spotify to address the problems it was facing with large-scale gathering and near real-time analysis of metrics.

Prometheus - Service Monitoring System and Time Series Database


Prometheus is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.

Tgres - Time Series in PostgreSQL


Tgres is a tool for receiving and reporting on simple time series written in Go which uses PostgreSQL for storage. Tgres can receive data using Graphite Text, UDP and Pickle protocols, as well as Statsd (counters, gauges and timers). It supports enough of a Graphite HTTP API to be usable with Grafana. Tgres implements the majority of the Graphite functions.

MonetDB


MonetDB is a high-performance column-store database management system for SQL and XQuery, with automatic index management, a flexible optimizer infrastructure, and programmable backend functionality.

Hypertable - A high performance, scalable, distributed storage and processing system for structured


Hypertable is based on Google's Bigtable design, a proven scalable design that powers hundreds of Google services. Many of the current scalable NoSQL database offerings are based on a hash table design, which means that the data they manage is not kept physically ordered. Hypertable keeps data physically sorted by a primary key, which makes it well suited for analytics.