Apache Doris - A fast MPP database for all modern analytics on big data

  •        233

Apache Doris is a modern MPP analytical database product. It can provide sub-second queries and efficient real-time data analysis. With it's distributed architecture, up to 10PB level datasets will be well supported and easy to operate. Doris provides batch data loading and real-time mini-batch data loading. It provides high availability, reliability, fault tolerance, and scalability. Its original name was Palo, developed in Baidu.

  • Uses MySQL protocol to communicate. Users can connect to Doris cluster through MySQL client or MySQL JDBC
  • Support standard SQL language, compatible with MySQL protocol
  • Vectorized SQL executor
  • Getting result of a query within one second
  • Rollup, novel pre-computation mechanism
  • Effective data model for aggregation
  • High performance, high availability, high reliability
  • Easy for operation, Elastic data warehouse for big data

https://github.com/apache/incubator-doris
https://doris.apache.org/

Tags
Implementation
License
Platform

   




Related Projects

Pinot - A realtime distributed OLAP datastore

  •    Java

Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.

gpdb - Greenplum Database

  •    C

The Greenplum Database (GPDB) is an advanced, fully featured, open source data warehouse. It provides powerful and rapid analytics on petabyte scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer delivering high analytical query performance on large data volumes. The Greenplum project is released under the Apache 2 license. We want to thank all our current community contributors and are really interested in all new potential contributions. For the Greenplum Database community no contribution is too small, we encourage all types of contributions.

datafuse - An elastic and scalable Cloud Warehouse, offers Blazing Fast Query and combines Elasticity, Simplicity, Low cost of the Cloud, built to make the Data Cloud easy

  •    Rust

Datafuse is an open source elastic and scalable cloud warehouse, it offers blazing fast query and combines elasticity, simplicity, low cost of the cloud, built to make the Data Cloud easy. Datafuse consists of three components: meta service layer, and the decoupled compute and storage layers.

OmniSciDB - Open Source Analytical Database & SQL Engine

  •    C++

OmniSciDB is the foundation of the OmniSci platform. OmniSciDB is SQL-based, relational, columnar and specifically developed to harness the massive parallelism of modern CPU and GPU hardware. OmniSciDB can query up to billions of rows in milliseconds, and is capable of unprecedented ingestion speeds, making it the ideal SQL engine for the era of big, high-velocity data.

VoltDB - Fast Scalable SQL DBMS with ACID

  •    Java

VoltDB was specifically designed for contemporary software applications that are pushed beyond their limits by high volume data sources. VoltDB provides the ability to capture, store and process incoming data at millions of read/write operations per second. And VoltDB’s relational model opens that data to be analyzed in real-time, using familiar Business Intelligence tools, to identify data patterns and trends, spot anomalies, or perform tracking and alerting.


ClickHouse - Columnar DBMS and Real Time Analytics

  •    C++

ClickHouse is an open source column-oriented database management system capable of real time generation of analytical data reports using SQL queries. It is Linearly Scalable, Blazing Fast, Highly Reliable, Fault Tolerant, Data compression, Real time query processing, Web analytics, Vectorized query execution, Local and distributed joins. It can process hundreds of millions to more than a billion rows and tens of gigabytes of data per single server per second.

Apache Tajo - A big data warehouse system on Hadoop

  •    Java

Apache Tajo is a robust big data relational and distributed data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large-data sets stored on HDFS (Hadoop Distributed File System) and other data sources.

InfiniDB - Scale-up analytics database engine for data warehousing and business intelligence

  •    C++

InfiniDB Community Edition is a scale-up, column-oriented database for data warehousing, analytics, business intelligence and read-intensive applications. InfiniDB's data warehouse columnar engine is multi-terabyte capable and accessed via MySQL.

RadonDB - Cloud-native MySQL database for building global, scalable cloud services

  •    Go

RadonDB is a cloud-native database based on MySQL. It’s architected to fully distributed cluster that delivering unlimited scalability (scale-out), capacity and performance. It supports distributed transaction capability for high data consistency, and leverage MySQL as storage engine with trusted data reliability. RadonDB is compatible with MySQL protocol, at mean time supports automatic table sharding, that simplifying the maintenance and operation workflow.

Infobright - The Database for Analytics

  •    C++

Infobright combines a columnar database with our Knowledge Grid architecture to deliver a self-managing, self-tuning database optimized for analytics. Infobright eliminates the need to create indexes, partition data, or do any manual tuning to achieve fast response for queries and reports.

rethinkdb_rebirth - The open-source database for the realtime web.

  •    C++

RethinkDB is an open-source scalable database built for realtime applications. It exposes a new database access model -- instead of polling for changes, the developer can tell the database to continuously push updated query results to applications in realtime. RethinkDB allows developers to build scalable realtime apps in a fraction of the time with less effort. Point your browser to localhost:8080. You’ll see an administrative UI where you can control the cluster (which so far consists of one server), and play with the query language.

Prisma - Turns your database into a realtime GraphQL API

  •    Scala

Prisma is a performant open-source GraphQL ORM-like layer doing the heavy lifting in your GraphQL server. It turns your database into a GraphQL API which can be consumed by your resolvers via GraphQL bindings. Prisma's auto-generated GraphQL API provides powerful abstractions and modular building blocks to develop flexible and scalable GraphQL backends. Instead of writing SQL or using a NoSQL API, you can query your database with GraphQL.

ClickHouse - Column-oriented database management system that allows generating analytical data reports in real time

  •    C++

ClickHouse is a fast open-source OLAP database management system. It is column-oriented and allows to generate analytical reports using SQL queries in real-time. Large queries are parallelized using multiple cores, taking all the necessary resources available on the current server. It supports distributed processing, data can reside on different shards. Each shard can be a group of replicas used for fault tolerance. All shards are used to run a query in parallel, transparently for the user.

AsterixDB - Big Data Management System (BDMS)

  •    Java

AsterixDB is a BDMS (Big Data Management System) with a rich feature set that sets it apart from other Big Data platforms. Its feature set makes it well-suited to modern needs such as web data warehousing and social data storage and analysis. It is a highly scalable data management system that can store, index, and manage semi-structured data, but it also supports a full-power query language with the expressiveness of SQL (and more).

VictoriaMetrics - Fast, cost-effective and scalable monitoring solution and time series database

  •    Go

VictoriaMetrics is a fast, cost-effective and scalable monitoring solution and time series database. VictoriaMetrics can be used as long-term storage for Prometheus or for vmagent. It supports Prometheus querying API, so it can be used as Prometheus drop-in replacement in Grafana. It implements MetricsQL query language backwards compatible with PromQL.

databend - An elastic and reliable Cloud Data Warehouse, offers Blazing Fast Query and combines Elasticity, Simplicity, Low cost of the Cloud, built to make the Data Cloud easy

  •    Rust

Databend aimed to be an open source elastic and reliable cloud warehouse, it offers blazing fast query and combines elasticity, simplicity, low cost of the cloud, built to make the Data Cloud easy. Databend is inspired by ClickHouse and its computing model is based on apache-arrow.

rethinkdb-legacy - The open-source database for the realtime web.

  •    C++

RethinkDB is the first open-source scalable database built for realtime applications. It exposes a new database access model -- instead of polling for changes, the developer can tell the database to continuously push updated query results to applications in realtime. RethinkDB allows developers to build scalable realtime apps in a fraction of the time with less effort. To learn more, check out rethinkdb.com.

Trino - A query engine that runs at ludicrous speed

  •    Java

Trino is a highly parallel and distributed query engine, that is built from the ground up for efficient, low latency analytics. It is an ANSI SQL compliant query engine, that works with BI tools such as R, Tableau, Power BI, Superset and many others. It helps to natively query data in Hadoop, S3, Cassandra, MySQL, and many others, without the need for complex, slow, and error-prone processes for copying the data.

Dgraph - Fast, Transactional, Distributed Graph Database

  •    Go

Dgraph is a horizontally scalable and distributed graph database, providing ACID transactions, consistent replication and linearizable reads. It's built from ground up to perform for a rich set of queries. Being a native graph database, it tightly controls how the data is arranged on disk to optimize for query performance and throughput, reducing disk seeks and network calls in a cluster.

TiDB - Distributed NewSQL database compatible with MySQL protocol

  •    Go

TiDB is a distributed SQL database. Inspired by the design of Google F1 and Google Spanner, TiDB supports the best features of both traditional RDBMS and NoSQL. It is horizontally scalable, grow TiDB as your business grows. You can increase the capacity simply by adding more machines.






We have large collection of open source products. Follow the tags from Tag Cloud >>


Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.