Apache Doris is a modern MPP (massively parallel processing) analytical database. It provides sub-second queries and efficient real-time data analysis. With its distributed architecture, datasets of up to 10 PB are well supported and easy to operate. Doris supports both batch data loading and real-time mini-batch data loading, and it provides high availability, reliability, fault tolerance, and scalability. It was originally developed at Baidu under the name Palo.
 
<ul><li>Uses the MySQL protocol to communicate: users can connect to a Doris cluster through the MySQL client or MySQL JDBC</li><li>Supports standard SQL and is compatible with the MySQL protocol</li><li>Vectorized SQL executor</li><li>Returns query results within one second</li><li>Rollup, a novel pre-computation mechanism</li><li>Effective data model for aggregation</li><li>High performance, high availability, high reliability</li><li>Easy to operate; an elastic data warehouse for big data</li></ul>

Apache Doris - A fast MPP database for all modern analytics on big data

Apache Tajo is a robust relational and distributed big data warehouse system for Apache Hadoop. Tajo is designed for low-latency, scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load) on large datasets stored on HDFS (Hadoop Distributed File System) and other data sources.
 
By supporting SQL standards and leveraging advanced database techniques, Tajo allows direct control of distributed execution and data flow across a variety of query evaluation strategies and optimization opportunities.

Apache Tajo - A big data warehouse system on Hadoop

Cascalog is a fully-featured data processing and querying library for Clojure or Java. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools.

Cascalog - Data processing on Hadoop

AsterixDB is a BDMS (Big Data Management System) with a rich feature set that sets it apart from other Big Data platforms. Its feature set makes it well-suited to modern needs such as web data warehousing and social data storage and analysis. It is a highly scalable data management system that can store, index, and manage semi-structured data, but it also supports a full-power query language with the expressiveness of SQL (and more). 
 
AsterixDB can exploit its knowledge of data partitioning and the availability of indexes to avoid scanning whole datasets for every query.

AsterixDB - Big Data Management System (BDMS)

Apache VXQuery will be a standards-compliant XML Query processor implemented in Java. The focus is on evaluating queries over large amounts of XML data, specifically large collections of relatively small XML documents. To achieve this, queries will be evaluated on a cluster of shared-nothing machines. There are no XQuery processors available today that are capable of processing these datasets in parallel and making the contained information accessible.

VXQuery - Query XML Data

Elementary was built out of the need to effortlessly and immediately gain visibility into the data stack, starting with tracing the actual upstream & downstream dependencies in the data warehouse, without any implementation efforts, security risks or compromises on accuracy.
 
Features:
<ul><li>Lineage visualization: Visual map of data flow and dependencies in the data warehouse.</li><li>Dataset status: Presents data about freshness and volume on the lineage graph.</li><li>Accuracy: Reflects the actual state of the data warehouse based on logs.</li><li>Plug-and-play: No need for code changes.</li><li>Graph filters: Filter the graph by dataset, dates, direction and depth.</li></ul>

Elementary - Data observability platform for modern data teams that is open and transparent

Dev Lake brings all your DevOps data into one practical, personalized, extensible view. Ingest, analyze, and visualize data from an ever-growing list of developer tools with our free and open source product. Dev Lake is most exciting for leaders and managers looking to make better sense of their development data, though it is useful for any developer looking to bring a more data-driven approach to their own practices. With Dev Lake you can ask your process any question: just connect and query.
 
Dev Lake provides an understanding of the software development lifecycle: it uncovers workflow bottlenecks and supports timely review of team performance, rapid feedback, and agile adjustment. It helps teams quickly build scenario-based data dashboards and drill down to analyze the root cause of problems.
 
<div>What can be accomplished with Dev Lake?</div><div> </div><ol><li>Collect DevOps performance data for the whole process</li><li>Share an abstraction layer with similar tools to output standardized performance data</li><li>Use 20+ built-in performance metrics and drill-down analysis capabilities</li><li>Run custom SQL analysis and build scenario-based data views with drag and drop</li><li>Add new data sources quickly through a flexible, plug-in based architecture</li></ol>

Dev Lake - Data lake for Dev

A powerful open source data warehouse system. InterMine allows users to integrate diverse data sources with a minimum of effort, providing powerful web-services and an elegant web-application with minimal configuration. InterMine powers some of the largest data-warehouses in the life sciences.

InterMine - A powerful open source data warehouse system
