Trino - A query engine that runs at ludicrous speed

  •        2581

Trino is a highly parallel and distributed query engine, that is built from the ground up for efficient, low latency analytics. It is an ANSI SQL compliant query engine, that works with BI tools such as R, Tableau, Power BI, Superset and many others. It helps to natively query data in Hadoop, S3, Cassandra, MySQL, and many others, without the need for complex, slow, and error-prone processes for copying the data.

Access data from multiple systems within a single query. For example, join historic log data stored in an S3 object storage with customer data stored in a MySQL relational database.

https://trino.io/
https://github.com/trinodb/trino

Tags
Implementation
License
Platform

   




Related Projects

Presto - Distributed SQL query engine for big data

  •    Java

Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. It allows querying data from relational / nosql databases. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization. It is developed by Facebook.

Apache Iceberg - Table format for storing large, slow-moving tabular data

  •    Java

Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to compute engines including Spark, Trino, PrestoDB, Flink and Hive using a high-performance table format that works just like a SQL table. Users don’t need to know about partitioning to get fast queries. Iceberg was built for huge tables. Iceberg is used in production where a single table can contain tens of petabytes of data and even these huge tables can be read without a distributed SQL engine.

OmniSciDB - Open Source Analytical Database & SQL Engine

  •    C++

OmniSciDB is the foundation of the OmniSci platform. OmniSciDB is SQL-based, relational, columnar and specifically developed to harness the massive parallelism of modern CPU and GPU hardware. OmniSciDB can query up to billions of rows in milliseconds, and is capable of unprecedented ingestion speeds, making it the ideal SQL engine for the era of big, high-velocity data.

Apache Trafodion - Webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop.

  •    C++

Apache Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop. Trafodion builds on the scalability, elasticity, and flexibility of Hadoop. Trafodion extends Hadoop to provide guaranteed transactional integrity, enabling new kinds of big data applications to run on Hadoop.

TDengine - Big data platform designed and optimized for the Internet of Things

  •    C

TDengine is an open-source big data platform designed and optimized for Internet of Things (IoT), Connected Vehicles, and Industrial IoT. Besides the 10x faster time-series database, it provides caching, stream computing, message queuing and other functionalities to reduce the complexity and costs of development and operations.


Apache Tajo - A big data warehouse system on Hadoop

  •    Java

Apache Tajo is a robust big data relational and distributed data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large-data sets stored on HDFS (Hadoop Distributed File System) and other data sources.

Shark - Hive on Spark

  •    Scala

Shark is an open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users. It runs Hive queries up to 100x faster in memory, or 10x on disk. it is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive.

Apache Doris - A fast MPP database for all modern analytics on big data

  •    Java

Apache Doris is a modern MPP analytical database product. It can provide sub-second queries and efficient real-time data analysis. With it's distributed architecture, up to 10PB level datasets will be well supported and easy to operate. Doris provides batch data loading and real-time mini-batch data loading. It provides high availability, reliability, fault tolerance, and scalability. Its original name was Palo, developed in Baidu.

Pinot - A realtime distributed OLAP datastore

  •    Java

Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.

coral - Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages

  •    Java

Coral is a library for analyzing, processing, and rewriting views defined in the Hive Metastore, and sharing them across multiple execution engines. It performs SQL translations to enable views expressed in HiveQL (and potentially other languages) to be accessible in engines such as Trino (formerly PrestoSQL), Apache Spark, and Apache Pig. Coral not only translates view definitions between different SQL/non-SQL dialects, but also rewrites expressions to produce semantically equivalent ones, taking into account the semantics of the target language or engine. For example, it automatically composes new built-in expressions that are equivalent to each built-in expression in the source view definition. Additionally, it integrates with Transport UDFs to enable translating and executing user-defined functions (UDFs) across Hive, Trino, Spark, and Pig. Coral is under active development. Currently, we are looking into expanding the set of input view language APIs beyond HiveQL, and implementing query rewrite algorithms for data governance and query optimization. The project is under active development and we welcome contributions of different forms. Please see the Contribution Agreement.

Apache Superset is a Data Visualization and Data Exploration Platform

  •    Python

Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts. It easily integrates your data, using either our simple no-code viz builder or state of the art SQL IDE. Superset can query data from any SQL-speaking datastore or data engine (e.g. Presto or Athena) that has a Python DB-API driver and a SQLAlchemy dialect.

ClickHouse - Column-oriented database management system that allows generating analytical data reports in real time

  •    C++

ClickHouse is a fast open-source OLAP database management system. It is column-oriented and allows to generate analytical reports using SQL queries in real-time. Large queries are parallelized using multiple cores, taking all the necessary resources available on the current server. It supports distributed processing, data can reside on different shards. Each shard can be a group of replicas used for fault tolerance. All shards are used to run a query in parallel, transparently for the user.

GeoMesa - Suite of tools for working with big geo-spatial data in a distributed fashion

  •    Scala

GeoMesa is an open-source, distributed, spatio-temporal database built on a number of distributed cloud data storage systems, including Accumulo, HBase, Cassandra, and Kafka. Leveraging a highly parallelized indexing strategy, GeoMesa aims to provide as much of the spatial querying and data manipulation to Accumulo as PostGIS does to Postgres.

genie - Distributed Big Data Orchestration Service

  •    Java

Genie is a federated job orchestration engine developed by Netflix. Genie provides REST-ful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them.See the official website to find documentation about Genie and specific documentation for various releases.

magellan - Geo Spatial Data Analytics on Spark

  •    Scala

Magellan is a distributed execution engine for geospatial analytics on big data. It is implemented on top of Apache Spark and deeply leverages modern database techniques like efficient data layout, code generation and query optimization in order to optimize geospatial queries. The application developer writes standard sql or data frame queries to evaluate geometric expressions while the execution engine takes care of efficiently laying data out in memory during query processing, picking the right query plan, optimizing the query execution with cheap and efficient spatial indices while presenting a declarative abstraction to the developer.

Kylin - Extreme OLAP Engine for Big Data

  •    Java

Apache Kylin is an open source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets, original contributed from eBay Inc. It is designed to reduce query latency on Hadoop for 10+ billions of rows of data. It offers ANSI SQL on Hadoop and supports most ANSI SQL query functions.

rqlite - The lightweight, distributed relational database built on SQLite

  •    Go

rqlite is a lightweight, distributed relational database, which uses SQLite as its storage engine. Forming a cluster is very straightforward, it gracefully handles leader elections, and tolerates failures of machines, including the leader. rqlite gives you the functionality of a rock solid, fault-tolerant, replicated relational database, but with very easy installation, deployment, and operation. With it you've got a lightweight and reliable distributed relational data store. Think etcd or Consul, but with relational data modelling also available.

VoltDB - Fast Scalable SQL DBMS with ACID

  •    Java

VoltDB was specifically designed for contemporary software applications that are pushed beyond their limits by high volume data sources. VoltDB provides the ability to capture, store and process incoming data at millions of read/write operations per second. And VoltDB’s relational model opens that data to be analyzed in real-time, using familiar Business Intelligence tools, to identify data patterns and trends, spot anomalies, or perform tracking and alerting.

Apache ShardingSphere - Distributed Database Ecosphere

  •    Java

Apache ShardingSphere is an open-source ecosystem consisted of a set of distributed database solutions, including 3 independent products, JDBC, Proxy & Sidecar (Planning). They all provide functions of data scale-out, distributed transaction and distributed governance, applicable in a variety of situations such as Java isomorphism, heterogeneous language and cloud-native.






We have large collection of open source products. Follow the tags from Tag Cloud >>


Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.