presto - Distributed SQL query engine for big data


Presto

https://prestodb.io
https://github.com/prestodb/presto
https://github.com/facebook/presto

Dependencies:

com.facebook.presto:presto-spi:null
com.facebook.presto:presto-resource-group-managers:null
com.facebook.presto:presto-password-authenticators:null
com.facebook.presto:presto-session-property-managers:null
com.facebook.presto:presto-array:null
com.facebook.presto:presto-plugin-toolkit:null
com.facebook.presto:presto-record-decoder:null
com.facebook.presto:presto-orc:null
com.facebook.presto:presto-parquet:null
com.facebook.presto:presto-rcfile:null
com.facebook.presto:presto-hive:null
com.facebook.presto:presto-hive-hadoop2:null
com.facebook.presto:presto-example-http:null
com.facebook.presto:presto-local-file:null
com.teradata:re2j-td:1.4
com.facebook.presto:presto-tpch:null
com.facebook.presto:presto-blackhole:null
com.facebook.presto:presto-memory:null
com.facebook.presto:presto-base-jdbc:null
com.facebook.presto:presto-mysql:null
com.facebook.presto:presto-geospatial:null
com.facebook.presto:presto-geospatial-toolkit:null
com.facebook.presto:presto-raptor:null
com.facebook.presto:presto-cli:null
com.facebook.presto:presto-client:null
com.facebook.presto:presto-parser:null
com.facebook.presto:presto-main:null
com.facebook.presto:presto-matching:null
com.facebook.presto:presto-memory-context:null
com.facebook.presto:presto-jdbc:null
com.facebook.presto:presto-server:null
com.facebook.presto:presto-server-rpm:null
com.facebook.presto:presto-tests:null
com.facebook.presto:presto-benchmark:null
com.facebook.presto:presto-benchto-queries:null
com.facebook.presto:presto-benchto-benchmarks:null
com.facebook.presto:presto-product-tests:null
com.facebook.presto:presto-sqlserver:null
com.facebook.presto.hadoop:hadoop-apache2:2.7.4-5
com.facebook.presto.hive:hive-apache:1.2.0-2
com.facebook.presto.orc:orc-protobuf:6
com.facebook.presto:presto-thrift-connector-api:null
com.facebook.presto:presto-thrift-testing-server:null
com.facebook.presto:presto-thrift-connector:null
com.facebook.hive:hive-dwrf:0.8.2
io.airlift:aircompressor:0.12
io.airlift:log:0.174
io.airlift:log-manager:0.174
io.airlift:json:0.174
io.airlift:security:0.174
io.airlift:units:1.3
io.airlift:concurrent:0.174
io.airlift:configuration:0.174
io.airlift:discovery:0.174
io.airlift:testing:0.174
io.airlift:node:0.174
io.airlift:bootstrap:0.174
io.airlift:event:0.174
io.airlift:http-server:0.174
io.airlift:jaxrs:0.174
io.airlift:jaxrs-testing:0.174
io.airlift:jmx:0.174
io.airlift:trace-token:0.174
io.airlift:dbpool:0.174
io.airlift:jmx-http:0.174
io.airlift:http-client:0.174
io.airlift:stats:0.174
io.airlift:bytecode:1.1
io.airlift:joni:2.1.5.1
io.airlift.drift:drift-api:1.14
io.airlift.drift:drift-client:1.14
io.airlift.drift:drift-codec:1.14
io.airlift.drift:drift-protocol:1.14
io.airlift.drift:drift-server:1.14
io.airlift.drift:drift-transport-netty:1.14
io.airlift.tpch:tpch:0.9
com.teradata.tpcds:tpcds:1.2
org.ow2.asm:asm:6.2.1
com.h2database:h2:1.4.197
org.sonatype.aether:aether-api:1.13.1
io.airlift.resolver:resolver:1.4
io.airlift:airline:0.8
org.openjdk.jol:jol-core:0.2
org.jetbrains:annotations:13.0
it.unimi.dsi:fastutil:6.5.9
com.facebook.thirdparty:libsvm:3.18.1
mysql:mysql-connector-java:5.1.44
org.postgresql:postgresql:42.1.4
org.pcollections:pcollections:2.1.2
org.antlr:antlr4-runtime:4.7.1
com.microsoft.sqlserver:mssql-jdbc:6.1.0.jre8
jline:jline:2.14.6
org.jdbi:jdbi:2.78
org.jdbi:jdbi3-core:3.4.0
org.jdbi:jdbi3-sqlobject:3.4.0
com.squareup.okhttp3:okhttp:3.9.0
com.squareup.okhttp3:okhttp-urlconnection:3.9.0
com.squareup.okhttp3:mockwebserver:3.9.0
io.jsonwebtoken:jjwt:0.9.0
org.apache.thrift:libthrift:0.9.1
net.sf.opencsv:opencsv:2.3
org.apache.commons:commons-math3:3.6.1
io.airlift.discovery:discovery-server:1.29
com.amazonaws:aws-java-sdk-core:1.11.293
com.amazonaws:aws-java-sdk-glue:1.11.293
com.amazonaws:aws-java-sdk-s3:1.11.293
io.airlift:testing-mysql-server:5.7.22-1
io.airlift:testing-postgresql-server:9.6.3-3
org.apache.kafka:kafka_2.10:0.8.2.2
org.xerial.snappy:snappy-java:1.1.2.6
com.github.luben:zstd-jni:1.3.5-4
org.apache.zookeeper:zookeeper:3.4.9
org.jgrapht:jgrapht-core:0.9.0
redis.clients:jedis:2.6.2
com.orange.redis-embedded:embedded-redis:0.6
io.prestodb.tempto:tempto-core:1.49
io.prestodb.tempto:tempto-ldap:1.49
io.prestodb.tempto:tempto-kafka:1.49
io.prestodb.tempto:tempto-runner:1.49
com.facebook.presto.hive:hive-apache-jdbc:0.13.1-5
com.esri.geometry:esri-geometry-api:2.2.1
org.apache.lucene:lucene-analyzers-common:7.2.1
org.locationtech.jts:jts-core:1.15.0
org.anarres.lzo:lzo-hadoop:1.0.5
com.facebook.presto.cassandra:cassandra-server:2.1.16-1
com.facebook.presto.cassandra:cassandra-driver:3.1.4-1
org.javassist:javassist:3.22.0-GA

Related Projects

Presto - Distributed SQL query engine for big data

  •    Java

Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. It can query data in relational and NoSQL databases. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization. It is developed by Facebook.
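
As a rough illustration of a query that combines sources, the sketch below builds a join across two catalogs. Presto addresses tables as catalog.schema.table; the catalog, schema, table and column names used here (hive.sales.orders, mysql.crm.customers, customer_id) and both helper functions are hypothetical, chosen only for the example.

```python
# Hypothetical catalog/schema/table names; a real deployment defines its own catalogs.
ORDERS = ("hive", "sales", "orders")
CUSTOMERS = ("mysql", "crm", "customers")


def qualified_name(catalog, schema, table):
    """Return a three-part Presto table name: catalog.schema.table."""
    return ".".join((catalog, schema, table))


def cross_source_join(left, right, key):
    """Build a SELECT that joins two tables living in different catalogs."""
    return (
        "SELECT count(*) AS matched_rows\n"
        f"FROM {qualified_name(*left)} AS l\n"
        f"JOIN {qualified_name(*right)} AS r\n"
        f"  ON l.{key} = r.{key}"
    )


# Print the generated SQL text; it could then be submitted through any Presto client.
print(cross_source_join(ORDERS, CUSTOMERS, "customer_id"))
```

The point of the sketch is only the naming: because each table carries its catalog, one statement can read from Hive and MySQL at the same time.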

genie - Distributed Big Data Orchestration Service

  •    Java

Genie is a federated job orchestration engine developed by Netflix. Genie provides RESTful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them. See the official website to find documentation about Genie and specific documentation for various releases.

hadoop-ansible - Ansible playbook that installs a Hadoop cluster, with HBase, Hive, Presto for analytics, and Ganglia, Smokeping, Fluentd, Elasticsearch and Kibana for monitoring and centralized log indexing

  •    Shell

Ansible playbook that installs a CDH 4.6.0 Hadoop cluster (running on Java 7, supported from CDH 4.4), with HBase, Hive, Presto for analytics, and Ganglia, Smokeping, Fluentd, Elasticsearch and Kibana for monitoring and centralized log indexing.

PyHive - Python interface to Hive and Presto. 🐝

  •    Python

PyHive is a collection of Python DB-API and SQLAlchemy interfaces for Presto and Hive. First install this package to register it with SQLAlchemy (see setup.py).
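
For Presto, the DB-API side of PyHive can be used roughly as follows. This is a minimal sketch: `presto.connect` and the cursor methods come from PyHive, while the `fetch_rows` helper, the default host and the default port are assumptions made for illustration, and a reachable Presto coordinator is assumed.

```python
def fetch_rows(sql, host="localhost", port=8080):
    """Run `sql` on a Presto coordinator and return all result rows.

    `fetch_rows`, the default host and the default port are illustrative
    assumptions, not part of PyHive itself.
    """
    # Imported inside the function so the module loads even without PyHive installed.
    from pyhive import presto

    connection = presto.connect(host=host, port=port)
    cursor = connection.cursor()   # standard DB-API cursor
    cursor.execute(sql)            # submit the statement to Presto
    return cursor.fetchall()       # collect every remaining row
```

Against a running coordinator, a call such as fetch_rows("SELECT 1") would return the rows produced by that statement.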


Shark - Hive on Spark

  •    Scala

Shark is an open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users. It runs Hive queries up to 100x faster in memory, or 10x on disk. It is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive.

Apache Tez - A Framework for YARN-based, Data Processing Applications In Hadoop

  •    Java

Apache Tez is an extensible framework for building high performance batch and interactive data processing applications, coordinated by YARN in Apache Hadoop. Tez improves the MapReduce paradigm by dramatically improving its speed, while maintaining MapReduce’s ability to scale to petabytes of data. Important Hadoop ecosystem projects like Apache Hive and Apache Pig use Apache Tez, as do a growing number of third party data access applications developed for the broader Hadoop ecosystem.

Cascalog - Data processing on Hadoop

  •    Clojure

Cascalog is a fully-featured data processing and querying library for Clojure or Java. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools.

Apache Hive - The Apache Hive (TM) data warehouse software facilitates querying and managing large d

  •    Java

The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage.

Apache Tajo - A big data warehouse system on Hadoop

  •    Java

Apache Tajo is a robust big data relational and distributed data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract, transform, load) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources.

Apache Trafodion - Webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop.

  •    C++

Apache Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop. Trafodion builds on the scalability, elasticity, and flexibility of Hadoop. Trafodion extends Hadoop to provide guaranteed transactional integrity, enabling new kinds of big data applications to run on Hadoop.

incubator-doris - Palo, an MPP data warehouse

  •    C++

Palo is an MPP-based interactive SQL data warehouse for reporting and analysis. Palo mainly integrates the technology of Google Mesa and Apache Impala. Unlike other popular SQL-on-Hadoop systems, Palo is designed to be a simple, single, tightly coupled system that does not depend on other systems. Palo provides both high-concurrency, low-latency point queries and high-throughput ad-hoc analytical queries, and it supports both batch data loading and near-real-time mini-batch data loading. Palo also provides high availability, reliability, fault tolerance, and scalability. Its main features are simplicity (of developing, deploying and using) and the ability to meet many data serving requirements in a single system. At Baidu, the largest Chinese search engine, we run a two-tiered data warehousing system for data processing, reporting and analysis. Similar to the lambda architecture, the whole data warehouse comprises data processing and data serving. Data processing does the heavy lifting of big data: cleaning data, merging and transforming it, analyzing it and preparing it for use by end-user queries; data serving is designed to serve queries against that data for different use cases. Currently, data processing includes batch and stream data processing technologies such as Hadoop, Spark and Storm; Palo is a SQL data warehouse for serving online and interactive data reporting and analysis queries.

Big Data Twitter Demo

  •    

This demo analyzes tweets in real-time, even including a dashboard. The tweets are also archived in Azure DB/Blob and Hadoop where Excel can be used for BI!

Hue - The open source Apache Hadoop UI

  •    Java

Hue is a Web application for interacting with Apache Hadoop. It provides a FileBrowser for accessing HDFS, a JobBrowser for accessing MapReduce jobs (MR1/MR2-YARN), a Job Designer for creating MapReduce/Streaming/Java jobs, an HBase Browser for exploring and modifying HBase tables and data, an Oozie App for submitting and scheduling workflows and bundles, a Pig/HBase/Sqoop2 shell, a Beeswax application for executing Hive queries, and a Search app for querying Solr and Solr Cloud.

Kylin - Extreme OLAP Engine for Big Data

  •    Java

Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets. It was originally contributed by eBay Inc. It is designed to reduce query latency on Hadoop for more than 10 billion rows of data. It offers ANSI SQL on Hadoop and supports most ANSI SQL query functions.

HiBench - HiBench is a big data benchmark suite.

  •    Java

HiBench is a big data benchmark suite that helps evaluate different big data frameworks in terms of speed, throughput and system resource utilization. It contains a set of Hadoop, Spark and streaming workloads, including Sort, WordCount, TeraSort, Sleep, SQL, PageRank, Nutch indexing, Bayes, Kmeans, NWeight and enhanced DFSIO. It also contains several streaming workloads for Spark Streaming, Flink, Storm and Gearpump. There are 19 workloads in HiBench in total, divided into 6 categories: micro, ml (machine learning), sql, graph, websearch and streaming.

Luigi - Python module that helps you build complex pipelines of batch jobs

  •    Python

The purpose of Luigi is to address all the plumbing typically associated with long-running batch processes. You want to chain many tasks, automate them, and failures will happen. These tasks can be anything, but are typically long running things like Hadoop jobs, dumping data to/from databases, running machine learning algorithms, or anything else.

Quicksql - Simpler, Safer, Faster Unified SQL Analytics Engine for Multi-Datasources

  •    Java

Quicksql is a SQL query product that can be used to query a specific datastore or to run correlated queries across multiple datastores. It supports relational databases, non-relational databases and even datastores that do not support SQL (such as Elasticsearch and Druid). In addition, a SQL query can join or union data from multiple datastores in Quicksql. For example, you can run a unified SQL query when part of the data is stored in Elasticsearch and the other part is stored in Hive. Most importantly, QSQL does not depend on any intermediate compute engine; users only need to focus on their data and the unified SQL grammar to complete statistics and analysis.

Sqoop - Transfers data between Hadoop and Datastores

  •    Java

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. You can use Sqoop to import data from external structured datastores into Hadoop Distributed File System or related systems like Hive and HBase. Conversely, Sqoop can be used to extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses.