Trino - A query engine that runs at ludicrous speed

  •    Java

Trino is a highly parallel, distributed query engine built from the ground up for efficient, low-latency analytics. It is an ANSI SQL compliant query engine that works with BI tools such as R, Tableau, Power BI, Superset, and many others. It natively queries data in Hadoop, S3, Cassandra, MySQL, and many other sources, without the need for complex, slow, and error-prone processes for copying the data.

PyHive - Python interface to Hive and Presto. 🐝

  •    Python

PyHive is a collection of Python DB-API and SQLAlchemy interfaces for Presto and Hive. First install this package to register it with SQLAlchemy (see setup.py).

Shark - Hive on Spark

  •    Scala

Shark is an open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users. It runs Hive queries up to 100x faster in memory, or 10x faster on disk. It is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive.

drone - 🍰 The missing library manager for Android Developers

  •    Javascript

🍰 The missing library manager for Android Developers

Hive Games Reviewer

  •    CSharp

BoardSpace.net Hive Games Reviewer (Ultimate Edition)

Big Data Twitter Demo


This demo analyzes tweets in real time and includes a dashboard. The tweets are also archived in Azure DB/Blob and Hadoop, where Excel can be used for BI!

Quicksql - Simpler, Safer, Faster Unified SQL Analytics Engine for Multi-Datasources

  •    Java

Quicksql is a SQL query product that can run queries against a single datastore or correlated queries across multiple datastores. It supports relational databases, non-relational databases, and even datastores that do not support SQL (such as Elasticsearch and Druid). In addition, a single SQL query can join or union data from multiple datastores in Quicksql. For example, you can run one unified SQL query when part of the data is stored in Elasticsearch and the rest is stored in Hive. Most importantly, QSQL does not depend on any intermediate compute engine; users only need to focus on the data and a unified SQL grammar to complete their statistics and analysis. An architecture diagram helps you get started with Quicksql more easily.

hive-funnel-udf - Hive UDFs for funnel analysis

  •    Java

Funnel analysis is a method for tracking user conversion rates across actions. This enables detection of actions causing high user fallout. These Hive UDFs enable funnel analysis to be performed simply and easily on any Hive table.
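
To make the idea of funnel analysis concrete, here is a minimal sketch in plain Python. It does not use or reproduce the hive-funnel-udf API; the function name `funnel_counts` and the sample events are hypothetical, chosen only to show how ordered steps turn into per-step user counts.

```python
from collections import defaultdict

def funnel_counts(events, steps):
    """Count users who reached each funnel step, in order.

    events: iterable of (user_id, timestamp, action) tuples.
    steps: ordered list of action names that define the funnel.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, action in events:
        by_user[user_id].append((timestamp, action))

    counts = [0] * len(steps)
    for user_events in by_user.values():
        reached = 0  # index of the next step this user still has to complete
        for _, action in sorted(user_events):
            if reached < len(steps) and action == steps[reached]:
                reached += 1
        # Credit the user for every step completed in order.
        for i in range(reached):
            counts[i] += 1
    return counts

# Hypothetical sample data: u3 puts an item in the cart before viewing,
# so only the later "view" counts toward the funnel.
events = [
    ("u1", 1, "view"), ("u1", 2, "cart"), ("u1", 3, "buy"),
    ("u2", 1, "view"), ("u2", 2, "cart"),
    ("u3", 1, "cart"), ("u3", 2, "view"),
]
print(funnel_counts(events, ["view", "cart", "buy"]))  # [3, 2, 1]
```

The drop from 3 to 2 to 1 is the fallout between steps; in Hive the same aggregation would be expressed over table rows rather than Python tuples.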

maha - A framework for rapid reporting API development; with out of the box support for high cardinality dimension lookups with druid

  •    Scala

A centralised library for building reporting APIs on top of multiple data stores to exploit them for what they do best. We run millions of queries on multiple data sources for analytics every day. They run on Hive, Oracle, Druid, etc. We needed a way to utilize the data stores in our architecture to exploit them for what they do best. This meant we needed to easily tune and identify sets of use cases where each data store fits best. Our goal became to build a centralized system that could make these decisions on the fly at query time and also take care of the end-to-end query execution. The system needed to take in all the heuristics available, apply any constraints already defined in the system, and select the best data store to run the query. It would then need to generate the underlying queries and pass on all available information to the query execution layer in order to facilitate further optimization at that layer.
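
The selection step described above can be sketched roughly as follows. This is an illustration only, not maha's actual API or algorithm: the `Store` fields, the cost values, and the `pick_store` function are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Store:
    name: str
    max_days: int          # longest date range this store serves well (assumed)
    dimensions: frozenset  # dimensions this store can answer (assumed)
    cost: int              # relative query cost; lower is preferred (assumed)

def pick_store(stores, days, dims):
    """Return the cheapest store that satisfies the query, or None."""
    candidates = [
        s for s in stores
        if days <= s.max_days and set(dims) <= s.dimensions
    ]
    return min(candidates, key=lambda s: s.cost, default=None)

# Hypothetical catalogue of data stores and their constraints.
stores = [
    Store("druid", 30, frozenset({"country", "device"}), 1),
    Store("oracle", 400, frozenset({"country", "device", "campaign"}), 5),
    Store("hive", 3650, frozenset({"country", "device", "campaign", "keyword"}), 20),
]

print(pick_store(stores, 7, ["country"]).name)    # druid
print(pick_store(stores, 90, ["campaign"]).name)  # oracle
print(pick_store(stores, 90, ["keyword"]).name)   # hive
```

Constraints filter the candidates first and a single cost heuristic ranks what remains; a real system would likely weigh many more signals at query time.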

hiveql-parser - HiveQL Parser

  •    Java

HiveQL Parser. Parses HiveQL code and prints the AST in JSON format on success (exit 0); otherwise prints a well-formed syntax error message (exit 1).

shib - WebUI for query engines: Hive and Presto

  •    Javascript

Once configured, we can switch query engines per execution. The latest version of 'shib' is v1.0.2.

eel-sdk - Big Data Toolkit for the JVM

  •    Scala

Eel is a toolkit for manipulating data in the Hadoop ecosystem. By Hadoop ecosystem we mean file formats common to the big-data world, such as Parquet, ORC, and CSV, in locations such as HDFS or Hive tables. In contrast to distributed batch or streaming engines such as Spark or Flink, Eel is an SDK intended to be used directly in process. Eel is a lower-level API than higher-level engines like Spark and is aimed at use cases where you want something like a file API. Here are some of our notes comparing Eel to other tools that offer similar functionality.

hive-sublime-text - Hive support for Sublime Text (2/3)

  •    Javascript

Hive syntax highlighting for Sublime Text (2/3). This is the easiest way to install the plugin.

WorkHive - Lightweight, Browser-based, Grid Computing platform for Node.js

  •    Javascript

Why a rename? Because after 2+ years of not touching this project, I finally have time and motivation to work on it! Fresh ideas need fresh names, so while the repository is the same, the direction I want to take it is different. The codebase is squeaky clean and ready for a much simpler approach. This is a standalone server for creating, managing, and serving work items for grid-based computational networks. Grid computing allows client browsers to run calculations and send the results back to your server. This standalone server handles all of the heavy lifting that is required to manage data, requests, and queueing.
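
The work-item lifecycle described here (create, hand out to a client, collect the result) can be sketched as an in-memory queue. This is not WorkHive's code or API, and it is written in Python rather than Node.js purely for illustration; the `WorkQueue` class and its method names are hypothetical.

```python
from collections import deque

class WorkQueue:
    """Minimal in-memory stand-in for a grid work-item server."""

    def __init__(self):
        self._pending = deque()  # items waiting for a client
        self._in_flight = {}     # items handed out but not yet answered
        self._results = {}       # completed item id -> result
        self._next_id = 0

    def add(self, payload):
        """Create a work item and return its id."""
        item_id = self._next_id
        self._next_id += 1
        self._pending.append((item_id, payload))
        return item_id

    def checkout(self):
        """Hand the oldest pending item to a client, or None if idle."""
        if not self._pending:
            return None
        item_id, payload = self._pending.popleft()
        self._in_flight[item_id] = payload
        return item_id, payload

    def submit(self, item_id, result):
        """Record a client's result for an item it checked out."""
        if item_id not in self._in_flight:
            raise KeyError(f"item {item_id} is not checked out")
        del self._in_flight[item_id]
        self._results[item_id] = result

    def results(self):
        return dict(self._results)

# Simulate one client that squares each number it is given.
queue = WorkQueue()
for n in (2, 3):
    queue.add(n)
while (job := queue.checkout()) is not None:
    item_id, payload = job
    queue.submit(item_id, payload * payload)
print(queue.results())  # {0: 4, 1: 9}
```

A real server would also need to requeue items whose clients disconnect, which this sketch leaves out.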