DataSphereStudio - DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling

  •    Java

DataSphere Studio (DSS for short) is a self-developed, one-stop data application development and management portal from WeDataSphere, the big data platform of WeBank. Based on the Linkis computation middleware, DSS can easily integrate upper-level data application systems, making data application development simple and easy.

Scriptis - Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function and resource management, and intelligent diagnosis

  •    Vue

Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function and resource management, and intelligent diagnosis. Script editor: supports multiple languages, auto-completion, syntax highlighting and SQL syntax error correction.

Trino - A query engine that runs at ludicrous speed

  •    Java

Trino is a highly parallel, distributed query engine built from the ground up for efficient, low-latency analytics. It is an ANSI SQL-compliant query engine that works with BI tools such as R, Tableau, Power BI, Superset and many others. It can natively query data in Hadoop, S3, Cassandra, MySQL and many other sources, without the need for complex, slow and error-prone processes to copy the data.

PyHive - Python interface to Hive and Presto. 🐝

  •    Python

PyHive is a collection of Python DB-API and SQLAlchemy interfaces for Presto and Hive. First install this package to register it with SQLAlchemy (see setup.py).

Cube.js - Open-Source Analytical API Platform

  •    Javascript

Cube.js is an open-source analytical API platform. It is primarily used to build internal business intelligence tools or add customer-facing analytics to existing applications. Cube.js was designed to work with serverless data warehouses and query engines like Google BigQuery and AWS Athena. A multi-stage querying approach makes it suitable for handling trillions of data points. Most modern RDBMS work with Cube.js as well and can be further tuned for performance.

Shark - Hive on Spark

  •    Scala

Shark is an open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users, running Hive queries up to 100x faster in memory, or 10x faster on disk. It is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive.

drone - 🍰 The missing library manager for Android Developers

  •    Javascript

🍰 The missing library manager for Android Developers

WeDataSphere - WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms

WeDataSphere comprises DataSphere Studio, Linkis, Scriptis, Qualitis, Schedulis and Exchangis. DataSphere Studio is positioned as a data application development portal whose closed loop covers the entire process of data application development. With a unified UI, its workflow-like, graphical drag-and-drop development experience covers the entire lifecycle of data application development, from data import, desensitization and cleansing, data analysis, data mining, quality inspection, visualization and scheduling to data output applications.

Hive Games Reviewer

  •    CSharp

BoardSpace.net Hive Games Reviewer (Ultimate Edition)

Big Data Twitter Demo

This demo analyzes tweets in real time and includes a dashboard. The tweets are also archived in Azure DB/Blob and Hadoop, where Excel can be used for BI!

Quicksql - Simpler, Safer, Faster Unified SQL Analytics Engine for Multi-Datasources

  •    Java

Quicksql is a SQL query product that can be used to query a single datastore or to run correlated queries across multiple datastores. It supports relational databases, non-relational databases and even datastores that do not support SQL (such as Elasticsearch and Druid). In addition, a single SQL query in Quicksql can join or union data from multiple datastores. For example, you can run one unified SQL query when part of the data is stored in Elasticsearch and the rest is stored in Hive. Most importantly, QSQL does not depend on any intermediate compute engine: users only need to focus on the data and a unified SQL grammar to complete statistics and analysis. An architecture diagram helps you access Quicksql more easily.
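Quicksql's own planner and API are not described here. As a rough, hypothetical illustration of what "joining data from multiple datastores" means, the Python sketch below performs an in-memory hash join between two result sets. The lists `es_rows` and `hive_rows` merely stand in for rows fetched from Elasticsearch and Hive; their fields and values are invented, and the sketch ignores query planning and pushdown entirely.

```python
def hash_join(left, right, key):
    """Inner-join two lists of dict rows on `key` using a hash join."""
    # Build phase: index the right input by its join key.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Probe phase: stream the left input and emit a merged row per match.
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical rows standing in for results from two different stores.
es_rows = [{"user_id": 1, "clicks": 10}, {"user_id": 2, "clicks": 3}]
hive_rows = [{"user_id": 1, "country": "CN"}, {"user_id": 3, "country": "SG"}]

print(hash_join(es_rows, hive_rows, "user_id"))
# [{'user_id': 1, 'clicks': 10, 'country': 'CN'}]
```

Only `user_id` 1 appears in both inputs, so it is the only joined row; a union, by contrast, would simply concatenate compatible rows from both sources.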

MLCraft - Low-code business intelligence tool and a data science workflow

  •    Javascript

MLCraft is an open-source, low-code business intelligence tool and data science workflow. MLCraft was designed to query data from several data warehouses and run machine learning experiments. Cube.js is used as the primary query layer, which makes it suitable for handling trillions of data points. It is a full-stack data science platform that provides everything you need to build, manage and automate machine learning.

hive-funnel-udf - Hive UDFs for funnel analysis

  •    Java

Funnel analysis is a method for tracking user conversion rates across a sequence of actions, which makes it possible to detect the actions that cause high user fallout. These Hive UDFs enable funnel analysis to be performed simply and easily on any Hive table.
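To make the idea concrete, here is a minimal plain-Python sketch of funnel counting. It does not use or reproduce the Hive UDFs themselves (their function names and signatures are not shown in this listing); it assumes a hypothetical event log of `(user_id, timestamp, action)` tuples and counts how many users complete each step of an ordered funnel, in order.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count users reaching each step of an ordered funnel.

    events: iterable of (user_id, timestamp, action) tuples (hypothetical schema).
    steps: ordered list of action names defining the funnel.
    Returns a list where element i is the number of users who completed
    steps[0] .. steps[i] in that order.
    """
    # Group each user's events so they can be replayed chronologically.
    by_user = defaultdict(list)
    for user_id, ts, action in events:
        by_user[user_id].append((ts, action))

    counts = [0] * len(steps)
    for user_events in by_user.values():
        user_events.sort()
        reached = 0  # index of the next funnel step this user must hit
        for _, action in user_events:
            if reached < len(steps) and action == steps[reached]:
                reached += 1
        # A user who got through step k also counts for every earlier step.
        for i in range(reached):
            counts[i] += 1
    return counts


def conversion_rates(counts):
    """Step-to-step conversion: share of users at step i-1 who reached step i."""
    return [counts[i] / counts[i - 1] if counts[i - 1] else 0.0
            for i in range(1, len(counts))]


events = [
    ("u1", 1, "view"), ("u1", 2, "cart"), ("u1", 3, "buy"),
    ("u2", 1, "view"), ("u2", 2, "cart"),
    ("u3", 1, "view"),
    ("u4", 1, "cart"),  # skipped "view", so never enters the funnel
]
counts = funnel_counts(events, ["view", "cart", "buy"])
print(counts)                    # [3, 2, 1]
print(conversion_rates(counts))  # [0.6666666666666666, 0.5]
```

The step with the lowest conversion rate marks where the largest share of users falls out of the funnel.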

maha - A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid

  •    Scala

A centralized library for building reporting APIs on top of multiple data stores, exploiting each for what it does best. We run millions of analytics queries on multiple data sources every day, on Hive, Oracle, Druid, etc. We needed a way to use the data stores in our architecture for what each does best, which meant we needed to easily tune and identify the sets of use cases each data store fits best. Our goal became to build a centralized system able to make these decisions on the fly at query time and also handle end-to-end query execution. The system needed to take in all available heuristics, apply any constraints already defined in the system, and select the best data store to run the query. It would then need to generate the underlying queries and pass all available information on to the query execution layer to facilitate further optimization at that layer.
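maha's actual selection logic and configuration are not shown in this listing. The Python below is only a hypothetical sketch of the idea described above: apply hard constraints first, then let a heuristic pick among the remaining stores. The store names (`fast_olap`, `rdbms`, `warehouse`), their retention windows, join support and cost values are all invented for illustration and do not describe any real system.

```python
from dataclasses import dataclass


@dataclass
class Query:
    lookback_days: int  # how far back in time the query reads
    needs_joins: bool   # whether the query requires SQL joins


@dataclass
class DataStore:
    name: str
    max_lookback_days: int  # hard constraint: data retained in this store
    supports_joins: bool    # hard constraint: can the store execute joins?
    cost: int               # heuristic: lower is preferred


def choose_store(query, stores):
    """Filter stores by hard constraints, then pick the cheapest candidate."""
    candidates = [
        s for s in stores
        if query.lookback_days <= s.max_lookback_days
        and (s.supports_joins or not query.needs_joins)
    ]
    if not candidates:
        raise ValueError("no data store satisfies the query constraints")
    return min(candidates, key=lambda s: s.cost)


# Hypothetical stores; all values are made up for the example.
stores = [
    DataStore("fast_olap", max_lookback_days=30, supports_joins=False, cost=1),
    DataStore("rdbms", max_lookback_days=90, supports_joins=True, cost=5),
    DataStore("warehouse", max_lookback_days=3650, supports_joins=True, cost=10),
]

print(choose_store(Query(lookback_days=7, needs_joins=False), stores).name)   # fast_olap
print(choose_store(Query(lookback_days=7, needs_joins=True), stores).name)    # rdbms
print(choose_store(Query(lookback_days=365, needs_joins=True), stores).name)  # warehouse
```

Separating hard constraints (which stores *can* answer) from heuristics (which store *should* answer) keeps the two concerns independently tunable.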

hiveql-parser - HiveQL Parser

  •    Java

HiveQL Parser. Parses HiveQL code and prints the AST in JSON format on success (exit code 0); otherwise prints a well-formed syntax error message (exit code 1).