DataSphere Studio (DSS for short) is a one-stop data application development and management portal, self-developed by WeDataSphere, the big data platform of WeBank. Based on the Linkis computation middleware, DSS can easily integrate upper-level data application systems, making data application development simple and easy to use.
workflow airflow spark hive hadoop etl kettle hue tableau flink zeppelin griffin azkaban governance davinci visualis supperset linkis scriptis dataworks

Linkis helps easily connect to various back-end computation/storage engines.
sql spark presto hive storage jdbc rest-api engine impala pyspark udf thrift-server resource-manager jobserver application-manager livy hive-table linkis context-service scriptis

Scriptis is for interactive data analysis, with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, and resource management, and intelligent diagnosis. Script editor: supports multiple languages, auto-completion, syntax highlighting, and SQL syntax error correction.
sql spark hive ide pyspark udf hue zeppelin hql hive-table resouce-management linkis errorcode

Trino is a highly parallel, distributed query engine built from the ground up for efficient, low-latency analytics. It is an ANSI SQL compliant query engine that works with BI tools such as R, Tableau, Power BI, Superset, and many others. It natively queries data in Hadoop, S3, Cassandra, MySQL, and many other sources, without the need for complex, slow, and error-prone processes for copying the data.
distributed-systems data-science sql database big-data presto hive hadoop analytics jdbc databases distributed-database query-engine datalake prestodb trino

PyHive is a collection of Python DB-API and SQLAlchemy interfaces for Presto and Hive. First install this package to register it with SQLAlchemy (see setup.py).
hive hiveserver2 presto dbapi sqlalchemy

Cube.js is an open-source analytical API platform. It is primarily used to build internal business intelligence tools or add customer-facing analytics to existing applications. Cube.js was designed to work with serverless data warehouses and query engines like Google BigQuery and AWS Athena. A multi-stage querying approach makes it suitable for handling trillions of data points. Most modern RDBMS work with Cube.js as well and can be further tuned for performance.
analytics mysql bigquery chart spark presto hive microservice serverless athena postgresql cube

Shark is an open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users. It runs Hive queries up to 100x faster in memory, or 10x faster on disk. It is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive.
distributed-sql hive big-data

DataSphere Studio, Linkis, Scriptis, Qualitis, Schedulis, Exchangis. DataSphere Studio is positioned as a data application development portal whose closed loop covers the entire data application development process. With a unified UI, its workflow-style graphical drag-and-drop development experience covers the full lifecycle of data application development: data import, desensitization and cleaning, data analysis, data mining, quality inspection, visualization, scheduling, and data output applications.
bi kafka spark hive hadoop etl scheduler ide hbase portal mask sqoop data-quality data-map

BoardSpace.net Hive Games Reviewer (Ultimate Edition)
board-game boardgame boardspace game hive offline

This demo analyzes tweets in real time and includes a dashboard. The tweets are also archived in Azure DB/Blob and Hadoop, where Excel can be used for BI.
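azure-storage big-data blob-storage hadoop hive streaminsight twitter

A Spark-based movie recommendation system, including a crawler project, a web site, a back-end administration system, and a Spark recommendation system.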
spark-mllib spark-streaming ssm-maven scrapy hadoop nginx hive mysql

Quicksql is a SQL query product that can be used for queries against a specific datastore or for correlated queries across multiple datastores. It supports relational databases, non-relational databases, and even datastores that do not support SQL (such as Elasticsearch and Druid). In addition, a SQL query can join or union data from multiple datastores in Quicksql. For example, you can run a unified SQL query when part of the data is stored in Elasticsearch and the other part is stored in Hive. Most importantly, QSQL does not depend on any intermediate compute engine; users only need to focus on the data and a unified SQL grammar to complete statistics and analysis. An architecture diagram helps you access Quicksql more easily.
hive spark sql flink

MLCraft is an open-source low-code business intelligence tool and a data science workflow. MLCraft was designed to query the data from several data warehouses and run machine learning experiments. Cube.js is used as a primary query layer and makes it suitable for handling trillions of data points. It is a full-stack data science platform that provides everything you need to build, manage and automate machine learning
mysql bigquery big-data spark presto hive athena analytics clickhouse postgresql business-intelligence redshift

Hive Metastore federation service.
hive hive-metastore federation metastore

Funnel analysis is a method for tracking user conversion rates across actions. This enables detection of actions causing high user fallout. These Hive UDFs enable funnel analysis to be performed simply and easily on any Hive table.
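To make the funnel idea above concrete, here is a minimal sketch in plain Python. It is not the implementation of the Hive UDFs, and it does not use their API; the function names `funnel_counts` and `conversion_rates`, the event tuple layout, and the sample actions are all assumptions made for illustration. It counts how many users reach each funnel step in order, then derives step-to-step conversion rates.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count how many users reach each funnel step, in order.

    events: iterable of (user_id, timestamp, action) tuples (hypothetical layout).
    steps:  ordered list of action names that define the funnel.
    """
    # Group each user's events so they can be replayed in time order.
    by_user = defaultdict(list)
    for user_id, timestamp, action in events:
        by_user[user_id].append((timestamp, action))

    reached = [0] * len(steps)
    for user_events in by_user.values():
        user_events.sort()  # order by timestamp
        next_step = 0
        for _, action in user_events:
            # A user advances only when the next expected step occurs.
            if next_step < len(steps) and action == steps[next_step]:
                reached[next_step] += 1
                next_step += 1
    return reached


def conversion_rates(reached):
    """Fraction of users moving from each step to the following one."""
    return [
        (cur / prev) if prev else 0.0
        for prev, cur in zip(reached, reached[1:])
    ]


if __name__ == "__main__":
    sample_events = [
        ("u1", 1, "view"), ("u1", 2, "cart"), ("u1", 3, "buy"),
        ("u2", 1, "view"), ("u2", 2, "cart"),
        ("u3", 1, "view"), ("u3", 2, "buy"),  # skipped "cart", so no conversion
    ]
    counts = funnel_counts(sample_events, ["view", "cart", "buy"])
    print(counts, conversion_rates(counts))
```

A low rate between two adjacent steps points at the action where users fall out of the funnel, which is the signal the description above refers to.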
hive-udf hive udf hadoop analytics funnel

A centralised library for building reporting APIs on top of multiple data stores to exploit them for what they do best. We run millions of queries on multiple data sources for analytics every day. They run on Hive, Oracle, Druid, etc. We needed a way to utilize the data stores in our architecture to exploit them for what they do best. This meant we needed to easily tune and identify the sets of use cases where each data store fits best. Our goal became to build a centralized system that could make these decisions on the fly at query time and also take care of end-to-end query execution. The system needed to take in all the heuristics available, apply any constraints already defined in the system, and select the best data store to run the query. It would then need to generate the underlying queries and pass on all available information to the query execution layer in order to facilitate further optimization at that layer.
big-data druid oracle oracle-db hive hiveql sql analytics star-schema

A Luigi-powered analytics / warehouse stack
luigi redshift workflow etl salesforce teradata google-sheets typeform postgresql mysql hive aws spark

HiveQL Parser. Parses HiveQL code and prints the AST in JSON format on success (exit 0); otherwise prints a well-formed syntax error message (exit 1).
hive sql hiveql parser syntax-checker