DataSphereStudio - DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling

  •    Java

DataSphere Studio (DSS for short) is a one-stop data application development and management portal, self-developed by WeDataSphere, the big data platform of WeBank. Based on the Linkis computation middleware, DSS can easily integrate upper-level data application systems, making data application development simple and easy to use.

AthenaX - SQL-based streaming analytics platform at scale

  •    Java

AthenaX is a streaming analytics platform that enables users to run production-quality, large scale streaming analytics using Structured Query Language (SQL). AthenaX was released and open sourced by Uber Technologies. It is capable of scaling across hundreds of machines and processing hundreds of billions of real-time events daily. Apache 2.0 License.

Quicksql - Simpler, Safer, Faster Unified SQL Analytics Engine for Multi-Datasources

  •    Java

Quicksql is a SQL query product that can be used for queries against a single datastore or for correlated queries across multiple datastores. It supports relational databases, non-relational databases, and even datastores that do not support SQL (such as Elasticsearch and Druid). In addition, a SQL query can join or union data from multiple datastores in Quicksql. For example, you can run a unified SQL query when part of the data is stored in Elasticsearch and the rest is stored in Hive. Most importantly, QSQL does not depend on any intermediate compute engine; users only need to focus on the data and the unified SQL grammar to complete their statistics and analysis. An architecture diagram helps you access Quicksql more easily.

registry - Schema Registry

  •    Java

Registry is a versioned entity framework that lets you build various registry services such as Schema Registry, ML Model Registry, etc.

streamline - StreamLine - Streaming Analytics

  •    Java

Develop and deploy streaming analytics applications visually, with bindings for streaming engines and multiple sources/sinks, a rich set of streaming operators, and operational lifecycle management. Streaming Analytics Manager makes it easy to develop and monitor streaming applications, and also provides analytics on the data being processed by those applications.

featran - A Scala feature transformation library for data science and machine learning

  •    Scala

Featran, also known as Featran77 or F77 (get it?), is a Scala library for feature transformation. It aims to simplify the time-consuming task of feature engineering in data science and machine learning processes. It supports various collection types for feature extraction and output formats for feature representation. We can implement this in a naive way using reduce and map.


FlinkExperiments - Experiments with Apache Flink.

  •    Java

This project is a sample project for Apache Flink. The application parses the Quality Controlled Local Climatological Data (QCLCD) of March 2015, calculates the maximum daily temperature of the stream by using Apache Flink and writes the results back into an Elasticsearch and PostgreSQL database. Quality Controlled Local Climatological Data (QCLCD) consist of hourly, daily, and monthly summaries for approximately 1,600 U.S. locations. Daily Summary forms are not available for all stations. Data are available beginning January 1, 2005 and continue to the present. Please note, there may be a 48-hour lag in the availability of the most recent data.

realtime-dashboard-example - This is a real-time dashboard example using Spark Streaming and Node.js

  •    Java

Spark Streaming makes it easy to build scalable, fault-tolerant streaming applications. At AppsFlyer, we use Spark for many of our offline processing services. Spark Streaming joined our technology stack a few months ago for real-time workflows, reading directly from Kafka to provide value to our clients in near real time.

fdp-modelserver - An umbrella project for multiple implementations of model serving

  •    Scala

kafkastreamserver - an implementation of model scoring and queryable state using Kafka Streams. Also includes an implementation of a custom Kafka Streams store.

nussknacker - Process authoring tool for Apache Flink

  •    Scala

Nussknacker lets you design, deploy and monitor streaming processes using an easy-to-use GUI. We leverage the power, performance and reliability of Apache Flink to make your processes fast and accurate. Visit our pages to see documentation. Visit our quickstart to have a look around.

dataflow-runner - Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR

  •    Go

Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR. Dataflow Runner is copyright 2016-2018 Snowplow Analytics Ltd.

bigdata-notebook

  •    Scala

A repository to hold all my Hadoop and Machine Learning related code.

flink-deployer - A tool that helps automate deployment to an Apache Flink cluster

  •    Go

A Go command-line utility to facilitate deployments to Apache Flink.

flink-connectors - Apache Flink connectors for Pravega.

  •    Java

This repository implements connectors to read and write Pravega Streams with the Apache Flink stream processing framework. The connectors can be used to build end-to-end stream processing pipelines (see Samples) that use Pravega as the stream storage and message bus, and Apache Flink for computation over the streams.





