
pipeline - PipelineAI: Real-Time Enterprise AI Platform

  •    HTML

Each model is built into a separate Docker image with the appropriate Python, C++, and Java/Scala runtime libraries for training or prediction. Use the same Docker image from your local laptop to production to avoid dependency surprises.

DataSphereStudio - DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling

  •    Java

DataSphere Studio (DSS for short) is a one-stop data application development and management portal developed in-house by WeBank as part of WeDataSphere, its big data platform. Based on the Linkis computation middleware, DSS can easily integrate upper-level data application systems, making data application development simple and easy to use.

argo-workflows - Workflow engine for Kubernetes

  •    Go

Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition). Argo is a Cloud Native Computing Foundation (CNCF) hosted project.
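
Because a workflow is just a Kubernetes custom resource, it can be written as a plain manifest. The minimal sketch below builds one as a Python dict and prints it as JSON. It follows the `argoproj.io/v1alpha1` `Workflow` schema with a DAG template in which tasks B and C run in parallel after A, and D waits for both. The `diamond_workflow` helper and the `alpine:3.18` image are assumptions chosen for illustration, not part of the project.

```python
import json


def diamond_workflow(image: str = "alpine:3.18") -> dict:
    """Build a hypothetical Argo Workflow manifest with a diamond-shaped DAG."""
    # One reusable container template that echoes its input parameter.
    echo_template = {
        "name": "echo",
        "inputs": {"parameters": [{"name": "message"}]},
        "container": {
            "image": image,
            "command": ["echo", "{{inputs.parameters.message}}"],
        },
    }

    def task(name: str, deps: tuple = ()) -> dict:
        # Each DAG task runs the echo template with its own name as the message.
        t = {
            "name": name,
            "template": "echo",
            "arguments": {"parameters": [{"name": "message", "value": name}]},
        }
        if deps:
            t["dependencies"] = list(deps)
        return t

    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "dag-diamond-"},
        "spec": {
            "entrypoint": "diamond",
            "templates": [
                {
                    "name": "diamond",
                    "dag": {
                        "tasks": [
                            task("A"),
                            task("B", ("A",)),  # B and C both depend only on A,
                            task("C", ("A",)),  # so they can run in parallel.
                            task("D", ("B", "C")),
                        ]
                    },
                },
                echo_template,
            ],
        },
    }


print(json.dumps(diamond_workflow(), indent=2))
```

JSON is valid YAML, so the printed manifest could be saved to a file and submitted with `argo submit`.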

Gather-Deployment - Gathers scalable Tensorflow and Python infrastructure deployment, Husein Go-To for development, 100% Docker

  •    Jupyter

Gathers scalable TensorFlow and self-hosted infrastructure deployments, Husein's go-to for development, 100% Docker. Example: stream from a webcam using WebRTC -> Flask-SocketIO to detect objects -> WebRTC -> website.

Agile_Data_Code_2 - Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

  •    Jupyter

Like my work? I am Principal Consultant at Data Syndrome, a consultancy offering assistance and training with building full-stack analytics products, applications and systems. Find us on the web at datasyndrome.com. There is now a video course using code from chapter 8, Realtime Predictive Analytics with Kafka, PySpark, Spark MLlib and Spark Streaming. Check it out now at datasyndrome.com/video.

awesome-apache-airflow - Curated list of resources about Apache Airflow


This is a curated list of resources about Apache Airflow (incubating). Please feel free to contribute any items that should be included. Items are generally added at the top of each section so that newer items are featured more prominently.

airflow-docker - Apache Airflow Docker Image.

  •    Python

This repository contains the Dockerfile of apache-airflow for Docker's automated build, published to the public Docker Hub registry. Pull the image from the Docker repository.

airflow-operator - Kubernetes custom controller and CRDs for managing Airflow

  •    Go

This is not an officially supported Google product. The Airflow Operator is still under active development and has not been extensively tested in a production environment. Backward compatibility of the APIs is not guaranteed for alpha releases.

dag-factory - Dynamically generate Apache Airflow DAGs from YAML configuration files

  •    Python

dag-factory is a library for dynamically generating Apache Airflow DAGs from YAML configuration files. To install dag-factory run pip install dag-factory. It requires Python 3.6.0+ and Apache Airflow 1.9+.
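
The core idea, a config file listing tasks and their dependencies that gets turned into a DAG, can be sketched without Airflow. The minimal example below does not use dag-factory itself: it takes a hypothetical config dict (standing in for a parsed YAML file, with `tasks` and `dependencies` keys loosely modeled on dag-factory's layout) and resolves one valid run order with the standard-library `graphlib` module (Python 3.9+). The `task_order` helper is an assumption for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical config, shaped like a parsed YAML file: each task names an
# operator and, optionally, the tasks it depends on.
CONFIG = {
    "example_dag": {
        "schedule_interval": "0 3 * * *",
        "tasks": {
            "extract": {"operator": "BashOperator", "bash_command": "echo extract"},
            "transform": {
                "operator": "BashOperator",
                "bash_command": "echo transform",
                "dependencies": ["extract"],
            },
            "load": {
                "operator": "BashOperator",
                "bash_command": "echo load",
                "dependencies": ["transform"],
            },
        },
    }
}


def task_order(dag_config: dict) -> list:
    """Return one valid execution order for the tasks in a single DAG config."""
    tasks = dag_config["tasks"]
    graph = {name: set(spec.get("dependencies", [])) for name, spec in tasks.items()}
    # Reject dependencies on tasks that are not defined in the config.
    for name, deps in graph.items():
        missing = deps - set(tasks)
        if missing:
            raise ValueError(f"task {name!r} depends on undefined tasks: {sorted(missing)}")
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


for dag_id, dag_config in CONFIG.items():
    print(dag_id, task_order(dag_config))
```

A real generator would instantiate one operator per task and wire the same edges with Airflow's dependency API instead of just returning an order.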

discreETLy - ETLy is an add-on dashboard service on top of Apache Airflow.

  •    Python

DiscreETLy is an add-on dashboard service on top of Apache Airflow. It is a user-friendly UI showing the status of particular DAGs. Moreover, it allows users to map tasks within a particular DAG to tables available in any system (relational and non-relational) via a friendly YAML definition. DiscreETLy provides functionality for monitoring DAG status as well as optional communication with services such as Prometheus or InfluxDB. The minimal setup required to run the dashboard is Docker. You can find Docker installation instructions on the official Docker website.

airflow-notebook - Airflow-Notebook is an Apache Airflow operator that enables running notebooks or Python scripts as tasks in a DAG

  •    Python

airflow-notebook implements an Apache Airflow operator, NotebookOp, that supports running notebooks and Python scripts in DAGs. To use the operator, configure Airflow to use the Elyra-enabled container image or install this package on the host(s) where the Apache Airflow webserver, scheduler, and workers are running. Follow the instructions in this document.

pipeline-editor - Common pipeline-editor components used in different clients (e

  •    TypeScript

A React component for editing pipeline files. Used across all Elyra applications and browser extensions.

aws-airflow-demo - Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for Apache Airflow (MWAA) on AWS

  •    Python

Project files for the post, Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for Apache Airflow (MWAA) on AWS. Please see post for complete instructions on using the project's files. Below is the final high-level architecture for the post’s demonstration. The diagram shows the approximate route of a DAG Run request, in red. The diagram includes an optional S3 Gateway VPC endpoint, not detailed in the post, but recommended for additional security.

AirflowETL - Blog post on ETL pipelines with Airflow

  •    Jupyter

In this blog post I want to go over the data engineering operations called Extract, Transform, Load (ETL) and show how they can be automated and scheduled using Apache Airflow. You can see the source code for this project here. Extracting data can be done in a multitude of ways, but one of the most common is to query a web API. If the query is successful, we receive data back from the API's server, often in the form of JSON. JSON can be thought of as semi-structured data, much like a dictionary whose keys are strings and whose values are strings, numbers, lists, or nested dictionaries. Because the data arrives in this raw form, we must transform it before loading it into a database. Airflow is a platform to schedule and monitor workflows, and in this post I will show you how to use it to extract the daily weather in New York from the OpenWeatherMap API, convert the temperature to Celsius, and load the data into a simple PostgreSQL database.
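
The extract, transform, and load steps described in the post can be sketched in plain Python. This is a minimal illustration, not the post's code: the sample payload is hard-coded so it runs offline, its shape (`name`, `dt`, and `main.temp` in Kelvin) is an assumption based on OpenWeatherMap's default response units, and the standard-library `sqlite3` module stands in for PostgreSQL. The function names and the `weather` table are hypothetical.

```python
import json
import sqlite3

# Hypothetical OpenWeatherMap-style response, hard-coded so the sketch runs
# offline; a real extract step would fetch this text over HTTP.
RAW_RESPONSE = '{"name": "New York", "dt": 1600000000, "main": {"temp": 293.15}}'


def extract(raw: str) -> dict:
    """Parse the JSON text returned by the API into a Python dict."""
    return json.loads(raw)


def transform(payload: dict) -> tuple:
    """Keep the fields to store and convert the temperature from Kelvin to Celsius."""
    kelvin = float(payload["main"]["temp"])
    celsius = round(kelvin - 273.15, 2)
    return payload["name"], int(payload["dt"]), celsius


def load(conn: sqlite3.Connection, row: tuple) -> None:
    """Insert one transformed row; sqlite3 stands in for PostgreSQL here."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather (city TEXT, observed_at INTEGER, temp_c REAL)"
    )
    conn.execute("INSERT INTO weather VALUES (?, ?, ?)", row)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_RESPONSE)))
print(conn.execute("SELECT * FROM weather").fetchall())
```

In an Airflow DAG, each of these functions would typically become its own task, so a failed load can be retried without repeating the API call.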

cloud-composer-mssql-dataflow-bigquery - This repository contains an example of how to leverage Cloud Composer and Cloud Dataflow to move data from a Microsoft SQL Server to BigQuery

  •    Python

This repository contains an example of how to leverage Cloud Composer and Cloud Dataflow to move data from a Microsoft SQL Server to BigQuery. The diagrams below demonstrate the workflow pipeline. A Cloud Composer DAG, either scheduled or triggered manually, connects to a defined Microsoft SQL Server instance and exports the defined data to Google Cloud Storage as a JSON file.
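
The export step (query a table, write the rows out as a JSON file) can be sketched with the standard library. This is a minimal sketch under stated assumptions, not the repository's code: it assumes newline-delimited JSON (one object per line, the JSON layout BigQuery load jobs accept), `sqlite3` stands in for SQL Server, and an in-memory text buffer stands in for the Cloud Storage object. The `export_table_as_ndjson` helper and the `customers` table are hypothetical.

```python
import io
import json
import sqlite3


def export_table_as_ndjson(conn: sqlite3.Connection, table: str, out) -> int:
    """Write every row of `table` to `out` as one JSON object per line.

    `table` must come from trusted configuration: SQL identifiers cannot be
    bound as query parameters, so it is interpolated directly.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    count = 0
    for row in cursor:
        out.write(json.dumps(dict(zip(columns, row))) + "\n")
        count += 1
    return count


# Hypothetical source table standing in for the SQL Server data.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

buf = io.StringIO()  # stands in for the file uploaded to Cloud Storage
n = export_table_as_ndjson(source, "customers", buf)
print(n, "rows exported")
print(buf.getvalue(), end="")
```

Streaming rows one at a time keeps memory flat for large tables, which matters more than it does in this toy example.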

airflow-client-go - Apache Airflow - OpenApi Client for Go

  •    Go

Go Airflow OpenAPI client generated from the OpenAPI spec. See the README for full client API documentation.
