Each model is built into a separate Docker image with the appropriate Python, C++, and Java/Scala runtime libraries for training or prediction. Use the same Docker image from local laptop to production to avoid dependency surprises.
Tags: machine-learning artificial-intelligence tensorflow kubernetes elasticsearch cassandra ipython spark kafka netflixoss presto airflow pipeline jupyter-notebook zeppelin docker redis neural-network gpu microservices

Elyra is a set of AI-centric extensions to JupyterLab Notebooks. The Elyra Getting Started Guide includes more details on these features.
Tags: docker machine-learning airflow binder ai anaconda pypi pipelines jupyterlab notebooks hacktoberfest apache-airflow jupyterlab-extensions kubeflow jupyterlab-extension notebook-jupyter kubeflow-pipelines elyra jupyterlab-notebooks

DataSphere Studio (DSS for short) is a self-developed, one-stop data application development and management portal from WeDataSphere, the big data platform of WeBank. Based on the Linkis computation middleware, DSS can easily integrate upper-level data application systems, making data application development simple and easy to use.
Tags: workflow airflow spark hive hadoop etl kettle hue tableau flink zeppelin griffin azkaban governance davinci visualis supperset linkis scriptis dataworks

Argo is a set of open source tools for Kubernetes to run workflows, manage clusters, and do GitOps right.
Tags: continuous-deployment gitops machine-learning airflow workflow-engine argo dag knative argo-workflows ci-cd kubernetes-tools

Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition). Argo is a Cloud Native Computing Foundation (CNCF) hosted project.
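Because Argo Workflows are plain Kubernetes custom resources, a workflow can be submitted with any Kubernetes client. Below is a minimal sketch using the official kubernetes Python client; the manifest mirrors Argo's introductory hello-world example, and the "argo" namespace is an assumption, not a requirement:

```python
# Sketch: submit an Argo Workflow custom resource via the kubernetes client.
# Assumes a reachable cluster with the Argo CRDs installed; the "argo"
# namespace and the whalesay example image are assumptions.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

workflow = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "hello-world-"},
    "spec": {
        "entrypoint": "whalesay",
        "templates": [{
            "name": "whalesay",
            "container": {
                "image": "docker/whalesay:latest",
                "command": ["cowsay"],
                "args": ["hello world"],
            },
        }],
    },
}

api = client.CustomObjectsApi()
created = api.create_namespaced_custom_object(
    group="argoproj.io",
    version="v1alpha1",
    namespace="argo",
    plural="workflows",
    body=workflow,
)
print("Submitted workflow:", created["metadata"]["name"])
```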
Tags: kubernetes workflow machine-learning airflow workflow-engine cncf argo k8s cloud-native hacktoberfest dag knative argo-workflows

Gathers scalable TensorFlow and self-hosted infrastructure deployment, Husein's go-to for development, 100% Docker. Example: stream from a webcam using WebRTC -> Flask SocketIO to detect objects -> WebRTC -> website.
Tags: docker airflow docker-compose tensorflow

Like my work? I am Principal Consultant at Data Syndrome, a consultancy offering assistance and training with building full-stack analytics products, applications, and systems. Find us on the web at datasyndrome.com. There is now a video course using code from chapter 8, Realtime Predictive Analytics with Kafka, PySpark, Spark MLlib and Spark Streaming. Check it out now at datasyndrome.com/video.
Tags: data-syndrome data data-science analytics apache-spark apache-kafka kafka spark predictive-analytics machine-learning machine-learning-algorithms airflow python-3 python3 amazon-ec2 agile-data agile-data-science vagrant amazon-web-services

This is a curated list of resources about Apache Airflow (incubating). Please feel free to contribute any items that should be included. Items are generally added at the top of each section so that fresher items are featured more prominently.
Tags: airflow apache-airflow workflow-management

This repository contains the Dockerfile of apache-airflow for Docker's automated build, published to the public Docker Hub Registry. Pull the image from the Docker repository.
Tags: docker airflow scheduler workflow

This is not an officially supported Google product. The Airflow Operator is still under active development and has not been extensively tested in production environments. Backward compatibility of the APIs is not guaranteed for alpha releases.
Tags: kubernetes kubernetes-operator apache-airflow airflow crd kubernetes-controller workflow-engine

Exports Airflow metrics in Prometheus format. Requires Go; tested with Go 1.9+.
Tags: airflow mysql prometheus exporter metrics apache-airflow apache

dag-factory is a library for dynamically generating Apache Airflow DAGs from YAML configuration files. To install dag-factory, run pip install dag-factory. It requires Python 3.6.0+ and Apache Airflow 1.9+.
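As a sketch of how dag-factory is typically wired up (the YAML path below is a placeholder; the file holds the DAG definitions), a small loader module is placed in Airflow's dags/ folder:

```python
# Sketch of a dag-factory loader module placed in Airflow's dags/ folder.
# The config path is a placeholder; the YAML file holds the DAG definitions.
from airflow import DAG  # noqa: F401  (Airflow scans files for this import)
import dagfactory

dag_factory = dagfactory.DagFactory("/path/to/config_file.yml")

# Registers every DAG defined in the YAML file into this module's globals,
# which is where Airflow's DAG discovery looks for them.
dag_factory.generate_dags(globals())
```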
Tags: airflow apache-airflow dags

DiscreETLy is an add-on dashboard service on top of Apache Airflow. It is a user-friendly UI showing the status of particular DAGs. Moreover, it allows users to map tasks within a particular DAG to tables available in any system (relational or non-relational) via a friendly YAML definition. DiscreETLy provides functionality for monitoring DAG status as well as optional communication with services such as Prometheus or InfluxDB. The minimal setup for running the dashboard requires Docker; you can find Docker installation instructions on the official Docker website.
Tags: etl edt-dashboard airflow

airflow-notebook implements an Apache Airflow operator NotebookOp that supports running notebooks and Python scripts in DAGs. To use the operator, configure Airflow to use the Elyra-enabled container image or install this package on the host(s) where the Apache Airflow webserver, scheduler, and workers are running. Follow the instructions in this document.
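A rough sketch of what a DAG using NotebookOp might look like; the import path and parameter names below are assumptions for illustration, not the package's verified API — consult the linked instructions for actual usage:

```python
# Hypothetical sketch only: the import path and parameter names are
# assumptions for illustration, not the verified airflow-notebook API.
from datetime import datetime

from airflow import DAG
from airflow_notebook.pipeline import NotebookOp  # assumed module path

with DAG(
    dag_id="notebook_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    run_notebook = NotebookOp(
        task_id="run_analysis_notebook",
        notebook="analysis.ipynb",   # notebook to execute (assumed argument)
        image="elyra/airflow:dev",   # Elyra-enabled image (assumed argument)
    )
```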
Tags: airflow jupyter-notebook airflow-dag

A React component for editing pipeline files. Used across all Elyra applications and browser extensions.
Tags: machine-learning airflow ai pipeline apache-airflow kubeflow-pipelines pipeline-editor

Project files for the post Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for Apache Airflow (MWAA) on AWS. Please see the post for complete instructions on using the project's files. Below is the final high-level architecture for the post's demonstration. The diagram shows the approximate route of a DAG Run request, in red. The diagram includes an optional S3 Gateway VPC endpoint, not detailed in the post but recommended for additional security.
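For flavor, here is a minimal sketch of the kind of DAG such a setup runs: adding a PySpark step to an existing EMR cluster and waiting for it to finish. The cluster ID, S3 script path, and connection ID are placeholders, and the import paths assume the Amazon provider package — this is not the post's actual DAG:

```python
# Sketch: submit a PySpark step to an existing EMR cluster and wait on it.
# Cluster ID and S3 script path are placeholders; assumes the
# apache-airflow-providers-amazon package is installed.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

SPARK_STEP = [{
    "Name": "run_pyspark_job",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "s3://my-bucket/jobs/job.py"],  # placeholder
    },
}]

with DAG(
    dag_id="emr_pyspark_demo",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
) as dag:
    add_step = EmrAddStepsOperator(
        task_id="add_step",
        job_flow_id="j-XXXXXXXXXXXXX",  # placeholder cluster ID
        steps=SPARK_STEP,
        aws_conn_id="aws_default",
    )

    wait_for_step = EmrStepSensor(
        task_id="wait_for_step",
        job_flow_id="j-XXXXXXXXXXXXX",  # placeholder cluster ID
        step_id="{{ task_instance.xcom_pull(task_ids='add_step')[0] }}",
        aws_conn_id="aws_default",
    )

    add_step >> wait_for_step
```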
Tags: aws airflow amazon-emr apache-airflow pyspark-applications amazon-mwaa

Example of an ETL pipeline using Airflow.
Tags: airflow etl postgresql data-engineering data-pipelines

In this blog post I want to go over the data engineering operations known as Extract, Transform, Load (ETL) and show how they can be automated and scheduled using Apache Airflow. You can see the source code for this project here. Extracting data can be done in a multitude of ways, but one of the most common is to query a web API. If the query is successful, we receive data back from the API's server, often in the form of JSON. JSON can be thought of as semi-structured data, or as a dictionary whose keys and values are strings. Since the data is a dictionary of strings, we must transform it before storing or loading it into a database. Airflow is a platform to schedule and monitor workflows, and in this post I will show you how to use it to extract the daily weather in New York from the OpenWeatherMap API, convert the temperature to Celsius, and load the data into a simple PostgreSQL database.
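As a minimal sketch of the pipeline the post describes — the API key, connection string, and table name below are placeholders of mine, not the post's actual code:

```python
# Minimal sketch of the described ETL: fetch New York weather from
# OpenWeatherMap, convert Kelvin to Celsius, load into PostgreSQL.
# The API key, DSN, and table name are placeholders.
from datetime import datetime

import requests
import psycopg2
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

API_URL = "https://api.openweathermap.org/data/2.5/weather"


def extract_transform_load():
    # Extract: query the web API; the payload comes back as JSON.
    resp = requests.get(API_URL, params={"q": "New York", "appid": "MY_API_KEY"})
    resp.raise_for_status()
    payload = resp.json()

    # Transform: OpenWeatherMap reports Kelvin; convert to Celsius.
    temp_c = float(payload["main"]["temp"]) - 273.15

    # Load: insert the observation into a PostgreSQL table.
    conn = psycopg2.connect("dbname=weather user=airflow")  # placeholder DSN
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO daily_weather (observed_at, temp_c) VALUES (%s, %s)",
            (datetime.utcnow(), temp_c),
        )
    conn.close()


with DAG(
    dag_id="weather_etl",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
) as dag:
    etl = PythonOperator(task_id="etl", python_callable=extract_transform_load)
```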
Tags: airflow sql database schedule etl postgresql data-engineering data-pipeline etl-pipeline

This repository contains an example of how to leverage Cloud Composer and Cloud Dataflow to move data from a Microsoft SQL Server to BigQuery. The diagrams below demonstrate the workflow pipeline. A Cloud Composer DAG, either scheduled or manually triggered, connects to a defined Microsoft SQL Server and exports the defined data to Google Cloud Storage as a JSON file.
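The first hop of that pipeline — exporting SQL Server rows to GCS as JSON — could be sketched with Airflow's contrib MsSqlToGoogleCloudStorageOperator. The query, bucket, and connection IDs below are placeholders; this is not the repository's actual DAG:

```python
# Sketch of the export step only: dump SQL Server rows to GCS as
# newline-delimited JSON (the operator's default output format).
# Query, bucket, and connection IDs are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.mssql_to_gcs import (
    MsSqlToGoogleCloudStorageOperator,
)

with DAG(
    dag_id="mssql_to_bigquery_export",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
) as dag:
    export_to_gcs = MsSqlToGoogleCloudStorageOperator(
        task_id="export_orders",
        sql="SELECT * FROM dbo.orders",     # placeholder query
        bucket="my-staging-bucket",         # placeholder bucket
        filename="exports/orders_{}.json",  # {} is filled per output chunk
        mssql_conn_id="mssql_default",
        google_cloud_storage_conn_id="google_cloud_default",
    )
```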
Tags: bigquery airflow microsoft-sql-server dataflow cloud-composer

Go Airflow OpenAPI client generated from the OpenAPI spec. See the README for full client API documentation.
Tags: airflow apache apache-airflow apache-airflow-client