Displaying 1 to 9 from 9 results

Hub - Fastest dataset optimization and management for machine and deep learning

  •    Python

Note: the translations of this document may not be up-to-date. For the latest version, please check the README in English. Software 2.0 needs Data 2.0, and Hub delivers it. Most of the time Data Scientists/ML researchers work on data management and preprocessing instead of training models. With Hub, we are fixing this. We store your (even petabyte-scale) datasets as single numpy-like array on the cloud, so you can seamlessly access and work with it from any machine. Hub makes any data type (images, text files, audio, or video) stored in cloud usable as fast as if it were stored on premise. With same dataset view, your team can always be in sync.

orchest - Build data pipelines, the easy way 🛠️

  •    Python

No frameworks. No YAML. Just write Python and R code in Notebooks. 💡 Watch the full narrated video to learn more about building data pipelines in Orchest.

dagster - An orchestration platform for the development, production, and observation of data assets.

  •    Python

An orchestration platform for the development, production, and observation of data assets. Dagster lets you define jobs in terms of the data flow between reusable, logical components, then test locally and run anywhere. With a unified view of jobs and the assets they produce, Dagster can schedule and orchestrate Pandas, Spark, SQL, or anything else that Python can invoke.

mleap - MLeap: Deploy Spark Pipelines to Production

  •    Scala

Deploying machine learning data pipelines and algorithms should not be a time-consuming or difficult task. MLeap allows data scientists and engineers to deploy machine learning pipelines from Spark and Scikit-learn to a portable format and execution engine. Documentation is available at mleap-docs.combust.ml.




Elementary - Data observability platform for modern data teams that is open and transparent

  •    Python

Elementary was built out of the need to effortlessly and immediately gain visibility into the data stack, starting with tracing the actual upstream & downstream dependencies in the data warehouse, without any implementation efforts, security risks or compromises on accuracy.

neon-workshop - A Pachyderm deep learning tutorial for conference workshops

  •    Python

This workshop focuses on building a production scale machine learning pipeline with Pachyderm that integrates Nervana Neon training and inference. In particular, this pipeline trains and utilizes a model that predicts the sentiment of movie reviews, based on data from IMDB. Finally, we provide some Resources for you for further exploration.


versatile-data-kit - Versatile Data Kit is a data engineering framework that enables Data Engineers to develop, troubleshoot, deploy, run, and manage data processing workloads

  •    Python

Versatile Data Kit is a data engineering framework that enables Data Engineers to develop, troubleshoot, deploy, run, and manage data processing workloads (referred to as "Data Jobs"). A "Data Job" enables Data Engineers to implement automated pull ingestion (E in ELT) and batch data transformation (T in ELT) into a database. Versatile Data Kit provides an abstraction layer that helps solve common data engineering problems. It can be called by the workflow engine with the goal of making data engineers more efficient (for example, it ensures data applications are packaged, versioned and deployed correctly, while dealing with credentials, retries, reconnects, etc.). Everything exposed by Versatile Data Kit provides built-in monitoring, troubleshooting, and smart notification capabilities. For example, tracking both code and data modifications and the relations between them enables engineers to troubleshoot more quickly and provides an easy revert to a stable version.






We have large collection of open source products. Follow the tags from Tag Cloud >>


Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.