Software 2.0 needs Data 2.0, and Hub delivers it. Data scientists and ML researchers spend most of their time on data management and preprocessing rather than on training models. With Hub, we are fixing this. We store your datasets (even petabyte-scale ones) as a single NumPy-like array in the cloud, so you can seamlessly access and work with them from any machine. Hub makes any data type stored in the cloud (images, text files, audio, or video) usable as fast as if it were stored on premises. With the same dataset view, your team can always stay in sync.
training data-science machine-learning cloud ai computer-vision deep-learning tensorflow cv ml collaboration pytorch cloud-computing datasets dataset-generation data-processing data-version-control data-pipelines mlops

Metaflow is a human-friendly Python/R library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost the productivity of data scientists who work on a wide variety of projects, from classical statistics to state-of-the-art deep learning. For more information, see Metaflow's website and documentation.
productivity data-science machine-learning r ai reproducible-research ml rstats r-package model-management ml-infrastructure mlops ml-platform

Always know what to expect from your data. Great Expectations helps data teams eliminate pipeline debt through data testing, documentation, and profiling.
data-science pipeline exploratory-data-analysis eda data-engineering data-quality data-profiling datacleaner exploratory-analysis cleandata dataquality datacleaning mlops pipeline-tests pipeline-testing dataunittest data-unit-tests exploratorydataanalysis pipeline-debt data-profilers

This repository contains a curated list of awesome open source libraries that will help you deploy, monitor, version, scale, and secure your production machine learning.
machine-learning data-mining awesome deep-learning awesome-list interpretability privacy-preserving production-machine-learning mlops privacy-preserving-machine-learning explainability responsible-ai machine-learning-operations ml-ops ml-operations privacy-preserving-ml large-scale-ml production-ml large-scale-machine-learning

Interactive reports and JSON profiles to analyze, monitor, and debug machine learning models. Evidently helps evaluate machine learning models during validation and monitor them in production. The tool generates interactive visual reports and JSON profiles from pandas DataFrames or CSV files. You can use the visual reports for ad hoc analysis, debugging, and team sharing, and the JSON profiles to integrate Evidently into prediction pipelines or with other visualization tools.
data-science machine-learning pandas-dataframe jupyter-notebook html-report production-machine-learning mlops model-monitoring machine-learning-operations data-drift

A curated list of awesome MLOps tools. Inspired by awesome-python.
data-science machine-learning awesome ai ml machine-learning-engineering mle mlops

Example Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using Amazon SageMaker. Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models.
training aws data-science machine-learning reinforcement-learning deep-learning examples jupyter-notebook inference sagemaker mlops

Kedro is an open-source Python framework for creating reproducible, maintainable, and modular data science code. It borrows concepts from software engineering and applies them to machine-learning code; applied concepts include modularity, separation of concerns, and versioning. Our Get Started guide contains full installation instructions and explains how to set up Python virtual environments.
pipeline pipelines-as-code hacktoberfest data-versioning data-abstraction mlops kedro cookiecutter-data-science

Rubrix is a production-ready Python framework for exploring, annotating, and managing data in NLP projects. Most annotation tools treat data collection as a one-off activity at the beginning of each project. In real-world projects, data collection is a key activity of the iterative process of ML model development. Once a model goes into production, you want to monitor and analyze its predictions, and collect more data to improve your model over time. Rubrix is designed to close this gap, enabling you to iterate as much as you need.
nlp elasticsearch data-science machine-learning natural-language-processing pytorch artificial-intelligence weak-supervision knowledge-graph developer-tools active-learning annotation-tool weakly-supervised-learning human-in-the-loop mlops text-labeling

MLOps empowers data scientists and app developers to help bring ML models to production. MLOps enables you to track, version, audit, certify, and reuse every asset in your ML lifecycle, and provides orchestration services to streamline managing this lifecycle. Azure ML contains a number of asset management and orchestration services to help you manage the lifecycle of your model training and deployment workflows.
mlops azureml

This repository contains machine learning pipelines based on the TensorFlow TFX library. Every pipeline is designed to be published on an on-premises Kubernetes/Kubeflow cluster. Further pipelines are welcome via pull request.
kubernetes devops machine-learning tensorflow pipelines kubeflow tfx mlops

Project bringing Kubeflow Pipelines and Tekton together. The project is driven according to this design doc. The current code allows you to run Kubeflow Pipelines end to end with a Tekton backend. For more details about the project, please follow this detailed blog post. Additionally, look at these slides as well as this deep dive presentation for demos.
dsl hacktoberfest kubeflow mlops tekton tekton-pipelines kubeflow-pipeline

Great Expectations is a leading tool for validating, documenting, and profiling your data to maintain quality and improve communication between teams. In order to configure the GitHub action for your repository, add the following code snippet to your GitHub workflows file. The file should be located under my_repo_name/.github/my_workflow.yml.
data-science continuous-integration actions data-quality data-integrity mlops

An open source framework to enable data scientists to productionise, test and deploy models with simple workflows that abstract the underlying complexity of scalable MLOps platforms. Tempo provides a unified interface to multiple MLOps projects that enable data scientists to deploy and productionise machine learning systems.
kubernetes machine-learning sklearn xgboost mlops

To install Merlin on your local machine, click Local Development. Go to the docs folder for the full documentation and guides.
machine-learning mlops

The action for creating compute for Azure Machine Learning allows you to create a new compute target on Azure Machine Learning using GitHub Actions. This repository contains a GitHub Action for creating and connecting to Azure Machine Learning compute resources, so you can later train or deploy machine learning models remotely. If the compute target exists, the action connects to it; otherwise, it can create a new compute target based on the provided parameters. Currently, the action only supports Azure ML Clusters and AKS Clusters.
data-science machine-learning azure aml azure-machine-learning mlops

The Deploy Machine Learning Models to Azure action will deploy your model on Azure Machine Learning using GitHub Actions. This repository contains a GitHub Action that deploys machine learning models to Azure Machine Learning and creates a real-time endpoint on the model, so that the model can be integrated into other systems. The endpoint can be hosted either on an Azure Container Instance or on an Azure Kubernetes Service.
data-science machine-learning azure aml azure-machine-learning mlops

The Register Machine Learning Models with Azure action will register your model on Azure Machine Learning using GitHub Actions. This repository contains a GitHub Action for registering machine learning models with the Azure Machine Learning model registry for use in deployment and testing. This action is designed to register models that may or may not have been trained using Azure Machine Learning. If a model was not trained using Azure Machine Learning, we expect it to be present in your GitHub repository.
data-science machine-learning azure aml azure-machine-learning mlops

The Azure Machine Learning training action will help you train your models on Azure Machine Learning using GitHub Actions. This action is one in a series of actions that can be used to set up an ML Ops process. We suggest getting started with one of our template repositories, which will allow you to create an ML Ops process in less than 5 minutes.
data-science machine-learning azure aml azure-machine-learning mlops

The aml-workspace action will log in and connect to Azure Machine Learning. This repository contains a GitHub Action for connecting to an Azure Machine Learning workspace. You can later use this context to train your model remotely, deploy your models to endpoints, and so on. You can also use this action to create a new workspace if you provide the appropriate parameters.
data-science machine-learning azure aml azure-machine-learning mlops