
h2o-tutorials - Tutorials and training material for the H2O Machine Learning Platform

  •    Jupyter

This document contains tutorials and training materials for H2O-3. If you find any problems with the tutorial code, please open an issue in this repository. For general H2O questions, please post them on Stack Overflow using the "h2o" tag, or join the H2O Stream Google Group for questions that don't fit the Stack Overflow format.

sparkling-water - Sparkling Water provides H2O functionality inside a Spark cluster

  •    Scala

Are you looking for RSparkling? Its README is available here. Sparkling Water is developed in multiple parallel branches. Each branch corresponds to a Spark major release (e.g., branch rel-2.3 provides the implementation of Sparkling Water for Spark 2.3).

benchm-ml - A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.)

  •    R

This project aims at a minimal benchmark for scalability, speed and accuracy of commonly used implementations of a few machine learning algorithms. The target of this study is binary classification with numeric and categorical inputs (of limited cardinality, i.e. not very sparse) and no missing data, perhaps the most common problem in business applications (e.g. credit scoring, fraud detection or churn prediction). If the input matrix is n × p, n is varied as 10K, 100K, 1M and 10M, while p is ~1K (after expanding the categoricals into dummy variables/one-hot encoding). This particular type of data structure/size (the largest) stems from this author's interest in some particular business applications. Note: while a large part of this benchmark was done in Spring 2015, reflecting the state of ML implementations at that time, this repo is updated when I see significant changes in implementations or when new implementations become widely available (e.g. lightgbm). A summary of the progress and learnings from this benchmark is at the end of this README.
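The note that p is ~1K "after expanding the categoricals into dummy variables/one-hot encoding" is why the matrix gets wide even with far fewer raw columns. Each categorical column with k levels contributes k indicator columns. The following is a minimal pure-Python sketch of that expansion, not the benchmark's own code. The function name `one_hot_expand` and the toy rows are hypothetical, and this sketch keeps every level rather than dropping a reference level.

```python
from typing import Dict, List, Sequence, Tuple


def one_hot_expand(
    rows: Sequence[Dict[str, object]],
    categorical: Sequence[str],
) -> Tuple[List[str], List[List[float]]]:
    """Hypothetical helper: expand categorical columns into 0/1 dummy columns.

    Numeric columns pass through unchanged; a categorical column with k
    distinct levels becomes k indicator columns (no reference level dropped).
    """
    if not rows:
        return [], []
    columns = list(rows[0].keys())
    # Distinct levels per categorical column, sorted so the layout is deterministic.
    levels = {c: sorted({str(r[c]) for r in rows}) for c in categorical}

    # Expanded header: one name per numeric column, k names per categorical column.
    header: List[str] = []
    for c in columns:
        if c in levels:
            header.extend(f"{c}={lvl}" for lvl in levels[c])
        else:
            header.append(c)

    # Fill the n x p numeric matrix row by row.
    matrix: List[List[float]] = []
    for r in rows:
        out: List[float] = []
        for c in columns:
            if c in levels:
                out.extend(1.0 if str(r[c]) == lvl else 0.0 for lvl in levels[c])
            else:
                out.append(float(r[c]))
        matrix.append(out)
    return header, matrix


if __name__ == "__main__":
    rows = [
        {"amount": 120.0, "state": "CA", "channel": "web"},
        {"amount": 35.5, "state": "NY", "channel": "store"},
        {"amount": 80.0, "state": "CA", "channel": "store"},
    ]
    header, X = one_hot_expand(rows, categorical=["state", "channel"])
    print(header)  # ['amount', 'state=CA', 'state=NY', 'channel=store', 'channel=web']
    print(len(X), "x", len(header))  # 3 x 5
```

For a hypothetical illustration, 50 raw categorical columns with about 20 levels each would already expand to about 1,000 columns. That is the regime the benchmark targets.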

cudf - cuDF - GPU DataFrame Library

  •    C++

NOTE: For the latest stable README.md, ensure you are on the main branch. Built on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.

pygdf - GPU Data Frame

  •    Jupyter

PyGDF implements the Python interface to access and manipulate the GPU DataFrame of the GPU Open Analytics Initiative (GOAI). We aim to provide a simple interface that is similar to the pandas DataFrame and hides the details of GPU programming.

h2o-meetups - Presentations from H2O meetups & conferences by the H2O.ai team

  •    Jupyter

The meetup presentations are also hosted on SlideShare. Links to presentations coming soon.


  •    Go

Example configuration file. This software is released under the MIT License; see LICENSE.md.

h2o2 - Proxy handler for hapi.js

  •    Javascript

Proxy handler plugin for hapi.js. h2o2 is a hapi plugin that adds proxying functionality.

awesome-h2o - A curated list of research, applications and projects built using H2O Machine Learning


Below is a curated list of all the awesome projects, applications, research, tutorials, courses and books that use H2O, an open source, distributed machine learning platform. H2O offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models, Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Cox Proportional Hazards, K-means, PCA and Word2Vec, as well as a fully automatic machine learning algorithm (AutoML). H2O.ai produces many tutorials, blog posts, presentations and videos about H2O, but the list below consists of awesome content produced by the greater H2O user community.

h2o-flow - Web based interactive computing environment for H2O

  •    Javascript

H2O Flow is a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media to build machine learning workflows. Think of Flow as a hybrid GUI + REPL + storytelling environment for exploratory data analysis and machine learning, with async, re-scriptable record/replay capabilities. Flow sandboxes and evaluates user JavaScript in the browser via static analysis and tree rewriting. Flow is written in non-standard JavaScript (with compile-time unqualified imports), with a veritable heap of little embedded DSLs for reactive dataflow programming, markup generation, lazy evaluation and multicast signals/slots.

mli-resources - Machine Learning Interpretability Resources

  •    Jupyter

Machine learning algorithms create potentially more accurate models than linear models, but any increase in accuracy over more traditional, better-understood, and more easily explainable techniques is not practical for those who must explain their models to regulators or customers. For many decades, the models created by machine learning algorithms were generally taken to be black-boxes. However, a recent flurry of research has introduced credible techniques for interpreting complex, machine-learned models. Materials presented here illustrate applications or adaptations of these techniques for practicing data scientists. Want to contribute your own examples? Just make a pull request.

rsparkling - RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)

  •    R

Please submit issues, questions and PRs in the new location. The current repo is not maintained. The repository has been moved for several reasons, mainly to improve integration with Sparkling Water and for stability.

GWU_data_mining - Materials for GWU DNSC 6279 and DNSC 6290.

  •    Jupyter

DNSC 6279 ("Data Mining") provides exposure to various data preprocessing, statistics, and machine learning techniques that can be used both to discover relationships in large data sets and to build predictive models. Techniques covered will include basic and analytical data preprocessing, regression models, decision trees, neural networks, clustering, association analysis, and basic text mining. Techniques will be presented in the context of data-driven organizational decision making using statistical and machine learning approaches. DNSC 6290 ("Machine Learning") is a follow-up course to DNSC 6279 that expands on both the theoretical and practical aspects of subjects covered in the prerequisite course while optionally introducing new material. Techniques covered may include feature engineering, penalized regression, neural networks and deep learning, ensemble models including stacked generalization and super learner approaches, matrix factorization, model validation, and model interpretation. Classes will be taught as workshops where groups of students apply lecture materials to the ongoing Kaggle Advanced Regression and Digit Recognizer contests.

interpretable_machine_learning_with_python - Practical techniques for interpreting machine learning models

  •    Jupyter

Monotonicity constraints can turn opaque, complex models into transparent and potentially regulator-approved models by ensuring that predictions only increase or only decrease for any change in a given input variable. In this notebook, I will demonstrate how to use monotonicity constraints in the popular open source gradient boosting package XGBoost to train a simple, accurate, nonlinear classifier on the UCI credit card default data. Once we have trained a monotonic XGBoost model, we will use partial dependence plots and individual conditional expectation (ICE) plots to investigate the internal mechanisms of the model and to verify its monotonic behavior. Partial dependence plots show how machine-learned response functions change with the values of one or two input variables of interest, while averaging out the effects of all other input variables. ICE plots create more localized descriptions of model predictions, and they pair nicely with partial dependence plots. An example of generating regulator-mandated reason codes from high-fidelity Shapley explanations for any model prediction is also presented. The combination of monotonic XGBoost, partial dependence, ICE, and Shapley explanations is likely the most direct way to create an interpretable machine learning model today.
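Partial dependence and ICE, as described above, can be computed for any model that maps an input row to a prediction. What follows is a minimal, library-free sketch, not the notebook's code. `toy_model` is a hypothetical closed-form stand-in for a trained monotonic XGBoost model; in XGBoost itself, the constraint is set through the `monotone_constraints` training parameter. The names `ice_curves`, `partial_dependence` and `is_non_decreasing` are likewise hypothetical helpers.

```python
import math
from statistics import mean
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], float]


def toy_model(row: Row) -> float:
    """Hypothetical stand-in for a trained monotonic classifier.

    The slope on row[0] is 1 + row[1]**2 > 0 and the sigmoid is increasing,
    so the predicted probability never decreases as row[0] increases.
    """
    z = (1.0 + row[1] ** 2) * row[0] - 0.5 * row[1]
    return 1.0 / (1.0 + math.exp(-z))


def ice_curves(model: Model, X: Sequence[Row], feature: int,
               grid: Sequence[float]) -> List[List[float]]:
    """One ICE curve per row: that row's prediction as `feature` sweeps
    over `grid`, with every other input held at the row's observed value."""
    curves = []
    for row in X:
        curve = []
        for value in grid:
            modified = list(row)  # copy so the original row is untouched
            modified[feature] = value
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves: Sequence[Sequence[float]]) -> List[float]:
    """Partial dependence is the pointwise average of the ICE curves, i.e. the
    effect of `feature` with the other inputs averaged out over the data."""
    return [mean(point) for point in zip(*curves)]


def is_non_decreasing(values: Sequence[float]) -> bool:
    """True if each value is <= the next one."""
    return all(a <= b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    X = [[0.2, -1.0], [1.5, 0.0], [-0.7, 2.0]]
    grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
    curves = ice_curves(toy_model, X, feature=0, grid=grid)
    pd_curve = partial_dependence(curves)
    # Verify monotonicity on every individual curve, not just on the average.
    print(all(is_non_decreasing(c) for c in curves))  # True
    print(is_non_decreasing(pd_curve))  # True
```

Checking every ICE curve, rather than only the partial dependence curve, is the stricter test. Averaging can hide individual rows whose predictions move in the wrong direction.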

sldm4-h2o - Statistical Learning & Data Mining IV - H2O Presentation & Tutorial

  •    HTML

This repository contains the H2O presentation for Trevor Hastie and Rob Tibshirani's Statistical Learning and Data Mining IV course in Washington, DC on October 19, 2016.
