superlearner-guide - SuperLearner guide: fitting models, ensembling, prediction, hyperparameters, parallelization, timing, feature selection, etc

  •        5

A guide to using SuperLearner for prediction. Also many Coursera offerings and other online classes.



Related Projects

xcessiv - A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python

  •    Python

Stacked ensembles are simple in theory. You combine the predictions of smaller models and feed those into another model. However, in practice, implementing them can be a major headache. Xcessiv holds your hand through all the implementation details of creating and optimizing stacked ensembles so you're free to fully define only the things you care about.

php-ml - PHP-ML - Machine Learning library for PHP

  •    PHP

Fresh approach to Machine Learning in PHP. Algorithms, Cross Validation, Neural Network, Preprocessing, Feature Extraction and much more in one library. PHP-ML requires PHP >= 7.1.

ai-resources - Selection of resources to learn Artificial Intelligence / Machine Learning / Statistical Inference / Deep Learning / Reinforcement Learning


Update April 2017: It’s been almost a year since I posted this list of resources, and over the year there’s been an explosion of articles, videos, books, tutorials etc on the subject — even an explosion of ‘lists of resources’ such as this one. It’s impossible for me to keep this up to date. However, the one resource I would like to add is ( led by Gene Kogan. It’s specifically aimed at artists and the creative coding community. This is a very incomplete and subjective selection of resources to learn about the algorithms and maths of Artificial Intelligence (AI) / Machine Learning (ML) / Statistical Inference (SI) / Deep Learning (DL) / Reinforcement Learning (RL). It is aimed at beginners (those without Computer Science background and not knowing anything about these subjects) and hopes to take them to quite advanced levels (able to read and understand DL papers). It is not an exhaustive list and only contains some of the learning materials that I have personally completed so that I can include brief personal comments on them. It is also by no means the best path to follow (nowadays most MOOCs have full paths all the way from basic statistics and linear algebra to ML/DL). But this is the path I took and in a sense it's a partial documentation of my personal journey into DL (actually I bounced around all of these back and forth like crazy). As someone who has no formal background in Computer Science (but has been programming for many years), the language, notation and concepts of ML/SI/DL and even CS was completely alien to me, and the learning curve was not only steep, but vertical, treacherous and slippery like ice.

pycm - Multi-class confusion matrix library in Python

  •    Python

PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix, and a proper tool for post-classification model evaluation that supports most classes and overall statistics parameters. PyCM is the swiss-army knife of confusion matrices, targeted mainly at data scientists that need a broad array of metrics for predictive models and an accurate evaluation of large variety of classifiers. threshold is added in version 0.9 for real value prediction.

CloudForest - Ensembles of decision trees in go/golang.

  •    Go

Fast, flexible, multi-threaded ensembles of decision trees for machine learning in pure Go (golang).It can achieve quicker training times then many other popular implementations on some datasets. This is the result of cpu cache friendly memory utilization well suited to modern processors and separate, optimized paths to learn splits from binary, numerical and categorical data.

dist-keras - Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark

  •    Python

Distributed Deep Learning with Apache Spark and Keras. Distributed Keras is a distributed deep learning framework built op top of Apache Spark and Keras, with a focus on "state-of-the-art" distributed optimization algorithms. We designed the framework in such a way that a new distributed optimizer could be implemented with ease, thus enabling a person to focus on research. Several distributed methods are supported, such as, but not restricted to, the training of ensembles and models using data parallel methods.

ISLR-python - An Introduction to Statistical Learning (James, Witten, Hastie, Tibshirani, 2013): Python code

  •    Jupyter

This repository contains Python code for a selection of tables, figures and LAB sections from the book 'An Introduction to Statistical Learning with Applications in R' by James, Witten, Hastie, Tibshirani (2013). 2018-01-15: Minor updates to the repository due to changes/deprecations in several packages. The notebooks have been tested with these package versions. Thanks @lincolnfrias and @telescopeuser.

Smile - Statistical Machine Intelligence & Learning Engine

  •    Java

Smile (Statistical Machine Intelligence and Learning Engine) is a fast and comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system in Java and Scala. With advanced data structures and algorithms, Smile delivers state-of-art performance.Smile covers every aspect of machine learning, including classification, regression, clustering, association rule mining, feature selection, manifold learning, multidimensional scaling, genetic algorithms, missing value imputation, efficient nearest neighbor search, etc.

datumbox-framework - Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications

  •    Java

Datumbox is an open-source Machine Learning Framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

carla - Open-source simulator for autonomous driving research.

  •    C++

CARLA is an open-source simulator for autonomous driving research. CARLA has been developed from the ground up to support development, training, and validation of autonomous urban driving systems. In addition to open-source code and protocols, CARLA provides open digital assets (urban layouts, buildings, vehicles) that were created for this purpose and can be used freely. The simulation platform supports flexible specification of sensor suites and environmental conditions. If you want to benchmark your model in the same conditions as in our CoRL’17 paper, check out Benchmarking.

pmtk3 - Probabilistic Modeling Toolkit for Matlab/Octave.

  •    HTML

PMTK is a collection of Matlab/Octave functions, written by Matt Dunham, Kevin Murphy and various other people. The toolkit is primarily designed to accompany Kevin Murphy's textbook Machine learning: a probabilistic perspective, but can also be used independently of this book. The goal is to provide a unified conceptual and software framework encompassing machine learning, graphical models, and Bayesian statistics (hence the logo). (Some methods from frequentist statistics, such as cross validation, are also supported.) Since December 2011, the toolbox is in maintenance mode, meaning that bugs will be fixed, but no new features will be added (at least not by Kevin or Matt). PMTK builds on top of several existing packages, available from pmtksupport, and provides a unified interface to them. In addition, it provides readable "reference" implementations of many common machine learning techniques. The vast majority of the code is written in Matlab. (For a brief discussion of why we chose Matlab, click here. Most of the code also runs on Octave an open-source Matlab clone.) However, in a few cases we also provide wrappers to implementations written in C, for speed reasons. PMTK currently (October 2010) has over 67,000 lines.

Math-of-Machine-Learning-Course-by-Siraj - Implements common data science methods and machine learning algorithms from scratch in python

  •    Jupyter

This repository was initially created to submit machine learning assignments for Siraj Raval's online machine learning course. The purpose of the course was to learn how to implement the most common machine learning algorithms from scratch (without using machine learning libraries such as tensorflow, PyTorch, scikit-learn, etc). Although that course has ended now, I am continuing to learn data science and machine learning from other sources such as Coursera, online blogs, and attending machine learning lectures at University of Toronto. Sticking to the theme of implementing machine learning algortihms from scratch, I will continue to post detailed notebooks in python here as I learn more.

deep-learning-book - Repository for "Introduction to Artificial Neural Networks and Deep Learning: A Practical Guide with Applications in Python"

  •    Jupyter

Repository for the book Introduction to Artificial Neural Networks and Deep Learning: A Practical Guide with Applications in Python. Deep learning is not just the talk of the town among tech folks. Deep learning allows us to tackle complex problems, training artificial neural networks to recognize complex patterns for image and speech recognition. In this book, we'll continue where we left off in Python Machine Learning and implement deep learning algorithms in PyTorch.

Pyod - A Python Toolkit for Scalable Outlier Detection (Anomaly Detection)

  •    Python

Important Notes: PyOD contains some neural network based models, e.g., AutoEncoders, which are implemented in keras. However, PyOD would NOT install keras and tensorflow automatically. This would reduce the risk of damaging your local installations. You are responsible for installing keras and tensorflow if you want to use neural net based models. An instruction is provided here. Anomaly detection resources, e.g., courses, books, papers and videos.

mlens - ML-Ensemble – high performance ensemble learning

  •    Python

ML-Ensemble combines a Scikit-learn high-level API with a low-level computational graph framework to build memory efficient, maximally parallelized ensemble networks in as few lines of codes as possible. ML-Ensemble is thread safe as long as base learners are and can fall back on memory mapped multiprocessing for memory-neutral process-based concurrency. For tutorials and full documentation, visit the project website.

high-school-guide-to-machine-learning - Being a high schooler myself and having studied Machine Learning and Artificial Intelligence for a year now, I believe that there fails to exist a learning path in this field for High School students


Being a high schooler myself and having studied Machine Learning and Artificial Intelligence for a year now, I believe that there fails to exist a learning path in this field for High School students. This is my attempt to create one. Over the past few months, I've tried to spend a couple of hours every day understanding this field, be it watching Youtube videos or undertaking projects. I've been guided by older peers who've had far more experience than me, and now feel that I have ample experience to share my insights.

Hands-On-Reinforcement-Learning-With-Python - Master Reinforcement and Deep Reinforcement Learning using OpenAI Gym and TensorFlow

  •    Jupyter

Reinforcement Learning with Python will help you to master basic reinforcement learning algorithms to the advanced deep reinforcement learning algorithms. The book starts with an introduction to Reinforcement Learning followed by OpenAI and Tensorflow. You will then explore various RL algorithms and concepts such as the Markov Decision Processes, Monte-Carlo methods, and dynamic programming, including value and policy iteration. This example-rich guide will introduce you to deep learning, covering various deep learning algorithms. You will then explore deep reinforcement learning in depth, which is a combination of deep learning and reinforcement learning. You will master various deep reinforcement learning algorithms such as DQN, Double DQN. Dueling DQN, DRQN, A3C, DDPG, TRPO, and PPO. You will also learn about recent advancements in reinforcement learning such as imagination augmented agents, learn from human preference, DQfD, HER and many more.

DeepBench - Benchmarking Deep Learning operations on different hardware

  •    C++

The primary purpose of DeepBench is to benchmark operations that are important to deep learning on different hardware platforms. Although the fundamental computations behind deep learning are well understood, the way they are used in practice can be surprisingly diverse. For example, a matrix multiplication may be compute-bound, bandwidth-bound, or occupancy-bound, based on the size of the matrices being multiplied and the kernel implementation. Because every deep learning model uses these operations with different parameters, the optimization space for hardware and software targeting deep learning is large and underspecified. DeepBench attempts to answer the question, "Which hardware provides the best performance on the basic operations used for deep neural networks?". We specify these operations at a low level, suitable for use in hardware simulators for groups building new processors targeted at deep learning. DeepBench includes operations and workloads that are important to both training and inference.

probability - Probabilistic reasoning and statistical analysis in TensorFlow

  •    Jupyter

TensorFlow Probability is a library for probabilistic reasoning and statistical analysis in TensorFlow. As part of the TensorFlow ecosystem, TensorFlow Probability provides integration of probabilistic methods with deep networks, gradient-based inference via automatic differentiation, and scalability to large datasets and models via hardware acceleration (e.g., GPUs) and distributed computation. Our probabilistic machine learning tools are structured as follows.