mli-resources - Machine Learning Interpretability Resources


Machine learning algorithms create potentially more accurate models than linear models, but any increase in accuracy over more traditional, better-understood, and more easily explainable techniques is of little practical use to those who must explain their models to regulators or customers. For many decades, the models created by machine learning algorithms were generally taken to be black boxes. However, a recent flurry of research has introduced credible techniques for interpreting complex, machine-learned models. The materials presented here illustrate applications or adaptations of these techniques for practicing data scientists. Want to contribute your own examples? Just make a pull request.



Related Projects

AIX360 - Interpretability and explainability of data and machine learning models

  •    Python

The AI Explainability 360 toolkit is an open-source library that supports interpretability and explainability of datasets and machine learning models. The AI Explainability 360 Python package includes a comprehensive set of algorithms that cover different dimensions of explanations along with proxy explainability metrics. The AI Explainability 360 interactive experience provides a gentle introduction to the concepts and capabilities by walking through an example use case for different consumer personas. The tutorials and example notebooks offer a deeper, data scientist-oriented introduction. The complete API is also available.

benchm-ml - A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.)

  •    R

This project aims at a minimal benchmark for scalability, speed and accuracy of commonly used implementations of a few machine learning algorithms. The target of this study is binary classification with numeric and categorical inputs (of limited cardinality, i.e. not very sparse) and no missing data, perhaps the most common problem in business applications (e.g. credit scoring, fraud detection or churn prediction). If the input matrix is n x p, n is varied as 10K, 100K, 1M, 10M, while p is ~1K (after expanding the categoricals into dummy variables/one-hot encoding). This particular type of data structure/size (the largest) stems from this author's interest in some particular business applications. Note: While a large part of this benchmark was done in Spring 2015, reflecting the state of ML implementations at that time, this repo is updated when I see significant changes in implementations or when new implementations become widely available (e.g. lightgbm). Also, please find a summary of the progress and learnings from this benchmark at the end of this repo.
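To make the "expanding the categoricals into dummy variables/one-hot encoding" step concrete, here is a minimal pure-Python sketch of how a handful of categorical columns grows the column count p. The function name `one_hot_encode` and the columns `distance`, `carrier` and `origin` are made up for illustration; they are not taken from the benchmark's code or data.

```python
def one_hot_encode(rows, categorical_columns):
    """Expand categorical columns into 0/1 dummy columns; other columns pass through as floats."""
    # Collect the sorted set of levels seen in each categorical column.
    levels = {
        col: sorted({row[col] for row in rows})
        for col in categorical_columns
    }
    # Output header: numeric columns first, then one dummy column per level.
    numeric_columns = [c for c in rows[0] if c not in categorical_columns]
    header = list(numeric_columns)
    for col in categorical_columns:
        header.extend(f"{col}={level}" for level in levels[col])
    # Encode every row against that header.
    matrix = []
    for row in rows:
        encoded = [float(row[c]) for c in numeric_columns]
        for col in categorical_columns:
            encoded.extend(1.0 if row[col] == level else 0.0 for level in levels[col])
        matrix.append(encoded)
    return header, matrix


# Hypothetical toy data: one numeric and two categorical inputs.
rows = [
    {"distance": 120, "carrier": "AA", "origin": "SFO"},
    {"distance": 950, "carrier": "UA", "origin": "JFK"},
    {"distance": 300, "carrier": "AA", "origin": "JFK"},
]
header, matrix = one_hot_encode(rows, ["carrier", "origin"])
print(header)
print(len(matrix), "x", len(header))  # n x p after expansion
```

Three input columns become five here; with a few dozen levels per categorical column, the same expansion is how a modest number of raw inputs can reach a p on the order of 1K.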

DALEX - Descriptive mAchine Learning EXplanations

  •    R

Machine Learning models are widely used and have various applications in classification or regression tasks. Due to increasing computational power, availability of new data sources and new methods, ML models are becoming more and more complex. Models created with techniques like boosting, bagging, or neural networks are true black boxes. It is hard to trace the link between input variables and model outcomes. They are used because of their high performance, but lack of interpretability is one of their main weaknesses. In many applications we need to know, understand or prove how input variables are used in the model and what impact they have on the final model prediction. DALEX is a set of tools that help to understand how complex models work.
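One common model-agnostic way to estimate the impact of each input variable on a black-box model is permutation importance: shuffle one column at a time and measure how much the prediction error grows. The sketch below is a plain-Python illustration of that general idea, not DALEX's API (DALEX is an R package); the names `permutation_importance` and `black_box` and the toy model are assumptions made for the example.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in squared error when each feature column is shuffled."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other column intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def black_box(row):
    # Hypothetical stand-in for an opaque model: it relies heavily on
    # feature 0, slightly on feature 1, and ignores feature 2.
    return 3.0 * row[0] + 0.5 * row[1]


rng = random.Random(42)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(200)]
y = [black_box(row) for row in X]

scores = permutation_importance(black_box, X, y)
for j, score in enumerate(scores):
    print(f"feature {j}: {score:.4f}")
```

The only thing the function needs from the model is a `predict` callable, which is why this kind of explanation works for models whose internals are hard to inspect. A feature the model ignores gets a score of zero, and larger scores indicate stronger reliance on that feature.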

spark-py-notebooks - Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

  •    Jupyter

This is a collection of IPython/Jupyter notebooks intended to train the reader on different Apache Spark concepts, from basic to advanced, using the Python language. If Python is not your language and R is, you may want to have a look at our R on Apache Spark (SparkR) notebooks instead. Additionally, if you are interested in being introduced to some basic Data Science Engineering, you might find this series of tutorials interesting. There we explain different concepts and applications using Python and R.

h2o-tutorials - Tutorials and training material for the H2O Machine Learning Platform

  •    Jupyter

This document contains tutorials and training materials for H2O-3. If you find any problems with the tutorial code, please open an issue in this repository. For general H2O questions, please post those to Stack Overflow using the "h2o" tag or join the H2O Stream Google Group for questions that don't fit into the Stack Overflow format.

practical-machine-learning-with-python - Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system

  •    Jupyter

"Data is the new oil" is a saying which you must have heard by now along with the huge interest building up around Big Data and Machine Learning in the recent past along with Artificial Intelligence and Deep Learning. Besides this, data scientists have been termed as having "The sexiest job in the 21st Century" which makes it all the more worthwhile to build up some valuable expertise in these areas. Getting started with machine learning in the real world can be overwhelming with the vast amount of resources out there on the web. "Practical Machine Learning with Python" follows a structured and comprehensive three-tiered approach packed with concepts, methodologies, hands-on examples, and code. This book is packed with over 500 pages of useful information which helps its readers master the essential skills needed to recognize and solve complex problems with Machine Learning and Deep Learning by following a data-driven mindset. By using real-world case studies that leverage the popular Python Machine Learning ecosystem, this book is your perfect companion for learning the art and science of Machine Learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute Machine Learning systems and projects successfully.

Math-of-Machine-Learning-Course-by-Siraj - Implements common data science methods and machine learning algorithms from scratch in python

  •    Jupyter

This repository was initially created to submit machine learning assignments for Siraj Raval's online machine learning course. The purpose of the course was to learn how to implement the most common machine learning algorithms from scratch (without using machine learning libraries such as TensorFlow, PyTorch, scikit-learn, etc.). Although that course has now ended, I am continuing to learn data science and machine learning from other sources such as Coursera, online blogs, and machine learning lectures at the University of Toronto. Sticking to the theme of implementing machine learning algorithms from scratch, I will continue to post detailed notebooks in Python here as I learn more.

deepLearningBook-Notes - Notes on the Deep Learning book from Ian Goodfellow, Yoshua Bengio and Aaron Courville (2016)

  •    Jupyter

This content is part of a series following chapter 2 on linear algebra from the Deep Learning Book by Goodfellow, I., Bengio, Y., and Courville, A. (2016). It aims to provide intuitions, drawings, and Python code on mathematical theories and reflects my understanding of these concepts. I'd like to introduce a series of blog posts and their corresponding Python notebooks gathering notes on the Deep Learning Book from Ian Goodfellow, Yoshua Bengio, and Aaron Courville (2016). The aim of these notebooks is to help beginners and advanced beginners grasp the linear algebra concepts underlying deep learning and machine learning. Acquiring these skills can boost your ability to understand and apply various data science algorithms. In my opinion, linear algebra is one of the bedrocks of machine learning, deep learning and data science.

tpot - A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming

  •    Python

Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data.

pygdf - GPU Data Frame

  •    Jupyter

PyGDF implements the Python interface to access and manipulate the GPU Dataframe of the GPU Open Analytics Initiative (GOAI). We aim to provide a simple interface that is similar to the Pandas dataframe and hides the details of GPU programming.

python-machine-learning-book - The "Python Machine Learning (1st edition)" book code repository and info resource

  •    Jupyter

This GitHub repository contains the code examples of the 1st Edition of the Python Machine Learning book. If you are looking for the code examples of the 2nd Edition, please refer to this repository instead. What you can expect are 400 pages rich in useful material: just about everything you need to know to get started with machine learning ... from theory to the actual code that you can directly put into action! This is not just another "this is how scikit-learn works" book. I aim to explain all the underlying concepts, tell you everything you need to know in terms of best practices and caveats, and we will put those concepts into action mainly using NumPy, scikit-learn, and Theano.

H2O - Fast Scalable Machine Learning API For Smarter Applications

  •    Java

H2O is for data scientists and application developers who need fast, in-memory scalable machine learning for smarter applications. H2O is an open source parallel processing engine for machine learning. Unlike traditional analytics tools, H2O provides a combination of extraordinary math, a high performance parallel architecture, and unrivaled ease of use.

data-science-your-way - Ways of doing Data Science Engineering and Machine Learning in R and Python

  •    Jupyter

This series of tutorials on Data Science engineering will try to compare how different concepts in the discipline can be implemented in the two dominant ecosystems nowadays: R and Python. We will do this from a neutral point of view. Our opinion is that each environment has good and bad things, and any data scientist should know how to use both in order to be as prepared as possible for the job market or to start a personal project.

scikit-learn-videos - Jupyter notebooks from the scikit-learn video series

  •    Jupyter

This video series will teach you how to solve machine learning problems using Python's popular scikit-learn library. It was featured on Kaggle's blog in 2015. There are 9 video tutorials totaling 4 hours, each with a corresponding Jupyter notebook. Each notebook contains everything you see in the video: code, output, images, and comments.

CADL - Course materials/Homework materials for the FREE MOOC course on "Creative Applications of Deep Learning w/ Tensorflow" #CADL

  •    Jupyter

This repository contains lecture transcripts and homework assignments as Jupyter Notebooks for the first of three Kadenze Academy courses on Creative Applications of Deep Learning w/ Tensorflow. It also contains a Python package with all the code developed during all three courses. The first course makes heavy use of Jupyter Notebook. This will be necessary for submitting the homework and interacting with the guided session notebooks I will provide for each assignment. Follow along with this guide and we'll see how to obtain all of the necessary libraries that we'll be using. By the end of this, you'll have installed Jupyter Notebook, NumPy, SciPy, and Matplotlib. While many of these libraries aren't necessary for performing the Deep Learning which we'll get to in later lectures, they are incredibly useful for manipulating data on your computer, preparing data for learning, and exploring results.

DataScienceR - a curated list of R tutorials for Data Science, NLP and Machine Learning

  •    R

This repo contains a curated list of R tutorials and packages for Data Science, NLP and Machine Learning. This also serves as a reference guide for several common data analysis tasks. Curated list of Python tutorials for Data Science, NLP and Machine Learning.
