Performance of various open source GBM implementations (h2o, xgboost, lightgbm) on the airline dataset (1M and 10M records). If you don't have a GPU, lightgbm (CPU) trains the fastest.machine-learning gradient-boosting-machine gbm h2oai xgboost lightgbm benchmark
Monotonicity constraints can turn opaque, complex models into transparent, and potentially regulator-approved models, by ensuring predictions only increase or only decrease for any change in a given input variable. In this notebook, I will demonstrate how to use monotonicity constraints in the popular open source gradient boosting package XGBoost to train a simple, accurate, nonlinear classifier on the UCI credit card default data. Once we have trained a monotonic XGBoost model, we will use partial dependence plots and individual conditional expectation (ICE) plots to investigate the internal mechanisms of the model and to verify its monotonic behavior. Partial dependence plots show us the way machine-learned response functions change based on the values of one or two input variables of interest, while averaging out the effects of all other input variables. ICE plots can be used to create more localized descriptions of model predictions, and ICE plots pair nicely with partial dependence plots. An example of generating regulator mandated reason codes from high fidelity Shapley explanations for any model prediction is also presented. The combination of monotonic XGBoost, partial dependence, ICE, and Shapley explanations is likely the most direct way to create an interpretable machine learning model today.machine-learning fatml xai gradient-boosting-machine decision-tree data-science fairness interpretable-machine-learning interpretability machine-learning-interpretability iml accountability transparency data-mining interpretable-ml interpretable interpretable-ai lime h2o
The goal of this repo is to study the impact of having one dataset/sample ("the dataset") when training and tuning machine learning models in practice (or in competitions) on the prediction accuracy on new data (that usually comes from a slightly different distribution due to non-stationarity). To keep things simple we focus on binary classification, use only one source dataset with mix of numeric and categorical features and no missing values, we don't perform feature engineering, tune only GBMs with lightgbm and random hyperparameter search (might also ensemble the best models later), and we use only AUC as a measure of accuracy.machine-learning gradient-boosting-machine gbm hyperparameter-optimization overfitting
We have large collection of open source products. Follow the tags from
Tag Cloud >>
Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.