Performance of various open source GBM implementations (h2o, xgboost, lightgbm) on the airline dataset (1M and 10M records). If you don't have a GPU, lightgbm (CPU) trains the fastest.
Tags: machine-learning gradient-boosting-machine gbm h2oai xgboost lightgbm benchmark
The goal of this repo is to study how relying on a single dataset/sample ("the dataset") when training and tuning machine learning models in practice (or in competitions) affects prediction accuracy on new data, which usually comes from a slightly different distribution due to non-stationarity. To keep things simple, we focus on binary classification; use a single source dataset with a mix of numeric and categorical features and no missing values; perform no feature engineering; tune only GBMs, using lightgbm and random hyperparameter search (we might also ensemble the best models later); and use AUC as the only measure of accuracy.
Tags: machine-learning gradient-boosting-machine gbm hyperparameter-optimization overfitting
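As a rough illustration of the setup described above (random hyperparameter search scored by AUC), here is a minimal pure-Python sketch. It is not the repo's code. The search ranges, the helper names `sample_params`, `auc`, and `random_search`, and the `score_fn` callback are all assumptions made for this example; in a real run `score_fn` would train a lightgbm model with the sampled parameters and return its AUC on held-out data.

```python
import random


def sample_params(rng):
    """Draw one random configuration.

    The keys are lightgbm parameter names; the ranges are illustrative
    assumptions, not values taken from the repo.
    """
    return {
        "num_leaves": rng.choice([31, 63, 127, 255, 511, 1023]),
        "learning_rate": 10 ** rng.uniform(-2, -1),  # log-uniform in [0.01, 0.1]
        "min_data_in_leaf": rng.choice([5, 10, 20, 50, 100]),
        "feature_fraction": rng.uniform(0.6, 1.0),
        "bagging_fraction": rng.uniform(0.6, 1.0),
    }


def auc(y_true, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formula.

    Labels must be 0/1. Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the end of the current group of tied scores.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        pos_in_group = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * pos_in_group
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def random_search(score_fn, n_trials, seed=0):
    """Evaluate n_trials random configurations and keep the best one.

    score_fn is a hypothetical callback: it takes a parameter dict and
    returns a score to maximize (e.g. validation AUC).
    """
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = sample_params(rng)
        score = score_fn(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params
```

Scoring the selected configuration on data from a later time period than the tuning data, rather than on a random split of the same sample, is what would expose the gap the description refers to.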