This is the repository for D-Lab’s Introduction to Machine Learning in R workshop.
Tags: machine-learning dlab-berkeley tutorial knn random-forest gradient-boosting-machine superlearner decision-trees
The goal of this repo is to study how having only one dataset/sample ("the dataset") for training and tuning machine learning models in practice (or in competitions) affects prediction accuracy on new data, which usually comes from a slightly different distribution due to non-stationarity. To keep things simple, we focus on binary classification and use only one source dataset, with a mix of numeric and categorical features and no missing values. We don't perform feature engineering, we tune only GBMs (using lightgbm with random hyperparameter search; we might also ensemble the best models later), and we use only AUC as the measure of accuracy.
Tags: machine-learning gradient-boosting-machine gbm hyperparameter-optimization overfitting
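A minimal sketch of the tuning loop described above, written in Python with only the standard library. It is not the repository's own code. `auc` computes AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties count one half), and `random_search` draws hyperparameter sets at random and ranks them by a score. The search space in `sample_params` is hypothetical: the parameter names follow LightGBM's naming, but the ranges are illustrative assumptions. `evaluate` is a hypothetical callback that would train a lightgbm model on the training part of the dataset and return its AUC on a held-out part.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sample_params(rng: random.Random) -> Dict[str, float]:
    # Hypothetical search space; names follow LightGBM parameter names, ranges are illustrative.
    return {
        "num_leaves": rng.choice([15, 31, 63, 127]),
        "learning_rate": 10 ** rng.uniform(-2.5, -0.5),  # log-uniform, roughly 0.003 to 0.3
        "min_data_in_leaf": rng.choice([10, 20, 50, 100]),
        "feature_fraction": rng.uniform(0.5, 1.0),
    }


def random_search(
    evaluate: Callable[[Dict[str, float]], float], n_trials: int, seed: int = 0
) -> List[Tuple[float, Dict[str, float]]]:
    """Score n_trials random hyperparameter sets and return them best-first."""
    rng = random.Random(seed)  # fixed seed so a search can be repeated exactly
    results = []
    for _ in range(n_trials):
        params = sample_params(rng)
        results.append((evaluate(params), params))
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

Comparing the top-ranked candidates' AUC on the held-out part of the dataset with their AUC on genuinely new data is the kind of gap the repo sets out to measure. The pairwise loop in `auc` is O(P·N), which is fine for a sketch; a rank-based formula scales better on large samples.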