
tpot - A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming


Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data.
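
A minimal usage sketch, assuming the classic TPOT API and a scikit-learn-style dataset (the parameter values here are only illustrative):

```python
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    *load_digits(return_X_y=True), test_size=0.25, random_state=42)

# Search over pipelines with genetic programming; sizes kept small for a quick demo.
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_digits_pipeline.py')  # write the best found pipeline as a standalone script
```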

benchm-ml - A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib, etc.)


This project aims at a minimal benchmark for scalability, speed and accuracy of commonly used implementations of a few machine learning algorithms. The target of this study is binary classification with numeric and categorical inputs (of limited cardinality, i.e. not very sparse) and no missing data, perhaps the most common problem in business applications (e.g. credit scoring, fraud detection or churn prediction). If the input matrix is of size n x p, n is varied as 10K, 100K, 1M, 10M, while p is ~1K (after expanding the categoricals into dummy variables / one-hot encoding). This particular type of data structure/size (the largest) stems from this author's interest in some particular business applications. Note: While a large part of this benchmark was done in Spring 2015, reflecting the state of ML implementations at that time, this repo is updated whenever I see significant changes in implementations or when new implementations become widely available (e.g. lightgbm). Also, please find a summary of the progress and learnings from this benchmark at the end of this repo.
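
As a rough illustration of the data shape described above (the column names and cardinalities are hypothetical; pandas is used for the dummy expansion):

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking the benchmark setup: numeric plus
# limited-cardinality categorical inputs and a binary target.
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "num1": rng.normal(size=n),
    "num2": rng.uniform(size=n),
    "cat1": rng.choice([f"a{i}" for i in range(30)], size=n),
    "cat2": rng.choice([f"b{i}" for i in range(500)], size=n),
    "y":    rng.integers(0, 2, size=n),
})

# One-hot encode the categoricals; p grows to roughly the sum of the cardinalities.
X = pd.get_dummies(df.drop(columns="y"), columns=["cat1", "cat2"])
print(X.shape)  # (100000, ~532) -- in the benchmark, p is ~1K after expansion
```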

useR-machine-learning-tutorial - useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive http://user2016


Instructions for how to install the necessary software for this tutorial are available here. Data for the tutorial can be downloaded by running ./data/get-data.sh (requires wget). Certain algorithms don't scale well when there are millions of features. For example, decision trees require computing some sort of metric (to determine the splits) on all the feature values (or some fraction of the values, as in Random Forest and Stochastic GBM). Therefore, computation time is linear in the number of features. Other algorithms, such as GLM, scale much better to high-dimensional (n << p) and wide data with appropriate regularization (e.g. Lasso, Elastic Net, Ridge).
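
A hedged scikit-learn sketch of the contrast drawn above, on synthetic wide data (n << p); the model settings are illustrative, not tuned:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Wide data: far more features than samples.
X, y = make_classification(n_samples=200, n_features=5000, n_informative=20, random_state=0)

# Tree ensembles evaluate split metrics over (a fraction of) all features,
# so their cost grows with the number of features.
rf = RandomForestClassifier(n_estimators=100, random_state=0)

# A regularized GLM (here Lasso-penalized logistic regression) handles n << p more gracefully.
glm = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)

for name, model in [("random forest", rf), ("L1 logistic regression", glm)]:
    print(name, cross_val_score(model, X, y, cv=3).mean())
```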




scoruby - Ruby Scoring API for PMML


Ruby scoring API for Predictive Model Markup Language (PMML). Currently supports Decision Tree, Random Forest, Naive Bayes, and Gradient Boosted Models.

random-forest-classifier - A random forest classifier in JavaScript.


A random forest classifier. A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. Modeled after scikit-learn's RandomForestClassifier.
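
For reference, the scikit-learn estimator it is modeled after is used like this (a minimal example on a toy dataset):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), random_state=0)

# Each tree is fit on a bootstrap sub-sample; predictions are averaged across trees.
clf = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```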

decision-tree-js - Small JavaScript implementation of ID3 Decision tree


Small JavaScript implementation of algorithms for training Decision Tree and Random Forest classifiers.

edarf - exploratory data analysis using random forests


Functions useful for exploratory data analysis using random forests. This package extends the functionality of random forests fit by party (multivariate, regression, and classification), randomForestSRC (regression and classification), randomForest (regression and classification), and ranger (classification and regression).


receiptdID - Receipt.ID: a multi-label, multi-class, hierarchical classification system


Receipt.ID is a multi-label, multi-class, hierarchical classification system. It trains individual Random Forest text-based classifiers and combines the results with other features. Receipt.ID is built to scale with an application as the taxonomy for the domain in which it is applied grows. The data preprocessing code is provided in the notebook receiptID_1_Data_Preprocessing.ipynb, while the modeling code is provided in receiptID_2_Model.ipynb.
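
A hedged sketch of the general pattern (not the repository's code): a text field vectorized with TF-IDF and combined with other features before a Random Forest; all column names and data here are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical receipt-line data: free text plus a numeric feature and a category label.
df = pd.DataFrame({
    "description": ["organic bananas", "paper towels 6pk", "whole milk 1l", "laundry detergent"],
    "price":       [1.99, 7.49, 2.49, 11.99],
    "category":    ["produce", "household", "dairy", "household"],
})

preprocess = ColumnTransformer([
    ("text", TfidfVectorizer(), "description"),  # text branch
    ("num", "passthrough", ["price"]),           # other features
])

model = Pipeline([
    ("features", preprocess),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
model.fit(df[["description", "price"]], df["category"])
print(model.predict(df[["description", "price"]]))
```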

cl-random-forest - Random forest in Common Lisp


Cl-random-forest is an implementation of Random Forest for multiclass classification and univariate regression written in Common Lisp. It also includes an implementation of Global Refinement of Random Forest (Ren, Cao, Wei and Sun, “Global Refinement of Random Forest”, CVPR 2015). This refinement makes it faster and more accurate than a standard Random Forest. A dataset consists of a target vector and an input data matrix. For classification, the target vector should be a fixnum simple-vector and the data matrix should be a 2-dimensional double-float array in which each row corresponds to one datum. Note that the targets are integers starting from 0. For example, a valid dataset for 4-class classification with 2-dimensional input consists of a target vector with values in 0-3 and an n x 2 double-float matrix.
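
A NumPy analogue of that layout (not the library's Common Lisp API), just to make the shapes concrete:

```python
import numpy as np

# 4-class classification with 2-dimensional input:
# targets are integers 0..3, inputs are an n x 2 matrix of doubles.
target = np.array([0, 1, 2, 3, 2, 1, 0, 3])              # analogue of the fixnum simple-vector
datamatrix = np.array([[-1.0, -2.0], [1.0, -2.0], [-1.0, 2.0], [1.0, 2.0],
                       [-2.0, 4.0], [2.0, -4.0], [-3.0, -6.0], [3.0, 6.0]],
                      dtype=np.float64)                   # analogue of the 2-d double-float array
assert datamatrix.shape == (len(target), 2)
```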

infiniteboost - InfiniteBoost: building infinite ensembles with gradient descent


InfiniteBoost is an approach to building ensembles which combines the best sides of random forest and gradient boosting. Trees in the ensemble take into account the mistakes made by previous trees (as in gradient boosting), but, due to a modified scheme of weighting their contributions, the ensemble converges to a limit and thus avoids overfitting (just as a random forest does).
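
A toy sketch of that idea for squared-error regression (my own simplification with a fixed rather than learned ensemble capacity, not the repository's code):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def infiniteboost_sketch(X, y, n_trees=100, capacity=2.0, max_depth=3):
    """Fit each tree to the current residuals (as in gradient boosting), but
    predict with a capped *average* of all trees, so the ensemble converges
    to a limit as trees are added (as in a random forest)."""
    trees = []
    sum_outputs = np.zeros(len(y))
    pred = np.zeros(len(y))
    for k in range(1, n_trees + 1):
        residual = y - pred                               # negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        trees.append(tree)
        sum_outputs += tree.predict(X)
        pred = capacity * sum_outputs / k                 # capped average of tree outputs
    return trees
```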

emtrees - Tree-based machine learning classifiers for embedded systems


Tree-based machine learning classifiers for microcontrollers and embedded systems. Train in Python, then run inference on any device with C support.

Flower-Recognition - Image recognition tool for flower classification.


Application of computer vision techniques to extract useful data for the machine learning algorithms in order to build a flower classifier. Made by Braulio Vargas López and Marta Gómez Macías for our university course on Computer Vision at the University of Granada.

FIFA-World-Cup-Prediction - Predict who will win the FIFA World Cup 2018


Data: The data are assembled from multiple sources; most of them come from Kaggle, while others come from the FIFA website / EA games. The feature list reflects those factors.

AdaptiveRandomForest - Repository for the AdaptiveRandomForest algorithm implemented in MOA 2016-04


Massive On-line Analysis (MOA) is an environment for massive data mining. MOA provides a framework for data stream mining and includes tools for evaluation and a collection of machine learning algorithms. It is related to the WEKA project and is also written in Java, while scaling to more demanding problems.

2018-MachineLearning-Lectures-ESA - Machine Learning Lectures at the European Space Agency (ESA) in 2018


In 2018, the European Space Agency (ESA) organized a series of 6 lectures on Machine Learning at the European Space Operations Centre (ESOC). This repository contains the lecture resources: presentations, notebooks and links to the videos (presentation and hands-on).

reproduce-stock-market-direction-random-forests - Reproduce research from paper "Predicting the direction of stock market prices using random forest"


This is my attempt to reproduce this paper. Along the way I found that the results I got are much worse than those of the authors, and I wonder whether the authors accidentally had a data leakage issue. Please let me know if you notice any mistake in the analysis / code or if you feel there is something I misunderstood.
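
One common source of leakage in this kind of study is evaluating on randomly shuffled time-ordered samples. A hedged sketch of a walk-forward evaluation with scikit-learn that avoids this (the features and target here are placeholders, not the paper's):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit

# X: features computed only from information available up to day t,
# y: direction of the subsequent price move (both stand-ins here).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)

scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    # Train strictly on the past, test strictly on the future fold.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))
print(np.mean(scores))
```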