
tpot - A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming

  •    Python

Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data.

benchm-ml - A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc

  •    R

This project aims at a minimal benchmark for scalability, speed and accuracy of commonly used implementations of a few machine learning algorithms. The target of this study is binary classification with numeric and categorical inputs (of limited cardinality, i.e. not very sparse) and no missing data, perhaps the most common problem in business applications (e.g. credit scoring, fraud detection or churn prediction). If the input matrix is n x p, n is varied as 10K, 100K, 1M, 10M, while p is ~1K (after expanding the categoricals into dummy variables/one-hot encoding). This particular type of data structure/size (the largest) stems from this author's interest in some particular business applications. Note: While a large part of this benchmark was done in Spring 2015, reflecting the state of ML implementations at that time, this repo is updated when I see significant changes in implementations or when new implementations become widely available (e.g. lightgbm). Also, please find a summary of the progress and learnings from this benchmark at the end of this repo.
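As a rough illustration of why p grows once categoricals are expanded into dummy variables, here is a minimal Python sketch of one-hot encoding. The function name `one_hot_expand` and the toy columns are hypothetical and are not taken from the benchmark itself.

```python
def one_hot_expand(rows, categorical_cols):
    """Expand the given categorical columns into 0/1 dummy columns.

    rows: list of dicts, one per observation.
    categorical_cols: names of the columns to expand.
    Returns (header, matrix), where matrix is a list of numeric rows.
    """
    # Collect the sorted set of levels observed for each categorical column.
    levels = {c: sorted({r[c] for r in rows}) for c in categorical_cols}
    numeric_cols = [c for c in rows[0] if c not in categorical_cols]

    # Numeric columns come first, then one dummy column per (column, level).
    header = list(numeric_cols)
    for c in categorical_cols:
        header += [f"{c}={lvl}" for lvl in levels[c]]

    matrix = []
    for r in rows:
        out = [float(r[c]) for c in numeric_cols]
        for c in categorical_cols:
            out += [1.0 if r[c] == lvl else 0.0 for lvl in levels[c]]
        matrix.append(out)
    return header, matrix


# Toy data (made up for illustration): 1 numeric and 2 categorical inputs.
toy_rows = [
    {"amount": 120.0, "channel": "web", "region": "north"},
    {"amount": 35.5, "channel": "store", "region": "south"},
    {"amount": 80.0, "channel": "phone", "region": "north"},
]
header, matrix = one_hot_expand(toy_rows, ["channel", "region"])
print(len(toy_rows[0]), "raw columns ->", len(header), "columns after encoding")
```

Each categorical column contributes one output column per distinct level, which is how a modest number of raw inputs can reach a p on the order of 1K.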




grt - gesture recognition toolkit

  •    C++

The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source, C++ machine learning library designed for real-time gesture recognition. Classification: Adaboost, Decision Tree, Dynamic Time Warping, Gaussian Mixture Models, Hidden Markov Models, k-nearest neighbor, Naive Bayes, Random Forests, Support Vector Machine, Softmax, and more...

useR-machine-learning-tutorial - useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive http://user2016

  •    Jupyter

Instructions for how to install the necessary software for this tutorial are available here. Data for the tutorial can be downloaded by running ./data/get-data.sh (requires wget). Certain algorithms don't scale well when there are millions of features. For example, decision trees require computing some sort of metric (to determine the splits) on all the feature values (or some fraction of the values as in Random Forest and Stochastic GBM). Therefore, computation time is linear in the number of features. Other algorithms, such as GLM, scale much better to high-dimensional (n << p) and wide data with appropriate regularization (e.g. Lasso, Elastic Net, Ridge).
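The scaling argument above can be made concrete with a small Python sketch of an exhaustive split search. It is an illustration under stated assumptions, not code from the tutorial: the names `gini` and `best_split`, the Gini impurity criterion, the binary 0/1 labels and the toy data are all chosen here for the example.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 2.0 * p1 * (1.0 - p1)


def best_split(X, y, feature_indices=None):
    """Search candidate splits and return ((feature, threshold, score), n_features_scanned).

    The outer loop visits every candidate feature once, so the work grows
    linearly with the number of features considered.
    """
    if feature_indices is None:
        feature_indices = range(len(X[0]))
    best = (None, None, float("inf"))
    scanned = 0
    for j in feature_indices:
        scanned += 1
        # Try each distinct value of feature j as a threshold.
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[j] <= t]
            right = [y[i] for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (j, t, score)
    return best, scanned


# Toy data (made up): 4 rows, 3 features; feature 0 separates the classes.
X = [[1.0, 5.0, 0.3], [2.0, 4.0, 0.9], [3.0, 6.0, 0.1], [4.0, 3.0, 0.7]]
y = [0, 0, 1, 1]

full_best, full_scanned = best_split(X, y)             # scans all 3 features
sub_best, sub_scanned = best_split(X, y, [1, 2])       # scans a subset, as Random Forest does
print(full_best, full_scanned, sub_scanned)
```

Passing a subset through `feature_indices` mimics the per-split feature sampling mentioned for Random Forest: fewer features scanned, proportionally less work.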

scoruby - Ruby Scoring API for PMML

  •    Ruby

Ruby scoring API for Predictive Model Markup Language (PMML). Currently supports Decision Tree, Random Forest, Naive Bayes and Gradient Boosted Models.


random-forest-classifier - A random forest classifier in Javascript.

  •    Javascript

A random forest classifier. A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. Modeled after scikit-learn's RandomForestClassifier.
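The idea of fitting trees on sub-samples and combining them can be sketched in a few lines of Python. This is a simplified illustration, not the project's code: one-level trees (stumps) stand in for full decision trees, majority voting stands in for probability averaging, and all names (`fit_stump`, `fit_forest`, `predict_forest`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-level tree: the (feature, threshold) pair with the fewest training errors."""
    majority = Counter(y).most_common(1)[0][0]
    # Fallback when no useful split exists: always predict the majority label.
    best = {"feature": None, "threshold": None, "left": majority, "right": majority}
    best_errors = sum(1 for label in y if label != majority)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[j] <= t]
            right = [y[i] for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if errors < best_errors:
                best_errors = errors
                best = {"feature": j, "threshold": t, "left": l_lab, "right": r_lab}
    return best


def predict_stump(stump, row):
    if stump["feature"] is None:
        return stump["left"]
    return stump["left"] if row[stump["feature"]] <= stump["threshold"] else stump["right"]


def fit_forest(X, y, n_trees=25, seed=0):
    """Fit each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """Combine the individual predictions by majority vote."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]


# Toy data (made up): one feature, two well-separated classes.
X = [[0.0], [1.0], [2.0], [3.0], [7.0], [8.0], [9.0], [10.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [0.0]), predict_forest(forest, [10.0]))
```

Because every stump sees a different resample, individual stumps place their thresholds differently; combining many of them smooths out those differences, which is the variance-reduction effect the description refers to.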

decision-tree-js - Small JavaScript implementation of ID3 Decision tree

  •    Javascript

Small JavaScript implementation of an algorithm for training Decision Tree and Random Forest classifiers.
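For readers unfamiliar with ID3, the following Python sketch shows the attribute-selection step as the algorithm is usually described: pick the attribute with the highest information gain. It is not taken from the decision-tree-js source; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by partitioning the rows on one categorical attribute."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3 splits on the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Toy data (made up): "outlook" determines the label, "windy" carries no signal.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
print(best_attribute(rows, labels, ["outlook", "windy"]))
```

A full ID3 implementation would apply this selection recursively to each partition until the labels in a node are pure or no attributes remain.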

edarf - exploratory data analysis using random forests

  •    R

Functions useful for exploratory data analysis using random forests. This package extends the functionality of random forests fit by party (multivariate, regression, and classification), randomForestSRC (regression and classification), randomForest (regression and classification), and ranger (classification and regression).

receiptdID - Receipt

  •    Jupyter

Receipt.ID is a multi-label, multi-class, hierarchical classification system. It trains individual Random Forest text-based classifiers and combines the result with other features. Receipt.ID is built to scale with an application as the taxonomy for the domain in which it is applied grows. The data preprocessing code is provided in the notebook receiptID_1_Data_Preprocessing.ipynb, while the modeling code is provided in the notebook receiptID_2_Model.ipynb.

cl-random-forest - Random forest in Common Lisp

  •    Common Lisp

Cl-random-forest is an implementation of Random Forest for multiclass classification and univariate regression written in Common Lisp. It also includes an implementation of Global Refinement of Random Forest (Ren, Cao, Wei and Sun. “Global Refinement of Random Forest” CVPR2015). This refinement makes it faster and more accurate than standard Random Forest. A dataset consists of a target vector and an input data matrix. For classification, the target vector should be a fixnum simple-vector and the data matrix should be a 2-dimensional double-float array in which each row corresponds to one datum. Note that the target is an integer starting from 0. For example, the following dataset is valid for 4-class classification with 2-dimensional input.

infiniteboost - InfiniteBoost: building infinite ensembles with gradient descent

  •    Jupyter

InfiniteBoost is an approach to building ensembles which combines the best sides of random forest and gradient boosting. Trees in the ensemble account for mistakes made by previous trees (as in gradient boosting), but due to a modified scheme of accounting for contributions the ensemble converges to a limit, thus avoiding overfitting (just as random forest does).

emtrees - Tree-based machine learning classifiers for embedded systems

  •    Python

Tree-based machine learning classifiers for microcontroller and embedded systems. Train in Python, then do inference on any device with support for C.

Flower-Recognition - Image recognition tool for flower classification.

  •    Python

Application of computer vision techniques to get useful data for the machine learning algorithms in order to create a flower classifier. Made by Braulio Vargas López and Marta Gómez Macías for our university subject Computer Vision at University of Granada.

FIFA-World-Cup-Prediction - Predict who will win the FIFA World Cup 2018

  •    Jupyter

Data: Data are assembled from multiple sources. Most come from Kaggle; others come from the FIFA website and EA games. The feature list reflects those factors.