tpot - A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming

  •    Python

Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data.

benchm-ml - A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc

  •    R

This project aims at a minimal benchmark for scalability, speed and accuracy of commonly used implementations of a few machine learning algorithms. The target of this study is binary classification with numeric and categorical inputs (of limited cardinality, i.e. not very sparse) and no missing data, perhaps the most common problem in business applications (e.g. credit scoring, fraud detection or churn prediction). If the input matrix is n x p, n is varied as 10K, 100K, 1M, 10M, while p is ~1K (after expanding the categoricals into dummy variables/one-hot encoding). This particular type of data structure/size (the largest) stems from this author's interest in some particular business applications. Note: While a large part of this benchmark was done in Spring 2015, reflecting the state of ML implementations at that time, this repo is being updated if I see significant changes in implementations or if new implementations have become widely available (e.g. lightgbm). Also, please find a summary of the progress and learnings from this benchmark at the end of this repo.
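The expansion of categoricals into dummy variables mentioned above can be sketched in plain Python. This is a minimal illustration, not the benchmark's own code; the function name, column names and values are invented for the example.

```python
def one_hot_encode(rows, categorical_columns):
    """Expand each categorical column into one 0/1 dummy column per level."""
    # Collect the sorted set of levels seen in each categorical column.
    levels = {
        col: sorted({row[col] for row in rows})
        for col in categorical_columns
    }
    encoded = []
    for row in rows:
        out = {}
        for col, value in row.items():
            if col in levels:
                # One dummy column per level, named "column=level".
                for level in levels[col]:
                    out[f"{col}={level}"] = 1 if value == level else 0
            else:
                # Numeric columns pass through unchanged.
                out[col] = value
        encoded.append(out)
    return encoded


# Hypothetical toy data: one numeric input and two categorical inputs.
rows = [
    {"amount": 120.0, "channel": "web", "region": "north"},
    {"amount": 80.5, "channel": "store", "region": "south"},
    {"amount": 42.0, "channel": "web", "region": "east"},
]
encoded = one_hot_encode(rows, ["channel", "region"])
print(len(rows[0]), "columns before,", len(encoded[0]), "columns after")
```

In this toy case p grows from 3 to 6 columns (1 numeric, 2 channel levels, 3 region levels); the same mechanism is what pushes p to roughly 1K in the benchmark.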

grt - gesture recognition toolkit

  •    C++

The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source, C++ machine learning library designed for real-time gesture recognition. Classification: Adaboost, Decision Tree, Dynamic Time Warping, Gaussian Mixture Models, Hidden Markov Models, k-nearest neighbor, Naive Bayes, Random Forests, Support Vector Machine, Softmax, and more...

useR-machine-learning-tutorial - useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive http://user2016

  •    Jupyter

Instructions for how to install the necessary software for this tutorial are available here. Data for the tutorial can be downloaded by running ./data/get-data.sh (requires wget). Certain algorithms don't scale well when there are millions of features. For example, decision trees require computing some sort of metric (to determine the splits) on all the feature values (or some fraction of the values, as in Random Forest and Stochastic GBM). Therefore, computation time is linear in the number of features. Other algorithms, such as GLM, scale much better to high-dimensional (n << p) and wide data with appropriate regularization (e.g. Lasso, Elastic Net, Ridge).
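To make the scaling argument concrete, here is a rough sketch of an exhaustive split search for a single tree node using Gini impurity. The function names and toy data are made up for illustration, and real implementations use sorting or histograms rather than this naive scan, but the outer loop shows why the work per node grows linearly with the number of features.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(X, y):
    """Return (feature index, threshold, weighted impurity) of the best split."""
    n_rows, n_features = len(X), len(X[0])
    best = (None, None, float("inf"))
    for j in range(n_features):  # one pass per feature
        for threshold in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n_rows) if X[i][j] <= threshold]
            right = [y[i] for i in range(n_rows) if X[i][j] > threshold]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n_rows
            if score < best[2]:
                best = (j, threshold, score)
    return best


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = [0, 0, 1, 1]
feature, threshold, impurity = best_split(X, y)
print(feature, threshold, impurity)  # prints: 0 2.0 0.0
```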

NowTrade - Algorithmic trading library with a focus on creating powerful strategies

  •    Python

NowTrade is an algorithmic trading library with a focus on creating powerful strategies using easily readable and simple Python code. With the help of NowTrade, full-blown stock/currency trading strategies, harnessing the power of machine learning, can be implemented with a few lines of code. NowTrade strategies are not event driven like most other algorithmic trading libraries available. The strategies are implemented in a sequential manner (one line at a time) without worrying about events, callbacks, or object overloading.

decision-forests - A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras

  •    Python

TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models. The library is a collection of Keras models and supports classification, regression and ranking. TF-DF is a TensorFlow wrapper around the Yggdrasil Decision Forests C++ libraries. Models trained with TF-DF are compatible with Yggdrasil Decision Forests' models, and vice versa.

scoruby - Ruby Scoring API for PMML

  •    Ruby

Ruby scoring API for Predictive Model Markup Language (PMML). Currently supports Decision Tree, Random Forest, Naive Bayes and Gradient Boosted Models.

random-forest-classifier - A random forest classifier in Javascript.

  •    Javascript

A random forest classifier. A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. Modeled after scikit-learn's RandomForestClassifier.
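The description above (many trees fit on sub-samples, then averaged) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not this project's or scikit-learn's implementation: each "tree" is a one-split stump, labels are 0/1, and the per-split feature sub-sampling that real random forests use is omitted. All function names and the toy data are hypothetical.

```python
import random


def fit_stump(X, y):
    """Fit a one-split tree: pick the (feature, threshold, label) with fewest errors."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            for left_label in (0, 1):
                # Predict left_label at or below the threshold, the other label above.
                errors = sum(
                    (left_label if row[j] <= threshold else 1 - left_label) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, threshold, left_label)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left_label = stump
    return left_label if row[j] <= threshold else 1 - left_label


def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest


def predict_proba(forest, row):
    """Average the individual tree votes into a class-1 probability."""
    return sum(predict_stump(stump, row) for stump in forest) / len(forest)


# Hypothetical toy data with two well-separated classes.
X = [[0.1, 1.0], [0.4, 0.8], [0.5, 0.9], [2.1, 0.2], [2.5, 0.1], [3.0, 0.3]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_proba(forest, [0.2, 0.9]), predict_proba(forest, [2.8, 0.2]))
```

Averaging the votes is what dampens the variance of any single tree: a stump trained on an unlucky bootstrap sample is outvoted by the rest.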

decision-tree-js - Small JavaScript implementation of ID3 Decision tree

  •    Javascript

A small JavaScript implementation of algorithms for training Decision Tree and Random Forest classifiers.
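The project title names ID3, which chooses the attribute to split on at each node by information gain. A minimal sketch of that selection step follows, written in Python rather than the project's JavaScript; it is not the library's code, and the function names and toy data are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, labels, attribute):
    """Entropy reduction from partitioning the rows on one categorical attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


def choose_attribute(rows, labels, attributes):
    """Pick the attribute with the highest information gain, as ID3 does per node."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: "outlook" determines the label, "windy" is uninformative.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
print(choose_attribute(rows, labels, ["outlook", "windy"]))  # prints: outlook
```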

edarf - exploratory data analysis using random forests

  •    R

Functions useful for exploratory data analysis using random forests. This package extends the functionality of random forests fit by party (multivariate, regression, and classification), randomForestSRC (regression and classification), randomForest (regression and classification), and ranger (classification and regression).

receiptdID - Receipt

  •    Jupyter

Receipt.ID is a multi-label, multi-class, hierarchical classification system. It trains individual Random Forest text-based classifiers and combines the result with other features. Receipt.ID is built to scale with an application as the taxonomy for the domain in which it is applied grows. The data preprocessing code is provided in the notebook receiptID_1_Data_Preprocessing.ipynb, while the modeling code is provided in the notebook receiptID_2_Model.ipynb.

cl-random-forest - Random forest in Common Lisp

  •    Common Lisp

Cl-random-forest is an implementation of Random Forest for multiclass classification and univariate regression written in Common Lisp. It also includes an implementation of Global Refinement of Random Forest (Ren, Cao, Wei and Sun. “Global Refinement of Random Forest” CVPR2015). This refinement makes it faster and more accurate than standard Random Forest. A dataset consists of a target vector and an input data matrix. For classification, the target vector should be a fixnum simple-vector and the data matrix should be a 2-dimensional double-float array in which each row corresponds to one datum. Note that the target is an integer starting from 0. For example, the following dataset is valid for 4-class classification with 2-dimensional input.

infiniteboost - InfiniteBoost: building infinite ensembles with gradient descent

  •    Jupyter

InfiniteBoost is an approach to building ensembles which combines the best aspects of random forest and gradient boosting. Trees in the ensemble account for mistakes made by previous trees (as in gradient boosting), but due to a modified scheme for accounting for contributions, the ensemble converges to a limit, thus avoiding overfitting (just as random forest does).

emtrees - Tree-based machine learning classifiers for embedded systems

  •    Python

Tree-based machine learning classifiers for microcontroller and embedded systems. Train in Python, then do inference on any device with support for C.
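One common way to realize "train in Python, infer in C" is to generate C source from the trained tree. The sketch below illustrates that general idea only; it is not emtrees's actual API or output format, and the nested-dict tree layout, the function names and the example tree are all assumptions made for this example.

```python
def tree_to_c(node, indent=1):
    """Render a nested-dict decision tree as C if/else statements."""
    pad = "    " * indent
    if "label" in node:  # leaf node: emit the predicted class
        return f"{pad}return {node['label']};\n"
    return (
        f"{pad}if (features[{node['feature']}] <= {float(node['threshold'])}f) {{\n"
        + tree_to_c(node["left"], indent + 1)
        + f"{pad}}} else {{\n"
        + tree_to_c(node["right"], indent + 1)
        + f"{pad}}}\n"
    )


def generate_c(tree, name="predict"):
    """Wrap the rendered tree in a C function taking a float feature array."""
    return f"int {name}(const float *features) {{\n" + tree_to_c(tree) + "}\n"


# Hypothetical tree, as if produced by a training step in Python.
tree = {
    "feature": 0,
    "threshold": 2.5,
    "left": {"label": 0},
    "right": {
        "feature": 1,
        "threshold": 0.5,
        "left": {"label": 1},
        "right": {"label": 0},
    },
}
code = generate_c(tree)
print(code)
```

The generated function uses only comparisons and returns, so it needs no runtime library or dynamic memory on the target device.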

YouTube-Like-predictor - YouTube Like Count Predictions using Machine Learning

  •    Jupyter

This is a tool for predicting the like count of YouTube videos. A Random Forest model was trained on a large dataset of ~350,000 videos. Feature engineering, data cleaning, data selection and many other techniques were used for this task. Report.pdf contains a detailed explanation of the different steps and techniques that were used for this task.
