Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data.
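A minimal usage sketch of the kind of run TPOT supports (the dataset and the small search budget here are illustrative, not taken from the project's documentation):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

# Any numeric feature matrix and labels work; digits is just an example dataset.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, random_state=42)

# Search over pipelines with genetic programming; tiny budget kept for illustration.
tpot = TPOTClassifier(generations=5, population_size=20, random_state=42, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

# Export the best pipeline found as plain scikit-learn code.
tpot.export("best_pipeline.py")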
machine-learning data-science automl automation scikit-learn hyperparameter-optimization model-selection parameter-tuning automated-machine-learning random-forest gradient-boosting feature-engineering xgboost genetic-programming

This project aims at a minimal benchmark for the scalability, speed and accuracy of commonly used implementations of a few machine learning algorithms. The target of this study is binary classification with numeric and categorical inputs (of limited cardinality, i.e. not very sparse) and no missing data, perhaps the most common problem in business applications (e.g. credit scoring, fraud detection or churn prediction). If the input matrix is of size n x p, n is varied as 10K, 100K, 1M, 10M, while p is ~1K (after expanding the categoricals into dummy variables/one-hot encoding). This particular type of data structure/size (the largest) stems from this author's interest in some particular business applications. Note: While a large part of this benchmark was done in Spring 2015, reflecting the state of ML implementations at that time, this repo is being updated if I see significant changes in implementations or if new implementations become widely available (e.g. lightgbm). Also, please find a summary of the progress and learnings from this benchmark at the end of this repo.
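A minimal sketch of the n x p design matrix described above, with categoricals expanded via one-hot encoding (the column names and cardinalities are illustrative assumptions, not the benchmark's actual data):

import numpy as np
import pandas as pd

# Illustrative raw table: numeric columns plus limited-cardinality categoricals.
n = 10_000  # the benchmark varies this as 10K, 100K, 1M, 10M
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "amount": rng.normal(size=n),
    "age": rng.integers(18, 90, size=n),
    "region": rng.choice([f"r{i}" for i in range(50)], size=n),
    "product": rng.choice([f"p{i}" for i in range(100)], size=n),
})

# One-hot encode the categoricals; p grows with the total category count (~1K in the benchmark).
X = pd.get_dummies(df, columns=["region", "product"])
print(X.shape)  # (n, numeric columns + dummy columns)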
machine-learning data-science r gradient-boosting-machine random-forest deep-learning xgboost h2o spark

Python code for common machine learning algorithms.
linear-regression polynomial-regression logistic-regression decision-trees random-forest svm svr knn-classification naive-bayes-classifier kmeans-clustering hierarchical-clustering pca lda xgboost-algorithm

The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source, C++ machine learning library designed for real-time gesture recognition. Classification: Adaboost, Decision Tree, Dynamic Time Warping, Gaussian Mixture Models, Hidden Markov Models, k-nearest neighbor, Naive Bayes, Random Forests, Support Vector Machine, Softmax, and more...
gesture-recognition grt machine-learning gesture-recognition-toolkit support-vector-machine random-forest kmeans dynamic-time-warping softmax linear-regression

Practice and tutorial-style notebooks covering a wide variety of machine learning techniques.
numpy statistics pandas matplotlib regression scikit-learn classification principal-component-analysis clustering decision-trees random-forest dimensionality-reduction neural-network deep-learning artificial-intelligence data-science machine-learning k-nearest-neighbours naive-bayes

Instructions for how to install the necessary software for this tutorial are available here. Data for the tutorial can be downloaded by running ./data/get-data.sh (requires wget). Certain algorithms don't scale well when there are millions of features. For example, decision trees require computing some sort of metric (to determine the splits) on all the feature values (or some fraction of the values, as in Random Forest and Stochastic GBM). Therefore, computation time is linear in the number of features. Other algorithms, such as GLM, scale much better to high-dimensional (n << p), wide data with appropriate regularization (e.g. Lasso, Elastic Net, Ridge).
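As an illustration of that last point, a minimal scikit-learn sketch (not part of the tutorial itself) of fitting an L1-regularized GLM on wide, n << p data:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Wide, high-dimensional data: far more features than rows (n << p).
rng = np.random.default_rng(0)
n, p = 200, 10_000
X = rng.normal(size=(n, p))
y = (X[:, :5].sum(axis=1) > 0).astype(int)  # only a few features actually matter

# Lasso-style (L1) regularization keeps the model sparse and tractable at this width.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
print("non-zero coefficients:", np.count_nonzero(clf.coef_))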
machine-learning deep-learning random-forest gradient-boosting-machine tutorial data-science ensemble-learning r

NowTrade is an algorithmic trading library with a focus on creating powerful strategies using easily readable and simple Python code. With the help of NowTrade, full-blown stock/currency trading strategies harnessing the power of machine learning can be implemented in a few lines of code. Unlike most other algorithmic trading libraries, NowTrade strategies are not event driven; they are implemented in a sequential manner (one line at a time) without worrying about events, callbacks, or object overloading.
trading technical-indicators neural-network random-forest stock currency algorithmic-trading-library machine-learning algorithmic-trading

I just built out v2 of this project, which now gives you analytics info from your models and is production-ready. machineJS is an amazing research project that clearly proved there's a hunger for automated machine learning. auto_ml tackles this exact same goal, but with more features, cleaner code, and the ability to be copy/pasted into production.
machine-learning data-science machine-learning-library machine-learning-algorithms ml data-scientists javascript-library scikit-learn kaggle numerai automated-machine-learning automl auto-ml neuralnet neural-network algorithms random-forest svm naive-bayes bagging optimization brainjs date-night sklearn ensemble data-formatting js xgboost scikit-neuralnetwork knn k-nearest-neighbors gridsearch gridsearchcv grid-search randomizedsearchcv preprocessing data-formatter kaggle-competition

TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models. The library is a collection of Keras models and supports classification, regression and ranking. TF-DF is a TensorFlow wrapper around the Yggdrasil Decision Forests C++ libraries. Models trained with TF-DF are compatible with Yggdrasil Decision Forests' models, and vice versa.
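A minimal sketch of training a TF-DF model from a pandas DataFrame, used like any other Keras model (the tiny DataFrame and its column names are illustrative):

import pandas as pd
import tensorflow_decision_forests as tfdf

# Illustrative labeled table; in practice load your own data.
train_df = pd.DataFrame({
    "feature_a": [0.1, 0.4, 0.3, 0.9],
    "feature_b": ["x", "y", "x", "y"],
    "label": [0, 1, 0, 1],
})

# Convert the DataFrame into a TensorFlow dataset understood by TF-DF.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="label")

# Train a random forest; TF-DF models are regular Keras models.
model = tfdf.keras.RandomForestModel()
model.fit(train_ds)
model.summary()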
machine-learning random-forest tensorflow keras ml decision-trees gradient-boosting interpretability decision-forest

Ruby scoring API for Predictive Model Markup Language (PMML). Currently supports Decision Tree, Random Forest, Naive Bayes and Gradient Boosted Models.
ruby-gem pmml random-forest classification rubyml machine-learning gradient-boosting-classifier gbm gradient-boosted-models decision-tree naive-bayes

A random forest classifier. A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. Modeled after scikit-learn's RandomForestClassifier.
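For reference, a minimal sketch of the scikit-learn RandomForestClassifier API this implementation is modeled after (shown in Python with an illustrative dataset, not the JavaScript library itself):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative dataset; any numeric feature matrix and labels work.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Fit an ensemble of decision trees on bootstrap sub-samples and average their votes.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))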
random-forest machine-learning classifier

Small JavaScript implementation of algorithms for training Decision Tree and Random Forest classifiers.
decision-tree random-forest machine-learning

Functions useful for exploratory data analysis using random forests. This package extends the functionality of random forests fit by party (multivariate, regression, and classification), randomForestSRC (regression and classification), randomForest (regression and classification), and ranger (classification and regression).
random-forest r rstats exploratory-data-analysis machine-learning

Receipt.ID is a multi-label, multi-class, hierarchical classification system. It trains individual Random Forest text-based classifiers and combines the results with other features. Receipt.ID is built to scale with an application as the taxonomy for the domain in which it is applied grows. The data preprocessing code is provided in the notebook receiptID_1_Data_Preprocessing.ipynb, while the modeling code is provided in the notebook receiptID_2_Model.ipynb.
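A generic, hedged sketch of the approach described above, combining a text-based Random Forest classifier with other features (this is not the project's code; the column names and the TF-IDF text features are assumptions):

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative receipt lines: a text field, a numeric feature, and a category label.
df = pd.DataFrame({
    "description": ["coffee beans 1kg", "office chair", "espresso machine", "desk lamp"],
    "amount": [12.5, 150.0, 300.0, 35.0],
    "category": ["food", "furniture", "appliance", "furniture"],
})

# Text-based features from the item description.
vec = TfidfVectorizer()
X_text = vec.fit_transform(df["description"]).toarray()

# Combine the text features with the other (non-text) features before training the forest.
X = np.hstack([X_text, df[["amount"]].to_numpy()])
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, df["category"])
print(clf.predict(X[:2]))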
machine-learning random-forest word2vec

cl-random-forest is an implementation of Random Forest for multiclass classification and univariate regression written in Common Lisp. It also includes an implementation of Global Refinement of Random Forest (Ren, Cao, Wei and Sun, "Global Refinement of Random Forest", CVPR 2015). This refinement makes it faster and more accurate than a standard Random Forest. A dataset consists of a target vector and an input data matrix. For classification, the target vector should be a fixnum simple-vector and the data matrix should be a 2-dimensional double-float array where each row corresponds to one datum. Note that the targets are integers starting from 0. For example, the following dataset is valid for 4-class classification with 2-dimensional input.
random-forest machine-learning common-lisp classifier regression

This is the repository for D-Lab's Introduction to Machine Learning in R workshop.
machine-learning dlab-berkeley tutorial knn random-forest gradient-boosting-machine superlearner decision-trees

InfiniteBoost is an approach to building ensembles that combines the best sides of random forest and gradient boosting. Trees in the ensemble encounter the mistakes made by previous trees (as in gradient boosting), but due to a modified scheme of accounting for contributions, the ensemble converges to a limit, thus avoiding overfitting (just as a random forest does).
machine-learning research gradient-boosting random-forest experiments

Tree-based machine learning classifiers for microcontrollers and embedded systems. Train in Python, then do inference on any device with support for C.
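A hedged sketch of the train-in-Python, run-in-C workflow, assuming a convert/save style API for the library (the function names and arguments used for the conversion step are my assumption, and the dataset is illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

import emlearn  # assumed import name for the conversion library

# Train a small tree ensemble in Python.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=10, max_depth=5, random_state=0)
model.fit(X, y)

# Assumed API: convert the trained model and emit a C header that can be
# compiled into firmware for on-device inference.
c_model = emlearn.convert(model)
c_model.save(file="model.h", name="model")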
machine-learning random-forest classifier embedded-systems microcontroller scikit-learn

This is a tool for predicting the like count of YouTube videos. A Random Forest model was trained on a large dataset of ~350,000 videos. Feature engineering, data cleaning, data selection and many other techniques were used for this task. Report.pdf contains a detailed explanation of the different steps and techniques that were used.
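A generic, hedged sketch of the kind of model described (not the project's code; the feature names are assumptions, and the actual features and steps are documented in Report.pdf):

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Illustrative video-level features; the real dataset has ~350,000 videos.
df = pd.DataFrame({
    "views": [1_000, 50_000, 2_000_000, 300, 75_000, 10_000],
    "duration_s": [120, 600, 240, 60, 300, 180],
    "comment_count": [10, 400, 15_000, 2, 900, 120],
    "likes": [50, 2_500, 90_000, 12, 3_800, 600],
})

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="likes"), df["likes"], test_size=0.33, random_state=0
)

# Random forest regression for like-count prediction.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(model.predict(X_test))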
machine-learning predictive-analysis youtube-api random-forest visualization data-science data-analysis

Go-mining is a small library for data mining, written in the Go language.
mining smote data-mining random-forest data-mining-algorithms ln-smote knn cart cascaded-random-forest