random-forest-classifier - A random forest classifier in Javascript.


A random forest classifier. A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. Modeled after scikit-learn's RandomForestClassifier.
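The ensemble idea above can be sketched in plain JavaScript. This is a minimal illustration, not this package's API: the names `fitForest` and `predictForest` and their options (`nTrees`, `maxDepth`, `maxFeatures`, `seed`) are hypothetical. Each tree is grown on a bootstrap sample (rows drawn with replacement). Each split considers a random subset of features and picks the threshold with the lowest weighted Gini impurity. The forest then combines its trees by majority vote. scikit-learn instead averages the trees' predicted class probabilities, so the vote here is a simplification.

```javascript
// Seeded PRNG (mulberry32) so the same seed grows the same forest.
function makeRng(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Gini impurity: 1 - sum over classes of p(class)^2.
function gini(labels) {
  const counts = {};
  for (const y of labels) counts[y] = (counts[y] || 0) + 1;
  let sumSq = 0;
  for (const k in counts) sumSq += (counts[k] / labels.length) ** 2;
  return 1 - sumSq;
}

// Most frequent label; on a tie the earlier leader is kept.
function majority(labels) {
  const counts = {};
  let best = labels[0];
  for (const y of labels) {
    counts[y] = (counts[y] || 0) + 1;
    if (counts[y] > counts[best]) best = y;
  }
  return best;
}

function buildTree(X, y, depth, opts, rng) {
  if (depth >= opts.maxDepth || gini(y) === 0) return { leaf: majority(y) };
  // Draw a random subset of candidate features for this node.
  const nFeatures = X[0].length;
  const features = [];
  while (features.length < Math.min(opts.maxFeatures, nFeatures)) {
    const f = Math.floor(rng() * nFeatures);
    if (!features.includes(f)) features.push(f);
  }
  // Try every observed value of each candidate feature as a threshold.
  let best = null;
  for (const f of features) {
    for (const row of X) {
      const left = [];
      const right = [];
      X.forEach((r, i) => (r[f] <= row[f] ? left : right).push(i));
      if (left.length === 0 || right.length === 0) continue;
      const score =
        (left.length * gini(left.map((i) => y[i])) +
          right.length * gini(right.map((i) => y[i]))) / y.length;
      if (best === null || score < best.score) {
        best = { feature: f, threshold: row[f], left, right, score };
      }
    }
  }
  if (best === null) return { leaf: majority(y) };
  const child = (idx) =>
    buildTree(idx.map((i) => X[i]), idx.map((i) => y[i]), depth + 1, opts, rng);
  return {
    feature: best.feature,
    threshold: best.threshold,
    left: child(best.left),
    right: child(best.right),
  };
}

function predictTree(node, x) {
  while (!("leaf" in node)) {
    node = x[node.feature] <= node.threshold ? node.left : node.right;
  }
  return node.leaf;
}

function fitForest(X, y, { nTrees = 10, maxDepth = 5, maxFeatures, seed = 1 } = {}) {
  // Default to about sqrt(#features) candidate features per split.
  const mf = maxFeatures ?? Math.max(1, Math.round(Math.sqrt(X[0].length)));
  const rng = makeRng(seed);
  const trees = [];
  for (let t = 0; t < nTrees; t++) {
    // Bootstrap sample: n rows drawn with replacement.
    const bx = [];
    const by = [];
    for (let i = 0; i < X.length; i++) {
      const j = Math.floor(rng() * X.length);
      bx.push(X[j]);
      by.push(y[j]);
    }
    trees.push(buildTree(bx, by, 0, { maxDepth, maxFeatures: mf }, rng));
  }
  return trees;
}

// Majority vote over the trees' predictions for one feature vector.
function predictForest(trees, x) {
  return majority(trees.map((tree) => predictTree(tree, x)));
}
```

For example, `fitForest(X, y, { nTrees: 25 })` returns an array of trees, and `predictForest(trees, x)` returns the voted label for a single feature vector `x`. The bootstrap samples and the per-split feature subsets are the two sources of randomness that decorrelate the trees.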



async : ^0.9.0
underscore : ^1.6.0



Related Projects

rumale - Rumale is a machine learning library in Ruby

  •    Ruby

Rumale (Ruby machine learning) is a machine learning library in Ruby. Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python. Rumale supports Linear / Kernel Support Vector Machine, Logistic Regression, Linear Regression, Ridge, Lasso, Kernel Ridge, Factorization Machine, Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor classifier, K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering, Multidimensional Scaling, t-SNE, Principal Component Analysis, Kernel PCA and Non-negative Matrix Factorization. This project was formerly known as "SVMKit". If you are using SVMKit, please install Rumale and replace SVMKit constants with Rumale.

Apache Mahout - Scalable machine learning library

  •    Java

Apache Mahout has implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative filtering and frequent pattern mining.

useR-machine-learning-tutorial - useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive http://user2016

  •    Jupyter

Instructions for how to install the necessary software for this tutorial are available here. Data for the tutorial can be downloaded by running ./data/get-data.sh (requires wget). Certain algorithms don't scale well when there are millions of features. For example, decision trees require computing some sort of metric (to determine the splits) on all the feature values (or on some fraction of the values, as in Random Forest and Stochastic GBM). Therefore, computation time is linear in the number of features. Other algorithms, such as GLM, scale much better to high-dimensional (n << p) and wide data with appropriate regularization (e.g. Lasso, Elastic Net, Ridge).
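To make the linear-in-features claim concrete, here is a hedged JavaScript sketch of an exhaustive split search at a single tree node. The name `bestSplit` is hypothetical and the split metric is assumed to be Gini impurity. Every feature is scanned and every observed value is tried as a threshold, so the number of candidate thresholds is p × n. Passing a subset of feature indices mimics the per-split feature sampling of a random forest.

```javascript
// Gini impurity: 1 - sum over classes of p(class)^2.
function giniImpurity(labels) {
  const counts = {};
  for (const y of labels) counts[y] = (counts[y] || 0) + 1;
  let sumSq = 0;
  for (const k in counts) sumSq += (counts[k] / labels.length) ** 2;
  return 1 - sumSq;
}

// Exhaustive split search for one node. `evaluations` counts the candidate
// thresholds examined: one per (feature, sample) pair, i.e. p * n.
function bestSplit(X, y, featureIndices = X[0].map((_, j) => j)) {
  let best = { feature: -1, threshold: NaN, score: Infinity };
  let evaluations = 0;
  for (const f of featureIndices) {
    for (const row of X) {
      const t = row[f];
      const left = [];
      const right = [];
      X.forEach((r, i) => (r[f] <= t ? left : right).push(y[i]));
      evaluations++;
      if (left.length === 0 || right.length === 0) continue; // not a real split
      // Size-weighted impurity of the two children; lower is better.
      const score =
        (left.length * giniImpurity(left) + right.length * giniImpurity(right)) / y.length;
      if (score < best.score) best = { feature: f, threshold: t, score };
    }
  }
  return { ...best, evaluations };
}
```

Each candidate in this naive version also rescans all n rows, making it O(p·n²); practical implementations sort each feature once and sweep it, giving O(p·n log n). In both cases the cost grows linearly with the number of features p.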

Hyperparameter-Optimization-of-Machine-Learning-Algorithms - Implementation of hyperparameter optimization/tuning methods for machine learning & deep learning models (easy&clear)

  •    Jupyter

This code provides a hyper-parameter optimization implementation for machine learning algorithms, as described in the paper: L. Yang and A. Shami, “On hyperparameter optimization of machine learning algorithms: Theory and practice,” Neurocomputing, vol. 415, pp. 295–316, 2020, doi: https://doi.org/10.1016/j.neucom.2020.07.061. To fit a machine learning model into different problems, its hyper-parameters must be tuned. Selecting the best hyper-parameter configuration for machine learning models has a direct impact on the model's performance. In this paper, optimizing the hyper-parameters of common machine learning models is studied. We introduce several state-of-the-art optimization techniques and discuss how to apply them to machine learning algorithms. Many available libraries and frameworks developed for hyper-parameter optimization problems are provided, and some open challenges of hyper-parameter optimization research are also discussed in this paper. Moreover, experiments are conducted on benchmark datasets to compare the performance of different optimization methods and provide practical examples of hyper-parameter optimization.

awesome-random-forest - Random Forest - a curated list of resources regarding random forest


Random Forest - a curated list of resources regarding tree-based methods and more, including but not limited to random forest, bagging and boosting. Please feel free to submit pull requests, email Jung Kwon Lee (deruci@snu.ac.kr), or join our chats to add links.

grt - gesture recognition toolkit

  •    C++

The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source, C++ machine learning library designed for real-time gesture recognition. Classification: Adaboost, Decision Tree, Dynamic Time Warping, Gaussian Mixture Models, Hidden Markov Models, k-nearest neighbor, Naive Bayes, Random Forests, Support Vector Machine, Softmax, and more...

tpot - A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming

  •    Python

Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data.

benchm-ml - A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc

  •    R

This project aims at a minimal benchmark for scalability, speed and accuracy of commonly used implementations of a few machine learning algorithms. The target of this study is binary classification with numeric and categorical inputs (of limited cardinality, i.e. not very sparse) and no missing data, perhaps the most common problem in business applications (e.g. credit scoring, fraud detection or churn prediction). If the input matrix is n x p, n is varied as 10K, 100K, 1M, 10M, while p is ~1K (after expanding the categoricals into dummy variables/one-hot encoding). This particular type of data structure/size (the largest) stems from this author's interest in some particular business applications. Note: While a large part of this benchmark was done in Spring 2015, reflecting the state of ML implementations at that time, this repo is updated when I see significant changes in implementations or when new implementations become widely available (e.g. lightgbm). Also, please find a summary of the progress and learnings from this benchmark at the end of this repo.

elasticsearch-learning-to-rank - Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch

  •    Java

Rank Elasticsearch results using tree-based (LambdaMART, Random Forest, MART) and linear models. Models are trained using the scores of Elasticsearch queries as features. You train offline using tooling such as xgboost or ranklib. You then POST your model to Elasticsearch in a specific text format (the custom "ranklib" language, documented here). You apply a model using this plugin's ltr query. See the blog post and the full demo (training and searching). Models are stored using an Elasticsearch script plugin. Tree-based models can be large, so we recommend increasing the script.max_size_in_bytes setting. Don't worry: just because tree-based models are verbose doesn't necessarily mean they'll be slow.

NowTrade - Algorithmic trading library with a focus on creating powerful strategies

  •    Python

NowTrade is an algorithmic trading library with a focus on creating powerful strategies using easily-readable and simple Python code. With the help of NowTrade, full blown stock/currency trading strategies, harnessing the power of machine learning, can be implemented with few lines of code. NowTrade strategies are not event driven like most other algorithmic trading libraries available. The strategies are implemented in a sequential manner (one line at a time) without worrying about events, callbacks, or object overloading.

lime - Lime: Explaining the predictions of any machine learning classifier

  •    Javascript

Our plan is to add more packages that help users understand and interact meaningfully with machine learning. Lime is able to explain any black box classifier, with two or more classes. All we require is that the classifier implements a function that takes in raw text or a numpy array and outputs a probability for each class. Support for scikit-learn classifiers is built-in.

simple_bayes - A Naive Bayes machine learning implementation in Elixir.

  •    Elixir

A Naive Bayes machine learning implementation in Elixir. In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
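The independence assumption can be illustrated with a short multinomial naive Bayes text classifier. This is a JavaScript sketch, not simple_bayes's Elixir API. The names `trainNaiveBayes` and `classify`, the `{ words, label }` document shape, and the use of add-one (Laplace) smoothing are all hypothetical choices. Because words are treated as independent given the class, a document's score is just the log prior plus a sum of per-word log likelihoods.

```javascript
// Count documents per class and word occurrences per class.
function trainNaiveBayes(docs) {
  const model = { classCounts: {}, wordCounts: {}, totalWords: {}, vocab: new Set(), nDocs: docs.length };
  for (const { words, label } of docs) {
    model.classCounts[label] = (model.classCounts[label] || 0) + 1;
    model.wordCounts[label] = model.wordCounts[label] || {};
    model.totalWords[label] = model.totalWords[label] || 0;
    for (const w of words) {
      model.wordCounts[label][w] = (model.wordCounts[label][w] || 0) + 1;
      model.totalWords[label] += 1;
      model.vocab.add(w);
    }
  }
  return model;
}

// Pick the class maximizing log P(class) + sum over words of log P(word | class).
function classify(model, words) {
  const V = model.vocab.size;
  let bestLabel = null;
  let bestScore = -Infinity;
  for (const label of Object.keys(model.classCounts)) {
    let score = Math.log(model.classCounts[label] / model.nDocs);
    for (const w of words) {
      // Add-one smoothing so unseen words do not zero out the class.
      const count = model.wordCounts[label][w] || 0;
      score += Math.log((count + 1) / (model.totalWords[label] + V));
    }
    if (score > bestScore) {
      bestScore = score;
      bestLabel = label;
    }
  }
  return bestLabel;
}
```

Summing logarithms instead of multiplying probabilities avoids floating-point underflow on long documents.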

classifier - [UNMAINTAINED] Bayesian classifier with Redis backend

  •    Javascript

Deprecation notice: This library is no longer actively maintained. Try the natural classifier. It doesn't have a Redis backend, but otherwise works even better. The first argument to train() can be a string of text or an array of words; the second argument can be any category name you want.

ranger - A Fast Implementation of Random Forests

  •    C++

ranger is a fast implementation of random forests (Breiman 2001) or recursive partitioning, particularly suited for high-dimensional data. Classification, regression, and survival forests are supported. Classification and regression forests are implemented as in the original Random Forest (Breiman 2001), survival forests as in Random Survival Forests (Ishwaran et al. 2008). Includes implementations of extremely randomized trees (Geurts et al. 2006) and quantile regression forests (Meinshausen 2006). ranger is written in C++, but a version for R is available, too. We recommend using the R version. It is easy to install and use, and the results are readily available for further analysis. The R version is as fast as the standalone C++ version.

pattern - Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization

  •    Python

It is well documented, thoroughly tested with 350+ unit tests and comes bundled with 50+ examples. The source code is licensed under BSD and available from http://www.clips.ua.ac.be/pages/pattern. This example trains a classifier on adjectives mined from Twitter using Python 3. First, tweets that contain hashtag #win or #fail are collected. For example: "$20 tip off a sweet little old lady today #win". The word part-of-speech tags are then parsed, keeping only adjectives. Each tweet is transformed to a vector, a dictionary of adjective → count items, labeled WIN or FAIL. The classifier uses the vectors to learn which other tweets look more like WIN or more like FAIL.

twss.js - A node.js "that's what she said" classifier

  •    Javascript

This is a node.js module that classifies whether a sentence can be replied to with "that's what she said". You can change the algorithm from the default naive Bayes classifier (nbc) to a k-nearest neighbor algorithm (knn).
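A k-nearest neighbor classifier labels a query by a vote among its closest training examples. The sketch below works on numeric feature vectors with Euclidean distance. It is not twss.js's internal code, which operates on sentences; the name `knnClassify` and the default `k = 3` are hypothetical.

```javascript
// k-nearest neighbor classification: the predicted label is the majority
// label among the k training points closest to the query (Euclidean distance).
function knnClassify(trainX, trainY, query, k = 3) {
  const neighbors = trainX
    .map((x, i) => ({
      label: trainY[i],
      dist: Math.hypot(...x.map((v, j) => v - query[j])),
    }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);
  // Count votes; a Map keeps insertion order, i.e. nearest-first.
  const votes = new Map();
  for (const { label } of neighbors) votes.set(label, (votes.get(label) || 0) + 1);
  // Strict ">" means a tie goes to the label whose nearest member is closest.
  let best = null;
  let bestVotes = 0;
  for (const [label, count] of votes) {
    if (count > bestVotes) {
      best = label;
      bestVotes = count;
    }
  }
  return best;
}
```

Unlike naive Bayes, k-NN has no training step: all the work happens at query time, which costs one distance computation per stored example.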

pyquil - A Python library for quantum programming using Quil.

  •    Python

A library for easily generating Quil programs to be executed using the Rigetti Forest platform. pyQuil is licensed under the Apache 2.0 license. pyQuil can be used to build and manipulate Quil programs without restriction. However, to run programs (e.g., to get wavefunctions, get multishot experiment data), you will need an API key for Rigetti Forest. This will allow you to run your programs on the Rigetti Quantum Virtual Machine (QVM) or on a real quantum processor (QPU).