Related Projects

fastFM - fastFM: A Library for Factorization Machines

  •    Python

The library fastFM is an academic project. The time and resources spent developing fastFM are therefore justified by the number of citations of the software. If you publish scientific articles using fastFM, please cite the following article (bibtex entry citation.bib). This repository allows you to use Factorization Machines in Python (2.7 & 3.x) with the well known scikit-learn API. All performance critical code as been written in C and wrapped with Cython. fastFM provides stochastic gradient descent (SGD) and coordinate descent (CD) optimization routines as well as Markov Chain Monte Carlo (MCMC) for Bayesian inference. The solvers can be used for regression, classification and ranking problems. Detailed usage instructions can be found in the online documentation and on arXiv.

rumale - Rumale is a machine learning library in Ruby

  •    Ruby

Rumale (Ruby machine learning) is a machine learning library in Ruby. Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python. Rumale supports Linear / Kernel Support Vector Machine, Logistic Regression, Linear Regression, Ridge, Lasso, Kernel Ridge, Factorization Machine, Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor classifier, K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering, Mutidimensional Scaling, t-SNE, Principal Component Analysis, Kernel PCA and Non-negative Matrix Factorization. This project was formerly known as "SVMKit". If you are using SVMKit, please install Rumale and replace SVMKit constants with Rumale.

lrslibrary - Low-Rank and Sparse Tools for Background Modeling and Subtraction in Videos

  •    Matlab

Low-Rank and Sparse tools for Background Modeling and Subtraction in Videos. The LRSLibrary provides a collection of low-rank and sparse decomposition algorithms in MATLAB. The library was designed for motion segmentation in videos, but it can be also used (or adapted) for other computer vision problems (for more information, please see this page). Currently the LRSLibrary offers more than 100 algorithms based on matrix and tensor methods. The LRSLibrary was tested successfully in several MATLAB versions (e.g. R2014, R2015, R2016, R2017, on both x86 and x64 versions). It requires minimum R2014b.

lightfm - A Python implementation of LightFM, a hybrid recommendation algorithm.

  •    Python

LightFM is a Python implementation of a number of popular recommendation algorithms for both implicit and explicit feedback, including efficient implementation of BPR and WARP ranking losses. It's easy to use, fast (via multithreaded model estimation), and produces high quality results. It also makes it possible to incorporate both item and user metadata into the traditional matrix factorization algorithms. It represents each user and item as the sum of the latent representations of their features, thus allowing recommendations to generalise to new items (via item features) and to new users (via user features).

fmin - Unconstrained function minimization in Javascript

  •    Javascript

Unconstrained function minimization in javascript. This package implements some basic numerical optimization algorithms: Nelder-Mead, Gradient Descent, Wolf Line Search and Non-Linear Conjugate Gradient methods are all provided.

nimfa - Nimfa: Nonnegative matrix factorization in Python

  •    Python

Nimfa is a Python module that implements many algorithms for nonnegative matrix factorization. Nimfa is distributed under the BSD license. The project was started in 2011 by Marinka Zitnik as a Google Summer of Code project, and since then many volunteers have contributed. See AUTHORS file for a complete list of contributors.

xlearn - High performance, easy-to-use, and scalable ML package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and command line interface

  •    C++

xLearn is a high performance, easy-to-use, and scalable machine learning package, which can be used to solve large-scale machine learning problems. xLearn is especially useful for solving machine learning problems on large-scale sparse data, which is very common in Internet services such as online advertisement and recommender systems in recent years. If you are the user of liblinear, libfm, or libffm, now xLearn is your another better choice. xLearn is developed with high-performance C++ code with careful design and optimizations. Our system is designed to maximize CPU and memory utilization, provide cache-aware computation, and support lock-free learning. By combining these insights, xLearn is 5x-13x faster compared to similar systems.

dist-keras - Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark

  •    Python

Distributed Deep Learning with Apache Spark and Keras. Distributed Keras is a distributed deep learning framework built op top of Apache Spark and Keras, with a focus on "state-of-the-art" distributed optimization algorithms. We designed the framework in such a way that a new distributed optimizer could be implemented with ease, thus enabling a person to focus on research. Several distributed methods are supported, such as, but not restricted to, the training of ensembles and models using data parallel methods.

librec - LibRec: A Leading Java Library for Recommender Systems, see

  •    Java

LibRec ( is a Java library for recommender systems (Java version 1.7 or higher required). It implements a suit of state-of-the-art recommendation algorithms, aiming to resolve two classic recommendation tasks: rating prediction and item ranking. A movie recommender system is designed and available here.

differentiable-plasticity - Implementations of the algorithms described in Differentiable plasticity: training plastic networks with gradient descent, a research paper from Uber AI Labs

  •    Python

This repo contains implementations of the algorithms described in Differentiable plasticity: training plastic networks with gradient descent, a research paper from Uber AI Labs. We strongly recommend studying the simple/ program first, as it is deliberately kept as simple as possible while showing full-fledged differentiable plasticity learning.

useR-machine-learning-tutorial - useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive http://user2016

  •    Jupyter

Instructions for how to install the necessary software for this tutorial is available here. Data for the tutorial can be downloaded by running ./data/ (requires wget). Certain algorithms don't scale well when there are millions of features. For example, decision trees require computing some sort of metric (to determine the splits) on all the feature values (or some fraction of the values as in Random Forest and Stochastic GBM). Therefore, computation time is linear in the number of features. Other algorithms, such as GLM, scale much better to high-dimensional (n << p) and wide data with appropriate regularization (e.g. Lasso, Elastic Net, Ridge).

spotlight - Deep recommender models using PyTorch.

  •    Python

Spotlight uses PyTorch to build both deep and shallow recommender models. By providing both a slew of building blocks for loss functions (various pointwise and pairwise ranking losses), representations (shallow factorization representations, deep sequence models), and utilities for fetching (or generating) recommendation datasets, it aims to be a tool for rapid exploration and prototyping of new recommender models. See the full documentation for details.

NakedTensor - Bare bone examples of machine learning in TensorFlow

  •    Python

This is a bare bones example of TensorFlow, a machine learning package published by Google. You will not find a simpler introduction to it. In each example, a straight line is fit to some data. Values for the slope and y-intercept of the line that best fit the data are determined using gradient descent. If you do not know about gradient descent, check out the Wikipedia page.

benchm-ml - A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc

  •    R

This project aims at a minimal benchmark for scalability, speed and accuracy of commonly used implementations of a few machine learning algorithms. The target of this study is binary classification with numeric and categorical inputs (of limited cardinality i.e. not very sparse) and no missing data, perhaps the most common problem in business applications (e.g. credit scoring, fraud detection or churn prediction). If the input matrix is of n x p, n is varied as 10K, 100K, 1M, 10M, while p is ~1K (after expanding the categoricals into dummy variables/one-hot encoding). This particular type of data structure/size (the largest) stems from this author's interest in some particular business applications. Note: While a large part of this benchmark was done in Spring 2015 reflecting the state of ML implementations at that time, this repo is being updated if I see significant changes in implementations or new implementations have become widely available (e.g. lightgbm). Also, please find a summary of the progress and learnings from this benchmark at the end of this repo.

Dclib - Portable C++ library

  •    C++

dlib is a library for developing portable applications dealing with networking, threads, graphical interfaces, data structures, linear algebra, machine learning, XML and text parsing, numerical optimization, Bayesian nets, data compression routines, linked lists, binary search trees, linear algebra and matrix utilities, machine learning algorithms, and many other general utilities.

grokking-pytorch - The Hitchiker's Guide to PyTorch


PyTorch is a flexible deep learning framework that allows automatic differentiation through dynamic neural networks (i.e., networks that utilise dynamic control flow like if statements and while loops). It supports GPU acceleration, distributed training, various optimisations, and plenty more neat features. These are some notes on how I think about using PyTorch, and don't encompass all parts of the library or every best practice, but may be helpful to others. Neural networks are a subclass of computation graphs. Computation graphs receive input data, and data is routed to and possibly transformed by nodes which perform processing on the data. In deep learning, the neurons (nodes) in neural networks typically transform data with parameters and differentiable functions, such that the parameters can be optimised to minimise a loss via gradient descent. More broadly, the functions can be stochastic, and the structure of the graph can be dynamic. So while neural networks may be a good fit for dataflow programming, PyTorch's API has instead centred around imperative programming, which is a more common way for thinking about programs. This makes it easier to read code and reason about complex programs, without necessarily sacrificing much performance; PyTorch is actually pretty fast, with plenty of optimisations that you can safely forget about as an end user (but you can dig in if you really want to).