LightGBM - A fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks

  •        289

For more details, please refer to Features.Experiments on public datasets show that LightGBM can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. What's more, the experiments show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.

https://github.com/Microsoft/LightGBM

Tags
Implementation
License
Platform

   




Related Projects

xgboost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more

  •    C++

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides a parallel tree boosting (also known as GBDT, GBM) that solve many data science problems in a fast and accurate way. The same code runs on major distributed environment (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.XGBoost has been developed and used by a group of active community members. Your help is very valuable to make the package better for everyone.

DMTK - Microsoft Distributed Machine Learning Toolkit

  •    

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

auto_ml - Automated machine learning for analytics & production

  •    Python

auto_ml is designed for production. Here's an example that includes serializing and loading the trained model, then getting predictions on single dictionaries, roughly the process you'd likely follow to deploy the trained model. All of these projects are ready for production. These projects all have prediction time in the 1 millisecond range for a single prediction, and are able to be serialized to disk and loaded into a new environment after training.


benchm-ml - A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc

  •    R

This project aims at a minimal benchmark for scalability, speed and accuracy of commonly used implementations of a few machine learning algorithms. The target of this study is binary classification with numeric and categorical inputs (of limited cardinality i.e. not very sparse) and no missing data, perhaps the most common problem in business applications (e.g. credit scoring, fraud detection or churn prediction). If the input matrix is of n x p, n is varied as 10K, 100K, 1M, 10M, while p is ~1K (after expanding the categoricals into dummy variables/one-hot encoding). This particular type of data structure/size (the largest) stems from this author's interest in some particular business applications. Note: While a large part of this benchmark was done in Spring 2015 reflecting the state of ML implementations at that time, this repo is being updated if I see significant changes in implementations or new implementations have become widely available (e.g. lightgbm). Also, please find a summary of the progress and learnings from this benchmark at the end of this repo.

open-solution-home-credit - Open solution to the Home Credit Default Risk challenge :house_with_garden:

  •    Python

This is an open solution to the Home Credit Default Risk challenge 🏡. In this open source solution you will find references to the neptune.ml. It is free platform for community Users, which we use daily to keep track of our experiments. Please note that using neptune.ml is not necessary to proceed with this solution. You may run it as plain Python script 🐍.

lightgbm-doc-zh - LightGBM 中文文档

  •    

更多有关 LightGBM 特性的详情, 请参阅: LightGBM 特性. 如果想要加入我们, 请参阅: http://www.apachecn.org/organization/209.html. 欢迎各位爱装逼的大佬们.

astroML - Machine learning, statistics, and data mining for astronomy and astrophysics

  •    Python

AstroML is a Python module for machine learning and data mining built on numpy, scipy, scikit-learn, and matplotlib, and distributed under the BSD license. It contains a growing library of statistical and machine learning routines for analyzing astronomical data in python, loaders for several open astronomical datasets, and a large suite of examples of analyzing and visualizing astronomical datasets. This project was started in 2012 by Jake VanderPlas to accompany the book Statistics, Data Mining, and Machine Learning in Astronomy by Zeljko Ivezic, Andrew Connolly, Jacob VanderPlas, and Alex Gray.

Jubatus - Framework and Library for Distributed Online Machine Learning

  •    C++

Jubatus is a distributed processing framework and streaming machine learning library. Jubatus includes these functionalities: Online Machine Learning Library: Classification, Regression, Recommendation (Nearest Neighbor Search), Graph Mining, Anomaly Detection, Clustering, Feature Vector Converter (fv_converter): Data Preprocess and Feature Extraction, Framework for Distributed Online Machine Learning with Fault Tolerance.

useR-machine-learning-tutorial - useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive http://user2016

  •    Jupyter

Instructions for how to install the necessary software for this tutorial is available here. Data for the tutorial can be downloaded by running ./data/get-data.sh (requires wget). Certain algorithms don't scale well when there are millions of features. For example, decision trees require computing some sort of metric (to determine the splits) on all the feature values (or some fraction of the values as in Random Forest and Stochastic GBM). Therefore, computation time is linear in the number of features. Other algorithms, such as GLM, scale much better to high-dimensional (n << p) and wide data with appropriate regularization (e.g. Lasso, Elastic Net, Ridge).

Deep-Learning-Boot-Camp - A community run, 5-day PyTorch Deep Learning Bootcamp

  •    Jupyter

Tel-Aviv Deep Learning Bootcamp is an intensive (and free!) 5-day program intended to teach you all about deep learning. It is nonprofit focused on advancing data science education and fostering entrepreneurship. The Bootcamp is a prominent venue for graduate students, researchers, and data science professionals. It offers a chance to study the essential and innovative aspects of deep learning. Participation is via a donation to the A.L.S ASSOCIATION for promoting research of the Amyotrophic Lateral Sclerosis (ALS) disease.

Apache Mahout - Scalable machine learning library

  •    Java

Apache Mahout has implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative filtering and frequent pattern mining.

MMLSpark - Microsoft Machine Learning for Apache Spark

  •    Scala

MMLSpark provides a number of deep learning and data science tools for Apache Spark, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK) and OpenCV, enabling you to quickly create powerful, highly-scalable predictive and analytical models for large image and text datasets.MMLSpark requires Scala 2.11, Spark 2.1+, and either Python 2.7 or Python 3.5+. See the API documentation for Scala and for PySpark.

incubator-mxnet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

  •    C++

Apache MXNet (incubating) is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. MXNet is portable and lightweight, scaling effectively to multiple GPUs and multiple machines.MXNet is also more than a deep learning project. It is also a collection of blue prints and guidelines for building deep learning systems, and interesting insights of DL systems for hackers.

Scikit Learn - Machine Learning in Python

  •    Python

scikit-learn is a Python module for machine learning built on top of SciPy. It is simple and efficient tools for data mining and data analysis. It supports automatic classification, clustering, model selection, pre processing and lot more.

mahout - Mirror of Apache Mahout

  •    Java

Mahout's goal is to build scalable machine learning libraries. With scalable we mean: Scalable to reasonably large data sets. Our core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. However we do not restrict contributions to Hadoop based implementations: Contributions that run on a single node or on a non-Hadoop cluster are welcome as well. The core libraries are highly optimized to allow for good performance also for non-distributed algorithms. Scalable to support your business case. Mahout is distributed under a commercially friendly Apache Software license. Scalable community. The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases. Come to the mailing lists to find out more. Currently Mahout supports mainly four use cases: Recommendation mining takes users' behavior and from that tries to find items users might like. Clustering takes e.g. text documents and groups them into groups of topically related documents. Classification learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category. Frequent itemset mining takes a set of item groups (terms in a query session, shopping cart content) and identifies, which individual items usually appear together.