- 32

Ytk-learn is a distributed machine learning library which implements most of popular machine learning algorithms

https://github.com/yuantiku/ytk-learn Dependencies:

commons-logging:commons-logging:1.1.3

com.typesafe:config:1.2.1

com.fenbi.mp4j:ytk-mp4j:0.0.1

org.apache.spark:spark-core_2.10:1.6.0

org.python:jython-standalone:2.7.0

org.projectlombok:lombok:1.16.10

org.apache.hadoop:hadoop-common:2.5.0

org.apache.hadoop:hadoop-hdfs:2.5.0

org.apache.hadoop:hadoop-client:2.5.0

com.esotericsoftware:kryo:4.0.0

Tags | machine-learning distributed gbm gbdt logistic-regression factorization-machines spark hadoop |

Implementation | Java |

License | MIT |

Platform | OS-Independent |

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides a parallel tree boosting (also known as GBDT, GBM) that solve many data science problems in a fast and accurate way. The same code runs on major distributed environment (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.XGBoost has been developed and used by a group of active community members. Your help is very valuable to make the package better for everyone.

gbdt gbrt gbm distributed-systems xgboost gradient-boosting histogramFor more details, please refer to Features.Experiments on public datasets show that LightGBM can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. What's more, the experiments show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.

gbdt gbm machine-learning data-mining kaggle efficiency distributed lightgbm gbrtRumale (Ruby machine learning) is a machine learning library in Ruby. Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python. Rumale supports Linear / Kernel Support Vector Machine, Logistic Regression, Linear Regression, Ridge, Lasso, Kernel Ridge, Factorization Machine, Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor classifier, K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering, Mutidimensional Scaling, t-SNE, Principal Component Analysis, Kernel PCA and Non-negative Matrix Factorization. This project was formerly known as "SVMKit". If you are using SVMKit, please install Rumale and replace SVMKit constants with Rumale.

machine-learning data-science data-analysis artificial-intelligenceThe Accord.NET project provides machine learning, statistics, artificial intelligence, computer vision and image processing methods to .NET. It can be used on Microsoft Windows, Xamarin, Unity3D, Windows Store applications, Linux or mobile.

machine-learning framework c-sharp nuget visual-studio statistics unity3d neural-network support-vector-machines computer-vision image-processing ffmpegParacel is a distributed computational framework, designed for many machine learning problems: Logistic Regression, SVD, Matrix Factorization(BFGS, sgd, als, cg), LDA, Lasso... Firstly, paracel splits both massive dataset and massive parameter space. Unlike Mapreduce-Like Systems, paracel offers a simple communication model, allowing you to work with a global and distributed key-value storage, which is called parameter server.

machine-learning distributed-computing graph c-plus-plusThe library fastFM is an academic project. The time and resources spent developing fastFM are therefore justified by the number of citations of the software. If you publish scientific articles using fastFM, please cite the following article (bibtex entry citation.bib). This repository allows you to use Factorization Machines in Python (2.7 & 3.x) with the well known scikit-learn API. All performance critical code as been written in C and wrapped with Cython. fastFM provides stochastic gradient descent (SGD) and coordinate descent (CD) optimization routines as well as Markov Chain Monte Carlo (MCMC) for Bayesian inference. The solvers can be used for regression, classification and ranking problems. Detailed usage instructions can be found in the online documentation and on arXiv.

machine-learning recommender-system factorization-machines matrix-factorizationTensorFlowOnSpark brings scalable deep learning to Apache Hadoop and Apache Spark clusters. By combining salient features from deep learning framework TensorFlow and big-data frameworks Apache Spark and Apache Hadoop, TensorFlowOnSpark enables distributed deep learning on a cluster of GPU and CPU servers.TensorFlowOnSpark was developed by Yahoo for large-scale distributed deep learning on our Hadoop clusters in Yahoo's private cloud.

tensorflow spark yahoo machine-learning cluster featuredDistributed Deep Learning with Apache Spark and Keras. Distributed Keras is a distributed deep learning framework built op top of Apache Spark and Keras, with a focus on "state-of-the-art" distributed optimization algorithms. We designed the framework in such a way that a new distributed optimizer could be implemented with ease, thus enabling a person to focus on research. Several distributed methods are supported, such as, but not restricted to, the training of ensembles and models using data parallel methods.

machine-learning deep-learning apache-spark data-parallelism distributed-optimizers keras optimization-algorithms tensorflow data-science hadoopThis GitHub repository contains the code examples of the 1st Edition of Python Machine Learning book. If you are looking for the code examples of the 2nd Edition, please refer to this repository instead. What you can expect are 400 pages rich in useful material just about everything you need to know to get started with machine learning ... from theory to the actual code that you can directly put into action! This is not yet just another "this is how scikit-learn works" book. I aim to explain all the underlying concepts, tell you everything you need to know in terms of best practices and caveats, and we will put those concepts into action mainly using NumPy, scikit-learn, and Theano.

machine-learning machine-learning-algorithms logistic-regression data-science data-mining scikit-learn neural-networkThe open-source project that does large-scale machine learning including logistic regression, matrix factorization etc.

CatBoost is a machine learning method based on gradient boosting over decision trees. All CatBoost documentation is available here.

machine-learning decision-trees gradient-boosting gbm gbdt r kaggle gpu-computing catboost tutorial categorical-features distributed gpu coreml opensource data-science big-dataIPython Notebook(s) demonstrating deep learning functionality.IPython Notebook(s) demonstrating scikit-learn functionality.

machine-learning deep-learning data-science big-data aws tensorflow theano caffe scikit-learn kaggle spark mapreduce hadoop matplotlib pandas numpy scipy kerasThis is the official code repository for Machine Learning with TensorFlow. Get started with machine learning using TensorFlow, Google's latest and greatest machine learning library.

tensorflow machine-learning regression convolutional-neural-networks logistic-regression book reinforcement-learning autoencoder linear-regression classification clusteringxLearn is a high performance, easy-to-use, and scalable machine learning package, which can be used to solve large-scale machine learning problems. xLearn is especially useful for solving machine learning problems on large-scale sparse data, which is very common in Internet services such as online advertisement and recommender systems in recent years. If you are the user of liblinear, libfm, or libffm, now xLearn is your another better choice. xLearn is developed with high-performance C++ code with careful design and optimizations. Our system is designed to maximize CPU and memory utilization, provide cache-aware computation, and support lock-free learning. By combining these insights, xLearn is 5x-13x faster compared to similar systems.

machine-learning statistics data-science data-analysisMMLSpark provides a number of deep learning and data science tools for Apache Spark, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK) and OpenCV, enabling you to quickly create powerful, highly-scalable predictive and analytical models for large image and text datasets.MMLSpark requires Scala 2.11, Spark 2.1+, and either Python 2.7 or Python 3.5+. See the API documentation for Scala and for PySpark.

machine-learning spark cntk pyspark azure microsoft-machine-learning microsoft mlThis repository contains implementations of basic machine learning algorithms in plain Python (Python Version 3.6+). All algorithms are implemented from scratch without using additional machine learning libraries. The intention of these notebooks is to provide a basic understanding of the algorithms and their underlying structure, not to provide the most efficient implementations. After several requests I started preparing notebooks on how to preprocess datasets for machine learning. Within the next months I will add one notebook for each kind of dataset (text, images, ...). As before, the intention of these notebooks is to provide a basic understanding of the preprocessing steps, not to provide the most efficient implementations.

machine-learning logistic-regression ipynb machine-learning-algorithms linear-regression perceptron python-implementations kmeans algorithm python3 neural-network k-nearest-neighbours k-nearest-neighbor k-nn neural-networksThis repository contains python implementations of certain exercises from the course by Andrew Ng. For a number of assignments in the course you are instructed to create complete, stand-alone Octave/MATLAB implementations of certain algorithms (Linear and Logistic Regression for example). The rest of the assignments depend on additional code provided by the course authors. For most of the code in this repository I have instead used existing Python implementations like Scikit-learn.

coursera-machine-learning predictive-modeling andrew-ngPython codes for common Machine Learning Algorithms

linear-regression polynomial-regression logistic-regression decision-trees random-forest svm svr knn-classification naive-bayes-classifier kmeans-clustering hierarchical-clustering pca lda xgboost-algorithmCourse materials for General Assembly's Data Science course in Washington, DC (8/18/15 - 10/29/15).

data-science machine-learning scikit-learn data-analysis pandas jupyter-notebook course linear-regression logistic-regression model-evaluation naive-bayes natural-language-processing decision-trees ensemble-learning clustering regular-expressions web-scraping data-visualization data-cleaningThe Oryx open source project provides infrastructure for lambda-architecture applications on top of Spark, Spark Streaming and Kafka. On this, it provides further support for real-time, large scale machine learning, and end-to-end applications of this support for common machine learning use cases, like recommendations, clustering, classification and regression.

lambda lambda-architecture oryx apache-spark machine-learning kafka classification clustering
We have large collection of open source products. Follow the tags from
Tag Cloud >>

Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.
**Add Projects.**