
Please cite our JMLR paper [bibtex]. Some parts of the package were created as part of other publications. If you use these parts, please cite the relevant work appropriately. An overview of all mlr related publications can be found here.

https://mlr-org.github.io/mlr/
https://github.com/mlr-org/mlr

Practice and tutorial-style notebooks covering a wide variety of machine learning techniques

numpy statistics pandas matplotlib regression scikit-learn classification principal-component-analysis clustering decision-trees random-forest dimensionality-reduction neural-network deep-learning artificial-intelligence data-science machine-learning k-nearest-neighbours naive-bayes

PyCM is a multi-class confusion matrix library written in Python. It accepts both input data vectors and a direct matrix, and serves as a tool for post-classification model evaluation that supports most class and overall statistics parameters. PyCM is the swiss-army knife of confusion matrices, targeted mainly at data scientists who need a broad array of metrics for predictive models and an accurate evaluation of a large variety of classifiers. A threshold option was added in version 0.9 for real-value predictions.
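As a rough illustration of the kind of evaluation described above, the following Python sketch builds a multi-class confusion matrix from two input vectors and derives a few class and overall statistics. It uses only the standard library and is not PyCM's own API; the function names (`confusion_matrix`, `per_class_stats`, `overall_accuracy`) and the sample vectors are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (labels, matrix); matrix[a][p] counts items of actual class a predicted as p."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = {a: {p: counts[(a, p)] for p in labels} for a in labels}
    return labels, matrix


def per_class_stats(labels, matrix):
    """Compute one-vs-rest precision and recall for every class."""
    stats = {}
    for c in labels:
        tp = matrix[c][c]
        fp = sum(matrix[a][c] for a in labels if a != c)  # other classes predicted as c
        fn = sum(matrix[c][p] for p in labels if p != c)  # class c predicted as something else
        stats[c] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return stats


def overall_accuracy(labels, matrix):
    """Fraction of all items that lie on the diagonal of the matrix."""
    total = sum(matrix[a][p] for a in labels for p in labels)
    correct = sum(matrix[c][c] for c in labels)
    return correct / total if total else 0.0


# Hypothetical sample data: true labels and classifier output.
actual = [2, 0, 2, 2, 0, 1]
predicted = [0, 0, 2, 2, 0, 2]

labels, matrix = confusion_matrix(actual, predicted)
stats = per_class_stats(labels, matrix)
accuracy = overall_accuracy(labels, matrix)
```

A dedicated library such as PyCM reports many more statistics than the three shown here; the sketch only shows how such metrics follow from the matrix counts.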

machine-learning confusion-matrix matrix statistics statistical-analysis accuracy ml ai mathematics data-mining data-analysis classification classifier data-science data neural-network multiclass-classification deep-learning artificial-intelligence deeplearning

The brms package provides an interface to fit Bayesian generalized (non-)linear multivariate multilevel models using Stan, which is a C++ package for performing full Bayesian inference (see http://mc-stan.org/). The formula syntax is very similar to that of the package lme4 to provide a familiar and simple interface for performing regression analyses. A wide range of distributions and link functions are supported, allowing users to fit (among others) linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models, all in a multilevel context. Further modeling options include non-linear and smooth terms, auto-correlation structures, censored data, missing value imputation, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Multivariate models (i.e. models with multiple response variables) can be fitted as well. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. Model fit can easily be assessed and compared with posterior predictive checks, leave-one-out cross-validation, and Bayes factors.

As a simple example, we use Poisson regression to model the seizure counts in epileptic patients to investigate whether the treatment (represented by variable Trt) can reduce the seizure counts and whether the effect of the treatment varies with the baseline number of seizures a person had before treatment (variable log_Base4_c). As we have multiple observations per person, a group-level intercept is incorporated to account for the resulting dependency in the data.

brms stan bayesian-inference multilevel-models statistical-models r-package

MMLSpark provides a number of deep learning and data science tools for Apache Spark, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK) and OpenCV, enabling you to quickly create powerful, highly-scalable predictive and analytical models for large image and text datasets. MMLSpark requires Scala 2.11, Spark 2.1+, and either Python 2.7 or Python 3.5+. See the API documentation for Scala and for PySpark.

machine-learning spark cntk pyspark azure microsoft-machine-learning microsoft ml

Smile (Statistical Machine Intelligence and Learning Engine) is a fast and comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system in Java and Scala. With advanced data structures and algorithms, Smile delivers state-of-the-art performance. Smile covers every aspect of machine learning, including classification, regression, clustering, association rule mining, feature selection, manifold learning, multidimensional scaling, genetic algorithms, missing value imputation, efficient nearest neighbor search, etc.

machine-learning nlp linear-algebra natural-language-processing

The Accord.NET project provides machine learning, statistics, artificial intelligence, computer vision and image processing methods to .NET. It can be used on Microsoft Windows, Xamarin, Unity3D, Windows Store applications, Linux or mobile.

machine-learning framework c-sharp nuget visual-studio statistics unity3d neural-network support-vector-machines computer-vision image-processing ffmpeg

scikit-learn is a Python module for machine learning built on top of SciPy. It provides simple and efficient tools for data mining and data analysis. It supports automatic classification, clustering, model selection, preprocessing, and a lot more.

machine-learning data-mining data-analysis classification

Java Machine Learning Library is a library of machine learning algorithms and related datasets. Machine learning techniques include: clustering, classification, feature selection, regression, data pre-processing, ensemble learning, voting, ...

"Data is the new oil" is a saying which you must have heard by now along with the huge interest building up around Big Data and Machine Learning in the recent past along with Artificial Intelligence and Deep Learning. Besides this, data scientists have been termed as having "The sexiest job in the 21st Century" which makes it all the more worthwhile to build up some valuable expertise in these areas. Getting started with machine learning in the real world can be overwhelming with the vast amount of resources out there on the web. "Practical Machine Learning with Python" follows a structured and comprehensive three-tiered approach packed with concepts, methodologies, hands-on examples, and code. This book is packed with over 500 pages of useful information which helps its readers master the essential skills needed to recognize and solve complex problems with Machine Learning and Deep Learning by following a data-driven mindset. By using real-world case studies that leverage the popular Python Machine Learning ecosystem, this book is your perfect companion for learning the art and science of Machine Learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute Machine Learning systems and projects successfully.

machine-learning deep-learning text-analytics classification clustering natural-language-processing computer-vision data-science spacy nltk scikit-learn prophet time-series-analysis convolutional-neural-networks tensorflow keras statsmodels pandas deep-neural-networks

MLlib is a Spark implementation of some common machine learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, and a lot more.

machine-learning data-mining data-analysis classification

caret (Classification And Regression Training) is an R package that contains miscellaneous functions for training and plotting classification and regression models.

Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data.

machine-learning data-science automl automation scikit-learn hyperparameter-optimization model-selection parameter-tuning automated-machine-learning random-forest gradient-boosting feature-engineering xgboost genetic-programming

Machine Learning models are widely used and have various applications in classification or regression tasks. Due to increasing computational power and the availability of new data sources and new methods, ML models are becoming more and more complex. Models created with techniques like boosting, bagging, or neural networks are true black boxes. It is hard to trace the link between input variables and model outcomes. They are used because of their high performance, but lack of interpretability is one of their weakest sides. In many applications we need to know, understand or prove how input variables are used in the model and what impact they have on the final model prediction. DALEX is a set of tools that help to understand how complex models are working.

machine-learning interpretability data-science xai visual-explanations iml

Jubatus is a distributed processing framework and streaming machine learning library. Jubatus includes these functionalities: an online machine learning library (classification, regression, recommendation (nearest neighbor search), graph mining, anomaly detection, clustering); a feature vector converter (fv_converter) for data preprocessing and feature extraction; and a framework for distributed online machine learning with fault tolerance.

machine-learning machine-learning-framework distributed

This project aims at a minimal benchmark for scalability, speed and accuracy of commonly used implementations of a few machine learning algorithms. The target of this study is binary classification with numeric and categorical inputs (of limited cardinality, i.e. not very sparse) and no missing data, perhaps the most common problem in business applications (e.g. credit scoring, fraud detection or churn prediction). If the input matrix is n x p, n is varied as 10K, 100K, 1M, 10M, while p is ~1K (after expanding the categoricals into dummy variables/one-hot encoding). This particular type of data structure/size (the largest) stems from this author's interest in some particular business applications. Note: While a large part of this benchmark was done in Spring 2015, reflecting the state of ML implementations at that time, this repo is being updated if I see significant changes in implementations or if new implementations have become widely available (e.g. lightgbm). Also, please find a summary of the progress and learnings from this benchmark at the end of this repo.
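To make the "expanding the categoricals into dummy variables/one-hot encoding" step concrete, here is a minimal standard-library Python sketch. It is not code from the benchmark; the function name `one_hot_expand`, the column names, and the sample rows are hypothetical, and it assumes a non-empty list of rows that all share the same keys.

```python
def one_hot_expand(rows, categorical_columns):
    """Expand categorical columns into 0/1 dummy columns; numeric columns pass through."""
    # Distinct levels of each categorical column, in sorted order.
    levels = {col: sorted({row[col] for row in rows}) for col in categorical_columns}

    # Column order follows the keys of the first row (assumes rows is non-empty).
    source_columns = list(rows[0])

    columns = []
    for col in source_columns:
        if col in levels:
            columns.extend(f"{col}={lvl}" for lvl in levels[col])
        else:
            columns.append(col)

    matrix = []
    for row in rows:
        out = []
        for col in source_columns:
            if col in levels:
                # One indicator per level; exactly one of them is 1.
                out.extend(1 if row[col] == lvl else 0 for lvl in levels[col])
            else:
                out.append(row[col])
        matrix.append(out)
    return columns, matrix


# Hypothetical sample: one numeric and two categorical inputs.
rows = [
    {"amount": 120.0, "region": "north", "channel": "web"},
    {"amount": 35.5, "region": "south", "channel": "store"},
    {"amount": 80.0, "region": "north", "channel": "store"},
]

columns, matrix = one_hot_expand(rows, ["region", "channel"])
n, p = len(matrix), len(columns)
```

Here three input columns become p = 5 matrix columns; with many categorical inputs of moderate cardinality, the same expansion is how p grows to the order of 1K.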

machine-learning data-science r gradient-boosting-machine random-forest deep-learning xgboost h2o spark

This chapter intends to introduce the main objects and concepts in TensorFlow. We also introduce how to access the data for the rest of the book and provide additional resources for learning about TensorFlow. After we have established the basic objects and methods in TensorFlow, we now want to establish the components that make up TensorFlow algorithms. We start by introducing computational graphs, and then move to loss functions and back propagation. We end with creating a simple classifier and then show an example of evaluating regression and classification algorithms.

tensorflow tensorflow-cookbook linear-regression neural-network tensorflow-algorithms rnn cnn svm nlp packtpub machine-learning tensorboard classification regression kmeans-clustering genetic-algorithm ode

This series of tutorials on Data Science engineering will try to compare how different concepts in the discipline can be implemented in the two dominant ecosystems nowadays: R and Python. We will do this from a neutral point of view. Our opinion is that each environment has good and bad things, and any data scientist should know how to use both in order to be as prepared as possible for the job market or to start a personal project.

data-science data-science-engineering tutorial data-frame exploratory-data-analysis r jupyter notebook machine-learning

The Oryx open source project provides infrastructure for lambda-architecture applications on top of Spark, Spark Streaming and Kafka. On this, it provides further support for real-time, large scale machine learning, and end-to-end applications of this support for common machine learning use cases, like recommendations, clustering, classification and regression.

lambda lambda-architecture oryx apache-spark machine-learning kafka classification clustering

This repo contains a curated list of R tutorials and packages for Data Science, NLP and Machine Learning. This also serves as a reference guide for several common data analysis tasks. Curated list of Python tutorials for Data Science, NLP and Machine Learning.

datascience data-science r text-mining

Skater is a unified framework to enable Model Interpretation for all forms of models, to help one build an interpretable machine learning system of the kind often needed for real-world use cases (we are actively working towards enabling faithful interpretability for all forms of models). It is an open source Python library designed to demystify the learned structures of a black box model both globally (inference on the basis of a complete data set) and locally (inference about an individual prediction). The project was started as a research idea to find ways to enable better interpretability (preferably human interpretability) of predictive "black boxes" for both researchers and practitioners. The project is still in beta phase.

ml predictive-modeling machine-learning modeling-tools model-interpretation blackbox datascience model-explanation explanation-system deep-learning deep-neural-networks attribution lstm-neural-networks cnn-classification