Data-Analysis-and-Machine-Learning-Projects - Repository of teaching materials, code, and data for my data analysis and machine learning projects

  •        256

This is a repository of teaching materials, code, and data for my data analysis and machine learning projects.Each repository will (usually) correspond to one of the blog posts on my web site.

http://www.randalolson.com/blog/
https://github.com/rhiever/Data-Analysis-and-Machine-Learning-Projects

Tags
Implementation
License
Platform

   




Related Projects

spark-py-notebooks - Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

  •    Jupyter

This is a collection of IPython notebook/Jupyter notebooks intended to train the reader on different Apache Spark concepts, from basic to advanced, by using the Python language. If Python is not your language, and it is R, you may want to have a look at our R on Apache Spark (SparkR) notebooks instead. Additionally, if your are interested in being introduced to some basic Data Science Engineering, you might find these series of tutorials interesting. There we explain different concepts and applications using Python and R.

practical-machine-learning-with-python - Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system

  •    Jupyter

"Data is the new oil" is a saying which you must have heard by now along with the huge interest building up around Big Data and Machine Learning in the recent past along with Artificial Intelligence and Deep Learning. Besides this, data scientists have been termed as having "The sexiest job in the 21st Century" which makes it all the more worthwhile to build up some valuable expertise in these areas. Getting started with machine learning in the real world can be overwhelming with the vast amount of resources out there on the web. "Practical Machine Learning with Python" follows a structured and comprehensive three-tiered approach packed with concepts, methodologies, hands-on examples, and code. This book is packed with over 500 pages of useful information which helps its readers master the essential skills needed to recognize and solve complex problems with Machine Learning and Deep Learning by following a data-driven mindset. By using real-world case studies that leverage the popular Python Machine Learning ecosystem, this book is your perfect companion for learning the art and science of Machine Learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute Machine Learning systems and projects successfully.

data-science-your-way - Ways of doing Data Science Engineering and Machine Learning in R and Python

  •    Jupyter

These series of tutorials on Data Science engineering will try to compare how different concepts in the discipline can be implemented in the two dominant ecosystems nowadays: R and Python. We will do this from a neutral point of view. Our opinion is that each environment has good and bad things, and any data scientist should know how to use both in order to be as prepared as posible for job market or to start personal project.

Data-Science-45min-Intros - Ipython notebook presentations for getting starting with basic programming, statistics and machine learning techniques

  •    Jupyter

Every week*, our data science team @Gnip (aka @TwitterBoulder) gets together for about 50 minutes to learn something. While these started as opportunities to collectively "raise the tide" on common stumbling blocks in data munging and analysis tasks, they have since grown to machine learning, statistics, and general programming topics. Anything that will help us do our jobs better is fair game.

OpenML - Open Machine Learning

  •    CSS

We are a group of people who are excited about open science, open data and machine learning. We want to make machine learning and data analysis simple, accessible, collaborative and open with an optimal division of labour between computers and humans. OpenML is an online machine learning platform for sharing and organizing data, machine learning algorithms and experiments. It is designed to create a frictionless, networked ecosystem, that you can readily integrate into your existing processes/code/environments, allowing people all over the world to collaborate and build directly on each other’s latest ideas, data and results, irrespective of the tools and infrastructure they happen to use.


rumale - Rumale is a machine learning library in Ruby

  •    Ruby

Rumale (Ruby machine learning) is a machine learning library in Ruby. Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python. Rumale supports Linear / Kernel Support Vector Machine, Logistic Regression, Linear Regression, Ridge, Lasso, Kernel Ridge, Factorization Machine, Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor classifier, K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering, Mutidimensional Scaling, t-SNE, Principal Component Analysis, Kernel PCA and Non-negative Matrix Factorization. This project was formerly known as "SVMKit". If you are using SVMKit, please install Rumale and replace SVMKit constants with Rumale.

DataScienceR - a curated list of R tutorials for Data Science, NLP and Machine Learning

  •    R

This repo contains a curated list of R tutorials and packages for Data Science, NLP and Machine Learning. This also serves as a reference guide for several common data analysis tasks. Curated list of Python tutorials for Data Science, NLP and Machine Learning.

DataSciencePython - common data analysis and machine learning tasks using python

  •    Python

This repo contains a curated list of Python tutorials for Data Science, NLP and Machine Learning. Curated list of R tutorials for Data Science, NLP and Machine Learning.

dive-into-machine-learning - Dive into Machine Learning with Python Jupyter notebook and scikit-learn!

  •    

I learned Python by hacking first, and getting serious later. I wanted to do this with Machine Learning. If this is your style, join me in getting a bit ahead of yourself. I suggest you get your feet wet ASAP. You'll boost your confidence.

Math-of-Machine-Learning-Course-by-Siraj - Implements common data science methods and machine learning algorithms from scratch in python

  •    Jupyter

This repository was initially created to submit machine learning assignments for Siraj Raval's online machine learning course. The purpose of the course was to learn how to implement the most common machine learning algorithms from scratch (without using machine learning libraries such as tensorflow, PyTorch, scikit-learn, etc). Although that course has ended now, I am continuing to learn data science and machine learning from other sources such as Coursera, online blogs, and attending machine learning lectures at University of Toronto. Sticking to the theme of implementing machine learning algortihms from scratch, I will continue to post detailed notebooks in python here as I learn more.

TensorFlow-Machine-Learning-Cookbook - Code repository for TensorFlow Machine Learning Cookbook by Packt

  •    Python

This is the code repository for TensorFlow Machine Learning Cookbook, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish. TensorFlow is an open source software library for Machine Intelligence. The independent recipes in this book will teach you how to use TensorFlow for complex data computations and will let you dig deeper and gain more insights into your data than ever before. You’ll work through recipes on training models, model evaluation, sentiment analysis, regression analysis, clustering analysis, artificial neural networks, and deep learning – each using Google’s machine learning library TensorFlow.

xlearn - High performance, easy-to-use, and scalable ML package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and command line interface

  •    C++

xLearn is a high performance, easy-to-use, and scalable machine learning package, which can be used to solve large-scale machine learning problems. xLearn is especially useful for solving machine learning problems on large-scale sparse data, which is very common in Internet services such as online advertisement and recommender systems in recent years. If you are the user of liblinear, libfm, or libffm, now xLearn is your another better choice. xLearn is developed with high-performance C++ code with careful design and optimizations. Our system is designed to maximize CPU and memory utilization, provide cache-aware computation, and support lock-free learning. By combining these insights, xLearn is 5x-13x faster compared to similar systems.

probability - Probabilistic reasoning and statistical analysis in TensorFlow

  •    Jupyter

TensorFlow Probability is a library for probabilistic reasoning and statistical analysis in TensorFlow. As part of the TensorFlow ecosystem, TensorFlow Probability provides integration of probabilistic methods with deep networks, gradient-based inference via automatic differentiation, and scalability to large datasets and models via hardware acceleration (e.g., GPUs) and distributed computation. Our probabilistic machine learning tools are structured as follows.

Apache Spot - A Community Approach to Fighting Cyber Threats

  •    Java

Apache Spot is a community-driven cybersecurity project, built from the ground up, to bring advanced analytics to all IT Telemetry data on an open, scalable platform. pot expedites threat detection, investigation, and remediation via machine learning and consolidates all enterprise security data into a comprehensive IT telemetry hub based on open data models.

useR-machine-learning-tutorial - useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive http://user2016

  •    Jupyter

Instructions for how to install the necessary software for this tutorial is available here. Data for the tutorial can be downloaded by running ./data/get-data.sh (requires wget). Certain algorithms don't scale well when there are millions of features. For example, decision trees require computing some sort of metric (to determine the splits) on all the feature values (or some fraction of the values as in Random Forest and Stochastic GBM). Therefore, computation time is linear in the number of features. Other algorithms, such as GLM, scale much better to high-dimensional (n << p) and wide data with appropriate regularization (e.g. Lasso, Elastic Net, Ridge).