art-data-science - The Art of Data Science

  •        1

The Art of Data Science



Related Projects

datascience-box - Data Science Course in a Box

  •    HTML

This introductory data science course that is our (working) answer to these questions. The courses focuses on data acquisition and wrangling, exploratory data analysis, data visualization, and effective communication and approaching statistics from a model-based, instead of an inference-based, perspective. A heavy emphasis is placed on a consitent syntax (with tools from the tidyverse), reproducibility (with R Markdown) and version control and collaboration (with git/GitHub). We help ease the learning curve by avoiding local installation and supplementing out-of-class learning with interactive tools (like learnr tutorials). By the end of the semester teams of students work on fully reproducible data analysis projects on data they acquired, answering questions they care about. This repository serves as a "data science course in a box" containing all materials required to teach (or learn from) the course described above.

awesome-datascience - :memo: An awesome Data Science repository to learn and apply for real world problems


An open source Data Science repository to learn and apply towards solving real world problems. First of all, Data Science is one of the hottest topics on the Computer and Internet farmland nowadays. People have gathered data from applications and systems until today and now is the time to analyze them. The next steps are producing suggestions from the data and creating predictions about the future. Here you can find the biggest question for Data Science and hundreds of answers from experts. Our favorite data scientist is Clare Corthell. She is an expert in data-related systems and a hacker, and has been working on a company as a data scientist. Clare's blog. This website helps you to understand the exact way to study as a professional data scientist.

Data-Analysis-and-Machine-Learning-Projects - Repository of teaching materials, code, and data for my data analysis and machine learning projects

  •    Jupyter

This is a repository of teaching materials, code, and data for my data analysis and machine learning projects.Each repository will (usually) correspond to one of the blog posts on my web site.

data-science-with-ruby - Practical Data Science with Ruby based tools.

  •    Ruby

Data Science is a new "sexy" buzzword without specific meaning but often used to substitute Statistics, Scientific Computing, Text and Data Mining and Visualization, Machine Learning, Data Processing and Warehousing as well as Retrieval Algorithms of any kind. This curated list comprises awesome tutorials, libraries, information sources about various Data Science applications using the Ruby programming language.

data-science-your-way - Ways of doing Data Science Engineering and Machine Learning in R and Python

  •    Jupyter

These series of tutorials on Data Science engineering will try to compare how different concepts in the discipline can be implemented in the two dominant ecosystems nowadays: R and Python. We will do this from a neutral point of view. Our opinion is that each environment has good and bad things, and any data scientist should know how to use both in order to be as prepared as posible for job market or to start personal project.

practical-machine-learning-with-python - Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system

  •    Jupyter

"Data is the new oil" is a saying which you must have heard by now along with the huge interest building up around Big Data and Machine Learning in the recent past along with Artificial Intelligence and Deep Learning. Besides this, data scientists have been termed as having "The sexiest job in the 21st Century" which makes it all the more worthwhile to build up some valuable expertise in these areas. Getting started with machine learning in the real world can be overwhelming with the vast amount of resources out there on the web. "Practical Machine Learning with Python" follows a structured and comprehensive three-tiered approach packed with concepts, methodologies, hands-on examples, and code. This book is packed with over 500 pages of useful information which helps its readers master the essential skills needed to recognize and solve complex problems with Machine Learning and Deep Learning by following a data-driven mindset. By using real-world case studies that leverage the popular Python Machine Learning ecosystem, this book is your perfect companion for learning the art and science of Machine Learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute Machine Learning systems and projects successfully.

urban-data-science - Course materials, Jupyter notebooks, tutorials, guides, and demos for a Python-based urban data science course

  •    Jupyter

This repo is my workspace for developing a cycle of course materials, IPython notebooks, and tutorials towards an academic urban data science course based on Python. Between Fall 2013 and Fall 2016, I was the grad student instructor (3 years) and co-lead instructor (1 year) for CP255, Urban Informatics and Visualization, at UC Berkeley. This course was developed by Paul Waddell and is ongoing at Berkeley with the fantastic contributions of @Arezoo-bz. If you're interested in these topics at all, you owe it to yourself to check out the latest iterations of Paul's excellent pedagogy in his CP255 repo. A couple years ago, I wrote this blog post describing our efforts for the course.

DataScienceR - a curated list of R tutorials for Data Science, NLP and Machine Learning

  •    R

This repo contains a curated list of R tutorials and packages for Data Science, NLP and Machine Learning. This also serves as a reference guide for several common data analysis tasks. Curated list of Python tutorials for Data Science, NLP and Machine Learning.

pachyderm - Reproducible Data Science at Scale!

  •    Go

Pachyderm is a tool for production data pipelines. If you need to chain together data scraping, ingestion, cleaning, munging, wrangling, processing, modeling, and analysis in a sane way, then Pachyderm is for you. If you have an existing set of scripts which do this in an ad-hoc fashion and you're looking for a way to "productionize" them, Pachyderm can make this easy for you. Install Pachyderm locally or deploy on AWS/GCE/Azure in about 5 minutes.

DataSciencePython - common data analysis and machine learning tasks using python

  •    Python

This repo contains a curated list of Python tutorials for Data Science, NLP and Machine Learning. Curated list of R tutorials for Data Science, NLP and Machine Learning.

spark-py-notebooks - Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

  •    Jupyter

This is a collection of IPython notebook/Jupyter notebooks intended to train the reader on different Apache Spark concepts, from basic to advanced, by using the Python language. If Python is not your language, and it is R, you may want to have a look at our R on Apache Spark (SparkR) notebooks instead. Additionally, if your are interested in being introduced to some basic Data Science Engineering, you might find these series of tutorials interesting. There we explain different concepts and applications using Python and R.

Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth

  •    Python

Prophet is a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily periodicity data with at least one year of historical data. Prophet is robust to missing data, shifts in the trend, and large outliers.Prophet is open source software released by Facebook's Core Data Science team. It is available for download on CRAN and PyPI.

competitive-data-science - Materials for "How to Win a Data Science Competition: Learn from Top Kagglers" course

  •    Jupyter

This repository contains programming assignments notebooks for the course about competitive data science.

OpenML - Open Machine Learning

  •    CSS

We are a group of people who are excited about open science, open data and machine learning. We want to make machine learning and data analysis simple, accessible, collaborative and open with an optimal division of labour between computers and humans. OpenML is an online machine learning platform for sharing and organizing data, machine learning algorithms and experiments. It is designed to create a frictionless, networked ecosystem, that you can readily integrate into your existing processes/code/environments, allowing people all over the world to collaborate and build directly on each other’s latest ideas, data and results, irrespective of the tools and infrastructure they happen to use.

hackermath - Introduction to Statistics and Basics of Mathematics for Data Science - The Hacker's Way

  •    Jupyter

Math literacy, including proficiency in Linear Algebra and Statistics,is a must for anyone pursuing a career in data science. The goal of this workshop is to introduce some key concepts from these domains that get used repeatedly in data science applications. Our approach is what we call the “Hacker’s way”. Instead of going back to formulae and proofs, we teach the concepts by writing code. And in practical applications. Concepts don’t remain sticky if the usage is never taught. The focus will be on depth rather than breadth. Three areas are chosen - Hypothesis Testing, Supervised Learning and Unsupervised Learning. They will be covered to sufficient depth - 50% of the time will be on the concepts and 50% of the time will be spent coding them.

xlearn - High performance, easy-to-use, and scalable ML package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and command line interface

  •    C++

xLearn is a high performance, easy-to-use, and scalable machine learning package, which can be used to solve large-scale machine learning problems. xLearn is especially useful for solving machine learning problems on large-scale sparse data, which is very common in Internet services such as online advertisement and recommender systems in recent years. If you are the user of liblinear, libfm, or libffm, now xLearn is your another better choice. xLearn is developed with high-performance C++ code with careful design and optimizations. Our system is designed to maximize CPU and memory utilization, provide cache-aware computation, and support lock-free learning. By combining these insights, xLearn is 5x-13x faster compared to similar systems.

holoviews - Stop plotting your data - annotate your data and let it visualize itself.

  •    Python

Stop plotting your data - annotate your data and let it visualize itself. HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple. With HoloViews, you can usually express what you want to do in very few lines of code, letting you focus on what you are trying to explore and convey, not on the process of plotting.


  •    HTML

This repository contains walkthroughs, templates and documentation related to Machine Learning & Data Science services and platforms on Azure. Services and platforms include Data Science Virtual Machine, Azure ML, HDInsight, Microsoft R Server, SQL-Server, Azure Data Lake etc.There are also materials from tutorials we have delivered at KDD, Strata etc., using the above services and platforms.