r4ds - R for data science

  •        110

This is code and text behind the R for Data Science book.

http://r4ds.had.co.nz/
https://github.com/hadley/r4ds

Tags
Implementation
License
Platform

   




Related Projects

data-science-at-the-command-line - Data Science at the Command Line

  •    HTML

This repository contains the full text, data, scripts, and custom command-line tools used in the book Data Science at the Command Line. The book is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. The command-line tools are licensed under the BSD 2-Clause License.

bookdown - Authoring Books and Technical Documents with R Markdown

  •    R

Full documentation at https://bookdown.org/yihui/bookdown, and see "Get Started" at https://bookdown.org to know how to get started with writing a book. You are welcome to send us feedback using Github issues or ask questions on StackOverflow with the bookdown tag.

adv-r - Advanced R programming: a book

  •    TeX

This is code and text behind the Advanced R programming book. The site is built with bookdown.

bookdown-demo - A minimal book example using bookdown

  •    CSS

This is a minimal example of a book based on R Markdown and bookdown (https://github.com/rstudio/bookdown). Please see the page "Get Started" at https://bookdown.org/ for how to compile this example.

tidy-text-mining - Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson

  •    TeX

This is a draft of the book Text Mining with R: A Tidy Approach, by Julia Silge and David Robinson. Please note that this work is being written under a Contributor Code of Conduct and released under a CC-BY-NC-SA license. By participating in this project (for example, by submitting a pull request with suggestions or edits) you agree to abide by its terms.


data-science-your-way - Ways of doing Data Science Engineering and Machine Learning in R and Python

  •    Jupyter

These series of tutorials on Data Science engineering will try to compare how different concepts in the discipline can be implemented in the two dominant ecosystems nowadays: R and Python. We will do this from a neutral point of view. Our opinion is that each environment has good and bad things, and any data scientist should know how to use both in order to be as prepared as posible for job market or to start personal project.

DataScienceR - a curated list of R tutorials for Data Science, NLP and Machine Learning

  •    R

This repo contains a curated list of R tutorials and packages for Data Science, NLP and Machine Learning. This also serves as a reference guide for several common data analysis tasks. Curated list of Python tutorials for Data Science, NLP and Machine Learning.

datascience-box - Data Science Course in a Box

  •    HTML

This introductory data science course that is our (working) answer to these questions. The courses focuses on data acquisition and wrangling, exploratory data analysis, data visualization, and effective communication and approaching statistics from a model-based, instead of an inference-based, perspective. A heavy emphasis is placed on a consitent syntax (with tools from the tidyverse), reproducibility (with R Markdown) and version control and collaboration (with git/GitHub). We help ease the learning curve by avoiding local installation and supplementing out-of-class learning with interactive tools (like learnr tutorials). By the end of the semester teams of students work on fully reproducible data analysis projects on data they acquired, answering questions they care about. This repository serves as a "data science course in a box" containing all materials required to teach (or learn from) the course described above.

feather - Feather: fast, interoperable binary data frame storage for Python, R, and more powered by Apache Arrow

  •    C++

Feather provides binary columnar serialization for data frames. It is designed to make reading and writing data frames efficient, and to make sharing data across data analysis languages easy. This initial version comes with bindings for python (written by Wes McKinney) and R (written by Hadley Wickham). Feather uses the Apache Arrow columnar memory specification to represent binary data on disk. This makes read and write operations very fast. This is particularly important for encoding null/NA values and variable-length types like UTF8 strings.

metaflow - Build and manage real-life data science projects with ease.

  •    Python

Metaflow is a human-friendly Python/R library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning. For more information, see Metaflow's website and documentation.

targets - Function-oriented Make-like declarative workflows for R

  •    R

The targets package is a Make-like pipeline toolkit for Statistics and data science in R. With targets, you can maintain a reproducible workflow without repeating yourself. targets skips costly runtime for tasks that are already up to date, runs the necessary computation with implicit parallel computing, and abstracts files as R objects. A fully up-to-date targets pipeline is tangible evidence that the output aligns with the code and data, which substantiates trust in the results. Please note that this package is released with a Contributor Code of Conduct.

practical-machine-learning-with-python - Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system

  •    Jupyter

"Data is the new oil" is a saying which you must have heard by now along with the huge interest building up around Big Data and Machine Learning in the recent past along with Artificial Intelligence and Deep Learning. Besides this, data scientists have been termed as having "The sexiest job in the 21st Century" which makes it all the more worthwhile to build up some valuable expertise in these areas. Getting started with machine learning in the real world can be overwhelming with the vast amount of resources out there on the web. "Practical Machine Learning with Python" follows a structured and comprehensive three-tiered approach packed with concepts, methodologies, hands-on examples, and code. This book is packed with over 500 pages of useful information which helps its readers master the essential skills needed to recognize and solve complex problems with Machine Learning and Deep Learning by following a data-driven mindset. By using real-world case studies that leverage the popular Python Machine Learning ecosystem, this book is your perfect companion for learning the art and science of Machine Learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute Machine Learning systems and projects successfully.

plotly - An interactive graphing library for R

  •    R

An R package for creating interactive web graphics via the open source JavaScript graphing library plotly.js.NOTE: The CRAN version of plotly is designed to work with the CRAN version of ggplot2, but at least for the time being, we recommend using the development versions of both plotly and ggplot2 (devtools::install_github("hadley/ggplot2")).

free-data-science-books - Free resources for learning data science

  •    

This list contains free learning resources for data science and big data related concepts, techniques, and applications. Inspired by Free Programming Books. Each entry provides the expected audience for the certain book (beginner, intermediate, or veteran). It may be subjective, but it provides some clue of how difficult the book is.

spark-py-notebooks - Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

  •    Jupyter

This is a collection of IPython notebook/Jupyter notebooks intended to train the reader on different Apache Spark concepts, from basic to advanced, by using the Python language. If Python is not your language, and it is R, you may want to have a look at our R on Apache Spark (SparkR) notebooks instead. Additionally, if your are interested in being introduced to some basic Data Science Engineering, you might find these series of tutorials interesting. There we explain different concepts and applications using Python and R.

plyr - A R package for splitting, applying and combining large problems into simpler problems

  •    R

Considerable effort has been put into making plyr fast and memory efficient, and in many cases plyr is as fast as, or faster than, the built-in equivalents. A detailed introduction to plyr has been published in JSS: "The Split-Apply-Combine Strategy for Data Analysis", http://www.jstatsoft.org/v40/i01/. You can find out more at http://had.co.nz/plyr/, or track development at http://github.com/hadley/plyr. You can ask questions about plyr (and data manipulation in general) on the plyr mailing list. Sign up at http://groups.google.com/group/manipulatr.

dataviz - A book covering the fundamentals of data visualization.

  •    R

A guide to making visualizations that accurately reflect the data, tell a story, and look professional. This repository holds the R Markdown source for the book "Fundamentals of Data Visualization" to be published with O’Reilly Media, Inc. A rendered version of the completed book chapters is available here. The book requires a supporting R package available here.

data-science-from-scratch - code for Data Science From Scratch book

  •    Python

Additionally, I've collected all the links from the book. And, by popular demand, I made an index of functions defined in the book, by chapter and page number. The data is in a spreadsheet, or I also made a toy (experimental) searchable webapp.

ggpubr - 'ggplot2' Based Publication Ready Plots

  •    R

ggplot2 by Hadley Wickham is an excellent and flexible package for elegant data visualization in R. However the default generated plots requires some formatting before we can send them for publication. Furthermore, to customize a ggplot, the syntax is opaque and this raises the level of difficulty for researchers with no advanced R programming skills. The 'ggpubr' package provides some easy-to-use functions for creating and customizing 'ggplot2'- based publication ready plots.

awesome - Awesome resources on Bioinformatics, data science, machine learning, programming language (Python, Golang, R, Perl) and miscellaneous stuff

  •    

Collection of useful resources on Bioinformatics, data science, machine learning, programming language (Python, Golang, R, Perl, etc.) and miscellaneous stuff.






We have large collection of open source products. Follow the tags from Tag Cloud >>


Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.