
This repository is an aggregator for various R, make and git/GitHub teaching material. Most of the courses are taught at the University of Cambridge, UK, and some have been adapted for use elsewhere. We would also like to acknowledge contributions from Aleksandra Pawlik (Software Sustainability Institute), Raphael Gottardo (Fred Hutchinson Cancer Research Center) and Karl Broman (University of Wisconsin-Madison). Each material subdirectory has its own repository; TeachingMaterial aggregates a snapshot as a central entry point. Aggregation is done using git-subtree (see the administration page for details). The local copies linking to external repositories are prefixed with an underscore.

http://lgatto.github.io/TeachingMaterial/
https://github.com/lgatto/TeachingMaterial

Tags | r programming-tutorial oo-programming data-analysis teaching-materials statistics vectorisation proteomics visualisation make makefile |

Implementation | HTML |

License | Public |

Platform |

This introductory data science course is our (working) answer to these questions. The course focuses on data acquisition and wrangling, exploratory data analysis, data visualisation, and effective communication, and approaches statistics from a model-based, instead of an inference-based, perspective. A heavy emphasis is placed on a consistent syntax (with tools from the tidyverse), reproducibility (with R Markdown), and version control and collaboration (with git/GitHub). We help ease the learning curve by avoiding local installation and supplementing out-of-class learning with interactive tools (like learnr tutorials). By the end of the semester, teams of students work on fully reproducible data analysis projects on data they acquired, answering questions they care about. This repository serves as a "data science course in a box" containing all materials required to teach (or learn from) the course described above.

rstats r education teaching data-science

This is a repository of teaching materials, code, and data for my data analysis and machine learning projects. Each repository will (usually) correspond to one of the blog posts on my web site.

machine-learning data-analysis data-science ipython-notebook evolutionary-algorithm

ggvis is currently dormant. We fundamentally believe in the ideas that underlie ggvis: reactive programming is the right foundation for interactive visualisation. However, we are not currently working on ggvis because we do not see it as the most pressing issue for the R community: you can only use interactive graphics once you've successfully tackled the rest of the data analysis process. We hope to come back to ggvis in the future; in the meantime you might want to try out plotly or creating interactive graphics with shiny.

An educational tutorial and working demonstration pipeline for RNA-seq analysis including an introduction to: cloud computing, next generation sequence file formats, reference genomes, gene annotation, expression analysis, differential expression analysis, alternative splicing analysis, data visualization, and interpretation. This repository is used to store code and certain raw materials for a detailed RNA-seq tutorial. To actually complete this tutorial, go to the RNA-seq tutorial wiki.

Every week*, our data science team @Gnip (aka @TwitterBoulder) gets together for about 50 minutes to learn something. While these started as opportunities to collectively "raise the tide" on common stumbling blocks in data munging and analysis tasks, they have since grown to machine learning, statistics, and general programming topics. Anything that will help us do our jobs better is fair game.

The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference devotes two to three chapters to probability theory before turning to what Bayesian inference is. Unfortunately, due to the mathematical intractability of most Bayesian models, the reader is only shown simple, artificial examples. This can leave the user with a "so what?" feeling about Bayesian inference. In fact, this was the author's own prior opinion. After some recent success of Bayesian methods in machine-learning competitions, I decided to investigate the subject again. Even with my mathematical background, it took me three straight days of reading examples and trying to put the pieces together to understand the methods. There was simply not enough literature bridging theory to practice. The problem with my misunderstanding was the disconnect between Bayesian mathematics and probabilistic programming. That being said, I suffered then so the reader would not have to now. This book attempts to bridge the gap.

bayesian-methods pymc mathematical-analysis jupyter-notebook data-science statistics

A curated list of awesome Competitive Programming, Algorithm and Data Structure resources. Please kindly follow CONTRIBUTING.md to get started.

quora learning-materials contest practice reference-materials competitive-programming awesome-list awesome list algorithm programming-contests data-structure

Chris Fonnesbeck is an Assistant Professor in the Department of Biostatistics at the Vanderbilt University School of Medicine. He specializes in computational statistics, Bayesian methods, meta-analysis, and applied decision analysis. He originally hails from Vancouver, BC and received his Ph.D. from the University of Georgia. This tutorial will introduce the use of Python for statistical data analysis, using data stored as Pandas DataFrame objects. Much of the work involved in analyzing data resides in importing, cleaning and transforming data in preparation for analysis. Therefore, the first half of the course comprises a 2-part overview of basic and intermediate Pandas usage that will show how to effectively manipulate datasets in memory. This includes tasks like indexing, alignment, join/merge methods, date/time types, and handling of missing data. Next, we will cover plotting and visualization using Pandas and Matplotlib, focusing on creating effective visual representations of your data, while avoiding common pitfalls. Finally, participants will be introduced to methods for statistical data modeling using some of the advanced functions in Numpy, Scipy and Pandas. This will include fitting your data to probability distributions, estimating relationships among variables using linear and non-linear models, and a brief introduction to bootstrapping methods. Each section of the tutorial will involve hands-on manipulation and analysis of sample datasets, to be provided to attendees in advance.

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible.

language programming-language statistical-language statistics

This second programming assignment will require you to write an R function that is able to cache potentially time-consuming computations. For example, taking the mean of a numeric vector is typically a fast operation. However, for a very long vector, it may take too long to compute the mean, especially if it has to be computed repeatedly (e.g. in a loop). If the contents of a vector are not changing, it may make sense to cache the value of the mean so that when we need it again, it can be looked up in the cache rather than recomputed. In this Programming Assignment you will take advantage of the scoping rules of the R language and how they can be manipulated to preserve state inside of an R object. In this example we introduce the <<- operator which can be used to assign a value to an object in an environment that is different from the current environment. Below are two functions that are used to create a special object that stores a numeric vector and caches its mean.
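The caching pattern described above can be sketched outside R as well. Below is a minimal Python analogue using a closure over mutable state, where the shared dictionary plays the role of the enclosing environment that R's `<<-` assigns into (the names `make_cached_vector`, `set`, `get` and `mean` are illustrative, not the assignment's actual functions):

```python
def make_cached_vector(values):
    """Store a list of numbers and lazily cache its mean.

    The `state` dict is shared by all the inner functions, much like
    the enclosing environment that R's `<<-` operator writes into.
    """
    state = {"values": list(values), "mean": None}

    def set_values(new_values):
        state["values"] = list(new_values)
        state["mean"] = None          # invalidate the cached mean

    def get_values():
        return state["values"]

    def mean():
        if state["mean"] is None:     # compute once, then reuse
            state["mean"] = sum(state["values"]) / len(state["values"])
        return state["mean"]

    return {"set": set_values, "get": get_values, "mean": mean}

v = make_cached_vector([1, 2, 3, 4])
print(v["mean"]())   # computed on first call: 2.5
print(v["mean"]())   # returned from the cache: 2.5
```

Resetting the values through `set` clears the cache, so the mean is recomputed only when the data actually change.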

This is the main repository for the Programming Historian (http://programminghistorian.org), where we keep the files for the live website. For tutorials in submission, please see: Programming Historian Submissions.

programming-historian text-analysis api data-management data-manipulation data-mining pedagogy linked-open-data mapping network-analysis exhibits scraping dh digital-humanities

Edward is a Python library for probabilistic modeling, inference, and criticism. It is a testbed for fast experimentation and research with probabilistic models, ranging from classical hierarchical models on small data sets to complex deep probabilistic models on large data sets. Edward fuses three fields: Bayesian statistics and machine learning, deep learning, and probabilistic programming. Edward is built on top of TensorFlow. It enables features such as computational graphs, distributed training, CPU/GPU integration, automatic differentiation, and visualization with TensorBoard.

bayesian-methods deep-learning machine-learning data-science tensorflow neural-networks statistics probabilistic-programming

Welcome to "Bayesian Modelling in Python" - a tutorial for those interested in learning how to apply Bayesian modelling techniques in Python (PyMC3). This tutorial doesn't aim to be a Bayesian statistics tutorial - but rather a programming cookbook for those who understand the fundamentals of Bayesian statistics and want to learn how to build Bayesian models using Python. The tutorial sections and topics can be seen below. Statistics is a topic that never resonated with me throughout university. The frequentist techniques that we were taught (p-values etc.) felt contrived and ultimately I turned my back on statistics as a topic that I wasn't interested in.

bayesian-statistics tutorial pymc

A tutorial with many sub-projects to learn C# programming, OO programming, and data structures/algorithms.

Please cite our JMLR paper [bibtex]. Some parts of the package were created as part of other publications. If you use these parts, please cite the relevant work appropriately. An overview of all mlr related publications can be found here.

machine-learning data-science tuning cran r-package predictive-modeling classification regression statistics r survival-analysis imbalance-correction tutorial mlr learners hyperparameters-optimization feature-selection multilabel-classification clustering stacking

Here you'll find all the code (and more!) for the Maker Media book Make: AVR Programming. Most of the projects share a common set of pin defines and a common simple USART serial library in the AVR-Programming-Library directory. The makefiles I've included depend on the directory structure here by default, so don't go moving the folders around unless you also change the path to included files in the makefile.

purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors. If you’ve never heard of FP before, the best place to start is the family of map() functions which allow you to replace many for loops with code that is both more succinct and easier to read. The best place to learn about the map() functions is the iteration chapter in R for data science. The following example uses purrr to solve a fairly realistic problem: split a data frame into pieces, fit a model to each piece, compute the summary, then extract the R2.
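The split/fit/summarise workflow that the map() family enables can be sketched in plain Python (a hypothetical stand-in for the purrr pipeline, not its actual code; the least-squares fit is hand-rolled to keep the example self-contained):

```python
from collections import defaultdict

def fit_r2(xs, ys):
    # Ordinary least squares for y = a + b*x, returning R^2.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Toy "data frame" as (group, x, y) rows.
rows = [("g1", 1, 2.0), ("g1", 2, 4.1), ("g1", 3, 5.9),
        ("g2", 1, 1.0), ("g2", 2, 1.0), ("g2", 3, 3.0)]

# Split into pieces by group ...
groups = defaultdict(list)
for key, x, y in rows:
    groups[key].append((x, y))

# ... then map the fit over each piece and extract R^2.
r2 = {key: fit_r2([x for x, _ in pts], [y for _, y in pts])
      for key, pts in groups.items()}
```

In purrr the same shape is one pipeline: split, map a model-fitting function over the pieces, then map an extractor over the fits.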

r functional-programming

Data Science is a new "sexy" buzzword without a specific meaning, often used as a substitute for Statistics, Scientific Computing, Text and Data Mining and Visualization, Machine Learning, Data Processing and Warehousing, as well as Retrieval Algorithms of any kind. This curated list comprises awesome tutorials, libraries, and information sources about various Data Science applications using the Ruby programming language.

data-science data-visualization data-analysis data-mining data-analytics visualization awesome awesome-list list ruby datascience

Prolog is a programming language that is rooted in formal logic. It supports backtracking and unification as built-in features. Prolog allows us to elegantly solve many tasks with short and general programs. The goal of this material is to bridge the gap between the great traditional Prolog textbooks of the past and the language as it currently is, several decades after those books were written. You will see that many limitations of the past are no longer relevant, while several new constructs are now of great importance even though they are not yet covered in any available Prolog book.

prolog book logic-programming teaching-materials constraints

tidybayes is an R package that aims to make it easy to integrate popular Bayesian modelling methods into a tidy data + ggplot workflow. Composing data for use with the model often means translating data from a data.frame into a list, making sure factors are encoded as numerical data, adding variables to store the length of indices, etc. This package helps automate these operations using the compose_data function, which automatically handles data types like numeric, logical, factor, and ordinal, and allows easy extensions for converting other data types into a format the model understands by providing your own implementation of the generic as_data_list.
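A rough idea of what this kind of data composition does can be sketched in Python (a hypothetical analogue, not tidybayes' actual compose_data implementation): string columns are treated like R factors and encoded as 1-based integer codes, and index-length variables are added alongside the data.

```python
def compose_data(columns):
    """Turn a dict of equal-length columns into a flat dict for a
    sampler. String columns are encoded as 1-based integer codes,
    with an extra n_<name> entry giving the number of levels, and
    a final n entry giving the number of rows.
    """
    out = {}
    n = 0
    for name, col in columns.items():
        n = len(col)
        if col and all(isinstance(v, str) for v in col):
            levels = sorted(set(col))                    # factor levels
            out[name] = [levels.index(v) + 1 for v in col]
            out["n_" + name] = len(levels)
        else:
            out[name] = list(col)
    out["n"] = n
    return out

data = compose_data({"condition": ["A", "B", "A"],
                     "response": [1.2, 0.7, 1.9]})
# data["condition"] is [1, 2, 1]; data["n_condition"] is 2; data["n"] is 3
```

This is the shape of input that samplers like JAGS or Stan expect: flat numeric arrays plus explicit index lengths.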

r tidy-data bayesian-data-analysis r-package visualization ggplot2 stan jags