tidyr - Easily tidy data with spread and gather functions.

Tidy data describes a standard way of storing data that is used wherever possible throughout the tidyverse. If you ensure that your data is tidy, you’ll spend less time fighting with the tools and more time working on your analysis. gather() takes multiple columns, and gathers them into key-value pairs: it makes “wide” data longer.
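As a rough illustration of the "wide to longer" reshaping described above, here is a minimal sketch using gather(). The data frame and its column names (country, 1999, 2000, year, cases) are made up for this example and are not taken from the tidyr documentation.

```r
library(tidyr)

# Hypothetical wide data: one row per country, one column per year.
wide <- data.frame(
  country = c("A", "B"),
  `1999` = c(10, 20),
  `2000` = c(11, 21),
  check.names = FALSE,       # keep the non-syntactic year column names
  stringsAsFactors = FALSE
)

# Gather the two year columns into key-value pairs:
# the old column names go into "year", the cell values into "cases".
long <- gather(wide, key = "year", value = "cases", `1999`, `2000`)

print(long)
```

Each country now contributes one row per year, so the two-row wide table becomes a four-row long table.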

tidyr.tidyverse.org
https://github.com/tidyverse/tidyr

Related Projects

tidytext - Text mining using dplyr, ggplot2, and other tidy tools

  •    R

Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like dplyr, broom, tidyr and ggplot2. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. Check out our book to learn more about text mining using tidy data principles. The unnest_tokens() function uses the tokenizers package to separate each line into words. The default tokenizing is for words, but other options include characters, n-grams, sentences, lines, paragraphs, or separation around a regex pattern.
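The tokenizing options mentioned above can be sketched with tidytext's unnest_tokens(). The two-line input and the output column names (word, bigram) are invented for this example; treat it as a small sketch rather than the package's canonical usage.

```r
library(tidytext)

# Hypothetical input: one row per line of text.
text_df <- data.frame(
  line = 1:2,
  text = c("Tidy data is easy.", "Text mining is fun."),
  stringsAsFactors = FALSE
)

# Default tokenizing: one lowercase word per row, punctuation stripped.
words <- unnest_tokens(text_df, word, text)

# Alternative tokenizer: overlapping two-word n-grams.
bigrams <- unnest_tokens(text_df, bigram, text, token = "ngrams", n = 2)

print(words)
print(bigrams)
```

The line column is carried along, so every token can still be traced back to the line it came from.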

tidyquant - Bringing financial analysis to the tidyverse

  •    R

tidyquant integrates the best resources for collecting and analyzing financial data (zoo, xts, quantmod, TTR, and PerformanceAnalytics) with the tidy data infrastructure of the tidyverse, allowing for seamless interaction between each. You can now perform complete financial analyses in the tidyverse. A short introduction to tidyquant is available on YouTube.

tidytuesday - Repo for initial setup of the #tidytuesday visualization project

A weekly data project aimed at the R ecosystem. An emphasis will be placed on understanding how to summarize and arrange data to make meaningful charts with ggplot2, tidyr, dplyr, and other tools in the tidyverse ecosystem. We will have many sources of data and want to emphasize that no causation is implied. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our guidelines are to use the data provided to practice your data tidying and plotting techniques. Participants are invited to consider for themselves what nuancing factors might underlie these relationships.

tidyverse - Easily install and load packages from the tidyverse

  •    R

The tidyverse is a set of packages that work in harmony because they share common data representations and API design. The tidyverse package is designed to make it easy to install and load core packages from the tidyverse in a single command. If you’d like to learn how to use the tidyverse effectively, the best place to start is R for data science.

tidy-text-mining - Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson

  •    TeX

This is a draft of the book Text Mining with R: A Tidy Approach, by Julia Silge and David Robinson. Please note that this work is being written under a Contributor Code of Conduct and released under a CC-BY-NC-SA license. By participating in this project (for example, by submitting a pull request with suggestions or edits) you agree to abide by its terms.


forcats - 🐈🐈🐈🐈: tools for working with categorical variables (factors)

  •    R

R uses factors to handle categorical variables, variables that have a fixed and known set of possible values. Historically, factors were much easier to work with than character vectors, so many base R functions automatically convert character vectors to factors. (For historical context, I recommend stringsAsFactors: An unauthorized biography by Roger Peng, and stringsAsFactors = <sigh> by Thomas Lumley. If you want to learn more about other approaches to working with factors and categorical data, I recommend Wrangling categorical data in R, by Amelia McNamara and Nicholas Horton.) These days, making factors automatically is no longer so helpful, so packages in the tidyverse never create them automatically. However, factors are still useful when you have true categorical data, and when you want to override the ordering of character vectors to improve display. The goal of the forcats package is to provide a suite of useful tools that solve common problems with factors. If you’re not familiar with factors, the best place to start is the chapter on factors in R for Data Science.

tidybayes - Bayesian analysis + tidy data + geoms (R package)

  •    R

tidybayes is an R package that aims to make it easy to integrate popular Bayesian modeling methods into a tidy data + ggplot workflow. One task it supports is composing data for use with the model. This often means translating data from a data.frame into a list, making sure factors are encoded as numerical data, adding variables to store the length of indices, etc. This package helps automate these operations using the compose_data function, which automatically handles data types like numeric, logical, factor, and ordinal, and allows easy extensions for converting other data types into a format the model understands by providing your own implementation of the generic as_data_list.

modelr - Helper functions for modelling

  •    R

The modelr package provides functions that help you create elegant pipelines when modelling. It is designed primarily to support teaching the basics of modelling within the tidyverse, particularly in R for Data Science. modelr is stable: it has achieved its goal of making it easier to teach modelling within the tidyverse. For more general modelling tasks, check out the family of “tidymodel” packages like recipes, rsample, parsnip, and tidyposterior.

naniar - Tidy data structures, summaries, and visualisations for missing data

  •    R

For more details on the workflow and theory underpinning naniar, read the vignette Getting started with naniar. For a short primer on the data visualisation available in naniar, read the vignette Gallery of Missing Data Visualisations.

datascience-box - Data Science Course in a Box

  •    HTML

This introductory data science course is our (working) answer to these questions. The course focuses on data acquisition and wrangling, exploratory data analysis, data visualization, and effective communication, and on approaching statistics from a model-based, instead of an inference-based, perspective. A heavy emphasis is placed on a consistent syntax (with tools from the tidyverse), reproducibility (with R Markdown), and version control and collaboration (with git/GitHub). We help ease the learning curve by avoiding local installation and supplementing out-of-class learning with interactive tools (like learnr tutorials). By the end of the semester, teams of students work on fully reproducible data analysis projects on data they acquired, answering questions they care about. This repository serves as a "data science course in a box" containing all materials required to teach (or learn from) the course described above.

broom - Convert statistical analysis objects from R into tidy format

  •    R

For a detailed introduction, please see vignette("broom"). broom tidies 100+ models from popular modelling packages and almost all of the model objects in the stats package that comes with base R. vignette("available-methods") lists method availability.

broom - Convert statistical analysis objects from R into tidy format

  •    R

The broom package takes the messy output of built-in functions in R, such as lm, nls, or t.test, and turns them into tidy data frames.
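To make the "messy output to tidy data frame" idea concrete, here is a minimal sketch that fits a linear model with lm and passes it to broom's tidy(). The choice of the built-in mtcars data and the mpg ~ wt formula is arbitrary and only for illustration.

```r
library(broom)

# Fit a simple linear model on a built-in dataset (arbitrary example).
fit <- lm(mpg ~ wt, data = mtcars)

# summary(fit) prints a nested, print-oriented object;
# tidy(fit) returns the coefficient table as a data frame,
# with one row per model term.
coefs <- tidy(fit)

print(coefs)
```

Because the result is an ordinary data frame, it can be filtered, joined, or plotted like any other tidy table.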

tidy-data - A paper on data tidying

  •    TeX

data/: raw datasets, the code to tidy them, and the results, as used in Section 3. Source individual .R files to recreate the tidied data. t-test.r: code used to generate Table 14 (model-1.tex and model-2.tex), comparing data needed for paired t-test vs. a mixed effects model.

infer - An R package for tidyverse-friendly statistical inference

  •    R

The objective of this package is to perform statistical inference using an expressive statistical grammar that coheres with the tidyverse design framework. To install the developmental version of infer, make sure to install remotes first. The pkgdown website for this developmental version is at https://infer.netlify.com.

tibble - A modern re-imagining of the data frame

  •    HTML

A tibble, or tbl_df, is a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not. Tibbles are data.frames that are lazy and surly: they do less (i.e. they don't change variable names or types, and don't do partial matching) and complain more (e.g. when a variable does not exist). This forces you to confront problems earlier, typically leading to cleaner, more expressive code. Tibbles also have an enhanced print() method which makes them easier to use with large datasets containing complex objects. If you are new to tibbles, the best place to start is the tibbles chapter in R for data science.
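The "do less, complain more" behaviour described above can be seen in a small side-by-side sketch. The column names used here (my var, chr) are invented for the example.

```r
library(tibble)

# The same two columns built as a base data.frame and as a tibble.
df <- data.frame(`my var` = 1:3, chr = c("a", "b", "c"))
tb <- tibble(`my var` = 1:3, chr = c("a", "b", "c"))

# data.frame() rewrites the non-syntactic name; tibble() keeps it as given.
print(names(df))
print(names(tb))

# data.frame partially matches "ch" to the "chr" column;
# a tibble returns NULL and warns that the column is unknown.
partial_df <- df$ch
partial_tb <- suppressWarnings(tb$ch)
```

The warning is suppressed here only to keep the sketch quiet; in normal use it is the early complaint that surfaces the typo.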

atom-beautify - :lipstick: Universal beautification package for Atom editor (:warning: Currently migrating to https://github

  •    CoffeeScript

Atom-Beautify respects the core.telemetryConsent configuration option from Atom editor. If you do not wish to have usage data sent to Google Analytics, then please set core.telemetryConsent to the no or undecided option before using Atom-Beautify. See the Anonymous Analytics section of the docs for details. Thank you. Atom-Beautify is going to be completely rewritten with Unibeautify at its core! See the unibeautify branch for work in progress and Issue #1174.

janitor - simple tools for data cleaning in R

  •    R

Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets. janitor has simple functions for examining and cleaning dirty data. It was built with beginning and intermediate R users in mind and is optimized for user-friendliness. Advanced R users can already do everything covered here, but with janitor they can do it faster and save their thinking for the fun stuff.

readr - Read flat files (csv, tsv, fwf) into R

  •    R

The goal of readr is to provide a fast and friendly way to read rectangular data (like csv, tsv, and fwf). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes. If you are new to readr, the best place to start is the data import chapter in R for data science. To accurately read a rectangular dataset with readr you combine two pieces: a function that parses the overall file, and a column specification. The column specification describes how each column should be converted from a character vector to the most appropriate data type, and in most cases it's not necessary because readr will guess it for you automatically.
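The two pieces described above, a file-parsing function plus a column specification, might look like the following sketch. The file contents and column names (id, score, date) are made up, and the data is written to a temporary file so the example is self-contained.

```r
library(readr)

# Write a small hypothetical CSV to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c("id,score,date", "1,3.5,2020-01-01", "2,4.0,2020-01-02"), path)

# Piece 1 only: let readr guess every column type.
# An empty cols() means "guess everything" without printing the spec.
guessed <- read_csv(path, col_types = cols())

# Piece 1 + piece 2: state the column specification explicitly.
explicit <- read_csv(
  path,
  col_types = cols(
    id = col_integer(),
    score = col_double(),
    date = col_date()   # default format parses ISO 8601 dates
  )
)

print(explicit)
```

Spelling out the specification is one way to get the "cleanly failing" behaviour mentioned above: if a column stops matching its declared type, readr reports parsing problems instead of silently changing the type.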

SQL-Server-R-Services-Samples - Advanced analytics samples and templates using SQL Server R Services

  •    R

In these examples, we will demonstrate how to develop and deploy end-to-end advanced analytics solutions with SQL Server 2016 R Services. Develop models in an R IDE: SQL Server 2016 R Services allows data scientists to develop solutions in an R IDE (such as RStudio or Visual Studio R Tools) with Open Source R or Microsoft R Server, using data residing in SQL Server, with the computing done in-database.

statistics-for-data-scientists - Code and data associated with the book "Statistics for Data Scientists: 50 Essential Concepts"

  •    R

The scripts are stored by chapter and replicate most of the figures and code snippets. How to get the data: the data is not saved on GitHub, so you will need to download it. You can do this in R using the script src/download_data.r, which copies the data into the data directory ~/statistics-for-data-scientists/data.