tidy-text-mining - Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson

  •    TeX

This is a draft of the book Text Mining with R: A Tidy Approach, by Julia Silge and David Robinson. Please note that this work is being written under a Contributor Code of Conduct and released under a CC-BY-NC-SA license. By participating in this project (for example, by submitting a pull request with suggestions or edits) you agree to abide by its terms.

tidyverse - Easily install and load packages from the tidyverse

  •    R

The tidyverse is a set of packages that work in harmony because they share common data representations and API design. The tidyverse package is designed to make it easy to install and load core packages from the tidyverse in a single command. If you’d like to learn how to use the tidyverse effectively, the best place to start is R for Data Science.

tidytext - Text mining using dplyr, ggplot2, and other tidy tools

  •    R

Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like dplyr, broom, tidyr and ggplot2. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. Check out our book to learn more about text mining using tidy data principles. The unnest_tokens function uses the tokenizers package to separate each line into words. The default tokenizing is for words, but other options include characters, n-grams, sentences, lines, paragraphs, or separation around a regex pattern.
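The one-token-per-row idea and the tokenizing options mentioned above can be sketched in a language-neutral way. The Python snippet below is only an illustration under stated assumptions: `tokenize_rows` and its `token` and `n` parameters are hypothetical names, the regex-based word splitting is a simplification, and none of this is the tidytext or tokenizers API.

```python
import re

def tokenize_rows(lines, token="words", n=2):
    """Return one (line_number, token) row per token: a toy 'tidy' layout.

    Hypothetical helper for illustration; not part of any package.
    """
    rows = []
    for line_no, text in enumerate(lines, start=1):
        # Simplified word splitting: lowercase runs of letters, digits, apostrophes.
        words = re.findall(r"[a-z0-9']+", text.lower())
        if token == "words":
            tokens = words
        elif token == "ngrams":
            # Sliding window of n consecutive words within a single line.
            tokens = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
        elif token == "characters":
            # Characters inside words only; whitespace is dropped.
            tokens = [ch for word in words for ch in word]
        else:
            raise ValueError(f"unsupported token type: {token}")
        rows.extend((line_no, tok) for tok in tokens)
    return rows

lines = ["Because I could not stop for Death", "He kindly stopped for me"]
word_rows = tokenize_rows(lines)                        # default: words
bigram_rows = tokenize_rows(lines, token="ngrams", n=2)
```

Each row keeps the line number it came from, which is what lets later steps group, count, or join tokens with ordinary table operations.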

forcats - 🐈🐈🐈🐈: tools for working with categorical variables (factors)

  •    R

R uses factors to handle categorical variables: variables that have a fixed and known set of possible values. Historically, factors were much easier to work with than character vectors, so many base R functions automatically convert character vectors to factors. (For historical context, I recommend stringsAsFactors: An unauthorized biography by Roger Peng, and stringsAsFactors = <sigh> by Thomas Lumley. If you want to learn more about other approaches to working with factors and categorical data, I recommend Wrangling categorical data in R, by Amelia McNamara and Nicholas Horton.) These days, making factors automatically is no longer so helpful, so packages in the tidyverse never create them automatically. However, factors are still useful when you have true categorical data, and when you want to override the ordering of character vectors to improve display. The goal of the forcats package is to provide a suite of useful tools that solve common problems with factors. If you’re not familiar with factors, the best place to start is the chapter on factors in R for Data Science.

tidyquant - Bringing financial analysis to the tidyverse

  •    R

tidyquant integrates the best resources for collecting and analyzing financial data (zoo, xts, quantmod, TTR, and PerformanceAnalytics) with the tidy data infrastructure of the tidyverse, allowing for seamless interaction between them. You can now perform complete financial analyses in the tidyverse. A short introduction to tidyquant is available on YouTube.

r4ds-exercise-solutions - Solutions to the exercises in "R for Data Science"

  •    R

This repository contains the code and text behind the Solutions for R for Data Science, which, as its name suggests, has solutions to the exercises in R for Data Science by Garrett Grolemund and Hadley Wickham.

resamplr - R package for cross-validation, bootstrap, permutation, and rolling window resampling techniques for the tidyverse

  •    R

The resamplr package provides functions that implement resampling methods including the bootstrap, jackknife, random test/train sets, k-fold cross-validation, leave-one-out and leave-p-out cross-validation, time-series cross-validation, time-series k-fold cross-validation, permutations, and rolling windows. These functions generate data frames of lazy resample objects, as introduced in the tidyverse modelr package, so they work with the modelling pipeline of modelr and the tidyverse. The resample class stores a "pointer" to the original dataset and a vector of row indices. The object can be coerced to a data frame with as.data.frame, and the row indices can be extracted with as.integer.
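The "pointer plus row indices" design described above can be sketched conceptually. The Python snippet below is a minimal illustration, not the resamplr or modelr API: the `Resample` class and the `bootstrap` and `k_fold` helpers are hypothetical names, and the fold assignment is a simple unshuffled interleaving chosen for brevity.

```python
import random

class Resample:
    """Lazy resample: a reference to the original data plus row indices."""

    def __init__(self, data, idx):
        self.data = data        # reference to the original rows, not a copy
        self.idx = list(idx)    # which rows belong to this resample

    def indices(self):
        # Roughly the role that extracting the row indices plays in the text.
        return list(self.idx)

    def materialize(self):
        # Rows are only copied out when they are actually needed.
        return [self.data[i] for i in self.idx]

def bootstrap(data, times, seed=None):
    """Draw `times` bootstrap resamples (sampling rows with replacement)."""
    rng = random.Random(seed)
    n = len(data)
    return [Resample(data, [rng.randrange(n) for _ in range(n)]) for _ in range(times)]

def k_fold(data, k):
    """Return k (train, test) pairs; row i is held out in fold i % k."""
    n = len(data)
    folds = []
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        folds.append((Resample(data, train_idx), Resample(data, test_idx)))
    return folds

rows = [{"x": i, "y": 2 * i} for i in range(10)]
boots = bootstrap(rows, times=3, seed=1)
folds = k_fold(rows, k=5)
```

Because every resample shares the same underlying `rows` object, generating many resamples costs only the memory for the index vectors until `materialize()` is called.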

tidyverse - Introduction to R and the tidyverse

  •    CSS

The document is generated with the excellent bookdown extension by Yihui Xie. It is made available under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. All suggestions and corrections are welcome.

osqueryr - 'osquery' 'DBI' and 'dbplyr' Interface for R

  •    R

But, so far it seems to work pretty well. NOTE: You need to install osquery for this to work.

decryptr - An extensible API for breaking captchas

  •    R

decryptr is an R package to break captchas. It is also an extensible tool built in a way that enables anyone to contribute their own captcha-breaking code. Simple, right? The decrypt() function is this package's workhorse: it is able to take a captcha (either the path to a captcha file or a captcha object read with read_captcha()) and break it with a model (either the name of a known model, the path to a model file or a model object created with train_model()).

tidygraph - A tidy API for graph manipulation

  •    R

This package provides a tidy API for graph/network manipulation. While network data itself is not tidy, it can be envisioned as two tidy tables, one for node data and one for edge data. tidygraph provides a way to switch between the two tables and provides dplyr verbs for manipulating them. Furthermore, it provides access to a lot of graph algorithms with return values that facilitate their use in a tidy workflow. tidygraph is a huge package that exports 280 different functions and methods. It more or less wraps the full functionality of igraph in a tidy API, giving you access to almost all of the dplyr verbs plus a few more developed for use with relational data.
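The "two tidy tables" view of a graph can be sketched conceptually. The Python snippet below is an illustration only, not tidygraph's or igraph's API: the `TableGraph` class, its method names, and the `name`, `from`, `to`, `group`, and `weight` columns are assumptions made for the example.

```python
class TableGraph:
    """Toy graph stored as two tables: a list of node rows and a list of edge rows."""

    def __init__(self, nodes, edges, active="nodes"):
        self.nodes = list(nodes)   # one dict per node, keyed by 'name'
        self.edges = list(edges)   # one dict per edge, with 'from' and 'to'
        self.active = active       # which table the next verb operates on

    def activate(self, which):
        """Return a copy of the graph with the other table selected."""
        if which not in ("nodes", "edges"):
            raise ValueError("which must be 'nodes' or 'edges'")
        return TableGraph(self.nodes, self.edges, active=which)

    def filter(self, keep):
        """Keep the rows of the active table for which keep(row) is true."""
        if self.active == "edges":
            return TableGraph(self.nodes, [e for e in self.edges if keep(e)], "edges")
        nodes = [n for n in self.nodes if keep(n)]
        names = {n["name"] for n in nodes}
        # Dropping a node also drops every edge that touches it.
        edges = [e for e in self.edges if e["from"] in names and e["to"] in names]
        return TableGraph(nodes, edges, "nodes")

graph = TableGraph(
    nodes=[{"name": "a", "group": 1}, {"name": "b", "group": 1}, {"name": "c", "group": 2}],
    edges=[{"from": "a", "to": "b", "weight": 3}, {"from": "b", "to": "c", "weight": 1}],
)
group_one = graph.filter(lambda n: n["group"] == 1)                    # acts on nodes
heavy = graph.activate("edges").filter(lambda e: e["weight"] > 2)      # acts on edges
```

The point of the sketch is that the same verb behaves differently depending on the active table, and that node-level operations must keep the edge table consistent.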

influxdbr - R Interface for InfluxDB

  •    R

This package allows you to fetch and write time series data from/to an InfluxDB server. Additionally, handy wrappers for the Influx Query Language (IQL) to manage and explore a remote database are provided. This is a basic example which shows you how to communicate (i.e. query and write data) with the InfluxDB server.

sweep - Extending broom for time series forecasting

  •    R

The sweep package extends the broom tools (tidy, glance, and augment) for performing forecasts and time series analysis in the "tidyverse". The package is geared towards "tidying" the forecast workflow used with Rob Hyndman's forecast package. Model tidiers: the sw_tidy, sw_glance, sw_augment, and sw_tidy_decomp functions extend tidy, glance, and augment from the broom package specifically for models (ets(), Arima(), bats(), etc.) used for forecasting.

timetk - A toolkit for working with time series in R

  •    R

An example of the forecasting capabilities is shown in vignette TK03 - Forecasting Using a Time Series Signature with timetk. Get an index: tk_index returns the time series index of time series objects and models. The argument timetk_idx can be used to return a special timetk "index" attribute for regularized ts objects that returns a non-regularized date / date-time index if present.

tidyweek - Repo dedicated to #tidyweek & Mentorship pilot

  •    HTML

A repo dedicated to #makeovermonday-style weekly projects resulting from collaboration between learners and mentors in the R4DS community. It is a framework and theory by Hadley Wickham that has grown from its humble origins and expanded to data, tools and workflows for working in data science.

website - Public repository for the R4DS community website.

  •    CSS

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us. Pull requests shall be submitted with the target branch develop.

data-science-toolkit - Collection of stats, modeling, and data science tools in Python and R.

  •    Jupyter

Welcome! The purpose of this repository is to serve as a stockpile of statistical methods, modeling techniques, and data science tools. The content itself includes everything from educational vignettes on specific topics to tailored functions built to enhance and optimize analyses. This is and will remain a work in progress, and I welcome all contributions and constructive criticism. If you have a suggestion or request, please make use of the "Issues" tab and I will respond expeditiously. All are welcome and encouraged to contribute to this repository. My only request is that you include a detailed description of your contribution, that your code be thoroughly commented, and that you test your contribution locally with the most recent version of the Master branch integrated prior to submitting the PR.