nextflow - A DSL for data-driven computational pipelines

  •    Groovy

With the rise of big data, techniques to analyse and run experiments on large datasets are increasingly necessary. Parallelization and distributed computing are the best ways to tackle this kind of problem, but the tools commonly available to the bioinformatics community traditionally lack good support for these techniques, or provide a model that fits badly with the specific requirements of the bioinformatics domain and, most of the time, require knowledge of complex tools or low-level APIs.

fma - FMA: A Dataset For Music Analysis

  •    Jupyter

By Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, and Xavier Bresson (EPFL LTS2). The dataset is a dump of the Free Music Archive (FMA), an interactive library of high-quality, legal audio downloads.

labnotebook - LabNotebook is a tool that allows you to flexibly monitor, record, save, and query all your machine learning experiments

  •    Jupyter

All you need to do is modify your code to call labnotebook.start_experiment() and labnotebook.stop_experiment(), passing the information you would like to save to the database as arguments. Optionally, you can save information for each training step by calling labnotebook.step_experiment(). A very simple example notebook is available.
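
The start/step/stop workflow described above can be pictured with a small in-memory stand-in. This is a minimal sketch and not labnotebook's implementation or API. `ExperimentRecord`, `start_run`, `log_step`, `stop_run` and `find_runs` are hypothetical names, and a plain Python list stands in for the database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass
class ExperimentRecord:
    """Hypothetical stand-in for one experiment row in a tracking database."""
    params: Dict[str, Any]
    started_at: datetime
    steps: List[Dict[str, Any]] = field(default_factory=list)
    results: Optional[Dict[str, Any]] = None
    stopped_at: Optional[datetime] = None


# A plain list plays the role of the database table (assumption for illustration).
EXPERIMENTS: List[ExperimentRecord] = []


def start_run(**params: Any) -> ExperimentRecord:
    """Record the start of an experiment together with its parameters."""
    record = ExperimentRecord(params=params, started_at=datetime.now(timezone.utc))
    EXPERIMENTS.append(record)
    return record


def log_step(record: ExperimentRecord, **metrics: Any) -> None:
    """Optionally store per-training-step information (e.g. loss)."""
    record.steps.append(metrics)


def stop_run(record: ExperimentRecord, **results: Any) -> None:
    """Mark the experiment as finished and store its final results."""
    record.results = results
    record.stopped_at = datetime.now(timezone.utc)


def find_runs(**criteria: Any) -> List[ExperimentRecord]:
    """Query stored experiments whose parameters match all given criteria."""
    return [r for r in EXPERIMENTS if all(r.params.get(k) == v for k, v in criteria.items())]


# Usage: wrap a (fake) training loop.
run = start_run(model="logreg", learning_rate=0.01)
for epoch in range(3):
    log_step(run, epoch=epoch, loss=1.0 / (epoch + 1))
stop_run(run, accuracy=0.9)
print(len(find_runs(model="logreg")))  # 1
```

A real tracker persists records to a database so they can be queried across sessions. The list here only lives as long as the Python process.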

ITK - Insight Segmentation and Registration Toolkit -- Mirror

  •    C++

The National Library of Medicine Insight Segmentation and Registration Toolkit (ITK), or Insight Toolkit, is an open-source, cross-platform C++ toolkit for segmentation and registration. Segmentation is the process of identifying and classifying data found in a digitally sampled representation. Typically the sampled representation is an image acquired with medical instrumentation such as CT or MRI scanners. Registration is the task of aligning or developing correspondences between data. For example, in the medical environment, a CT scan may be aligned with an MRI scan in order to combine the information contained in both. The toolkit may be built from source using CMake.
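
ITK provides these operations for real images in C++. The sketch below does not use ITK. It illustrates the two ideas defined above in plain Python under simplifying assumptions:

- 1D signals stand in for images.
- An integer translation is the only transform.
- A fixed intensity threshold is used.

Segmentation is shown as labelling each sample, and registration as searching for the shift that makes a moving signal best match a fixed one. `threshold_segment`, `mean_squared_difference` and `register_translation` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple


def threshold_segment(samples: Sequence[float], threshold: float) -> List[int]:
    """Hypothetical helper: label each sample 1 (object) or 0 (background)."""
    return [1 if value >= threshold else 0 for value in samples]


def mean_squared_difference(fixed: Sequence[float], moving: Sequence[float], shift: int) -> float:
    """Compare fixed[i] with moving[i - shift] over the indices where both exist."""
    diffs = [
        (fixed[i] - moving[i - shift]) ** 2
        for i in range(len(fixed))
        if 0 <= i - shift < len(moving)
    ]
    return sum(diffs) / len(diffs) if diffs else float("inf")  # inf: no overlap


def register_translation(
    fixed: Sequence[float], moving: Sequence[float], max_shift: int
) -> Tuple[int, float]:
    """Exhaustively try shifts in [-max_shift, max_shift]; return the best shift and its error."""
    best = min(
        range(-max_shift, max_shift + 1),
        key=lambda s: mean_squared_difference(fixed, moving, s),
    )
    return best, mean_squared_difference(fixed, moving, best)


fixed = [0, 0, 1, 5, 1, 0, 0, 0]
moving = [1, 5, 1, 0, 0, 0, 0, 0]  # same bump, two samples earlier
print(threshold_segment(fixed, 2))             # [0, 0, 0, 1, 0, 0, 0, 0]
print(register_translation(fixed, moving, 3))  # (2, 0.0)
```

Exhaustive search and squared-difference matching are only the simplest choices. With large shifts the overlap shrinks and spurious matches become possible. This is one reason practical registration relies on richer metrics, interpolation and optimizers.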

VTKPythonPackage - A setup script to generate VTK Python Wheels

  •    Python

This project provides a setup.py script to build VTK Python wheels. VTK is an open-source, cross-platform system that provides developers with an extensive suite of software tools for 3D computer graphics, image processing, and visualization.

drake - An R-focused pipeline toolkit for reproducibility and high-performance computing

  •    HTML

The drake R package is a workflow manager and computational engine for data science projects. Its primary objective is to keep results up to date with the underlying code and data. When it runs a project, drake detects any pre-existing output and refreshes the pieces that are outdated or missing. Not every run-through starts from scratch, and the final answers are reproducible. With a user-friendly R-focused interface, comprehensive documentation, and extensive implicit parallel computing support, drake surpasses the analogous functionality in similar tools such as Make, remake, memoise, and knitr. The R community emphasizes reproducibility. Traditional themes include scientific replicability, literate programming with knitr, and version control with git. But internal consistency is important too. Reproducibility carries the promise that your output matches the code and data you say you used.
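
The core idea of refreshing only outdated or missing pieces can be sketched outside R. The following Python is not drake's implementation. It is a minimal, in-memory illustration that makes these assumptions:

- `Target`, `fingerprint` and `make` are hypothetical.
- The `code_version` string stands in for a target's code.
- The plan is listed in dependency order.
- A target is rebuilt when the fingerprint of its code version plus its input values changes.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Target:
    name: str
    deps: List[str]                 # names of upstream targets
    command: Callable[..., Any]     # builds this target from the dependency values
    code_version: str = "v1"        # hypothetical stand-in for the target's code


def fingerprint(target: Target, dep_values: List[Any]) -> str:
    """Hash the target's code version together with the repr of its inputs."""
    digest = hashlib.sha256(target.code_version.encode())
    for value in dep_values:
        digest.update(b"\0")                 # separator between inputs
        digest.update(repr(value).encode())
    return digest.hexdigest()


def make(plan: List[Target], cache: Dict[str, Tuple[str, Any]]) -> List[str]:
    """Bring every target up to date; return the names that had to be (re)built.

    `plan` must be in dependency order; `cache` maps name -> (fingerprint, value).
    """
    values: Dict[str, Any] = {}
    rebuilt: List[str] = []
    for target in plan:
        dep_values = [values[d] for d in target.deps]
        fp = fingerprint(target, dep_values)
        cached = cache.get(target.name)
        if cached is not None and cached[0] == fp:
            values[target.name] = cached[1]                    # up to date: reuse output
        else:
            values[target.name] = target.command(*dep_values)  # outdated or missing
            cache[target.name] = (fp, values[target.name])
            rebuilt.append(target.name)
    return rebuilt


plan = [
    Target("raw", [], lambda: [3, 1, 2]),
    Target("sorted", ["raw"], sorted),
    Target("summary", ["sorted"], lambda s: {"min": s[0], "max": s[-1]}),
]
cache: Dict[str, Tuple[str, Any]] = {}
print(make(plan, cache))  # ['raw', 'sorted', 'summary']
print(make(plan, cache))  # []
plan[2].code_version = "v2"  # pretend the summary code was edited
print(make(plan, cache))  # ['summary']
```

Downstream fingerprints depend on upstream values rather than upstream code. An upstream edit that leaves its output unchanged therefore does not force downstream rebuilds.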

liftr - Containerize R Markdown Documents for Continuous Reproducibility

  •    R

liftr aims to solve the problem of persistent reproducible reporting. To achieve this goal, it extends the R Markdown metadata format, and uses Docker to containerize and render R Markdown documents. Browse the vignettes or the demo video for a quick start.

Sarek - Detect germline or somatic variants from normal or tumour/normal whole-genome sequencing data

  •    Nextflow

Previously known as the Cancer Analysis Workflow (CAW), Sarek is a workflow tool designed to run analyses on WGS data from regular samples or tumour/normal pairs, including relapse samples if required. It is built using Nextflow, a bioinformatics domain-specific language for building workflows. Software dependencies are handled using Docker or Singularity, container technologies that provide excellent reproducibility and ease of use; Singularity was designed specifically for high-performance computing environments. This means that although Sarek was primarily designed for use with the Swedish UPPMAX HPC systems, it should be able to run on any system that supports Nextflow and either Docker or Singularity.

pander - An R Pandoc Writer

  •    R

The main aim of the pander R package is to provide a minimal and easy tool for rendering R objects into Pandoc's markdown. The package is also capable of exporting/converting complex Pandoc documents (reports) in various ways. For the differences between pander and other packages for exporting R objects to different file formats, refer to the relevant section of the project documentation.
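
pander itself is an R package. To make the idea of rendering tabular data into Pandoc's markdown concrete, here is a minimal Python sketch that emits a Pandoc pipe table. `pipe_table` is a hypothetical helper and not part of pander. It ignores column alignment, captions and multiline cells.

```python
from typing import Any, Sequence


def _cell(value: Any) -> str:
    """Format one cell; escape literal pipes so they are not read as column separators."""
    return str(value).replace("|", r"\|")


def pipe_table(header: Sequence[Any], rows: Sequence[Sequence[Any]]) -> str:
    """Hypothetical helper: render a header and rows as a Pandoc pipe table."""
    lines = ["| " + " | ".join(_cell(h) for h in header) + " |"]
    lines.append("|" + "|".join("---" for _ in header) + "|")  # default alignment
    for row in rows:
        if len(row) != len(header):
            raise ValueError("every row must have one cell per header column")
        lines.append("| " + " | ".join(_cell(v) for v in row) + " |")
    return "\n".join(lines)


print(pipe_table(["species", "n"], [["setosa", 50], ["versicolor", 50]]))
```

The resulting text can be placed in a Markdown document and converted by Pandoc with its pipe-table extension enabled.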

itk-jupyter-widgets - Interactive Jupyter widgets to visualize images in 2D and 3D

  •    Python

Interactive Jupyter widgets to visualize images in 2D and 3D. These widgets are designed to support image analysis with the Insight Toolkit (ITK), but they also work with other spatial analysis tools in the scientific Python ecosystem.

ITKPythonPackage - A setup script to generate ITK Python Wheels

  •    Shell

This project provides a setup.py script to build ITK Python wheels and infrastructure to build ITK external module Python wheels. ITK is an open-source, cross-platform system that provides developers with an extensive suite of software tools for image analysis.

CD4-csaw - Reproducible reanalysis of a combined ChIP-Seq & RNA-Seq data set

  •    R

This is the code for a re-analysis of a GEO dataset that I originally analyzed for a published paper. The re-analysis uses statistical methods that were not yet available at the time, such as the csaw Bioconductor package, which provides a principled way to normalize windowed counts of ChIP-Seq reads and test them for differential binding. The original paper only analyzed binding within pre-defined promoter regions. In addition, the RNA-seq analysis has been improved using newer features of limma, such as quality weights. This workflow downloads the sequence data and sample metadata from the public GEO/SRA release, so anyone can download and run this code to reproduce the full analysis.
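
The notion of windowed counts mentioned above can be illustrated with a short Python sketch. It is not csaw, which is an R/Bioconductor package, and it makes simplifying assumptions:

- Each read is reduced to a single start coordinate on one chromosome.
- Windows have a fixed width and spacing.
- Normalization is plain counts-per-million scaling by library size, which is far simpler than the normalization csaw provides.

`window_counts` and `counts_per_million` are hypothetical helpers.

```python
import bisect
from typing import List, Sequence, Tuple


def window_counts(
    read_starts: Sequence[int], chrom_length: int, width: int, spacing: int
) -> List[Tuple[int, int]]:
    """Hypothetical helper: count reads whose start lies in each window [start, start + width)."""
    positions = sorted(read_starts)
    counts = []
    for start in range(0, chrom_length - width + 1, spacing):
        lo = bisect.bisect_left(positions, start)          # first read >= start
        hi = bisect.bisect_left(positions, start + width)  # first read >= window end
        counts.append((start, hi - lo))
    return counts


def counts_per_million(counts: List[Tuple[int, int]], library_size: int) -> List[Tuple[int, float]]:
    """Scale raw window counts by total library size (reads per million)."""
    return [(start, n * 1_000_000 / library_size) for start, n in counts]


reads = [5, 12, 14, 30]
windows = window_counts(reads, chrom_length=40, width=20, spacing=10)
print(windows)                                  # [(0, 3), (10, 2), (20, 1)]
print(counts_per_million(windows, len(reads)))  # [(0, 750000.0), (10, 500000.0), (20, 250000.0)]
```

Because the spacing here is smaller than the width, windows overlap and a single read can be counted in more than one window.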

workshops - Brain imaging workshops

  •    Jupyter

This project contains directories with scripts used at various workshops.

researchcompendium - NOTE: This repo is archived

  •    R

This repository is our research compendium for our analysis of xxxx. The compendium contains all data, code, and text associated with the publication. The Rmd files in the analysis/paper/ directory contain details of how all the analyses reported in the paper were conducted, as well as instructions on how to rerun the analysis to reproduce the results. The analysis/data/ directory contains all the raw data. This repository is organized as an R package, but there are no actual R functions in this package: all the R code is in the Rmd file. I simply used the R package structure to help manage dependencies, to take advantage of continuous integration for automated code testing, and so I didn't have to think too much about how to organise the files.

rrtools - rrtools: Tools for Writing Reproducible Research in R

  •    R

The goal of rrtools is to provide instructions, templates, and functions for making a basic compendium suitable for writing reproducible research with R. This package documents the key steps and provides convenient functions for quickly creating a new research compendium. The approach is based generally on Kitzes et al. (2017), and more specifically on Marwick (2017), Marwick et al. (2017), and Wickham’s (2017) work using the R package structure as the basis for a research compendium. rrtools provides a template for doing scholarly writing in a literate programming environment using R Markdown and bookdown. It also allows for isolation of your computational environment using Docker, package versioning using MRAN, and continuous integration using Travis. It makes a convenient starting point for writing a journal article, report, or thesis.

DIME-LaTeX-Templates - DIME's LaTeX templates and LaTeX exercises teaching anyone new to LaTeX how to use LaTeX and how to use DIME's templates

  •    TeX

This repository contains resources that will help you make your research more reproducible. This will save you a substantial amount of time, significantly reduce the risk of human error when exporting results to your papers, and make your research more transparent. We have prepared exercises that will make it easy for you to start using these resources. All code is MIT licensed, and the text content is CC-BY. Please feel free to send edits and updates via Pull Requests.

prodigenr - Project directory generator R package

  •    R

This R package is part of a series of (planned) packages that are aimed at creating a toolkit for doing reproducible and open science. Many researchers (especially in biomedicine, medicine, or health, which is my area of research) have little to no knowledge of what open science or reproducibility is, let alone how to practise them. My goal is to create an (opinionated) toolkit to automate and simplify the process of doing open and reproducible science. This specific package is a project directory generator (prodigenr). It will create a standardized project folder structure with the necessary template files for managing and analyzing data and for creating common scientific output (posters, slides, abstracts, manuscripts). Because of the standardized structure and the focus on "one project, one scientific output", the final code and documents can be fairly modular, self-contained, easy to share and make public... and as reproducible as possible. This folder structure also makes use of existing and established applications and workflows (RStudio, devtools, and usethis). This package aims to make it easier to adhere to open scientific practices by following a standard, consistent, and established folder and file structure for data analysis projects.
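
A project directory generator of this kind mainly writes a fixed folder layout plus template files. The Python sketch below illustrates that idea only. The `LAYOUT` contents and the `create_project` function are hypothetical and do not reproduce prodigenr's actual templates or its R interface.

```python
import tempfile
from pathlib import Path
from typing import Dict

# Hypothetical layout (not prodigenr's): relative path -> template text ({name} is filled in).
LAYOUT: Dict[str, str] = {
    "README.md": "# {name}\n\nOne project, one scientific output.\n",
    "R/README.md": "Analysis scripts for {name} go here.\n",
    "data/README.md": "Raw data go here; treat them as read-only.\n",
    "doc/README.md": "The manuscript, poster, slides, or abstract go here.\n",
}


def create_project(root: Path, name: str) -> Path:
    """Create root/name with the standard folders and template files."""
    project = root / name
    if project.exists():
        raise FileExistsError(f"{project} already exists")  # never overwrite existing work
    for relative, template in LAYOUT.items():
        path = project / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(template.format(name=name), encoding="utf-8")
    return project


with tempfile.TemporaryDirectory() as tmp:
    created = create_project(Path(tmp), "demo")
    print(sorted(p.relative_to(created).as_posix() for p in created.rglob("*")))
```

Refusing to overwrite an existing folder keeps the generator safe to rerun by mistake. The fixed layout is what makes every project look the same.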