SARTools - Statistical Analysis of RNA-Seq Tools


SARTools is an R package dedicated to the differential analysis of RNA-seq data. It provides tools to generate descriptive and diagnostic graphs, to run the differential analysis with one of the well-known DESeq2 or edgeR packages and to export the results into easily readable tab-delimited files. It also facilitates the generation of an HTML report which displays all the figures produced, explains the statistical methods and gives the results of the differential analysis. Note that SARTools does not intend to replace DESeq2 or edgeR: it simply provides an environment to go with them. For more details about the methodology behind DESeq2 or edgeR, the user should read their documentation and papers. SARTools is distributed with two R script templates (template_script_DESeq2.r and template_script_edgeR.r) which use functions of the package. For a more fluid analysis and to avoid possible bugs when creating the final HTML report, the user is encouraged to use them rather than writing a new script. Two other scripts are available (template_script_DESeq2_CL.r and template_script_edgeR_CL.r) to run SARTools in a shell with the Rscript command. In that case, the optparse R package must be available to interpret the command line parameters.



Related Projects

rnaseq_tutorial - Informatics for RNA-seq: A web resource for analysis on the cloud

  •    R

An educational tutorial and working demonstration pipeline for RNA-seq analysis including an introduction to: cloud computing, next generation sequence file formats, reference genomes, gene annotation, expression analysis, differential expression analysis, alternative splicing analysis, data visualization, and interpretation. This repository is used to store code and certain raw materials for a detailed RNA-seq tutorial. To actually complete this tutorial, go to the RNA-seq tutorial wiki.

awesome-single-cell - List of software packages for single-cell data analysis, including RNA-seq, ATAC-seq, etc


List of software packages (and the people developing these methods) for single-cell data analysis, including RNA-seq, ATAC-seq, etc. Contributions welcome... Gender bias at conferences is a well-known problem. Creating a list of potential speakers can help mitigate this bias, and a community of people developing and maintaining it helps to further diversify this list beyond smaller networks.

bcbio-nextgen - Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

  •    Python

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis. You write a high level configuration file specifying your inputs and analysis parameters. This input drives a parallel run that handles distributed execution, idempotent processing restarts and safe transactional steps. bcbio provides a shared community resource that handles the data processing component of sequencing analysis, providing researchers with more time to focus on the downstream biology. producing an editable system configuration file referencing the installed software, data and system information.

RNA-seq-analysis - RNAseq analysis notes from Ming Tang

  •    Python

Normalization is essential for RNAseq analysis. However, one needs to understand the underlying assumptions of each method. Most methods (e.g. TMM normalization) assume there are no global changes between conditions. However, this may not be true when a global effect occurs. For example, if you delete a gene that controls transcription, you expect to see a global reduction in gene expression. In that case, other normalization methods (e.g. spike-in controls) need to be considered. The same principle applies to other high-throughput sequencing data such as ChIPseq. To estimate the library size, simply taking the total number of (mapped or unmapped) reads is, in our experience, not a good idea. Sometimes, a few very strongly expressed genes are differentially expressed, and as they make up a good part of the total counts, they skew this number. After you divide by total counts, these few strongly expressed genes become equal, and the whole rest looks differentially expressed.
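The skew caused by a few strongly expressed genes can be seen on a toy example. The sketch below is only an illustration in Python with made-up counts: it compares plain total-count scaling with a median-of-ratios size factor (the approach popularized by DESeq, which the notes above do not name explicitly). The function names and the data are hypothetical.

```python
import math
from statistics import median


def total_count_factors(samples):
    """Scale each sample by its total count relative to the mean total."""
    totals = [sum(sample) for sample in samples]
    mean_total = sum(totals) / len(totals)
    return [total / mean_total for total in totals]


def median_of_ratios_factors(samples):
    """Size factor = median over genes of count / geometric mean across samples."""
    n_genes = len(samples[0])
    # Log geometric mean per gene, using only genes with no zero count.
    log_geo_means = {}
    for g in range(n_genes):
        values = [sample[g] for sample in samples]
        if all(v > 0 for v in values):
            log_geo_means[g] = sum(math.log(v) for v in values) / len(values)
    factors = []
    for sample in samples:
        log_ratios = [math.log(sample[g]) - lgm for g, lgm in log_geo_means.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical counts: four unchanged genes plus one strongly expressed gene
# that is 10x higher in the second sample.
control = [100, 200, 300, 400, 1000]
treated = [100, 200, 300, 400, 10000]

tc = total_count_factors([control, treated])
mor = median_of_ratios_factors([control, treated])

print("total-count factors:     ", tc)
print("median-of-ratios factors:", mor)
```

With these numbers the total-count factors differ by a factor of 5.5 between the two samples, so dividing by them would make the four unchanged genes look down-regulated in the second sample. The median-of-ratios factors are both 1, because the median ignores the single outlying gene.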

Trinity RNA-Seq Assembly

  •    Java

The Trinity RNA-Seq Assembly project provides software solutions targeted to the reconstruction of full-length transcripts and alternatively spliced isoforms from Illumina RNA-Seq data.

trinityrnaseq - Trinity RNA-Seq de novo transcriptome assembly

  •    Perl

Trinity RNA-Seq de novo transcriptome assembly

kallisto - Near-optimal RNA-Seq quantification

  •    C

NL Bray, H Pimentel, P Melsted and L Pachter, Near optimal probabilistic RNA-seq quantification, Nature Biotechnology 34, p 525--527 (2016). Scripts reproducing all the results of the paper are available here.

orgpaper - Reproducible Research Papers using Org-mode and R: A Guide

  •    Emacs

This guide introduces an open-source toolkit for writing research papers and monographs. The main features of this toolkit centered around Emacs and Org-mode are: 1) embedding R code in the document that allows for statistical results to be revised and reproduced, 2) bibliographic citations from an integrated database, 3) formatting using well defined styles with minimal markup, 4) support for production of final output as pdf, odt, docx, html and many other formats.

ReScience - The ReScience journal. Reproducible Science is Good. Replicated Science is better.


ReScience is a peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure the original research is reproducible. To achieve such a goal, the whole editing chain is radically different from any other traditional scientific journal. ReScience lives on GitHub, where each new implementation is made available together with the comments, explanations and tests. Each submission takes the form of a pull request that is publicly reviewed and tested in order to guarantee any researcher can re-use it. If you have ever replicated a computational result from the literature, ReScience is the perfect place to publish this new implementation. Reproducible Science is Good. Replicated Science is better.

datascience-box - Data Science Course in a Box

  •    HTML

This introductory data science course is our (working) answer to these questions. The course focuses on data acquisition and wrangling, exploratory data analysis, data visualization, and effective communication, and approaches statistics from a model-based, instead of an inference-based, perspective. A heavy emphasis is placed on a consistent syntax (with tools from the tidyverse), reproducibility (with R Markdown) and version control and collaboration (with git/GitHub). We help ease the learning curve by avoiding local installation and supplementing out-of-class learning with interactive tools (like learnr tutorials). By the end of the semester teams of students work on fully reproducible data analysis projects on data they acquired, answering questions they care about. This repository serves as a "data science course in a box" containing all materials required to teach (or learn from) the course described above.



Applications for analyzing next generation sequencing data from Illumina, SOLiD, and 454 platforms. ChIP-seq, RNA-seq, Bis-seq, re-sequencing, SNP INDELs, capture array design tools, IGB/ DAS2 graph manipulation tools.... GUI and cmd line interface.

Awesome-Bioinformatics - A curated list of awesome Bioinformatics libraries and software.


Sequence Processing includes tasks such as demultiplexing raw read data, and trimming low quality bases. The following tools can be used to visualize genomic data or for constructing customized visualizations of genomic data including sequence data from DNA-Seq, RNA-Seq, and ChIP-Seq, variants, and more.

DifferentialEquations.jl - Julia suite for high-performance solvers of differential equations

  •    Julia

The well-optimized DifferentialEquations solvers benchmark as some of the fastest implementations, using classic algorithms and ones from recent research which routinely outperform the "standard" C/Fortran methods, and include algorithms optimized for high-precision and HPC applications. At the same time, it wraps the classic C/Fortran methods, making it easy to switch over to them whenever necessary. It integrates with the Julia package sphere, for example using Juno's progress meter, automatic plotting and built-in interpolations, and wraps other differential equation solvers so that many different methods for solving the equations can be accessed by simply switching a keyword argument. It utilizes Julia's generality to be able to solve problems specified with arbitrary number types (types with units like Unitful, and arbitrary precision numbers like BigFloats and ArbFloats), arbitrary sized arrays (ODEs on matrices), and more. This gives a powerful mixture of speed and productivity features to help you solve and analyze your differential equations faster. For information on using the package, see the stable documentation. Use the latest documentation for the version that covers the unreleased features.
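The idea of reaching different solution methods by switching a keyword argument can be sketched in a few lines. The Python example below is not the DifferentialEquations.jl API: the `solve` function, its `method` keyword and the two fixed-step steppers are hypothetical names chosen only to illustrate the design.

```python
import math


def euler_step(f, t, y, h):
    """One explicit Euler step for y' = f(t, y)."""
    return y + h * f(t, y)


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


# Registry of available methods; adding a solver means adding one entry.
STEPPERS = {"euler": euler_step, "rk4": rk4_step}


def solve(f, y0, t0, t1, steps=100, method="rk4"):
    """Integrate y' = f(t, y) from t0 to t1 with a fixed step size."""
    if method not in STEPPERS:
        raise ValueError(f"unknown method: {method!r}")
    step = STEPPERS[method]
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = step(f, t, y, h)
        t += h
    return y


def decay(t, y):
    """Right-hand side of y' = -y, whose exact solution is exp(-t)."""
    return -y


exact = math.exp(-1.0)
y_euler = solve(decay, 1.0, 0.0, 1.0, method="euler")
y_rk4 = solve(decay, 1.0, 0.0, 1.0, method="rk4")

print("euler error:", abs(y_euler - exact))
print("rk4 error:  ", abs(y_rk4 - exact))
```

The problem definition (`decay`, the initial value, the time span) stays the same in both calls; only the `method` keyword changes, and the fourth-order method lands much closer to the exact value than Euler at the same step count.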

fma - FMA: A Dataset For Music Analysis

  •    Jupyter

Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2. The dataset is a dump of the Free Music Archive (FMA), an interactive library of high-quality, legal audio downloads.

slidify - Generate reproducible html5 slides from R markdown

  •    R

Generate reproducible html5 slides from R markdown

quanteda - An R package for the Quantitative Analysis of Textual Data

  •    R

An R package for managing and analyzing text, created by Kenneth Benoit. Supported by the European Research Council grant ERC-2011-StG 283794-QUANTESS.

ITK - Insight Segmentation and Registration Toolkit -- Mirror

  •    C++

The National Library of Medicine Insight Segmentation and Registration Toolkit (ITK), or Insight Toolkit, is an open-source, cross-platform C++ toolkit for segmentation and registration. Segmentation is the process of identifying and classifying data found in a digitally sampled representation. Typically the sampled representation is an image acquired from such medical instrumentation as CT or MRI scanners. Registration is the task of aligning or developing correspondences between data. For example, in the medical environment, a CT scan may be aligned with an MRI scan in order to combine the information contained in both. The toolkit may be built from source using CMake.

knowledge-repo - A next-generation curated knowledge sharing platform for data scientists and other technical professions

  •    Python

The Knowledge Repository project is focused on facilitating the sharing of knowledge between data scientists and other technical roles using data formats and tools that make sense in these professions. It provides various data stores (and utilities to manage them) for "knowledge posts", with a particular focus on notebooks (R Markdown and Jupyter / IPython Notebook) to better promote reproducible research. Check out this Medium Post for the inspiration for the project.

nextflow - A DSL for data-driven computational pipelines

  •    Groovy

With the rise of big data, techniques to analyse and run experiments on large datasets are increasingly necessary. Parallelization and distributed computing are the best ways to tackle this kind of problem, but the tools commonly available to the bioinformatics community traditionally lack good support for these techniques, or provide a model that fits badly with the specific requirements in the bioinformatics domain and, most of the time, require the knowledge of complex tools or low-level APIs.