
deepvariant - DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data

  •    Python

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data. It is a suite of Python/C++ programs that run on any Unix-like operating system. For convenience, the documentation refers to building and running DeepVariant on Google Cloud Platform, but the tools themselves can be built and run on any standard Linux computer, including on-premise machines. Note that DeepVariant currently requires Python 2.7 and does not yet work with Python 3.

deep-review - A collaboratively written review paper on deep learning, genomics, and precision medicine

  •    CSS

This repository is home to the Deep Review, a review article on deep learning in precision medicine. The Deep Review is collaboratively written on GitHub using a tool called Manubot (see below). The project operates on an open contribution model, welcoming contributions from anyone (see CONTRIBUTING.md or an existing example for more info). To see what's incoming, check the open pull requests. For project discussion and planning, see the Issues. As of writing, we are aiming to publish an update of the Deep Review each year, with the next such release occurring in June 2019. We will continue to make project preprints available on bioRxiv, and aim to continue publishing the finished reviews in a peer-reviewed venue as well. As with the initial release, we are planning for an open and collaborative effort. Please see issue #810 to contribute to the discussion of future plans and help decide how best to continue this project.

galaxy - Data intensive science for everyone.

  •    Python

You may wish to make changes from the default configuration. This can be done in the config/galaxy.ini file. Note that not all dependencies for the tools provided in the tool_conf.xml.sample are included. To install them please visit "Manage dependencies" in the admin interface.

gatk - Official code repository for GATK versions 4 and up

  •    Java

Please see the GATK website, where you can download a precompiled executable, read documentation, ask questions, and receive technical support. This repository contains the next generation of the Genome Analysis Toolkit (GATK). The contents of this repository are 100% open source and released under the BSD 3-Clause license (see LICENSE.TXT).




hail - Scalable genomic data analysis.

  •    Scala

Hail is an open-source, scalable framework for exploring and analyzing genomic data. The Hail project began in Fall 2015 to empower the worldwide genetics community to harness the flood of genomes to discover the biology of human disease. Since then, Hail has expanded to enable analysis of large-scale datasets beyond the field of genomics.

nucleus - Python and C++ code for reading and writing genomics data.

  •    Python

Nucleus is a library of Python and C++ code designed to make it easy to read, write and analyze data in common genomics file formats like SAM and VCF. In addition, Nucleus enables painless integration with the TensorFlow machine learning framework, as anywhere a genomics file is consumed or produced, a TensorFlow tfrecords file may be substituted. For all other systems, you will need to first install CLIF by following the instructions at https://github.com/google/clif#installation before running install.sh.

jbrowse - A modern genome browser built with JavaScript and HTML5.

  •    JavaScript

To install JBrowse, most users should visit http://jbrowse.org/install and download a zip file such as JBrowse-1.13.0.zip. See https://jbrowse.org/code/latest-release/docs/tutorial/ for a tutorial on setting up a sample instance. Once you have an instance up and running, http://gmod.org/wiki/JBrowse_Configuration_Guide is the comprehensive reference guide to JBrowse configuration.

bionode - Modular and universal bioinformatics

  •    JavaScript

To use bionode as a command-line tool, you can install it globally with -g. If you want to use it as a JavaScript library instead, install it in your local project folder, inside the node_modules directory, by running the same command without -g.


biomartr - Genomic Data Retrieval with R

  •    R

This package was born out of my own frustration while trying to automate the genomic data retrieval process in order to create computationally reproducible scripts for large-scale genomics studies. Since I couldn't find easy-to-use and fully reproducible software libraries that would allow others and me to write transparent and easy-to-reproduce code, I sat down and tried to implement a framework that would enable anyone to automate the genomic data retrieval process. Personally, I strongly support and believe in reproducible research, and I truly hope that this package might be useful to others as well and that it helps to promote reproducible research in genomics studies. I happily welcome anyone who wishes to contribute to this project :) Just drop me an email.

ngless - NGLess: NGS with less work

  •    Haskell

NGLess is a domain-specific language for processing next-generation sequencing (NGS) data. Note: This is pre-release software, currently in beta (testing). It is stable enough to use, but there may still be some (minor) changes before an official release. For questions, you can also use the ngless mailing list.

ngCGH - Tools for producing pseudo-cgh of next-generation sequencing data

  •    Python

Next-generation sequencing of tumor/normal pairs provides a good opportunity to examine large-scale copy number variation in the tumor relative to the normal sample. In practice, this concept seems to extend even to exome-capture sequencing of pairs of tumor and normal. This library consists of a single script, ngCGH, that computes a pseudo-CGH using simple coverage counting on the tumor relative to the normal. I have chosen to use a fixed number of reads in the normal sample as the "windowing" approach. This has the advantage of producing copy number estimates that should have similar variance at each location. The algorithm will adaptively deal with inhomogeneities across the genome such as those associated with exome-capture technologies (to the extent that the capture was similar in both tumor and normal). The disadvantage is that the pseudo-probes will be at different locations for every "normal control" sample.
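The windowing idea described above can be illustrated with a minimal sketch. This is not the ngCGH script itself: the function name pseudo_cgh, the library-size scaling, and the 0.5 pseudocount are assumptions made for illustration, and the input is simplified to lists of read start positions on a single chromosome.

```python
import bisect
import math


def pseudo_cgh(normal_pos, tumor_pos, reads_per_window=1000):
    """Return (start, end, log2_ratio) tuples for one chromosome.

    Hypothetical helper: each window spans a fixed number of reads
    from the normal sample, so window widths adapt to local coverage.
    """
    normal = sorted(normal_pos)
    tumor = sorted(tumor_pos)
    # Assumed normalization: scale by library size so that equal
    # relative coverage yields a log2 ratio near 0.
    scale = len(normal) / len(tumor)
    windows = []
    for i in range(0, len(normal) - reads_per_window + 1, reads_per_window):
        start = normal[i]
        j = i + reads_per_window
        # Half-open window [start, end); the last window ends just
        # past the final normal read.
        end = normal[j] if j < len(normal) else normal[-1] + 1
        tumor_count = bisect.bisect_left(tumor, end) - bisect.bisect_left(tumor, start)
        # Assumed pseudocount of 0.5 to avoid taking log2 of zero.
        ratio = (tumor_count + 0.5) * scale / (reads_per_window + 0.5)
        windows.append((start, end, math.log2(ratio)))
    return windows


# Toy data: the tumor has doubled coverage over positions 0-49.
normal_reads = list(range(100))
tumor_reads = [p for p in range(50) for _ in range(2)] + list(range(50, 100))
result = pseudo_cgh(normal_reads, tumor_reads, reads_per_window=50)
```

Because the window boundaries come from the normal sample, a different normal control would place the pseudo-probes at different positions, which is the disadvantage noted above.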

wdlRunR - Elastic, reproducible, and reusable genomic data science tools from R backed by cloud resources

  •    R

Follow development on GitHub. This package leverages all the typical data munging and analysis capabilities of R and Bioconductor, but adds the ability to orchestrate nearly arbitrarily large and complex workflows described using WDL (which are portable and written outside of this package).

minimap2 - A versatile pairwise aligner for genomic and spliced nucleotide sequences

  •    C

Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database. Typical use cases include: (1) mapping PacBio or Oxford Nanopore genomic reads to the human genome; (2) finding overlaps between long reads with error rate up to ~15%; (3) splice-aware alignment of PacBio Iso-Seq or Nanopore cDNA or Direct RNA reads against a reference genome; (4) aligning Illumina single- or paired-end reads; (5) assembly-to-assembly alignment; (6) full-genome alignment between two closely related species with divergence below ~15%. For ~10kb noisy read sequences, minimap2 is tens of times faster than mainstream long-read mappers such as BLASR, BWA-MEM, NGMLR and GMAP. It is more accurate on simulated long reads and produces biologically meaningful alignments ready for downstream analyses. For >100bp Illumina short reads, minimap2 is three times as fast as BWA-MEM and Bowtie2, and as accurate on simulated data. Detailed evaluations are available from the minimap2 preprint.

jvarkit - Java utilities for Bioinformatics

  •    Java

Each tool is compiled independently of the others. See the documentation for each tool at http://lindenb.github.io/jvarkit/. All the pages should include a paragraph titled 'Download and Compile'. You shouldn't try to compile all the tools because some of them are untested, deprecated, or just too specific to my lab.

ohmnet - OhmNet: Representation learning in multi-layer graphs

  •    Python

The OhmNet algorithm learns feature representations for nodes in any (un)directed, (un)weighted multi-layer network. Please check the project page for more details. Results are saved to the output directory specified by the out_dir option.

bluegenes - A friendly next-generation interface for Genomic data discovery powered by InterMine

  •    Clojure

BlueGenes is designed to make searching and analysing genomic data easy. It's powered by InterMine web services, meaning that the data from nearly 30 InterMines worldwide can be accessed from the same familiar interface. If you wish to track page hits, set up Google Analytics for your domain, then add your Google Analytics ID to your config.edn files (mentioned above) or environment variables. This is completely optional.

bio-pipeline - My collection of light bioinformatics analysis pipelines for specific tasks

  •    C

A collection of light bioinformatics analysis pipelines for specific tasks. Dive into a specific folder to view more detailed usage for each application.

goatools - Python scripts to find enrichment of GO terms

  •    Python

Tests for over- and under-representation of certain GO terms, based on Fisher's exact test, with numerous multiple-testing correction routines, including locally implemented routines for Bonferroni, Sidak, Holm, and false discovery rate. Also included are multiple test corrections from statsmodels: FDR Benjamini/Hochberg, FDR Benjamini/Yekutieli, Holm-Sidak, Simes-Hochberg, Hommel, FDR 2-stage Benjamini-Hochberg, FDR 2-stage Benjamini-Krieger-Yekutieli, FDR adaptive Gavrilov-Benjamini-Sarkar, Bonferroni, Sidak, and Holm. Processes the OBO-formatted file from the Gene Ontology website. The data structure is a directed acyclic graph (DAG) that allows easy traversal from leaf to root.
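As a rough illustration of the statistics involved, the sketch below computes a two-sided Fisher's exact test for a single GO term and applies a Bonferroni correction. It uses only the Python standard library and does not call goatools; the function names and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    a: study genes with the term      b: study genes without it
    c: other genes with the term      d: other genes without it
    """
    n = a + b + c + d
    row1 = a + b          # study size
    col1 = a + c          # genes annotated with the term
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of seeing x annotated study genes.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    # Sum over all tables that are no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def bonferroni(pvalues):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical counts: 8 of 10 study genes carry the term,
# versus 20 of 100 genes in the whole population.
p_enriched = fisher_exact_two_sided(8, 2, 12, 78)
corrected = bonferroni([p_enriched, 0.04, 0.5])
```

In a real analysis the correction would be applied across the p-values of every GO term tested, not across three values as in this toy example.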