hail - Scalable genomic data analysis.

  •    Scala

Hail is an open-source, scalable framework for exploring and analyzing genomic data. The Hail project began in Fall 2015 to empower the worldwide genetics community to harness the flood of genomes to discover the biology of human disease. Since then, Hail has expanded to enable analysis of large-scale datasets beyond the field of genomics.

bionode - Modular and universal bioinformatics

  •    Javascript

To use bionode as a command line tool, you can install it globally with -g. Or, if you want to use it as a JavaScript library, install it in your local project folder, inside the node_modules directory, by running the same command without -g.

gggenes - ➡️️➡️️⬅️️➡️️ Draw gene arrow maps in ggplot2

  •    R

geom_gene_arrow is a ‘ggplot2’ geom that represents genes with arrows. The start and end locations of the genes within their molecule(s) are mapped to the xmin and xmax aesthetics respectively. These start and end locations are used to determine the directions in which the arrows point. The y aesthetic must be mapped to the molecule(s). If you are drawing more than one molecule, and the numerical locations of the genes are not similar across molecules, you almost certainly want to facet the plot with scales = "free" to avoid drawing ridiculously large molecules with ridiculously tiny genes. Because the resulting plot can look cluttered, a ‘ggplot2’ theme theme_genes is provided with some sensible defaults.
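
As a rough illustration of how start and end locations can determine arrow direction, here is a minimal Python sketch. It is not gggenes code (gggenes is an R package); the function name arrow_direction and the convention that a gene whose start is not greater than its end points right are assumptions made for this example.

```python
def arrow_direction(start, end):
    """Return the direction an arrow would point for a gene.

    Assumption for illustration: a gene whose start coordinate is less
    than or equal to its end coordinate points right, otherwise left.
    """
    return "right" if start <= end else "left"


# Hypothetical genes on one molecule, given as (name, start, end).
genes = [("geneA", 100, 900), ("geneB", 2000, 1200)]
directions = {name: arrow_direction(start, end) for name, start, end in genes}
print(directions)
```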

locuszoom - A Javascript/d3 embeddable plugin for interactively visualizing statistical genetic data from customizable sources

  •    Javascript

LocusZoom is a Javascript/d3 embeddable plugin for interactively visualizing statistical genetic data from customizable sources. See github.com/statgen/locuszoom/wiki for full documentation and API reference.

locuszoom-standalone - Create regional association plots from GWAS or meta-analysis

  •    Python

This repository is for the command line (standalone) version of LocusZoom, an application for creating regional plots from genome-wide association studies built in Python and R. This version of LocusZoom is no longer under active development. Bug fixes and small updates may be made, though it is unlikely.

rvtests - Rare variant test software for next generation sequencing data

  •    C++

Rvtests, which stands for Rare Variant tests, is a flexible software package for genetic association analysis of sequence datasets. Since its inception, rvtests has been developed as a comprehensive tool to support genetic association analysis and meta-analysis. It can analyze both unrelated and related (family-based) individuals for both quantitative and binary outcomes. It includes a variety of association tests (e.g. single variant score test, burden test, variable threshold test, SKAT test, fast linear mixed model score test). It takes genotype input in VCF, BGEN, or PLINK format, and phenotype and covariate files in PLINK format. With a new implementation of the BOLT-LMM/MINQUE algorithm as well as a series of software engineering optimizations, our software package is capable of analyzing datasets of up to 1,000,000 individuals in linear mixed models on a computer workstation, which makes our tool one of the very few options for analyzing large biobank-scale datasets, such as UK Biobank. Rvtests supports both single variant and gene-level tests. It also allows for highly efficient generation of covariance matrices between score statistics in RAREMETAL format, which can be used to support the next wave of meta-analysis that incorporates large biobank datasets.

Structure_threader - A wrapper program to parallelize and automate runs of "Structure", "fastStructure" and "MavericK"

  •    Python

A program to parallelize the runs of the Structure, fastStructure and MavericK software. Structure_threader is available on PyPI. It can be installed by simply running the above command. If you are on a *nix-like platform, you can use the --user option if you can't or don't want to install the program as the root user. Binaries for STRUCTURE, fastStructure and MavericK are also distributed for GNU/Linux and Mac OS X. For more details, please check the manual.

molgenis - MOLGENIS - for scientific data: management, exploration, integration and analysis.

  •    Java

MOLGENIS is a collaborative open source project on a mission to generate great software infrastructure for life science research.

TriFusion - Streamlining phylogenomic data gathering, processing and visualization

  •    Python

TriFusion is a modern GUI and command line application designed to make the life of anyone with proteome and/or alignment sequence data easier and more pleasurable. Regardless of your experience in bioinformatics, TriFusion is easy to use and offers a wide array of powerful features to help you deal with your data. At the same time, it was developed to handle the enormous amount of data that is generated nowadays. TriFusion is an open source, cross-platform application written in Python 2.7 that uses the Kivy framework to build the graphical interface.

agi-bio - Genomic and Proteomic data exploration and pattern mining

  •    Scheme

Prototype project utilizing the OpenCog framework for genomic research. In particular it aims at experimenting with cognitive synergy between MOSES, PLN and other OpenCog components.

KEGG-Crawler - A parallel web crawler for the retrieval of KEGG genomics data.

  •    Python

KEGG Crawler is a Python script that uses KEGG's REST API to first obtain a list of pathways, along with their respective chemical reactions and metabolites. It uses Python's threading module to run the crawl in parallel, minimizing network latency issues. It runs 8 threads, each with its own stack of crawl targets, and uses the urllib2 module to initiate the crawls. The last module it uses is HTMLParser, which comes from the Beautiful Soup library. This requires a pip install of Beautiful Soup (after pip is installed, this can be done with the command "pip install beautifulsoup"). The main crawler has a progress indicator and prints a message when each thread reaches 50% progress and when it completes. The script itself runs for approximately 20-30 minutes on a cable connection.
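
The threading layout described above (8 threads, each working through its own stack of targets) might look roughly like the following Python 3 sketch. It is not the KEGG Crawler source: the names crawl and fetch are hypothetical, fetch is a stand-in that makes no network request, and the pathway IDs are placeholders used only for illustration.

```python
import threading

NUM_THREADS = 8  # thread count stated in the project description


def fetch(pathway_id):
    """Hypothetical stand-in for one HTTP request to a REST endpoint.

    It performs no network access and just returns a placeholder string.
    """
    return "data for " + pathway_id


def crawl(pathway_ids, fetch_fn=fetch, num_threads=NUM_THREADS):
    """Split the targets into one stack per thread and crawl them in parallel."""
    ids = list(pathway_ids)
    # Round-robin split: thread i receives ids[i], ids[i + num_threads], ...
    stacks = [ids[i::num_threads] for i in range(num_threads)]
    results = {}
    lock = threading.Lock()

    def worker(stack):
        # Each thread pops targets from its own private stack until it is empty.
        while stack:
            pathway_id = stack.pop()
            data = fetch_fn(pathway_id)
            with lock:  # guard the shared results dictionary
                results[pathway_id] = data

    threads = [threading.Thread(target=worker, args=(s,)) for s in stacks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Placeholder IDs for illustration only.
results = crawl(["path_a", "path_b", "path_c"])
print(sorted(results))
```

Because the work is dominated by waiting on the network, threads can overlap their requests even though only one runs Python code at a time. A real crawler would replace fetch with an actual request and add error handling.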

OSGenome - An Open Source Library and ToolKit of Genetic Data (SNPs) using 23AndMe and Data Crawling Technologies

  •    Python

OS Genome is an open source web application that allows users to gather the information they need to make sense of their own genome without needing to rely on outside services with unknown privacy policies. OS Genome's goal is to crawl various sources and give meaning to an individual's genome. It creates a responsive grid of the user's specific genome, which allows for everything from filtering to Excel exporting. All of this is built with Flask, Kendo, and Python. SNP, pronounced “snip,” stands for single-nucleotide polymorphism, which represents a substitution of one base for another, e.g., C to T or A to G. SNPs are the most common variation in the human genome and occur approximately once every 100 to 300 bases. A SNP is terminologically distinguished from a mutation based on an arbitrary population frequency cutoff of 1%: a SNP has a frequency > 1% and a mutation has a frequency < 1%. A key aspect of research in genetics is associating sequence variations with heritable phenotypes. Because SNPs are expected to facilitate large-scale association genetics studies, there has been an increasing interest in SNP discovery and detection.
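
The 1% frequency cutoff described above can be written as a small Python sketch. This is not OS Genome code; the function name classify_variant and the "boundary" label for a frequency of exactly 1% (a case the description does not address) are assumptions made for this example.

```python
SNP_FREQUENCY_CUTOFF = 0.01  # the 1% population frequency cutoff from the text


def classify_variant(population_frequency, cutoff=SNP_FREQUENCY_CUTOFF):
    """Label a variant as "SNP" or "mutation" by population frequency.

    Frequencies are fractions between 0 and 1. A value exactly at the
    cutoff is labeled "boundary" (an assumption; the text leaves it open).
    """
    if not 0.0 <= population_frequency <= 1.0:
        raise ValueError("population_frequency must be between 0 and 1")
    if population_frequency > cutoff:
        return "SNP"
    if population_frequency < cutoff:
        return "mutation"
    return "boundary"


print(classify_variant(0.05))   # frequency above 1%
print(classify_variant(0.001))  # frequency below 1%
```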

VerifyBamID - A robust tool for DNA contamination estimation from sequence reads using ancestry-agnostic method

  •    mupad

Motivation: Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step to ensure high quality sequence reads and reliable downstream analysis. Across many existing models, allele frequency is usually used to calculate the prior genotype probability. Lack of accurate assignment of allele frequency can result in underestimation of the contamination level. Hence we propose this ancestry-agnostic DNA contamination estimation method. Results: We applied our method to 1000 Genomes datasets by simulating contamination levels from 1% to 20% and comparing the contamination estimates obtained from different methods. When using pooled allele frequencies, as opposed to population-specific allele frequencies, we observed that the contamination levels are underestimated by 20%, 40%, 51%, and 73% for CEU, YRI, FIN, and CHS populations, respectively. Using our new method, the underestimation bias was reduced to 2-5%.
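
To make the reported underestimation figures concrete, the following Python sketch works through the arithmetic under one assumption: that each percentage is a relative bias, so an underestimation of 20% means the estimate is 80% of the true contamination level. The text does not state this explicitly, and the function name and the 10% true contamination level are hypothetical choices for the example.

```python
# Relative underestimation with pooled allele frequencies, as reported above.
POOLED_UNDERESTIMATION = {"CEU": 0.20, "YRI": 0.40, "FIN": 0.51, "CHS": 0.73}


def biased_estimate(true_contamination, relative_underestimation):
    """Estimate that results if the true level is underestimated by a
    relative fraction (assumed interpretation of the reported figures)."""
    return true_contamination * (1.0 - relative_underestimation)


TRUE_LEVEL = 0.10  # hypothetical true contamination level of 10%
estimates = {
    population: round(biased_estimate(TRUE_LEVEL, bias), 4)
    for population, bias in POOLED_UNDERESTIMATION.items()
}
print(estimates)
```

Under this reading, a sample with 10% true contamination would be reported at well under 10% for every population listed, with the largest gap for CHS.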

node-23andme-export - 👉 export your data from the 23andMe API for safe-keeping

  •    Javascript

A simple Express app to facilitate downloading of 23andMe data in CSV and JSON. Also provides a zip file generated in client-side JavaScript. Supports multiple 23andMe profiles per account. Licensed under the MIT license.

bionode-seq - Module for DNA, RNA and protein sequences manipulation

  •    Javascript

Alternatively, just include bionode-seq.min.js via a <script/> in your page. Please read the documentation for the methods exposed by bionode-seq.