sourmash - Compute and compare MinHash signatures for DNA data sets.


Compute MinHash signatures for nucleotide (DNA/RNA) and protein sequences. Sourmash is published on JOSS.

http://sourmash.readthedocs.io/en/latest/
https://github.com/dib-lab/sourmash
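
To illustrate what a MinHash signature for a DNA sequence is, here is a minimal sketch in plain Python. It does not use sourmash's API. The function names, the MD5-based hash, and the defaults `k=5` and `num=20` are all illustrative assumptions, and the sketch ignores reverse complements, which real tools typically account for.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (MD5 is an illustrative choice)."""
    digest = hashlib.md5(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(seq, k=5, num=20):
    """Return the `num` smallest distinct k-mer hashes (a bottom-k sketch)."""
    hashes = {hash_kmer(km) for km in kmers(seq.upper(), k)}
    return sorted(hashes)[:num]


def estimate_jaccard(sig_a, sig_b, num=20):
    """Estimate Jaccard similarity from two bottom-k sketches."""
    # Keep the `num` smallest hashes of the combined sketches, then count
    # how many of them are present in both input sketches.
    merged = sorted(set(sig_a) | set(sig_b))[:num]
    if not merged:
        return 0.0
    shared = set(sig_a) & set(sig_b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

Two signatures built with the same `k` and `num` can be compared with `estimate_jaccard` without revisiting the original sequences, which is what makes signatures useful for comparing large data sets.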


Related Projects

khmer - In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more

  •    Python

The official source code repository is at https://github.com/dib-lab/khmer and project documentation is available online at http://khmer.readthedocs.io. See http://khmer.readthedocs.io/en/stable/introduction.html for an overview of the khmer project. khmer is research software, so you should cite us when you use it in scientific publications! Please see the CITATION file for citation information.

datasketch - MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++

  •    Python

datasketch gives you probabilistic data structures that can process and search very large amounts of data very fast, with little loss of accuracy. datasketch must be used with Python 2.7 or above and NumPy 1.11 or above. SciPy is optional, but with it the LSH initialization can be much faster.

STJUDE-SRM

  •    Java

STJUDE-SRM is a laboratory management system designed to support shared resource facility (or core lab) activities. It was designed and developed by the Hartwell Center for Bioinformatics and Biotechnology at St. Jude Children's Research Hospital.

BOW - Bioinformatics On Windows


A group of bioinformatics tools that run on Windows. Includes tools ported from Linux (e.g. BWA, SAMTOOLS) and, later, original Windows applications.


vuong-mediapp: Multimedia BioInformatics

  •    Java

Multimedia, Medicine Computing and BioInformatics: this project is a collection of several subprojects offering solutions in multimedia, medical computing, and bioinformatics, with a focus on video, EEG, and multichannel signals, developed in Web 2.0 and J2EE.

Bio4j - Bioinformatics Graph based DB

  •    Java

Bio4j is a bioinformatics graph based DB including most data available in Uniprot KB (SwissProt + Trembl), Gene Ontology (GO), UniRef (50,90,100), RefSeq, NCBI Taxonomy, and Expasy Enzyme DB. Bio4j provides a completely new and powerful framework for protein related information querying and management. Since it relies on a high-performance graph engine, data is stored in a way that semantically represents its own structure.

Awesome-Bioinformatics - A curated list of awesome Bioinformatics libraries and software.


Sequence Processing includes tasks such as demultiplexing raw read data, and trimming low quality bases. The following tools can be used to visualize genomic data or for constructing customized visualizations of genomic data including sequence data from DNA-Seq, RNA-Seq, and ChIP-Seq, variants, and more.

nextflow - A DSL for data-driven computational pipelines

  •    Groovy

With the rise of big data, techniques to analyse and run experiments on large datasets are increasingly necessary. Parallelization and distributed computing are the best ways to tackle this kind of problem, but the tools commonly available to the bioinformatics community traditionally lack good support for these techniques, or provide a model that fits badly with the specific requirements in the bioinformatics domain and, most of the time, require the knowledge of complex tools or low-level APIs.

bioinformatics - :microscope: Path to a free self-taught education in Bioinformatics!


This is a solid path for those of you who want to complete a Bioinformatics course on your own time, for free, with courses from the best universities in the World. In our curriculum, we give preference to MOOC (Massive Open Online Course) style courses because these courses were created with our style of learning in mind.

MultiQC - Aggregate results from bioinformatics analyses across many samples into a single report.

  •    Python

MultiQC is a tool to create a single report with interactive plots for multiple bioinformatics analyses across many samples. MultiQC is written in Python (tested with v2.7, 3.4, 3.5 and 3.6). It is available on the Python Package Index and through conda using Bioconda.

Bioinformatics-Training - Bioinformatics training resources

  •    R

This is a resource for learning more about PacBio data and bioinformatics analysis. THIS WEBSITE AND CONTENT AND ALL SITE-RELATED SERVICES, INCLUDING ANY DATA, ARE PROVIDED "AS IS," WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THIS SITE, ALL SITE-RELATED SERVICES, AND ANY THIRD PARTY WEBSITES OR APPLICATIONS. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACIFIC BIOSCIENCES.

csvtk - A cross-platform, efficient and practical CSV/TSV toolkit in Golang

  •    Go

Similar to the FASTA/Q formats in bioinformatics, CSV/TSV are basic and ubiquitous file formats in both bioinformatics and data science. People usually use spreadsheet software like MS Excel to process tabular data. However, that work is all clicking and typing, which cannot be automated and is time-consuming to repeat, especially when we want to apply similar operations to different datasets or for different purposes.

dash-bio - Open-source bioinformatics components for Dash

  •    Python

Dash Bio is a suite of bioinformatics components built to work with Dash. Learn more about Dash at https://plotly.com/products/dash/.

patternlab-php - The PHP version of Pattern Lab

  •    PHP

The Pattern Lab Standard Edition for Mustache is the evolution of Pattern Lab 1. Pattern Lab is still, at its core, a prototyping tool focused on encouraging communication between content creators, designers, devs, and clients. It combines platform-agnostic assets, like the Mustache-based patterns, with a PHP-based "builder." Pattern Lab 2 introduces the beginnings of an ecosystem that will allow teams to mix, match and extend Pattern Lab to meet their specific needs. It will also make it easier for the Pattern Lab team to push out new features. Pattern Lab Standard Edition for Mustache is just one of the four PHP-based Editions currently available. You can play with a demo of the front-end of Pattern Lab at demo.patternlab.io.

TFastDIB

  •    Pascal

TFastDIB consists of a DIB class and units with optimized routines for common graphics tasks like resampling, blending, quantizing & dithering, color space conversions, bit depth conversions, rendering primitives, and a collection of filters and effects.

patternlab-node - The Node version of Pattern Lab

  •    Javascript

This monorepo contains the core of Pattern Lab / Node and all related engines, UI kits, plugins and utilities. Pattern Lab helps you and your team build thoughtful, pattern-driven user interfaces using atomic design principles. If you'd like to see what a front-end project built with Pattern Lab looks like, check out this online demo of Pattern Lab output.

SetSimilaritySearch - All-pair set similarity search on millions of sets in Python and on a laptop (faster than MinHash LSH)

  •    Python

Efficient set similarity search algorithms in Python. For even better performance see the Go Implementation. A popular way to measure the similarity between two sets is Jaccard similarity, which gives a fractional score between 0 and 1.0.
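
As a small illustration of the Jaccard similarity mentioned above (size of the intersection divided by size of the union), here is a minimal sketch in plain Python. It is not the SetSimilaritySearch API. The function name `jaccard` and the convention that two empty sets score 1.0 are assumptions made for this example.

```python
def jaccard(a, b):
    """Return the Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


# Two of the four distinct elements are shared.
print(jaccard({1, 2, 3}, {2, 3, 4}))  # 0.5
```

Computing this exactly for every pair of sets is quadratic in the number of sets, which is why dedicated search algorithms and MinHash-style approximations exist.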

Mash - Fast genome and metagenome distance estimation using MinHash

  •    C++

Mash is normally distributed as a dependency-free binary for Linux or OSX (see https://github.com/marbl/Mash/releases). This source distribution is intended for other operating systems or for development. Mash requires C++14 to build, which is available in GCC >= 5 and XCode >= 6. See http://mash.readthedocs.org for more information.

DetectionLab - Vagrant & Packer scripts to build a lab environment complete with security tooling and logging best practices

  •    HTML

This lab has been designed with defenders in mind. Its primary purpose is to allow the user to quickly build a Windows domain that comes pre-loaded with security tooling and some best practices when it comes to system logging configurations. It can easily be modified to fit most needs or expanded to include additional hosts. NOTE: This lab has not been hardened in any way and runs with default vagrant credentials. Please do not connect or bridge it to any networks you care about. This lab is deliberately designed to be insecure; the primary purpose of it is to provide visibility and introspection into each host.





