vcf2tsv - Genomic VCF to tab-separated values

  •        70

Running vcf2tsv requires Python 3. It also requires that you have cyvcf2 and numpy installed. This can be achieved through the use of pip.

https://github.com/sigven/vcf2tsv

Tags
Implementation
License
Platform

   




Related Projects

vcflib - a simple C++ library for parsing and manipulating VCF files, + many command-line utilities

  •    C++

The Variant Call Format (VCF) is a flat-file, tab-delimited textual format intended to concisely describe reference-indexed variations between individuals. VCF provides a common interchange format for the description of variation in individuals and populations of samples, and has become the defacto standard reporting format for a wide array of genomic variant detectors. The API itself provides a quick and extremely permissive method to read and write VCF files. Extensions and applications of the library provided in the included utilities (*.cpp) comprise the vast bulk of the library's utility for most users.

freebayes - Bayesian haplotype-based genetic polymorphism discovery and genotyping.

  •    C++

FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment. FreeBayes uses short-read alignments (BAM files with Phred+33 encoded quality scores, now standard) for any number of individuals from a population and a reference genome (in FASTA format) to determine the most-likely combination of genotypes for the population at each position in the reference. It reports positions which it finds putatively polymorphic in variant call file (VCF) format. It can also use an input set of variants (VCF) as a source of prior information, and a copy number variant map (BED) to define non-uniform ploidy variation across the samples under analysis.

picard - A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF

  •    Java

A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats. Picard is implemented using the HTSJDK Java library HTSJDK to support accessing file formats that are commonly used for high-throughput sequencing data such as SAM and VCF.

bioawk - BWK awk modified for biological data

  •    C

Bioawk is an extension to Brian Kernighan's awk, adding the support of several common biological data formats, including optionally gzip'ed BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with column names. It also adds a few built-in functions and an command line option to use TAB as the input/output delimiter. When the new functionality is not used, bioawk is intended to behave exactly the same as the original BWK awk. The original awk requires a YACC-compatible parser generator (e.g. Byacc or Bison). Bioawk further depends on zlib so as to work with gzip'd files.


gemini - a lightweight db framework for exploring genetic variation.

  •    Python

The intent of GEMINI (GEnome MINIing) is to provide a simple, flexible, and powerful framework for exploring genetic variation for personal and medical genetics. GEMINI is unique in that it integrates genetic variation (from VCF files) with a wealth of genome annotations into a unified database framework. Using this integrated database as the analysis framework, we aim to leverage the expressive power of SQL for data analysis, while attempting to overcome the fundamental challenges associated with using databases for very large (e.g. 1,000,000 variants times 1,000 samples yields one billion genotypes) datasets. In addition, by defining sample relationships with a PED file, GEMINI allows one to explore and test for variants that meet specific inheritance models (e.g., recessive, dominant, etc.). The following is a video of a high-level talk from SciPy 2013 describing GEMINI.

VCF Builder IDE

  •    C++

The VCF Builder is an advanced development tool for creating C++ applications, and supporting a wide number of plugins for enhancing it's functionality. While the VCF Builder is capable of creating generic C++ applications, it's forte is building GUI ap

Am-I-affected-by-Meltdown - Meltdown Exploit / Proof-of-concept / checks whether system is affected by Variant 3: rogue data cache load (CVE-2017-5754), a

  •    C++

Checks whether system is affected by Variant 3: rogue data cache load (CVE-2017-5754), a.k.a MELTDOWN. The basic idea is that user will know whether or not the running system is properly patched with something like KAISER patchset (https://lkml.org/lkml/2017/10/31/884) for example.

bcbio-nextgen - Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

  •    Python

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis. You write a high level configuration file specifying your inputs and analysis parameters. This input drives a parallel run that handles distributed execution, idempotent processing restarts and safe transactional steps. bcbio provides a shared community resource that handles the data processing component of sequencing analysis, providing researchers with more time to focus on the downstream biology. producing an editable system configuration file referencing the installed software, data and system information.

nucleus - Python and C++ code for reading and writing genomics data.

  •    Python

Nucleus is a library of Python and C++ code designed to make it easy to read, write and analyze data in common genomics file formats like SAM and VCF. In addition, Nucleus enables painless integration with the TensorFlow machine learning framework, as anywhere a genomics file is consumed or produced, a TensorFlow tfrecords file may be substituted. For all other systems, you will need to first install CLIF by following the instructions at https://github.com/google/clif#installation before running install.sh.

htslib - C library for high-throughput sequencing data formats

  •    C

HTSlib is an implementation of a unified C library for accessing common file formats, such as SAM, CRAM and VCF, used for high-throughput sequencing data, and is the core library used by samtools and bcftools. HTSlib only depends on zlib. It is known to be compatible with gcc, g++ and clang. HTSlib implements a generalized BAM index, with file extension .csi (coordinate-sorted index). The HTSlib file reader first looks for the new index and then for the old if the new index is absent.

hail - Scalable genomic data analysis.

  •    Scala

Hail is an open-source, scalable framework for exploring and analyzing genomic data. The Hail project began in Fall 2015 to empower the worldwide genetics community to harness the flood of genomes to discover the biology of human disease. Since then, Hail has expanded to enable analysis of large-scale datasets beyond the field of genomics.

VariantCodec

  •    Java

The variant-codec library provides a generic way to serialize objects. It uses a standardized structure, the variant, the encapsulate all data types.

Mach7 - Functional programming style pattern-matching library for C++

  •    C++

Pattern matching is an abstraction mechanism that can greatly simplify source code. Commonly, pattern matching is built into a language to provide better syntax, faster code, correctness guarantees and improved diagnostics. Mach7 is a library solution to pattern matching in C++ that maintains many of these features. All the patterns in Mach7 are user-definable, can be stored in variables, passed among functions, and allow the use of open class hierarchies. Next example demonstrates that the library can deal efficiently and in a type-safe manner with non-polymorphic classes like boost::variant as well.

Vc - SIMD Vector Classes for C++

  •    C++

Recent generations of CPUs, and GPUs in particular, require data-parallel codes for full efficiency. Data parallelism requires that the same sequence of operations is applied to different input data. CPUs and GPUs can thus reduce the necessary hardware for instruction decoding and scheduling in favor of more arithmetic and logic units, which execute the same instructions synchronously. On CPU architectures this is implemented via SIMD registers and instructions. A single SIMD register can store N values and a single SIMD instruction can execute N operations on those values. On GPU architectures N threads run in perfect sync, fed by a single instruction decoder/scheduler. Each thread has local memory and a given index to calculate the offsets in memory for loads and stores. Current C++ compilers can do automatic transformation of scalar codes to SIMD instructions (auto-vectorization). However, the compiler must reconstruct an intrinsic property of the algorithm that was lost when the developer wrote a purely scalar implementation in C++. Consequently, C++ compilers cannot vectorize any given code to its most efficient data-parallel variant. Especially larger data-parallel loops, spanning over multiple functions or even translation units, will often not be transformed into efficient SIMD code.

OpenCC - A project for conversion between Traditional and Simplified Chinese

  •    C++

Open Chinese Convert (OpenCC, 開放中文轉換) is an opensource project for conversion between Traditional Chinese and Simplified Chinese, supporting character-level conversion, phrase-level conversion, variant conversion and regional idioms among Mainland China, Taiwan and Hong kong.

VCF Viewer

  •    

Shows you all VCF files, that are inside chosen folder. You can view and read them, nothing more (at least now).

GnuAccounting

  •    Java

An open-source java accounting application that integrates OpenOffice, Apache Derby and HBCI/FinTS to create and manage invoices, credit memos, delivery notes, bills etc. Imports from kTimeTracker, Task Coach, VCF, Hibiscus, Moneyplex, Starmoney, exports to Winston, VCF, openTrans et. al.

t-digest - A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means

  •    Java

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means. The t-digest algorithm is also very parallel friendly making it useful in map-reduce and parallel streaming applications. The t-digest construction algorithm uses a variant of 1-dimensional k-means clustering to produce a data structure that is related to the Q-digest. This t-digest data structure can be used to estimate quantiles or compute other rank statistics. The advantage of the t-digest over the Q-digest is that the t-digest can handle floating point values while the Q-digest is limited to integers. With small changes, the t-digest can handle any values from any ordered set that has something akin to a mean. The accuracy of quantile estimates produced by t-digests can be orders of magnitude more accurate than those produced by Q-digests in spite of the fact that t-digests are more compact when stored on disk.