Informatics Research k-mer Tools


Suite of tools for DNA sequence analysis: searching (ESTs, mRNA, sequencer reads), aligning (ESTs, mRNA, whole genomes), and analysis (repeats, k-mers).



Related Projects

Bioformatica - A bioinformatics toolkit written in Java.

The bioformatica toolkit provides ready-to-run sequence analysis and visualization programs and a Java programming library for developers. It is designed for simplicity, speed, efficiency, and extensibility. The first release will include library classes for efficient reading of sequences in FASTA format, and basic sequence annotation tools: base frequency, k-mer frequency, unusual motifs, prokaryotic gene-finding, etc. Status: this project is brand-new. There is code in the repository, but func
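The base-frequency and k-mer-frequency tools listed for bioformatica can be illustrated with a short sketch. It is written in Python rather than Java and does not use the bioformatica library. The function names base_frequency and kmer_frequency are hypothetical. Windows containing characters other than A, C, G and T are skipped, which is one common way of handling ambiguous bases, not necessarily the toolkit's choice.

```python
from collections import Counter

# Hypothetical sketch; not the bioformatica API.


def base_frequency(seq: str) -> dict[str, float]:
    """Fraction of each character in seq (case-insensitive); empty input gives {}."""
    seq = seq.upper()
    if not seq:
        return {}
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in sorted(counts)}


def kmer_frequency(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer, skipping windows that contain a non-ACGT character."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts
```

For example, kmer_frequency("ACGTACGT", 3) counts ACG and CGT twice each and GTA and TAC once.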

Frhit - Metagenome Fragment Recruitment at High Identity with Tolerance

FR-HIT is an efficient fragment recruitment program for next-generation sequences against microbial reference genomes. It achieves sensitivity similar to BLASTN but runs 100 times faster. FR-HIT uses a seeding heuristic with overlapping k-mer hashing to locate candidate matching blocks on the reference sequences, and then applies effective filtering within the candidate blocks to filter out blocks that do not meet the minimum criteria for containing an al
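The seeding step described for FR-HIT can be sketched as follows. This is not FR-HIT's code. It is a minimal Python illustration that assumes a plain dictionary from each reference k-mer to its start positions. It also assumes a simple voting scheme in which each shared k-mer votes for the fixed-size reference block that contains the read's implied start. The names build_kmer_index and candidate_blocks and the parameters block_size and min_hits are hypothetical, and the filtering and alignment stages are omitted.

```python
from collections import defaultdict

# Hypothetical sketch of k-mer hash seeding; not FR-HIT's implementation.


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Hash every overlapping k-mer of the reference to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_blocks(read: str, index: dict[str, list[int]], k: int,
                     block_size: int, min_hits: int) -> list[int]:
    """Return start offsets of reference blocks sharing >= min_hits seed k-mers with the read.

    Each seed hit votes for the block containing the read's implied start on the
    reference (hit position minus the k-mer's offset within the read).
    """
    votes: dict[int, int] = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            implied_start = max(ref_pos - offset, 0)  # clamp reads overhanging the start
            votes[implied_start // block_size] += 1
    # Keep only blocks with enough supporting seeds; report them by reference offset.
    return sorted(block * block_size for block, n in votes.items() if n >= min_hits)
```

Voting on the implied start rather than on raw hit positions keeps seeds from the same placement together. A real tool would also tolerate small shifts caused by indels.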

Seqdiverse - Measuring the positional diversity of Next-Generation Sequencing Reads

Determining the quality of FASTQ sequences is an important step in the next generation sequencing analysis pipeline. Sequencing artifacts such as adapter reading, not-so-random priming, barcoded sequences, and PCR biases can affect one's ability to discern the true nature of the data. Pre-alignment steps such as trimming may need to be done to improve the sensitivity of alignment. SeqDiverse is a program that can display diversity measures for sequencing data that can help identify these artifac
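One way to quantify positional diversity is the Shannon entropy of the base distribution at each read position. A position where almost every read carries the same base, as happens inside an adapter or barcode, has entropy near 0 bits. A uniformly random position approaches 2 bits. The sketch below assumes this measure. SeqDiverse's own diversity measures may differ, and positional_entropy is a hypothetical name.

```python
import math
from collections import Counter

# Hypothetical sketch; SeqDiverse's actual measures may differ.


def positional_entropy(reads: list[str]) -> list[float]:
    """Shannon entropy (bits) of the base distribution at each read position.

    Position i only considers reads longer than i, so ragged read lengths are allowed.
    """
    max_len = max((len(r) for r in reads), default=0)
    entropies = []
    for i in range(max_len):
        column = Counter(r[i] for r in reads if len(r) > i)
        total = sum(column.values())
        # H = sum over bases of p * log2(1/p), with p = n / total
        entropies.append(sum((n / total) * math.log2(total / n) for n in column.values()))
    return entropies
```

Low values at the first few positions of a FASTQ file would suggest barcodes or non-random priming. Low values at the end may indicate adapter sequence.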

Nephele - Genotyping via Complete Composition Vectors and MapReduce

The classic genotyping approach has been based on phylogenetic analysis, starting with a multiple sequence alignment. Genotypes are then established by expert examination of phylogenetic trees. However, such methods are suboptimal for a rapidly growing dataset, because they require significant human effort, and because they increase in computational complexity quickly with the number of sequences. This project uses a method for genotyping that does not depend on multiple sequence alignment. It u
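Comparing sequences without a multiple alignment can be illustrated with k-mer composition vectors. This Python sketch is a simplification and not the complete composition vector method itself. It uses plain k-mer relative frequencies for a single k, with no background-model correction and no MapReduce distribution, and it compares vectors by cosine distance. The resulting pairwise distances could then feed a clustering step. The function names are hypothetical, and input sequences are assumed to be uppercase ACGT.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical, simplified sketch; not the complete composition vector (CCV) method.


def composition_vector(seq: str, k: int) -> list[float]:
    """Relative frequencies of all 4**k DNA k-mers, in lexicographic order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values()) or 1  # avoid division by zero for short sequences
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity; 0 means identical composition, 1 means no shared k-mers."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0
```

Each distance costs O(4^k) regardless of sequence length. This is why alignment-free methods scale better than multiple alignment as the number of sequences grows.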

Biokanga - Suite of high performance bioinformatics applications

BioKanga: a suite of high-performance bioinformatics applications targeting the challenges of next-generation sequencing analytics. Kanga is an acronym for 'K-mer Adaptive Next Generation Aligner' and is the primary application. The latest release (1.11.0) contains many performance enhancements and now includes two processes targeting de novo assembly from NGS read datasets. Why YAL (Yet Another Aligner)? BioKanga is a highly efficient short-read aligner that incorporates an empirically der

Hku-idba - A Practical Iterative de Bruijn Graph De Novo Assembler

What is IDBA? A practical iterative de Bruijn graph de novo assembler for the sequence assembly problem in bioinformatics. Current release: IDBA toolkit 0.17 for 64-bit Linux, released Aug 2010; it removes the Boost requirement from the IDBA toolkit. Please follow the Installation Guide and User Guide to run the software. Project description: IDBA is an open-source de novo assembler for next-generation short-read sequences. It is fast, parallel and capable o
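The core data structure of a de Bruijn graph assembler can be sketched in a few lines. This is a single-k illustration that assumes error-free reads from one strand. It builds a graph whose nodes are (k-1)-mers and whose edges are the distinct k-mers in the reads, then spells the maximal non-branching paths as contigs. IDBA's iteration over increasing k values, error handling, reverse complements and parallelism are not reproduced. The names build_de_bruijn and contigs are hypothetical.

```python
from collections import defaultdict

# Hypothetical single-k sketch; IDBA iterates over several k and does much more.


def build_de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer in the reads adds an edge prefix -> suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def contigs(graph: dict[str, set[str]]) -> list[str]:
    """Spell maximal non-branching paths (unitigs); isolated cycles are ignored."""
    indegree: dict[str, int] = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indegree[s] += 1
    nodes = set(graph) | set(indegree)

    def is_simple(node: str) -> bool:
        # Exactly one way in and one way out: the node lies inside a path.
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    result = []
    for node in nodes:
        if is_simple(node):
            continue  # interior node; reached by walking from its path's start
        for nxt in graph.get(node, ()):
            path = node + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            result.append(path)
    return sorted(result)
```

For example, the overlapping reads ACGTCA and GTCAGG with k = 4 yield the single contig ACGTCAGG. A small k merges repeats into branches, while a large k leaves gaps where coverage is thin. Iterating over k is meant to balance these two effects.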

Align2rawsignal - A.K.A. WIGGLER: Creates genome-wide raw or normalized signal tracks from aligned s

Author: Anshul Kundaje
Email: akundaje _at_ stanford _dot_ edu
Date: March 2011
====================
Introduction
====================
align2rawsignal (a.k.a. WIGGLER, because it generates wiggle files) reads in a set of tagAlign/BAM files, filters out multi-mapping tags, and creates a consolidated genome-wide signal file using various tag-shift and smoothing parameters as well as various normalization schemes. The method accounts for the following:
- depth of sequencing
- the mappability of the genome (base
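The tag-shift, smoothing and normalization steps can be illustrated on a single chromosome. In this hypothetical Python sketch, each tag's 5' end is shifted by a fixed number of bases toward the fragment centre: downstream for + strand tags and upstream for - strand tags. Counts are then smoothed with a centred moving average and scaled to tags per million, as a simple stand-in for sequencing-depth normalization. These parameter choices are assumptions and not align2rawsignal's actual options. Mappability correction and multi-mapping filtering are omitted.

```python
# Hypothetical sketch; parameters are illustrative, not align2rawsignal's options.


def signal_track(tags: list[tuple[int, str]], chrom_len: int,
                 shift: int, window: int) -> list[float]:
    """Per-base signal from aligned tag 5' ends on one chromosome.

    tags:   (five_prime_position, strand) pairs, 0-based, strand "+" or "-".
    shift:  bases to move each tag toward its fragment centre.
    window: width of the centred moving-average window (odd); edges are zero-padded.
    Returns the smoothed signal scaled to counts per million tags.
    """
    counts = [0] * chrom_len
    for pos, strand in tags:
        centre = pos + shift if strand == "+" else pos - shift
        if 0 <= centre < chrom_len:  # drop tags shifted off the chromosome
            counts[centre] += 1
    half = window // 2
    scale = 1e6 / len(tags) if tags else 0.0  # depth normalization: per million tags
    smoothed = []
    for i in range(chrom_len):
        lo, hi = max(0, i - half), min(chrom_len, i + half + 1)
        smoothed.append(sum(counts[lo:hi]) / window * scale)
    return smoothed
```

Shifting tags from both strands toward the fragment centre makes the two strand-specific peaks that flank a binding site merge into one peak. Writing the list out as a wiggle file is left out.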

KmerFilter - Tools for kmer analysis.

Tools for kmer analysis.

kmap - Yet another kmer mapper

Yet another kmer mapper

Pomoda - Peak Oriented Motif Discovery Algorithm

Pomoda: Peak Oriented Motif Discovery Algorithm
Function: generates position weight matrices (PWMs) for motifs concentrated around ChIP-Seq peaks.
Syntax: POMODA -i inputFasta -o outputDIR [-w weightFile, -ot overlapThreshold -pdt PWM_Divergence_Threshold -ratio minSupportRatio, -FDR p-valueCutoff, -maxlen maxMotifLength, -seedlen SeedLength,-rs resolution_bp, -mbr min_binding_range -n numberOfMotifs]
Comments: inputFasta: input data, using fasta
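A position weight matrix of the kind Pomoda reports can be built from a set of aligned motif sites. This Python sketch is a generic illustration and not Pomoda's algorithm. It assumes equal-length uppercase ACGT sites, a pseudocount of 0.5 and a uniform background of 0.25 per base. It converts column counts to log-odds scores in bits and then scans a sequence for the best-scoring window. The peak weighting, seed length and FDR options listed in Pomoda's syntax are not modelled, and the names build_pwm and best_hit are hypothetical.

```python
import math

# Hypothetical generic PWM sketch; not Pomoda's motif discovery procedure.
BASES = "ACGT"


def build_pwm(sites: list[str], pseudocount: float = 0.5,
              background: float = 0.25) -> list[dict[str, float]]:
    """Log-odds PWM (bits) from equal-length aligned motif sites."""
    total = len(sites) + 4 * pseudocount  # column count plus pseudocounts
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_hit(seq: str, pwm: list[dict[str, float]]) -> tuple[int, float]:
    """Scan seq (uppercase ACGT) and return (offset, score) of the highest-scoring window."""
    w = len(pwm)
    best = (-1, -math.inf)
    for i in range(len(seq) - w + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(w))
        if score > best[1]:
            best = (i, score)
    return best
```

The pseudocount keeps unseen bases from scoring minus infinity. This matters when a motif is learned from only a handful of peaks.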