PyMC3 - Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano

  •    Python

PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning which focuses on advanced Markov chain Monte Carlo and variational fitting algorithms. Its flexibility and extensibility make it applicable to a large suite of problems. Note: Running pip install pymc will install PyMC 2.3, not PyMC3, from PyPI.

git-quick-stats - ▁▅▆▃▅ Git quick statistics is a simple and efficient way to access various statistics in a git repository

  •    Shell

git quick-stats is a simple and efficient way to access various statistics in a git repository. Any git repository contains tons of information about commits, contributors, and files. Extracting this information is not always trivial, mostly because of a gadzillion options to a gadzillion git commands – I don’t think there is a single person alive who knows them all. Probably not even Linus Torvalds himself :).

bogofilter -- Fast Bayesian Spam Filter

  •    Perl

Bogofilter is a mail filter that classifies mail as spam or ham (non-spam) by a statistical analysis of the message's header and content (body). The program is able to learn from the user's classifications and corrections. Bogofilter provides processing for plain text and HTML. It supports multi-part MIME messages with decoding of base64, quoted-printable, and uuencoded text and ignores attachments, such as images.
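The idea of learning from a user's spam/ham classifications can be sketched with a toy word-count classifier. This is a simplified naive Bayes illustration with add-one smoothing, not Bogofilter's actual algorithm or interface; the class name `TinyBayesFilter` and its methods are hypothetical, and it handles plain whitespace-separated text only (no MIME or HTML processing).

```python
import math
from collections import Counter


class TinyBayesFilter:
    """Toy naive Bayes spam/ham classifier (illustration only)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    def learn(self, text, label):
        """Register one user-classified message under 'spam' or 'ham'."""
        if label not in self.LABELS:
            raise ValueError("label must be 'spam' or 'ham'")
        self.token_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def _token_probability(self, token, label, vocabulary_size):
        # Add-one smoothing keeps unseen tokens from zeroing the product.
        total = sum(self.token_counts[label].values())
        return (self.token_counts[label][token] + 1) / (total + vocabulary_size)

    def spam_probability(self, text):
        """Return the estimated probability that the text is spam."""
        vocabulary = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        size = max(len(vocabulary), 1)
        total_messages = sum(self.message_counts.values())
        # Smoothed class priors, so an empty class never yields log(0).
        prior_spam = (self.message_counts["spam"] + 1) / (total_messages + 2)
        prior_ham = (self.message_counts["ham"] + 1) / (total_messages + 2)
        log_odds = math.log(prior_spam) - math.log(prior_ham)
        for token in text.lower().split():
            log_odds += math.log(self._token_probability(token, "spam", size))
            log_odds -= math.log(self._token_probability(token, "ham", size))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

A correction is just another call to `learn` with the right label, which shifts the token counts used by later `spam_probability` calls.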




pycm - Multi-class confusion matrix library in Python

  •    Python

PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and a direct matrix. It is a tool for post-classification model evaluation that supports most class-level and overall statistics parameters. PyCM is the Swiss Army knife of confusion matrices, targeted mainly at data scientists who need a broad array of metrics for predictive models and an accurate evaluation of a large variety of classifiers. A threshold option was added in version 0.9 for real-valued predictions.
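To make "confusion matrix from input vectors" concrete, here is a minimal pure-Python sketch that builds a multi-class matrix from actual and predicted label vectors and derives a few class-level and overall statistics. It does not use PyCM's API; the function names `confusion_matrix`, `class_stats`, and `overall_accuracy` are hypothetical, and labels are assumed to be mutually sortable.

```python
def confusion_matrix(actual, predicted):
    """Build a nested dict: matrix[actual_label][predicted_label] -> count."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    classes = sorted(set(actual) | set(predicted))
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return classes, matrix


def class_stats(classes, matrix):
    """Per-class precision and recall; None when the ratio is undefined."""
    stats = {}
    for c in classes:
        true_pos = matrix[c][c]
        false_neg = sum(matrix[c].values()) - true_pos
        false_pos = sum(matrix[a][c] for a in classes) - true_pos
        stats[c] = {
            "precision": true_pos / (true_pos + false_pos)
            if true_pos + false_pos else None,
            "recall": true_pos / (true_pos + false_neg)
            if true_pos + false_neg else None,
        }
    return stats


def overall_accuracy(classes, matrix):
    """Fraction of samples on the matrix diagonal."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in classes)
    return correct / total
```

Rows are actual classes and columns are predicted classes, so a class's recall reads along its row and its precision reads down its column.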

atsd-use-cases - Axibase Time Series Database: Usage Examples and Research Articles

  •    Vue

Use Cases documentation demonstrates solutions to real-world data problems using Axibase Time Series Database (ATSD) and contains in-depth guides for programmatic integration with commonly-used enterprise software systems and services, as well as tutorials for data transformation and visualizations created with ATSD. Interactive visualizations tracking interesting datasets from a variety of sources.

expan - A Python library for statistical analysis of randomised control trials (A/B tests)

  •    HTML

A/B tests (a.k.a. Randomized Controlled Trials or Experiments) have been widely applied in different industries to optimize business processes and user experience. ExpAn (Experiment Analysis) is a Python library developed for the statistical analysis of such experiments and to standardise the data structures used. The data structures and functionality of ExpAn are generic such that they can be used by both data scientists optimizing a user interface and biologists running wet-lab experiments. The library is also standalone and can be imported and used from within other projects and from the command line.
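As a rough illustration of what the statistical analysis of an A/B test involves, the sketch below compares a control and a treatment sample using the difference in means, a normal-approximation confidence interval, and a two-sided p-value. It relies only on the Python standard library and is not ExpAn's API or necessarily its method; the function name `compare_variants` and its return keys are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def compare_variants(control, treatment, confidence=0.95):
    """Difference in means with a normal-approximation interval and p-value.

    Assumes each sample has at least two observations and that the samples
    are not both constant (otherwise the standard error would be zero).
    """
    delta = mean(treatment) - mean(control)
    # Unpooled standard error: each sample contributes its own variance.
    std_error = math.sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    if std_error == 0:
        raise ValueError("standard error is zero; samples have no variance")
    z_critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    z_score = delta / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z_score)))
    return {
        "delta": delta,
        "interval": (delta - z_critical * std_error, delta + z_critical * std_error),
        "p_value": p_value,
    }
```

The normal approximation is reasonable for large samples; for small samples a t-distribution or a Bayesian model would be a more careful choice.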


hdrsample - A port of HdrHistogram to Rust

  •    Rust

HdrSample is a port of Gil Tene's HdrHistogram to native Rust. It provides recording and analyzing of sampled data value counts across a large, configurable value range with configurable precision within the range. The resulting "HDR" histogram allows for fast and accurate analysis of the extreme ranges of data with non-normal distributions, like latency. What follows is a description from the HdrHistogram website. Users are encouraged to read the documentation from the original Java implementation, as most of the concepts translate directly to the Rust port.
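The notion of recording value counts over a wide range at a configurable precision can be sketched with a toy histogram that keeps a fixed number of significant decimal digits per value. This is a simplified illustration in Python, not the bucket layout used by HdrHistogram or its Rust port; the class name `SigFigHistogram` and its methods are hypothetical, and only non-negative integers are supported.

```python
import math
from collections import Counter


class SigFigHistogram:
    """Toy histogram keeping a fixed number of significant decimal digits."""

    def __init__(self, significant_digits=2):
        if significant_digits < 1:
            raise ValueError("significant_digits must be at least 1")
        self.significant_digits = significant_digits
        self.counts = Counter()
        self.total = 0

    def bucket_for(self, value):
        """Round a non-negative integer down to the configured precision."""
        if value < 0:
            raise ValueError("only non-negative integers are supported")
        extra_digits = len(str(value)) - self.significant_digits
        if extra_digits <= 0:
            return value
        scale = 10 ** extra_digits
        return (value // scale) * scale

    def record(self, value):
        """Count one observation in its precision bucket."""
        self.counts[self.bucket_for(value)] += 1
        self.total += 1

    def value_at_percentile(self, percentile):
        """Return the smallest bucket covering the requested percentile."""
        if self.total == 0:
            raise ValueError("no values recorded")
        target = max(1, math.ceil(percentile / 100 * self.total))
        seen = 0
        for bucket in sorted(self.counts):
            seen += self.counts[bucket]
            if seen >= target:
                return bucket
        return max(self.counts)
```

Because the relative error is bounded by the number of significant digits, small and very large values (such as tail latencies) are tracked with similar relative accuracy while the number of distinct buckets stays modest.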

HdrHistogram_rust - A port of HdrHistogram to Rust

  •    Rust

HdrSample is a port of Gil Tene's HdrHistogram to native Rust. It provides recording and analyzing of sampled data value counts across a large, configurable value range with configurable precision within the range. The resulting "HDR" histogram allows for fast and accurate analysis of the extreme ranges of data with non-normal distributions, like latency. What follows is a description from the HdrHistogram website. Users are encouraged to read the documentation from the original Java implementation, as most of the concepts translate directly to the Rust port.

ck-autotuning - Collective Knowledge extension to let users implement customizable, portable, multi-dimensional and multi-objective SW/HW auto-tuning workflows using Collective Knowledge Framework

  •    Python

This is a stable repository for universal, customizable, multi-dimensional, multi-objective SW/HW autotuning with a JSON API across Linux, Android, MacOS and Windows-based machines using the Collective Knowledge Framework. Please check out the examples in the demo directory and the notes about CK portable and customizable workflows.

Microscope - ChIP-seq/RNA-seq analysis software suite for gene expression heatmaps

  •    R

We propose a user-friendly ChIP-seq and RNA-seq software suite for the interactive visualization and analysis of genomic data, including integrated features to support differential expression analysis, interactive heatmap production, principal component analysis, gene ontology analysis, and dynamic network visualization. MicroScope is financially supported by the United States Department of Defense (DoD) through the National Defense Science and Engineering Graduate Fellowship (NDSEG) Program. This research was conducted with Government support under and awarded by DoD, Army Research Office (ARO), National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a.

ggstatsplot - Collection of functions to enhance ggplot2 plots with results from statistical tests.

  •    HTML

ggstatsplot is an extension of the ggplot2 package for creating graphics with details from statistical tests included in the plots themselves. It is targeted primarily at the behavioral sciences community and aims to produce information-rich plots with a single line of code. In a typical exploratory data analysis workflow, data visualization and statistical modelling are two different phases: visualization informs modelling, and modelling in its turn can suggest a different visualization method, and so on and so forth. The central idea of ggstatsplot is simple: combine these two phases into one in the form of graphics with statistical details, which makes data exploration simpler and faster. Currently, it supports only the most common types of statistical tests (parametric, nonparametric, and robust versions of t-test, anova, and correlation analyses, contingency table analysis, and regression analyses).

data-science-toolkit - Collection of stats, modeling, and data science tools in Python and R.

  •    Jupyter

Welcome! The purpose of this repository is to serve as a stockpile of statistical methods, modeling techniques, and data science tools. The content itself includes everything from educational vignettes on specific topics to tailored functions built to enhance and optimize analyses. This is and will remain a work in progress, and I welcome all contributions and constructive criticism. If you have a suggestion or request, please make use of the "Issues" tab and I will respond expeditiously. All are welcome and encouraged to contribute to this repository. My only request is that you include a detailed description of your contribution, that your code be thoroughly commented, and that you test your contribution locally with the most recent version of the Master branch integrated prior to submitting the PR.

groupedstats - Grouped statistical analysis in a tidy way

  •    R

This package is still a work in progress and it currently supports only the most basic statistical operations (from the stats and lme4 packages). The next releases will expand on the existing functionality (e.g., ordinal). There is a dedicated website for groupedstats, which is updated after every new commit: https://indrajeetpatil.github.io/groupedstats/.