PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning which focuses on advanced Markov chain Monte Carlo and variational fitting algorithms. Its flexibility and extensibility make it applicable to a large suite of problems. Note: Running pip install pymc will install PyMC 2.3, not PyMC3, from PyPI.
statistical-analysis bayesian-inference mcmc variational-inference theano probabilistic-programming bayesian
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON. With Miller, you get to use named fields without needing to count positional indices, using familiar formats such as CSV, TSV, JSON, and positionally-indexed.
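The idea of addressing fields by name rather than by position can be sketched in a few lines. This is a minimal illustration using Python's standard `csv` module, not Miller itself; the sample data and the helper `select_and_sort` are hypothetical and made up for this example.

```python
import csv
import io

# Hypothetical sample data, invented for this illustration.
CSV_TEXT = """name,shape,count
alpha,circle,3
beta,square,7
gamma,circle,5
"""


def select_and_sort(text, keep, sort_by):
    """Keep only the named columns and sort rows by a named numeric field.

    Fields are addressed by header name, so the code does not depend on
    the order of the columns in the input.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    # Sort in descending order on the named field, compared numerically.
    rows.sort(key=lambda row: float(row[sort_by]), reverse=True)
    return [{field: row[field] for field in keep} for row in rows]


result = select_and_sort(CSV_TEXT, keep=["name", "count"], sort_by="count")
for row in result:
    print(row)
```

Reordering the columns in the input would not change the result, which is the practical benefit of name-indexed access.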
data-processing data-cleaning csv csv-files csv-format csv-reader streaming-data streaming-algorithms tsv json json-data data-reduction data-regression statistics statistical-analysis devops devops-tools tabular-data command-line command-line-tools
git quick-stats is a simple and efficient way to access various statistics in a git repository. Any git repository contains tons of information about commits, contributors, and files. Extracting this information is not always trivial, mostly because of a gadzillion options to a gadzillion git commands; I don’t think there is a single person alive who knows them all. Probably not even Linus Torvalds himself :).
bash git statistics reviewer history stats shell-script git-addons statistical-analysis shell suggestion contributors changelog commits detailed gitlog agile meeting git-pathspec review
Bogofilter is a mail filter that classifies mail as spam or ham (non-spam) by a statistical analysis of the message's header and content (body). The program is able to learn from the user's classifications and corrections. Bogofilter provides processing for plain text and HTML. It supports multi-part MIME messages with decoding of base64, quoted-printable, and uuencoded text and ignores attachments, such as images.
mail-filter spam-analysis spam statistical-analysis spam-filter
PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix input. It is a tool for post-classification model evaluation that supports most class and overall statistics parameters. PyCM is the swiss-army knife of confusion matrices, targeted mainly at data scientists who need a broad array of metrics for predictive models and an accurate evaluation of a large variety of classifiers. A threshold option was added in version 0.9 for real value prediction.
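As a rough illustration of what a multi-class confusion matrix built from two data vectors looks like, here is a minimal pure-Python sketch. It does not use PyCM's API; the function names `confusion_matrix`, `overall_accuracy`, and `recall` and the sample vectors are assumptions made up for this example.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Build a nested dict: matrix[actual_class][predicted_class] -> count."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    classes = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = {a: {p: counts[(a, p)] for p in classes} for a in classes}
    return classes, matrix


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (an overall statistic)."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in matrix)
    return correct / total


def recall(matrix, cls):
    """Per-class recall: correct predictions for cls / actual members of cls."""
    row_total = sum(matrix[cls].values())
    return matrix[cls][cls] / row_total if row_total else 0.0


# Hypothetical label vectors, invented for this illustration.
actual = [2, 0, 2, 2, 0, 1]
predicted = [0, 0, 2, 2, 0, 2]

classes, matrix = confusion_matrix(actual, predicted)
for cls in classes:
    print(cls, matrix[cls], "recall:", recall(matrix, cls))
print("overall accuracy:", overall_accuracy(matrix))
```

The nested-dict layout keeps rows as actual classes and columns as predicted classes, so per-class statistics read off a single row and overall statistics read off the diagonal.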
machine-learning confusion-matrix matrix statistics statistical-analysis accuracy ml ai mathematics data-mining data-analysis classification classifier data-science data neural-network multiclass-classification deep-learning artificial-intelligence deeplearning
Jupyter Notebooks for the Springer book "Python for Probability, Statistics, and Machine Learning"
jupyter-notebook machine-learning book books probability probability-theory statistics statistics-course statistical-analysis statistical-learning statistical-tests
Use Cases documentation demonstrates solutions to real-world data problems using Axibase Time Series Database (ATSD) and contains in-depth guides for programmatic integration with commonly-used enterprise software systems and services, as well as tutorials for data transformation and visualizations created with ATSD. Interactive visualizations tracking interesting datasets from a variety of sources.
dataset axibase atsd socrata open-data time-series statistical-analysis database visualization time-series-database time-series-analysis
A/B tests (a.k.a. Randomized Controlled Trials or Experiments) have been widely applied in different industries to optimize business processes and user experience. ExpAn (Experiment Analysis) is a Python library developed for the statistical analysis of such experiments and to standardise the data structures used. The data structures and functionality of ExpAn are generic such that they can be used by both data scientists optimizing a user interface and biologists running wet-lab experiments. The library is also standalone and can be imported and used from within other projects and from the command line.
statistics ab-testing abtesting abtest statistical-analysis
HdrSample is a port of Gil Tene's HdrHistogram to native Rust. It provides recording and analyzing of sampled data value counts across a large, configurable value range with configurable precision within the range. The resulting "HDR" histogram allows for fast and accurate analysis of the extreme ranges of data with non-normal distributions, like latency. What follows is a description from the HdrHistogram website. Users are encouraged to read the documentation from the original Java implementation, as most of the concepts translate directly to the Rust port.
hdrhistogram sampling profiling statistical-analysis
This is a stable repository for universal, customizable, multi-dimensional, multi-objective SW/HW autotuning with JSON API across Linux, Android, MacOS and Windows-based machines using Collective Knowledge Framework. Please, check out examples in this demo directory and notes about CK portable and customizable workflows.
autotuning auto-tuning performance-optimization customizable-autotuning statistical-analysis multiple-dimensions multiple-objectives json-api pareto portable-workflows android
A practical nonparametric statistical tests library for JavaScript
nonparametric statistics tests statistical-analysis library nemene test
We propose a user-friendly ChIP-seq and RNA-seq software suite for the interactive visualization and analysis of genomic data, including integrated features to support differential expression analysis, interactive heatmap production, principal component analysis, gene ontology analysis, and dynamic network visualization. MicroScope is financially supported by the United States Department of Defense (DoD) through the National Defense Science and Engineering Graduate Fellowship (NDSEG) Program. This research was conducted with Government support under and awarded by DoD, Army Research Office (ARO), National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a.
chip-seq rna-seq heatmap gene-ontology principal-component-analysis differential-expression network-analysis gene-expression computational-biology r-programming statistical-analysis
ggstatsplot is an extension of the ggplot2 package for creating graphics with details from statistical tests included in the plots themselves. It is targeted primarily at the behavioral sciences community and provides one-line code to produce information-rich plots. In a typical exploratory data analysis workflow, data visualization and statistical modelling are two different phases: visualization informs modelling, and modelling in its turn can suggest a different visualization method, and so on and so forth. The central idea of ggstatsplot is simple: combine these two phases into one in the form of graphics with statistical details, which makes data exploration simpler and faster. Currently, it supports only the most common types of statistical tests (parametric, nonparametric, and robust versions of t-test, anova, and correlation analyses, contingency table analysis, and regression analyses).
ggplot-extension statistical-tests dataviz r statistical-analysis statistical-inference data visualization datascience violin-plot vignette badge parametric robust plot
Welcome! The purpose of this repository is to serve as a stockpile of statistical methods, modeling techniques, and data science tools. The content itself includes everything from educational vignettes on specific topics to tailored functions built to enhance and optimize analyses. This is and will remain a work in progress, and I welcome all contributions and constructive criticism. If you have a suggestion or request, please make use of the "Issues" tab and I will respond expeditiously. All are welcome and encouraged to contribute to this repository. My only request is that you include a detailed description of your contribution, that your code be thoroughly commented, and that you test your contribution locally with the most recent version of the Master branch integrated prior to submitting the PR.
r machine-learning statistics data-science data-visualization tidyverse modeling reinforcement-learning hypothesis-testing classification regression kolmogorov-smirnov logistic-regression ggplot2 distributional-analysis econometrics statistical-analysis data-mining natural-language-processing
This package is still a work in progress and it currently supports only the most basic statistical operations (from the stats and lme4 packages). The next releases will expand on the existing functionality (e.g., ordinal). There is a dedicated website to groupedstats, which is updated after every new commit: https://indrajeetpatil.github.io/groupedstats/.
statistics r statistical-analysis tidy linear-regression grouping
Lisp-Stat provides support for vectorized mathematical operations, and a comprehensive set of statistical methods that are implemented using the latest numerical algorithms. In addition, Common Lisp provides a dynamic programming environment (REPL), an excellent object-oriented facility (CLOS) and meta-object protocol (MOP). Lisp-Stat is fully functional today, and most of the XLISP-STAT libraries can be ported with the aid of a compatibility package XLS-compat. This gives Lisp-Stat a leg up on ecosystem development.
common-lisp statistical-analysis
Thermodynamic Analytics Toolkit is a sampling-based approach to understand the effectiveness of neural network training and investigate the loss manifolds of neural networks. It uses TensorFlow (https://www.tensorflow.org/) as its neural network framework and implements advanced sampling algorithms on top of it. It contains both a rapid prototyping platform for new sampling methods and an analysis framework to understand the intricacies of the loss manifold in terms of averages, covariance, diffusion maps, and free energy.
statistical-analysis thermodynamics neural-networks sampling diffusion-maps langevin-dynamics hamiltonian-dynamics