StatsBase.jl - Basic statistics for Julia


StatsBase.jl is a Julia package that provides basic support for statistics. Particularly, it implements a variety of statistics-related functions, such as scalar statistics, high-order moment computation, counting, ranking, covariances, sampling, and empirical density estimation.
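To make a few of the listed categories concrete (counting, ranking, empirical distribution estimation, and weighted sampling), here is a minimal sketch in Python using only the standard library. It is not StatsBase's API: the function names `count_values`, `tied_rank`, `empirical_cdf`, and `weighted_sample` are hypothetical and chosen only for illustration, and the tie-handling rule (average rank) is an assumption about what "ranking" covers.

```python
import random
from bisect import bisect_right
from collections import Counter


def count_values(xs):
    """Count occurrences of each distinct value (hypothetical helper)."""
    return dict(Counter(xs))


def tied_rank(xs):
    """Rank values from 1, giving tied values the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of equal values starting at position i.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # positions are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def empirical_cdf(xs):
    """Return a function F where F(t) is the fraction of observations <= t."""
    data = sorted(xs)
    n = len(data)
    return lambda t: bisect_right(data, t) / n


def weighted_sample(items, weights, k, seed=None):
    """Draw k items with replacement, with probability proportional to weights."""
    rng = random.Random(seed)
    return rng.choices(items, weights=weights, k=k)
```

For example, `tied_rank([10, 20, 20, 30])` assigns the two tied values the shared rank 2.5, and `empirical_cdf([1, 2, 2, 3])(2)` evaluates to 0.75 because three of the four observations are at most 2.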

https://github.com/JuliaStats/StatsBase.jl

Related Projects

RLSeq2Seq - Deep Reinforcement Learning For Sequence to Sequence Models

  •    Python

NOTE: THE CODE IS UNDER DEVELOPMENT, PLEASE ALWAYS PULL THE LATEST VERSION FROM HERE. In recent years, sequence-to-sequence (seq2seq) models have been used in a variety of tasks, from machine translation, headline generation, text summarization, and speech-to-text to image caption generation. The underlying framework of all these models is usually a deep neural network that contains an encoder and a decoder. The encoder processes the input data, and the decoder receives the output of the encoder and generates the final output. Although simply using an encoder/decoder model would, most of the time, produce better results than traditional methods on the above-mentioned tasks, researchers have proposed additional improvements over these sequence-to-sequence models, such as attention-based models over the input, pointer-generation models, and self-attention models. However, all these seq2seq models suffer from two common problems: 1) exposure bias and 2) inconsistency between train/test measurement. Recently, a completely fresh point of view emerged for solving these two problems in seq2seq models by using methods from Reinforcement Learning (RL). In this line of research, we look at seq2seq problems from the RL point of view and try to come up with a formulation that combines the power of RL methods in decision-making with the ability of sequence-to-sequence models to remember long memories. In this paper, we summarize some of the most recent frameworks that combine concepts from the RL world with the deep neural network area and explain how these two areas could benefit from each other in solving complex seq2seq tasks. In the end, we provide insights on some of the problems of the current models and how we can improve them with better RL models. We also provide the source code for implementing most of the models discussed in this paper on the complex task of abstractive text summarization.

covid19model - Code for modelling estimated deaths and cases for COVID19.

  •    Stan

Code for modelling estimated deaths and infections for COVID-19 from "Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe", Flaxman, Mishra, Gandy et al, Nature, 2020, the published version of our original Report 13. If you are looking for the individual based model used in Imperial's Report 9, Ferguson, Laydon, Nedjati-Gilani et al, please look here.

Julia - Language for Technical Computing

  •    Julia

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. In a parallel reduction loop, for example, the computation is automatically distributed across all available compute nodes, and the result, reduced by summation (+), is returned at the calling node.

Julia.jl - Curated decibans of Julia language.

  •    Julia

Julia.jl aggregates and curates decibans of knowledge resources for programming in Julia, an all-purpose programming language that addresses the needs of high-performance numerical analysis and computational science. For Base packages, check if the package you seek is listed in the built-in package manager on github, or check METADATA for registered Julia packages, then use the built-in package manager to install it after checking the requirements for respective versions. Pkg3.jl is an alpha next-generation package manager for Julia that creates a Manifest.toml file that records the exact versions of each dependency and their transitive dependencies.

statistical-analysis-python-tutorial - Statistical Data Analysis in Python

  •    HTML

Chris Fonnesbeck is an Assistant Professor in the Department of Biostatistics at the Vanderbilt University School of Medicine. He specializes in computational statistics, Bayesian methods, meta-analysis, and applied decision analysis. He originally hails from Vancouver, BC and received his Ph.D. from the University of Georgia. This tutorial will introduce the use of Python for statistical data analysis, using data stored as Pandas DataFrame objects. Much of the work involved in analyzing data resides in importing, cleaning and transforming data in preparation for analysis. Therefore, the first half of the course is comprised of a 2-part overview of basic and intermediate Pandas usage that will show how to effectively manipulate datasets in memory. This includes tasks like indexing, alignment, join/merge methods, date/time types, and handling of missing data. Next, we will cover plotting and visualization using Pandas and Matplotlib, focusing on creating effective visual representations of your data, while avoiding common pitfalls. Finally, participants will be introduced to methods for statistical data modeling using some of the advanced functions in Numpy, Scipy and Pandas. This will include fitting your data to probability distributions, estimating relationships among variables using linear and non-linear models, and a brief introduction to bootstrapping methods. Each section of the tutorial will involve hands-on manipulation and analysis of sample datasets, to be provided to attendees in advance.


HDF5.jl - Saving and loading Julia variables

  •    Julia

Saving and loading Julia variables

probability - Probabilistic reasoning and statistical analysis in TensorFlow

  •    Jupyter

TensorFlow Probability is a library for probabilistic reasoning and statistical analysis in TensorFlow. As part of the TensorFlow ecosystem, TensorFlow Probability provides integration of probabilistic methods with deep networks, gradient-based inference via automatic differentiation, and scalability to large datasets and models via hardware acceleration (e.g., GPUs) and distributed computation. Our probabilistic machine learning tools are structured as follows.

pycm - Multi-class confusion matrix library in Python

  •    Python

PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix input. It is a tool for post-classification model evaluation that supports most class and overall statistics parameters. PyCM is the Swiss Army knife of confusion matrices, targeted mainly at data scientists who need a broad array of metrics for predictive models and an accurate evaluation of a large variety of classifiers. A threshold option was added in version 0.9 for real-value prediction.

learn-julia-the-hard-way - Learn Julia the hard way!

  •    Makefile

The Julia base package is pretty big, although at the same time, there are lots of other packages around to expand it with. The result is that on the whole, it is impossible to give a thorough overview of all that Julia can do in just a few brief exercises. Therefore, I had to adopt a little 'bias', or 'slant' if you please, in deciding what to focus on and what to ignore. Julia is a technical computing language, although it does have the capabilities of any general-purpose language and you'd be hard-pressed to find tasks it's completely unsuitable for (although that does not mean it's the best or easiest choice for any of them). Julia was developed with the occasional reference to R, and with an avowed intent to improve upon R's clunkiness. R is a great language, but relatively slow, to the point that most people use it for rapid prototyping, then implement the algorithm for production in Python or Java. Julia seeks to be as approachable as R but without the speed penalty.

urbansim - Platform for building statistical models of cities and regions

  •    Python

UrbanSim is a platform for building statistical models of cities and regions. These models help forecast long-range patterns in real estate development, demographics, and related outcomes, under various policy scenarios. This urbansim Python library is a core component. It contains tools for statistical estimation and simulation; domain-specific logic about housing markets, household relocation, and other processes; and frameworks and utilities for assembling a model.

Gadfly.jl - Crafty statistical graphics for Julia.

  •    Julia

Gadfly is a plotting and data visualization system written in Julia. It's influenced heavily by Leland Wilkinson's book The Grammar of Graphics and Hadley Wickham's refinement of that grammar in ggplot2.

NLP-Models-Tensorflow - Gathers machine learning and Tensorflow deep learning models for NLP problems

  •    Jupyter

NLP-Models-Tensorflow gathers machine learning and TensorFlow deep learning models for NLP problems, with all of the code simplified inside Jupyter Notebooks. I will attach GitHub repositories for the models that I did not implement from scratch; basically, I copied, pasted, and fixed that code for deprecation issues.

patsy - Describing statistical models in Python using symbolic formulas

  •    Python

Patsy is a Python library for describing statistical models (especially linear models, or models that have a linear component) and building design matrices. Patsy brings the convenience of R "formulas" to Python.

JProGraM

  •    Java

JProGraM (PRObabilistic GRAphical Models in Java) is a statistical machine learning library. It supports statistical modeling and data analysis along three main directions: (1) probabilistic graphical models (Bayesian networks, Markov random fields, dependency networks, hybrid random fields); (2) parametric, semiparametric, and nonparametric density estimation (Gaussian models, nonparanormal estimators, Parzen windows, Nadaraya-Watson estimator); (3) generative models for random networks.

R Language - Project for Statistical Computing

  •    C

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible.

transformers - 🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX

  •    Python

🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation and more in over 100 languages. Its aim is to make cutting-edge NLP easier to use for everyone. 🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our model hub. At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.

BreakoutDetection - Breakout Detection via Robust E-Statistics

  •    C++

BreakoutDetection is an open-source R package that makes breakout detection simple and fast. The BreakoutDetection package can be used in a wide variety of contexts: for example, detecting a breakout in user engagement after an A/B test, detecting behavioral change, or for problems in econometrics, financial engineering, and the political and social sciences. The underlying algorithm – referred to as E-Divisive with Medians (EDM) – employs energy statistics to detect divergence in mean. Note that EDM can also be used to detect a change in distribution in a given time series. EDM uses robust statistical metrics, viz., the median, and estimates the statistical significance of a breakout through a permutation test.

Oceananigans.jl - 🌊 Fast and friendly fluid dynamics on CPUs and GPUs

  •    Julia

We strive for a user interface that makes Oceananigans.jl as friendly and intuitive to use as possible, allowing users to focus on the science. Internally, we have attempted to write the underlying algorithm so that the code runs as fast as possible for the configuration chosen by the user --- from simple two-dimensional setups to complex three-dimensional simulations --- and so that as much code as possible is shared between the different architectures, models, and grids. Note: The latest version of Oceananigans requires at least Julia v1.6 to run. Installing Oceananigans with an older version of Julia will install an older version of Oceananigans (the latest version compatible with your version of Julia).

git-quick-stats - ▁▅▆▃▅ Git quick statistics is a simple and efficient way to access various statistics in git repository

  •    Shell

git quick-stats is a simple and efficient way to access various statistics in a git repository. Any git repository contains tons of information about commits, contributors, and files. Extracting this information is not always trivial, mostly because of a gadzillion options to a gadzillion git commands – I don’t think there is a single person alive who knows them all. Probably not even Linus Torvalds himself :).





