Displaying 1 to 20 from 40 results

Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth

  •    Python

Prophet is a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily periodicity data with at least one year of historical data. Prophet is robust to missing data, shifts in the trend, and large outliers.Prophet is open source software released by Facebook's Core Data Science team. It is available for download on CRAN and PyPI.

knowledge-repo - A next-generation curated knowledge sharing platform for data scientists and other technical professions

  •    Python

The Knowledge Repository project is focused on facilitating the sharing of knowledge between data scientists and other technical roles using data formats and tools that make sense in these professions. It provides various data stores (and utilities to manage them) for "knowledge posts", with a particular focus on notebooks (R Markdown and Jupyter / IPython Notebook) to better promote reproducible research.Check out this Medium Post for the inspiration for the project.

xarray - N-D labeled arrays and datasets in Python

  •    Python

xarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures. Our goal is to provide a pandas-like and pandas-compatible toolkit for analytics on multi-dimensional arrays, rather than the tabular data for which pandas excels. Our approach adopts the Common Data Model for self- describing scientific data in widespread use in the Earth sciences: xarray.Dataset is an in-memory representation of a netCDF file.




pachyderm - Reproducible Data Science at Scale!

  •    Go

Pachyderm is a tool for production data pipelines. If you need to chain together data scraping, ingestion, cleaning, munging, wrangling, processing, modeling, and analysis in a sane way, then Pachyderm is for you. If you have an existing set of scripts which do this in an ad-hoc fashion and you're looking for a way to "productionize" them, Pachyderm can make this easy for you. Install Pachyderm locally or deploy on AWS/GCE/Azure in about 5 minutes.

plotnine - A grammar of graphics for Python

  •    Python

plotnine is an implementation of a grammar of graphics in Python, it is based on ggplot2. The grammar allows users to compose plots by explicitly mapping data to the visual objects that make up the plot. Plotting with a grammar is powerful, it makes custom (and otherwise complex) plots easy to think about and then create, while the simple plots remain simple.

Scikit Learn - Machine Learning in Python

  •    Python

scikit-learn is a Python module for machine learning built on top of SciPy. It is simple and efficient tools for data mining and data analysis. It supports automatic classification, clustering, model selection, pre processing and lot more.

mlcourse_open - OpenDataScience Machine Learning course. Both in English and Russian

  •    Python

This is the list of published articles on medium.com 🇬🇧, habr.com 🇷🇺, and jqr.com 🇨🇳. Icons are clickable. Also, links to Kaggle Kernels (in English) are given. This way one can reproduce everything without installing a single package. Assignments will be announced each week. Meanwhile, you can pratice with demo versions. Solutions will be discussed in the upcoming run of the course.


pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data

  •    Python

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal. Binary installers for the latest released version are available at the Python package index and on conda.

Pilosa - Distributed bitmap index that dramatically accelerates queries across multiple, massive data sets

  •    Go

Pilosa is an open source, distributed bitmap index that dramatically accelerates continuous analysis across multiple, massive data sets. Pilosa collapses the speed and batch layer by abstracting the index from data storage, optimizing it for massive scale, and making your data instantly queryable and continuously analyzable.

gonum - Gonum is a set of numeric libraries for the Go programming language

  •    Go

The core packages of the gonum suite are written in pure Go with some assembly. Installation is done using go get.The gonum packages use a variety of build tags to set non-standard build conditions. Building gonum applications will work without knowing how to use these tags, but they can be used during testing and to control the use of assembly and CGO code.

Mesa is a agent-based modeling framework in Python

  •    Python

Mesa is an Apache2 licensed agent-based modeling (or ABM) framework in Python.It allows users to quickly create agent-based models using built-in core components (such as spatial grids and agent schedulers) or customized implementations; visualize them using a browser-based interface; and analyze their results using Python's data analysis tools. Its goal is to be the Python 3-based alternative to NetLogo, Repast, or MASON.

GoAccess - Real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser

  •    C

GoAccess is an open source real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through your browser. It provides fast and valuable HTTP statistics for system administrators that require a visual server report on the fly. It supports nearly all web log formats (Apache, Nginx, Amazon S3, Elastic Load Balancing, CloudFront, etc)

Pandas - Powerful Python Data Analysis Toolkit

  •    Python

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions. It supports aggregating or transforming data with a powerful group by engine allowing split-apply-combine operations on data sets, High performance merging and joining of data sets, Time series-functionality, Hierarchical axis indexing and lot more.

data-science-with-ruby - Practical Data Science with Ruby based tools.

  •    Ruby

Data Science is a new "sexy" buzzword without specific meaning but often used to substitute Statistics, Scientific Computing, Text and Data Mining and Visualization, Machine Learning, Data Processing and Warehousing as well as Retrieval Algorithms of any kind. This curated list comprises awesome tutorials, libraries, information sources about various Data Science applications using the Ruby programming language.

pycm - Multi-class confusion matrix library in Python

  •    Python

PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix, and a proper tool for post-classification model evaluation that supports most classes and overall statistics parameters. PyCM is the swiss-army knife of confusion matrices, targeted mainly at data scientists that need a broad array of metrics for predictive models and an accurate evaluation of large variety of classifiers. threshold is added in version 0.9 for real value prediction.

rumale - Rumale is a machine learning library in Ruby

  •    Ruby

Rumale (Ruby machine learning) is a machine learning library in Ruby. Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python. Rumale supports Linear / Kernel Support Vector Machine, Logistic Regression, Linear Regression, Ridge, Lasso, Kernel Ridge, Factorization Machine, Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor classifier, K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering, Mutidimensional Scaling, t-SNE, Principal Component Analysis, Kernel PCA and Non-negative Matrix Factorization. This project was formerly known as "SVMKit". If you are using SVMKit, please install Rumale and replace SVMKit constants with Rumale.

python_moztelemetry - Spark bindings for Mozilla Telemetry

  •    Python

After having your PR reviewed and merged create a new release on github. A new pypi release will be automatically triggered by Travis.

ospi - Open Source Presence Infographic of Indian Startups

  •    Python

Technically speaking, organizations used in this report are no more only a startup now, but I hope you people won't mind this and aren't gonna launch a drone on me.I think, something is clear from the name itself, is it? Well! it should.





We have large collection of open source products. Follow the tags from Tag Cloud >>


Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.