Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science. Build an app in a few lines of code with a magically simple API. Adding a widget is the same as declaring a variable. There is no need to write a backend or define routes. Effortlessly share, manage, and collaborate on your apps directly from Streamlit. Apps are deployed directly from a GitHub repository.

Streamlit - The fastest way to build data apps in Python

Prophet is a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with data of daily periodicity and at least one year of history. Prophet is robust to missing data, shifts in the trend, and large outliers. Prophet is open source software released by Facebook's Core Data Science team. It is available for download on CRAN and PyPI.
 
Prophet is used in many applications across Facebook for producing reliable forecasts for planning and goal setting. The Prophet procedure includes many possibilities for users to tweak and adjust forecasts. You can use human-interpretable parameters to improve your forecast by adding your domain knowledge.
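The additive model described above can be illustrated with a small standard-library sketch. This is not Prophet's implementation or API: the functions `trend`, `weekly_seasonality`, and `forecast`, the changepoint at day 10, and the holiday effect are all made-up assumptions chosen only to show how the components add up.

```python
import math
from datetime import date, timedelta


def trend(day_index: int) -> float:
    """Piecewise-linear trend with one changepoint (illustrative values)."""
    if day_index < 10:
        return 100.0 + 1.0 * day_index
    return 110.0 + 0.5 * (day_index - 10)


def weekly_seasonality(day: date) -> float:
    """Smooth weekly cycle built from a single sine term."""
    return 5.0 * math.sin(2 * math.pi * day.weekday() / 7)


# Made-up holiday effect: demand drops on New Year's Day.
HOLIDAYS = {date(2024, 1, 1): -20.0}


def forecast(start: date, day_index: int) -> float:
    """Additive model: trend + seasonality + holiday effect."""
    day = start + timedelta(days=day_index)
    return trend(day_index) + weekly_seasonality(day) + HOLIDAYS.get(day, 0.0)
```

Because the components are simply summed, each one can be inspected or adjusted on its own, which is what makes the parameters of such a model human-interpretable.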

Prophet - Tool for producing high-quality forecasts for time series data that has multiple seasonalities with linear or non-linear growth

scikit-learn is a Python module for machine learning built on top of SciPy. It provides simple and efficient tools for data mining and data analysis. It supports classification, clustering, model selection, preprocessing, and much more.

scikit-learn - Machine Learning in Python

Pilosa is an open source, distributed bitmap index that dramatically accelerates continuous analysis across multiple, massive data sets. Pilosa collapses the speed and batch layers by abstracting the index from data storage, optimizing it for massive scale, and making your data instantly queryable and continuously analyzable.
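To make the idea of a bitmap index concrete, here is a minimal single-process sketch in Python. It is not Pilosa's API or storage format: the class `BitmapIndex` and its methods are hypothetical, and plain Python integers stand in for the bitmaps. Each (field, value) pair maps to one bitmap whose set bits mark the matching record IDs, so combining conditions reduces to bitwise AND and OR.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index: one integer bitmap per (field, value) pair.

    Hypothetical illustration only; not Pilosa's API.
    """

    def __init__(self):
        # Maps (field, value) -> int whose bit i is set if record i matches.
        self._bitmaps = defaultdict(int)

    def set(self, field, value, record_id):
        """Mark record_id as having the given value for the field."""
        self._bitmaps[(field, value)] |= 1 << record_id

    def bitmap(self, field, value):
        """Return the bitmap for (field, value), or 0 if nothing matches."""
        return self._bitmaps.get((field, value), 0)

    @staticmethod
    def ids(bitmap):
        """Expand a bitmap into a sorted list of record IDs."""
        result = []
        position = 0
        while bitmap:
            if bitmap & 1:
                result.append(position)
            bitmap >>= 1
            position += 1
        return result


index = BitmapIndex()
index.set("country", "DE", 0)
index.set("country", "DE", 2)
index.set("country", "US", 1)
index.set("plan", "pro", 1)
index.set("plan", "pro", 2)

# Records in country DE AND on plan pro: a single bitwise AND.
german_pro = index.bitmap("country", "DE") & index.bitmap("plan", "pro")
# Records in either country: a bitwise OR.
any_country = index.bitmap("country", "DE") | index.bitmap("country", "US")
```

The index holds only record IDs, not the records themselves, which mirrors the separation of index and data storage mentioned above.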

Pilosa - Distributed bitmap index that dramatically accelerates queries across multiple, massive data sets

Metabase is the easy, open source way for everyone in your company to ask questions and learn from data. Get a real-time glimpse into what your company is learning about your data. Activity helps people in your company find an answer, jump-start their own exploration, or improve existing questions.

Metabase - The simplest, fastest way to get business intelligence and analytics to everyone in your company

OpenRefine (previously Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.



OpenRefine - Powerful tool for working with messy data

CyberChef is a simple, intuitive web app for carrying out all manner of "cyber" operations within a web browser. These operations include simple encoding like XOR or Base64, more complex encryption like AES, DES and Blowfish, creating binary and hexdumps, compression and decompression of data, calculating hashes and checksums, IPv6 and X.509 parsing, changing character encodings, and much more.
 
The tool is designed to enable both technical and non-technical analysts to manipulate data in complex ways without having to deal with complex tools or algorithms. It was conceived, designed, built and incrementally improved by an analyst in their 10% innovation time over several years.
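A few of the operations listed above can be approximated with Python's standard library. This sketch is not CyberChef code; the helper `xor_bytes` and the sample key are made up for illustration. It chains a repeating-key XOR, Base64 encoding, and a SHA-256 hash, loosely in the spirit of chaining operations on one input.

```python
import base64
import hashlib
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with a repeating key (hypothetical helper)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


message = b"hello"
key = b"k"  # illustrative key, not a secure choice

# Forward direction: XOR, then Base64-encode the result.
xored = xor_bytes(message, key)
encoded = base64.b64encode(xored).decode("ascii")

# Hash of the original message as a hex string.
digest = hashlib.sha256(message).hexdigest()

# Reverse direction: Base64-decode, then XOR with the same key.
decoded = xor_bytes(base64.b64decode(encoded), key)
```

XOR with the same key is its own inverse, which is why the reverse direction recovers the original message.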

CyberChef - The Cyber Swiss Army Knife for Developers (DevToolbox)

Ephyra is a modular and extensible framework for open domain question answering (QA). The system retrieves accurate answers to natural language questions from the Web and other sources. The goal is to give researchers the opportunity to develop new QA techniques without worrying about the end-to-end system.

Ephyra - Question Answering System

Cascalog is a fully-featured data processing and querying library for Clojure or Java. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools.

Cascalog - Data processing on Hadoop

GoldenOrb is a cloud-based project for massive-scale graph analysis, built upon Apache Hadoop and modeled after Google's Pregel architecture. It provides solutions to complex data problems, removes limits to innovation, and contributes to the emerging ecosystem that spans all aspects of big data analysis. It enables users to run analytics on entire data sets instead of samples.
Source code: https://github.com/raveldata/goldenorb

GoldenOrb - Scalable Graph Analysis

pandas is a flexible and powerful data analysis and manipulation library for Python, providing labeled data structures similar to R data.frame objects, along with statistical functions. It supports aggregating or transforming data with a powerful group-by engine that allows split-apply-combine operations on data sets, high-performance merging and joining of data sets, time series functionality, hierarchical axis indexing, and much more.
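The split-apply-combine pattern mentioned above can be shown in plain Python. This sketch deliberately uses only the standard library rather than the pandas API, and the sample records are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records: (city, temperature in degrees Celsius).
rows = [
    ("Oslo", 4.0),
    ("Lima", 19.0),
    ("Oslo", 6.0),
    ("Lima", 21.0),
]

# Split: bucket the values by their group key.
groups = defaultdict(list)
for city, temperature in rows:
    groups[city].append(temperature)

# Apply: reduce each group independently with mean().
# Combine: collect the per-group results into one mapping.
mean_by_city = {city: mean(values) for city, values in groups.items()}
```

In pandas the same pattern is usually written with `DataFrame.groupby`, for example `df.groupby("city")["temperature"].mean()` for a DataFrame `df` with those two columns.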

Pandas - Powerful Python Data Analysis Toolkit

MLlib is a Spark implementation of some common machine learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, and much more.

MLlib - Apache Spark's scalable machine learning library

Mesa is an Apache 2.0 licensed agent-based modeling (ABM) framework in Python. It allows users to quickly create agent-based models using built-in core components (such as spatial grids and agent schedulers) or customized implementations; visualize them using a browser-based interface; and analyze their results using Python's data analysis tools. Its goal is to be the Python 3-based alternative to NetLogo, Repast, or MASON.

Mesa - Agent-based modeling framework in Python

GoAccess is an open source real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through your browser. It provides fast and valuable HTTP statistics for system administrators who require a visual server report on the fly. It supports nearly all web log formats (Apache, Nginx, Amazon S3, Elastic Load Balancing, CloudFront, etc.).
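The kind of HTTP statistics a log analyzer computes can be sketched in a few lines of Python. This is not GoAccess code: the regular expression is a simplified pattern for the Common Log Format, and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Simplified pattern for the Common Log Format (an assumption, not exhaustive).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Made-up sample lines for illustration.
sample_log = """\
203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
203.0.113.9 - - [10/Oct/2023:13:55:40 +0000] "GET /missing HTTP/1.1" 404 153
203.0.113.5 - - [10/Oct/2023:13:56:01 +0000] "POST /login HTTP/1.1" 200 512
"""

status_counts = Counter()
hits_per_host = Counter()
for line in sample_log.splitlines():
    match = LOG_PATTERN.match(line)
    if match is None:
        continue  # skip lines that do not fit the expected format
    status_counts[match["status"]] += 1
    hits_per_host[match["host"]] += 1
```

A real-time analyzer would update counters like these incrementally as new lines are appended to the log, rather than reading a fixed string.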

GoAccess - Real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through your browser

Kapacitor is an open source framework for processing, monitoring, and alerting on time series data. Kapacitor imports (stream or batch) time series data, and then transforms, analyzes, and acts on the data. It uses Telegraf to collect system metrics on your local machine and store them in InfluxDB.
 
Kapacitor’s alerting system follows a publish-subscribe design pattern. Alerts are published to topics, and handlers subscribe to a topic. This pub/sub model and the ability for these to call User Defined Functions make Kapacitor flexible enough to act as the control plane in your environment, performing tasks like auto-scaling, stock reordering, and IoT device control.
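The publish-subscribe pattern described above can be sketched as a small in-memory hub. This is not Kapacitor's API or configuration syntax: the class `AlertBus`, its methods, and the topic names are hypothetical and only illustrate how topics decouple publishers from handlers.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str], None]


class AlertBus:
    """Toy publish-subscribe hub (hypothetical; not Kapacitor's API)."""

    def __init__(self) -> None:
        # Maps topic name -> handlers subscribed to that topic.
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every alert on the topic."""
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver message to every handler on topic; return the count."""
        handlers = self._handlers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = AlertBus()
received = []
bus.subscribe("cpu", received.append)
bus.subscribe("cpu", lambda msg: received.append(msg.upper()))

delivered = bus.publish("cpu", "usage above 90%")
ignored = bus.publish("disk", "no subscribers here")
```

The publisher never references a handler directly, so new reactions (for example a scaling action) can be attached to a topic without changing the code that raises the alert.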

Kapacitor - Open source framework for processing, monitoring, and alerting on time series data
