Displaying 1 to 20 from 124 results

Hypertable - A high performance, scalable, distributed storage and processing system for structured


Hypertable is based on Google's Bigtable Design, which is a proven scalable design that powers hundreds of Google services. Many of the current scalable NoSQL database offerings are based on a hash table design which means that the data they manage is not kept physically ordered. Hypertable keeps data physically sorted by a primary key and it is well suited for Analytics.

Solr - Blazing-fast, open source enterprise search platform


Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

stats - A statistics package with common functions that are missing from the Golang standard library


A statistics package with many functions missing from the Golang standard library. See the CHANGELOG.md for API changes and tagged releases you can vendor into your projects.Protip: go get -u github.com/montanaflynn/stats updates stats to the latest version.




Pinot - A realtime distributed OLAP datastore


Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.

ElasticSearch - Distributed, RESTful search and analytics engine


Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

Kibana - Analytics and search dashboard for Elasticsearch


Kibana provides flexible analytics and visualization platform for Elasticsearch. It understands large volume of data and easily create bar charts, line and scatter plots, histograms, pie charts, and maps. It can provide real-time summary and charting of streaming data. Kibana is a snap to setup and start using. Kibana strives to be easy to get started with, while also being flexible and powerful, just like Elasticsearch.



Autotrack - Automatic and enhanced Google Analytics tracking for common user interactions on the web


The default JavaScript tracking snippet for Google Analytics runs when a web page is first loaded and sends a pageview hit to Google Analytics. If you want to know about more than just pageviews (e.g. where the user clicked, how far they scroll, did they see certain elements, etc.), you have to write code to capture that information yourself.Since most website owners care about a lot of the same types of user interactions, web developers end up writing the same code over and over again for every new site they build.

disc - :chart_with_upwards_trend: Visualise the module tree of browserify project bundles and track down bloat


Disc is a tool for analyzing the module tree of browserify project bundles. It's especially handy for catching large and/or duplicate modules which might be either bloating up your bundle or slowing down the build process.The demo included on disc's github page is the end result of running the tool on browserify's own code base.

dazzle - :rocket: Dashboards made easy in React JS.


Dazzle is a library for building dashboards with React JS. Dazzle does not depend on any front-end libraries but it makes it easier to integrate with them.Dazzle's goal is to be flexible and simple. Even though there are some UI components readily available out of the box, you have the complete control to override them as you wish with your own styles and layout.

Timescaledb - An open-source time-series database optimized for fast ingest and complex queries


TimescaleDB is an open-source database designed to make SQL scalable for time-series data. It is engineered up from PostgreSQL, providing automatic partitioning across time and space (partitioning key), as well as full SQL support. TimescaleDB is packaged as a PostgreSQL extension and released under the Apache 2 open-source license.

pachyderm - Reproducible Data Science at Scale!


Pachyderm is a tool for production data pipelines. If you need to chain together data scraping, ingestion, cleaning, munging, wrangling, processing, modeling, and analysis in a sane way, then Pachyderm is for you. If you have an existing set of scripts which do this in an ad-hoc fashion and you're looking for a way to "productionize" them, Pachyderm can make this easy for you. Install Pachyderm locally or deploy on AWS/GCE/Azure in about 5 minutes.

Spark - Fast Cluster Computing


Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.

Presto - Distributed SQL query engine for big data


Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. It allows querying data from relational / nosql databases. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization. It is developed by Facebook.

Zeppelin - Multi-purpose Notebook


A web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.

Druid IO - Real Time Exploratory Analytics on Large Datasets


Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations. Druid can load both streaming and batch data.

Jupyter - Web-based notebook environment for interactive computing


The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more. It supports over 40 programming languages.

Countly - Web and Mobile Analytics


Countly is an innovative, real-time, open source mobile & web analytics, push notifications and crash reporting platform powering nearly 3000 mobile applications. It collects data from mobile phones, tablets, Apple Watch and other internet-connected devices, and visualizes this information to analyze mobile application usage and end-user behavior.