Apache Spark is an open-source cluster computing system that aims to make data analytics fast, both to run and to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.

Spark - Fast Cluster Computing

Zeppelin is a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.

Zeppelin - Multi-purpose Notebook

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more. It supports over 40 programming languages. 
 
Code can produce rich output such as images, videos, LaTeX, and JavaScript. Interactive widgets can be used to manipulate and visualize data in real time.

Jupyter - Web-based notebook environment for interactive computing

Lightdash is an open-source BI tool for your whole team and an open-source alternative to Looker. It helps everyone in your company answer their own questions using data. Connect Lightdash to your dbt project, add metrics directly in your data transformation layer, then create and share your insights with your team.

Lightdash - Open source BI for your whole team

TrailDB is a library, implemented in C, which allows you to query series of events at blazing speed. TrailDB is also optimized for speed of development: Use its simple API with your favorite language, in your favorite environment. TrailDB's secret sauce is data compression. It leverages predictability of time-based data to compress your data to a fraction of its original size. In contrast to traditional compression, you can query the encoded data directly, decompressing only the parts you need. 
 
Since 2014, AdRoll has used TrailDB to store and query tens of trillions of events originating from the web.
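To make the idea of exploiting predictable, time-based data concrete, here is a minimal Python sketch of delta encoding, a generic technique in which each timestamp is stored as the gap from the previous one. This is an illustration only and not TrailDB's actual format or API; the function names `delta_encode` and `delta_decode` and the sample timestamps are hypothetical.

```python
from typing import List


def delta_encode(timestamps: List[int]) -> List[int]:
    """Keep the first timestamp, then only the gap to the previous one."""
    if not timestamps:
        return []
    deltas = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas: List[int], upto: int) -> List[int]:
    """Rebuild only the first `upto` timestamps, leaving the rest encoded."""
    out = []
    total = 0
    for d in deltas[:upto]:
        total += d
        out.append(total)
    return out


# Hypothetical event times (seconds since the epoch), sorted ascending.
events = [1400000000, 1400000003, 1400000004, 1400000010]
encoded = delta_encode(events)           # small gaps instead of large values
first_two = delta_decode(encoded, 2)     # decode only the part that is needed
```

Because consecutive events tend to be close together in time, the gaps are small numbers that take far fewer bits to store than full timestamps, and a reader can decode a prefix without touching the rest.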

TrailDB - Efficient tool for storing and querying series of events

KNIME, pronounced [naim], is a modern data analytics platform that allows you to perform sophisticated statistics and data mining on your data to analyze trends and predict potential results. Its visual workbench combines data access, data transformation, initial investigation, powerful predictive analytics and visualization. KNIME also provides the ability to develop reports based on your information or automate the application of new insight back into production systems.

KNIME - Data Analytics Platform

Lens provides a unified analytics interface. It aims to break down data analytics silos by providing a single view of data across multiple tiered data stores and an optimal execution environment for each analytical query. A simple metadata layer gives an abstract view over the tiered data stores.

Lens - Unified Analytics interface

Dremio is a self-service data platform that empowers users to discover, curate, accelerate, and share any data at any time, regardless of location, volume, or structure. Modern data is managed by a wide range of technologies, including relational databases, NoSQL datastores, file systems, Hadoop, and others. Many of the newer datastores are more agile and scale better, but at a cost to speed and ease of access via traditional SQL-based analysis tools. Additionally, raw data found in these stores is often too complex or inconsistent for analysts to use with business intelligence tools.
 
ETL pipelines that load a subset of data into relational databases provide one answer, but aside from the burden these solutions impose on data engineers and IT staff, data becomes stale by the time it is available to analysts and data scientists.

Dremio - The missing link in modern data

Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts. It integrates your data using either the simple no-code viz builder or the state-of-the-art SQL IDE. Superset can query data from any SQL-speaking datastore or data engine (e.g. Presto or Athena) that has a Python DB-API driver and a SQLAlchemy dialect.
 
Superset provides: 
<ul>
<li>An intuitive interface for visualizing datasets and crafting interactive dashboards</li>
<li>A wide array of beautiful visualizations to showcase your data</li>
<li>Code-free visualization builder to extract and present datasets</li>
<li>A world-class SQL IDE for preparing data for visualization, including a rich metadata browser</li>
<li>A lightweight semantic layer which empowers data analysts to quickly define custom dimensions and metrics</li>
<li>Out-of-the-box support for most SQL-speaking databases</li>
<li>Seamless, in-memory asynchronous caching and queries</li>
<li>An extensible security model that allows configuration of very intricate rules on who can access which product features and datasets</li>
<li>Integration with major authentication backends (database, OpenID, LDAP, OAuth, REMOTE_USER, etc.)</li>
<li>The ability to add custom visualization plugins</li>
<li>An API for programmatic customization</li>
<li>A cloud-native architecture designed from the ground up for scale</li>
</ul>
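The Python DB-API requirement mentioned above refers to the standard cursor-based interface that Python database drivers expose. As a rough illustration of that interface, the sketch below runs an aggregate query through the standard library's `sqlite3` driver. It is not Superset code, and Superset itself connects through SQLAlchemy rather than calling a driver directly; the `sales` table and its columns are hypothetical.

```python
import sqlite3

# sqlite3 is a DB-API driver shipped with Python; ":memory:" keeps the
# example self-contained. The table and its contents are made up.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)
conn.commit()

# The kind of aggregate query a dashboard chart might issue.
cur.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
)
rows = cur.fetchall()
conn.close()

for region, total in rows:
    print(region, total)
```

Any driver that follows the same connect, cursor, execute, fetch pattern can in principle sit behind a SQLAlchemy dialect, which is what makes a datastore reachable from a tool built on that stack.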

Apache Superset - Data Visualization and Data Exploration Platform
