MapD - The MapD Core database

  •        210

MapD Core is an in-memory, column store, SQL relational database that was designed from the ground up to run on GPUs. MapD Core is the foundational element of a larger data exploration platform that emphasizes speed at scale. By taking advantage of the parallel processing power of the hardware, MapD Core can query billions of rows in milliseconds. Furthermore, by using the graphics pipelines of GPUs, MapD Core can render graphics directly from the server.

MapD can operate as a standalone database using the command line or its administration console. MapD can also output to its native interface, MapD Immerse, or to third-party software such as Tableau, Qlik, or Birst, using a variety of connectors.

Within a single server, MapD Core can handle datasets 1.5-3TB in raw size in GPU RAM, or 10-15TB in CPU RAM. In distributed deployments, that number scales linearly.

https://www.mapd.com
https://github.com/mapd/mapd-core

Tags
Implementation
License
Platform

   




Related Projects

mapd-charting - Dimensional charting built to work natively with crossfilter rendered using d3.js

  •    Javascript

Dimensional charting built to work natively with crossfilter rendered using d3.js. MapD-Charting is a superfast charting library that works natively with crossfilter that is based off dc.js. It is designed to work with MapD-Connector and MapD-Crossfilter to create charts instantly with our MapD-Core SQL Database. Please see examples for further understanding to quickly create interactive charts.

Pinot - A realtime distributed OLAP datastore

  •    Java

Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.

pygdf - GPU Data Frame

  •    Jupyter

PyGDF implements the Python interface to access and manipulate the GPU Dataframe of GPU Open Analytics Initialive (GOAI). We aim to provide a simple interface that similar to the Pandas dataframe and hide the details of GPU programming.

EventQL - The database for large-scale event analytics

  •    C++

EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs super-fast SQL and MapReduce queries. Its features include Automatic partitioning, Columnar storage, Standard SQL support, Scales to petabytes, Timeseries and relational data, Fast range scans and lot more.

InfiniDB - Scale-up analytics database engine for data warehousing and business intelligence

  •    C++

InfiniDB Community Edition is a scale-up, column-oriented database for data warehousing, analytics, business intelligence and read-intensive applications. InfiniDB's data warehouse columnar engine is multi-terabyte capable and accessed via MySQL.


ClickHouse - Columnar DBMS and Real Time Analytics

  •    C++

ClickHouse is an open source column-oriented database management system capable of real time generation of analytical data reports using SQL queries. It is Linearly Scalable, Blazing Fast, Highly Reliable, Fault Tolerant, Data compression, Real time query processing, Web analytics, Vectorized query execution, Local and distributed joins. It can process hundreds of millions to more than a billion rows and tens of gigabytes of data per single server per second.

FiloDB - Distributed. Columnar. Versioned. Streaming. SQL.

  •    Scala

High-performance distributed analytical database + Spark SQL queries + built for streaming. Columnar, versioned layers of data wrapped in a yummy high-performance analytical database engine.

MonetDB

  •    C

MonetDB is a high-performance SQL- and XQuery- column-store database management system with automatic index management, flexible optimizer infrastructure, and programmable backend functionality.

Infobright - The Database for Analytics

  •    C++

Infobright combines a columnar database with our Knowledge Grid architecture to deliver a self-managing, self-tuning database optimized for analytics. Infobright eliminates the need to create indexes, partition data, or do any manual tuning to achieve fast response for queries and reports.

palladium - Framework for setting up predictive analytics services

  •    Python

Palladium provides means to easily set up predictive analytics services as web services. It is a pluggable framework for developing real-world machine learning solutions. It provides generic implementations for things commonly needed in machine learning, such as dataset loading, model training with parameter search, a web service, and persistence capabilities, allowing you to concentrate on the core task of developing an accurate machine learning model. Having a well-tested core framework that is used for a number of different services can lead to a reduction of costs during development and maintenance due to harmonization of different services being based on the same code base and identical processes. Palladium has a web service overhead of a few milliseconds only, making it possible to set up services with low response times. A configuration file lets you conveniently tie together existing components with components that you developed. As an example, if what you want to do is to develop a model where you load a dataset from a CSV file or an SQL database, and train an SVM classifier to predict one of the rows in the data given the others, and then find out about your model's accuracy, then that's what Palladium allows you to do without writing a single line of code. However, it is also possible to independently integrate own solutions.

Hypertable - A high performance, scalable, distributed storage and processing system for structured

  •    C++

Hypertable is based on Google's Bigtable Design, which is a proven scalable design that powers hundreds of Google services. Many of the current scalable NoSQL database offerings are based on a hash table design which means that the data they manage is not kept physically ordered. Hypertable keeps data physically sorted by a primary key and it is well suited for Analytics.

cstore_fdw - Columnar store for analytics with Postgres, developed by Citus Data

  •    C

Cstore_fdw is an open source columnar store extension for PostgreSQL. Columnar stores provide notable benefits for analytics use cases where data is loaded in batches. Cstore_fdw’s columnar nature delivers performance by only reading relevant data from disk, and it may compress data 6x-10x to reduce space requirements for data archival. Cstore_fdw is developed by Citus Data and can be used in combination with Citus, a postgres extension that intelligently distributes your data and queries across many nodes so your database can scale and your queries are fast. If you have any questions about how Citus can help you scale or how to use Citus in combination with cstore_fdw, please let us know.

snappydata - SnappyData - The Spark Database. Stream, Transact, Analyze, Predict in one cluster

  •    Scala

Apache Spark is a general purpose parallel computational engine for analytics at scale. At its core, it has a batch design center and is capable of working with disparate data sources. While this provides rich unified access to data, this can also be quite inefficient and expensive. Analytic processing requires massive data sets to be repeatedly copied and data to be reformatted to suit Spark. In many cases, it ultimately fails to deliver the promise of interactive analytic performance. For instance, each time an aggregation is run on a large Cassandra table, it necessitates streaming the entire table into Spark to do the aggregation. Caching within Spark is immutable and results in stale insight. At SnappyData, we take a very different approach. SnappyData fuses a low latency, highly available in-memory transactional database (GemFireXD) into Spark with shared memory management and optimizations. Data in the highly available in-memory store is laid out using the same columnar format as Spark (Tungsten). All query engine operators are significantly more optimized through better vectorization and code generation. The net effect is, an order of magnitude performance improvement when compared to native Spark caching, and more than two orders of magnitude better Spark performance when working with external data sources.

Druid IO - Real Time Exploratory Analytics on Large Datasets

  •    Java

Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations. Druid can load both streaming and batch data.

Coolstorage - ORM library for .NET

  •    CSharp

The main strength of Vici CoolStorage is the ease of use. Most ORM tools still require a lot of unneeded code to accomplish basic data persistence tasks, but Vici CoolStorage is designed to relieve the programmer from these tedious and error-prone tasks, making it very intuitive to use.

Apache Arrow - Powering Columnar In-Memory Analytics

  •    Java

Apache arrow is an open source, low latency SQL query engine for Hadoop and NoSQL. Arrow enables execution engines to take advantage of the latest SIMD (Single input multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout of data also allows for a better use of CPU caches by placing all data relevant to a column operation in as compact of a format as possible.

HBase - Hadoop database

  •    Java

HBase provides support to handle BigTable - billions of rows X millions of columns. It is a scalable, distributed, versioned, column-oriented store modeled after Google's Bigtable and runs on top of HDFS (Hadoop Distributed Filesystem). It features compression, in-memory operation per-column. Data could be replicated between the nodes. HBase is used in Facebook and Twitter.

Metabase - The simplest, fastest way to get business intelligence and analytics to everyone in your company

  •    Clojure

Metabase is the easy, open source way for everyone in your company to ask questions and learn from data. Get a real-time glimpse into what your company is learning about your data. Activity helps people in your company find an answer, jump start their own exploration, or improve existing questions.

CockroachDB - Cloud-native SQL database.

  •    Go

CockroachDB is a cloud-native SQL database for building global, scalable cloud services that survive disasters.CockroachDB is a distributed SQL database built on a transactional and strongly-consistent key-value store. It scales horizontally; survives disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention; supports strongly-consistent ACID transactions; and provides a familiar SQL API for structuring, manipulating, and querying data.