Apache arrow is an open source, low latency SQL query engine for Hadoop and NoSQL. Arrow enables execution engines to take advantage of the latest SIMD (Single input multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout of data also allows for a better use of CPU caches by placing all data relevant to a column operation in as compact of a format as possible.
http://arrow.apache.org/Tags | in-memory analytics query-engine nosql no-sql |
Implementation | Java |
License | Apache |
Platform | OS-Independent |
OmniSciDB is the foundation of the OmniSci platform. OmniSciDB is SQL-based, relational, columnar and specifically developed to harness the massive parallelism of modern CPU and GPU hardware. OmniSciDB can query up to billions of rows in milliseconds, and is capable of unprecedented ingestion speeds, making it the ideal SQL engine for the era of big, high-velocity data.
database visualization machine-learning real-time sql interactive gpu llvm olap mapd geodata distributed-database gpu-database analytics analytics-databaseSuperset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts. It easily integrates your data, using either our simple no-code viz builder or state of the art SQL IDE. Superset can query data from any SQL-speaking datastore or data engine (e.g. Presto or Athena) that has a Python DB-API driver and a SQLAlchemy dialect.
react flask data-science bi analytics superset apache data-visualization data-engineering business-intelligence data-viz data-analytics data-analysis sql-editor asf business-analyticsDataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format. DataFusion supports both an SQL and a DataFrame API for building logical query plans as well as a query optimizer and execution engine capable of parallel execution against partitioned data sources (CSV and Parquet) using threads.
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. It allows querying data from relational / nosql databases. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization. It is developed by Facebook.
query-engine database-tool analytics big-data distributedMagellan is a distributed execution engine for geospatial analytics on big data. It is implemented on top of Apache Spark and deeply leverages modern database techniques like efficient data layout, code generation and query optimization in order to optimize geospatial queries. The application developer writes standard sql or data frame queries to evaluate geometric expressions while the execution engine takes care of efficiently laying data out in memory during query processing, picking the right query plan, optimizing the query execution with cheap and efficient spatial indices while presenting a declarative abstraction to the developer.
geospatial-analytics sparksql spark geometric-algorithms geojson shapefile geospatial geospatial-processing geospatial-analysis big-data magellanSnappyData (aka TIBCO ComputeDB) is a distributed, in-memory optimized analytics database. SnappyData delivers high throughput, low latency, and high concurrency for unified analytics workload. By fusing an in-memory hybrid database inside Apache Spark, it provides analytic query processing, mutability/transactions, access to virtually all big data sources and stream processing all in one unified cluster. One common use case for SnappyData is to provide analytics at interactive speeds over large volumes of data with minimal or no pre-processing of the dataset. For instance, there is no need to often pre-aggregate/reduce or generate cubes over your large data sets for ad-hoc visual analytics. This is made possible by smartly managing data in-memory, dynamically generating code using vectorization optimizations and maximizing the potential of modern multi-core CPUs. SnappyData enables complex processing on large data sets in sub-second timeframes.
spark stream scale analytics transaction memory-database snappydataTrino is a highly parallel and distributed query engine, that is built from the ground up for efficient, low latency analytics. It is an ANSI SQL compliant query engine, that works with BI tools such as R, Tableau, Power BI, Superset and many others. It helps to natively query data in Hadoop, S3, Cassandra, MySQL, and many others, without the need for complex, slow, and error-prone processes for copying the data.
distributed-systems data-science sql database big-data presto hive hadoop analytics jdbc databases distributed-database query-engine datalake prestodb trinoApache Kylin is an open source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets, original contributed from eBay Inc. It is designed to reduce query latency on Hadoop for 10+ billions of rows of data. It offers ANSI SQL on Hadoop and supports most ANSI SQL query functions.
etl olap olap-engine metadata-engine query-engine storage-engineBlobCity DB is an All-in-One Database. It offers support for natively storing 17 different formats of data, including JSON, XML, CSV, PDF, Word, Excel, Log, GIS, Image amongst others. It run two full feature storage engines. One that stores data in memory and the other that stores data on disk. In-memory storage offers sheer performance for real-time analytics, while the disk storage make BlobCity an excellent alternative for DataLakes.
database dbaas nosql multi-model no-sqlApache Spark is a general purpose parallel computational engine for analytics at scale. At its core, it has a batch design center and is capable of working with disparate data sources. While this provides rich unified access to data, this can also be quite inefficient and expensive. Analytic processing requires massive data sets to be repeatedly copied and data to be reformatted to suit Spark. In many cases, it ultimately fails to deliver the promise of interactive analytic performance. For instance, each time an aggregation is run on a large Cassandra table, it necessitates streaming the entire table into Spark to do the aggregation. Caching within Spark is immutable and results in stale insight. At SnappyData, we take a very different approach. SnappyData fuses a low latency, highly available in-memory transactional database (GemFireXD) into Spark with shared memory management and optimizations. Data in the highly available in-memory store is laid out using the same columnar format as Spark (Tungsten). All query engine operators are significantly more optimized through better vectorization and code generation. The net effect is, an order of magnitude performance improvement when compared to native Spark caching, and more than two orders of magnitude better Spark performance when working with external data sources.
snappydata spark memory-database analytics stream transaction scaleOpenSearch is a community-driven, open source search and analytics suite derived from Apache 2.0 licensed Elasticsearch 7.10.2 & Kibana 7.10.2. It consists of a search engine daemon, OpenSearch, and a visualization and user interface, OpenSearch Dashboards. OpenSearch enables people to easily ingest, secure, search, aggregate, view, and analyze data. These capabilities are popular for use cases such as application search, log analytics, and more.
search-engine searchengine full-text-search realtime-analytics analytics log-aggregation aggregation clickstream-analyticsAresDB is a GPU-powered real-time analytics storage and query engine. It features low query latency, high data freshness and highly efficient in-memory and on disk storage management.
analytics real-time-analytics gpu aggregation time-seriesCrate is an open source, highly scalable, shared-nothing distributed SQL database. Crate offers the scalability and performance of a modern No-SQL database with the power of Standard SQL. Crate’s distributed SQL query engine lets you use the same syntax that already exists in your applications or integrations, and have queries seamlessly executed across the crate cluster, including any aggregations, if needed.
sql distributed-sql database scalable distributed searchengine full-text-search analyticsQuasar is an open source NoSQL analytics engine that can be used as a library or through a REST API to power advanced analytics across a growing range of data sources and databases, including MongoDB. SQL² is the dialect of SQL that Quasar understands.
Shark is an open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users. It runs Hive queries up to 100x faster in memory, or 10x on disk. it is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive.
distributed-sql hive big-dataDataFusion is a SQL parser, planner, and query execution library for Rust. A DataFrame API is also provided. DataFusion can be used as a crate dependency in your project to add SQL support for custom data sources.
spark sql dataframe cluster compute data arrowAsterixDB is a BDMS (Big Data Management System) with a rich feature set that sets it apart from other Big Data platforms. Its feature set makes it well-suited to modern needs such as web data warehousing and social data storage and analysis. It is a highly scalable data management system that can store, index, and manage semi-structured data, but it also supports a full-power query language with the expressiveness of SQL (and more).
database parallel-database big-data data-warehouse nosql no-sql analyticsQuicksql is a SQL query product which can be used for specific datastore queries or multiple datastores correlated queries. It supports relational databases, non-relational databases and even datastore which does not support SQL (such as Elasticsearch, Druid) . In addition, a SQL query can join or union data from multiple datastores in Quicksql. For example, you can perform unified SQL query on one situation that a part of data stored on Elasticsearch, but the other part of data stored on Hive. The most important is that QSQL is not dependent on any intermediate compute engine, users only need to focus on data and unified SQL grammar to finished statistics and analysis. An architecture diagram helps you access Quicksql more easily.
hive spark sql flinkMemgraph is a streaming graph application platform that helps you wrangle your streaming data, build sophisticated models that you can query in real-time, and develop graph applications.
kafka graph graph-algorithms nosql stream-processing graph-database kafka-streams cypher graph-analysis streaming-data opencypherKibi extends Kibana 5.5.2 with data intelligence features; the core feature of Kibi is the capability to join and filter data from multiple Elasticsearch indexes and from SQL/NOSQL data sources ("external queries").In addition, Kibi provides UI features and visualizations like dashboard groups, tabs, cross entity relational navigation buttons, an enhanced search results table, analytical aggregators, HTML templates on query results, and much more.
kibana elasticsearch relational-browsing logstash analytics visualizations dashboards dashboarding data-intelligence relational sql sparql kibana-alternative
We have large collection of open source products. Follow the tags from
Tag Cloud >>
Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.
Add Projects.