MLCraft is an open-source, low-code business intelligence tool and data science workflow. MLCraft was designed to query data from several data warehouses and run machine learning experiments. It uses Cube.js as its primary query layer, which makes it suitable for handling trillions of data points. It is a full-stack data science platform that provides everything you need to build, manage and automate machine learning.
https://github.com/mlcraft-io/mlcraft
Tags | mysql bigquery big-data spark presto hive athena analytics clickhouse postgresql business-intelligence redshift |
Implementation | JavaScript |
License | MIT |
Platform | NodeJS |
Cube.js is an open-source analytical API platform. It is primarily used to build internal business intelligence tools or add customer-facing analytics to existing applications. Cube.js was designed to work with serverless data warehouses and query engines like Google BigQuery and AWS Athena. A multi-stage querying approach makes it suitable for handling trillions of data points. Most modern RDBMS work with Cube.js as well and can be further tuned for performance.
analytics mysql bigquery chart spark presto hive microservice serverless athena postgresql cube
Redash is our take on freeing the data within our company in a way that better fits our culture and usage patterns. Before Redash, we tried traditional BI suites and found a set of bloated, technically challenged and slow tools and flows. What we were looking for was a more hackerish way to look at data, so we built one.
redash visualization analytics bi angular redshift bigquery athena mysql postgresql dashboard
Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts. It integrates easily with your data, using either our simple no-code viz builder or a state-of-the-art SQL IDE. Superset can query data from any SQL-speaking datastore or data engine (e.g. Presto or Athena) that has a Python DB-API driver and a SQLAlchemy dialect.
react flask data-science bi analytics superset apache data-visualization data-engineering business-intelligence data-viz data-analytics data-analysis sql-editor asf business-analytics
Metabase is the easy, open source way for everyone in your company to ask questions and learn from data. Get a real-time glimpse into what your company is learning about your data. Activity helps people in your company find an answer, jump-start their own exploration, or improve existing questions.
analytics business-intelligence dashboard reporting slack database postgres postgresql mysql bi visualization data data-analysis sql-editor data-visualization postgresql-client postgresql-gui postgresql-admin postgresql-management
Trino is a highly parallel and distributed query engine built from the ground up for efficient, low-latency analytics. It is an ANSI SQL compliant query engine that works with BI tools such as R, Tableau, Power BI, Superset and many others. It can natively query data in Hadoop, S3, Cassandra, MySQL, and many other systems, without the need for complex, slow, and error-prone processes for copying the data.
distributed-systems data-science sql database big-data presto hive hadoop analytics jdbc databases distributed-database query-engine datalake prestodb trino
Specialised plugins for Hadoop, Big Data & NoSQL technologies, written by a former Clouderan (Cloudera was the first Hadoop Big Data vendor) and modern Hortonworks partner/consultant. They support a wide variety of compatible enterprise monitoring systems.
nagios-plugins zookeeper hadoop hbase cloudera hbase-client jenkins travis-ci nagios-plugin hortonworks ambari cassandra elasticsearch docker kafka solr redis rabbitmq consul datastax
Shark is an open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users, running Hive queries up to 100x faster in memory, or up to 10x faster on disk. It is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive.
distributed-sql hive big-data
Tabix is a SQL editor and simple open-source business intelligence tool for ClickHouse. There is nothing to install; it works from the browser. It can draw charts, world maps, and real-time metric charts from system.metrics, displays databases and tables as a tree, and a lot more.
clickhouse sql-query dashboard data-visualization bi data-analysis business-intelligence sql-editor database-management
Animated Investment Management Research at Sov.ai: sponsoring open source AI, machine learning, and data science initiatives. I don't know the other areas that well, so send me your thought leaders by pull request.
data-science machine-learning big-data analytics resources career business-intelligence business-analytics
InfiniDB Community Edition is a scale-up, column-oriented database for data warehousing, analytics, business intelligence and read-intensive applications. InfiniDB's data warehouse columnar engine is multi-terabyte capable and accessed via MySQL.
database column-store data-mining relational column-database no-sql mysql-fork
Genie is a federated job orchestration engine developed by Netflix. Genie provides RESTful APIs to run a variety of big data jobs such as Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters, and of the commands and applications that run on them. See the official website for Genie documentation, including documentation for specific releases.
big-data bigdata orchestration configuration configuration-management spring-boot distributed-systems netflixoss
Snowplow is an enterprise-strength marketing and product analytics platform. It identifies your users and tracks the way they engage with your website or application. It stores your users' behavioural data in a scalable "event data warehouse" that you control: in Amazon S3 and (optionally) Amazon Redshift or Postgres. You can then analyze that behavioural data with a wide range of tools, including big data tools (e.g. Spark) via EMR, or more traditional tools such as Looker, Mode, Superset and Re:dash.
analytics event-analytics cloud kafka aws
Gimel provides a unified Data API to access data from any storage, such as HDFS, GS, Alluxio, HBase, Aerospike, BigQuery, Druid, Elastic, Teradata, Oracle, MySQL, etc.
spark spark-streaming big-data paypal data-api kafka cassandra hbase aerospike elasticsearch jdbc teradata streaming-sql data-connector
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. It can query data in relational and NoSQL databases. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization. It was developed by Facebook.
query-engine database-tool analytics big-data distributed
Growth Book is an open source experimentation platform designed for companies that want to bring A/B testing in-house but don't want to build it themselves. It supports experimentation with deep code integration or using a visual front-end editor.
bigquery statistics analytics clickhouse google-analytics mixpanel snowflake experimentation redshift abtesting abtest split-testing optimizely vwo google-optimize testing testing-platform testing-tools
Alluxio (formerly known as Tachyon) is a virtual distributed storage system. It bridges the gap between computation frameworks and storage systems, enabling computation applications to connect to numerous storage systems through a common interface.
distributed-storage big-data memory-speed hadoop spark virtual-file-system presto tensorflow storage object-store
ClickHouse is an open source column-oriented database management system capable of generating analytical data reports in real time using SQL queries. Its features include linear scalability, high speed, high reliability, fault tolerance, data compression, real-time query processing, web analytics, vectorized query execution, and local and distributed joins. It can process hundreds of millions to more than a billion rows, and tens of gigabytes of data, per single server per second.
database columnar-database column-oriented columnar analytics real-time big-data
Magellan is a distributed execution engine for geospatial analytics on big data. It is implemented on top of Apache Spark and deeply leverages modern database techniques like efficient data layout, code generation and query optimization in order to optimize geospatial queries. The application developer writes standard SQL or DataFrame queries to evaluate geometric expressions. The execution engine takes care of laying out data efficiently in memory during query processing, picking the right query plan, and optimizing query execution with cheap and efficient spatial indices, all while presenting a declarative abstraction to the developer.
geospatial-analytics sparksql spark geometric-algorithms geojson shapefile geospatial geospatial-processing geospatial-analysis big-data magellan
ToroDB transforms your NoSQL data from a MongoDB replica set into a relational database in PostgreSQL. Other solutions can store JSON documents in a relational table using PostgreSQL's JSON support, but they don't solve the real problem of how to actually use that data. ToroDB Stampede replicates the document structure across different relational tables and stores the document data as tuples in those tables.
database nosql analytics mongodb-to-sql mongodb-to-postgres