Displaying 1 to 6 from 6 results

lakeFS - Git-like capabilities for your object storage

  •    Go

lakeFS is an open source layer that delivers resilience and manageability to object-storage based data lakes. With lakeFS you can build repeatable, atomic and versioned data lake operations - from complex ETL jobs to data science and analytics.

Trino - A query engine that runs at ludicrous speed

  •    Java

Trino is a highly parallel and distributed query engine, that is built from the ground up for efficient, low latency analytics. It is an ANSI SQL compliant query engine, that works with BI tools such as R, Tableau, Power BI, Superset and many others. It helps to natively query data in Hadoop, S3, Cassandra, MySQL, and many others, without the need for complex, slow, and error-prone processes for copying the data.

Apache Hudi - Streaming Data Lake Platform

  •    Java

Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). As an organization, Hudi can help you build an efficient data lake, solving some of the most complex, low-level storage management problems, while putting data into hands of your data analysts, engineers and scientists much quicker.

aws-data-wrangler - The missing link between AWS services and the most popular Python data libraries

  •    Python

The missing link between AWS services and the most popular Python data libraries. AWS Data Wrangler aims to fill a gap between AWS Analytics Services (Glue, Athena, EMR, Redshift) and the most popular Python libraries for lightweight workloads.

pancloud - Python idiomatic SDK for Cortex™.

  •    Python

Python idiomatic SDK for the Palo Alto Networks Cortex™ platform. The Palo Alto Networks Cloud Python SDK (or pancloud for short) was created to assist developers with programmatically interacting with the Palo Alto Networks Cortex™ platform.

aws-orbit-workbench - A Data Platform built for AWS, powered by Kubernetes.

  •    Python

Orbit Workbench is an open framework for building team-based secured data environment. Orbit workbench is built on Kubernetes using Amazon Managed Kubernetes Service (EKS), and provides both a command line tool for rapid deployment as well as Python SDK, Jupyter Plugins and more to accelerate data analysis and ML by integration with AWS analytics services such as Amazon Redshift, Amazon Athena, Amazon EMR, Amazon SageMaker and more. Orbit Workbench deploys secured team spaces that are mapped to Kubernetes namespaces and span into AWS cloud resources. Each team is a secured zone where only members of the team can access allowed data and share data and code freely within the team. Orbit automatically creates file storage for each team using Amazon EFS, security group and IAM role for each team , as well as their own JupyterHub and Jupyter Server. Orbit workbench users are also capable of launching python code or Jupyter Notebooks as Kubernetes containers or as Amazon Fargate containers. Orbit workbench provides CLI tool for users to build their own custom images and use it to deploy containers or customize their Jupyter environment.

We have large collection of open source products. Follow the tags from Tag Cloud >>

Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.