ibis - A pandas-like deferred expression system, with first-class SQL support

  •    Python

Ibis is a toolbox to bridge the gap between local Python environments, remote storage, execution systems like Hadoop components (HDFS, Impala, Hive, Spark) and SQL databases. Its goal is to simplify analytical workflows and make you more productive. Learn more about using the library at http://ibis-project.org.

seed_rl - SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference

  •    Python

This repository contains an implementation of a distributed reinforcement learning agent in which both training and inference are performed on the learner.

ansible-cloudera-hadoop - Ansible playbook to deploy Cloudera Hadoop components to a cluster

  •    Shell

The playbook follows the official Cloudera guides and is intended primarily for production deployment. High availability for HDFS and YARN is implemented when a sufficient number of hosts is configured. Alternatively, all of the components can be deployed on a single host. You only need to place the hostname(s) in the appropriate group in the hosts file, and the required services will be set up.




bloomery - Web UI for Impala

  •    Javascript

Bloomery is an open source query execution tool for Impala. It uses node-impala, which handles communication between Impala and the Node client through the Beeswax service. Bloomery can show the tables of a database, the columns of a table, saved queries, recent queries, and so on. Bloomery communicates with node-impala through an Express REST API, which maps URLs to node-impala’s connect and query methods. Actions such as executeQuery, showTables, and showColumns send a query with request parameters to the REST API; Express forwards the query to node-impala, which runs it and returns the results over Thrift. Finally, Express puts the results in the response body, and they are presented to users in a table under the Results tab of the UI menu.
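The request flow described above can be sketched in a few lines. This is a minimal illustration in Python, not Bloomery's actual code (which is JavaScript on Express): `StubImpalaClient`, `ACTIONS`, and `handle_request` are hypothetical names, the SQL built for each action is an assumption, and the stub records queries instead of talking to Impala over Thrift.

```python
import json


class StubImpalaClient:
    """Hypothetical stand-in for node-impala: records queries, no Thrift involved."""

    def __init__(self):
        self.connected = False
        self.sent = []

    def connect(self):
        self.connected = True

    def query(self, sql):
        if not self.connected:
            raise RuntimeError("connect() must be called before query()")
        self.sent.append(sql)
        # A real client would return rows from Impala; the stub echoes the SQL.
        return [{"query": sql}]


# Each UI action builds a query from its request parameters.
# The mapping is assumed for illustration; identifiers are not validated.
ACTIONS = {
    "executeQuery": lambda p: p["sql"],
    "showTables": lambda p: f"SHOW TABLES IN {p['database']}",
    "showColumns": lambda p: f"DESCRIBE {p['database']}.{p['table']}",
}


def handle_request(client, action, params):
    """Play the role of the REST layer: build the query, forward it, wrap the result."""
    if action not in ACTIONS:
        return json.dumps({"error": f"unknown action: {action}"})
    sql = ACTIONS[action](params)
    rows = client.query(sql)
    # The rows go into the response body, which the UI would render as a table.
    return json.dumps({"results": rows})


client = StubImpalaClient()
client.connect()
body = handle_request(client, "showTables", {"database": "default"})
print(body)
```

Keeping the action-to-query mapping in one table mirrors the idea of a thin REST layer that only translates requests and forwards them to the client.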

node-impala - Node Client for Impala using Apache Thrift

  •    Thrift

See the issue before using this module. Bloomery, a web UI for Impala, uses this client to execute queries.

implyr - SQL backend to dplyr for Impala

  •    R

implyr is a SQL backend to dplyr for Apache Impala, the massively parallel processing query engine for Apache Hadoop. Impala enables low-latency SQL queries on large datasets stored in HDFS, Apache HBase, Apache Kudu, Amazon S3, Microsoft ADLS, and Dell EMC Isilon. implyr is designed to work with any DBI-compatible interface to Impala. implyr does not provide the underlying connectivity to Impala, nor does it require that you use one particular R package for connectivity to Impala. Currently, two packages that can provide this connectivity are odbc and RJDBC. Future packages may provide other options for connectivity.

getl - A tool for developing and testing ETL and ELT processes for automating the capture, delivery and processing of information in data warehouses on the Micro Focus Vertica platform

  •    Groovy

Groovy ETL (Getl) is an open source project written in Groovy, developed since 2012 to automate loading and processing data from different sources: IBM DB2, Firebird, H2 Database, Hadoop Hive, Cloudera Impala, MS SQL Server, MySQL, IBM Netezza, NetSuite, Oracle, PostgreSQL, and Micro Focus Vertica.






