waggle-dance - Hive federation service

  •        394

Hive Metastore federation service.

https://github.com/HotelsDotCom/waggle-dance

Dependencies:

io.spring.platform:platform-bom:Brussels-SR5
io.dropwizard:dropwizard-bom:1.0.5
org.apache.hadoop:hadoop-common:2.7.2
org.apache.hadoop:hadoop-mapreduce-client-core:2.7.2
org.apache.hive:hive-common:2.3.0
org.apache.hive:hive-metastore:2.3.0

Related Projects

Hive-JSON-Serde - Read - Write JSON SerDe for Apache Hive.

  •    Java

This library enables Apache Hive to read and write in JSON format. It includes support for serialization and deserialization (SerDe) as well as a JSON conversion UDF. Download the latest binaries (json-serde-X.Y.Z-jar-with-dependencies.jar and json-udf-X.Y.Z-jar-with-dependencies.jar) from congiu.net/hive-json-serde. Choose the correct version for CDH 4, CDH 5 or Hadoop 2.3. Place the JARs into hive/lib or use ADD JAR in Hive.
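As a rough illustration of the ADD JAR route, the sketch below only builds the HiveQL statements as strings. The helper name, the JAR path, the table layout, and the SerDe class name `org.openx.data.jsonserde.JsonSerDe` are assumptions made for the example and should be checked against the project's own documentation.

```python
def hive_json_serde_statements(jar_paths, table, columns, location):
    """Return HiveQL statements that register the JARs and define a JSON-backed table.

    Hypothetical helper: it only formats strings and does not talk to Hive.
    """
    # One ADD JAR statement per JAR file.
    statements = [f"ADD JAR {path}" for path in jar_paths]
    column_sql = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    # The SerDe class name is an assumption; verify it for the release you download.
    statements.append(
        f"CREATE EXTERNAL TABLE {table} ({column_sql}) "
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' "
        f"LOCATION '{location}'"
    )
    return statements


if __name__ == "__main__":
    for statement in hive_json_serde_statements(
        ["/tmp/json-serde-X.Y.Z-jar-with-dependencies.jar"],  # hypothetical path
        "events",
        [("id", "BIGINT"), ("payload", "STRING")],
        "/data/events",
    ):
        print(statement + ";")
```

The printed statements could then be pasted into a Hive session; X.Y.Z stays a placeholder for whichever release is downloaded.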

Shark - Hive on Spark

  •    Scala

Shark is an open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users. It runs Hive queries up to 100x faster in memory, or 10x faster on disk. It is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive.

Apache Hive - The Apache Hive (TM) data warehouse software facilitates querying and managing large d

  •    Java

The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage.

Hive Games Reviewer

  •    CSharp

BoardSpace.net Hive Games Reviewer (Ultimate Edition)

cdh-twitter-example - Example application for analyzing Twitter data using CDH - Flume, Oozie, Hive

  •    Java

This repository contains an example application for analyzing Twitter data using a variety of CDH components, including Flume, Oozie, and Hive. Before you get started with the actual application, you'll first need CDH4 installed. Specifically, you'll need Hadoop, Flume, Oozie, and Hive. The easiest way to get the core components is to use Cloudera Manager to set up your initial environment. You can download Cloudera Manager from the Cloudera website, or install CDH manually.


DVD Hive & Swarm

  •    

DVD Hive & Swarm converts the source MPEGs, auto-generates each title's DVD menus, and compiles each title to a DVD ISO image suitable for burning.

hive - A platform for backing crowdsourcing websites, built in golang for elasticsearch

  •    Go

A platform for backing crowdsourcing websites, built in Go for Elasticsearch. Hive requires Elasticsearch version 1.3 or higher. Where you install it is up to you, as you can tell Hive the domain and port for accessing Elasticsearch at startup.

elephant-bird - Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code

  •    Java

Elephant Bird is Twitter's open source library of LZO, Thrift, and/or Protocol Buffer-related Hadoop InputFormats, OutputFormats, Writables, Pig LoadFuncs, Hive SerDe, HBase miscellanea, etc. The majority of these are in production at Twitter running over data every day.

Luigi - Python module that helps you build complex pipelines of batch jobs

  •    Python

The purpose of Luigi is to address all the plumbing typically associated with long-running batch processes. You want to chain many tasks, automate them, and failures will happen. These tasks can be anything, but are typically long running things like Hadoop jobs, dumping data to/from databases, running machine learning algorithms, or anything else.

hive - Mirror of Apache Hive

  •    Java

Query execution using Apache Hadoop MapReduce, Apache Tez or Apache Spark frameworks. Hive provides standard SQL functionality, including many of the later 2003 and 2011 features for analytics. These include OLAP functions, subqueries, common table expressions, and more. Hive's SQL can also be extended with user code via user defined functions (UDFs), user defined aggregates (UDAFs), and user defined table functions (UDTFs).

Apache Tez - A Framework for YARN-based, Data Processing Applications In Hadoop

  •    Java

Apache Tez is an extensible framework for building high performance batch and interactive data processing applications, coordinated by YARN in Apache Hadoop. Tez improves the MapReduce paradigm by dramatically improving its speed, while maintaining MapReduce’s ability to scale to petabytes of data. Important Hadoop ecosystem projects like Apache Hive and Apache Pig use Apache Tez, as do a growing number of third party data access applications developed for the broader Hadoop ecosystem.

PyHive - Python interface to Hive and Presto. 🐝

  •    Python

PyHive is a collection of Python DB-API and SQLAlchemy interfaces for Presto and Hive. First, install this package to register it with SQLAlchemy (see setup.py).
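Because PyHive exposes the standard Python DB-API, query code can be written against a generic connection object. The sketch below is a minimal illustration of that pattern; the helper name is hypothetical, and sqlite3 stands in for a Hive connection so the example runs without a server.

```python
import sqlite3


def fetch_all(connection, sql):
    """Run a query through any DB-API 2.0 connection and return all rows."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()


# With PyHive the connection would come from something like:
#   from pyhive import hive
#   connection = hive.connect(host="localhost", port=10000)  # hypothetical host and port
# sqlite3 is used here only as a stand-in DB-API driver.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE t (n INTEGER)")
connection.execute("INSERT INTO t VALUES (1), (2)")
rows = fetch_all(connection, "SELECT n FROM t ORDER BY n")
print(rows)
```

Keeping the helper driver-agnostic means the same function can be exercised locally and then pointed at Hive or Presto.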

Hive

  •    Java

Hive attempts to be a fully open platform for autonomous and mobile agents

Hive Mind LOIC

  •    CSharp

Hive Mind LOIC is a version of the Low Orbit Ion Cannon made by Praetox, which was adapted for centralized control by NewEraCracker before I took the project on. Amongst a few fixes, I added RSS control (such as via Twitter).

impyla - Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)

  •    Python

Python client for HiveServer2 implementations (e.g., Impala, Hive) for distributed query engines. For higher-level Impala functionality, including a Pandas-like interface over distributed data sets, see the Ibis project.

hadoop-ansible - Ansible playbook that installs a Hadoop cluster, with HBase, Hive, Presto for analytics, and Ganglia, Smokeping, Fluentd, Elasticsearch and Kibana for monitoring and centralized log indexing

  •    Shell

Ansible playbook that installs a CDH 4.6.0 Hadoop cluster (running on Java 7, supported from CDH 4.4), with HBase, Hive, Presto for analytics, and Ganglia, Smokeping, Fluentd, Elasticsearch and Kibana for monitoring and centralized log indexing.

Quicksql - Simpler, Safer, Faster Unified SQL Analytics Engine for Multi-Datasources

  •    Java

Quicksql is a SQL query product that can query a single datastore or correlate queries across multiple datastores. It supports relational databases, non-relational databases, and even datastores that do not support SQL (such as Elasticsearch and Druid). In addition, a single SQL query can join or union data from multiple datastores. For example, you can run a unified SQL query when part of the data is stored in Elasticsearch and the rest is stored in Hive. Most importantly, QSQL does not depend on any intermediate compute engine; users only need to focus on the data and the unified SQL grammar to complete their statistics and analysis.

brickhouse - Hive UDF's for the data warehouse

  •    Java

Extensions of Hive for the Data Developer

blinkdb - BlinkDB: Sub-Second Approximate Queries on Very Large Data.

  •    Scala

BlinkDB is a large-scale data warehouse system built on Shark and Spark and is designed to be compatible with Apache Hive. It can answer HiveQL queries up to 200-300 times faster than Hive by executing them on user-specified samples of data and providing approximate answers that are augmented with meaningful error bars. BlinkDB 0.1.0 is an alpha developer release that supports creating/deleting samples on any input table and/or materialized view and executing approximate HiveQL queries with those aggregates that have statistical closed forms (i.e., AVG, SUM, COUNT, VAR and STDEV).
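The idea of answering an aggregate from a sample and attaching an error bar can be sketched in a few lines. This is not BlinkDB's implementation; it is a minimal, assumed illustration of an approximate AVG using a uniform random sample and a normal-approximation 95% interval, with a hypothetical function name and parameters.

```python
import math
import random
import statistics


def approximate_avg(values, sample_size, seed=0, z=1.96):
    """Estimate AVG(values) from a uniform random sample.

    Returns (estimate, margin), where estimate +/- margin is an approximate
    95% confidence interval when z=1.96. Requires sample_size >= 2.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    # Standard error of the mean, computed from the sample standard deviation.
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * std_error


if __name__ == "__main__":
    data = list(range(1000))
    estimate, margin = approximate_avg(data, sample_size=100)
    print(f"AVG is approximately {estimate:.1f} +/- {margin:.1f}")
```

A larger sample narrows the margin at the cost of reading more data, which is the trade-off such systems expose to the user.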

SQL Azure Federation Data Migration Wizard

  •    

SQL Azure Federation Data Migration Wizard simplifies the process of migrating data from a single database to multiple federation members in SQL Azure Federation.