Hue - The open source Apache Hadoop UI

  •        1935

Hue is a Web application for interacting with Apache Hadoop. It supports a FileBrowser for accessing HDFS, JobBrowser for accessing MapReduce jobs (MR1/MR2-YARN), Job Designer for creating MapReduce/Streaming/Java jobs, HBase Browser for exploring and modifying HBase tables and data, Oozie App for submitting and scheduling workflows and bundles, A Pig/HBase/Sqoop2 shell, Beeswax application for executing Hive queries, Search app for querying Solr and Solr Cloud.

http://gethue.com
https://github.com/cloudera/hue

Tags
Implementation
License
Platform

   




Related Projects

Apache Tez - A Framework for YARN-based, Data Processing Applications In Hadoop

  •    Java

Apache Tez is an extensible framework for building high performance batch and interactive data processing applications, coordinated by YARN in Apache Hadoop. Tez improves the MapReduce paradigm by dramatically improving its speed, while maintaining MapReduce’s ability to scale to petabytes of data. Important Hadoop ecosystem projects like Apache Hive and Apache Pig use Apache Tez, as do a growing number of third party data access applications developed for the broader Hadoop ecosystem.

Apache Trafodion - Webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop.

  •    C++

Apache Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop. Trafodion builds on the scalability, elasticity, and flexibility of Hadoop. Trafodion extends Hadoop to provide guaranteed transactional integrity, enabling new kinds of big data applications to run on Hadoop.

gis-tools-for-hadoop - The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data

  •    

The GIS Tools for Hadoop are a collection of GIS tools that leverage the Spatial Framework for Hadoop for spatial analysis of big data. The tools make use of the Geoprocessing Tools for Hadoop toolbox, to provide access to the Hadoop system from the ArcGIS Geoprocessing environment. Start out by navigating to samples and following the instructions provided with each sample.There are also tutorials for using the GP tools and aggregation methods.

Hadoop Common

  •    Java

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. Hadoop common supports other Hadoop subprojects


Cascalog - Data processing on Hadoop

  •    Clojure

Cascalog is a fully-featured data processing and querying library for Clojure or Java. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools.

HPCC System - Hadoop alternative

  •    C++

HPCC is a proven and battle-tested platform for manipulating, transforming, querying and data warehousing Big Data. It supports two type of configuration. Thor is responsible for consuming vast amounts of data, transforming, linking and indexing that data. It functions as a distributed file system with parallel processing power spread across the nodes. Roxie, the Data Delivery Engine, provides separate high-performance online query processing and data warehouse capabilities.

Kylin - Extreme OLAP Engine for Big Data

  •    Java

Apache Kylin is an open source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets, original contributed from eBay Inc. It is designed to reduce query latency on Hadoop for 10+ billions of rows of data. It offers ANSI SQL on Hadoop and supports most ANSI SQL query functions.

cm_api - Cloudera Manager API Client

  •    Java

Cloudera Manager is the market-leading management platform for CDH. As the industry’s first end-to-end management application for Apache Hadoop, Cloudera Manager sets the standard for enterprise deployment by delivering granular visibility into and control over every part of CDH – empowering operators to improve cluster performance, enhance quality of service, increase compliance and reduce administrative costs. This project contains all the source, examples and documentation you need to easily build a Cloudera Manager client in Java or Python.

Apache Tajo - A big data warehouse system on Hadoop

  •    Java

Apache Tajo is a robust big data relational and distributed data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large-data sets stored on HDFS (Hadoop Distributed File System) and other data sources.

Spark - Fast Cluster Computing

  •    Scala

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.

mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services

  •    Python

mrjob is a Python 2.7/3.3+ package that helps you write and run Hadoop Streaming jobs. It fully supports Amazon's Elastic MapReduce (EMR) service, which allows you to buy time on a Hadoop cluster on an hourly basis. mrjob has basic support for Google Cloud Dataproc (Dataproc) which allows you to buy time on a Hadoop cluster on a minute-by-minute basis. It also works with your own Hadoop cluster.

mahout - Mirror of Apache Mahout

  •    Java

Mahout's goal is to build scalable machine learning libraries. With scalable we mean: Scalable to reasonably large data sets. Our core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. However we do not restrict contributions to Hadoop based implementations: Contributions that run on a single node or on a non-Hadoop cluster are welcome as well. The core libraries are highly optimized to allow for good performance also for non-distributed algorithms. Scalable to support your business case. Mahout is distributed under a commercially friendly Apache Software license. Scalable community. The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases. Come to the mailing lists to find out more. Currently Mahout supports mainly four use cases: Recommendation mining takes users' behavior and from that tries to find items users might like. Clustering takes e.g. text documents and groups them into groups of topically related documents. Classification learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category. Frequent itemset mining takes a set of item groups (terms in a query session, shopping cart content) and identifies, which individual items usually appear together.

Sqoop - Transfers data between Hadoop and Datastores

  •    Java

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. You can use Sqoop to import data from external structured datastores into Hadoop Distributed File System or related systems like Hive and HBase. Conversely, Sqoop can be used to extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses.

cdh-twitter-example - Example application for analyzing Twitter data using CDH - Flume, Oozie, Hive

  •    Java

This repository contains an example application for analyzing Twitter data using a variety of CDH components, including Flume, Oozie, and Hive. Before you get started with the actual application, you'll first need CDH4 installed. Specifically, you'll need Hadoop, Flume, Oozie, and Hive. The easiest way to get the core components is to use Cloudera Manager to set up your initial environment. You can download Cloudera Manager from the Cloudera website, or install CDH manually.

Apache Gora - Provides persistence to column stores, key value stores, document stores and RDBMS

  •    Java

The Apache Gora open source framework provides an in-memory data model and persistence for big data. Gora supports persisting to column stores, key value stores, document stores and RDBMSs, and analyzing the data with extensive Apache Hadoop MapReduce support.

Ranger - Manage Data Security across the Hadoop Platform

  •    Java

Ranger is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform. It provides centralized security administration to manage all security related tasks in a central UI or using REST APIs, Fine grained authorization, Centralize auditing of user access within Apache Hadoop, Apache Hive, Apache HBase and other Apache components.

Cascading - Data Processing Workflows on Hadoop

  •    Java

Cascading is a Data Processing API, Process Planner, and Process Scheduler used for defining and executing complex, scale-free, and fault tolerant data processing workflows on an Apache Hadoop cluster. It is a thin Java library and API that sits on top of Hadoop's MapReduce layer and is executed from the command line like any other Hadoop application.

elasticsearch-hadoop - :elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop

  •    Java

Elasticsearch real-time search and analytics natively integrated with Hadoop. Supports Map/Reduce, Cascading, Apache Hive, Apache Pig, Apache Spark and Apache Storm.See project page and documentation for detailed information.

Scalding - A Scala API for Cascading

  •    Scala

Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs. Scalding is built on top of Cascading, a Java library that abstracts away low-level Hadoop details. Scalding is comparable to Pig, but offers tight integration with Scala, bringing advantages of Scala to your MapReduce jobs.