Displaying 1 to 18 from 18 results

gleam - Fast, efficient, and scalable distributed map/reduce system, DAG execution, in memory or on disk, written in pure Go, runs standalone or distributedly

  •    Go

Gleam is a high-performance, efficient distributed execution system that is also simple, generic, flexible, and easy to customize. Gleam is built in Go, and user-defined computations can be written in Go, Unix pipe tools, or any streaming program.

HPCC System - Hadoop alternative

  •    C++

HPCC is a proven and battle-tested platform for manipulating, transforming, querying, and warehousing Big Data. It supports two types of configuration. Thor is responsible for consuming vast amounts of data and for transforming, linking, and indexing that data. It functions as a distributed file system with parallel processing power spread across the nodes. Roxie, the Data Delivery Engine, provides separate high-performance online query processing and data warehouse capabilities.

Hadoop Common

  •    Java

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. Hadoop Common supports the other Hadoop subprojects.

Scalding - A Scala API for Cascading

  •    Scala

Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs. Scalding is built on top of Cascading, a Java library that abstracts away low-level Hadoop details. Scalding is comparable to Pig, but offers tight integration with Scala, bringing advantages of Scala to your MapReduce jobs.

Spark - Fast Cluster Computing

  •    Scala

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.

mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services

  •    Python

mrjob is a Python 2.7/3.3+ package that helps you write and run Hadoop Streaming jobs. It fully supports Amazon's Elastic MapReduce (EMR) service, which allows you to buy time on a Hadoop cluster on an hourly basis. mrjob also has basic support for Google Cloud Dataproc (Dataproc), which allows you to buy time on a Hadoop cluster on a minute-by-minute basis. It also works with your own Hadoop cluster.
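As a rough illustration of the kind of job such a package runs, the sketch below simulates the map, shuffle/sort, and reduce phases of a word count in a single local process using only the Python standard library. It does not use mrjob's API; the names `mapper`, `reducer`, and `run_job` are hypothetical and chosen only for this example.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Sum all counts that were emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Run map, shuffle/sort, and reduce in one local process (hypothetical helper)."""
    # Map phase: every input line produces zero or more key/value pairs.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Sorting by key stands in for the shuffle-and-sort step a cluster would perform.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: each run of equal keys is handed to the reducer once.
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    print(run_job(["the quick brown fox", "the lazy dog"]))
```

On a real cluster the mapper and reducer would run as separate processes on different nodes, and the framework would handle the sort between them.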

Apache Trafodion - Webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop.

  •    C++

Apache Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop. Trafodion builds on the scalability, elasticity, and flexibility of Hadoop. Trafodion extends Hadoop to provide guaranteed transactional integrity, enabling new kinds of big data applications to run on Hadoop.

Apache Tez - A Framework for YARN-based, Data Processing Applications In Hadoop

  •    Java

Apache Tez is an extensible framework for building high-performance batch and interactive data processing applications, coordinated by YARN in Apache Hadoop. Tez improves the MapReduce paradigm by dramatically improving its speed, while maintaining MapReduce’s ability to scale to petabytes of data. Important Hadoop ecosystem projects like Apache Hive and Apache Pig use Apache Tez, as do a growing number of third-party data access applications developed for the broader Hadoop ecosystem.

Hue - The open source Apache Hadoop UI

  •    Java

Hue is a Web application for interacting with Apache Hadoop. It provides a FileBrowser for accessing HDFS, a JobBrowser for accessing MapReduce jobs (MR1/MR2-YARN), a Job Designer for creating MapReduce/Streaming/Java jobs, an HBase Browser for exploring and modifying HBase tables and data, an Oozie App for submitting and scheduling workflows and bundles, a Pig/HBase/Sqoop2 shell, the Beeswax application for executing Hive queries, and a Search app for querying Solr and Solr Cloud.

PigPen - Map-Reduce for Clojure

  •    Clojure

PigPen is map-reduce for Clojure, or distributed Clojure. It compiles to Apache Pig or Cascading but you don't need to know much about either of them to use it. It provides the ability to write map-reduce queries as programs, not as scripts.

Pangool - Tuple MapReduce for Hadoop and MapReduce made easy

  •    Java

Pangool is a framework on top of Hadoop that implements Tuple MapReduce. It supports secondary sorting, built-in reduce-side joins, built-in serialization support for Thrift and ProtoStuff, and a lot more.

level-sum - Calculate sums in a LevelDB and get live updates.

  •    JavaScript

Calculate multidimensional sums in a LevelDB and get live updates. Create a new sums db, and make sure your db has been given the powers of level-sublevel.

Phadoop - Map/reduce jobs for Hadoop in PHP

  •    PHP

Phadoop allows you to write map/reduce tasks for Hadoop in PHP. I created it to give a tech talk about Hadoop at the company I worked for. It is not ready for production use yet, but it can help you play with Hadoop in PHP. You can find more examples in the examples directory.

timothy - Timothy's primary goal is to make The Yellow Elephant rich and famous

  •    JavaScript

Timothy's primary goal is to make Hadoop's Yellow Elephant rich and famous. Status and counters for the job can be updated using the this.updateStatus and this.updateCounter functions.

aistore - AIStore: scalable storage for AI applications

  •    Go

AIStore (AIS for short) is a built-from-scratch storage solution for AI applications. At its core, it's open-source object storage with extensions tailored for AI and, specifically, for petascale deep learning. As a storage system, AIS is a distributed object store with a RESTful S3-like API and the gamut of capabilities that one would normally expect from an object store: eventual consistency, a flat namespace, versioning, and all the usual primitives to read and write objects and to create, destroy, list, and configure the buckets that contain those objects.

datastore-mapper - Appengine Datastore Mapper in Go

  •    Go

This is an implementation, in Go, of the appengine datastore mapper functionality of the appengine map-reduce framework. It is in alpha status while I develop the API and should not be used in production yet, but it has been used successfully to export datastore entities to JSON for import into BigQuery, to stream inserts directly into BigQuery, and for schema migrations and lightweight aggregation reporting.

hyperreduce - Distributed reduce on top of hypercore

  •    JavaScript

Distributed reduce on top of hypercore. We believe ops doesn't need to be complicated. If hypercore is distributed streams, hyperreduce is a distributed reducer for streams. We needed this to turn our feed of server errors into a single meaningful value.
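The idea of reducing a feed to a single value can be sketched in a few lines of Python. This is not hyperreduce's API (hyperreduce is a JavaScript library); the event shape with a `service` field and the function names below are assumptions made purely for illustration. The sketch also shows why such a reduction can be distributed: partial results computed over separate slices of the feed can be merged into the same final value.

```python
from functools import reduce


def count_by_service(totals, event):
    """Reducer step: fold one error event into the running per-service totals."""
    updated = dict(totals)  # copy so the previous accumulator is left untouched
    updated[event["service"]] = updated.get(event["service"], 0) + 1
    return updated


def summarize(events):
    """Reduce a whole feed of error events to one dictionary of counts."""
    return reduce(count_by_service, events, {})


def merge(left, right):
    """Combine two partial summaries, e.g. ones computed on different nodes."""
    merged = dict(left)
    for service, count in right.items():
        merged[service] = merged.get(service, 0) + count
    return merged


if __name__ == "__main__":
    feed = [{"service": "api"}, {"service": "db"}, {"service": "api"}]
    print(summarize(feed))
    # Summarizing two slices separately and merging gives the same result.
    print(merge(summarize(feed[:1]), summarize(feed[1:])))
```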