Hadoop Common

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or reexecuted on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both Map/Reduce and the distributed file system are designed so that node failures are automatically handled by the framework.

http://hadoop.apache.org/common/
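
As a rough illustration of the Map/Reduce paradigm described above, here is a minimal word-count mapper and reducer sketch against the org.apache.hadoop.mapreduce API; the class names are illustrative, and the framework takes care of scheduling and re-executing these small fragments of work across the cluster.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Mapper: emits (word, 1) for every token in its input split.
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts for each word; failed tasks are re-executed by the framework.
    class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }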

Related Projects

Redisson - Redis based In-Memory Data Grid for Java


Redisson - distributed Java objects and services (Set, Multimap, SortedSet, Map, List, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Executor service, Tomcat Session Manager, Scheduler service, JCache API) on top of a Redis server. Rich Redis client.
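
A minimal sketch of how a few of these distributed objects are used through the Redisson API; the Redis address and object names are placeholders.

    import org.redisson.Redisson;
    import org.redisson.api.RAtomicLong;
    import org.redisson.api.RLock;
    import org.redisson.api.RMap;
    import org.redisson.api.RedissonClient;
    import org.redisson.config.Config;

    public class RedissonExample {
        public static void main(String[] args) {
            // Point the client at a single Redis server (address is a placeholder).
            Config config = new Config();
            config.useSingleServer().setAddress("redis://127.0.0.1:6379");
            RedissonClient redisson = Redisson.create(config);

            // Distributed Map backed by Redis.
            RMap<String, String> map = redisson.getMap("settings");
            map.put("mode", "cluster");

            // Distributed AtomicLong.
            RAtomicLong counter = redisson.getAtomicLong("requests");
            counter.incrementAndGet();

            // Distributed Lock shared by every JVM talking to this Redis server.
            RLock lock = redisson.getLock("migration-lock");
            lock.lock();
            try {
                // critical section
            } finally {
                lock.unlock();
            }

            redisson.shutdown();
        }
    }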

HPCC System - Hadoop alternative


HPCC is a proven and battle-tested platform for manipulating, transforming, querying and warehousing Big Data. It supports two types of configuration. Thor is responsible for consuming vast amounts of data, and for transforming, linking and indexing that data. It functions as a distributed file system with parallel processing power spread across the nodes. Roxie, the Data Delivery Engine, provides separate high-performance online query processing and data warehouse capabilities.

Phach-Thesis - distributed, map reduce system


A distributed map reduce system.

Spark - Fast Cluster Computing


Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.
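
A minimal word-count sketch against Spark's Java API, illustrating the in-memory caching mentioned above; the input/output paths and the local master URL are placeholders.

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SparkWordCount {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("word-count").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Load a text file (path is a placeholder) and keep it in memory for reuse.
            JavaRDD<String> lines = sc.textFile("hdfs:///data/input.txt").cache();

            // Build the operator graph: split into words, pair with 1, sum per word.
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);

            counts.saveAsTextFile("hdfs:///data/output");
            sc.stop();
        }
    }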

PigPen - Map-Reduce for Clojure


PigPen is map-reduce for Clojure, or distributed Clojure. It compiles to Apache Pig or Cascading but you don't need to know much about either of them to use it. It provides the ability to write map-reduce queries as programs, not as scripts.

Helix - Cluster Management Framework


Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. It helps with scheduling maintenance tasks (such as backups, garbage collection, file consolidation, index rebuilds, and repartitioning of data or resources across the cluster), informing dependent systems of cluster changes so they can react appropriately, throttling system tasks and changes, and so on.
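
A minimal participant sketch against the Helix Java API, assuming a ZooKeeper-backed cluster; the cluster name, instance name, ZooKeeper address and the state model factory mentioned in the comment are placeholders, not part of this project's documentation.

    import org.apache.helix.HelixManager;
    import org.apache.helix.HelixManagerFactory;
    import org.apache.helix.InstanceType;

    public class HelixParticipantExample {
        public static void main(String[] args) throws Exception {
            // Cluster name, instance name and ZooKeeper address are placeholders.
            HelixManager manager = HelixManagerFactory.getZKHelixManager(
                    "MyCluster", "node-1", InstanceType.PARTICIPANT, "localhost:2181");

            // An application would normally register a StateModelFactory here so Helix
            // can drive this node through state transitions (e.g. OFFLINE -> ONLINE)
            // for the partitions it hosts, along the lines of:
            //   manager.getStateMachineEngine().registerStateModelFactory("OnlineOffline", factory);

            // Join the cluster; from now on the Helix controller manages this instance.
            manager.connect();

            // ... serve traffic ...
            manager.disconnect();
        }
    }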

Luxun - A high-throughput, persistent, distributed, publish-subscribe messaging system based on memory-mapped files and Thrift RPC


A high-throughput, persistent, distributed, publish-subscribe messaging system based on memory-mapped files and Thrift RPC.

Hazelcast - In-Memory Data Grid for Java


Hazelcast is a clustering and highly scalable data distribution platform for Java. It supports distributed implementations of java.util.{Queue, Set, List, Map}, java.util.concurrent.locks.Lock and java.util.concurrent.ExecutorService, distributed indexing and query support, dynamic scaling, partitioning with backups, fail-over, a web-based cluster monitoring tool and a lot more.
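
A minimal sketch against the classic (3.x-era) Hazelcast API showing a distributed Map, Queue and Lock; the object names and values are placeholders.

    import java.util.concurrent.locks.Lock;

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;
    import com.hazelcast.core.IQueue;

    public class HazelcastExample {
        public static void main(String[] args) {
            // Starts a cluster member in this JVM; additional members join automatically.
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();

            // Distributed Map and Queue, partitioned across the cluster with backups.
            IMap<String, Integer> scores = hz.getMap("scores");
            scores.put("alice", 42);

            IQueue<String> tasks = hz.getQueue("tasks");
            tasks.offer("rebuild-index");

            // Distributed lock shared by every member of the cluster.
            Lock lock = hz.getLock("inventory-lock");
            lock.lock();
            try {
                // cluster-wide critical section
            } finally {
                lock.unlock();
            }

            hz.shutdown();
        }
    }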

LuMongo - Realtime Distributed Search


LuMongo is a real-time distributed search and storage system based on Lucene. LuMongo is designed from the ground up to scale both vertically and horizontally across servers. LuMongo stores Lucene indexes directly in MongoDB. Documents can be stored natively in MongoDB. When stored natively, documents can be queried as normal out of MongoDB, and Map-Reduce and the Aggregation Framework can be used.

bmr-wordcount - Browser Map-Reduce: distributed word count example


Browser Map-Reduce: distributed word count example

woodchuck - Distributed materialized map-reduce object store backed by Redis


Distributed materialized map-reduce object store backed by Redis

disco - a Map/Reduce framework for distributed computing


a Map/Reduce framework for distributed computing

node-mare - A framework for map reduce style distributed task execution using node.js


A framework for map reduce style distributed task execution using node.js

Ceph - Distributed Object Store


Ceph provides seamless access to objects using native language bindings or radosgw, a REST interface that’s compatible with applications written for S3 and Swift. Ceph’s RADOS Block Device (RBD) provides access to block device images that are striped and replicated across the entire storage cluster. Ceph provides a POSIX-compliant network file system that aims for high performance, large data storage, and maximum compatibility with legacy applications.
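
Because radosgw is S3-compatible, existing S3 tooling can usually be pointed at it unchanged. A minimal sketch using the AWS SDK for Java against a hypothetical radosgw deployment; the endpoint, region, credentials and bucket name are placeholders.

    import com.amazonaws.auth.AWSStaticCredentialsProvider;
    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    public class RadosgwExample {
        public static void main(String[] args) {
            // Endpoint and credentials are placeholders for a radosgw deployment.
            AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                    .withEndpointConfiguration(
                            new EndpointConfiguration("http://radosgw.example.com:7480", "default"))
                    .withCredentials(new AWSStaticCredentialsProvider(
                            new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY")))
                    .withPathStyleAccessEnabled(true)
                    .build();

            s3.createBucket("demo-bucket");
            s3.putObject("demo-bucket", "hello.txt", "stored in Ceph via radosgw");
            System.out.println(s3.getObjectAsString("demo-bucket", "hello.txt"));
        }
    }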

File-System-Deduplication-using-Map-Reduce-


This is a test project that aims to perform data deduplication on files, using the Rabin-Karp algorithm for pattern matching and Map Reduce to parallelize the dedup process.
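
For reference, the Rabin-Karp idea relies on a rolling hash: the fingerprint of each fixed-size window is derived from the previous one in constant time, which is what makes fingerprint-based duplicate detection cheap enough to parallelize with map-reduce. A small illustrative Java sketch follows; the window size, base and modulus are arbitrary example values, not taken from this project.

    import java.util.HashSet;
    import java.util.Set;

    // Illustrative Rabin-Karp style rolling hash over a byte array.
    public class RollingHashExample {
        static final int WINDOW = 48;            // example window size
        static final long BASE = 257;            // example base
        static final long MOD = 1_000_000_007L;  // example modulus

        // Returns how many window fingerprints occur more than once in the data.
        public static int countRepeatedFingerprints(byte[] data) {
            if (data.length < WINDOW) return 0;

            // BASE^(WINDOW-1) mod MOD, used when the oldest byte leaves the window.
            long highPower = 1;
            for (int i = 1; i < WINDOW; i++) highPower = (highPower * BASE) % MOD;

            // Hash of the first window.
            long hash = 0;
            for (int i = 0; i < WINDOW; i++) hash = (hash * BASE + (data[i] & 0xFF)) % MOD;

            Set<Long> seen = new HashSet<>();
            seen.add(hash);
            int repeats = 0;

            // Slide the window one byte at a time, updating the hash incrementally.
            for (int i = WINDOW; i < data.length; i++) {
                hash = (hash + MOD - ((data[i - WINDOW] & 0xFF) * highPower) % MOD) % MOD;
                hash = (hash * BASE + (data[i] & 0xFF)) % MOD;
                if (!seen.add(hash)) repeats++;
            }
            return repeats;
        }
    }

A matching fingerprint is only a candidate duplicate: because hashes can collide, real deduplication would still compare the underlying bytes before discarding data.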

PHP-Map-Reduce


An implementation of a Map/Reduce-like algorithm in PHP. It uses any database (MySQL, Postgres, etc.) for task management. Source and result data can be stored on local storage, FTP, a database, etc. PHP multi-threading and cluster modes are supported.

genre-cluster-mr - A Hadoop map reduce that counts genre occurrences in the million song dataset.


A Hadoop map reduce job that counts genre occurrences in the Million Song Dataset.

mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services


mrjob is a Python 2.7/3.3+ package that helps you write and run Hadoop Streaming jobs. It fully supports Amazon's Elastic MapReduce (EMR) service, which allows you to buy time on a Hadoop cluster on an hourly basis. mrjob has basic support for Google Cloud Dataproc (Dataproc), which allows you to buy time on a Hadoop cluster on a minute-by-minute basis. It also works with your own Hadoop cluster.

MapRedToRc - Map reduce program output to RC file


A map reduce program that writes its output to an RC file.

emr-oozie-sample - AWS Elastic Map Reduce with Oozie workflow system installed as bootstrap actions


AWS Elastic Map Reduce with the Oozie workflow system installed via bootstrap actions.