Hadoop Common

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both Map/Reduce and the distributed file system are designed so that node failures are automatically handled by the framework.
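
As a rough illustration of the Map/Reduce paradigm described above, the following Python sketch counts words across several input fragments inside a single process. It does not use Hadoop's API and does not distribute any work; the function names (map_fragment, shuffle, reduce_counts, word_count) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_fragment(fragment):
    """Map step: emit a (word, 1) pair for every word in one input fragment.

    Each fragment is processed independently, which is what would let a
    framework run (or re-run) it on any node.
    """
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Group intermediate values by key between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def word_count(fragments):
    """Run map, shuffle, and reduce sequentially over a list of fragments."""
    intermediate = []
    for fragment in fragments:
        intermediate.extend(map_fragment(fragment))
    grouped = shuffle(intermediate)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "the fox"]))
```

Because map_fragment depends only on its own input, re-running it for a fragment after a failure produces the same intermediate pairs, which is the property that makes re-execution on another node safe in this simplified model.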



http://hadoop.apache.org/common/




Related Products

HPCC System - Hadoop alternative

HPCC is a proven and battle-tested platform for manipulating, transforming, querying and data warehousing Big Data. It supports two types of configuration, Thor and Roxie. Thor is responsible for consuming vast amounts of data, transforming, linking and indexing that data. It functions as a distributed file system with parallel processing power spread across the nodes. Roxie, the Data Delivery Engine, provides separate high-performance online query processing and data warehouse capabilities.

Cascading - Data Processing Workflows on Hadoop

Cascading is a Data Processing API, Process Planner, and Process Scheduler used for defining and executing complex, scale-free, and fault-tolerant data processing workflows on an Apache Hadoop cluster. It is a thin Java library and API that sits on top of Hadoop's MapReduce layer and is executed from the command line like any other Hadoop application.

Nutch

Nutch is open source web-search software. It builds on Lucene Java, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.

CouchDB

Apache CouchDB is a document-oriented database that can be queried and indexed in a MapReduce fashion using JavaScript. CouchDB also offers incremental replication with bi-directional conflict detection and resolution.

Raven DB - document database for .NET

Raven is a document database for the .NET/Windows platform. Raven offers a flexible data model designed to fit the needs of real-world systems. Raven stores schema-less JSON documents, allows you to define indexes using LINQ queries, and focuses on low latency and high performance.

Sqoop - Transfers data between Hadoop and Datastores

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. You can use Sqoop to import data from external structured datastores into the Hadoop Distributed File System or related systems like Hive and HBase. Conversely, Sqoop can be used to extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses.

Mongodb-CSharp - C# driver for connecting to MongoDB

This is a driver for connecting to MongoDB from .NET. It is written entirely in C# and has been developed and tested under both Windows and Mono 2.0 (Ubuntu 32-bit 9.04). Many features have been implemented, with a few remaining. The API is very likely to change and be in flux for a while, but it is quickly settling down.

Carrot2 - Search Results Clustering Engine

Carrot2 is an open source search results clustering engine. It clusters search results from various sources and generates small collections of documents. Carrot2 offers ready-to-use components for fetching search results from various sources including Yahoo API, Google API, Bing API, eTools Meta Search, Lucene, Solr, Google Desktop and more.

Ganglia - scalable distributed monitoring system

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization.

jCharts - Java-based charting utility

jCharts is a 100% Java-based charting utility that outputs a variety of charts. Servlets, JSPs, and Swing applications can use this library to generate charts. It can generate charts of type Area, Area Stacked, Bar, Bar Clustered, Bar Clustered Horizontal, Bar Horizontal, Bar Stacked, Bar Stacked Horizontal, Combo, Hi/Low Open/Close, Line, Pie 2D, Pie 3D, Point, Radar, XY Plot, and more.
