Hadoop Common
Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both Map/Reduce and the distributed file system are designed so that node failures are automatically handled by the framework.
http://hadoop.apache.org/common/
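As a rough illustration of the Map/Reduce paradigm described above, here is a single-process Python sketch of a word count. It does not use Hadoop's API. The function names (map_phase, shuffle, reduce_phase, run_job) and the sample input are hypothetical and chosen only to show how work splits into independent fragments.

```python
from collections import defaultdict


def map_phase(fragment):
    """Map: emit a (word, 1) pair for every word in one input fragment."""
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Group intermediate values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key into a single count."""
    return key, sum(values)


def run_job(fragments):
    """Run map over every fragment, then shuffle and reduce the results."""
    intermediate = []
    for fragment in fragments:
        # Each fragment is processed independently of the others, which is
        # why a real cluster can run (or re-run) it on any node.
        intermediate.extend(map_phase(fragment))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: three small fragments of text.
fragments = ["the quick brown fox", "the lazy dog", "the fox"]
counts = run_job(fragments)
```

Because map_phase depends only on its own fragment, running it again after a simulated failure gives the same pairs. That property is what makes re-execution on another node safe.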
Related Products
HPCC System - Hadoop alternative
HPCC is a proven and battle-tested platform for manipulating, transforming, querying, and warehousing Big Data.
It supports two types of configuration: Thor and Roxie. Thor is responsible for consuming vast amounts of data and for transforming, linking, and indexing that data. It functions as a distributed file system with parallel processing power spread across the nodes. Roxie, the Data Delivery Engine, provides separate high-performance online query processing and data warehouse capabilities.
Cascading - Data Processing Workflows on Hadoop
Cascading is a Data Processing API, Process Planner, and Process Scheduler used for defining and executing complex, scale-free, and fault-tolerant data processing workflows on an Apache Hadoop cluster. It is a thin Java library and API that sits on top of Hadoop's MapReduce layer and is executed from the command line like any other Hadoop application.
Nutch
Nutch is open source web-search software. It builds on Lucene Java, adding web-specific components such as a crawler, a link-graph database, and parsers for HTML and other document formats.
CouchDB
Apache CouchDB is a document-oriented database that can be queried and indexed in a MapReduce fashion using JavaScript. CouchDB also offers incremental replication with bi-directional conflict detection and resolution.
Raven DB - document database for .NET
Raven is a document database for the .NET/Windows platform. Raven offers a flexible data model designed to fit the needs of real-world systems. Raven stores schema-less JSON documents, allows you to define indexes using LINQ queries, and focuses on low latency and high performance.
Sqoop - Transfers data between Hadoop and Datastores
Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. You can use Sqoop to import data from external structured datastores into Hadoop Distributed File System or related systems like Hive and HBase. Conversely, Sqoop can be used to extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses.
Mongodb-CSharp - C# driver to connect to MongoDB
This is a driver for connecting to MongoDB from .NET. It is written entirely in C# and has been developed and tested under both Windows and Mono 2.0 (Ubuntu 32-bit 9.04). Most features have been implemented, with a few remaining. The API is likely to change for a while but is quickly settling down.
Carrot2 - Search Results Clustering Engine
Carrot2 is an Open Source Search Results Clustering Engine. It clusters search results from various sources and organizes them into small collections of documents. Carrot2 offers ready-to-use components for fetching search results from various sources including Yahoo API, Google API, Bing API, eTools Meta Search, Lucene, Solr, Google Desktop and more.
Ganglia - scalable distributed monitoring system
Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization.
jCharts - Java based charting utility
jCharts is a 100% Java based charting utility that outputs a variety of charts. Servlets, JSPs, and Swing applications can use this library to generate charts. Supported chart types include Area, Area Stacked, Bar, Bar Clustered, Bar Clustered Horizontal, Bar Horizontal, Bar Stacked, Bar Stacked Horizontal, Combo, Hi/Low Open/Close, Line, Pie 2D, Pie 3D, Point, Radar, XY Plot, and more.