PigPen - Map-Reduce for Clojure

  •        0

PigPen is map-reduce for Clojure, or distributed Clojure. It compiles to Apache Pig or Cascading but you don't need to know much about either of them to use it. It provides the ability to write map-reduce queries as programs, not as scripts.




comments powered by Disqus

Related Projects

Hadoop Common

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. Hadoop common supports other Hadoop subprojects

HPCC System - Hadoop alternative

HPCC is a proven and battle-tested platform for manipulating, transforming, querying and data warehousing Big Data. It supports two type of configuration. Thor is responsible for consuming vast amounts of data, transforming, linking and indexing that data. It functions as a distributed file system with parallel processing power spread across the nodes. Roxie, the Data Delivery Engine, provides separate high-performance online query processing and data warehouse capabilities.


Nutch is open source web-search software. It builds on Lucene Java, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.

Hue - The open source Apache Hadoop UI

Hue is a Web application for interacting with Apache Hadoop. It supports a FileBrowser for accessing HDFS, JobBrowser for accessing MapReduce jobs (MR1/MR2-YARN), Job Designer for creating MapReduce/Streaming/Java jobs, HBase Browser for exploring and modifying HBase tables and data, Oozie App for submitting and scheduling workflows and bundles, A Pig/HBase/Sqoop2 shell, Beeswax application for executing Hive queries, Search app for querying Solr and Solr Cloud.

Terrier - Information Retrieval Platform

Terrier is a highly flexible, efficient, and effective open source search engine, readily deployable on large-scale collections of documents. Terrier implements state-of-the-art indexing and retrieval functionalities, and provides an ideal platform for the rapid development and evaluation of large-scale retrieval applications. Terrier can index large corpora of documents, and provides multiple indexing strategies, such as multi-pass, single-pass and large-scale MapReduce indexing.

Fluent NHibernate - Fluent mapping for model

Fluent, XML-less, compile safe, automated, convention-based mappings for NHibernate. Fluent NHibernate offers an alternative to NHibernate's standard XML mapping files. Rather than writing XML documents, you write mappings in strongly typed C# code. This allows for easy refactoring, improved readability and more concise code.

Terrastore - Scalable, elastic, consistent document store.

Terrastore is a modern document store which provides advanced scalability and elasticity features without sacrificing consistency. Terrastore is based on Terracotta, so it relies on an industry-proven, fast (and cool) clustering technology. Terrastore is accessed through the universally supported HTTP protocol. Terrastore is a distributed document store supporting single-cluster and multi-cluster deployments.

disco - a Map/Reduce framework for distributed computing

a Map/Reduce framework for distributed computing

Spark - Fast Cluster Computing

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.


a Map/Reduce framework for distributed computing