Aqueduct


Aqueduct is a framework for analyzing large data sets by composing small functional building blocks into complex pipeline graphs that are processed as streams.

http://aqueduct.codeplex.com/
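
The idea of composing small functional building blocks into a streamed pipeline can be illustrated with a minimal sketch. This is not Aqueduct's API; the language (Python) and every name below (`compose`, `parse_ints`, `keep_even`, `running_total`) are hypothetical, chosen only for illustration. The sketch also covers only a linear chain of stages, the simplest case of a pipeline graph.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

# A stage is any function that takes a stream and returns a stream.
# (Hypothetical type alias for this sketch; not part of Aqueduct.)
Stage = Callable[[Iterable], Iterator]


def compose(*stages: Stage) -> Stage:
    """Chain stages left to right into a single stage."""
    def pipeline(source: Iterable) -> Iterator:
        # Feed the output stream of each stage into the next one.
        return reduce(lambda stream, stage: stage(stream), stages, iter(source))
    return pipeline


def parse_ints(lines):
    """Turn non-empty text lines into integers, one at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)


def keep_even(numbers):
    """Pass through only the even numbers."""
    return (n for n in numbers if n % 2 == 0)


def running_total(numbers):
    """Emit the cumulative sum after each incoming number."""
    total = 0
    for n in numbers:
        total += n
        yield total


# Build a larger pipeline out of the three small blocks above.
even_totals = compose(parse_ints, keep_even, running_total)
result = list(even_totals(["1", "2", "", "4", "7", "10"]))
print(result)
```

Because every stage is a generator, items flow through the whole chain one at a time, so the input does not have to fit in memory and may even be unbounded.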

Related Projects

Apache Hive


The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage.

social-graph-analysis - Social Graph Analysis using Elastic MapReduce and PyPy



RNA-Expression-Analysis - Large Scale RNA Expression Analysis using Hadoop MapReduce



EventQL - The database for large-scale event analytics


EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs fast SQL and MapReduce queries. Its features include automatic partitioning, columnar storage, standard SQL support, scaling to petabytes, support for time-series and relational data, fast range scans, and more.

aqueduct-elastic - Connects to an elastic file server through Aqueduct





hraven - hRaven collects run time data and statistics from MapReduce jobs in an easily queryable format


hRaven collects run time data and statistics from map reduce jobs running on Hadoop clusters and stores the collected job history in an easily queryable format. For the jobs that are run through frameworks (Pig or Scalding/Cascading) that decompose a script or application into a DAG of map reduce jobs for actual execution, hRaven groups job history data together by an application construct. This will allow for easier visualization of all of the component jobs' execution for an application and more comprehensive trending and analysis over time.

DBsyslog - Web Traffic Analysis (MongoDB MapReduce)

CCF-1018625 - Cross-Language Bayesian Models for Web-Scale Text Analysis Using MapReduce



p3 - An open source pcap packet and NetFlow file analysis tool using Hadoop MapReduce and Hive.



bgpdoop - A tool for BGP analysis using Hadoop MapReduce and Hive

jumbune - Jumbune is an open-source project to optimize both YARN (v2) and older (v1) Hadoop-based solutions


Jumbune is an open-source product built for analyzing Hadoop clusters and MapReduce jobs. It provides development and administrative insights into Hadoop-based analytical solutions. It enables users to debug, profile, monitor, and validate analytical solutions hosted on decoupled clusters.

aqueduct - Data integration gem designed to increase data flow between applications



Redisson - Redis based In-Memory Data Grid for Java


Redisson - distributed Java objects and services (Set, Multimap, SortedSet, Map, List, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Executor service, Tomcat Session Manager, Scheduler service, JCache API) on top of Redis server. Rich Redis client.

kiji-mapreduce - A framework for MapReduce-based computation over data managed by KijiSchema



SpiralCrypt Encryption Tools


Command-line encryption tool for one-time, daemon, or stream data processing. Data stats, checksums, conversion to/from text. Data/keys from files, pipes, standard input. In-place/diverted processing or data-analysis-only. Random, file, password keys.

Kafka - A high-throughput distributed messaging system


Kafka provides a publish-subscribe solution that can handle all activity stream data and processing on a consumer-scale web site. This kind of activity (page views, searches, and other user actions) is a key ingredient in many of the social features on the modern web. This data is typically handled by "logging" and ad hoc log aggregation solutions due to the throughput requirements. This kind of ad hoc solution is a viable way of providing logging data to Hadoop.

snappy-spark - Apache Spark with SnappyData extensions


Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.

MapReduce on Cell


MapReduce is a simple and flexible parallel programming model initially proposed by Google for large scale data processing in a distributed computing environment. This project implements the MapReduce runtime and API for the Cell processor platform.

hdfs2cass - Hadoop MapReduce job to bulk load data into Cassandra

TreeReduction - process tree-structured data using Hadoop MapReduce

