
jstorm - Enterprise Stream Processing Engine


Alibaba JStorm is an enterprise-grade, fast, and stable stream processing engine. It runs programs up to 4x faster than Apache Storm, and it is easy to switch from record-at-a-time mode to mini-batch mode. JStorm is more than a stream processing engine: it aims to be a single solution for real-time requirements, an entire real-time ecosystem.
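
As a rough sketch, the snippet below wires up a word-count topology using the Storm-compatible API that JStorm exposes (the pre-1.0 backtype.storm namespace). SentenceSpout, SplitBolt, and CountBolt are hypothetical user-defined components shown only to illustrate the wiring.

```java
// A minimal topology sketch, assuming JStorm's Storm-compatible API.
// SentenceSpout, SplitBolt and CountBolt are hypothetical user-defined components.
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentences", new SentenceSpout(), 1);       // emits raw sentences
        builder.setBolt("split", new SplitBolt(), 2)
               .shuffleGrouping("sentences");                        // split sentences into words
        builder.setBolt("count", new CountBolt(), 2)
               .fieldsGrouping("split", new Fields("word"));         // group counts by word

        Config conf = new Config();
        conf.setNumWorkers(2);
        StormSubmitter.submitTopology("word-count", conf, builder.createTopology());
    }
}
```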

Apache Tez - A Framework for YARN-based Data Processing Applications in Hadoop


Apache Tez is an extensible framework for building high performance batch and interactive data processing applications, coordinated by YARN in Apache Hadoop. Tez improves the MapReduce paradigm by dramatically improving its speed, while maintaining MapReduce’s ability to scale to petabytes of data. Important Hadoop ecosystem projects like Apache Hive and Apache Pig use Apache Tez, as do a growing number of third party data access applications developed for the broader Hadoop ecosystem.
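
A hedged sketch of how a two-vertex DAG might be assembled and submitted with the Tez API follows. TokenProcessor and SumProcessor are hypothetical user-defined processors, and the edge configuration shown uses the Tez runtime library's key-value edge builder, which may vary across versions.

```java
// Sketch only: TokenProcessor and SumProcessor are hypothetical processor classes.
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.tez.client.TezClient;
import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.Edge;
import org.apache.tez.dag.api.ProcessorDescriptor;
import org.apache.tez.dag.api.TezConfiguration;
import org.apache.tez.dag.api.Vertex;
import org.apache.tez.dag.api.client.DAGClient;
import org.apache.tez.runtime.library.conf.OrderedPartitionedKVEdgeConfig;
import org.apache.tez.runtime.library.partitioner.HashPartitioner;

public class TezWordCount {
    public static void main(String[] args) throws Exception {
        TezConfiguration tezConf = new TezConfiguration();
        TezClient tezClient = TezClient.create("WordCount", tezConf);
        tezClient.start();

        // Key/value edge that shuffles words from the tokenizer to the summation vertex.
        OrderedPartitionedKVEdgeConfig edgeConf = OrderedPartitionedKVEdgeConfig
                .newBuilder(Text.class.getName(), IntWritable.class.getName(),
                        HashPartitioner.class.getName())
                .build();

        Vertex tokenizer = Vertex.create("Tokenizer",
                ProcessorDescriptor.create(TokenProcessor.class.getName()));
        Vertex summation = Vertex.create("Summation",
                ProcessorDescriptor.create(SumProcessor.class.getName()), 1);

        DAG dag = DAG.create("WordCount")
                .addVertex(tokenizer)
                .addVertex(summation)
                .addEdge(Edge.create(tokenizer, summation,
                        edgeConf.createDefaultEdgeProperty()));

        DAGClient dagClient = tezClient.submitDAG(dag);
        dagClient.waitForCompletion();
        tezClient.stop();
    }
}
```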

Fluo - Make incremental updates to large data sets stored in Apache Accumulo


Apache Fluo (incubating) is an open source implementation of Percolator (which populates Google's search index) for Apache Accumulo. Fluo makes it possible to update the results of a large-scale computation, index, or analytic as new data is discovered. When combining new data with existing data, Fluo offers reduced latency compared to batch processing frameworks (e.g., Spark, MapReduce).
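
As a rough illustration of the transactional write path, the sketch below assumes the Fluo client API; the configuration file name and the column names are placeholders.

```java
// A minimal sketch: open a Fluo client, write a cell inside a transaction, and commit.
// "fluo.properties" and the index/wordcount column are hypothetical placeholders.
import java.io.File;

import org.apache.fluo.api.client.FluoClient;
import org.apache.fluo.api.client.FluoFactory;
import org.apache.fluo.api.client.Transaction;
import org.apache.fluo.api.config.FluoConfiguration;
import org.apache.fluo.api.data.Bytes;
import org.apache.fluo.api.data.Column;

public class FluoWriteExample {
    public static void main(String[] args) throws Exception {
        FluoConfiguration config = new FluoConfiguration(new File("fluo.properties"));
        try (FluoClient client = FluoFactory.newClient(config);
             Transaction tx = client.newTransaction()) {
            // Writes are buffered locally and applied atomically on commit.
            tx.set(Bytes.of("doc:0001"), new Column("index", "wordcount"), Bytes.of("42"));
            tx.commit();
        }
    }
}
```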

Hazelcast Jet - Distributed data processing engine, built on top of Hazelcast


Hazelcast Jet is a distributed computing platform built for high-performance stream processing and fast batch processing. It embeds Hazelcast In-Memory Data Grid (IMDG) to provide a lightweight package of a processing engine and scalable in-memory storage. It offers a distributed java.util.stream API for Hazelcast data structures such as IMap and IList, as well as distributed implementations of java.util.{Queue, Set, List, Map} that are highly optimized for data processing.
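
A minimal sketch, assuming the Jet Pipeline API of the drawFrom/drainTo era (later releases renamed these to readFrom/writeTo): it reads entries from an IMap, transforms them, and writes the results to an IList.

```java
// Sketch of a small batch job on an embedded Jet instance; map/list names are placeholders.
import java.util.Map;

import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class JetExample {
    public static void main(String[] args) {
        JetInstance jet = Jet.newJetInstance();          // embeds Hazelcast IMDG
        jet.getMap("lines").put(0, "hello jet");

        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<Integer, String>map("lines"))
         .map(Map.Entry::getValue)
         .map(String::toUpperCase)
         .drainTo(Sinks.list("result"));

        jet.newJob(p).join();                            // run the job to completion
        System.out.println(jet.getList("result"));
        Jet.shutdownAll();
    }
}
```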

Apache Beam - Unified model for defining both batch and streaming data-parallel processing pipelines


Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.
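
For illustration, here is a minimal word-count pipeline with the Beam Java SDK; the runner is selected through PipelineOptions (for example --runner=FlinkRunner), so the same code can execute on any supported back-end. File names are placeholders.

```java
// A minimal Beam word-count sketch; input.txt and the wordcounts output prefix are placeholders.
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class BeamWordCount {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        p.apply(TextIO.read().from("input.txt"))
         .apply(FlatMapElements.into(TypeDescriptors.strings())
                 .via(line -> Arrays.asList(line.split("\\W+"))))     // split lines into words
         .apply(Count.perElement())                                   // count occurrences of each word
         .apply(MapElements.into(TypeDescriptors.strings())
                 .via(kv -> kv.getKey() + ": " + kv.getValue()))
         .apply(TextIO.write().to("wordcounts"));

        p.run().waitUntilFinish();                                    // block until the job completes
    }
}
```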

spring-cloud-dataflow - Spring Cloud Data Flow is a toolkit for building data integration and real-time data processing pipelines


Spring Cloud Data Flow is a toolkit for building data integration and real-time data processing pipelines. Pipelines consist of Spring Boot apps, built using the Spring Cloud Stream or Spring Cloud Task microservice frameworks.
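
As a sketch of a single pipeline stage, the following Spring Cloud Stream processor app uses the classic annotation-based binding model (Spring Cloud Stream 1.x/2.x); in Data Flow such an app would be registered and composed with other apps via the stream DSL, e.g. "http | transform | log".

```java
// A minimal processor app sketch: receives a message, transforms it, and sends it downstream.
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.messaging.handler.annotation.SendTo;

@SpringBootApplication
@EnableBinding(Processor.class)
public class TransformProcessorApplication {

    public static void main(String[] args) {
        SpringApplication.run(TransformProcessorApplication.class, args);
    }

    // Consumes from the input channel and publishes the transformed payload to the output channel.
    @StreamListener(Processor.INPUT)
    @SendTo(Processor.OUTPUT)
    public String transform(String payload) {
        return payload.toUpperCase();
    }
}
```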

batch-shipyard - Execute batch and HPC Dockerized workloads on Azure Batch with shared file system provisioning and linking support


Batch Shipyard enables execution of batch and HPC Dockerized workloads on Azure Batch, with shared file system provisioning and linking support. Additionally, it can provision and manage entire standalone remote file systems (storage clusters) in Azure, independent of any integrated Azure Batch functionality. Batch Shipyard is now integrated directly into Azure Cloud Shell, so you can execute any Batch Shipyard workload from your web browser or the Microsoft Azure Android and iOS apps.