Alibaba JStorm is an enterprise-grade, fast and stable stream processing engine. It runs programs up to 4x faster than Apache Storm, and it is easy to switch from record mode to mini-batch mode. It is more than a stream processing engine: it is one solution for real-time requirements, a whole real-time ecosystem.
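The record-mode vs. mini-batch-mode distinction can be sketched in plain Python. This is a conceptual illustration only, not JStorm's actual API; all names here are hypothetical:

```python
from typing import Callable, Iterable, Iterator, List

def process_records(stream: Iterable[int], fn: Callable[[int], int]) -> Iterator[int]:
    """Record mode: handle each event as soon as it arrives (lowest latency)."""
    for record in stream:
        yield fn(record)

def process_mini_batches(stream: Iterable[int], fn: Callable[[int], int],
                         batch_size: int) -> Iterator[List[int]]:
    """Mini-batch mode: buffer events and process them in small groups
    (higher throughput, slightly higher latency)."""
    batch: List[int] = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield [fn(r) for r in batch]
            batch = []
    if batch:  # flush the trailing partial batch
        yield [fn(r) for r in batch]

# The same user function works under either mode -- switching is a
# configuration choice, not a rewrite.
double = lambda x: x * 2
record_out = list(process_records(range(5), double))
batch_out = list(process_mini_batches(range(5), double, batch_size=2))
```

The point of the sketch is that the per-record function is identical in both modes; only the framing around it changes.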
stream-processing batch-processing real-time data-processing distributed

Apache Tez is an extensible framework for building high-performance batch and interactive data processing applications, coordinated by YARN in Apache Hadoop. Tez improves on the MapReduce paradigm by dramatically increasing its speed, while maintaining MapReduce's ability to scale to petabytes of data. Important Hadoop ecosystem projects like Apache Hive and Apache Pig use Apache Tez, as do a growing number of third-party data access applications developed for the broader Hadoop ecosystem.
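Tez's core idea is expressing a whole job as one directed acyclic graph of stages instead of a chain of separate MapReduce rounds. A minimal sketch of DAG-ordered execution using Python's standard library (the stage names are hypothetical, and this is not Tez's API):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# A hypothetical four-stage job as a single DAG: two independent map
# stages feed a join, which feeds a final sort. Expressed as MapReduce,
# this would require several chained jobs with HDFS writes in between.
dag = {
    "map_a": set(),
    "map_b": set(),
    "join": {"map_a", "map_b"},   # join depends on both map stages
    "sort": {"join"},             # sort depends on the join
}

# A scheduler can run stages in any order consistent with the edges.
execution_order = list(TopologicalSorter(dag).static_order())
```

Running stages in dependency order, without materializing intermediate results to disk between "jobs", is where the speedup over plain MapReduce comes from.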
map-reduce batch-processing data-processing big-data hadoop yarn directed-acyclic-graph

Hazelcast Jet is a distributed computing platform built for high-performance stream processing and fast batch processing. It embeds Hazelcast In-Memory Data Grid (IMDG) to provide a lightweight, simple-to-deploy package that includes scalable in-memory storage. Hazelcast Jet performs parallel execution to enable data-intensive applications to operate in near real-time.
in-memory data-grid big-data stream-processing data-processing real-time streams batch-processing

Build concurrent and multi-stage data ingestion and data processing pipelines with Elixir. It allows developers to consume data efficiently from different sources, known as producers, such as Amazon SQS, Apache Kafka, Google Cloud PubSub, RabbitMQ, and others. Broadway takes on the burden of defining concurrent GenStage topologies and provides a simple configuration API that automatically defines concurrent producers, concurrent processing, batch handling, and more, leading to both time- and cost-efficient ingestion and processing of data.
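The shape of pipeline Broadway configures for you -- a producer feeding several concurrent consumers that group messages into batches -- can be sketched in Python with the standard library. This is a conceptual illustration, not Broadway's Elixir API:

```python
import queue
import threading

def run_pipeline(messages, handle_batch, batch_size=3, workers=2):
    """Drain a producer queue with several concurrent consumers,
    grouping messages into batches before handing them off."""
    q = queue.Queue()
    for m in messages:          # the "producer" side
        q.put(m)
    results, lock = [], threading.Lock()

    def consume():
        batch = []
        while True:
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
            if len(batch) == batch_size:
                out = handle_batch(batch)
                with lock:
                    results.extend(out)
                batch = []
        if batch:               # flush the trailing partial batch
            out = handle_batch(batch)
            with lock:
                results.extend(out)

    threads = [threading.Thread(target=consume) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

processed = run_pipeline(range(10), lambda batch: [x + 1 for x in batch])
```

Batching matters when the downstream operation (a database write, an API call) is much cheaper per item when amortized over a group.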
data-processing concurrent data-pipeline batch-processing

Easy Batch parent module
batch batch-processing

Apache Fluo (incubating) is an open source implementation of Percolator (which populates Google's search index) for Apache Accumulo. Fluo makes it possible to update the results of a large-scale computation, index, or analytic as new data is discovered. When combining new data with existing data, Fluo offers reduced latency compared to batch processing frameworks (e.g., Spark, MapReduce).
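The latency argument behind Fluo can be sketched with a word-count toy: instead of recomputing the whole result when a new document arrives (batch style), fold only the new document into the existing result. This illustrates the idea only; Fluo's actual mechanism is transactional updates over Accumulo, not Python dictionaries:

```python
from collections import Counter

def batch_recount(all_docs):
    """Batch style: recompute the whole word count from scratch
    every time the corpus changes."""
    counts = Counter()
    for doc in all_docs:
        counts.update(doc.split())
    return counts

def incremental_update(counts, new_doc):
    """Percolator/Fluo style: merge one new document into the
    existing result without revisiting old data."""
    counts.update(new_doc.split())
    return counts

docs = ["a b a", "b c"]
full = batch_recount(docs)                                   # touches every doc
incr = incremental_update(batch_recount(docs[:1]), docs[1])  # touches only the new doc
```

Both paths reach the same answer; the incremental path just does work proportional to the new data rather than the whole corpus.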
percolator incremental-updates batch-processing big-data

Hazelcast Jet is a distributed computing platform built for high-performance stream processing and fast batch processing. It embeds Hazelcast In-Memory Data Grid (IMDG) to provide a lightweight package combining a processor with scalable in-memory storage. It offers a distributed java.util.stream API for Hazelcast data structures such as IMap and IList, as well as distributed implementations of the java.util.{Queue, Set, List, Map} data structures, highly optimized for use in processing.
data-grid data-processing data-streaming in-memory batch-processing stream-processing

Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam's supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.
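The "unified model" idea -- one pipeline definition that runs unchanged over a bounded (batch) or unbounded (streaming) source -- can be sketched in plain Python. This is not the Beam SDK's PCollection/PTransform API, just the concept:

```python
import itertools

def build_pipeline(*transforms):
    """Compose element-wise transforms into one pipeline function
    that is agnostic about whether its source is finite."""
    def run(source):
        for element in source:
            for t in transforms:
                element = t(element)
            yield element
    return run

# One pipeline definition...
pipeline = build_pipeline(lambda x: x * 2, lambda x: x + 1)

# ...executed over a bounded source (batch)...
batch_result = list(pipeline([1, 2, 3]))

# ...and over an unbounded source (streaming), taking the first 3 results.
stream_result = list(itertools.islice(pipeline(itertools.count(1)), 3))
```

In Beam proper, the back-end (Flink, Spark, Dataflow, ...) plays the role of the `run` loop, and the same separation lets one definition target any of them.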
data-processing data-streaming batch-processing stream-processing distributed big-data

Spring Cloud Data Flow is a toolkit for building data integration and real-time data processing pipelines. Pipelines consist of Spring Boot apps, built using the Spring Cloud Stream or Spring Cloud Task microservice frameworks.
microservices-architecture orchestration stream-processing batch-processing predictive-analytics cloud-native portability

Additionally, Batch Shipyard provides the ability to provision and manage entire standalone remote file systems (storage clusters) in Azure, independent of any integrated Azure Batch functionality. Batch Shipyard is now integrated directly into Azure Cloud Shell, and you can execute any Batch Shipyard workload using your web browser or the Microsoft Azure Android and iOS app.
azure-batch docker hpc mpi gpu infiniband rdma azure nvidia-docker batch-processing nfs glusterfs smb azure-functions

PSSH is supported on Python 2.5 and later (including Python 3.1 and later). It was originally written and maintained by Brent N. Chun. Due to his busy schedule, Brent handed over maintenance to Andrew McNabb in October 2009. This project was originally hosted at Google Code; since Google Code has been shut down and the project has not appeared elsewhere, I (lilydjwg) have exported it to GitHub.
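The core pattern behind pssh -- run the same command against many hosts in parallel and collect the outputs -- can be sketched with the standard library. The "ssh" hop is faked with a local `echo` so the sketch is runnable anywhere; the host names are made up:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_everywhere(hosts, command):
    """Run `command` against each host concurrently and return
    a {host: output} mapping."""
    def run_one(host):
        # Real pssh would invoke something like: ssh <host> <command>
        out = subprocess.run(["echo", f"{host}: {command}"],
                             capture_output=True, text=True)
        return host, out.stdout.strip()

    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        return dict(pool.map(run_one, hosts))

results = run_everywhere(["web1", "web2", "db1"], "uptime")
```

Threads suit this workload because each task mostly waits on I/O (the remote shell), so wall-clock time approaches that of the slowest single host rather than the sum of all of them.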
pssh ssh parallel-processing batch-processing

The monad-batcher package provides the Batcher applicative monad, which batches commands for later, more efficient execution. See the example.
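The batching idea -- record commands now, execute them all at once later -- can be sketched in Python, though monad-batcher itself is a Haskell library and expresses this through an applicative monad. Class and method names below are hypothetical:

```python
class Batcher:
    """Collect commands and defer execution so many commands can be
    run as one batch (e.g. one database round-trip instead of N)."""

    def __init__(self, execute_batch):
        self.execute_batch = execute_batch  # runs all commands at once
        self.pending = []

    def schedule(self, command):
        """Record a command; returns a handle usable after run()."""
        handle = len(self.pending)
        self.pending.append(command)
        return handle

    def run(self):
        """Execute everything scheduled so far in a single batch."""
        results = self.execute_batch(self.pending)
        self.pending = []
        return results

# Two lookups scheduled separately, executed in one batch call.
batcher = Batcher(lambda keys: {k: len(k) for k in keys})
h1 = batcher.schedule("alpha")
h2 = batcher.schedule("be")
results = batcher.run()
```

In the Haskell library the applicative structure does the same job: independent commands composed applicatively can be collected and fired as one batch.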
haskell-library monad batch-processing

Launcher is a utility for performing simple, data parallel, high throughput computing (HTC) workflows on clusters, massively parallel processor (MPP) systems, workgroups of computers, and personal machines. Launcher does not need to be compiled. Unpack the tarball or clone the repository in the desired directory. Then, set LAUNCHER_DIR to point to that location. Python 2.7 or greater and hwloc are required for full functionality. See INSTALL for more information.
hpc batch-processing tacc xsede parametric-submission heterogeneous launcher shell

Asakusa Framework Parent POM
asakusa-framework batch batch-processing hadoop mapreduce data-flow framework big-data

A Java API for creating unified big-data processing flows, providing an engine-independent programming model that can express both batch and stream transformations.
big-data apache-flink apache-spark java-api hadoop kafka hdfs unified-bigdata-processing streaming-data batch-processing

🐤 pssst! - ida-batch_decompile is also part of project: unbox - a no-brainer command-line tool to unpack and decompile all sorts of things.
ida decompile ida-plugin batch-processing ida-batch-decompile reverse-engineering

CORB is a Java tool designed for bulk content-reprocessing of documents stored in MarkLogic. CORB stands for COntent Reprocessing in Bulk and is a multi-threaded workhorse tool at your disposal. In a nutshell, CORB works off a list of documents in a database and performs operations against those documents. CORB operations can include generating a report across all documents, manipulating the individual documents, or a combination thereof. This document provides a comprehensive overview of CORB and the options available to customize the execution of a CORB job, as well as the ModuleExecutor Tool, which can be used to execute a single (XQuery or JavaScript) module in MarkLogic.
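CORB's "work off a list of documents with many threads" pattern can be sketched with Python's standard library. This is only the shape of the idea -- CORB itself is Java, talks to MarkLogic over XCC, and runs XQuery/JavaScript modules; the document store below is a plain dictionary standing in for the database:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the database: URIs mapped to document contents.
documents = {
    "/docs/1.xml": "<a>one</a>",
    "/docs/2.xml": "<a>two</a>",
    "/docs/3.xml": "<a>three</a>",
}

def process_uri(uri):
    """Per-document operation -- here, collecting a size for a report.
    In CORB this role is played by the process module."""
    return uri, len(documents[uri])

# Work off the URI list with a pool of threads, as CORB's job runner does.
with ThreadPoolExecutor(max_workers=4) as pool:
    report = dict(pool.map(process_uri, sorted(documents)))
```

The split mirrors CORB's design: one piece selects the URIs to work on, another defines what to do per document, and the multi-threaded runner in between is generic.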
batch-processing marklogic corb xcc corb-jobs batch-job javascript-modules xquery

Recursive Neural Networks for PyTorch, with efficient batch processing.
pytorch neural-network lstm tree-lstm recursive-neural-networks tree-structure batch-processing deep-learning

The executable file bin/sesh is all you need.
ops batch-processing ssh