sp - Stream Processors on Kafka in Golang

A Swiss army knife for Kafka + Golang stream processing.

https://github.com/xtaci/sp

Related Projects

kcptun - A Secure Tunnel Based On KCP with N:M Multiplexing

  •    Go

kcptun maintains a single website: github.com/xtaci/kcptun. Any websites other than github.com/xtaci/kcptun are not endorsed by xtaci. kcptun won't publish anything on any social media. Download precompiled Releases.

automi - A stream API for Go (alpha)

  •    Go

Automi abstracts away (not too far away) the gnarly details of using Go channels to create pipelined and staged processes. It exposes a higher-level API to compose and integrate streams of data over Go channels for processing. This is still alpha work; the API is still evolving and changing rapidly with each commit (beware). Nevertheless, the core concepts have been bolted onto the API. The project's example shows how Automi could be used to compose a multi-stage pipeline to process a stream of data from a CSV file. The code implements stream processing based on the pipeline patterns. What is clearly absent, however, is the low-level channel communication code to coordinate and synchronize goroutines. The programmer is given a clean surface to express business code without the noisy channel infrastructure code. Under the covers, however, Automi uses patterns similar to the pipeline patterns to create safe and concurrent structures that execute the processing of the data stream.
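
Automi's own example is not reproduced in this listing; for context, the plumbing such a library hides looks roughly like the sketch below. This is plain Go illustrating the channel-based pipeline pattern, with made-up stage names, not Automi's (still evolving) API.

    package main

    import (
        "fmt"
        "strings"
    )

    // produce emits each record on a channel and closes it when the source is drained.
    func produce(records []string) <-chan string {
        out := make(chan string)
        go func() {
            defer close(out)
            for _, r := range records {
                out <- r
            }
        }()
        return out
    }

    // toUpper is one pipeline stage: it reads from the upstream channel,
    // transforms each record, and forwards it downstream.
    func toUpper(in <-chan string) <-chan string {
        out := make(chan string)
        go func() {
            defer close(out)
            for r := range in {
                out <- strings.ToUpper(r)
            }
        }()
        return out
    }

    func main() {
        // Stages are wired together by passing channels; ranging over the final
        // channel drains the pipeline until the source closes.
        for r := range toUpper(produce([]string{"alice,30", "bob,25"})) {
            fmt.Println(r)
        }
    }

Libraries like Automi wrap exactly this kind of goroutine-and-channel coordination behind a declarative, composable API.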

Hazelcast Jet - A general purpose distributed data processing engine, built on top of Hazelcast.

  •    Java

Hazelcast Jet is a distributed computing platform built for high-performance stream processing and fast batch processing. It embeds Hazelcast In-Memory Data Grid (IMDG) to provide a lightweight, simple-to-deploy package that includes scalable in-memory storage. Hazelcast Jet performs parallel execution to enable data-intensive applications to operate in near real-time.

wallaroo - Build and scale real-time data applications as easily as writing a Python script

  •    Pony

Wallaroo is a fast, elastic data processing engine that rapidly takes you from prototype to production by eliminating infrastructure complexity.

Hazelcast Jet - Distributed data processing engine, built on top of Hazelcast

  •    Java

Hazelcast Jet is a distributed computing platform built for high-performance stream processing and fast batch processing. It embeds Hazelcast In-Memory Data Grid (IMDG) to provide a lightweight package of a processor and scalable in-memory storage. It supports the distributed java.util.stream API for Hazelcast data structures such as IMap and IList, and provides distributed implementations of java.util.{Queue, Set, List, Map} that are highly optimized for processing.


bistro - A general-purpose data analysis engine radically changing the way batch and stream data is processed

  •    Java

The general goal of Bistro is data processing. By data processing we mean deriving new data from existing data. Bistro assumes that data is represented as a number of sets of elements. Each element is a tuple, which is a combination of column values. A value can be any (Java) object.

goka - Goka is a compact yet powerful distributed stream processing library for Apache Kafka written in Go

  •    Go

Goka is a compact yet powerful distributed stream processing library for Apache Kafka written in Go. Goka aims to reduce the complexity of building highly scalable and highly available microservices. Goka extends the concept of Kafka consumer groups by binding a state table to them and persisting them in Kafka. Goka provides sane defaults and a pluggable architecture.
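
To illustrate the state-table idea, here is a minimal sketch of a Goka processor that counts messages per key and lets Goka persist the count table back to Kafka. It follows the DefineGroup/Input/Persist style from Goka's documentation, but the broker and topic names are placeholders and the exact signatures may differ between releases.

    package main

    import (
        "context"
        "log"

        "github.com/lovoo/goka"
        "github.com/lovoo/goka/codec"
    )

    func main() {
        brokers := []string{"localhost:9092"} // placeholder broker address

        // The group definition binds an input stream to a state table
        // ("group table") that Goka persists in a Kafka topic.
        g := goka.DefineGroup("example-group",
            goka.Input("example-stream", new(codec.String), func(ctx goka.Context, msg interface{}) {
                // Read the current per-key count from the table, bump it, write it back.
                var count int64
                if v := ctx.Value(); v != nil {
                    count = v.(int64)
                }
                ctx.SetValue(count + 1)
            }),
            goka.Persist(new(codec.Int64)),
        )

        p, err := goka.NewProcessor(brokers, g)
        if err != nil {
            log.Fatal(err)
        }
        if err := p.Run(context.Background()); err != nil {
            log.Fatal(err)
        }
    }

Because the table is persisted in Kafka, a restarted or rebalanced processor instance can rebuild its local state by replaying the table topic, which is what makes such a service highly available.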

faust - Python Stream Processing

  •    Python

Faust is a stream processing library, porting the ideas from Kafka Streams to Python. It is used at Robinhood to build high-performance distributed systems and real-time data pipelines that process billions of events every day.

spring-cloud-dataflow - Spring Cloud Data Flow is a toolkit for building data integration and real-time data processing pipelines

  •    Java

Spring Cloud Data Flow is a toolkit for building data integration and real-time data processing pipelines. Pipelines consist of Spring Boot apps, built using the Spring Cloud Stream or Spring Cloud Task microservice frameworks.

jstorm - Enterprise Stream Process Engine

  •    Java

Alibaba JStorm is an enterprise-grade, fast and stable streaming process engine. It runs programs up to 4x faster than Apache Storm, and it is easy to switch from record mode to mini-batch mode. It is not only a streaming process engine but a whole real-time ecosystem: one solution for real-time requirements.

streaming-benchmarks - Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink,

  •    Java

At Yahoo we have adopted Apache Storm as our stream processing platform of choice. But that was in 2012, and the landscape has changed significantly since then. Because of this we really want to know what Storm is good at, where it needs to be improved compared to other systems, and what its limitations are compared to other tools, so we can recommend the best tool for the job to our customers. To do this we started to look for stream processing benchmarks that we could use for this evaluation, but all of them ended up lacking in several fundamental areas; primarily, they did not test anything close to a real-world use case, so we decided to write a simple one. This is the first round of these tests. The tool here is not polished and only covers three tools and one specific use case. We hope to expand this in the future in terms of the tools tested, the variety of processing tested, and the metrics gathered. Code is licensed under the Apache 2.0 license; see the LICENSE file for terms.

Apache Flink - Platform for Scalable Batch and Stream Data Processing

  •    Java

Apache Flink is an open source platform for scalable batch and stream data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.

Apache Beam - Unified model for defining both batch and streaming data-parallel processing pipelines

  •    Java

Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.
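
To make the define-then-run model concrete, here is a minimal sketch using Beam's Go SDK; the import paths and helpers follow the SDK's published examples, but treat them as assumptions and check the current documentation, since the SDK moves quickly.

    package main

    import (
        "context"
        "strings"

        "github.com/apache/beam/sdks/v2/go/pkg/beam"
        "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
        "github.com/apache/beam/sdks/v2/go/pkg/beam/x/debug"
    )

    func main() {
        beam.Init()

        // The program only defines the pipeline; a runner executes it.
        p := beam.NewPipeline()
        s := p.Root()

        // A small in-memory source transformed by a ParDo.
        lines := beam.Create(s, "to be", "or not to be")
        upper := beam.ParDo(s, strings.ToUpper, lines)
        debug.Print(s, upper)

        // beamx.Run dispatches to the runner selected by the --runner flag;
        // the local direct runner is the default.
        if err := beamx.Run(context.Background(), p); err != nil {
            panic(err)
        }
    }

The same pipeline definition can be handed to the Flink, Spark, or Dataflow runners without changing the user code, which is the point of Beam's unified model.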

incubator-doris - Palo, an MPP data warehouse

  •    C++

Palo is an MPP-based interactive SQL data warehouse for reporting and analysis. It mainly integrates the technology of Google Mesa and Apache Impala. Unlike other popular SQL-on-Hadoop systems, Palo is designed to be a simple, single, tightly coupled system that does not depend on other systems. Palo provides not only high-concurrency, low-latency point query performance but also high-throughput ad-hoc analysis queries; not only batch data loading but also near real-time mini-batch data loading. Palo also provides high availability, reliability, fault tolerance, and scalability. Its main features are simplicity (of developing, deploying, and using) and the ability to meet many data serving requirements in a single system.

At Baidu, the largest Chinese search engine, we run a two-tiered data warehousing system for data processing, reporting, and analysis. Similar to the lambda architecture, the whole data warehouse comprises data processing and data serving. Data processing does the heavy lifting of big data: cleaning data, merging and transforming it, analyzing it, and preparing it for use by end-user queries; data serving is designed to serve queries against that data for different use cases. Currently data processing uses batch and stream data processing technology such as Hadoop, Spark, and Storm; Palo is a SQL data warehouse for serving online, interactive data reporting and analysis queries.

Samza - Distributed Stream Processing Framework

  •    Java

Apache Samza is a distributed stream processing framework. It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management. It provides a very simple callback-based process-message API that should be familiar to anyone who has used Map/Reduce. Samza was originally developed at LinkedIn. It is currently used to process tracking data and service log data, and for data ingestion pipelines for real-time services.

riemann - A network event stream processing system, in Clojure.

  •    Clojure

Riemann aggregates events from your servers and applications with a powerful stream processing language.

awesome-streaming - a curated list of awesome streaming frameworks, applications, etc

  •    

A curated list of awesome streaming (stream processing) frameworks, applications, readings and other resources. Inspired by other awesome projects.

game - a game service using stream processing paradigm

  •    Go

A game service using the stream processing paradigm.