
smart_open - Utils for streaming large files (S3, HDFS, gzip, bz2...)

  •    Python

A few optional keyword arguments are useful only for S3 access; they are passed to boto.connect_s3() as keyword arguments. The S3 reader supports gzipped content, as long as the key is obviously a gzipped file (e.g. ends with ".gz").
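The suffix rule described above (treat a file as gzipped when its name ends with ".gz") can be sketched with the standard library alone. This is an illustrative sketch, not smart_open's implementation: the helper names open_text_stream, count_lines and demo are hypothetical, and it reads local files rather than S3 keys.

```python
import gzip
import os
import tempfile


def open_text_stream(path):
    """Open `path` for line-by-line reading, decompressing when it ends in .gz.

    Hypothetical helper: mirrors only the ".gz suffix" rule from the text.
    """
    if path.endswith(".gz"):
        return gzip.open(path, mode="rt", encoding="utf-8")
    return open(path, mode="r", encoding="utf-8")


def count_lines(path):
    """Stream the file one line at a time, so it never has to fit in memory."""
    with open_text_stream(path) as stream:
        return sum(1 for _ in stream)


def demo():
    """Write the same three lines plain and gzipped, then count both."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "log.txt")
        zipped = os.path.join(tmp, "log.txt.gz")
        with open(plain, "w", encoding="utf-8") as f:
            f.write("a\nb\nc\n")
        with gzip.open(zipped, "wt", encoding="utf-8") as f:
            f.write("a\nb\nc\n")
        return count_lines(plain), count_lines(zipped)
```

Calling demo() returns the line counts for the plain and the gzipped copy, which should match because both are read through the same streaming interface.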

river - 🌊 Online machine learning in Python

  •    Python

River is a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River's ambition is to be the go-to library for doing machine learning on streaming data. As a quick example, we'll train a logistic regression to classify the website phishing dataset. Here's a look at the first observation in the dataset.
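To make "online" learning concrete, here is a minimal pure-Python sketch of a logistic regression that is updated one observation at a time. It does not use River or the phishing dataset: the class, its method names, the feature name has_ip_in_url and the toy stream are all assumptions made for illustration.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression trained by one SGD step per observation (sketch)."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.weights = {}  # feature name -> weight, grown lazily
        self.bias = 0.0

    def predict_proba_one(self, x):
        """Return P(y = True) for a single feature dict."""
        z = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        """Update the model from one (features, label) pair."""
        error = self.predict_proba_one(x) - (1.0 if y else 0.0)
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.learning_rate * error * v
        self.bias -= self.learning_rate * error
        return self


def toy_stream(n):
    """Hypothetical stream: the label simply equals the single binary feature."""
    for i in range(n):
        flag = i % 2 == 0
        yield {"has_ip_in_url": 1.0 if flag else 0.0}, flag


def train(n=200):
    model = OnlineLogisticRegression()
    for x, y in toy_stream(n):
        model.learn_one(x, y)  # the model never sees the stream as a whole
    return model
```

The point of the design is that learn_one needs only the current observation, so memory use stays constant however long the stream runs.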

Kafka UI - Open-Source Web UI for Apache Kafka Management

  •    TypeScript

Kafka UI is a web UI for monitoring and managing Apache Kafka clusters. It is a simple tool that makes your data flows observable, helps you find and troubleshoot issues faster, and helps you deliver optimal performance. Its lightweight dashboard makes it easy to track key metrics of your Kafka clusters: Brokers, Topics, Partitions, Production, and Consumption.

Pravega - Streaming as a new software defined storage primitive

  •    Java

Pravega is an open source distributed storage service implementing Streams. It offers Stream as the main primitive for the foundation of reliable storage systems: a high-performance, durable, elastic, and unlimited append-only byte stream with strict ordering and consistency.

Memgraph - Build modern, graph-based applications on top of your streaming data in minutes

  •    C++

Memgraph is a streaming graph application platform that helps you wrangle your streaming data, build sophisticated models that you can query in real-time, and develop graph applications.

awesome-kafka - A list about Apache Kafka


This list is for anyone who wishes to learn about Apache Kafka but does not have a starting point. You can help by sending Pull Requests to add more information.

cinje - A Pythonic and ultra fast template engine DSL.

  •    Python

Cinje is a modern, elegant template engine constructed as a Python domain specific language (DSL) that integrates into your applications as any other Python code would: by importing it. Your templates are transformed from their source into clean, straightforward, and understandable Python source before the Python interpreter compiles it to bytecode. The name is a word from the constructed language Lojban, a combination of Hindi "śikana", English "wrinkle", and Chinese "zhé". It translates as "is a wrinkle/crease/fold [shape] in". It's also a Hungarian noun representing the possessive third-person singular form of "cin", meaning "tin". The "c" makes a "sh" sound, and the "j" makes a "jy" sound, almost like the "si" in "vision". Correct use does not capitalize the name except at the beginning of sentences.

euphoria - Euphoria is an open source Java API for creating unified big-data processing flows

  •    Java

A Java API for creating unified big-data processing flows providing an engine independent programming model which can express both batch and stream transformations.

imtsl - IMTSL - Incremental and Multi-feature Tensor Subspace Learning

  •    Matlab

Background subtraction (BS) is the art of separating moving objects from their background. Background Modeling (BM) is one of the main steps of the BS process. Several subspace learning (SL) algorithms based on matrix and tensor tools have been used to perform the BM of scenes. However, several SL algorithms work as batch processes, which increases memory consumption when the data size is very large. Moreover, these algorithms are not suitable for streaming data, where the full size of the data is unknown. In this work, we propose an incremental tensor subspace learning method that uses only a small part of the entire data and updates the low-rank model incrementally when new data arrive. In addition, the multi-feature model allows us to build a robust low-rank background model of the scene. Experimental results show that the proposed method achieves interesting results for the background subtraction task. The source code is available only for academic/research purposes (non-commercial).

RxHttpClient - Simple Http client (Use RxSwift for stream data)

  •    Swift

RxHttpClient is a "reactive wrapper" around NSURLSession. Under the hood it implements session delegates (like NSURLSessionDelegate or NSURLSessionTaskDelegate) and translates session events into Observables using RxSwift. The main purpose of this framework is to make "streaming" data as simple as possible and to provide convenient features for caching data. RxHttpClient uses RxSwift, so RxSwift should be included in the Cartfile.

pravega-samples - Sample Applications for Pravega.

  •    Java

This repository contains code samples that demonstrate how developers can work with Pravega. We also provide code samples that connect analytics engines such as Flink and Hadoop with Pravega as a storage substrate for data streams. For more information on Pravega, we recommend reading the documentation and the developer guide.

frizzle - The magic message bus

  •    Go

Frizzle is a magic message (Msg) bus designed for parallel processing with many goroutines. Start with the example implementation which shows a simple canonical implementation of a Processor on top of Frizzle and most of the core functions.

Saber - Window-Based Hybrid CPU/GPU Stream Processing Engine

  •    Java

Owner of artifact grants ACM permission to serve the artifact to users of the ACM Digital Library. Saber has been implemented in Java and C. The Java code is compiled and packaged using Apache Maven (3.3.1) and the Java SDK ( The C code is compiled and packaged using GNU make (3.81) and gcc (4.8.4).

machine - Machine is a workflow/pipeline library for processing data

  •    Go

Machine is a library for creating data workflows. These workflows can be either very concise or quite complex, even allowing cycles for flows that need retry or self-healing mechanisms. It supports OpenTelemetry spans and metrics out of the box, and supports building dynamic pipelines using native Go plugins and HashiCorp- or yaegi-based plugins by using the providers here. Contributions, issues and feature requests are welcome. Feel free to check the issues page if you want to contribute. Check the contributing guide.

streaming-integrator - A stream processing runtime that allows connecting any streaming data source to any destination and act on it

  •    Python

WSO2 Streaming Integrator (SI) is a streaming data processing server that allows you to integrate streaming data and take action based on streaming data. WSO2 SI is powered by Siddhi.io, a well-known cloud native open source stream processing engine. Siddhi lets users write complex stream processing logic using a SQL-like language known as SiddhiQL. You can aggregate, transform, enrich, analyze, cleanse and correlate streams of data on the fly using Siddhi queries and constructs.

makinage - Stream Processing Made Easy

  •    Python

Maki Nage is a Python stream processing library and framework. It provides expressive and extensible APIs. Maki Nage speeds up the development of stream applications. It can be used to process stream and batch data. More than that, it lets you develop an application with batch data and deploy it as a Kafka micro-service. Read the book to learn more.


  •    Python

Project files for the accompanying post, Streaming Data Analytics with Amazon Kinesis Data Firehose, Amazon Redshift, and Amazon QuickSight. See post for the most up-to-date instructions for using the source code.

utilities - Utilities for Modern C++

  •    C++

This library includes many useful utilities for C++. Make sure you have a modern C++ compiler installed, and you're ready to run all the examples in the project.
