Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON. With Miller, you get to use named fields without needing to count positional indices, using familiar formats such as CSV, TSV, JSON, and positionally-indexed data.
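For instance, because mlr is a command-line tool, it can be driven from any language that can spawn a process. Here is a minimal sketch in Python; the example.csv file and the shape field are hypothetical, and mlr is assumed to be on the PATH:

```python
import subprocess

# Hypothetical CSV file with a named "shape" column; mlr sorts by field name,
# reading CSV (--icsv) and emitting JSON (--ojson) with no positional indices.
result = subprocess.run(
    ["mlr", "--icsv", "--ojson", "sort", "-f", "shape", "example.csv"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```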
data-processing data-cleaning csv csv-files csv-format csv-reader streaming-data streaming-algorithms tsv json json-data data-reduction data-regression statistics statistical-analysis devops devops-tools tabular-data command-line command-line-tools

There are a few optional keyword arguments that are useful only for S3 access. These are passed to boto.s3_connect() as keyword arguments. The S3 reader supports gzipped content, as long as the key is obviously a gzipped file (e.g. ends with ".gz").
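A minimal sketch of streaming such a key (the bucket and key below are hypothetical; newer releases expose smart_open.open(), while older ones use smart_open.smart_open()):

```python
from smart_open import open  # drop-in replacement for the built-in open()

# Hypothetical S3 key; because it ends with ".gz", the content is
# transparently decompressed while being streamed from S3.
with open("s3://my-bucket/logs/2021-01-01.log.gz", "r") as fin:
    for line in fin:
        print(line.rstrip())
```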
s3 hdfs webhdfs boto streaming file streaming-data gzip-stream bz2

River is a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River's ambition is to be the go-to library for doing machine learning on streaming data. As a quick example, we'll train a logistic regression to classify the website phishing dataset. Here's a look at the first observation in the dataset.
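A hedged sketch along the lines of River's quickstart (module paths may differ between releases), which first prints that observation and then trains online:

```python
from river import compose, datasets, linear_model, metrics, preprocessing

dataset = datasets.Phishing()

# Peek at the first observation: a dict of features and a boolean label.
x, y = next(iter(dataset))
print(x, y)

model = compose.Pipeline(
    preprocessing.StandardScaler(),
    linear_model.LogisticRegression(),
)
metric = metrics.Accuracy()

# Online learning: predict, score, then learn from each observation in turn.
for x, y in dataset:
    y_pred = model.predict_one(x)
    metric.update(y, y_pred)
    model.learn_one(x, y)

print(metric)
```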
data-science machine-learning streaming online-learning streaming-data concept-drift incremental-learning online-machine-learning online-statistics

Kafka UI is a UI for Apache Kafka to monitor and manage Apache Kafka clusters. It is a simple tool that makes your data flows observable, helps you find and troubleshoot issues faster, and helps deliver optimal performance. Its lightweight dashboard makes it easy to track key metrics of your Kafka clusters - Brokers, Topics, Partitions, Production, and Consumption.
kafka big-data web-ui streams kafka-connect apache-kafka-ui kafka-producer kafka-client kafka-streams hacktoberfest streaming-data kafka-manager kafka-cluster event-streaming cluster-management kafka-brokers

Pravega is an open source distributed storage service implementing Streams. It offers Stream as the main primitive for the foundation of reliable storage systems: a high-performance, durable, elastic, and unlimited append-only byte stream with strict ordering and consistency.
streaming streaming-data distributed-storage real-time-data data-ingestion analytics

Memgraph is a streaming graph application platform that helps you wrangle your streaming data, build sophisticated models that you can query in real time, and develop graph applications.
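As a small, hedged sketch of querying it from Python: Memgraph speaks the Bolt protocol, so a Bolt-compatible client such as the neo4j driver can run Cypher against a local instance. The port, credentials, and data below are assumptions:

```python
from neo4j import GraphDatabase  # any Bolt-compatible driver works

# Assumes a local Memgraph instance on the default Bolt port with no auth.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

with driver.session() as session:
    # Hypothetical graph: store an event and query it back with Cypher.
    session.run("CREATE (:Event {name: $name, ts: $ts})", name="click", ts=1)
    result = session.run("MATCH (e:Event) RETURN e.name AS name, e.ts AS ts")
    for record in result:
        print(record["name"], record["ts"])

driver.close()
```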
kafka graph graph-algorithms nosql stream-processing graph-database kafka-streams cypher graph-analysis streaming-data opencypher

This list is for anyone wishing to learn about Apache Kafka but who does not have a starting point. You can help by sending pull requests to add more information.
kafka streaming-data data-pipeline stream-processing apache-kafka apache-spark

Cinje is a modern, elegant template engine constructed as a Python domain-specific language (DSL) that integrates into your applications as any other Python code would: by importing your templates. Your templates are transformed from their source into clean, straightforward, and understandable Python source prior to the Python interpreter compiling them to bytecode. The name is a word from the constructed language Lojban, a combination of Hindi "śikana", English "wrinkle", and Chinese "zhé"; it translates as "is a wrinkle/crease/fold [shape] in". It is also a Hungarian noun, the possessive third-person singular form of "cin", meaning "tin". The "c" makes a "sh" sound and the "j" makes a "zh" sound, like the "si" in "vision". Correct use does not capitalize the name except at the beginning of sentences.
pypy template-engine streaming-data text-processing dsl cpython python-2 python-3

A Java API for creating unified big-data processing flows, providing an engine-independent programming model that can express both batch and stream transformations.
big-data apache-flink apache-spark java-api hadoop kafka hdfs unified-bigdata-processing streaming-data batch-processing

Background subtraction (BS) is the art of separating moving objects from their background. Background Modeling (BM) is one of the main steps of the BS process. Several subspace learning (SL) algorithms based on matrix and tensor tools have been used to perform the BM of the scenes. However, several SL algorithms work as a batch process, increasing memory consumption when the data size is very large. Moreover, these algorithms are not suitable for streaming data when the full size of the data is unknown. In this work, we propose an incremental tensor subspace learning method that uses only a small part of the entire data and updates the low-rank model incrementally when new data arrive. In addition, the multi-feature model allows us to build a robust low-rank background model of the scene. Experimental results show that the proposed method achieves interesting results for the background subtraction task. The source code is available only for academic/research purposes (non-commercial).
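The released code is MATLAB and implements the paper's multi-feature tensor model. Purely as an illustration of the general idea (an incremental low-rank background model updated one frame at a time), here is a minimal NumPy sketch of a standard incremental-SVD style update; this is a generic technique, not the authors' algorithm:

```python
import numpy as np

def init_subspace(frames, rank):
    """Initialize a rank-r background subspace from a small batch of
    vectorized frames (one frame per column of `frames`)."""
    U, s, _ = np.linalg.svd(frames, full_matrices=False)
    return U[:, :rank], s[:rank]

def update_subspace(U, s, frame, forget=0.95):
    """Fold one new vectorized frame into the low-rank model incrementally.

    Project the frame onto the current basis, compute the orthogonal
    residual, re-diagonalize the small augmented system, and truncate
    back to the original rank. Old information decays via `forget`.
    """
    r = U.shape[1]
    proj = U.T @ frame                      # coefficients in the current basis
    residual = frame - U @ proj
    res_norm = np.linalg.norm(residual)
    q = residual / res_norm if res_norm > 1e-8 else np.zeros_like(frame)

    # Small (r+1) x (r+1) system mixing old singular values and the new frame.
    K = np.zeros((r + 1, r + 1))
    K[:r, :r] = np.diag(forget * s)
    K[:r, r] = proj
    K[r, r] = res_norm
    Uk, sk, _ = np.linalg.svd(K)

    U_new = np.hstack([U, q[:, None]]) @ Uk
    return U_new[:, :r], sk[:r]

def foreground_mask(U, frame, threshold=30.0):
    """Foreground = pixels poorly explained by the low-rank background."""
    background = U @ (U.T @ frame)
    return np.abs(frame - background) > threshold
```

Frames are assumed to be grayscale images flattened to 1-D vectors; only the small basis and singular values are kept in memory, which is the point of processing the stream incrementally.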
tensor background-subtraction subspace-learning matlab streaming-data foreground-detection

RxHttpClient is a "reactive wrapper" around NSURLSession. Under the hood it implements session delegates (like NSURLSessionDelegate or NSURLSessionTaskDelegate) and translates session events into Observables using RxSwift. The main purpose of this framework is to make streaming data as simple as possible and to provide convenient features for caching data. RxHttpClient uses RxSwift, so it should be included in your Cartfile.
rxswift streaming-data nsurlsession

This repository contains code samples to demonstrate how developers can work with Pravega. We also provide code samples to connect analytics engines such as Flink and Hadoop with Pravega as a storage substrate for data streams. For more information on Pravega, we recommend reading the documentation and the developer guide.
pravega streaming-data data-streaming sample-app

Frizzle is a magic message (Msg) bus designed for parallel processing with many goroutines. Start with the example implementation, which shows a simple, canonical implementation of a Processor on top of Frizzle and demonstrates most of the core functions.
golang-library message-bus pipeline stream-processing streaming-data kafka kinesis consumer producer

The owner of the artifact grants ACM permission to serve the artifact to users of the ACM Digital Library. Saber has been implemented in Java and C. The Java code is compiled and packaged using Apache Maven (3.3.1) and the Java SDK (1.7.0_79). The C code is compiled and packaged using GNU make (3.81) and gcc (4.8.4).
stream-processing hybrid multicore gpu high-throughput saber stream streaming-data streaming sliding-windows multicore-cpu

Machine is a library for creating data workflows. These workflows can be either very concise or quite complex, even allowing for cycles in flows that need retry or self-healing mechanisms. It supports OpenTelemetry spans and metrics out of the box and supports building dynamic pipelines using native Go plugins and HashiCorp- or Yaegi-based plugins by using the providers here. Contributions, issues, and feature requests are welcome. Feel free to check the issues page if you want to contribute, and check the contributing guide.
workflow pipeline workflow-engine pipeline-framework stream-processing golang-library mit-license golang-package streaming-data github-actions golangci-lint codespaces

WSO2 Streaming Integrator (SI) is a streaming data processing server that allows you to integrate streaming data and take action based on it. WSO2 SI is powered by Siddhi.io, a well-known cloud-native, open source stream processing engine. Siddhi lets users write complex stream processing logic using a SQL-like language known as SiddhiQL. You can aggregate, transform, enrich, analyze, cleanse, and correlate streams of data on the fly using Siddhi queries and constructs.
real-time wso2 stream-processing event-driven cloud-native streaming-data siddhi streaming-integration

Maki Nage is a Python stream processing library and framework. It provides expressive and extensible APIs. Maki Nage speeds up the development of stream applications. It can be used to process stream and batch data. More than that, it allows you to develop an application with batch data and deploy it as a Kafka micro-service. Read the book to learn more.
distributed-systems machine-learning streaming kafka reactive-programming stream-processing streaming-data reactive-systems reactive-machine-learning

Project files for the accompanying post, Streaming Data Analytics with Amazon Kinesis Data Firehose, Amazon Redshift, and Amazon QuickSight. See the post for the most up-to-date instructions for using the source code.
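As a hedged illustration of the ingestion side only (the delivery stream name and payload below are made up, not taken from the post), a record can be pushed into a Firehose delivery stream with boto3:

```python
import json
import boto3

firehose = boto3.client("firehose")

# Hypothetical payload and delivery stream; Firehose buffers records and
# delivers them downstream (in the post, into Amazon Redshift for QuickSight).
record = {"sensor_id": 42, "temperature": 21.7}
firehose.put_record(
    DeliveryStreamName="my-delivery-stream",
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)
```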
aws kinesis-firehose redshift streaming-data

This library includes many useful utilities for C++. Make sure you have a modern C++ compiler installed, and you will be ready to run all the examples in the project.
color json utility terminal containers filesystem functions progress-bar file data-structures streaming-data