smart_open - Utils for streaming large files (S3, HDFS, gzip, bz2...)

  •    Python

There are a few optional keyword arguments that are useful only for S3 access; these are passed through to boto.s3_connect() as keyword arguments. The S3 reader supports gzipped content, as long as the key is obviously a gzipped file (e.g. it ends with ".gz").

https://github.com/RaRe-Technologies/smart_open

Related Projects

AthenaX - SQL-based streaming analytics platform at scale

  •    Java

AthenaX is a streaming analytics platform that enables users to run production-quality, large-scale streaming analytics using Structured Query Language (SQL). AthenaX was released and open sourced by Uber Technologies. It is capable of scaling across hundreds of machines and processing hundreds of billions of real-time events daily. Apache 2.0 License.

s3-upload-stream - A Node.js module for streaming data to Amazon S3 via the multipart upload API

  •    Javascript

A pipeable write stream which uploads to Amazon S3 using the multipart file upload API. NOTE: This module is deprecated after the 2.1.0 release of the AWS SDK on Dec 9, 2014, which added S3.upload(). I highly recommend switching away from this module and using the official method supported by AWS.
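
A minimal sketch of that officially supported path, assuming the AWS SDK for JavaScript v2; the bucket, key, and file names are placeholders:

```javascript
// Sketch: stream a large file to S3 with S3.upload(), the method this
// module's README recommends switching to. Bucket/key/file are placeholders.
var AWS = require('aws-sdk');
var fs = require('fs');

var s3 = new AWS.S3();

s3.upload(
  { Bucket: 'my-bucket', Key: 'big-file.bin', Body: fs.createReadStream('./big-file.bin') },
  function (err, data) {
    if (err) return console.error('upload failed:', err);
    console.log('uploaded to', data.Location); // S3.upload() manages the multipart upload internally
  }
);
```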

Pravega - Streaming as a new software defined storage primitive

  •    Java

Pravega is an open source distributed storage service implementing Streams. It offers Stream as the main primitive for the foundation of reliable storage systems: a high-performance, durable, elastic, and unlimited append-only byte stream with strict ordering and consistency.

Apache Flink - Platform for Scalable Batch and Stream Data Processing

  •    Java

Apache Flink is an open source platform for scalable batch and stream data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.

Apache Beam - Unified model for defining both batch and streaming data-parallel processing pipelines

  •    Java

Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.


streamDM - Stream Data Mining Library for Spark Streaming

  •    Scala

streamDM is an open source library for mining big data streams using Spark Streaming, started at Huawei Noah's Ark Lab and licensed under the Apache Software License v2.0. Big data stream learning is more challenging than batch or offline learning, since the data may not keep the same distribution over the lifetime of the stream. Moreover, each example arriving in a stream can be processed only once, or it needs to be summarized with a small memory footprint, and the learning algorithms must be very efficient.

skipper - Streaming multi-uploads for Sails/Express - supports disk, S3, gridfs, and custom file adapters

  •    Javascript

Skipper makes it easy to implement streaming file uploads to disk, S3, or any of the supported file upload adapters. The example described here assumes Skipper is already installed as the body parser in your Express or Sails app: it receives one or more files from a file parameter named avatar using the default, built-in file adapter (skipper-disk), which streams the file(s) to the default upload directory .tmp/uploads/ on the server's local disk.
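
A minimal sketch of that setup, assuming a plain Express app with Skipper mounted as the body parser; the route path and response shape are placeholders:

```javascript
// Skipper replaces the default body parser, so multipart requests expose req.file().
var express = require('express');
var skipper = require('skipper');

var app = express();
app.use(skipper());

app.post('/upload', function (req, res) {
  // Receive one or more files from the "avatar" file parameter using the
  // default skipper-disk adapter, which streams them to .tmp/uploads/ on local disk.
  req.file('avatar').upload(function (err, uploadedFiles) {
    if (err) return res.status(500).send(err);
    res.json({ uploaded: uploadedFiles.length });
  });
});

app.listen(3000);
```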

react-native-audio-streaming - iOS & Android react native module to play an audio stream, with background support and media controls

  •    Java

react-native-audio-streaming is no longer maintained. Its main purpose was to play SHOUTcast streams with metadata and display a notification while playing. If you only need to play a local audio file with the app in the foreground, please see other audio libraries.

Hosebird client - A Java HTTP client for consuming Twitter's Streaming API

  •    Java

A Java HTTP client for consuming Twitter's Streaming API. It has GZip support, OAuth support, partitioning support, automatic reconnections with appropriate backfill counts, access to the raw bytes payload, proper backoff/retry schemes, relevant statistics/events, and control stream support for site streams.

koel - A personal music streaming server that works.

  •    PHP

Koel (also stylized as koel, with a lowercase k) is a simple web-based personal audio streaming service written in Vue on the client side and Laravel on the server side. Targeting web developers, Koel embraces some of the more modern web technologies – flexbox, audio, and drag-and-drop API to name a few – to do its job.

restreamer - Stream H.264 video of IP cameras live to your website

  •    Javascript

Datarhei/Restreamer offers smart, free video streaming: stream H.264 video from IP cameras live to your website, or push your live video to YouTube-Live, Ustream, Twitch, Livestream.com, or any other streaming solution such as the Wowza Streaming Engine. The Docker image is easy to install and runs on Linux, macOS, and Windows, and Restreamer can be perfectly combined with single-board computers like the Raspberry Pi and Odroid. It is free (licensed under Apache 2.0), and you can use it for any purpose, private or commercial. Documentation is available on the Datarhei/Restreamer GitHub pages and covers everything from setting up a camera and embedding the player on your website to streaming to services such as YouTube-Live, Ustream, and Livestream.com.

automi - A stream API for Go (alpha)

  •    Go

Automi abstracts away (though not too far away) the gnarly details of using Go channels to create pipelined and staged processes. It exposes a higher-level API to compose and integrate streams of data over Go channels for processing. This is still alpha work; the API is evolving and changing rapidly with each commit (beware). Nevertheless, the core concepts have been bolted onto the API. The project's example shows how Automi can be used to compose a multi-stage pipeline that processes a stream of data from a CSV file. The code implements stream processing based on the pipeline pattern, yet the low-level channel communication code needed to coordinate and synchronize goroutines is notably absent: the programmer gets a clean surface for expressing business logic without the noisy channel infrastructure code. Under the covers, however, Automi uses patterns similar to the pipeline pattern to create safe, concurrent structures that execute the processing of the data stream.

Red5 - Media Server

  •    Java

Red5 is an Open Source Flash Server written in Java that supports Streaming Video (FLV, F4V, MP4, 3GP), Streaming Audio (MP3, F4A, M4A, AAC), Recording Client Streams (FLV and AVC+AAC in FLV container), Shared Objects, Live Stream Publishing, Remoting Protocols: RTMP, RTMPT, RTMPS, and RTMPE.

Gimel - PayPal's Big Data Processing Framework

  •    Scala

Gimel provides a unified Data API to access data from any storage system, such as HDFS, GS, Alluxio, HBase, Aerospike, BigQuery, Druid, Elastic, Teradata, Oracle, and MySQL.

oppressor - streaming http compression response negotiator

  •    Javascript

Returns a duplex stream that will be compressed with gzip, deflate, or no compression, depending on the accept-encoding headers sent. oppressor also emulates calls to http.ServerResponse methods like writeHead(), so that modules like filed that expect to be piped directly to the response object will work.
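
A minimal sketch of that piping pattern, assuming the filed module and a local data.txt file (both placeholders):

```javascript
// Serve a file through oppressor so the response is gzip/deflate encoded
// only when the client's accept-encoding header asks for it.
var http = require('http');
var filed = require('filed');
var oppressor = require('oppressor');

http.createServer(function (req, res) {
  filed(__dirname + '/data.txt')  // filed sets content headers and expects to pipe to res
    .pipe(oppressor(req))         // negotiate compression from req's accept-encoding
    .pipe(res);
}).listen(8000);
```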

busboy - A streaming parser for HTML form data for node.js

  •    Javascript

A node.js module for parsing incoming HTML form data. Its file(< string >fieldname, < ReadableStream >stream, < string >filename, < string >transferEncoding, < string >mimeType) event is emitted for each new file form field found; transferEncoding contains the 'Content-Transfer-Encoding' value for the file stream, and mimeType contains the 'Content-Type' value for the file stream.
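
A minimal sketch of handling that file event inside a bare Node.js HTTP server; saving each upload to the OS temp directory is a placeholder choice:

```javascript
var http = require('http');
var fs = require('fs');
var os = require('os');
var path = require('path');
var Busboy = require('busboy');

http.createServer(function (req, res) {
  var busboy = new Busboy({ headers: req.headers }); // parse this request's form data
  busboy.on('file', function (fieldname, file, filename, transferEncoding, mimeType) {
    // Stream each uploaded file straight to disk instead of buffering it in memory.
    file.pipe(fs.createWriteStream(path.join(os.tmpdir(), path.basename(filename))));
  });
  busboy.on('finish', function () {
    res.writeHead(200, { Connection: 'close' });
    res.end('Upload complete');
  });
  req.pipe(busboy); // busboy is a writable stream fed by the incoming request
}).listen(8000);
```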

streaming-benchmarks - Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, and Apache Flink

  •    Java

Code is licensed under the Apache 2.0 license; see the LICENSE file for terms. At Yahoo we adopted Apache Storm as our stream processing platform of choice, but that was in 2012 and the landscape has changed significantly since then. Because of this, we really want to know what Storm is good at, where it needs to be improved compared to other systems, and what its limitations are compared to other tools, so we can recommend the best tool for the job to our customers. We started looking for stream processing benchmarks to use for this evaluation, but all of them ended up lacking in several fundamental areas; primarily, they did not test anything close to a real-world use case. So we decided to write a simple one. This is the first round of these tests. The tool here is not polished and only covers three tools and one specific use case. We hope to expand it in the future in terms of the tools tested, the variety of processing tested, and the metrics gathered.

OpenLiteSpeed - High performance, lightweight, HTTP server

  •    C++

OpenLiteSpeed is a high-performance, lightweight, open source HTTP server developed and copyrighted by LiteSpeed Technologies. It is event-driven and can handle hundreds of thousands of concurrent connections without load spikes.

ZipStream-PHP - Fork of pablotron's zip streaming library.

  •    PHP

You can also add comments, modify file timestamps, and customize (or disable) the HTTP headers. It is also possible to specify the storage method when adding files; the current default storage method is 'deflate', i.e. files are stored with compression method 0x08. See the class file for details, and please take a look at the CONTRIBUTOR-README.md file.