kafka-storm-starter - Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark 1.1+

  •    Scala

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark 1.1+ while using Apache Avro as the data serialization format. Take a look at the Kafka Streams code examples at https://github.com/confluentinc/examples.

rq - Record Query - A tool for doing record analysis and transformation

  •    Javascript

This is the home of the tool called rq (record query). It's a tool that's used for performing queries on streams of records in various formats. The goal is to make ad-hoc exploration of data sets easy without having to use more heavy-weight tools like SQL/MapReduce/custom programs. rq fills a similar niche as tools like awk or sed, but works with structured (record) data instead of text.

schema-registry - Schema registry for Kafka

  •    Java

Schema Registry provides a RESTful interface for storing and retrieving versioned Avro schemas for use with Kafka.
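
As a rough illustration of that REST interface, the sketch below builds (but does not send) the two requests commonly used to register a schema under a subject and to fetch its latest version. The base URL `http://localhost:8081`, the subject name `users-value`, and the helper function names are assumptions for illustration only; the paths and content type are taken from the public Schema Registry API documentation and should be checked against the version you run.

```python
import json
import urllib.request

# Assumed values for illustration only; adjust to your deployment.
BASE_URL = "http://localhost:8081"
CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def build_register_request(subject, avro_schema):
    """Build a POST request that registers `avro_schema` under `subject`."""
    # The registry expects the Avro schema as a JSON-encoded string
    # inside a wrapper object: {"schema": "<schema as string>"}.
    body = json.dumps({"schema": json.dumps(avro_schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )


def build_latest_request(subject):
    """Build a GET request for the latest schema version of `subject`."""
    return urllib.request.Request(
        url=f"{BASE_URL}/subjects/{subject}/versions/latest",
        headers={"Accept": CONTENT_TYPE},
        method="GET",
    )


# Hypothetical example schema, not taken from any project listed here.
user_schema = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "name", "type": "string"}],
}

register = build_register_request("users-value", user_schema)
latest = build_latest_request("users-value")
print(register.get_method(), register.full_url)
print(latest.get_method(), latest.full_url)
# To actually send a request: urllib.request.urlopen(register)
```

Building the requests separately from sending them keeps the sketch runnable without a registry; passing either object to `urllib.request.urlopen` would perform the call.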

avsc - Avro for JavaScript

  •    Javascript

Pure JavaScript implementation of the Avro specification. avsc is compatible with all versions of node.js since 0.11 and major browsers via browserify (see the full compatibility table here). For convenience, you can also find compiled distributions with the releases (but please host your own copy).

kafka-topics-ui - Web Tool for Kafka Topics using Kafka Rest

  •    Javascript

Browse Kafka topics and understand what's happening on your cluster. Find topics / view topic metadata / browse topic data (Kafka messages) / view topic configuration / download data. This is a web tool for the confluentinc/kafka-rest proxy. Config: If you don't use our Docker image, keep in mind that Kafka-REST-Proxy CORS support can be a bit buggy, so if you have trouble setting it up, you may need to provide CORS headers through a proxy (e.g. nginx).

cpp-serializers - Benchmark comparing various data serialization libraries (thrift, protobuf etc.)

  •    C++

Compare various data serialization libraries for C++. This project does not have any external library dependencies. All needed libraries (boost, thrift etc.) are downloaded and built automatically, but you need enough free disk space to build all components. To build this project you need a compiler that supports C++11 features. The project was tested with GCC 4.8.2 (Ubuntu 14.04).

camus-compressor - Camus Compressor merges files created by Camus and saves them in a compressed format

  •    Java

Camus Compressor merges files created by Camus and saves them in a compressed format. Camus is heavily used at Allegro for dumping more than 200 Kafka topics onto HDFS. The script runs every 15 minutes and creates one file per Kafka partition, which results in about 76800 small files per day. Most of the files do not exceed the Hadoop block size. This is a clear Hadoop antipattern that leads to performance issues, for example an excessive number of mappers when executing SQL queries.
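
The figure of about 76800 files per day follows from the schedule. A quick back-of-the-envelope check, assuming exactly one file per partition per run as the description states; the derived partition count is an inference from those numbers, not a value published by the project:

```python
# Back-of-the-envelope check of the file count quoted above.
MINUTES_PER_DAY = 24 * 60
RUN_INTERVAL_MINUTES = 15   # "runs every 15 minutes"
FILES_PER_DAY = 76800       # "about 76800 small files per day"
TOPICS = 200                # "more than 200 Kafka topics" (lower bound)

runs_per_day = MINUTES_PER_DAY // RUN_INTERVAL_MINUTES

# One file per partition per run, so files per run equals the partition count.
partitions = FILES_PER_DAY // runs_per_day

# With more than 200 topics, this average is an upper bound.
avg_partitions_per_topic = partitions / TOPICS

print(runs_per_day)              # 96
print(partitions)                # 800
print(avg_partitions_per_topic)  # 4.0
```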

storagetapper - StorageTapper is a scalable realtime MySQL change data streaming and transformation service

  •    Go

StorageTapper is a scalable realtime MySQL change data streaming and transformation service. The service reads data from MySQL, transforms it into an Avro schema serialized format, and publishes these events to Kafka. Consumers can then consume these events directly from Kafka.

avro4s - Avro schema generation and serialization / deserialization

  •    Scala

Avro4s is a schema/class generation and serializing/deserializing library for Avro written in Scala. The objective is to allow seamless use with Scala without the need to write boilerplate conversions yourself, and without the runtime overhead of reflection. Hence, this is a macro-based library that generates code for use with Avro at compile time. Avro4s allows us to generate schemas directly from classes in a totally straightforward way.

iceberg - Iceberg is a table format for large, slow-moving tabular data

  •    Java

Iceberg is a new table format for storing large, slow-moving tabular data. It is designed to improve on the de-facto standard table layout built into Hive, Presto, and Spark. Iceberg is under active development at Netflix.

gcs-tools - GCS support for avro-tools and parquet-tools

  •    Java

Lightweight wrapper that adds Google Cloud Storage (GCS) support to common Hadoop tools, including avro-tools, parquet-tools and proto-tools for Scio's Protobuf in Avro file, so that they can be used from regular workstations or laptops, outside of a Google Compute Engine (GCE) instance. It uses your existing OAuth2 credentials and allows authentication via a browser.

ratatool - A tool for random data sampling and generation

  •    Scala

Download the release jar and run it. The command-line tool can be used to sample from the local file system or Google Cloud Storage directly if the Google Cloud SDK is installed and authenticated.

avro_turf - A library that makes it easier to use the Avro serialization format from Ruby.

  •    Ruby

The AvroTurf::SchemaRegistry, AvroTurf::CachedSchemaRegistry, and FakeSchemaRegistryServer names have been deprecated because the Avro spec recently introduced an incompatible single-message encoding format. These classes have been renamed to AvroTurf::ConfluentSchemaRegistry, AvroTurf::CachedConfluentSchemaRegistry, and FakeConfluentSchemaRegistry.
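
The incompatibility shows up in the message headers. The sketch below frames a payload both ways, assuming the commonly documented layouts: the Confluent wire format uses a zero magic byte followed by a 4-byte big-endian schema ID, while Avro single-object encoding uses the two-byte marker C3 01 followed by the 8-byte little-endian CRC-64-AVRO fingerprint of the schema. The function names are hypothetical, the fingerprint is passed in rather than computed, and none of this is avro_turf's own code.

```python
import struct

CONFLUENT_MAGIC = b"\x00"
AVRO_SINGLE_OBJECT_MARKER = b"\xc3\x01"


def frame_confluent(schema_id, payload):
    """Prefix Avro-encoded bytes with the Confluent wire-format header."""
    return CONFLUENT_MAGIC + struct.pack(">I", schema_id) + payload


def frame_single_object(fingerprint, payload):
    """Prefix Avro-encoded bytes with the Avro single-object header."""
    return AVRO_SINGLE_OBJECT_MARKER + struct.pack("<Q", fingerprint) + payload


def detect_framing(message):
    """Guess which framing a message uses from its leading bytes."""
    if message[:2] == AVRO_SINGLE_OBJECT_MARKER and len(message) >= 10:
        return "avro-single-object"
    if message[:1] == CONFLUENT_MAGIC and len(message) >= 5:
        return "confluent"
    return "unknown"


payload = b"\x06foo"  # Avro binary encoding of the string "foo"

# Schema ID 42 and fingerprint 1 are placeholder values for illustration.
confluent_message = frame_confluent(42, payload)
single_object_message = frame_single_object(1, payload)

print(detect_framing(confluent_message))
print(detect_framing(single_object_message))
```

Because the two headers differ in both length and leading bytes, a consumer written for one framing cannot decode the other, which is the reason the Confluent-specific classes needed distinct names.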

parquet-avro-extra - Scala macros for generating Parquet schema projections and filter predicates

  •    Scala

Scala macros for generating Parquet column projections and filter predicates.

avro-typescript - TypeScript Code Generator for Apache Avro Schema Types

  •    TypeScript

A simple JS library to convert Avro schemas to TypeScript interfaces. The library can be run in node.js or the browser. It takes an Avro schema as a JavaScript object (from JSON) and returns the TypeScript code as a string.

dcos-metrics - Make metrics accessible.

  •    C++

The metrics component provides operational insight to your DC/OS cluster, providing discrete metrics about your applications and deployments. This can include charts, dashboards, and alerts based on cluster, node, container, and application-level statistics. This project provides a metrics service for all DC/OS clusters which can be integrated with any timeseries data store or hosted metrics service. We aim to be un-opinionated about what you do with the metrics once they’re out of the system. However you look at it, getting those metrics should be mind-numbingly simple.

sbt-avro - Generate Scala classes from Apache Avro schemas hosted on a remote Confluent Schema Registry

  •    Scala

sbt-avro is an sbt 1.x plugin for generating Scala classes from Apache Avro schemas hosted on a remote Confluent Schema Registry. By default, sbt-avro downloads all Avro schema files from a local schema registry to your default resources_managed directory (i.e. target/scala-2.12/resources_managed/main/avro/). Please check the settings section for more information about the available settings.

devops-python-tools - DevOps CLI Tools for Hadoop, Spark, HBase, Log Anonymizer, Ambari Blueprints, AWS CloudFormation, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Elasticsearch, Solr, Travis CI, Pig, IPython - Python / Jython Tools

  •    Python

A few of the Big Data, NoSQL & Linux tools I've written over the years. All programs have --help to list the available options. For many more tools, see the DevOps Perl Tools and Advanced Nagios Plugins Collection repos, which contain many Hadoop, NoSQL, Web and infrastructure tools and Nagios plugins.
