Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark 1.1+ while using Apache Avro as the data serialization format. Take a look at the Kafka Streams code examples at https://github.com/confluentinc/examples.
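As a taste of what such an integration involves, here is a minimal sketch (not taken from the linked repositories) that serializes a record with Avro's GenericRecord API and publishes the raw bytes to Kafka. The schema, topic name, and broker address are illustrative assumptions.

```scala
import java.io.ByteArrayOutputStream
import java.util.Properties

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.EncoderFactory
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object AvroProducerSketch extends App {
  // Illustrative schema; any Avro record schema works the same way.
  val schema = new Schema.Parser().parse(
    """{"type":"record","name":"Tweet","fields":[{"name":"text","type":"string"}]}""")

  // Build a record and serialize it to Avro binary.
  val record = new GenericData.Record(schema)
  record.put("text", "hello avro")

  val out = new ByteArrayOutputStream()
  val encoder = EncoderFactory.get().binaryEncoder(out, null)
  new GenericDatumWriter[GenericRecord](schema).write(record, encoder)
  encoder.flush()

  // Publish the serialized bytes; "tweets" and localhost:9092 are placeholders.
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")

  val producer = new KafkaProducer[Array[Byte], Array[Byte]](props)
  producer.send(new ProducerRecord[Array[Byte], Array[Byte]]("tweets", out.toByteArray))
  producer.close()
}
```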
apache-kafka kafka apache-storm storm spark apache-spark integration avro apache-avro

This is the home of the tool called rq (record query), a tool for performing queries on streams of records in various formats. The goal is to make ad-hoc exploration of data sets easy without having to reach for heavier-weight tools like SQL, MapReduce, or custom programs. rq fills a niche similar to awk or sed, but works with structured (record) data instead of plain text.
protobuf command-line-tool json avro messagepack yaml toml lodash

Schema Registry provides a RESTful interface for storing and retrieving versioned Avro schemas for use with Kafka.
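For example, registering a new schema version is a single POST against the registry's documented REST API. Below is a hedged sketch using Java's built-in HTTP client from Scala; the registry URL and the subject name "example-value" are assumptions.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object RegisterSchemaSketch extends App {
  // The Avro schema is sent as an escaped JSON string under the "schema" key.
  val payload = """{"schema": "{\"type\": \"string\"}"}"""

  // http://localhost:8081 and the subject "example-value" are placeholders.
  val request = HttpRequest.newBuilder(
      URI.create("http://localhost:8081/subjects/example-value/versions"))
    .header("Content-Type", "application/vnd.schemaregistry.v1+json")
    .POST(HttpRequest.BodyPublishers.ofString(payload))
    .build()

  val response = HttpClient.newHttpClient()
    .send(request, HttpResponse.BodyHandlers.ofString())

  // On success the registry replies with the schema's global id, e.g. {"id":1}.
  println(response.body())
}
```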
schema-registry kafka schema schemas avro avro-schema rest-api confluent

Pure JavaScript implementation of the Avro specification. avsc is compatible with all versions of node.js since 0.11 and with major browsers via browserify (see the project's full compatibility table). For convenience, you can also find compiled distributions with the releases (but please host your own copy).
avro serialization rpc api avdl avpr avsc binary buffer data decoding encoding idl interface ipc json marshalling message protocol schema serde type

Browse Kafka topics and understand what's happening on your cluster: find topics, view topic metadata, browse topic data (Kafka messages), view topic configuration, and download data. This is a web tool for the confluentinc/kafka-rest proxy. Config: if you don't use our Docker image, keep in mind that Kafka-REST-Proxy CORS support can be a bit buggy, so if you have trouble setting it up, you may need to provide CORS headers through a proxy (e.g. nginx).
schema registry kafka topics avro

Compare various data serialization libraries for C++. This project has no external library dependencies: all needed libraries (Boost, Thrift, etc.) are downloaded and built automatically, but you need enough free disk space to build all components. Building this project requires a compiler that supports C++11 features. The project was tested with GCC 4.8.2 (Ubuntu 14.04).
cpp serialization protobuf capn-proto thrift flatbuffers cereal performance-testing boost msgpack avro apache-avro c-plus-plus yas

Camus Compressor merges files created by Camus and saves them in a compressed format. Camus is heavily used at Allegro for dumping more than 200 Kafka topics onto HDFS. The script runs every 15 minutes (96 times a day) and creates one file per Kafka partition per run, which results in about 76,800 small files per day. Most of the files do not exceed the Hadoop block size. This is a classic Hadoop antipattern that leads to performance issues, for example an excessive number of mappers in SQL query executions.
avro etl hadoop spark kafka

StorageTapper is a scalable, real-time MySQL change data streaming and transformation service. The service reads data from MySQL, transforms it into an Avro-serialized format, and publishes these events to Kafka. Consumers can then consume the events directly from Kafka.
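Since the published events are ordinary Kafka messages, any Kafka consumer can read them. A minimal Scala sketch of such a downstream consumer, assuming Avro-serialized byte-array payloads; the broker address, group id, and topic name are placeholders:

```scala
import java.time.Duration
import java.util.{Collections, Properties}

import org.apache.kafka.clients.consumer.KafkaConsumer

object ChangeEventConsumerSketch extends App {
  // Broker address, group id, and topic name are illustrative assumptions.
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("group.id", "storagetapper-example")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")

  val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
  consumer.subscribe(Collections.singletonList("mysql.changes"))

  // Each record value is an Avro-serialized change event, ready to be
  // decoded with the matching Avro schema.
  while (true) {
    val records = consumer.poll(Duration.ofMillis(500))
    records.forEach(r => println(s"offset=${r.offset()} bytes=${r.value().length}"))
  }
}
```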
mysql kafka avro cdc etl

Avro4s is a schema/class generation and serializing/deserializing library for Avro, written in Scala. The objective is to allow seamless use with Scala without the need to write boilerplate conversions yourself, and without the runtime overhead of reflection. Hence, this is a macro-based library that generates code for use with Avro at compile time. Avro4s allows us to generate schemas directly from classes in a totally straightforward way. Let's define some classes.
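For instance (a minimal sketch; the case classes are illustrative):

```scala
import com.sksamuel.avro4s.AvroSchema

case class Ingredient(name: String, sugar: Double, fat: Double)
case class Pizza(name: String, ingredients: Seq[Ingredient], vegetarian: Boolean)

object PizzaSchemaSketch extends App {
  // The schema is derived at compile time by the AvroSchema macro.
  val schema = AvroSchema[Pizza]
  println(schema.toString(true))
}
```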
macro-generation avro avro-schema schema-generation

Iceberg is a new table format for storing large, slow-moving tabular data. It is designed to improve on the de facto standard table layout built into Hive, Presto, and Spark. Iceberg is under active development at Netflix.
spark hadoop parquet avro

Lightweight wrapper that adds Google Cloud Storage (GCS) support to common Hadoop tools, including avro-tools, parquet-tools, and proto-tools for Scio's Protobuf-in-Avro files, so that they can be used from regular workstations or laptops, outside of a Google Compute Engine (GCE) instance. It uses your existing OAuth2 credentials and allows authentication via a browser.
gcs google-storage avro protobuf gcs-connector gcp parquet

Download the release jar and run it. The command-line tool can be used to sample from the local file system, or directly from Google Cloud Storage if the Google Cloud SDK is installed and authenticated.
scalacheck avro parquet bigquery protobuf

The AvroTurf::SchemaRegistry, AvroTurf::CachedSchemaRegistry, and FakeSchemaRegistryServer names have been deprecated because the Avro spec recently introduced an incompatible single-message encoding format. These classes have been renamed to AvroTurf::ConfluentSchemaRegistry, AvroTurf::CachedConfluentSchemaRegistry, and FakeConfluentSchemaRegistry.
schema-registry avro avro-data schema

Scala macros for generating Parquet column projections and filter predicates.
scala-macros avro parquet

A simple JS library to convert Avro schemas to TypeScript interfaces. The library can be run in node.js or the browser. It takes an Avro schema as a JavaScript object (from JSON) and returns the TypeScript code as a string.
avro avro-schema typescript typescript-library

The metrics component provides operational insight into your DC/OS cluster, offering discrete metrics about your applications and deployments. This can include charts, dashboards, and alerts based on cluster, node, container, and application-level statistics. This project provides a metrics service for all DC/OS clusters that can be integrated with any timeseries data store or hosted metrics service. We aim to be unopinionated about what you do with the metrics once they're out of the system. However you look at it, getting those metrics should be mind-numbingly simple.
metrics avro statsd dogstatsd prometheus

sbt-avro is an sbt 1.x plugin for generating Scala classes from Apache Avro schemas hosted on a remote Confluent Schema Registry. By default, sbt-avro will download all Avro schema files from the schema registry to your default resources_managed directory (i.e. target/scala-2.12/resources_managed/main/avro/). Please check the settings section for more information about the available settings.
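Enabling an sbt plugin like this follows the usual sbt 1.x pattern; the organization and version below are placeholders for illustration, not the plugin's actual coordinates:

```scala
// project/plugins.sbt -- "com.example" and "x.y.z" are hypothetical placeholders.
addSbtPlugin("com.example" % "sbt-avro" % "x.y.z")
```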
avro sbt schema-registry

Replicates a typical Kafka stack using Docker Compose.
kafka spark twitter docker-compose avro kafka-connect pyspark

A few of the Big Data, NoSQL & Linux tools I've written over the years. All programs have --help to list the available options. For many more tools, see the DevOps Perl Tools and Advanced Nagios Plugins Collection repos, which contain many Hadoop, NoSQL, web, and infrastructure tools and Nagios plugins.
ambari cloudformation hbase json avro parquet spark pyspark travis-ci pig elasticsearch solr xml hadoop hdfs dockerhub docker aws

This is a library for transforming the shape of an Avro record using SQL. It relies on Apache Calcite for SQL parsing.
avro-schema sql avro