Avro

Avro is a data serialization system. It provides rich data structures; a compact, fast, binary data format; a container file to store persistent data; remote procedure call (RPC); and simple integration with dynamic languages. Code generation is not required to read or write data files or to use or implement RPC protocols; it is an optional optimization, only worth implementing for statically typed languages.

http://hadoop.apache.org/avro
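This sketch makes "compact, fast, binary data format" concrete. The Avro specification writes int and long values with variable-length zig-zag coding, and it writes a string as a long byte-count followed by the UTF-8 bytes. The following minimal, dependency-free Python covers only those two rules. The function names are hypothetical and are not the API of any Avro library. Real applications would use an official Avro implementation, which also handles schemas, records, and container files.

```python
from typing import Tuple

# Minimal sketch of two rules from the Avro binary encoding:
#  - int/long: zig-zag mapping, then base-128 varint (low 7 bits first)
#  - string: a long byte-length followed by the UTF-8 bytes
# Function names are hypothetical; this is not the API of any Avro library.


def encode_long(n: int) -> bytes:
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("Avro long must fit in a signed 64-bit range")
    # Zig-zag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... so small magnitudes stay small.
    z = n << 1 if n >= 0 else ((-n) << 1) - 1
    out = bytearray()
    # Varint: 7 bits per byte, high bit set while more bytes follow.
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, position just after the encoded long)."""
    z = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    # Undo zig-zag: even -> non-negative, odd -> negative.
    return (z >> 1) ^ -(z & 1), pos


def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def decode_string(buf: bytes, pos: int = 0) -> Tuple[str, int]:
    """Return (string, position just after the encoded string)."""
    length, pos = decode_long(buf, pos)
    end = pos + length
    return buf[pos:end].decode("utf-8"), end


if __name__ == "__main__":
    for value in (0, -1, 1, -64, 64):
        print(value, encode_long(value).hex())
    print(encode_string("foo").hex())  # 06666f6f
```

Zig-zag mapping is the reason -1 costs one byte (`01`) rather than the ten bytes a plain two's-complement varint would need. The value 64 is the first that needs two bytes (`80 01`).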

Related Projects

avro-rpc-quickstart - Apache Avro RPC Quick Start

  •    Ruby

Apache Avro RPC Quick Start. Avro was originally a subproject of Apache Hadoop.

cpp-serializers - Benchmark comparing various data serialization libraries (thrift, protobuf, etc.)

  •    C++

Compares various data serialization libraries for C++. This project does not have any external library dependencies: all required libraries (Boost, Thrift, etc.) are downloaded and built automatically, but you need enough free disk space to build all components. Building the project requires a compiler that supports C++11. The project was tested with GCC 4.8.2 (Ubuntu 14.04).

avsc - Avro for JavaScript

  •    Javascript

Pure JavaScript implementation of the Avro specification. avsc is compatible with all versions of Node.js since 0.11 and with major browsers via browserify (see the project's full compatibility table). For convenience, compiled distributions are also available with the releases (but please host your own copy).

kafka-storm-starter - Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark 1.1+

  •    Scala

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark 1.1+ while using Apache Avro as the data serialization format. Take a look at the Kafka Streams code examples at https://github.com/confluentinc/examples.

Redisson - Redis based In-Memory Data Grid for Java

  •    Java

Redisson - distributed Java objects and services (Set, Multimap, SortedSet, Map, List, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Executor service, Tomcat Session Manager, Scheduler service, JCache API) on top of Redis server. Rich Redis client.

Pangool - Tuple MapReduce for Hadoop and MapReduce made easy

  •    Java

Pangool is a framework on top of Hadoop that implements Tuple MapReduce. It supports secondary sorting, built-in reduce-side joins, built-in serialization support for Thrift and ProtoStuff, and a lot more.

Jackson - Best JSON parser for Java

  •    Java

Jackson is one of the best JSON parsers for Java. More than that, Jackson is a suite of data-processing tools for Java (and the JVM platform), including the flagship streaming JSON parser/generator library, a matching data-binding library (POJOs to and from JSON), and additional data format modules to process data encoded in Avro, BSON, CBOR, CSV, Smile, (Java) Properties, Protobuf, XML, or YAML. There is even a large set of datatype modules supporting widely used data types from libraries such as Guava, Joda, PCollections, and many, many more.

schema-registry - Schema registry for Kafka

  •    Java

Schema Registry provides a RESTful interface for storing and retrieving versioned Avro schemas for use with Kafka.

avro - Mirror of Apache Avro

  •    Java

Avro top-level POM.

spark-avro - Avro Data Source for Apache Spark

  •    Scala

A library for reading and writing Avro data from Spark SQL. This documentation is for version 4.0.0 of this library, which supports Spark 2.2. Documentation for earlier versions of the library is available separately.

Hive-JSON-Serde - Read/Write JSON SerDe for Apache Hive.

  •    Java

This library enables Apache Hive to read and write in JSON format. It includes support for serialization and deserialization (SerDe) as well as a JSON conversion UDF. Download the latest binaries (json-serde-X.Y.Z-jar-with-dependencies.jar and json-udf-X.Y.Z-jar-with-dependencies.jar) from congiu.net/hive-json-serde, choosing the correct version for CDH 4, CDH 5, or Hadoop 2.3. Place the JARs into hive/lib or use ADD JAR in Hive.

Ambari - Monitor Hadoop Cluster

  •    Java

The Apache Ambari project aims to make Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs. The Hadoop components currently supported by Ambari include HDFS, MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop.

Apache Trafodion - Webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop.

  •    C++

Apache Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop. Trafodion builds on the scalability, elasticity, and flexibility of Hadoop. Trafodion extends Hadoop to provide guaranteed transactional integrity, enabling new kinds of big data applications to run on Hadoop.

spring-hadoop - Spring for Apache Hadoop is a framework for application developers to take advantage of the features of both Hadoop and Spring

  •    Java

The Spring for Apache Hadoop project provides extensions to Spring, Spring Batch, and Spring Integration to build manageable and robust pipeline solutions around Hadoop. Spring for Apache Hadoop extends Spring Batch by providing support for reading from and writing to HDFS, running various types of Hadoop jobs (Java MapReduce, Streaming, Hive, Spark, Pig), and using HBase. An important goal is to let non-Java developers be productive with Spring Hadoop without having to write any Java code to use the core feature set.

gis-tools-for-hadoop - The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data

The GIS Tools for Hadoop are a collection of GIS tools that leverage the Spatial Framework for Hadoop for spatial analysis of big data. The tools use the Geoprocessing Tools for Hadoop toolbox to provide access to the Hadoop system from the ArcGIS Geoprocessing environment. Start by navigating to the samples and following the instructions provided with each sample. There are also tutorials for using the GP tools and aggregation methods.

parkour - Hadoop MapReduce in idiomatic Clojure.

  •    Clojure

Hadoop MapReduce in idiomatic Clojure. Parkour takes your Clojure code’s functional gymnastics and sends it free-running across the urban environment of your Hadoop cluster. Parkour is a Clojure library for writing distributed programs in the MapReduce pattern which run on the Hadoop MapReduce platform. Parkour does its best to avoid being yet another “framework” – if you know Hadoop, and you know Clojure, then you’re most of the way to knowing Parkour. By combining functional programming, direct access to Hadoop features, and interactive iteration on live data, Parkour supports rapid development of highly efficient Hadoop MapReduce applications.

hadoop-docker - Hadoop docker image

  •    Shell

A few weeks ago we released an Apache Hadoop 2.3 Docker image, which quickly became the most popular Hadoop image in the Docker registry. Following the success of our previous Hadoop Docker images, and based on the feedback and feature requests we received, we aligned with the Hadoop release cycle and have released an Apache Hadoop 2.7.1 Docker image. Like the previous version, it is available as a trusted and automated build on the official Docker registry.

mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services

  •    Python

mrjob is a Python 2.7/3.3+ package that helps you write and run Hadoop Streaming jobs. It fully supports Amazon's Elastic MapReduce (EMR) service, which allows you to buy time on a Hadoop cluster on an hourly basis. mrjob has basic support for Google Cloud Dataproc (Dataproc) which allows you to buy time on a Hadoop cluster on a minute-by-minute basis. It also works with your own Hadoop cluster.
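This sketch shows what a Hadoop Streaming job does underneath. The mapper and reducer are ordinary programs that read lines on stdin and write lines on stdout. By default, the text up to the first tab is the key, and Hadoop sorts mapper output by key before it reaches the reducer. The following plain-Python word count does not use mrjob's API. It is a hypothetical, minimal illustration of the Streaming conventions that mrjob wraps. The `run_locally` helper exists only to simulate the sort step in one process.

```python
import itertools
import sys
from typing import Iterable, Iterator, List

# Plain-Python sketch of a Hadoop Streaming word count (no mrjob).
# Assumes Streaming's default line format: key, a tab, then the value.


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for each word, as a Streaming mapper prints to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per word; Hadoop sorts mapper output by key before this step."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Simulate map -> sort by key -> reduce in one process, for testing only."""
    mapped = sorted(mapper(lines), key=lambda line: line.split("\t", 1)[0])
    return list(reducer(mapped))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: "wordcount.py map" or "wordcount.py reduce".
    stage = mapper if sys.argv[1] == "map" else reducer
    for out_line in stage(sys.stdin):
        print(out_line)
```

Under Hadoop Streaming, a script like this would typically be passed through the `-mapper` and `-reducer` options. The script name `wordcount.py` above is hypothetical. mrjob lets you write the same logic as Python methods and takes care of running it locally, on EMR, on Dataproc, or on your own cluster.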

spring-hadoop-samples - Spring Hadoop Samples

  •    Java

This repository contains several sample applications that show how you can use Spring for Apache Hadoop. Hadoop has a poor out-of-the-box programming model: writing applications for Hadoop generally turns into a collection of scripts calling Hadoop command-line applications. Spring for Apache Hadoop provides a consistent programming model and a declarative configuration model for developing Hadoop applications.

fast-serialization - FST: fast Java serialization drop-in replacement

  •    Java

A fast Java serialization drop-in replacement and some serialization-based utilities such as Structs and OffHeap Memory.