Avro is a data serialization system. Avro provides: rich data structures; a compact, fast, binary data format; a container file to store persistent data; remote procedure call (RPC); and simple integration with dynamic languages. Code generation is not required to read or write data files, nor to use or implement RPC protocols; code generation is an optional optimization, only worth implementing for statically typed languages.
http://hadoop.apache.org/avro
Tags | hadoop serialization |
Implementation | Java |
License | Apache |
Platform | OS-Independent |
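To make Avro's code-generation-free model concrete, here is a minimal sketch using the generic Java API; the User schema and its fields are illustrative, not taken from any project on this page:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class AvroGenericExample {
    // An illustrative record schema, defined as JSON.
    private static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":\"int\"}]}";

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        // Build a record without any generated classes.
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Alice");
        user.put("age", 42);

        // Serialize to Avro's compact binary format.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(user, encoder);
        encoder.flush();

        // Deserialize the bytes back into a generic record.
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord decoded = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
        System.out.println(decoded);
    }
}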
Apache Avro RPC Quick Start. Avro is a subproject of Apache Hadoop.
Compare various data serialization libraries for C++. This project does not have any external library dependencies; all needed libraries (Boost, Thrift, etc.) are downloaded and built automatically, but you need enough free disk space to build all components. To build this project you need a compiler that supports C++11 features. The project was tested with GCC 4.8.2 (Ubuntu 14.04).
cpp serialization protobuf capn-proto thrift flatbuffers cereal performance-testing boost msgpack avro apache-avro c-plus-plus yas
Pure JavaScript implementation of the Avro specification. avsc is compatible with all versions of node.js since 0.11 and major browsers via browserify (see the full compatibility table here). For convenience, you can also find compiled distributions with the releases (but please host your own copy).
avro serialization rpc api avdl avpr avsc binary buffer data decoding encoding idl interface ipc json marshalling message protocol schema serde type
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark 1.1+ while using Apache Avro as the data serialization format. Take a look at the Kafka Streams code examples at https://github.com/confluentinc/examples.
apache-kafka kafka apache-storm storm spark apache-spark integration avro apache-avro
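As a sketch of the producer side of such a pipeline, the snippet below sends an Avro-encoded payload to Kafka with the plain Java producer client; the broker address and topic name are assumptions, and avroBytes stands in for bytes produced e.g. with a GenericDatumWriter as in the Avro sketch above:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroKafkaProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        // Placeholder for an Avro-encoded record.
        byte[] avroBytes = new byte[] { /* ... */ };

        try (Producer<String, byte[]> producer = new KafkaProducer<>(props)) {
            // Topic name is illustrative.
            producer.send(new ProducerRecord<>("avro-events", "user-1", avroBytes));
        }
    }
}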
Redisson - distributed Java objects and services (Set, Multimap, SortedSet, Map, List, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Executor service, Tomcat Session Manager, Scheduler service, JCache API) on top of Redis server. Rich Redis client.
cache distributed-caching distributed-locks redis-client redis-cluster collections java-collections hashmap set queue
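For a flavor of the API, a minimal sketch with the Redisson 3.x client; it assumes a Redis server on localhost:6379, and the map name is illustrative:

import org.redisson.Redisson;
import org.redisson.api.RMap;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonExample {
    public static void main(String[] args) {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        RedissonClient redisson = Redisson.create(config);

        // A distributed map backed by Redis; reads and writes go over the network.
        RMap<String, String> map = redisson.getMap("sessions");
        map.put("user-1", "active");
        System.out.println(map.get("user-1"));

        redisson.shutdown();
    }
}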
Pangool is a framework on top of Hadoop that implements Tuple MapReduce. It supports secondary sorting, built-in reduce-side joins, built-in serialization support for Thrift and ProtoStuff, and more.
map-reduce sort tuple
Jackson is one of the best JSON parsers for Java. More than that, Jackson is a suite of data-processing tools for Java (and the JVM platform), including the flagship streaming JSON parser / generator library, a matching data-binding library (POJOs to and from JSON), additional data format modules to process data encoded in Avro, BSON, CBOR, CSV, Smile, (Java) Properties, Protobuf, XML or YAML, and a large set of data type modules to support types from widely used libraries such as Guava, Joda, PCollections and many, many more.
json json-parser serialization
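A minimal data-binding sketch with Jackson's ObjectMapper; the User POJO is illustrative:

import com.fasterxml.jackson.databind.ObjectMapper;

public class JacksonExample {
    // A simple POJO for data binding.
    public static class User {
        public String name;
        public int age;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // JSON -> POJO
        User user = mapper.readValue("{\"name\":\"Alice\",\"age\":42}", User.class);

        // POJO -> JSON
        System.out.println(mapper.writeValueAsString(user));
    }
}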
Schema Registry provides a RESTful interface for storing and retrieving versioned Avro schemas for use with Kafka.
schema-registry kafka schema schemas avro avro-schema rest-api confluent
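A sketch of registering a schema through that RESTful interface, assuming a registry at localhost:8081 and an illustrative subject name (the snippet uses the JDK 11 HTTP client):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SchemaRegistryExample {
    public static void main(String[] args) throws Exception {
        // The request body wraps the Avro schema as an escaped JSON string.
        String body = "{\"schema\":\"{\\\"type\\\":\\\"record\\\",\\\"name\\\":\\\"User\\\","
            + "\\\"fields\\\":[{\\\"name\\\":\\\"name\\\",\\\"type\\\":\\\"string\\\"}]}\"}";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8081/subjects/users-value/versions"))
            .header("Content-Type", "application/vnd.schemaregistry.v1+json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // e.g. {"id":1}
    }
}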
A library for reading and writing Avro data from Spark SQL. This documentation is for version 4.0.0 of this library, which supports Spark 2.2. For documentation on earlier versions of this library, see the links below.
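A minimal sketch of this library's DataFrame API from Java (Spark 2.2 / spark-avro 4.0.0); the file paths and local master are assumptions:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkAvroExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("spark-avro-example")
            .master("local[*]") // assumed local run
            .getOrCreate();

        // Read an Avro file into a DataFrame and write it back out as Avro.
        Dataset<Row> df = spark.read()
            .format("com.databricks.spark.avro")
            .load("/tmp/input.avro");
        df.write()
            .format("com.databricks.spark.avro")
            .save("/tmp/output");

        spark.stop();
    }
}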
This library enables Apache Hive to read and write in JSON format. It includes support for serialization and deserialization (SerDe) as well as a JSON conversion UDF. Download the latest binaries (json-serde-X.Y.Z-jar-with-dependencies.jar and json-udf-X.Y.Z-jar-with-dependencies.jar) from congiu.net/hive-json-serde. Choose the correct version for CDH 4, CDH 5 or Hadoop 2.3. Place the JARs into hive/lib or use ADD JAR in Hive.

The Apache Ambari project is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs. The set of Hadoop components currently supported by Ambari includes HDFS, MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop.
hadoop-management hadoop-cluster hadoop-tools admin monitor
Apache Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop. Trafodion builds on the scalability, elasticity, and flexibility of Hadoop, and extends Hadoop to provide guaranteed transactional integrity, enabling new kinds of big data applications to run on Hadoop.
database distributed-database newsql oltp hbase hadoop map-reduce
The Spring for Apache Hadoop project provides extensions to Spring, Spring Batch, and Spring Integration to build manageable and robust pipeline solutions around Hadoop. Spring for Apache Hadoop extends Spring Batch by providing support for reading from and writing to HDFS, running various types of Hadoop jobs (Java MapReduce, Streaming, Hive, Spark, Pig) and using HBase. An important goal is to provide excellent support for non-Java developers to be productive using Spring Hadoop without having to write any Java code to use the core feature set.
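Spring for Apache Hadoop layers its HDFS read/write support over the standard Hadoop FileSystem client; as a point of reference, here is a minimal sketch of that underlying (non-Spring) API, with the namenode address and file path assumed:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:8020"); // assumed namenode

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/tmp/hello.txt"); // illustrative path

            // Write a file to HDFS.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("hello, hdfs\n".getBytes(StandardCharsets.UTF_8));
            }

            // Read it back.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
                System.out.println(reader.readLine());
            }
        }
    }
}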
The GIS Tools for Hadoop are a collection of GIS tools that leverage the Spatial Framework for Hadoop for spatial analysis of big data. The tools make use of the Geoprocessing Tools for Hadoop toolbox to provide access to the Hadoop system from the ArcGIS Geoprocessing environment. Start out by navigating to samples and following the instructions provided with each sample. There are also tutorials for using the GP tools and aggregation methods.
spatial-analysis hadoop

Hadoop MapReduce in idiomatic Clojure. Parkour takes your Clojure code’s functional gymnastics and sends it free-running across the urban environment of your Hadoop cluster. Parkour is a Clojure library for writing distributed programs in the MapReduce pattern which run on the Hadoop MapReduce platform. Parkour does its best to avoid being yet another “framework” – if you know Hadoop, and you know Clojure, then you’re most of the way to knowing Parkour. By combining functional programming, direct access to Hadoop features, and interactive iteration on live data, Parkour supports rapid development of highly efficient Hadoop MapReduce applications.
A few weeks ago we released an Apache Hadoop 2.3 Docker image - it quickly became the most popular Hadoop image in the Docker registry. Following the success of our previous Hadoop Docker images and the feedback and feature requests we received, we aligned with the Hadoop release cycle and have released an Apache Hadoop 2.7.1 Docker image - like the previous version, it is available as a trusted and automated build on the official Docker registry.
This package bundles some of the best Python serialization libraries into one standalone package, with a high-level API that makes it easy to write code that's correct across platforms and Pythons. This allows us to provide all the serialization utilities we need in a single binary wheel. Currently supports JSON, JSONL, MessagePack, Pickle and YAML. Serialization is hard, especially across Python versions and multiple platforms. After dealing with many subtle bugs over the years (encodings, locales, large files) our libraries like spaCy and Prodigy have steadily grown a number of utility functions to wrap the multiple serialization formats we need to support (especially json, msgpack and pickle). These wrapping functions ended up duplicated across our codebases, so we wanted to put them in one place.
yaml serialization json msgpack python-3 pickle python-2 ujson

mrjob is a Python 2.7/3.3+ package that helps you write and run Hadoop Streaming jobs. It fully supports Amazon's Elastic MapReduce (EMR) service, which allows you to buy time on a Hadoop cluster on an hourly basis. mrjob has basic support for Google Cloud Dataproc, which allows you to buy time on a Hadoop cluster on a minute-by-minute basis. It also works with your own Hadoop cluster.
map-reduce hadoop-streaming aws python-mapreduce

This repository contains several sample applications that show how you can use Spring for Apache Hadoop. Hadoop has a poor out-of-the-box programming model; writing applications for Hadoop generally turns into a collection of scripts calling Hadoop command-line applications. Spring for Apache Hadoop provides a consistent programming model and declarative configuration model for developing Hadoop applications.