Avro
Avro is a data serialization system. Avro provides rich data structures; a compact, fast, binary data format; a container file for storing persistent data; remote procedure call (RPC); and simple integration with dynamic languages. Code generation is not required to read or write data files, nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages.
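Much of the compactness of Avro's binary format comes from how it writes integers: `int` and `long` values are zigzag-encoded and then written as variable-length base-128 integers, and strings are written as a length (itself a `long`) followed by UTF-8 bytes. The following is a minimal sketch of those two primitive encodings as described in the Avro specification. It is not the API of any Avro library, and the function names are hypothetical.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one so small magnitudes stay small."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("value out of 64-bit range")
    # For negative n, (n >> 63) is -1, so this yields -2n - 1; for n >= 0 it yields 2n.
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Hypothetical helper: zigzag, then 7 bits per byte, low-order group first."""
    z = zigzag(n)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Hypothetical helper: length as an encoded long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


print(encode_long(-1))       # b'\x01'
print(encode_long(64))       # b'\x80\x01'
print(encode_string("foo"))  # b'\x06foo'
```

Because of the zigzag step, a small negative number such as -1 takes a single byte instead of the ten bytes a plain varint of its 64-bit two's-complement value would need.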
Related Products
Cascading - Data Processing Workflows on Hadoop
Cascading is a Data Processing API, Process Planner, and Process Scheduler used for defining and executing complex, scale-free, and fault tolerant data processing workflows on an Apache Hadoop cluster. It is a thin Java library and API that sits on top of Hadoop's MapReduce layer and is executed from the command line like any other Hadoop application.
Sqoop - Transfers data between Hadoop and Datastores
Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. You can use Sqoop to import data from external structured datastores into the Hadoop Distributed File System (HDFS) or related systems such as Hive and HBase. Conversely, Sqoop can extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses.
MessagePack
MessagePack is an efficient binary object serialization library. Like JSON, it lets you exchange structured objects between many languages; unlike JSON, it is fast and compact.
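MessagePack stays small because its type tags fold short values into the tag byte itself: a small integer is a single byte, and a short string or small map needs only one header byte. Below is a minimal sketch of a subset of the format (nil, booleans, fixint, fixstr, fixarray and fixmap only), based on the MessagePack specification. `pack` is a hypothetical helper, not the API of any msgpack library, and values too large for these "fix" formats are rejected rather than encoded with the wider formats.

```python
import json


def pack(obj) -> bytes:
    """Hypothetical encoder for a small subset of MessagePack."""
    if obj is None:
        return b"\xc0"                                # nil
    if obj is True:                                   # check bools before ints:
        return b"\xc3"                                # bool is a subclass of int
    if obj is False:
        return b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])                       # positive fixint 0x00..0x7f
        if -32 <= obj < 0:
            return bytes([obj & 0xFF])                # negative fixint 0xe0..0xff
        raise ValueError("only fixint range is supported in this sketch")
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) < 32:
            return bytes([0xA0 | len(data)]) + data   # fixstr
        raise ValueError("only fixstr (< 32 bytes) is supported in this sketch")
    if isinstance(obj, list) and len(obj) < 16:
        return bytes([0x90 | len(obj)]) + b"".join(pack(x) for x in obj)  # fixarray
    if isinstance(obj, dict) and len(obj) < 16:
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body        # fixmap
    raise TypeError(f"unsupported value in this sketch: {obj!r}")


doc = {"compact": True, "schema": 0}
packed = pack(doc)
as_json = json.dumps(doc, separators=(",", ":")).encode("utf-8")
print(len(packed), len(as_json))  # 18 27
```

For this small document the MessagePack encoding is 18 bytes against 27 for the most compact JSON text.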
Hadoop Common
Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. Hadoop Common provides the common utilities that support the other Hadoop subprojects.
Protocol Buffers
Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats.
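In the Protocol Buffers wire format, each field is written as a varint key, `(field_number << 3) | wire_type`, followed by the value, so unset fields cost nothing and readers can skip fields they do not recognize. The following is a minimal sketch covering only non-negative varint fields (wire type 0) and length-delimited strings (wire type 2). The helper names are hypothetical; real code would use the classes generated by `protoc` rather than hand-written encoders.

```python
VARINT = 0  # wire type for int32, int64, uint32, uint64, bool, enum
LEN = 2     # wire type for length-delimited data such as strings


def encode_varint(n: int) -> bytes:
    """Base-128 varint, low-order 7-bit group first (non-negative values only here)."""
    if n < 0:
        raise ValueError("negative values are not handled in this sketch")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def encode_uint_field(field_number: int, value: int) -> bytes:
    """Hypothetical helper: key followed by the varint value."""
    return encode_tag(field_number, VARINT) + encode_varint(value)


def encode_string_field(field_number: int, value: str) -> bytes:
    """Hypothetical helper: key, varint byte length, then the UTF-8 bytes."""
    data = value.encode("utf-8")
    return encode_tag(field_number, LEN) + encode_varint(len(data)) + data


print(encode_uint_field(1, 150).hex())  # 089601
```

A message whose field 1 holds 150 therefore takes three bytes: one for the key and two for the value.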
SMSj - Java SMS library
This library allows you to send SMS messages (GSM) from the Java platform. It gives you full control over the SMS, including the UDH field, so you can create and send EMS messages, WAP push messages and Nokia Smart Messages (pictures, ringtones, etc.). The API can send SMS messages through a GSM phone connected to the serial port or via an SMS gateway (such as Clickatell).
RSS.PY - RSS parser and serializer in Python
This library provides tools for working with RSS feeds as data structures. The core is an RSS parser capable of understanding most RSS formats, and a serializer that produces RSS 1.0. The RSS channel itself can be represented as any arbitrary data structure; two such structures are provided, both as examples and to serve common usage. This approach allows channels to be manipulated and stored in a fashion that suits both their semantics and the applications that access them.
Katta - Lucene and more in the cloud.
Katta is a scalable, fault-tolerant, distributed data store for real-time access. Katta serves large, replicated indices as shards to handle high loads and very large data sets. These indices can be of different types; implementations are currently available for Lucene and Hadoop MapFiles.
Cascalog - Data processing on Hadoop
Cascalog is a fully-featured data processing and querying library for Clojure. The main use cases for Cascalog are processing Big Data on top of Hadoop or doing analysis on your local computer from the Clojure REPL. Cascalog is a replacement for tools like Pig, Hive, and Cascading.
HBase - Hadoop database
HBase is designed to host very large, Bigtable-style tables: billions of rows by millions of columns. It is a scalable, distributed, versioned, column-oriented store modeled after Google's Bigtable and runs on top of HDFS (Hadoop Distributed File System). It features compression and in-memory operation on a per-column basis, and data can be replicated across nodes. HBase is used at Facebook and Twitter.