Samza - Distributed Stream Processing Framework

  •        0

Apache Samza is a distributed stream processing framework. It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management. It provides a very simple call-back based process message API that should be familiar to anyone who's used Map/Reduce. Samza was originally developed at LinkedIn. It's currently used to process tracking data, service log data, and for data ingestion pipelines for realtime services.

http://samza.incubator.apache.org/

Tags
Implementation
License
Platform

   

comments powered by Disqus


Related Projects

Project-voldemort - A distributed database, Clone of Amazon's Dynamo


Voldemort is a distributed key-value storage system. Data is automatically replicated over multiple servers. Data is automatically partitioned so each server contains only a subset of the total data. Server failure is handled transparently. It is used at LinkedIn for certain high-scalability storage problems where simple functional partitioning is not sufficient.

Openmeetings - Open Source Web Conferencing


Openmeetings provides video conferencing, instant messaging, white board, collaborative document editing and other groupware tools using API functions of the Red5 Streaming Server for Remoting and Streaming.

IndexTank - Search Engine powers Reddit


IndexTank search engine powers search in Reddit, Social bookmarking site. IndexTank is acquired by LinkedIn and released the project as open source. It includes features like Variables boosts, Facets, Faceted search, Snippeting, Custom scoring functions, Suggest, and Autocomplete.

http-parser - http request/response parser for c


This is a parser for HTTP messages written in C. It parses both requests and responses. The parser is designed to be used in performance HTTP applications. It does not make any syscalls nor allocations, it does not buffer data, it can be interrupted at anytime. Depending on your architecture, it only requires about 40 bytes of data per message stream (in a web server that is per connection).

Scribe - Real time log aggregation used in Facebook


Scribe is a server for aggregating log data that's streamed in real time from clients. It is designed to be scalable and reliable. It is developed and maintained by Facebook. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups.

Socialauth - OAuth library in Java


Socialauth provides support to authenticate users through external oAuth providers like Gmail, Hotmail, Yahoo, Twitter, Facebook, LinkedIn, Foursquare, MySpace, Salesforce, Yammer as well as through OpenID providers like myopenid.com. It could be easily integrated.

Red5 - Media Server


Red5 is an Open Source Flash Server written in Java that supports Streaming Video (FLV, F4V, MP4, 3GP), Streaming Audio (MP3, F4A, M4A, AAC), Recording Client Streams (FLV and AVC+AAC in FLV container), Shared Objects, Live Stream Publishing, Remoting Protocols: RTMP, RTMPT, RTMPS, and RTMPE.

RabbitMQ - Robust messaging for applications


RabbitMQ is a messaging broker - an intermediary for messaging. It gives your applications a common platform to send and receive messages, and your messages a safe place to live until received. It features include reliability, high availability, Clustering and Federation. RabbitMQ ships with an easy-to use management UI that allows you to monitor and control every aspect of your message broker. There are RabbitMQ clients for almost any language you can think of.

kestrel - simple, distributed message queue system


simple, distributed message queue system

Kafka - A high-throughput distributed messaging system


Kafka provides a publish-subscribe solution that can handle all activity stream data and processing on a consumer-scale web site. This kind of activity (page views, searches, and other user actions) are a key ingredient in many of the social feature on the modern web. This data is typically handled by "logging" and ad hoc log aggregation solutions due to the throughput requirements. This kind of ad hoc solution is a viable solution to providing logging data to Hadoop.







Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.

Tag Cloud >>