Kafka - A high-throughput distributed messaging system


Kafka provides a publish-subscribe solution that can handle all activity stream data and processing on a consumer-scale web site. This kind of activity (page views, searches, and other user actions) is a key ingredient in many of the social features on the modern web. Because of the throughput requirements, this data is typically handled by "logging" and ad hoc log aggregation solutions. Such ad hoc solutions are a viable way to provide logging data to Hadoop.
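To make the publish-subscribe model concrete, here is a minimal in-memory sketch, assuming a single process with no persistence, partitioning, or replication: each topic is an append-only list, producers append to it, and each consumer tracks its own read offset. The class and method names are hypothetical and are not Kafka's client API.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log per topic (hypothetical, in-memory only)."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic name -> list of messages
        self._offsets = {}  # (consumer, topic) -> next offset to read

    def publish(self, topic, message):
        """Append a message and return its offset within the topic."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def poll(self, consumer, topic, max_messages=10):
        """Return unread messages for this consumer and advance its offset."""
        start = self._offsets.get((consumer, topic), 0)
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(consumer, topic)] = start + len(batch)
        return batch


log = TopicLog()
log.publish("page_views", {"user": 1, "url": "/home"})
log.publish("page_views", {"user": 2, "url": "/search"})
print(log.poll("analytics", "page_views"))  # both messages
print(log.poll("analytics", "page_views"))  # [] (already read by this consumer)
print(log.poll("archiver", "page_views"))   # both messages again: independent offset
```

Keeping an offset per consumer instead of deleting messages on read is what lets several subscribers read the same stream independently.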




Related Projects

Scribe - Real-time log aggregation used at Facebook

Scribe is a server for aggregating log data that is streamed in real time from clients. Developed and maintained by Facebook, it is designed to scale to a very large number of nodes and to be robust to network and node failures. A Scribe server runs on every node in the system, configured to aggregate messages and send them in larger groups to a central Scribe server (or servers).
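The per-node aggregation pattern described above can be sketched as a local buffer that forwards messages upstream in batches. This is a hypothetical illustration, not Scribe's actual interface: `send_upstream` stands in for the network call to the central server, and the batch size is an arbitrary assumption.

```python
class LocalAggregator:
    """Buffers messages on one node and forwards them upstream in batches."""

    def __init__(self, send_upstream, batch_size=100):
        self._send_upstream = send_upstream  # hypothetical transport callable
        self._batch_size = batch_size
        self._buffer = []

    def log(self, category, message):
        """Queue one message; forward the buffer once it reaches batch_size."""
        self._buffer.append((category, message))
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send everything buffered so far as a single batch."""
        if self._buffer:
            self._send_upstream(list(self._buffer))  # copy before clearing
            self._buffer.clear()


sent = []
agg = LocalAggregator(sent.append, batch_size=3)
for i in range(7):
    agg.log("web", f"request {i}")
agg.flush()
print([len(batch) for batch in sent])  # [3, 3, 1]
```

A version robust to failures would also need to keep the buffer (for example on local disk) when the upstream server is unreachable; that is left out of this sketch.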

X-Itools: Enterprise Collaboration

Enterprise collaboration modules and strong log analysis modules.

couchdb-lager - Mirror of Apache CouchDB

for the backend:

```erlang
{lager, [
  {handlers, [
    {lager_console_backend, [info, {lager_default_formatter, [time, " [", severity, "] ", message, "\n"]}]},
    {lager_file_backend, [{file, "error.log"}, {level, error}, {formatter, lager_default_formatter},
      {formatter_config, [date, " ", time, " [", severity, "] ", pid, " ", message, "\n"]}]},
    {lager_file_backend, [{file, "console.log"}, {level, info}]}
  ]}
]}.
```

Included is lager_default_formatter. This provides a generic, default formatting fo
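The formatter configuration above is a template list that mixes atoms (placeholders such as `time`, `severity`, `message`) with literal strings. The Python sketch below imitates that idea to show how such a template renders one log record. It is an illustration of the concept, not lager's implementation; the `Atom` wrapper, the record fields, and the choice to render missing fields as an empty string are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Stands in for an Erlang atom naming a record field (e.g. time, severity)."""
    name: str


def render(template, record):
    """Join literals as-is and substitute each Atom with the matching record field."""
    out = []
    for item in template:
        if isinstance(item, Atom):
            # Assumption of this sketch: a missing field renders as "".
            out.append(str(record.get(item.name, "")))
        else:
            out.append(item)
    return "".join(out)


# Mirrors [time, " [", severity, "] ", message, "\n"] from the console backend config.
console_template = [Atom("time"), " [", Atom("severity"), "] ", Atom("message"), "\n"]
record = {"time": "12:00:01.123", "severity": "info", "message": "started"}
print(render(console_template, record), end="")  # 12:00:01.123 [info] started
```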

Sentry - Realtime Platform-Agnostic Error Logging and Aggregation platform

Sentry is a realtime event logging and aggregation platform. It specializes in monitoring errors and extracting all the information needed to do a proper post-mortem without any of the hassle of the standard user feedback loop.
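Aggregation in an error tracker generally means grouping many occurrences of the same error into a single issue. A minimal sketch, assuming a fingerprint built only from the exception type and the file and function of the innermost Python frame (Sentry's real grouping rules are more elaborate and configurable):

```python
import hashlib
import traceback
from collections import Counter


def fingerprint(exc):
    """Hash the exception type plus the file/function of its innermost frame."""
    frames = traceback.extract_tb(exc.__traceback__)
    last = frames[-1] if frames else None
    key = type(exc).__name__
    if last is not None:
        key += f"|{last.filename}|{last.name}"
    return hashlib.sha1(key.encode()).hexdigest()[:12]


def parse_int(value):
    return int(value)


counts = Counter()
for raw in ["1", "x", "y", "2", "z"]:
    try:
        parse_int(raw)
    except ValueError as exc:
        counts[fingerprint(exc)] += 1  # same type and location -> same group

print(counts.most_common())  # a single group with 3 occurrences
```

Leaving line numbers and message text out of the key keeps occurrences grouped even when the failing input differs.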

Samza - Distributed Stream Processing Framework

Apache Samza is a distributed stream processing framework. It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management. It provides a very simple callback-based "process message" API that should be familiar to anyone who has used Map/Reduce. Samza was originally developed at LinkedIn. It is currently used to process tracking data and service log data, and for data ingestion pipelines for real-time services.
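In Samza's Java API the callback is `StreamTask.process`, which receives the incoming message envelope, a collector for output messages, and a task coordinator. The Python sketch below only imitates that callback shape with a hypothetical in-memory runner (the coordinator is omitted) to show the Map/Reduce-like style; none of these classes are Samza's API.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    """Hypothetical stand-in for an incoming message envelope."""
    stream: str
    key: str
    message: dict


class Collector:
    """Collects (output_stream, key, message) tuples emitted by a task."""

    def __init__(self):
        self.sent = []

    def send(self, stream, key, message):
        self.sent.append((stream, key, message))


class PageViewCounterTask:
    """Counts page views per URL and emits the running count for each message."""

    def __init__(self):
        self.counts = {}

    def process(self, envelope, collector):
        url = envelope.message["url"]
        self.counts[url] = self.counts.get(url, 0) + 1
        collector.send("page-view-counts", url, {"url": url, "count": self.counts[url]})


def run(task, envelopes):
    """Toy runner: deliver each envelope to the task's callback in order."""
    collector = Collector()
    for env in envelopes:
        task.process(env, collector)
    return collector.sent


events = [Envelope("page-views", "u1", {"url": "/a"}),
          Envelope("page-views", "u2", {"url": "/b"}),
          Envelope("page-views", "u3", {"url": "/a"})]
for out in run(PageViewCounterTask(), events):
    print(out)
```

The task never pulls messages itself; the framework pushes one message at a time into `process`, which is what makes the model resemble a mapper.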

kafka-cpp - CPP client for Apache Kafka

This library allows you to produce messages to the Kafka distributed publish/subscribe messaging service. Run this to generate the makefile for your system. Do this first.

White-elephant - Hadoop log aggregator and dashboard

White Elephant is a Hadoop log aggregator and dashboard that enables visualization of Hadoop cluster utilization across users. The server is a JRuby web application. In a production environment it can be deployed to Tomcat and reads aggregated usage data directly from Hadoop. This data is stored in an in-memory database provided by HyperSQL. Charting is provided by Rickshaw. This project is developed by LinkedIn.

Luxun - A high-throughput, persistent, distributed, publish-subscribe messaging system based on memory-mapped files

A high-throughput, persistent, distributed, publish-subscribe messaging system based on memory-mapped files and Thrift RPC.
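The core idea behind a queue built on memory-mapped files can be sketched with Python's standard `mmap` module: messages are appended to a pre-sized file as length-prefixed records and read back by byte offset. This is a simplified single-writer illustration with an assumed record format (4-byte big-endian length prefix); it is not Luxun's on-disk format and leaves out Thrift RPC entirely.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct(">I")  # assumed 4-byte big-endian length prefix per record


class MappedLog:
    """Append-only log backed by a fixed-size memory-mapped file (single writer)."""

    def __init__(self, path, size=1 << 20):
        with open(path, "wb") as f:
            f.truncate(size)  # pre-allocate the file so it can be mapped
        self._file = open(path, "r+b")
        self._map = mmap.mmap(self._file.fileno(), size)
        self._tail = 0  # byte offset where the next record will be written

    def append(self, payload: bytes) -> int:
        """Write one record and return its starting byte offset."""
        end = self._tail + HEADER.size + len(payload)
        if end > len(self._map):
            raise ValueError("log file is full")
        offset = self._tail
        HEADER.pack_into(self._map, offset, len(payload))
        self._map[offset + HEADER.size:end] = payload
        self._tail = end
        return offset

    def read(self, offset: int) -> tuple[bytes, int]:
        """Return (payload, next_offset) for the record starting at offset."""
        (length,) = HEADER.unpack_from(self._map, offset)
        start = offset + HEADER.size
        return bytes(self._map[start:start + length]), start + length

    def close(self):
        self._map.flush()  # push dirty pages to the file
        self._map.close()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "queue.dat")
log = MappedLog(path, size=4096)
first = log.append(b"hello")
second = log.append(b"world")
payload, nxt = log.read(first)
print(payload, nxt == second)  # b'hello' True
log.close()
```

Writes go to the page cache through the mapping rather than through per-message `write` calls, which is the usual motivation for this design.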

gofka - Dependency free drop-in replacement for Kafka written in Go.

Kafka is a high-throughput publish-subscribe messaging system rethought as a distributed commit log. When you're a JVM shop with a JVM stack, this might come naturally. However, despite having experience with some of this, I'm moving more and more into the Go stack. Distributed systems development in Go has seen the recent adoption of the Raft consensus algorithm, which, with the right implementation, solves all the problems Kafka had to solve using ZooKeeper. This means Gofka can be 100% dependency free.
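One thing Raft provides in place of ZooKeeper is agreement on which log entries are committed: the leader treats an entry as committed once a majority of servers store it (Raft additionally requires that entry to be from the leader's current term). The Python sketch below computes only the majority-replicated index from each server's match index; the term check, elections, and log repair are omitted, and the function name is hypothetical.

```python
def majority_commit_index(match_indexes):
    """Highest log index stored on a majority of servers.

    match_indexes holds, for every server including the leader, the highest
    log index known to be stored on that server.
    """
    ranked = sorted(match_indexes, reverse=True)
    majority = len(ranked) // 2 + 1
    # The value at position majority-1 is stored on at least `majority` servers.
    return ranked[majority - 1]


# Five servers: leader at 10, followers at 10, 9, 7 and 3.
print(majority_commit_index([10, 10, 9, 7, 3]))  # 9
```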

RocketMQ - Distributed messaging and streaming data platform

Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.

kafka-php - PHP client library for Apache Kafka

kafka-php allows you to produce messages to the Apache Kafka distributed publish/subscribe messaging service. Add the lib directory to the PHP include_path and use an autoloader like the one in the examples directory (the code follows the PEAR/Zend one-class-per-file convention).

nxlog - Multi platform Log management

nxlog is a modular, multi-threaded, high-performance log management solution with multi-platform support. In concept it is similar to syslog-ng or rsyslog, but it is not limited to Unix/syslog. It can collect logs from files in various formats and receive logs remotely over the network via UDP, TCP or TLS/SSL. It supports platform-specific sources such as the Windows Eventlog, Linux kernel logs, Android logs, local syslog, etc.

Fluentd - Data collector, Log Everything in JSON

Fluentd is an event collector system. It is a generalized version of syslogd that uses JSON objects as its log messages. It collects logs from various data sources and writes them to files, databases or other types of storage.
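Each Fluentd event carries a tag (used for routing), a timestamp, and a record that is a JSON object. The sketch below only builds such an event and serializes it as one tab-separated line; it does not talk to a Fluentd daemon, and both the helper names and the exact line layout are assumptions for illustration.

```python
import json
import time


def make_event(tag, record, timestamp=None):
    """Build a (tag, time, record) triple in the shape of a Fluentd event."""
    if timestamp is None:
        timestamp = int(time.time())
    return (tag, timestamp, record)


def to_json_line(event):
    """Serialize an event as one line: time, tag, then the JSON record (tab-separated)."""
    tag, timestamp, record = event
    return f"{timestamp}\t{tag}\t{json.dumps(record, sort_keys=True)}"


event = make_event("app.access", {"path": "/", "status": 200}, timestamp=1700000000)
print(to_json_line(event))  # timestamp, tag and JSON record separated by tabs
```

Because the record stays a JSON object end to end, outputs can route on the tag and filter on record fields without parsing free-form text.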

brod - An unmaintained python client to Kafka 0.6

This library is not maintained anymore and only supports Kafka 0.6. Please use mumrah/kafka-python if you want Kafka 0.8 support. brod lets you produce messages to the Kafka distributed publish/subscribe messaging service. It started as a fork of pykafka (https://github.com/dsully/pykafka), but became a total rewrite as we needed to add many features.

pykafka - Python API for the Kafka Message Queue

pykafka allows you to produce messages to the Kafka distributed publish/subscribe messaging service.

Graylog2 - Open Source Log Management

Graylog2 is an open source log management solution that stores your logs in Elasticsearch. It consists of a server written in Java that accepts your syslog messages via TCP, UDP or AMQP and stores them in the database. The second part is a web interface that allows you to manage the log messages from your web browser. Take a look at the screenshots or the latest release info page to get a feeling of what you can do with Graylog2.
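Syslog messages received over TCP or UDP begin with a priority value in angle brackets, which encodes the facility and severity as facility * 8 + severity (RFC 3164 / RFC 5424). A minimal Python sketch of that first parsing step follows; it is not Graylog2's code, and it ignores the rest of the syslog header.

```python
import re

PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)

SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def parse_pri(line):
    """Split '<PRI>rest' into (facility, severity_name, rest)."""
    match = PRI_PATTERN.match(line)
    if match is None:
        raise ValueError("missing <PRI> prefix")
    pri = int(match.group(1))
    if pri > 191:  # largest valid value: facility 23 * 8 + severity 7
        raise ValueError("PRI out of range")
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity], match.group(2)


print(parse_pri("<34>Oct 11 22:14:15 mymachine su: 'su root' failed"))
# (4, 'crit', "Oct 11 22:14:15 mymachine su: 'su root' failed")
```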

jafka - a fast distributed publish-subscribe messaging system (mq)

A fast distributed publish-subscribe messaging system (MQ).

Octopussy - Perl/XML Logs Analyzer, Alerter & Reporter

Octopussy is a log analyzer tool. It analyzes logs, generates reports and alerts the admin. It has LDAP support for maintaining the user list. It exports reports by email, FTP and SCP, and reports can be scheduled. It uses RRDtool to generate graphs.

BookKeeper - Replicated Log Service

BookKeeper is a replicated log service that can be used to build replicated state machines. A log contains a sequence of events that can be applied to a state machine. BookKeeper guarantees that each replica state machine will see all the same entries, in the same order. It provides backend support for distributed systems such as messaging systems, coordination systems, filesystems, etc.
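The replicated-state-machine guarantee can be illustrated without any networking: if every replica applies the same entries in the same order, every replica ends in the same state, even one that falls behind and catches up later. The sketch below uses a plain Python list as the shared log and a key-value store as the state machine; BookKeeper's actual ledger API (writing and reading entries through bookies) is not modeled, and the entry format is an assumption.

```python
class KeyValueStateMachine:
    """Deterministic state machine: applies ('set', k, v) / ('delete', k) entries."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, entry):
        op = entry[0]
        if op == "set":
            _, key, value = entry
            self.state[key] = value
        elif op == "delete":
            self.state.pop(entry[1], None)
        else:
            raise ValueError(f"unknown op {op!r}")
        self.applied += 1

    def catch_up(self, log):
        """Apply every entry this replica has not seen yet, in log order."""
        for entry in log[self.applied:]:
            self.apply(entry)


log = [("set", "a", 1), ("set", "b", 2), ("delete", "a"), ("set", "b", 3)]
fast, slow = KeyValueStateMachine(), KeyValueStateMachine()
fast.catch_up(log)
slow.catch_up(log[:2])  # lags behind
slow.catch_up(log)      # resumes from where it stopped
print(fast.state == slow.state, fast.state)  # True {'b': 3}
```

Determinism of `apply` matters as much as ordering: a state machine that consulted the clock or random numbers would diverge even with an identical log.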