Kafka - A high-throughput distributed messaging system

  •        6200

Kafka provides a publish-subscribe solution that can handle all activity stream data and processing on a consumer-scale web site. This kind of activity (page views, searches, and other user actions) are a key ingredient in many of the social feature on the modern web. This data is typically handled by "logging" and ad hoc log aggregation solutions due to the throughput requirements. This kind of ad hoc solution is a viable solution to providing logging data to Hadoop.

http://kafka.apache.org/

Tags
Implementation
License
Platform

   




Related Projects

Scribe - Real time log aggregation used in Facebook

  •    C++

Scribe is a server for aggregating log data that's streamed in real time from clients. It is designed to be scalable and reliable. It is developed and maintained by Facebook. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups.

liftbridge - Lightweight, fault-tolerant message streams.

  •    Go

Liftbridge provides lightweight, fault-tolerant message streams by implementing a durable stream augmentation for the NATS messaging system. It extends NATS with a Kafka-like publish-subscribe log API that is highly available and horizontally scalable. Use Liftbridge as a simpler and lighter alternative to systems like Kafka and Pulsar or use it to add streaming semantics to an existing NATS deployment. See the introduction post on Liftbridge and this post for more context and some of the inspiration behind it.

X-Itools: Enterprise Collaboration

  •    Javascript

Enterprise Collaboration modules and strong Log Analysis modules

Kafka-Message-Server - Example application based on Apache Kafka framework to show it usage as distributed message server

  •    Java

Apache kafka is yet another precious gem from Apache Software Foundation. Kafka was originally developed at Linkedin and later on became a member of Apache project. Apache Kafka is a distributed publish-subscribe messaging system. Kafka differs from traditional messaging system as it is designed as distributed system, persists messages on disk and supports multiple subscribers. Kafka-Message-Server is an sample application for demonstrating kafka usage as message-server. Please follow the below instructions for productive use of the sample application.


Sentry - Realtime Platform-Agnostic Error Logging and Aggregation platform

  •    Python

Sentry is a realtime event logging and aggregation platform. It specializes in monitoring errors and extracting all the information needed to do a proper post-mortem without any of the hassle of the standard user feedback loop.

Samza - Distributed Stream Processing Framework

  •    Java

Apache Samza is a distributed stream processing framework. It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management. It provides a very simple call-back based process message API that should be familiar to anyone who's used Map/Reduce. Samza was originally developed at LinkedIn. It's currently used to process tracking data, service log data, and for data ingestion pipelines for realtime services.

White-elephant - Hadoop log aggregator and dashboard

  •    Java

White Elephant is a Hadoop log aggregator and dashboard which enables visualization of Hadoop cluster utilization across users. The server is a JRuby web application. In a production environment it can be deployed to tomcat and reads aggregated usage data directly from Hadoop. This data is stored in an in-memory database provided by HyperSQL. Charting is provided by Rickshaw. This project is developed by LinkedIn.

spring-integration-kafka

  •    Java

The Spring Integration Kafka extension project provides inbound and outbound channel adapters for Apache Kafka. Apache Kafka is a distributed publish-subscribe messaging system that is designed for high throughput (terabytes of data) and low latency (milliseconds). For more information on Kafka and its design goals, see the Kafka main page.Starting from version 2.0 version this project is a complete rewrite based on the new spring-kafka project which uses the pure java Producer and Consumer clients provided by Kafka 0.9.x.x and 0.10.x.x..

Luxun - A high-throughput, persistent, distributed, publish-subscribe messaging system based on memo

  •    Java

A high-throughput, persistent, distributed, publish-subscribe messaging system based on memory mapped file and Thrift RPC.

GoAccess - Real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser

  •    C

GoAccess is an open source real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through your browser. It provides fast and valuable HTTP statistics for system administrators that require a visual server report on the fly. It supports nearly all web log formats (Apache, Nginx, Amazon S3, Elastic Load Balancing, CloudFront, etc)

mypipe - MySQL binary log consumer with the ability to act on changed rows and publish changes to different systems with emphasis on Apache Kafka

  •    Scala

mypipe latches onto a MySQL server with binary log replication enabled and allows for the creation of pipes that can consume the replication stream and act on the data (primarily integrated with Apache Kafka). mypipe tries to provide enough information that usually is not part of the MySQL binary log stream so that the data is meaningful. mypipe requires a row based binary log format and provides Insert, Update, and Delete mutations representing changed rows. Each change is related back to it's table and the API provides metadata like column types, primary key information (composite, key order), and other such useful information.

RocketMQ - Distributed messaging and streaming data platform

  •    Java

Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.

nxlog - Multi platform Log management

  •    C

nxlog is a modular, multi-threaded, high-performance log management solution with multi-platform support. In concept it is similar to syslog-ng or rsyslog but is not limited to unix/syslog only. It can collect logs from files in various formats, receive logs from the network remotely over UDP, TCP or TLS/SSL . It supports platform specific sources such as the Windows Eventlog, Linux kernel logs, Android logs, local syslog etc.

Fluentd - Data collector, Log Everything in JSON

  •    Ruby

Fluentd is an event collector system. It is a generalized version of syslogd, which handles JSON objects for its log messages. It collects logs from various data sources and writes them to files, database or other types of storages.

Graylog2 - Open Source Log Management

  •    Java

Graylog2 is an open source log management solution that stores your logs in ElasticSearch. It consists of a server written in Java that accepts your syslog messages via TCP, UDP or AMQP and stores it in the database. The second part is a web interface that allows you to manage the log messages from your web browser. Take a look at the screenshots or the latest release info page to get a feeling of what you can do with Graylog2.

emitter - High performance, distributed and low latency publish-subscribe platform.

  •    Go

Emitter is a free open source real-time messaging service that connects all devices. This publish-subscribe messaging API is built for speed and security. Emitter is a real-time communication service for connecting online devices. Infrastructure and APIs for IoT, gaming, apps and real-time web. At its core, emitter.io is a distributed, scalable and fault-tolerant publish-subscribe messaging platform based on MQTT protocol and featuring message storage.

jafka - a fast distributed publish-subscribe messaging system (mq)

  •    Java

a fast distributed publish-subscribe messaging system (mq)

Octopussy - Perl/XML Logs Analyzer, Alerter & Reporter

  •    Perl

Octopussy is a Log analyzer tool. It analyzes the log, generates reports and alerts the admin. It has LDAP support to maintain users list. It exports report by Email, FTP & SCP. Scheduled reports could be generated. RRD tool to generate graphs.