Kafka - A high-throughput distributed messaging system

  •        6165

Kafka provides a publish-subscribe solution that can handle all activity stream data and processing on a consumer-scale web site. This kind of activity (page views, searches, and other user actions) are a key ingredient in many of the social feature on the modern web. This data is typically handled by "logging" and ad hoc log aggregation solutions due to the throughput requirements. This kind of ad hoc solution is a viable solution to providing logging data to Hadoop.

http://kafka.apache.org/

Tags
Implementation
License
Platform

   




Related Projects

Scribe - Real time log aggregation used in Facebook


Scribe is a server for aggregating log data that's streamed in real time from clients. It is designed to be scalable and reliable. It is developed and maintained by Facebook. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups.

X-Itools: Enterprise Collaboration


Enterprise Collaboration modules and strong Log Analysis modules

Sentry - Realtime Platform-Agnostic Error Logging and Aggregation platform


Sentry is a realtime event logging and aggregation platform. It specializes in monitoring errors and extracting all the information needed to do a proper post-mortem without any of the hassle of the standard user feedback loop.

Samza - Distributed Stream Processing Framework


Apache Samza is a distributed stream processing framework. It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management. It provides a very simple call-back based process message API that should be familiar to anyone who's used Map/Reduce. Samza was originally developed at LinkedIn. It's currently used to process tracking data, service log data, and for data ingestion pipelines for realtime services.


spring-integration-kafka


The Spring Integration Kafka extension project provides inbound and outbound channel adapters for Apache Kafka. Apache Kafka is a distributed publish-subscribe messaging system that is designed for high throughput (terabytes of data) and low latency (milliseconds). For more information on Kafka and its design goals, see the Kafka main page.Starting from version 2.0 version this project is a complete rewrite based on the new spring-kafka project which uses the pure java Producer and Consumer clients provided by Kafka 0.9.x.x and 0.10.x.x..

White-elephant - Hadoop log aggregator and dashboard


White Elephant is a Hadoop log aggregator and dashboard which enables visualization of Hadoop cluster utilization across users. The server is a JRuby web application. In a production environment it can be deployed to tomcat and reads aggregated usage data directly from Hadoop. This data is stored in an in-memory database provided by HyperSQL. Charting is provided by Rickshaw. This project is developed by LinkedIn.

Luxun - A high-throughput, persistent, distributed, publish-subscribe messaging system based on memo


A high-throughput, persistent, distributed, publish-subscribe messaging system based on memory mapped file and Thrift RPC.

mypipe - MySQL binary log consumer with the ability to act on changed rows and publish changes to different systems with emphasis on Apache Kafka


mypipe latches onto a MySQL server with binary log replication enabled and allows for the creation of pipes that can consume the replication stream and act on the data (primarily integrated with Apache Kafka). mypipe tries to provide enough information that usually is not part of the MySQL binary log stream so that the data is meaningful. mypipe requires a row based binary log format and provides Insert, Update, and Delete mutations representing changed rows. Each change is related back to it's table and the API provides metadata like column types, primary key information (composite, key order), and other such useful information.

GoAccess - Real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser


GoAccess is an open source real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through your browser. It provides fast and valuable HTTP statistics for system administrators that require a visual server report on the fly. It supports nearly all web log formats (Apache, Nginx, Amazon S3, Elastic Load Balancing, CloudFront, etc)

RocketMQ - Distributed messaging and streaming data platform


Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.

nxlog - Multi platform Log management


nxlog is a modular, multi-threaded, high-performance log management solution with multi-platform support. In concept it is similar to syslog-ng or rsyslog but is not limited to unix/syslog only. It can collect logs from files in various formats, receive logs from the network remotely over UDP, TCP or TLS/SSL . It supports platform specific sources such as the Windows Eventlog, Linux kernel logs, Android logs, local syslog etc.

Fluentd - Data collector, Log Everything in JSON


Fluentd is an event collector system. It is a generalized version of syslogd, which handles JSON objects for its log messages. It collects logs from various data sources and writes them to files, database or other types of storages.

Graylog2 - Open Source Log Management


Graylog2 is an open source log management solution that stores your logs in ElasticSearch. It consists of a server written in Java that accepts your syslog messages via TCP, UDP or AMQP and stores it in the database. The second part is a web interface that allows you to manage the log messages from your web browser. Take a look at the screenshots or the latest release info page to get a feeling of what you can do with Graylog2.

jafka - a fast distributed publish-subscribe messaging system (mq)


a fast distributed publish-subscribe messaging system (mq)

Octopussy - Perl/XML Logs Analyzer, Alerter & Reporter


Octopussy is a Log analyzer tool. It analyzes the log, generates reports and alerts the admin. It has LDAP support to maintain users list. It exports report by Email, FTP & SCP. Scheduled reports could be generated. RRD tool to generate graphs.

Logsandra - log management using Cassandra


Logsandra is a log management application written in Python and using Cassandra as back-end. It is written as demo for cassandra but it is worth to take a look. It provides support to create your own parser.

BookKeeper - Replicated Log Service


BookKeeper is a replicated log service which can be used to build replicated state machines. A log contains a sequence of events which can be applied to a state machine. BookKeeper guarantees that each replica state machine will see all the same entries, in the same order. It provides back end support to distributed systems, such as messaging systems, coordination systems, filesystems, etc.

gnatsd - High-Performance server for NATS, the cloud native messaging system.


NATS Server is a simple, high performance open source messaging system for cloud native applications, IoT messaging, and microservices architectures. It implements a highly scalable and elegant publish-subscribe (pub/sub) distribution model. The performant nature of NATS make it an ideal base for building modern, reliable, scalable cloud native distributed systems.

Flume - Log management using HDFS


Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.