Flume - Log management using HDFS


Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.

http://flume.apache.org/
https://github.com/apache/flume
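
Flume agents are configured declaratively: a properties file wires one or more sources into channels and on to sinks. Below is a minimal, hypothetical sketch of such a file (the agent name, file path, and HDFS URL are placeholders, not part of any real deployment):

    # Agent "a1": tail an application log, buffer in memory, write to HDFS
    a1.sources  = r1
    a1.channels = c1
    a1.sinks    = k1

    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/app.log
    a1.sources.r1.channels = c1

    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000

    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.channel = c1

An agent defined this way would typically be started with something like flume-ng agent --conf conf --conf-file flume.conf --name a1.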

Related Projects

Fluentd - Data collector, Log Everything in JSON


Fluentd is an event collector system. It is a generalized version of syslogd that handles JSON objects for its log messages. It collects logs from various data sources and writes them to files, databases, or other kinds of storage.
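
As an illustration of the JSON-centric model, the sketch below sends one structured event to a local Fluentd agent using the fluent-logger Python package; the tag, host, and field names are made up for this example:

    from fluent import sender, event

    # Point the global sender at a local Fluentd agent (default forward port 24224).
    sender.setup('myapp', host='localhost', port=24224)

    # Emit a JSON-shaped event; Fluentd routes it by tag ("myapp.access" here).
    event.Event('access', {'method': 'GET', 'path': '/index.html', 'status': 200})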

nxlog - Multi platform Log management


nxlog is a modular, multi-threaded, high-performance log management solution with multi-platform support. In concept it is similar to syslog-ng or rsyslog, but it is not limited to unix/syslog. It can collect logs from files in various formats and receive logs from the network remotely over UDP, TCP or TLS/SSL. It supports platform-specific sources such as the Windows Eventlog, Linux kernel logs, Android logs, local syslog, etc.

Graylog2 - Open Source Log Management


Graylog2 is an open source log management solution that stores your logs in ElasticSearch. It consists of a server written in Java that accepts your syslog messages via TCP, UDP or AMQP and stores them in the database. The second part is a web interface that lets you manage the log messages from your web browser. Take a look at the screenshots or the latest release info page to get a feel for what you can do with Graylog2.
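
Because Graylog2 accepts plain syslog over UDP or TCP, an application can forward its logs with nothing more than the Python standard library; the hostname and port below are placeholders for a real Graylog2 input:

    import logging
    import logging.handlers

    logger = logging.getLogger('myapp')
    logger.setLevel(logging.INFO)

    # Forward log records as syslog messages over UDP to a Graylog2 input.
    handler = logging.handlers.SysLogHandler(address=('graylog.example.com', 514))
    logger.addHandler(handler)

    logger.info('user login succeeded')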

Epylog - a Syslog parser


Epylog is a syslog parser which runs periodically, looks at your logs, processes some of the entries in order to present them in a more comprehensible format, and then mails you the output. It is written specifically for large network clusters where a lot of machines (around 50 and upwards) log to the same loghost using syslog or syslog-ng.

Octopussy - Perl/XML Logs Analyzer, Alerter & Reporter


Octopussy is a log analyzer tool. It analyzes logs, generates reports, and alerts the administrator. It has LDAP support for maintaining the user list, exports reports by email, FTP and SCP, can generate reports on a schedule, and uses RRDtool to produce graphs.



Kafka - A high-throughput distributed messaging system


Kafka provides a publish-subscribe solution that can handle all activity stream data and processing on a consumer-scale web site. This kind of activity (page views, searches, and other user actions) is a key ingredient in many of the social features on the modern web. Because of the throughput requirements, this data is typically handled by "logging" and ad hoc log aggregation solutions, which are a viable way of providing logging data to Hadoop.
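
A rough sketch of the publishing side, using the kafka-python client (the broker address and topic name are illustrative only):

    import json
    from kafka import KafkaProducer

    # Connect to a Kafka broker and publish activity events as JSON.
    producer = KafkaProducer(
        bootstrap_servers='localhost:9092',
        value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    )

    producer.send('page-views', {'user': 'alice', 'page': '/home'})
    producer.flush()

Downstream consumers (log aggregators, Hadoop jobs, real-time processors) can then subscribe to the same topic independently.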

Apache NiFi - An easy to use, powerful, and reliable system to process and distribute data


Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. Data flow can be tracked and modified at run time. It automates the movement of data between disparate data sources and systems, making data ingestion fast, easy and secure. The project was created by the United States National Security Agency (NSA).

Chainsaw - log viewer and analysis tool


Chainsaw is a companion application to Log4j written by members of the Log4j development community. Chainsaw can read log files formatted in Log4j's XMLLayout, receive events from remote locations, read events from a DB, and even work with JDK 1.4 logging events.


Kong - The Microservice API Gateway


Kong is a cloud-native, fast, scalable, and distributed Microservice Abstraction Layer (also known as an API Gateway, API Middleware or in some cases Service Mesh). Backed by the battle-tested NGINX with a focus on high performance, Kong was made available as an open-source platform in 2015. Under active development, Kong is used in production at thousands of organizations, from startups to Global 5000 companies and government organizations.

flume-syslog-source2 - Improved Syslog source for Flume


This project defines a plugin for Flume containing an improved Syslog source. It parses RFC 3164 (BSD Syslog) as well as the newer RFC 5424 (Syslog). This source goes to greater lengths than the built-in version when creating a Flume event: all possible fields are parsed, and the body contains just the body of the message. Even the Tag and PID fields are parsed. This should leave clean and normalized log messages, at the expense of more CPU cycles.
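
To make the parsing concrete, here is a rough Python approximation of pulling the priority, tag, and PID out of an RFC 3164 line; it is only an illustration of the format, not the plugin's actual code:

    import re

    # Very loose RFC 3164 shape: <PRI>TIMESTAMP HOST TAG[PID]: MESSAGE
    RFC3164 = re.compile(
        r'<(?P<pri>\d{1,3})>'
        r'(?P<timestamp>\w{3} [ \d]\d \d{2}:\d{2}:\d{2}) '
        r'(?P<host>\S+) '
        r'(?P<tag>[^\[:]+)(?:\[(?P<pid>\d+)\])?: '
        r'(?P<body>.*)'
    )

    line = '<34>Oct 11 22:14:15 mymachine su[230]: su root failed'
    m = RFC3164.match(line)
    if m:
        pri = int(m.group('pri'))
        facility, severity = pri // 8, pri % 8
        print(facility, severity, m.group('tag'), m.group('pid'), m.group('body'))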

Zenoss - Open Source IT Management


Zenoss Core is an open source IT monitoring product that delivers the functionality to effectively manage the configuration, health, and performance of networks, servers and applications through a single, integrated software package.

Webalizer - fast web server log file analysis


The Webalizer is a fast web server log file analysis program. It produces highly detailed, easily configurable usage reports in HTML format, for viewing with a standard web browser. It handles standard Common logfile format (CLF) server logs, several variations of the NCSA Combined logfile format, wu-ftpd/proftpd xferlog (FTP) format logs, Squid proxy server native format, and W3C Extended log formats.

Logsandra - log management using Cassandra


Logsandra is a log management application written in Python that uses Cassandra as its back-end. It was written as a demo for Cassandra, but it is worth a look. It also provides support for creating your own parsers.
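
Logsandra's own schema is not reproduced here, but the general pattern of writing a parsed log entry to Cassandra from Python looks roughly like the following; the keyspace, table, and column names are hypothetical:

    import uuid
    from datetime import datetime, timezone

    from cassandra.cluster import Cluster

    # Connect to a local Cassandra node and write one parsed log entry.
    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('logs')  # hypothetical keyspace

    session.execute(
        "INSERT INTO entries (id, ts, host, message) VALUES (%s, %s, %s, %s)",
        (uuid.uuid4(), datetime.now(timezone.utc), 'web01', 'GET /index.html 200'),
    )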

X-Itools: Enterprise Collaboration


Enterprise Collaboration modules and strong Log Analysis modules

ploddle - A syslog-compatible log collector, with built-in web interface for searching


A syslog-compatible log collector, with built-in web interface for searching

White-elephant - Hadoop log aggregator and dashboard


White Elephant is a Hadoop log aggregator and dashboard which enables visualization of Hadoop cluster utilization across users. The server is a JRuby web application. In a production environment it can be deployed to Tomcat, where it reads aggregated usage data directly from Hadoop. This data is stored in an in-memory database provided by HyperSQL. Charting is provided by Rickshaw. This project is developed by LinkedIn.

liblogfaf - A library that logs messages using non-blocking UDP datagrams.


liblogfaf (faf stands for fire-and-forget) is a dynamic library that is designed to be LD_PRELOAD-ed while starting a process that uses the openlog() & syslog() functions to send syslog messages. It overrides the logging functions so that log messages are sent as UDP datagrams instead of being written to /dev/log (which can block). This is useful for processes that call syslog() as part of their main execution flow and can therefore easily break when the /dev/log buffer fills up, for example when the process that is expected to read from it (usually a system syslog daemon like rsyslog or syslog-ng) stops doing so. Please note that liblogfaf should not be used in an environment where reliable log message delivery is required.

syslog - Send CloudWatch Logs Securely to Syslog via a Lambda function


Effortlessly forward all your container logs to a Syslog server. On AWS, CloudWatch Logs is the built-in log service, offering high ingestion throughput and cheap storage. A Logs Subscription Filter coordinates the deceptively tough job of delivering every log to a custom Lambda Function log processor and syslog forwarder.
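
The subscription filter hands the Lambda function a gzip-compressed, base64-encoded JSON payload. A minimal sketch of unpacking it and forwarding each log event over UDP syslog might look like this (the destination address is a placeholder, and the message formatting is deliberately simplified):

    import base64
    import gzip
    import json
    import socket

    SYSLOG_HOST, SYSLOG_PORT = 'syslog.example.com', 514  # placeholder destination

    def handler(event, context):
        # CloudWatch Logs delivers the batch as base64-encoded, gzipped JSON.
        payload = json.loads(gzip.decompress(base64.b64decode(event['awslogs']['data'])))

        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        for log_event in payload['logEvents']:
            # <134> = facility local0, severity info.
            message = '<134>%s: %s' % (payload['logGroup'], log_event['message'])
            sock.sendto(message.encode('utf-8'), (SYSLOG_HOST, SYSLOG_PORT))
        sock.close()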