Flume - Log management using HDFS

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple, extensible data model that allows for online analytic applications.

http://flume.apache.org/
https://github.com/apache/flume

Related Projects

Fluentd - Data collector, Log Everything in JSON

  •    Ruby

Fluentd is an event collector system. It is a generalized version of syslogd that handles JSON objects as its log messages. It collects logs from various data sources and writes them to files, databases, or other types of storage.

nxlog - Multi platform Log management

  •    C

nxlog is a modular, multi-threaded, high-performance log management solution with multi-platform support. In concept it is similar to syslog-ng or rsyslog, but it is not limited to Unix/syslog. It can collect logs from files in various formats and receive logs remotely over the network via UDP, TCP, or TLS/SSL. It supports platform-specific sources such as the Windows Eventlog, Linux kernel logs, Android logs, and local syslog.

Graylog2 - Open Source Log Management

  •    Java

Graylog2 is an open source log management solution that stores your logs in ElasticSearch. It consists of a server written in Java that accepts your syslog messages via TCP, UDP, or AMQP and stores them in the database. The second part is a web interface that allows you to manage the log messages from your web browser. Take a look at the screenshots or the latest release info page to get a feeling for what you can do with Graylog2.

Epylog - a Syslog parser

  •    Python

Epylog is a syslog parser which runs periodically, looks at your logs, processes some of the entries in order to present them in a more comprehensible format, and then mails you the output. It is written specifically for large network clusters where a lot of machines (around 50 and upwards) log to the same loghost using syslog or syslog-ng.

oklog - A distributed and coördination-free log management system

  •    Go

I hoped to find the opportunity to continue developing OK Log after the spike of its creation. Unfortunately, despite effort, no such opportunity presented itself. Please look at OK Log for inspiration, and consider using the (maintained!) projects that came from it, ulid and run. OK Log is a distributed and coördination-free log management system for big ol' clusters. It's an on-prem solution that's designed to be a sort of building block: easy to understand, easy to operate, and easy to extend.


Octopussy - Perl/XML Logs Analyzer, Alerter & Reporter

  •    Perl

Octopussy is a log analyzer. It analyzes logs, generates reports, and alerts the administrator. It supports LDAP for maintaining the user list, exports reports by email, FTP, and SCP, and can generate reports on a schedule. It uses RRDtool to generate graphs.

GoAccess - Real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser

  •    C

GoAccess is an open source real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through your browser. It provides fast and valuable HTTP statistics for system administrators who require a visual server report on the fly. It supports nearly all web log formats (Apache, Nginx, Amazon S3, Elastic Load Balancing, CloudFront, etc.).

Kafka - A high-throughput distributed messaging system

  •    Java

Kafka provides a publish-subscribe solution that can handle all activity stream data and processing on a consumer-scale web site. This kind of activity (page views, searches, and other user actions) is a key ingredient in many of the social features on the modern web. Because of the throughput requirements, this data is typically handled by "logging" and ad hoc log aggregation solutions. This kind of ad hoc solution is a viable way of providing logging data to Hadoop.

syslog-ng - syslog-ng is an enhanced log daemon, supporting a wide range of input and output methods: syslog, unstructured text, queueing, SQL & NoSQL

  •    C

syslog-ng is an enhanced log daemon, supporting a wide range of input and output methods: syslog, unstructured text, message queues, databases (SQL and NoSQL alike), and more. For a brief introduction to configuring the syslog-ng application, see the quickstart guide.

Apache NiFi - An easy to use, powerful, and reliable system to process and distribute data

  •    Java

Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. Data flow can be tracked and modified at run time. It automates the movement of data between disparate data sources and systems, making data ingestion fast, easy and secure. The project was created by the United States National Security Agency (NSA).

nxlog

  •    C

A multi-platform universal log collector and forwarder

Chainsaw - log viewer and analysis tool

  •    Java

Chainsaw is a companion application to Log4j written by members of the Log4j development community. Chainsaw can read log files formatted in Log4j's XMLLayout, receive events from remote locations, and read events from a database. It can even work with JDK 1.4 logging events.

Kong - The Microservice API Gateway

  •    Lua

Kong is a cloud-native, fast, scalable, and distributed Microservice Abstraction Layer (also known as an API Gateway, API Middleware, or in some cases Service Mesh). Backed by the battle-tested NGINX with a focus on high performance, Kong was made available as an open-source platform in 2015. Under active development, Kong is used in production at thousands of organizations, from startups to Global 5000 companies and government organizations.

Zenoss - Open Source IT Management

  •    Python

Zenoss Core is an open source IT monitoring product that delivers the functionality to effectively manage the configuration, health, and performance of networks, servers, and applications through a single, integrated software package.

Webalizer - fast web server log file analysis

  •    C

The Webalizer is a fast web server log file analysis program. It produces highly detailed, easily configurable usage reports in HTML format, for viewing with a standard web browser. It handles standard Common logfile format (CLF) server logs, several variations of the NCSA Combined logfile format, wu-ftpd/proftpd xferlog (FTP) format logs, Squid proxy server native format, and W3C Extended log formats.

Logsandra - log management using Cassandra

  •    Python

Logsandra is a log management application written in Python that uses Cassandra as its back end. It was written as a demo for Cassandra, but it is worth a look. It supports creating your own parsers.

flume - WE HAVE MOVED to Apache Incubator

  •    Java

WE HAVE MOVED to Apache Incubator. https://cwiki.apache.org/FLUME/ . Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.

X-Itools: Enterprise Collaboration

  •    Javascript

Enterprise Collaboration modules and strong Log Analysis modules

White-elephant - Hadoop log aggregator and dashboard

  •    Java

White Elephant is a Hadoop log aggregator and dashboard which enables visualization of Hadoop cluster utilization across users. The server is a JRuby web application. In a production environment it can be deployed to tomcat and reads aggregated usage data directly from Hadoop. This data is stored in an in-memory database provided by HyperSQL. Charting is provided by Rickshaw. This project is developed by LinkedIn.

liblogfaf - A library that logs messages using non-blocking UDP datagrams.

  •    C

liblogfaf (faf stands for fire-and-forget) is a dynamic library designed to be LD_PRELOAD-ed when starting a process that uses the openlog() and syslog() functions to send syslog messages. It overrides the logging functions so that log messages are sent as UDP datagrams instead of being written to /dev/log (which can block). This is useful for processes that call syslog() as part of their main execution flow and can therefore easily break when the /dev/log buffer fills up, for example when the process that is expected to read from it (usually a system syslog daemon such as rsyslog or syslog-ng) stops doing so. Please note that liblogfaf should not be used in an environment where reliable log message delivery is required.