dynamometer - A tool for scale and performance testing of HDFS with a specific focus on the NameNode

  •    Java

Dynamometer is a tool to performance-test Hadoop's HDFS NameNode. The intent is to provide a real-world environment by initializing the NameNode against a production file system image and replaying a production workload collected from, e.g., the NameNode's audit logs. This allows replaying a workload that is not merely similar in character to the one experienced in production, but actually identical to it. Dynamometer launches a YARN application which starts a single NameNode and a configurable number of DataNodes, simulating an entire HDFS cluster as a single application. A separate workload job, run as MapReduce, accepts audit logs as input and uses the information contained within to submit matching requests to the NameNode, inducing load on the service.

https://github.com/linkedin/dynamometer
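
To make the input format concrete, here is a minimal sketch in Java of splitting a standard FSNamesystem audit-log line into its key=value fields, the kind of record a replay job maps onto NameNode RPCs. The class and the sample line are illustrative only; Dynamometer ships its own parser.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: parse a standard HDFS audit-log line into key=value fields.
// Dynamometer's real parser lives in the repo; this only shows the log's shape.
public class AuditLineSketch {
    public static void main(String[] args) {
        String line = "2018-01-01 00:00:00,000 INFO FSNamesystem.audit: "
                + "allowed=true ugi=alice (auth:SIMPLE) ip=/10.0.0.1 "
                + "cmd=open src=/data/part-0000 dst=null perm=null proto=rpc";
        Map<String, String> fields = new HashMap<>();
        // Everything after "audit: " is whitespace-separated key=value tokens
        // (the "(auth:SIMPLE)" token has no '=' and is skipped here).
        String payload = line.substring(line.indexOf("audit: ") + 7);
        for (String token : payload.split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) fields.put(token.substring(0, eq), token.substring(eq + 1));
        }
        // A replay job would map cmd/src onto a matching NameNode RPC.
        System.out.println(fields.get("cmd") + " " + fields.get("src"));
    }
}
```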

Related Projects

snakebite - A pure Python HDFS client

  •    Python

Snakebite is a Python library that provides a pure Python HDFS client and a wrapper around Hadoop's minicluster. The client uses protobuf to communicate with the NameNode and comes in the form of a library and a command-line interface. Currently, the snakebite client supports most actions that involve the NameNode and reading data from DataNodes. Note: all methods that read data from a DataNode are able to check the CRC during transfer, but this is disabled by default for performance reasons. This is the opposite of the stock Hadoop client's behaviour.

Apache Tez - A framework for YARN-based data processing applications in Hadoop

  •    Java

Apache Tez is an extensible framework for building high-performance batch and interactive data processing applications, coordinated by YARN in Apache Hadoop. Tez improves the MapReduce paradigm by dramatically improving its speed, while maintaining MapReduce's ability to scale to petabytes of data. Important Hadoop ecosystem projects like Apache Hive and Apache Pig use Apache Tez, as do a growing number of third-party data access applications developed for the broader Hadoop ecosystem.
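
To give a feel for the programming model, here is a sketch of building and submitting a two-vertex Tez DAG with a shuffle-style edge. The processor class names are hypothetical placeholders; verify the builder calls against the Tez release you use.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.tez.client.TezClient;
import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.Edge;
import org.apache.tez.dag.api.ProcessorDescriptor;
import org.apache.tez.dag.api.TezConfiguration;
import org.apache.tez.dag.api.Vertex;
import org.apache.tez.runtime.library.conf.OrderedPartitionedKVEdgeConfig;
import org.apache.tez.runtime.library.partitioner.HashPartitioner;

public class TezDagSketch {
    public static void main(String[] args) throws Exception {
        TezConfiguration tezConf = new TezConfiguration();
        TezClient client = TezClient.create("dag-sketch", tezConf);
        client.start();

        // example.TokenProcessor and example.SumProcessor are hypothetical
        // processor implementations; plug in your own class names.
        Vertex tokenize = Vertex.create("Tokenize",
                ProcessorDescriptor.create("example.TokenProcessor"));
        Vertex sum = Vertex.create("Sum",
                ProcessorDescriptor.create("example.SumProcessor"));

        // A shuffle-style edge: partitioned, sorted key-value movement.
        OrderedPartitionedKVEdgeConfig edgeConf = OrderedPartitionedKVEdgeConfig
                .newBuilder(Text.class.getName(), IntWritable.class.getName(),
                        HashPartitioner.class.getName())
                .setFromConfiguration(tezConf)
                .build();

        DAG dag = DAG.create("word-count-style");
        dag.addVertex(tokenize).addVertex(sum)
           .addEdge(Edge.create(tokenize, sum, edgeConf.createDefaultEdgeProperty()));

        client.submitDAG(dag).waitForCompletion();
        client.stop();
    }
}
```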

hadoop-hdfs - Mirror of Apache Hadoop HDFS

  •    Java

Mirror of Apache Hadoop HDFS
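
As a reminder of what the HDFS client API looks like to applications, a minimal write-then-read sketch using the standard Hadoop FileSystem API (the NameNode URI and path are placeholders):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode URI is a placeholder; usually picked up from core-site.xml.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        Path path = new Path("/tmp/hello.txt");
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.writeUTF("hello, hdfs");
        }
        try (FSDataInputStream in = fs.open(path)) {
            System.out.println(in.readUTF());
        }
    }
}
```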

apex-core - Mirror of Apache Apex core

  •    Java

Apache Apex is a unified platform for big data stream and batch processing. Use cases include ingestion, ETL, real-time analytics, alerts, and real-time actions. Apex is a Hadoop-native YARN implementation and uses HDFS by default. It simplifies the development and productization of Hadoop applications by reducing time to market. Key features include enterprise-grade operability with fault tolerance, state management, event-processing guarantees, no data loss, in-memory performance and scalability, and native window support. See the project documentation for details.
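
An Apex application implements StreamingApplication and populates a DAG of operators. A self-contained toy sketch, with two trivial operators defined inline (real applications would typically use operators from the Malhar library):

```java
import org.apache.hadoop.conf.Configuration;
import com.datatorrent.api.DAG;
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.InputOperator;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.common.util.BaseOperator;

public class CounterApp implements StreamingApplication {

    // A toy input operator: emits an increasing counter.
    public static class NumberSource extends BaseOperator implements InputOperator {
        public final transient DefaultOutputPort<Integer> output = new DefaultOutputPort<>();
        private int n;
        @Override
        public void emitTuples() {
            output.emit(n++); // real sources would throttle / batch
        }
    }

    // A toy sink operator: prints whatever arrives on its input port.
    public static class Printer extends BaseOperator {
        public final transient DefaultInputPort<Integer> input = new DefaultInputPort<Integer>() {
            @Override
            public void process(Integer tuple) {
                System.out.println(tuple);
            }
        };
    }

    @Override
    public void populateDAG(DAG dag, Configuration conf) {
        NumberSource source = dag.addOperator("source", new NumberSource());
        Printer sink = dag.addOperator("sink", new Printer());
        dag.addStream("numbers", source.output, sink.input);
    }
}
```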

Ambari - Monitor Hadoop Cluster

  •    Java

The Apache Ambari project is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs. The set of Hadoop components currently supported by Ambari includes HDFS, MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop.
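
Because the web UI sits on top of a REST API, clusters can also be inspected programmatically. A minimal sketch that lists clusters over HTTP (host, port, and credentials are placeholders; /api/v1/clusters is the documented entry point):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class AmbariListClusters {
    public static void main(String[] args) throws Exception {
        // Host, port, and credentials are placeholders; 8080 is Ambari's default.
        URL url = new URL("http://ambari-host:8080/api/v1/clusters");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        String creds = Base64.getEncoder().encodeToString("admin:admin".getBytes());
        conn.setRequestProperty("Authorization", "Basic " + creds);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            in.lines().forEach(System.out::println); // JSON list of clusters
        }
    }
}
```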


dr-elephant - Performance monitoring and tuning tool for Apache Hadoop

  •    Java

Dr. Elephant is a performance monitoring and tuning tool for Hadoop and Spark. It automatically gathers all the metrics, runs analysis on them, and presents them in a simple way for easy consumption. Its goal is to improve developer productivity and increase cluster efficiency by making it easier to tune jobs. It analyzes Hadoop and Spark jobs using a set of pluggable, configurable, rule-based heuristics that provide insights into how a job performed, and then uses the results to make suggestions about how to tune the job so it performs more efficiently. For more information on Dr. Elephant, see the project's wiki pages.

spring-hadoop - Spring for Apache Hadoop is a framework for application developers to take advantage of the features of both Hadoop and Spring

  •    Java

The Spring for Apache Hadoop project provides extensions to Spring, Spring Batch, and Spring Integration for building manageable and robust pipeline solutions around Hadoop. Spring for Apache Hadoop extends Spring Batch with support for reading from and writing to HDFS, running various types of Hadoop jobs (Java MapReduce, Streaming, Hive, Spark, Pig), and using HBase. An important goal is to provide excellent support so that non-Java developers can be productive with Spring for Apache Hadoop without having to write any Java code to use the core feature set.
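
To illustrate the kind of wiring the project streamlines, here is a plain-Spring sketch that exposes an HDFS FileSystem as an injectable bean. Note this uses generic Spring plus the Hadoop client API, not Spring for Apache Hadoop's own configuration builders, and the NameNode URI is a placeholder:

```java
import java.net.URI;
import org.apache.hadoop.fs.FileSystem;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HadoopBeans {
    // Expose HDFS as a Spring-managed bean so jobs and services can inject it.
    // Spring for Apache Hadoop offers richer, declarative equivalents of this.
    @Bean(destroyMethod = "close")
    public FileSystem hdfs() throws Exception {
        org.apache.hadoop.conf.Configuration hadoopConf =
                new org.apache.hadoop.conf.Configuration();
        return FileSystem.get(URI.create("hdfs://namenode:8020"), hadoopConf);
    }
}
```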

serverless-artillery - Combine serverless with artillery and you get serverless-artillery for instant, cheap, and easy performance testing at scale

  •    JavaScript

Combine serverless with artillery and you get serverless-artillery, for instant, cheap, and easy performance testing at scale. We were motivated to create this project to make it easier to move performance testing earlier and more frequently into CI/CD pipelines, so that the question wasn't whether you would automatically (acceptance and) performance test your system on every check-in, but why you wouldn't.

Hadoop Common

  •    Java

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. Hadoop Common supports the other Hadoop subprojects.

Hue - The open source Apache Hadoop UI

  •    Java

Hue is a web application for interacting with Apache Hadoop. It supports a File Browser for accessing HDFS, a Job Browser for accessing MapReduce jobs (MR1/MR2-YARN), a Job Designer for creating MapReduce/Streaming/Java jobs, an HBase Browser for exploring and modifying HBase tables and data, an Oozie app for submitting and scheduling workflows and bundles, Pig/HBase/Sqoop2 shells, a Beeswax application for executing Hive queries, and a Search app for querying Solr and Solr Cloud.

mr4c

  •    Java

MR4C is an implementation framework that allows you to run native code within the Hadoop execution framework. Pairing the performance and flexibility of natively developed algorithms with the unfettered scalability and throughput inherent in Hadoop, MR4C enables large-scale deployment of advanced data processing applications. If you get stuck, have questions, or would like to provide feedback, please don't hesitate to contact us at mr4c@googlegroups.com. Let's do big things together.

minos - A distributed deployment and monitoring system for Hadoop and beyond

  •    Python

Minos is a distributed deployment and monitoring system. It was initially developed and used at Xiaomi to deploy and manage the Hadoop, HBase, and ZooKeeper clusters used in the company, and it can easily be extended to support other systems; HDFS, YARN, and Impala are supported in the current release. Its command-line client is used to deploy and manage the processes of these systems: you can use it to perform deployment tasks such as installing, (re)starting, or stopping a service. See the project's client documentation to learn how to use it.

Apache Trafodion - Webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop.

  •    C++

Apache Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop. Trafodion builds on the scalability, elasticity, and flexibility of Hadoop. Trafodion extends Hadoop to provide guaranteed transactional integrity, enabling new kinds of big data applications to run on Hadoop.

Apache Tajo - A big data warehouse system on Hadoop

  •    Java

Apache Tajo is a robust big data relational and distributed data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources.
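
Tajo is commonly queried over its JDBC driver. A minimal sketch, with the host and table as placeholders (check the driver class and port against your Tajo release):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TajoQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.tajo.jdbc.TajoDriver"); // Tajo's JDBC driver
        String url = "jdbc:tajo://tajo-master:26002/default"; // placeholder host
        try (Connection conn = DriverManager.getConnection(url);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT l_returnflag, SUM(l_quantity) FROM lineitem GROUP BY l_returnflag")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
            }
        }
    }
}
```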

gis-tools-for-hadoop - The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data


The GIS Tools for Hadoop are a collection of GIS tools that leverage the Spatial Framework for Hadoop for spatial analysis of big data. The tools make use of the Geoprocessing Tools for Hadoop toolbox to provide access to the Hadoop system from the ArcGIS geoprocessing environment. Start out by navigating to the samples and following the instructions provided with each one. There are also tutorials for using the GP tools and the aggregation methods.

HBase - Hadoop database

  •    Java

HBase provides support for handling Bigtable-scale data: billions of rows × millions of columns. It is a scalable, distributed, versioned, column-oriented store modeled after Google's Bigtable that runs on top of HDFS (the Hadoop Distributed Filesystem). It features compression and in-memory operation on a per-column-family basis, and data can be replicated between nodes. HBase is used at Facebook and Twitter.
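
A short sketch of the HBase Java client API, writing and reading back a single cell; the table, column family, and qualifier names are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("metrics"))) { // placeholder table
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("count"), Bytes.toBytes(42L));
            table.put(put);

            Result result = table.get(new Get(Bytes.toBytes("row1")));
            long count = Bytes.toLong(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("count")));
            System.out.println("count = " + count);
        }
    }
}
```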

Pylot - Performance & Scalability Testing of Web Services

  •    Python

Pylot is a free open source tool for testing the performance and scalability of web services. It runs HTTP load tests, which are useful for capacity planning, benchmarking, analysis, and system tuning. Pylot generates concurrent load (HTTP requests), verifies server responses, and produces reports with metrics. Test suites are executed and monitored from a GUI or shell/console.

VoltDB - Fast Scalable SQL DBMS with ACID

  •    Java

VoltDB was specifically designed for contemporary software applications that are pushed beyond their limits by high-volume data sources. VoltDB provides the ability to capture, store, and process incoming data at millions of read/write operations per second, and its relational model opens that data to real-time analysis with familiar business-intelligence tools, to identify data patterns and trends, spot anomalies, or perform tracking and alerting.
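
VoltDB clients call stored procedures (or ad hoc SQL via the built-in @AdHoc procedure) through its Java client library. A minimal sketch, with the host and table as placeholders:

```java
import org.voltdb.VoltTable;
import org.voltdb.client.Client;
import org.voltdb.client.ClientFactory;
import org.voltdb.client.ClientResponse;

public class VoltCount {
    public static void main(String[] args) throws Exception {
        Client client = ClientFactory.createClient();
        client.createConnection("localhost"); // default client port 21212
        // "@AdHoc" is VoltDB's built-in procedure for ad hoc SQL;
        // the votes table is a placeholder.
        ClientResponse resp = client.callProcedure("@AdHoc",
                "SELECT COUNT(*) FROM votes");
        VoltTable result = resp.getResults()[0];
        System.out.println("rows: " + result.asScalarLong());
        client.close();
    }
}
```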

JMeter - Load and Performance tester

  •    Java

JMeter is a pure Java desktop application designed to load-test functional behavior and measure performance. It may be used to test performance of both static and dynamic resources (files, servlets, Perl scripts, Java objects, databases and queries, FTP servers, and more). It can be used to simulate a heavy load on a server, network, or object, to test its strength or to analyze overall performance under different load types.
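
For intuition about what tools like JMeter (and Pylot above) do under the hood, here is a bare-bones concurrent HTTP load loop in plain Java. This illustrates the concept only, not JMeter's API; the target URL, user count, and request count are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class MiniLoadLoop {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(URI.create("http://localhost:8080/")).build();
        int users = 20, requests = 1000; // "virtual users" and total samples
        ExecutorService pool = Executors.newFixedThreadPool(users);
        AtomicLong okCount = new AtomicLong(), totalMs = new AtomicLong();
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                long start = System.nanoTime();
                try {
                    http.send(req, HttpResponse.BodyHandlers.discarding());
                    okCount.incrementAndGet();
                    totalMs.addAndGet((System.nanoTime() - start) / 1_000_000);
                } catch (Exception e) {
                    // A real tool would record this as a failed sample.
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        pool.shutdown();
        System.out.printf("ok=%d avg=%dms%n", okCount.get(),
                okCount.get() == 0 ? 0 : totalMs.get() / okCount.get());
    }
}
```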




