genie - Distributed Big Data Orchestration Service


Genie is a federated job orchestration engine developed by Netflix. Genie provides RESTful APIs to run a variety of big data jobs such as Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them. See the official website to find documentation about Genie and specific documentation for various releases.



Related Projects

spring-beer-sample - Spring Data Couchbase & Spring Boot Sample Application

This is a simple demo application which shows how to use Spring Data Couchbase and Spring Boot together against the beer-sample Couchbase sample dataset. Start the application either by deploying it or by running the example.Boot#main() method in your IDE. Navigate the browser to /beer to get started. Change the DatabaseConfig configuration file according to your environment.

handlebars-spring-boot-starter - Spring Boot auto-configuration for Handlebars

Spring Boot Starter support for Handlebars (logic-less templates). Spring Boot Starter Handlebars will automatically register Handlebars helpers based on project dependencies. Add any Handlebars helper to your dependencies and you can start using it.

fuel-web - Fuel UI

fuel-web (nailgun) implements a REST API and deployment data management. It manages disk volume configuration data, network configuration data and any other environment-specific data necessary for a successful deployment. It has the required orchestration logic to build instructions for provisioning and deployment in the right order. Nailgun uses an SQL database to store its data and an AMQP service to interact with workers.

bistro - Bistro is a flexible distributed scheduler, a high-performance framework supporting multiple paradigms while retaining ease of configuration, management, and monitoring

This README is a very abbreviated introduction to Bistro. Visit the project website for a more structured introduction and for the docs. Bistro is a toolkit for making distributed computation systems. It can schedule and run distributed tasks, including data-parallel jobs. It enforces resource constraints for worker hosts and data-access bottlenecks. It supports remote worker pools, low-latency batch scheduling, dynamic shards, and a variety of other possibilities. It has command-line and web UIs.

go-springcloud - Go support for Spring Cloud (Configuration, Discovery/Netflix Eureka & More)

Go support for Spring Cloud (Configuration, Discovery/Netflix Eureka & More)

serf - Service orchestration and management tool

Serf is a decentralized solution for service discovery and orchestration that is lightweight, highly available, and fault tolerant. Serf runs on Linux, Mac OS X, and Windows. An efficient and lightweight gossip protocol is used to communicate with other nodes. Serf can detect node failures and notify the rest of the cluster. An event system is built on top of Serf, letting you use Serf's gossip protocol to propagate events such as deploys, configuration changes, etc. Serf is completely masterless with no single point of failure.
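To illustrate how a gossip protocol can spread an event through a cluster with no master, here is a minimal simulation sketch. It is not Serf's actual protocol or API: the function name `gossip_rounds`, the `fanout` parameter, and the simple push model are all assumptions made for illustration.

```python
import random


def gossip_rounds(num_nodes, fanout=3, seed=0):
    """Simulate push-style gossip (hypothetical model, not Serf's implementation).

    Returns the number of rounds until every node has received the event.
    """
    rng = random.Random(seed)
    informed = {0}  # node 0 originates the event
    rounds = 0
    while len(informed) < num_nodes:
        newly_informed = set()
        for _node in informed:
            # Each informed node forwards the event to `fanout` random peers.
            peers = rng.sample(range(num_nodes), k=min(fanout, num_nodes))
            newly_informed.update(peers)
        informed |= newly_informed
        rounds += 1
    return rounds


if __name__ == "__main__":
    print("rounds to reach 100 nodes:", gossip_rounds(100))
```

Because every informed node keeps forwarding the event, the informed set grows roughly geometrically, which is why gossip tends to reach a whole cluster in few rounds without any central coordinator.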

solar - Resource management and orchestration engine for distributed systems

Solar is a resource manager and orchestration engine for distributed systems.

CFILE (Configuration File Management)

CFILE is a distributed configuration file management system. Unlike many systems that try to provide a nice way to configure individual machines, CFILE provides a way to keep configuration consistent across an entire group of machines.

VoltDB - Fast Scalable SQL DBMS with ACID

VoltDB was specifically designed for contemporary software applications that are pushed beyond their limits by high volume data sources. VoltDB provides the ability to capture, store and process incoming data at millions of read/write operations per second. And VoltDB’s relational model opens that data to be analyzed in real-time, using familiar Business Intelligence tools, to identify data patterns and trends, spot anomalies, or perform tracking and alerting.

spring-boot-etcd - An etcd client for accessing and storing configuration values in an etcd cluster

Spring-boot-etcd is a Spring Boot library that provides an etcd client to access and manage key-value pairs stored in an etcd cluster. It's useful out-of-the-box. We created zalando-boot-etcd after discovering that current implementations of the etcd client API don’t provide an automatic update mechanism, or use other libraries like Netty to communicate with the etcd cluster. Also, these implementations are not compatible with the version of Netty used by the Cassandra driver.

iep - Insight Engineering Platform Components

A set of base libraries used primarily by the Insight Engineering team at Netflix to support applications that need to run internally and externally. Over time some of this functionality was extracted into standalone libraries, many of which have been open sourced as NetflixOSS Common Runtime Services and Libraries. Examples are Archaius (configuration), Eureka (discovery service), Karyon (base server), Ribbon (Eureka-aware HTTP client), Governator (dependency injection), and blitz4j (logging).

RocketMQ - Distributed messaging and streaming data platform

Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.

HPCC System - Hadoop alternative

HPCC is a proven and battle-tested platform for manipulating, transforming, querying and data warehousing Big Data. It supports two types of configuration. Thor is responsible for consuming vast amounts of data and for transforming, linking and indexing that data. It functions as a distributed file system with parallel processing power spread across the nodes. Roxie, the Data Delivery Engine, provides separate high-performance online query processing and data warehouse capabilities.

etcd - Distributed reliable key-value store for the most critical data of a distributed system

etcd is a distributed, consistent key-value store for shared configuration and service discovery. It is simple, secure, fast and reliable. It uses the Raft consensus algorithm to manage a highly-available replicated log.

optimus - A framework for building your own configuration management tool

There are Puppet, Chef, SaltStack and Ansible. These are good products, so why create another configuration management tool? Sure, all of them have minor downsides, but that alone is not enough to justify reinventing the wheel. But there is one property that they all share: they are tools that you feed with configuration files and scripts. They are big machines with thousands of knobs and switches. Imagine a factory intended for building every model of car that is available nowadays. How huge and complex would it be? Could you really build ALL models? How many robotic arms would hang useless because they are not required for the current model? Wouldn't it be better to just pick the robotic arms that you need and combine them into a relatively small and effective factory? That's the idea behind Optimus. It is not a tool, it is a framework for building your own configuration management tool. Proof of concept.

Doozer - A consistent distributed data store

Doozer is a highly-available, completely consistent store for small amounts of extremely important data. When the data changes, it can notify connected clients immediately (no polling), making it ideal for infrequently-updated data for which clients want real-time updates. Doozer is good for name service, database master elections, and configuration data shared between several machines.
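The push-notification idea described above (clients are told about changes instead of polling) can be sketched with a tiny in-memory store. This is only an illustration under stated assumptions: the class `WatchableStore` and its `watch`, `set`, and `get` methods are hypothetical names, not Doozer's API, and the sketch is single-process with no replication or consensus.

```python
class WatchableStore:
    """Hypothetical in-memory key-value store that notifies watchers on change."""

    def __init__(self):
        self._data = {}
        self._rev = 0          # monotonically increasing revision counter
        self._watchers = {}    # key -> list of callbacks

    def watch(self, key, callback):
        # Register a callback invoked as callback(key, value, rev) on each change.
        self._watchers.setdefault(key, []).append(callback)

    def set(self, key, value):
        # Store the value, bump the revision, then push the change to watchers.
        self._rev += 1
        self._data[key] = value
        for callback in self._watchers.get(key, ()):
            callback(key, value, self._rev)
        return self._rev

    def get(self, key):
        return self._data.get(key)


if __name__ == "__main__":
    store = WatchableStore()
    store.watch("/db/master", lambda k, v, rev: print(f"{k} -> {v} (rev {rev})"))
    store.set("/db/master", "host-a")
    store.set("/db/master", "host-b")
```

The revision counter gives watchers a total order over changes, which is the kind of property a master-election or shared-configuration client would typically rely on.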

bigdata - Introduction to Big Data

Download the book in PDF or EPUB. Just like the Internet, Big Data is part of our lives today. From search, online shopping and video on demand to e-dating, Big Data always plays an important role behind the scenes. Some people claim that the Internet of Things (IoT) will take over from Big Data as the most hyped technology @Gartner2014. That may come true. But IoT cannot come alive without Big Data. In this book, we will dive deeply into Big Data technologies. But first we need to understand what Big Data is.

ConcourseDB - Self-tuning database designed for both transactions and ad hoc analytics across time

ConcourseDB is a distributed self-tuning database with automatic indexing, version control and ACID transactions. ConcourseDB provides a more intuitive approach to data management that is easy to deploy, access and scale while maintaining the strong consistency of traditional database systems.

razor-server - Razor is next generation provisioning software that handles bare metal hardware and virtual server provisioning

Razor is an advanced provisioning application which can deploy both bare-metal and virtual systems. It's aimed at solving the problem of how to bring new metal into a state where your existing DevOps/configuration management workflows can take it over. Newly added machines in a Razor deployment will PXE-boot from a special Razor Microkernel image, then check in, provide Razor with inventory information, and wait for further instructions. Razor will consult user-created policy rules to choose which tasks to apply to a new node, which will begin to follow the task directions, giving feedback to Razor as it completes various steps. Tasks can include steps for handoff to a DevOps system such as Puppet or to any other system capable of controlling the node (such as a vCenter server taking possession of ESX systems).

ansible - Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy

Ansible is a radically simple IT automation system. It handles configuration management, application deployment, cloud provisioning, ad-hoc task execution, and multinode orchestration - including trivializing things like zero-downtime rolling updates with load balancers. Many users run straight from the development branch (it's generally fine to do so), but you might also wish to consume a release.