Displaying 1 to 20 from 33 results

awesome-scalability - Scalable, Available, Stable, Performant, and Intelligent System Design Patterns


An updated and curated list of readings to illustrate best practices and patterns in building scalable, available, stable, performant, and intelligent large-scale systems. Concepts are explained in the articles of prominent engineers and credible references. Case studies are taken from battle-tested systems that serve millions to billions of users. Understand whether you have a scalability problem (fast for a single user but slow under heavy load) or a performance problem (slow for a single user) by reviewing some design principles and checking how scalability and performance problems are solved at tech companies. The intelligence section is written for those who work with data and machine learning at big (data) and deep (learning) scale.

school-of-sre - At LinkedIn, we are using this curriculum for onboarding our entry-level talents into the SRE role

  •    HTML

Site Reliability Engineers (SREs) sit at the intersection of software engineering and systems engineering. While there are potentially infinite permutations and combinations of how infrastructure and software components can be put together to achieve an objective, focusing on foundational skills allows SREs to work with complex systems and software, regardless of whether these systems are proprietary, 3rd party, open systems, run on cloud/on-prem infrastructure, etc. It is particularly important to gain a deep understanding of how these areas of systems and infrastructure relate to and interact with each other. The combination of software and systems engineering skills is rare and is generally built over time with exposure to a wide variety of infrastructure, systems, and software. SREs bring in engineering practices to keep the site up. Each distributed system is an agglomeration of many components. SREs validate business requirements, convert them to SLAs for each of the components that constitute the distributed system, monitor and measure adherence to SLAs, re-architect or scale out to mitigate or avoid SLA breaches, and add these learnings as feedback to new systems or projects, thereby reducing operational toil. Hence SREs play a vital role right from the day 0 design of the system.

runbook - A framework for gradual system automation

  •    Ruby

See our blog post for the philosophy behind Runbook and an overview of its features. Runbook provides a DSL for specifying a series of steps to execute an operation. Once your runbook is specified, you can use it to generate a formatted representation of the book or to execute the runbook interactively. For example, you can export your runbook to markdown or use the same runbook to execute commands on remote servers.




cloud-ops-sandbox - Cloud Operations Sandbox is an open source tool that helps practitioners to learn Service Reliability Engineering practices from Google and apply them on their cloud services using Cloud Operations suite of tools

  •    HTML

Cloud Operations Sandbox is an open-source tool that helps practitioners learn Service Reliability Engineering practices from Google and apply them on their cloud services using Cloud Operations (formerly Stackdriver). It is based on Hipster Shop, a cloud-native microservices application. Google Cloud Operations Suite is a suite of tools that helps you gain full observability of your code and applications. You might want to take Cloud Operations for a "test drive" in order to answer the question, "will it work for my application needs?" The most effective way to learn is by testing the tool in "real-life" conditions, but without risking a production system. With Sandbox, we provide a tool that automatically provisions a new demo cluster, which receives traffic simulating real users. Practitioners can experiment with various Cloud Operations tools to solve problems and accomplish standard SRE tasks in a sandboxed environment.

marmot - Marmot workflow execution engine

  •    Go

Marmot is a service for processing workflows targeting DevOps/SRE needs. NOTICE: This product is still in development and is not production ready.

docker-base-images - Base docker images for Apiary applications

  •    Python

This repository helps keep the Apiary environment, services and libraries consistent and in sync. Each directory has an area of responsibility, and the libraries should derive from the respective image.

heroku-datadog-drain-golang - Funnel metrics from multiple Heroku apps into DataDog using statsd.

  •    Go

Funnel metrics from multiple Heroku apps into DataDog using statsd. OPTIONAL: Set up Heroku build packs, including the Datadog DogStatsD client. If you already have a StatsD client running, see the STATSD_URL configuration option below.


s3-streaming-upload - s3-streaming-upload is a node.js library that listens to your stream and uploads its data to Amazon S3

  •    CoffeeScript

s3-streaming-upload is a node.js library that listens to your stream and uploads its data to Amazon S3. It is heavily inspired by knox-mpu, but unlike knox-mpu it does not buffer data to disk and is built on top of the official AWS SDK instead of knox.
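The idea of uploading a stream without buffering it to disk can be illustrated with a small sketch. This is not the library's code (the library is written for node.js); it is a minimal Python illustration that assumes a multipart-style upload where the stream is cut into fixed-size parts held in memory one at a time. The name `iter_parts` is hypothetical, and the 5 MiB default reflects the S3 multipart minimum size for every part except the last. No AWS calls are made here.

```python
import io

# S3 multipart uploads require every part except the last to be at least 5 MiB.
MIN_PART_SIZE = 5 * 1024 * 1024


def iter_parts(stream, part_size=MIN_PART_SIZE):
    """Yield (part_number, data) tuples from a readable binary stream.

    Hypothetical helper: only one part is held in memory at a time,
    so nothing is written to disk.
    """
    part_number = 1
    buffer = bytearray()
    while True:
        # read() may return fewer bytes than requested, so accumulate.
        chunk = stream.read(part_size - len(buffer))
        if not chunk:
            break
        buffer.extend(chunk)
        if len(buffer) >= part_size:
            yield part_number, bytes(buffer)
            part_number += 1
            buffer.clear()
    if buffer:
        # The final part is allowed to be smaller than part_size.
        yield part_number, bytes(buffer)


if __name__ == "__main__":
    # Tiny part size purely for demonstration.
    for number, data in iter_parts(io.BytesIO(b"abcdefghij"), part_size=4):
        print(number, data)
```

In a real uploader each yielded part would be handed to the storage client's multipart upload call before the next part is read.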

kapo - Wrap any command in a status socket

  •    Go

Wrap any command in a status socket. When a program is executed under kapo, a JSON-speaking HTTP server is started and the state of the process is reported to whoever requests it.
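The general pattern of wrapping a command in a status socket can be sketched as follows. This is a rough Python illustration of the idea only, not kapo's implementation (kapo is written in Go); the function names `wrap` and `snapshot` and the JSON field names are assumptions made up for this example.

```python
import json
import subprocess
import sys
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def snapshot(proc, command, started_at):
    """Build a JSON-serialisable view of the wrapped process (hypothetical fields)."""
    exit_code = proc.poll()  # None while the child is still running
    return {
        "command": command,
        "pid": proc.pid,
        "running": exit_code is None,
        "exit_code": exit_code,
        "uptime_seconds": round(time.time() - started_at, 3),
    }


def wrap(command, host="127.0.0.1", port=0):
    """Start `command` and an HTTP server that reports its state as JSON.

    port=0 asks the operating system for any free port.
    """
    proc = subprocess.Popen(command)
    started_at = time.time()

    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = json.dumps(snapshot(proc, command, started_at)).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the example quiet

    server = HTTPServer((host, port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return proc, server


if __name__ == "__main__":
    child = [sys.executable, "-c", "import time; time.sleep(30)"]
    proc, server = wrap(child)
    print("status available on port", server.server_address[1])
    proc.wait()
    server.shutdown()
```

Any HTTP client can then poll the reported port to learn whether the wrapped command is still running and, once it has exited, its exit code.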

Sia-EventUI - User interface for SIA event timeline

  •    JavaScript

See the Root repository for full project information and an overview of how services fit together. SIA is configured for Webpack's hot module reloading, so changes should automatically appear in your browser.

provision - Digital Rebar Provision is a simple and powerful Golang executable that provides a complete API-driven DHCP/PXE/TFTP provisioning system

  •    Go

Simple, fast and open API-driven server provisioning. Digital Rebar Provision (DRP) is an APLv2-licensed Golang executable that provides a simple yet complete API-driven DHCP/PXE/TFTP provisioning and workflow system.

sre-interview-prep-guide - Site Reliability Engineer Interview Preparation Guide


This repository is an attempt to consolidate useful resources for Site Reliability Engineer (SRE) interview preparation.

gansoi - Awesome Infrastructure Monitoring and Alerting

  •    Go

Gansoi is a modern monitoring solution inspired by classical network monitoring software (Nagios and friends), updated to follow modern best practices. At the moment this software is unusable for anyone but early adopters and developers. It is a barely working proof of concept.

stackdriver-sandbox - Stackdriver Sandbox is an open source tool that helps practitioners to learn Service Reliability Engineering practices from Google and apply them on their cloud services using Stackdriver

  •    CSharp

Stackdriver Sandbox is an open-source tool that helps practitioners learn Service Reliability Engineering practices from Google and apply them on their cloud services using Stackdriver. It is based on Hipster Shop, a cloud-native microservices application. Google Stackdriver is a suite of tools that helps you gain full observability of your code and applications. You might want to take Stackdriver for a "test drive" in order to answer the question, "will it work for my application needs?" The most effective way to learn is by testing the tool in "real-life" conditions, but without risking a production system. With Sandbox, we provide a tool that automatically provisions a new demo cluster, which receives traffic simulating real users. Practitioners can experiment with various Stackdriver tools to solve problems and accomplish standard SRE tasks in a sandboxed environment.

skinny - The Skinny Distributed Lock Service

  •    Go

Skinny comes with few code dependencies. A Skinny instance is started by running skinnyd, preferably with the --config option.

protected-s3 - Simple application that allows you to display the content of your S3 to authorised users only

  •    JavaScript

Simple application that allows you to display the content of your S3 to authorised users only.

ivy - A Node.js queue library focused on easy, yet flexible task execution.

  •    CoffeeScript

Ivy is a node.js queue library focused on easy, yet flexible task execution. There are a lot of parts and components in a distributed environment. This is how Ivy understands them.

prometheus-slo-burn-example - An end to end example of implementing SLOs with prometheus, grafana and Go

  •    Go

This is home to example code for exposing SLIs using open source code in Prometheus. Once the example is running, you can visit http://localhost:30431 and see the Grafana dashboard.
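For readers new to the term, the "burn" in SLO burn refers to how quickly a service consumes its error budget. The sketch below is not taken from the repository; it is a minimal Python illustration of the commonly used definition (observed error ratio divided by the error budget, which is one minus the SLO target). The function names and the sample numbers are assumptions for illustration.

```python
def error_budget(slo_target):
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(failed, total, slo_target):
    """How fast the error budget is being consumed.

    A value near 1.0 means the budget would last exactly the SLO window;
    higher values mean the budget runs out sooner.
    """
    if total == 0:
        return 0.0  # no traffic, nothing burned
    return (failed / total) / error_budget(slo_target)


def hours_until_exhausted(rate, window_hours):
    """Time until the full budget is spent if the current burn rate persists."""
    if rate <= 0:
        return float("inf")
    return window_hours / rate


if __name__ == "__main__":
    # Hypothetical numbers: 144 failures out of 10,000 requests, 99.9% SLO,
    # 30-day (720 hour) window.
    rate = burn_rate(failed=144, total=10_000, slo_target=0.999)
    print("burn rate:", round(rate, 2))
    print("hours left:", round(hours_until_exhausted(rate, 720), 1))
```

In a Prometheus setup the failed and total counts would typically come from counters exposed by the service, with the ratio computed over a time window in a query or recording rule.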

index-digest - Analyses your database queries and schema and suggests indices and schema improvements

  •    Python

Analyses your database queries and schema and suggests index improvements. You can use index-digest as your database linter. The goal is to provide the user with actionable reports instead of just a list of statistics and schema details. Inspired by Percona's pt-index-usage. This tool supports MySQL 5.5, 5.6, 5.7, 8.0 and MariaDB 10.0, 10.2 and runs under Python 2.7, 3.4, 3.5 and 3.6.
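As an illustration of the kind of actionable report a database linter can produce, the sketch below flags indexes whose columns are a leading prefix of another index on the same table. It is not index-digest's code or API; the function name `find_redundant_indexes` and the sample index names are hypothetical, and the check ignores details such as unique constraints that a real tool would need to consider.

```python
def find_redundant_indexes(indexes):
    """Return (redundant_index, covering_index) pairs.

    `indexes` maps an index name to its ordered tuple of columns.
    An index is reported when its columns form a leading prefix of
    another index, since the longer index can serve the same lookups.
    """
    redundant = []
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            if name == other:
                continue
            is_prefix = tuple(other_cols[:len(cols)]) == tuple(cols)
            # For exact duplicates, report only the later name so that
            # the two indexes do not flag each other.
            strictly_shorter = len(cols) < len(other_cols)
            if is_prefix and (strictly_shorter or name > other):
                redundant.append((name, other))
                break
    return redundant


if __name__ == "__main__":
    table_indexes = {
        "idx_user": ("user_id",),
        "idx_user_created": ("user_id", "created_at"),
        "idx_email": ("email",),
    }
    for dropped, kept in find_redundant_indexes(table_indexes):
        print(f"{dropped} is covered by {kept}")
```

A report in this form tells the user which index can likely be dropped and why, rather than only listing schema details.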





