common-disaster-recovery-scenarios - A list of common Disaster Recovery (DR) scenarios for software companies


This is a list of common Disaster Recovery scenarios for software companies. It is nearly impossible to cover every scenario that can happen, but the list includes common ones that can help companies kick-start their own set of policies.

https://github.com/dastergon/common-disaster-recovery-scenarios





Related Projects

Barman - Backup and Recovery manager for PostgreSQL

  •    Python

Barman (Backup and Recovery Manager) is an open source administration tool for disaster recovery of PostgreSQL servers. It allows your organisation to perform remote backups of multiple servers in business-critical environments and helps DBAs during the recovery phase. Its features include backup catalogues, incremental backup, retention policies, remote backup and recovery, and archiving and compression of WAL files and backups.

Relax-and-Recover

  •    Shell

Linux disaster recovery and system migration solution

awesome-chaos-engineering - A curated list of awesome Chaos Engineering resources.


A curated list of awesome Chaos Engineering resources. According to the Principles of Chaos Engineering website, Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.

Litmus - Cloud-Native Chaos Engineering

  •    Go

Litmus is a toolset for cloud-native chaos engineering. Litmus provides tools to orchestrate chaos on Kubernetes to help SREs find weaknesses in their deployments. SREs use Litmus to run chaos experiments, initially in the staging environment and eventually in production, to find bugs and vulnerabilities. Fixing the weaknesses leads to increased resilience of the system.


WAL-E - An S3-based WAL-shipping disaster recovery and standby toolkit

  •    Python

An S3-based WAL-shipping disaster recovery and standby toolkit

ark - Heptio Ark is a utility for managing disaster recovery, specifically for your Kubernetes cluster resources and persistent volumes

  •    Go

The documentation provides a getting started guide, plus information about building from source, architecture, extending Ark, and more. If you encounter issues, review the troubleshooting docs, file an issue, or talk to us on the #ark-dr channel on the Kubernetes Slack server.

school-of-sre - At LinkedIn, we are using this curriculum for onboarding our entry-level talents into the SRE role

  •    HTML

Site Reliability Engineers (SREs) sit at the intersection of software engineering and systems engineering. While there are potentially infinite permutations and combinations of how infrastructure and software components can be put together to achieve an objective, focusing on foundational skills allows SREs to work with complex systems and software, regardless of whether these systems are proprietary, 3rd party, open systems, run on cloud/on-prem infrastructure, etc. It is particularly important to gain a deep understanding of how these areas of systems and infrastructure relate to and interact with each other. The combination of software and systems engineering skills is rare and is generally built over time with exposure to a wide variety of infrastructure, systems, and software. SREs bring in engineering practices to keep the site up. Each distributed system is an agglomeration of many components. SREs validate business requirements, convert them to SLAs for each of the components that constitute the distributed system, monitor and measure adherence to SLAs, re-architect or scale out to mitigate or avoid SLA breaches, feed these learnings back into new systems or projects, and thereby reduce operational toil. Hence SREs play a vital role right from the day 0 design of the system.

awesome-scalability - Scalable, Available, Stable, Performant, and Intelligent System Design Patterns


An updated and curated list of readings to illustrate best practices and patterns in building scalable, available, stable, performant, and intelligent large-scale systems. Concepts are explained in the articles of prominent engineers and credible references. Case studies are taken from battle-tested systems that serve millions to billions of users. Understand your problem, whether it is a scalability problem (fast for a single user but slow under heavy load) or a performance problem (slow for a single user), by reviewing some design principles and checking how scalability and performance problems are solved at tech companies. The section on intelligence is intended for those who work with data and machine learning at big (data) and deep (learning) scale.

cloud-ops-sandbox - Cloud Operations Sandbox is an open source tool that helps practitioners to learn Service Reliability Engineering practices from Google and apply them on their cloud services using Cloud Operations suite of tools

  •    HTML

Cloud Operations Sandbox is an open-source tool that helps practitioners learn Service Reliability Engineering practices from Google and apply them to their cloud services using Cloud Operations (formerly Stackdriver). It is based on Hipster Shop, a cloud-native microservices application. Google Cloud Operations Suite is a suite of tools that helps you gain full observability of your code and applications. You might want to take Cloud Operations for a "test drive" to answer the question, "Will it work for my application needs?" The most effective way to learn is by testing the tool in "real-life" conditions, but without risking a production system. Sandbox automatically provisions a new demo cluster that receives traffic simulating real users. Practitioners can experiment with various Cloud Operations tools to solve problems and accomplish standard SRE tasks in a sandboxed environment.

Make CD-ROM Recovery

  •    C

Make CD-ROM Recovery (mkCDrec) makes a bootable (El Torito) disaster recovery image, including backups of the Linux system to the same CD-ROM if space permits, or to a multi-volume CD-ROM set.

INSERT - Inside Security rescue toolkit


The Inside Security Rescue Toolkit is a multi-purpose disaster recovery and network analysis system. It runs from a credit card-sized CD-ROM for convenient transport or download. It has read-write support for NTFS-partitions, full partition handling supp

nodejs-storage - Node

  •    TypeScript

Node.js idiomatic client for Cloud Storage. Cloud Storage allows world-wide storage and retrieval of any amount of data at any time. You can use Google Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.

Rook - Storage Orchestration for Kubernetes

  •    Go

Rook is an open source cloud-native storage orchestrator for Kubernetes, providing the platform, framework, and support for a diverse set of storage solutions to natively integrate with cloud-native environments.

OpenDJ - LDAPv3 compliant directory service

  •    Java

OpenDJ is a new LDAPv3 compliant directory service, providing a high-performance, highly available and secure store for the identities managed by enterprises. Its easy installation process, combined with the power of the Java platform, makes OpenDJ the simplest and fastest directory server to deploy and manage.

postgres-operator-examples - Examples for deploying applications with PGO, the Postgres Operator from Crunchy Data

  •    Smarty

This repository contains a variety of examples for deploying PGO, the Postgres Operator from Crunchy Data. The examples are grouped by the tools that can be used to deploy them.

Disaster Management System

  •    Pascal

The Disaster Management System is a group of applications that will help in the organization of and response to large-scale disasters and emergencies. Currently, accountability of personnel, equipment and vehicles is done with pen and paper. This software i

Sahana disaster management system

  •    JavaScript

Sahana is a web-based disaster / crisis / emergency management system. Please note that Sahana is no longer actively maintained on Sourceforge; please visit the main website to get the latest developments.

muxy - Chaos engineering tool for simulating real-world distributed system failures

  •    Go

Proxy for simulating real-world distributed system failures to improve resilience in your applications. Muxy is a proxy that mucks with your system and application context, operating at Layers 4, 5 and 7, allowing you to simulate common failure scenarios from the perspective of an application under test, such as an API or a web application.





