Lesson 4: High Availability and Disaster Recovery

High Availability (HA)

High Availability means keeping your application accessible whenever users need it, even when individual components fail. It is achieved by removing single points of failure across the stack. In the cloud, this typically means deploying redundant resources across multiple Availability Zones (AZs) or regions.
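To make the benefit of redundancy concrete, the following minimal sketch estimates the combined availability of several redundant deployments. It assumes that each AZ fails independently, which real outages do not always respect, and the 99% per-AZ figure is a made-up input, not a provider guarantee. The function names are illustrative only.

```python
def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant deployments.

    Assumption: failures are independent, so the system is down only
    when every copy is down at the same time.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - single) ** copies


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime in minutes over a 365-day year."""
    return (1.0 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    # 0.99 is a hypothetical per-AZ availability used only for illustration.
    for copies in (1, 2, 3):
        a = parallel_availability(0.99, copies)
        print(f"{copies} AZ(s): availability={a:.6f}, "
              f"downtime={downtime_minutes_per_year(a):.1f} min/year")
```

Under these assumptions, each added AZ cuts expected downtime by a factor of 100. Correlated failures, such as a region-wide outage, would reduce that gain, which is one reason to consider multiple regions.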

Disaster Recovery (DR) Concepts

Disaster Recovery involves a set of policies, tools, and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster.

Key Metrics

Recovery Time Objective (RTO): The maximum acceptable delay between the interruption of service and the restoration of service.
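Recovery Point Objective (RPO): The maximum acceptable amount of data loss, measured as the time between the last recoverable copy of the data and the interruption. For example, an RPO of one hour means backups or replication must capture changes at least hourly.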
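The two metrics can be checked against the timeline of an incident. The sketch below is one way to express that check; the function names and the timestamps are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, failure: datetime, rpo: timedelta) -> bool:
    """True if the data lost (time since the last backup) is within the RPO."""
    return failure - last_backup <= rpo


def meets_rto(failure: datetime, restored: datetime, rto: timedelta) -> bool:
    """True if the outage (time until service is restored) is within the RTO."""
    return restored - failure <= rto


if __name__ == "__main__":
    # Hypothetical incident timeline.
    last_backup = datetime(2024, 1, 1, 2, 0)
    failure = datetime(2024, 1, 1, 5, 30)
    restored = datetime(2024, 1, 1, 7, 0)

    # 3.5 hours of data lost against a 4-hour RPO.
    print("RPO met:", meets_rpo(last_backup, failure, timedelta(hours=4)))
    # 1.5 hours of downtime against a 1-hour RTO.
    print("RTO met:", meets_rto(failure, restored, timedelta(hours=1)))
```

In this example the RPO is met but the RTO is not: the backups were frequent enough, but the restore took too long.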

DR Strategies in the Cloud

Backup and Restore: The cheapest option, with the highest RTO and RPO. Data is backed up regularly and restored to newly provisioned infrastructure when a disaster occurs.
Pilot Light: A minimal version of the environment is always running, ready to scale up quickly in the event of a disaster.
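Warm Standby: A scaled-down but fully functional copy of the environment is always running and is scaled up to full capacity when a disaster occurs.
Multi-Site Active/Active: Near-zero downtime, and the most expensive option. The full environment runs in multiple regions and traffic is routed to all of them simultaneously, so the remaining regions absorb the load if one fails.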
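Choosing among the four strategies above is a trade-off between cost and the RTO/RPO targets. The sketch below picks the cheapest strategy that satisfies a given pair of targets. The cost ranks follow the order of the list, but the RTO and RPO figures are placeholder assumptions, not measurements or provider guarantees; real values depend on the workload and should come from DR tests.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    relative_cost: int      # 1 = cheapest; rank only, not a price
    typical_rto: timedelta  # placeholder assumption
    typical_rpo: timedelta  # placeholder assumption


# Illustrative numbers only; replace with figures from your own DR tests.
STRATEGIES = [
    Strategy("Backup and Restore", 1, timedelta(hours=24), timedelta(hours=24)),
    Strategy("Pilot Light", 2, timedelta(hours=1), timedelta(minutes=15)),
    Strategy("Warm Standby", 3, timedelta(minutes=15), timedelta(minutes=5)),
    Strategy("Multi-Site Active/Active", 4, timedelta(minutes=1), timedelta(minutes=1)),
]


def cheapest_strategy(rto: timedelta, rpo: timedelta) -> Optional[Strategy]:
    """Return the lowest-cost strategy meeting both targets, or None."""
    candidates = [
        s for s in STRATEGIES
        if s.typical_rto <= rto and s.typical_rpo <= rpo
    ]
    return min(candidates, key=lambda s: s.relative_cost, default=None)


if __name__ == "__main__":
    choice = cheapest_strategy(rto=timedelta(hours=2), rpo=timedelta(minutes=30))
    print(choice.name if choice else "No listed strategy meets these targets")
```

Starting from the business targets and working toward the cheapest sufficient strategy avoids paying for an active/active setup when a pilot light would do.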
