#High Availability

  • High availability means designing for failure: the system should stay available when any single component fails.
  • Multi-Availability Zone (multi-AZ) design :- Deploy resources in more than one AZ within the same Region. Each Region has at least three AZs, which are geographically co-located and connected by a low-latency network but isolated from each other.
  • Back-up and DR strategy :- Based on RTO and RPO.
  • RTO :- Recovery Time Objective is the maximum time after a failure within which the system must be back up and running, as per the accepted SLA.
  • RPO :- Recovery Point Objective is the point in time up to which data can be recovered, i.e. how much recent data (measured in time) may be lost.
  • A DR strategy is therefore defined by RTO (how quickly service is restored) and RPO (the point in time of the recovered data).
  • RDS can be deployed as multi-AZ, with synchronous data replication. If the primary fails, a failover is triggered and the standby instance becomes the primary.
  • If the solution is deployed across multiple Regions and a high RPO and RTO are acceptable, a snapshot of the database can be taken and copied to the other Region. The copy can take a few hours. For a shorter RPO, AWS Database Migration Service (DMS) can be used to transfer the data.
  • Four main strategies - 1) Back-up & Restore 2) Pilot Light 3) Warm Standby & 4) Multi Site Active / Active
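
The trade-off between the four strategies can be sketched as a simple lookup: pick the cheapest strategy whose typical RTO and RPO still fit the targets. This is a minimal sketch only. The minute values are illustrative assumptions (not AWS figures), and `Strategy` and `cheapest_strategy` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    typical_rto_minutes: float  # assumed value, for illustration only
    typical_rpo_minutes: float  # assumed value, for illustration only
    relative_cost: int          # 1 = cheapest, 4 = most expensive


# Ordered from cheapest / slowest to most expensive / fastest.
STRATEGIES = [
    Strategy("Back-up & Restore", 24 * 60, 24 * 60, 1),
    Strategy("Pilot Light", 60, 15, 2),
    Strategy("Warm Standby", 15, 5, 3),
    Strategy("Multi Site Active / Active", 1, 1, 4),
]


def cheapest_strategy(rto_minutes: float, rpo_minutes: float) -> Optional[str]:
    """Return the cheapest strategy meeting both targets, or None if none does."""
    candidates = [
        s for s in STRATEGIES
        if s.typical_rto_minutes <= rto_minutes
        and s.typical_rpo_minutes <= rpo_minutes
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.relative_cost).name
```

With these assumed figures, a target of 60 minutes RTO and 15 minutes RPO selects Pilot Light, while tighter targets push the choice towards Warm Standby or Active / Active.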

1) #Back-up & Restore :- The cheapest strategy, with the highest RPO & RTO. It uses the AWS Backup service and can be used within the same Region or across Regions.

Figure: Single AZ failure and restoration in another AZ in the same Region

The picture below depicts backup across Regions using #DLM (Amazon Data Lifecycle Manager).

· DLM is used to copy EBS volume snapshots (stored in S3) to another Region.

· EFS can be restored in another Region using the AWS Backup service.

· RDS can be restored in another Region if cross-Region automated backups, which are stored in S3, have been set up.

Figure: Cross Region Restoration
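
For cross-Region backups, the achievable RPO depends on both how often snapshots are taken and how long the copy to the other Region takes. A minimal sketch of that arithmetic, assuming each copy starts as soon as its snapshot is taken (`worst_case_rpo` and `meets_rpo` are hypothetical helper names):

```python
from datetime import timedelta


def worst_case_rpo(snapshot_interval: timedelta, copy_duration: timedelta) -> timedelta:
    """Worst-case data loss window if the primary Region fails.

    If the failure happens just before a copy finishes, the newest usable
    snapshot in the DR Region is the previous one, which is older by one
    full interval. Hence: interval + copy duration.
    """
    return snapshot_interval + copy_duration


def meets_rpo(target: timedelta, snapshot_interval: timedelta,
              copy_duration: timedelta) -> bool:
    """True if the snapshot schedule can satisfy the RPO target."""
    return worst_case_rpo(snapshot_interval, copy_duration) <= target
```

For example, daily snapshots with a 3-hour copy give a worst case of 27 hours, so they do not satisfy a 24-hour RPO; 12-hourly snapshots with the same copy time do.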

2) #Pilot Light :-

  • Data is kept up to date through continuous asynchronous replication between the primary and DR Regions, hence faster recovery than restoring from backup. All read replicas / clusters used here are in the other Region.
  • Core infrastructure (VPC, Auto Scaling group (for the load balancer) and Aurora replica) is already provisioned and always up, so services can be brought up quickly.
  • Application servers are switched off, ready to be started for DR.
  • Any change made to the core infrastructure needs to be replicated in the DR Region. AWS CloudFormation can help here.
  • Aurora and DynamoDB offer a global database / global table that is available from another Region without having to set up data replication yourself; AWS manages the cross-Region replication.

Figure: Pilot Light in operation
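
The activation step of Pilot Light can be sketched as two calls: make the DR database writable, then start the switched-off application servers. This is a sketch under assumptions, not a complete runbook: `rds` and `ec2` are assumed to be boto3 clients created for the DR Region, the DR database is assumed to be a cross-Region Aurora replica cluster (a global database would need a different call), and `activate_pilot_light` is a hypothetical function name.

```python
def activate_pilot_light(rds, ec2, dr_cluster_id, app_instance_ids):
    """Promote the DR database and start the stopped application servers.

    rds, ec2: assumed to be boto3 clients for the DR Region, e.g.
              boto3.client("rds", region_name=...).
    Returns the IDs of the instances that are starting.
    """
    # Step 1: promote the replica cluster so it accepts writes.
    rds.promote_read_replica_db_cluster(DBClusterIdentifier=dr_cluster_id)

    # Step 2: start the application servers that were kept switched off.
    response = ec2.start_instances(InstanceIds=list(app_instance_ids))
    return [item["InstanceId"] for item in response.get("StartingInstances", [])]
```

The clients are passed in rather than created inside the function so the same steps can be rehearsed against stand-in objects before a real DR drill.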

3) #Warm Stand-by :-

  • It uses features of Back-up & Restore as well as Pilot Light.
  • The only difference is that the DR Region runs a scaled-down copy of the primary site's resources (web and app servers are also available in DR).
  • Scaled-down means, for example, that if the primary is multi-AZ, DR has only a single AZ with tiers 1 and 2 (web and app servers).
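
The scaled-down idea can be sketched as two small helpers: one that sizes the standby as a percentage of the primary, and one that grows the DR Auto Scaling group to full size on failover. This is a sketch only: the percentage is an assumed input, `autoscaling` is assumed to be a boto3 Auto Scaling client for the DR Region, and both function names are hypothetical.

```python
def standby_capacity(primary_instances: int, standby_percent: int) -> int:
    """Instances kept running in DR: a percentage of primary, never below 1."""
    scaled = (primary_instances * standby_percent + 99) // 100  # round up
    return max(1, scaled)


def scale_standby_to_full(autoscaling, group_name: str,
                          full_size: int, max_size: int) -> None:
    """On failover, raise the DR Auto Scaling group to the primary's size.

    autoscaling: assumed to be a boto3 client, e.g.
                 boto3.client("autoscaling", region_name=...).
    """
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=full_size,
        MaxSize=max_size,
        DesiredCapacity=full_size,
    )
```

For example, a primary tier of 8 instances with a 25 percent standby keeps 2 instances running in DR until failover.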

4) #Multi Site Active / Active :-

  • Lowest RPO & RTO
  • High Cost
  • Both sites are full-scale and operational: a multi-Region deployment
  • No separate DR site as such
  • No failover needs to be triggered, since both sites already serve traffic
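
Why no failover is needed can be sketched with a toy router: requests are spread over every Region that is currently healthy, so losing one Region only shrinks the pool. This is an illustration in plain Python, not how any AWS service is implemented; in practice a managed routing layer with health checks (Amazon Route 53 is one example) would play this role. `route_requests` and the Region names in the note are hypothetical.

```python
from itertools import cycle


def healthy_regions(health: dict) -> list:
    """Return the Regions currently marked healthy, in a stable order."""
    return sorted(region for region, ok in health.items() if ok)


def route_requests(request_ids: list, health: dict) -> dict:
    """Spread requests round-robin across all healthy Regions."""
    regions = healthy_regions(health)
    if not regions:
        raise RuntimeError("no healthy Region available")
    return {req: region for req, region in zip(request_ids, cycle(regions))}
```

If `us-east-1` is marked unhealthy, every request simply lands on the remaining Region; nothing is promoted or started.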




More articles by Rupali Giri
