- High Availability means designing for failure: a system should be designed so that it remains available when any of its components fails.
- Designing for multiple Availability Zones (multi-AZ) :- deploy resources in more than one AZ within the same Region. Each Region has at least three AZs, which are physically separate and isolated from each other but located in the same geographic area and connected via a low-latency network.
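A toy model of why multi-AZ placement matters: instances are spread round-robin across AZs, and we count what keeps running when one whole AZ is lost. This is plain arithmetic, not an AWS API; the AZ names and instance counts are illustrative assumptions.

```python
def spread_across_azs(instance_count: int, azs: list) -> dict:
    """Place instances round-robin across the given Availability Zones."""
    placement = {az: 0 for az in azs}
    for i in range(instance_count):
        placement[azs[i % len(azs)]] += 1
    return placement


def capacity_after_az_failure(placement: dict, failed_az: str) -> int:
    """Count the instances still running after one whole AZ goes down."""
    return sum(count for az, count in placement.items() if az != failed_az)


# Illustrative example: 6 instances in a single AZ vs. spread over three AZs.
single_az = spread_across_azs(6, ["us-east-1a"])
multi_az = spread_across_azs(6, ["us-east-1a", "us-east-1b", "us-east-1c"])

print(capacity_after_az_failure(single_az, "us-east-1a"))  # 0: full outage
print(capacity_after_az_failure(multi_az, "us-east-1a"))   # 4: still serving
```

With a single AZ, one zone failure takes the whole service down; with three AZs, two thirds of the capacity survives.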
- Back-up and DR strategy :- chosen based on the required RTO and RPO.
- RTO :- Recovery Time Objective is the maximum acceptable time between a failure and the system being back up and running, as per the agreed SLA.
- RPO :- Recovery Point Objective is the maximum acceptable data loss, measured as the time between the last recoverable point (backup or replicated copy) and the failure.
- In short, a DR strategy addresses RTO (how quickly service is restored) and RPO (to which point in time data can be recovered).
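To make the two objectives concrete, the sketch below measures one incident against RPO and RTO targets: data loss is the gap between the last recovery point and the failure, downtime is the gap between the failure and restoration. The incident times and targets are hypothetical.

```python
from datetime import datetime, timedelta


def meets_objectives(failure_at, last_recovery_point, restored_at, rpo, rto) -> dict:
    """Check a single incident against RPO/RTO targets."""
    data_loss = failure_at - last_recovery_point  # compared with RPO
    downtime = restored_at - failure_at           # compared with RTO
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical incident: nightly backup at 00:00, failure at 10:30, service back at 14:30.
result = meets_objectives(
    failure_at=datetime(2024, 1, 1, 10, 30),
    last_recovery_point=datetime(2024, 1, 1, 0, 0),
    restored_at=datetime(2024, 1, 1, 14, 30),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=4),
)
print(result["rpo_met"], result["rto_met"])  # False True
```

Here the restore was fast enough for the RTO, but a nightly backup loses 10.5 hours of data, far beyond a 1-hour RPO; meeting that RPO needs more frequent backups or continuous replication.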
- RDS can be deployed as Multi-AZ, with synchronous data replication to a standby instance in another AZ. If the primary fails, failover is triggered automatically and the standby instance becomes the primary.
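With boto3, Multi-AZ is the `MultiAZ` flag on `create_db_instance`, and a failover can be rehearsed with `reboot_db_instance(ForceFailover=True)`. This is a minimal sketch: the client is passed in (so the functions can be exercised without AWS), and the engine, instance class, storage size and identifiers are assumptions.

```python
# `rds` is expected to be a boto3 RDS client, e.g.
#   import boto3; rds = boto3.client("rds", region_name="us-east-1")
# Instance class, engine, storage size and identifiers are assumptions.


def create_multi_az_instance(rds, instance_id: str, password: str) -> dict:
    """Create an RDS instance with a synchronously replicated standby in another AZ."""
    return rds.create_db_instance(
        DBInstanceIdentifier=instance_id,
        DBInstanceClass="db.t3.medium",  # assumed instance class
        Engine="mysql",                  # assumed engine
        MasterUsername="admin",
        MasterUserPassword=password,
        AllocatedStorage=20,             # GiB, assumed
        MultiAZ=True,                    # provisions the standby instance
    )


def force_failover(rds, instance_id: str) -> dict:
    """Reboot with ForceFailover=True so the standby is promoted (useful for DR drills)."""
    return rds.reboot_db_instance(DBInstanceIdentifier=instance_id, ForceFailover=True)
```

A real failure needs no call at all, since RDS fails over on its own; `force_failover` only lets you rehearse that path.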
- If the solution is deployed across multiple Regions and a high RPO and RTO are acceptable, a database snapshot can be taken and copied to the other Region; the copy can take a few hours. For a shorter RPO, AWS Database Migration Service (DMS) can be used to replicate the data.
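The snapshot-and-copy approach could look like this with boto3 RDS clients: a snapshot is taken in the primary Region, then copied by a client in the DR Region, which refers to the source snapshot by its full ARN. A minimal sketch; all identifiers and Regions are hypothetical.

```python
# Both clients are expected to be boto3 RDS clients, e.g.
#   rds_primary = boto3.client("rds", region_name="us-east-1")
#   rds_dr      = boto3.client("rds", region_name="us-west-2")
# All identifiers and Regions here are hypothetical.


def snapshot_primary(rds_primary, instance_id: str, snapshot_id: str) -> dict:
    """Take a manual snapshot of the primary database."""
    return rds_primary.create_db_snapshot(
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceIdentifier=instance_id,
    )


def copy_snapshot_to_dr(rds_dr, snapshot_arn: str, target_id: str, source_region: str) -> dict:
    """Copy the snapshot into the DR Region.

    The call is made with a client in the destination Region, and a
    cross-Region copy identifies the source snapshot by its full ARN.
    """
    return rds_dr.copy_db_snapshot(
        SourceDBSnapshotIdentifier=snapshot_arn,
        TargetDBSnapshotIdentifier=target_id,
        SourceRegion=source_region,  # Region that holds the source snapshot
        CopyTags=True,
    )
```

The snapshot must be available before it can be copied; `rds_primary.get_waiter("db_snapshot_available")` can be used to wait for that between the two calls.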
- Four main strategies :- 1) Back-up & Restore 2) Pilot Light 3) Warm Standby 4) Multi-site Active / Active
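The four strategies trade cost against RPO/RTO, so choosing one amounts to picking the cheapest option that still meets both targets. The sketch below encodes only that ordering from these notes; the minute values are placeholder orders of magnitude (assumptions), not AWS figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    cost_rank: int        # 1 = cheapest
    rpo_minutes: float    # placeholder order of magnitude, not an AWS figure
    rto_minutes: float    # placeholder order of magnitude, not an AWS figure


# Ordered as in the notes: cheapest with highest RPO/RTO first,
# most expensive with lowest RPO/RTO last. The numbers are assumptions.
STRATEGIES = [
    Strategy("Back-up & Restore", 1, rpo_minutes=24 * 60, rto_minutes=24 * 60),
    Strategy("Pilot Light", 2, rpo_minutes=30, rto_minutes=30),
    Strategy("Warm Standby", 3, rpo_minutes=5, rto_minutes=5),
    Strategy("Multi-site Active/Active", 4, rpo_minutes=1, rto_minutes=1),
]


def cheapest_strategy(rpo_target: float, rto_target: float) -> Optional[Strategy]:
    """Return the cheapest strategy whose assumed RPO and RTO meet both targets."""
    for strategy in sorted(STRATEGIES, key=lambda s: s.cost_rank):
        if strategy.rpo_minutes <= rpo_target and strategy.rto_minutes <= rto_target:
            return strategy
    return None  # no strategy meets the targets under these assumptions


print(cheapest_strategy(rpo_target=60, rto_target=120).name)  # Pilot Light
```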
1) #Back-up & Restore :-
- Cheapest option, with the highest RPO & RTO
- Uses the AWS Backup service
- Can be used within the same Region or across Regions
The picture below depicts backup across Regions using #DLM (Amazon Data Lifecycle Manager).
· DLM is used to copy EBS volume snapshots (stored in S3) to another Region
· EFS can be restored in another Region using the AWS Backup service
· RDS can be restored in another Region if cross-Region automated backup replication has been enabled; the replicated backups are stored in S3
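For the AWS Backup side of this (for example EFS or RDS), cross-Region protection can be expressed as a backup rule with a copy action that targets a vault in the DR Region. A minimal sketch that only builds the `BackupPlan` argument for `create_backup_plan`; plan/rule names, the schedule, the retention and the vault ARN are assumptions.

```python
# Builds the BackupPlan argument for
#   boto3.client("backup").create_backup_plan(BackupPlan=plan)
# Plan/rule names, the schedule and retention are assumptions.


def backup_plan_with_dr_copy(dr_vault_arn: str, retention_days: int = 35) -> dict:
    """Daily backup rule whose recovery points are also copied to a DR-Region vault."""
    return {
        "BackupPlanName": "daily-with-dr-copy",
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": "Default",         # vault in the primary Region
                "ScheduleExpression": "cron(0 5 ? * * *)",  # daily at 05:00 UTC
                "Lifecycle": {"DeleteAfterDays": retention_days},
                "CopyActions": [
                    {
                        # Vault in the DR Region that receives a copy of each recovery point
                        "DestinationBackupVaultArn": dr_vault_arn,
                        "Lifecycle": {"DeleteAfterDays": retention_days},
                    }
                ],
            }
        ],
    }


plan = backup_plan_with_dr_copy(
    "arn:aws:backup:us-west-2:123456789012:backup-vault:dr-vault"  # hypothetical ARN
)
```

The resources to protect (EFS file systems, RDS instances) would then be attached to the plan with `create_backup_selection`.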
2) #Pilot Light :-
- Data is kept current through continuous, asynchronous replication between the primary and DR Regions (read replicas / clusters run in the DR Region), so recovery is faster than with backup & restore.
- Core infrastructure is already provisioned in the DR Region and always on (VPC, load balancer with its Auto Scaling group, and Aurora replica), so services can be brought up quickly.
- Application servers are switched off until a DR event.
- Any change to the core infrastructure in the primary Region must also be applied in the DR Region; AWS CloudFormation can help by deploying the same templates in both Regions.
- Aurora Global Database and DynamoDB global tables make data available in other Regions, with the cross-Region replication managed by AWS rather than set up by the application.
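Bringing the switched-off application servers up could be a single Auto Scaling call in the DR Region. A minimal sketch, assuming (hypothetically) that the DR application tier is an Auto Scaling group kept at size 0 until a DR event; the group name and sizes are hypothetical, and promoting the database replica is a separate step not shown here.

```python
# `autoscaling` is expected to be a boto3 Auto Scaling client in the DR Region, e.g.
#   boto3.client("autoscaling", region_name="us-west-2")
# Assumption: the DR application tier is an Auto Scaling group kept at size 0
# ("switched off") until a DR event. Group name and sizes are hypothetical.


def activate_pilot_light(autoscaling, asg_name: str, capacity: int, headroom: int = 2) -> dict:
    """Launch the switched-off application servers in the DR Region."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    return autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MinSize=capacity,
        MaxSize=capacity * headroom,  # room for scaling policies to grow further
        DesiredCapacity=capacity,
    )
```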
3) #Warm Standby :-
- It builds on the features of both Back-up & Restore and Pilot Light.
- The difference from Pilot Light is that a scaled-down but fully functional copy of the primary site is always running in the DR Region (web and app servers are also running in DR).
- Scaled down means, for example, that if the primary site is multi-AZ, the DR site runs in only a single AZ with tiers 1 and 2 (web and app servers).
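The scale-down example above can be sketched as a small capacity model: the DR copy keeps only the web and app tiers in one AZ, and on failover each tier is scaled up to the primary site's full size. The tier sizes and the one-instance minimum are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    azs: int
    instances_per_az: int

    @property
    def total(self) -> int:
        return self.azs * self.instances_per_az


def warm_standby_layout(primary: list, keep=("web", "app"), min_instances: int = 1) -> list:
    """Scaled-down DR copy: only the kept tiers, in a single AZ (min_instances is assumed)."""
    return [Tier(t.name, azs=1, instances_per_az=min_instances) for t in primary if t.name in keep]


def failover_targets(primary: list, keep=("web", "app")) -> dict:
    """On failover, scale each DR tier up to the primary site's full capacity."""
    return {t.name: t.total for t in primary if t.name in keep}


# Hypothetical primary site: web and app tiers spread over three AZs.
primary = [Tier("web", azs=3, instances_per_az=2), Tier("app", azs=3, instances_per_az=4)]
dr = warm_standby_layout(primary)
print(sum(t.total for t in dr), "of", sum(t.total for t in primary))  # 2 of 18
```

Because the DR tiers are already running, failover only has to grow them to `failover_targets(primary)`, which is why warm standby recovers faster than pilot light.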
4) #Multi Site Active / Active :-
- Lowest RPO & RTO
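- Highest cost
- Both sites are full scale and serving traffic (multi-Region deployment)
- No separate DR environment as such
- No failover needs to be triggered; traffic is simply routed to the Region that remains healthy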
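Why no failover is needed can be shown with a toy router: both Regions receive a weighted share of requests, and an unhealthy Region simply stops being chosen. This is a conceptual sketch in plain Python, not an AWS routing API; the Region names and the 50/50 weights are hypothetical.

```python
import random


def pick_region(weights: dict, healthy: set, rng=random) -> str:
    """Weighted choice among the Regions that are currently healthy.

    Both Regions normally serve traffic; an unhealthy Region simply stops
    being chosen, so there is no separate failover step.
    """
    candidates = [r for r, w in weights.items() if r in healthy and w > 0]
    if not candidates:
        raise RuntimeError("no healthy Region available")
    return rng.choices(candidates, weights=[weights[r] for r in candidates], k=1)[0]


weights = {"us-east-1": 0.5, "eu-west-1": 0.5}  # hypothetical 50/50 split
print(pick_region(weights, healthy={"eu-west-1"}))  # eu-west-1: the only healthy Region
```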