Understanding RPO and RTO helps you answer two questions: How much downtime are you willing to tolerate? And, in the worst-case scenario, how much data are you willing to lose?
What is RPO?
RPO (Recovery Point Objective) is the point in time to which systems and data must be recovered after an outage. It defines the amount of data loss that a business can endure.
How should you read that? Simple: if you take a nightly backup of your data, your RPO is 24 hours, which means that in the worst-case scenario you will lose 24 hours of data.
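The nightly-backup reasoning above can be sketched in a few lines of Python. This is only an illustration: the function names `data_loss_window` and `meets_rpo` and the timestamps are made up for the example, and it assumes that everything written after the last successful backup is lost on restore.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Return how much recent data is lost when restoring from last_backup."""
    if failure < last_backup:
        raise ValueError("failure cannot precede the last backup")
    return failure - last_backup


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case: the failure hits just before the next backup would run,
    so the data loss approaches one full backup interval."""
    return backup_interval <= rpo


# Hypothetical schedule: backup at 01:00, failure at 23:30 the same day.
last_backup = datetime(2024, 1, 10, 1, 0)
failure = datetime(2024, 1, 10, 23, 30)

loss = data_loss_window(last_backup, failure)
print(loss)  # 22:30:00

# A nightly backup satisfies a 24-hour RPO but not a 1-hour RPO.
print(meets_rpo(timedelta(hours=24), timedelta(hours=24)))  # True
print(meets_rpo(timedelta(hours=24), timedelta(hours=1)))   # False
```

The actual loss in a given incident depends on when the failure happens; the RPO is about the worst case, which is why the check compares the whole backup interval against the target.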
There are a few general solutions, depending on the required RPO:
- RPO of 24 hours – backups are created at an offsite tape library every night. The corresponding recovery strategy is to restore data from the last set of backup tapes.
- RPO of 1 hour – shipping database logs to the remote site every hour.
- RPO on the order of minutes – mirroring data asynchronously to the remote site.
- Near-zero RPO – mirroring data synchronously to a remote site.
What is RTO?
RTO (Recovery Time Objective) is the time within which systems and applications must be recovered after an outage. It defines the amount of downtime that a business can endure and survive.
There are a few general solutions, depending on the required RTO:
- RTO of 72 hours – restore from tapes available at a cold site
- RTO of 12 hours – restore from tapes available at a hot site
- RTO of a few hours – use of a data vault at a hot site
- RTO of a few seconds – cluster production servers with bidirectional mirroring (for example NetApp MetroCluster)
Explanation of the terms:
Data vault – a repository at a remote site to which data can be copied.
Hot site – a site to which an enterprise’s operations can be moved in the event of a disaster. The site has the required hardware, OS, applications, and network to perform business operations, and the equipment is available and running at all times.
Cold site – a site to which an enterprise’s operations can be moved in the event of a disaster, with minimum IT infrastructure and environmental facilities in place, but not activated.
RTO vs RPO
To understand the difference between the two, study this example:
When reviewing the disaster recovery plan for two data centers, you find that:
- The copy of data at remote Site B will lag behind the production data at Site A by 5 minutes.
- It will take 2 hours after an outage at Site A to shift production to Site B.
- Three more hours will be needed to power up the servers, bring up the network, and redirect users to Site B.
What is the recovery point objective (RPO) of this plan?
What is the recovery time objective (RTO) of this plan?
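One way to work through the two questions is to write the arithmetic down. The sketch below is an interpretation rather than an official answer key. It assumes that the data loss is bounded by the replication lag, and that the two recovery phases run one after the other (the plan says "three more hours"), so their durations add up. The variable names are chosen for this example only.

```python
from datetime import timedelta

# Figures taken from the plan under review.
replication_lag = timedelta(minutes=5)  # Site B lags behind Site A
failover_time = timedelta(hours=2)      # shifting production to Site B
bring_up_time = timedelta(hours=3)      # servers, network, user redirection

# Assumption: the worst-case data loss equals the replication lag.
rpo = replication_lag

# Assumption: the two recovery phases are sequential, so they are summed.
rto = failover_time + bring_up_time

print(f"RPO: {rpo}")  # RPO: 0:05:00
print(f"RTO: {rto}")  # RTO: 5:00:00
```

Under these assumptions the plan gives an RPO of 5 minutes and an RTO of 5 hours. The replication lag affects only the RPO, while the failover and bring-up steps affect only the RTO.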