e10-001

RPO and RTO – Understanding the difference

Understanding RPO and RTO helps you answer two questions: How much downtime are you willing to tolerate? And, in the worst-case scenario, how much data are you willing to lose?
 
What is RPO?

RPO – Recovery Point Objective – is the point in time to which systems and data must be recovered after an outage. It defines the amount of data loss that a business can endure.

How to understand that? Simple – if you take a nightly backup of your data, your RPO is 24 hours, which means that in the worst-case scenario you will lose 24 hours of data.

There are a few general solutions for the RPO:

  • RPO of 24 hours – backups are created at an offsite tape library every night. The corresponding recovery strategy is to restore data from the last set of backup tapes
  • RPO of 1 hour – shipping database logs to the remote site every hour.
  • RPO in order of minutes – mirroring data asynchronously to the remote site
  • Near zero RPO – mirroring data synchronously to a remote site
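
The list above can be read as a lookup from a target RPO to a technique. Below is a minimal sketch of that idea. The function name, the 5-minute figure for asynchronous mirroring, and the idea that the least aggressive technique is the preferred one are all assumptions made for illustration, not part of any standard.

```python
from datetime import timedelta

# Worst-case data loss per technique, taken from the list above.
# The 5-minute value for asynchronous mirroring is an assumed stand-in
# for "in the order of minutes".
RPO_TIERS = [
    (timedelta(0), "synchronous mirroring to a remote site"),
    (timedelta(minutes=5), "asynchronous mirroring to a remote site"),
    (timedelta(hours=1), "hourly database log shipping to a remote site"),
    (timedelta(hours=24), "nightly backup to an offsite tape library"),
]


def least_aggressive_strategy(target_rpo: timedelta) -> str:
    """Return the listed technique with the largest worst-case loss
    that still fits within the target RPO."""
    chosen = RPO_TIERS[0][1]  # synchronous mirroring always satisfies the target
    for worst_case_loss, technique in RPO_TIERS:
        if worst_case_loss <= target_rpo:
            chosen = technique
    return chosen


if __name__ == "__main__":
    for target in (timedelta(hours=24), timedelta(hours=2), timedelta(minutes=30)):
        print(target, "->", least_aggressive_strategy(target))
```

A business that can tolerate losing two hours of data would, under these assumptions, be pointed at hourly log shipping rather than mirroring.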

What is RTO?
 
RTO – Recovery Time Objective – is the time within which systems and applications must be recovered after an outage. It defines the amount of downtime that a business can endure and survive.

There are a few general solutions for the RTO:

  • RTO of 72 hours – restore from tapes available at a cold site
  • RTO of 12 hours – restore from tapes available at a hot site
  • RTO of a few hours – use of a data vault at a hot site
  • RTO of a few seconds – cluster production servers with bidirectional mirroring (for example NetApp metro-cluster)

Explanation of the terms:
Data vault – a repository at a remote site where data can be copied
Hot site – a site where an enterprise’s operations can be moved in the event of a disaster. The site has the required hardware, OS, apps and network to perform business operations, and the equipment is available and running at all times
Cold site – a site where an enterprise’s operations can be moved in the event of a disaster, with minimum IT infrastructure and environmental facilities in place, but not activated

RTO vs RPO

To understand the meaning of those two terms, study this example:

When reviewing the disaster recovery plan for two data centers, you find that:

  • The copy of data at remote Site B will lag behind the production data at Site A by 5 minutes
  • It will take 2 hours after an outage at Site A to shift production to Site B. 
  • Three more hours will be needed to power up the servers, bring up the network, and redirect users to Site B.

 

What is the recovery point objective (RPO) of this plan?


What is the recovery time objective (RTO) of this plan? 
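
One way to work through the example is to treat the replication lag as the data that can be lost (RPO) and to add up every step needed before users are working again (RTO). The short sketch below does that arithmetic. The variable names are made up for illustration, and it assumes the two recovery steps happen one after the other.

```python
from datetime import timedelta

# Figures from the disaster recovery plan above.
replication_lag = timedelta(minutes=5)   # Site B lags behind Site A
shift_production = timedelta(hours=2)    # shift production to Site B
bring_up_site_b = timedelta(hours=3)     # power up servers, network, redirect users

# RPO: the most data that can be lost is whatever has not yet reached Site B.
rpo = replication_lag

# RTO: total elapsed time until the business is running again,
# assuming the steps run sequentially.
rto = shift_production + bring_up_site_b

print(f"RPO: {rpo.total_seconds() / 60:.0f} minutes")
print(f"RTO: {rto.total_seconds() / 3600:.0f} hours")
```

Under this reading the plan has an RPO of 5 minutes and an RTO of 5 hours.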

Disk Drive Performance

A disk drive is an electromechanical device that governs the performance of the storage system environment.

Basic Disk Drive Components

  • Platter – a circular disk on which data is recorded in binary codes. A typical HDD consists of more than one platter
  • Spindle – a spindle connects all the platters and is connected to a motor. The motor of the spindle rotates at a constant speed, measured in revolutions per minute (rpm). The most common speeds are:
    • 5400 rpm
    • 7200 rpm
    • 10000 rpm
    • 15000 rpm
  • Read/Write Head – each platter has two R/W heads, one for each surface of the platter. The R/W head changes the magnetic polarization on the surface of the platter when writing data
  • Actuator Arm Assembly – R/W heads are mounted on the actuator arm assembly, which positions the R/W head at the location on the platter where the data needs to be written or read.

Disk Service Time

Seek Time  

Sometimes also called access time. Seek time describes the time taken to position the R/W head across the platter with a radial movement (moving along the radius of the platter). In other words, seek time is the time taken to position and settle the arm and the head over the correct track.
The average seek time on a modern disk is typically in the range of 3 to 15 milliseconds. A high seek time has a big impact on reads of random tracks. To minimize the seek time, data can be written to only a subset of the available cylinders. This results in lower usable capacity, and is known as short-stroking the drive.

Rotational Latency

To access data, the actuator arm moves the R/W head over the platter to a particular track while the platter spins to position the requested sector under the R/W head. The time taken by the platter to rotate and position the data under the R/W head is called rotational latency.
As you may notice, the average rotational latency depends on the rpm of the disk. For example, the average rotational latency for a 5,400-rpm disk is about 5.5 ms, while for a 15,000-rpm disk it is about 2.0 ms.
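
The figures above follow from a simple rule: on average the platter has to turn half a revolution before the requested sector arrives under the head. A small sketch of that calculation is shown below. The function name is made up for illustration.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds: the time for half a revolution."""
    ms_per_revolution = 60_000 / rpm  # 60,000 ms in a minute
    return ms_per_revolution / 2


if __name__ == "__main__":
    # The common spindle speeds listed earlier.
    for rpm in (5400, 7200, 10000, 15000):
        print(f"{rpm:>6} rpm: {average_rotational_latency_ms(rpm):.2f} ms")
```

For 5,400 rpm this gives roughly 5.56 ms, which is where the "about 5.5 ms" figure comes from, and for 15,000 rpm it gives exactly 2.0 ms.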

Data Transfer Rate

In a read operation the data first moves from the disk platters to the R/W heads. Then it moves to the drive’s internal buffer. Finally, the data moves from the buffer through the interface to the host HBA.
In a write operation the data moves from the HBA to the internal buffer of the disk through the drive’s interface. The data then moves from the buffer to the R/W heads. Finally, it moves from the R/W heads to the platters. The data transfer rate is the average amount of data per unit time that the drive can deliver to the HBA.

Internal Transfer Rate

Internal transfer rate is the speed at which data moves from a platter’s surface to the internal buffer (cache) of the disk.
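
The components described in this section are commonly added together to estimate the disk service time of a single I/O: seek time, plus average rotational latency, plus the time to transfer the block. The sketch below shows that estimate. The function name and the sample figures (5 ms seek, 15,000 rpm, 32 KB block, 40 MB/s transfer rate) are assumptions chosen for illustration, and 1 MB is taken as 1024 KB.

```python
def disk_service_time_ms(seek_ms: float, rpm: float,
                         block_kb: float, transfer_mb_per_s: float) -> float:
    """Rough service time for one I/O, in milliseconds."""
    # Half a revolution on average before the sector is under the head.
    rotational_latency_ms = 0.5 * 60_000 / rpm
    # Time to move one block at the given transfer rate (1 MB = 1024 KB).
    transfer_ms = (block_kb / 1024) / transfer_mb_per_s * 1000
    return seek_ms + rotational_latency_ms + transfer_ms


if __name__ == "__main__":
    total = disk_service_time_ms(seek_ms=5, rpm=15000,
                                 block_kb=32, transfer_mb_per_s=40)
    print(f"Estimated service time: {total:.2f} ms")
```

With these sample numbers the estimate is about 7.78 ms, most of which is seek time and rotational latency rather than the transfer itself.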

Check your knowledge

What describes a landing zone in a disk drive?

A. Area on which the read/write head rests
B. Area where the read/write head lands to access data
C. Area where the data is buffered before writing to platters
D. Area where sector-specific information is stored on the disk

What defines the time taken to position the read/write head across the platter with a radial movement in a disk drive?

A. Seek time
B. Rotational latency
C. Data transfer time
D. Service time

How is the internal transfer rate of disk drives defined?

A. Speed at which data moves from the read/write head to the platter
B. Speed at which data moves from a platter’s surface to the internal buffer
C. Speed at which data moves from internal buffer to the host interface
D. Speed at which data moves from the innermost cylinder to the read/write head

Key Characteristics of a Data Center

When it comes to the data center, we always hear terms like availability, security, data integrity, etc. I would like to explain them a little bit more, so you can actually understand what those terms mean:

Manageability – that should be the first one. A data center should provide easy and integrated management of all its elements. That can be achieved through automation and reduction of human intervention in common tasks.

Availability – a data center should ensure the availability of information when required. What does it mean? Well, it simply means no downtime. Unavailability of information could cost a business a lot of money per hour.

Security – all the policies, procedures and core element integration work together to prevent unauthorized access to the information.

Scalability – simple: build an infrastructure that can grow. Business growth almost always requires deploying more servers, new applications, additional databases, etc.

Performance – first you have to establish service levels. Performance management then makes sure that all the elements of the DC provide optimal performance at the required service levels.

Data integrity – make sure that data is stored and retrieved exactly as it was received.

Capacity – when capacity requirements increase, the data center must provide additional capacity without interrupting availability, or with minimal disruption.

Monitoring – a continuous process of gathering information on various elements and services running in the data center. The reason is obvious – to predict the unpredictable 🙂

Reporting – resource performance, capacity and utilization gathered together at a point in time.

Provisioning – the process of providing the hardware, software and other resources required to run a data center.

Check your knowledge

Which data center requirement refers to applying mechanisms that ensure data is stored and retrieved as it was received?

A. Integrity
B. Availability
C. Security
D. Performance

What is the process of continuously gathering information on various elements and services in a data center? 

A. Reporting
B. Alerting
C. Provisioning
D. Monitoring