Storage theory

EMC VNX hardware overview

The VNX series is EMC's family of midrange-to-enterprise storage products. It unifies file-based and block-based offerings in a single product.

In this post I would like to describe the hardware components in a little more detail.

Let's take the VNX5500 as an example:

Figure 1. Block and File (Unified) VNX5500 platform

Disk-processor enclosure (DPE)

The enclosure is 3U in size and houses both storage processors and the first tray of disks. It is shown in the figure above with 25 (2.5-inch) disks. Let’s take a look at the rear view:

Figure 2. Back view of DPE with SP A (right) and SP B (left)

Storage processors (SPs)

The SPs support block data with UltraFlex I/O technology, which supports the Fibre Channel, iSCSI and FCoE protocols. The storage processors provide access for all external hosts and for the file side of the VNX array.

Figure 3. Close-up of DPE-based storage processor

In the example above, the SPs are located in the disk-processor enclosure (DPE).

Storage processor enclosure (SPE)

The enclosure is 2U in size and houses both storage processors. It is found in the VNX5700 and VNX7500 models, which support a maximum of 500 drives (VNX5700) and 1,000 drives (VNX7500).

Control Station

The Control Station (CS) is 1U in size and provides management functions to the file-side components (referred to as Blades or Data Movers). The CS is responsible for Blade failover. To provide high availability, the Control Station can be configured with a matching secondary CS.

Figure 4. Control Station rear view

Data Mover

Data Movers (or Blades) access data from the back end and provide host access using the same UltraFlex I/O technology, which supports the NFS, CIFS and pNFS protocols. The Data Movers in each array are scalable and provide redundancy.

Data Mover Enclosure (DME)

The DME is 2U in size and houses the Data Movers (Blades). From the front, the DME looks similar to the SPE. It is used on all VNX models that support file.

Figure 5. DME rear view

Data protection – NetApp way

By data protection I mean the features that let you back up data and recover it when needed. Basically, you need to back up data for the following reasons:

  • to protect data against accidental file deletion, application crashes, viruses, data corruption etc.
  • to archive data for future use or for legal purposes
  • to recover from a disaster

NetApp has developed many methods of protecting data. Some of them require an extra license; others are standard features of Data ONTAP.

aggr copy

aggr copy gives you a fast block copy of the data stored in an aggregate. Just a quick reminder: all data served by NetApp is located in aggregates. With aggr copy you can make an exact copy of an existing aggregate, which means that all volumes and qtrees on the source aggregate are copied as well.
You can use aggr copy to copy an aggregate within the same filer or to another filer. If the destination is on another filer, make sure that rsh authentication is enabled on both the source and the destination.
A basic example:

filerB> aggr restrict aggr_dest
filerB> aggr copy start filerA:aggr_source filerB:aggr_dest

snapshot copy

NetApp allows you to create and maintain many snapshot copies, either manually or automatically. A snapshot doesn’t copy any data when it is created; it only preserves the blocks that change afterwards, so the space it uses grows with the amount of data changed since it was taken. It means that if you have a snapshot made yesterday at 12:00, you can at any time recover individual files, or even the whole snapshot image, as they were yesterday at 12:00.
A basic example:

filerA> snap create volume_01 snapshot_0001

With a snapshot and SnapRestore (an extra license is needed) you can easily recover a single file or the whole volume from the snapshot.
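To illustrate why creating a snapshot is cheap, here is a minimal toy model in Python. It is only a sketch of the idea described above, not how Data ONTAP implements snapshots internally; the class ToyVolume and all of its methods are hypothetical names made up for this example. The assumption is that new data is always written to a fresh block, so a snapshot only has to remember which blocks were in use at the time it was taken.

```python
class ToyVolume:
    """Toy volume: files point at blocks, snapshots freeze the pointers."""

    def __init__(self):
        self.blocks = {}      # block id -> data
        self.active = {}      # file name -> block id (live file system)
        self.snapshots = {}   # snapshot name -> frozen pointer map
        self._next_id = 0

    def write(self, name, data):
        # New data always lands in a fresh block; old blocks are never overwritten.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.active[name] = block_id

    def snap_create(self, snap_name):
        # Only the pointer map is copied; no data block is duplicated.
        self.snapshots[snap_name] = dict(self.active)

    def read(self, name, snap_name=None):
        pointers = self.snapshots[snap_name] if snap_name else self.active
        return self.blocks[pointers[name]]

    def restore_file(self, name, snap_name):
        # Point the live file back at the block preserved by the snapshot.
        self.active[name] = self.snapshots[snap_name][name]


vol = ToyVolume()
vol.write("report.txt", "version from yesterday 12:00")
vol.snap_create("snapshot_0001")
blocks_after_snapshot = len(vol.blocks)   # still 1: the snapshot copied nothing

vol.write("report.txt", "version from today")
current = vol.read("report.txt")
preserved = vol.read("report.txt", "snapshot_0001")

vol.restore_file("report.txt", "snapshot_0001")
restored = vol.read("report.txt")
```

After the second write the volume holds two blocks: the live one and the one kept alive by snapshot_0001. That is the extra space a snapshot costs in this model.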

SnapMirror

With SnapMirror you can replicate a whole volume or a selected qtree to another location (an extra license is needed). SnapMirror can run in three modes: synchronous, asynchronous and semi-synchronous. You can find more about SnapMirror in this post.

SnapVault

SnapVault is a backup feature that requires an extra license. With SnapVault you can back up entire qtrees and set up a different snapshot schedule on the destination. You can find more about SnapMirror vs SnapVault in this post.

vol copy

With vol copy you can copy all data from one volume to another, either on the same system or on a different one. As with aggr copy, you initiate a volume copy with the vol copy start command. The result is a restricted volume containing the same data as the source volume at the time you initiated the copy operation.

filerA> vol create vol1 aggr1 50g
filerB> vol create vol1_copy aggr1 50g
filerB> vol restrict vol1_copy
filerB> vol copy start filerA:vol1 filerB:vol1_copy
 […]
filerA> vol status -b
Volume      Block Size   Vol Size   FS Size
------      ----------   --------   -------
vol1        4096         4346752    4346752
filerB> vol status -b
Volume      Block Size   Vol Size   FS Size
------      ----------   --------   -------
vol1_copy   4096         4346752    4346752

filerB> vol online vol1_copy

Of course that’s just a simple example.

SyncMirror

SyncMirror continuously mirrors data to two separate aggregates. This feature allows real-time mirroring of data to matching aggregates physically connected to the same storage system.

RPO and RTO – Understanding the difference

Understanding RPO and RTO helps you answer two questions: How much downtime are you willing to tolerate? And, in the worst-case scenario, how much data are you willing to lose?
 
What is RPO?

RPO (Recovery Point Objective) is the point in time to which systems and data must be recovered after an outage. It defines the amount of data loss that a business can endure.

How to understand that? Simple: if you take a nightly backup of your data, your RPO is 24 hours, which means that in the worst-case scenario you will lose 24 hours of data.

There are a few general solutions for a given RPO:

  • RPO of 24 hours – backups are created at an offsite tape library every night. The corresponding recovery strategy is to restore data from the last set of backup tapes
  • RPO of 1 hour – shipping database logs to the remote site every hour.
  • RPO in order of minutes – mirroring data asynchronously to the remote site
  • Near zero RPO – mirroring data synchronously to a remote site
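The list above can be read as a lookup from protection strategy to worst-case data loss. The Python sketch below shows one way to check which strategies satisfy a required RPO. The strategy names, the 5-minute figure for asynchronous mirroring and the choice to model "near zero" as exactly zero are assumptions made for illustration only.

```python
from datetime import timedelta

# Worst-case data loss per strategy, following the list above.
# Assumptions: 5 minutes stands in for "in the order of minutes",
# and "near zero" is modelled as exactly zero.
STRATEGIES = {
    "nightly tape backup": timedelta(hours=24),
    "hourly log shipping": timedelta(hours=1),
    "asynchronous mirroring": timedelta(minutes=5),
    "synchronous mirroring": timedelta(0),
}


def strategies_meeting(rpo: timedelta) -> list[str]:
    """Return the strategies whose worst-case data loss fits within the RPO."""
    return [name for name, loss in STRATEGIES.items() if loss <= rpo]


one_hour_options = strategies_meeting(timedelta(hours=1))
print(one_hour_options)
```

With a required RPO of one hour, the nightly tape backup drops out and the other three strategies remain.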

What is RTO?
 
RTO (Recovery Time Objective) is the time within which systems and applications must be recovered after an outage. It defines the amount of downtime that a business can endure and survive.

There are a few general solutions for a given RTO:

  • RTO of 72 hours – restore from tapes available at a cold site
  • RTO of 12 hours – restore from tapes available at a hot site
  • RTO of a few hours – use of a data vault at a hot site
  • RTO of a few seconds – cluster production servers with bidirectional mirroring (for example NetApp metro-cluster)

Explanation of the terms:
Data vault – a repository at a remote site to which data can be copied
Hot site – a site to which an enterprise’s operations can be moved in the event of a disaster. The site has the hardware, OS, applications and network required to perform business operations, and the equipment is available and running at all times
Cold site – a site to which an enterprise’s operations can be moved in the event of a disaster, with minimum IT infrastructure and environmental facilities in place, but not activated

RTO vs RPO

To see the difference between the two, study this example:

When reviewing the disaster recovery plan for two data centers, you find that:

  • The copy of data at remote Site B will lag behind the production data at Site A by 5 minutes
  • It will take 2 hours after an outage at Site A to shift production to Site B. 
  • Three more hours will be needed to power up the servers, bring up the network, and redirect users to Site B.

 

What is the recovery point objective (RPO) of this plan?


What is the recovery time objective (RTO) of this plan?
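
One way to work through the example is sketched below in Python. It assumes that the two recovery steps run one after the other (as "three more hours" suggests) and that the replication lag is the only source of data loss. The variable names are made up for this sketch.

```python
from datetime import timedelta

# Figures taken from the example plan above.
replication_lag = timedelta(minutes=5)   # Site B lags behind Site A
recovery_steps = [
    timedelta(hours=2),                  # shift production to Site B
    timedelta(hours=3),                  # power up servers, network, redirect users
]

# RPO: the data written during the replication lag is what can be lost.
rpo = replication_lag

# RTO: total time until users are working again, steps done in sequence.
rto = sum(recovery_steps, timedelta())

print("RPO:", rpo)   # RPO: 0:05:00
print("RTO:", rto)   # RTO: 5:00:00
```

Under these assumptions the plan gives an RPO of 5 minutes and an RTO of 5 hours.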