Storage theory

Storage Performance – IO characteristics

What is IO

When it comes to performance issues, the term you hear most often is IO. IO is short for input/output and describes the communication between a storage array and a host. Inputs are the data received by the array, and outputs are the data sent from it. To analyze and tune performance, you must understand the workload that an application and/or host is generating. Application workloads have IO characteristics, which may be described in terms of:

  • IO size
  • IO access pattern
  • thread count
  • read/write ratio

In this post I would like to go briefly through those characteristics, because many people understand IO only as a "number of operations", without being aware of what this number actually means and how to interpret it. Very often when people talk about IO, what they actually mean is IOPS – which is simply the number of IOs per second. But to talk about IOPS and understand the number, you first have to understand and consider the IO characteristics.

IO size

IO request size has an effect on throughput – in general, the larger the IO size, the higher the storage bandwidth. What is very often overlooked is that in most cases tools show the average IO size. Bear in mind that most production workloads have a mix of IO sizes.
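The relationship between IOPS, IO size, and bandwidth is simple arithmetic. A minimal sketch (the IOPS figure and sizes are made-up illustration, not measurements from any particular array):

```python
def bandwidth_mbps(iops: float, io_size_kb: float) -> float:
    """Throughput in MB/s for a given IOPS rate and average IO size."""
    return iops * io_size_kb / 1024

# The same 10,000 IOPS gives very different bandwidth
# depending on the IO size:
small = bandwidth_mbps(10_000, 8)    # 8 kB IOs  -> ~78 MB/s
large = bandwidth_mbps(10_000, 64)   # 64 kB IOs -> 625 MB/s
```

This is why quoting an IOPS number without the IO size behind it says very little about the actual load on the array.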

IO access pattern

In terms of access patterns we very often use the terms:

  • random read/write – there is no sequential activity at all (near zero); reads and writes are purely random, and it is almost impossible to boost performance with cache.
  • sequential read/write – the exact opposite – purely sequential access, with no randomness at all. In such environments storage cache can really boost performance.

In the real world you will almost never find 100% random or 100% sequential data access. Even in a sequential environment there might be a number of different sequential activities running at the same time, which effectively introduces randomness when switching between them.

IO access pattern often relates to IO size: with a larger IO size (like 64 kB) you most often deal with more sequential data access, whereas with a small IO size (8 kB, for example) the access is most often random.

Thread count

How many different activities are going on at the same time. If you have a single-threaded IO access pattern, the host sends a write to the storage system and waits for the acknowledgement that the write has completed. Only once it completes will the host send the next write, and so on. In the real world most applications produce multiple threads at the same time; they don't have to wait for one response before sending another request. Going a bit deeper – one disk can do only one operation at a time, so if multiple IOs are targeted at the same disk, they form a queue. Using queues, the storage system can optimize the way it works. The worst type of performance is when the thread count is 1 – how can you optimize a single thread?
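The effect of thread count can be sketched with Little's law: the IOPS a host can drive is bounded by the number of outstanding IOs divided by the per-IO latency. The latency and queue-depth figures below are illustrative assumptions, not vendor numbers:

```python
def max_iops(latency_ms: float, outstanding_ios: int = 1) -> float:
    """Upper bound on IOPS from Little's law: concurrency / latency.

    With a single thread (one outstanding IO), IOPS is capped by the
    round-trip latency alone; more outstanding IOs raise the cap.
    """
    return outstanding_ios / (latency_ms / 1000)

single = max_iops(5.0)        # 1 thread, 5 ms per IO -> about 200 IOPS
queued = max_iops(5.0, 16)    # 16 outstanding IOs    -> about 3200 IOPS
```

With a single outstanding IO there is nothing in the queue for the storage system to reorder or coalesce, which is exactly why thread count 1 is the worst case described above.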

Read/write ratio

Writes are more expensive (performance-wise) than reads, mostly because the system has to determine where to put the new chunk of data. Once it decides where to place the data, the write itself is time consuming due to the RAID write penalty (which of course depends on the RAID level underneath). With an enterprise storage system this actually changes – very often writes are faster, because they are placed into cache and the acknowledgement is sent to the host immediately (the writes are later destaged from cache to the actual hard drives in an optimized way by the storage system). Reads, on the other hand – especially random reads – are very often un-cached and have to be fetched from the hard drives.

Read/write ratio is also really important in terms of replication: obviously only writes are replicated, reads are not.


Workload IO and (some of the most popular) RAID Types

RAID 1/0

  • Best for small, random, write-intensive workloads. The RAID penalty with RAID 1/0 is only 2 – the lowest of all the RAID types that actually give you some kind of failure protection.

RAID 5

  • Good mix of performance and protection, but the RAID penalty is 4, so performance is much worse with small random writes.
  • Best with a client write IO size of 64 kB or larger – full stripe writes can really boost performance, since you can write the entire stripe without needing to read the parity information first, which of course lowers the RAID penalty.
  • Best practice is to use RAID 5 for workloads that have random writes of 30% or less.

RAID 6

  • More protection thanks to two parity disks, but at the same time a higher RAID penalty – which is 6.
  • Very often used with NL-SAS (or SATA) drives, which offer the lowest cost per TB but are also the slowest.
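The write penalties above translate a front-end workload into back-end disk IOs. A minimal sketch of that arithmetic (the 10,000 IOPS / 30% write workload is an assumed example):

```python
# RAID write penalties as listed above.
RAID_WRITE_PENALTY = {"RAID 1/0": 2, "RAID 5": 4, "RAID 6": 6}

def backend_iops(frontend_iops: float, write_ratio: float, raid: str) -> float:
    """Back-end disk IOPS generated by a front-end workload.

    Each read costs one back-end IO; each write costs the RAID write
    penalty (e.g. RAID 5: 2 reads + 2 writes = 4 back-end IOs).
    """
    reads = frontend_iops * (1 - write_ratio)
    writes = frontend_iops * write_ratio
    return reads + writes * RAID_WRITE_PENALTY[raid]

# 10,000 front-end IOPS at 30% writes:
#   RAID 1/0: 7000 + 3000 * 2 = 13,000 back-end IOPS
#   RAID 5:   7000 + 3000 * 4 = 19,000 back-end IOPS
#   RAID 6:   7000 + 3000 * 6 = 25,000 back-end IOPS
```

This also shows why the 30% random-write guideline for RAID 5 exists: as the write ratio grows, the penalty multiplies an ever-larger share of the workload.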

Fibre Channel addressing

What is Fabric?

A Fabric can be described as a number of switching elements connected together. As I mentioned in part 1, a Fabric can be just one switch, or a bunch of interconnected switches (domains). One Fibre Channel switch = one Fabric domain.




All devices in Fibre Channel network have an identity.

Worldwide name (WWN)

All FC devices have a unique identity that is called a worldwide name (WWN). This identification can be compared to Ethernet cards and their MAC addresses. Each node port has its own WWN, and a device with more than one Fibre Channel adapter can have a WWN per adapter as well. A WWN is a 64-bit address, and if two WWN addresses are put into the frame header, that leaves 16 bytes of data just for identifying the destination and source addresses – so 64-bit addresses can affect routing performance. Each device in the SAN is identified by a unique WWN. The WWN contains a vendor identifier field, which is defined and maintained by the IEEE.

There are two WWN addressing schemes. The old one starts with 10:00, followed by the company ID and vendor-specific info. The new scheme starts with hex 5 or 6 in the first half-byte, followed by the vendor identifier in the next 3 bytes:

WWN addressing scheme

Vendor IDs and company IDs are assigned by the IEEE, and can be found here:


A worldwide node name (WWNN) is a globally unique 64-bit identifier assigned to each Fibre Channel node or device. For servers and hosts, the WWNN is unique for each HBA (host bus adapter); a server with two HBAs has two WWNNs. For a SAN switch, the WWNN is common to the chassis. For storage, the WWNN is common to each controller unit of a midrange storage system; in the case of high-end enterprise storage, the WWNN is unique for the entire array.

A worldwide port name (WWPN) is a unique identifier for each FC port of any Fibre Channel device. For a server, there is a WWPN for each port of the HBA. For a SAN switch, there is a WWPN for each port in the chassis. For storage, each host port has an individual WWPN.
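The two addressing schemes described above can be decoded mechanically. A minimal sketch that extracts the IEEE vendor identifier (OUI) from a WWN string – the example WWNs below are purely illustrative, not real device addresses:

```python
def wwn_oui(wwn: str) -> str:
    """Extract the 3-byte IEEE OUI (vendor identifier) from a WWN.

    Handles the two schemes described above: the old scheme
    (starts with 10:00, OUI in bytes 2-4) and the new scheme
    (first half-byte is 5 or 6, OUI in the next 3 bytes).
    """
    nibbles = wwn.replace(":", "").lower()
    if len(nibbles) != 16:
        raise ValueError("expected a 64-bit (16 hex digit) WWN")
    if nibbles.startswith("10"):
        oui = nibbles[4:10]   # old scheme: 10:00 + OUI + vendor serial
    elif nibbles[0] in ("5", "6"):
        oui = nibbles[1:7]    # new scheme: NAA nibble + OUI + vendor info
    else:
        raise ValueError(f"unrecognised scheme nibble: {nibbles[0]}")
    return ":".join(oui[i:i + 2] for i in (0, 2, 4))

# Old scheme (illustrative):
old_oui = wwn_oui("10:00:00:00:c9:12:34:56")   # -> "00:00:c9"
# New scheme (illustrative):
new_oui = wwn_oui("50:06:01:60:ab:cd:ef:01")   # -> "00:60:16"
```

Note how in the new scheme the OUI is shifted by half a byte, which is why it must be re-aligned before reading it as three bytes.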

Port address

In addition to WWNs, there is another addressing scheme used in Fibre Channel networks. This scheme is used to address ports in the switched fabric; each port in the switched fabric has its own unique 24-bit address.

24-bit Port Addressing scheme

Those 24-bit addresses are used strictly for routing. Each address contains a Domain ID, an Area ID, and a Node ID.

  • Domain

This byte is the address of the switch itself. A domain ID is a unique number that identifies the switch or director to a fabric. One byte allows up to 256 possible addresses. Because some of these addresses are reserved, such as the one for broadcast, only 239 addresses are available. This means that you can theoretically have as many as 239 switches in your SAN environment.

  • Area

The area field provides another 256 addresses and is used to identify individual ports. Hence, to have more than 256 ports in a single director-class switch, shared area addressing must be used.

  • Port / Node

The final part of the address provides 256 addresses for identifying attached N_Ports and NL_Ports.
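The three fields above are just the three bytes of the 24-bit address, so splitting a port address into its components is a matter of bit shifting. A minimal sketch (the example address 0x010400 is an assumed illustration):

```python
def decode_fcid(fcid: int) -> tuple:
    """Split a 24-bit Fibre Channel port address into (domain, area, port)."""
    domain = (fcid >> 16) & 0xFF   # byte 1: the switch (Domain ID)
    area = (fcid >> 8) & 0xFF      # byte 2: the Area ID
    port = fcid & 0xFF             # byte 3: the attached N_Port/NL_Port
    return domain, area, port

# 0x010400 -> domain 1, area 4, port 0
domain, area, port = decode_fcid(0x010400)
```

This is the address the fabric actually routes on; the 64-bit WWNs are mapped to these compact 24-bit addresses when a device logs in.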

Introduction to Fibre Channel

What is Fibre Channel

In a few words, Fibre Channel combines the best of two worlds. It's a channel transport that shares the best characteristics of an I/O bus (like SCSI); as a consequence, hosts and applications see the disk devices as locally attached storage. But Fibre Channel also incorporates the best of the network world, as FC allows multiple protocols, such as SCSI, IP, and FICON.

In the beginning, when storage was connected directly to servers, this was a good solution thanks to the high-speed channel from server to storage (most often a SCSI bus). The biggest limitation was 15 devices per bus. There were also limitations around sharing data, clustering, etc.

Parallel SCSI Bus

Due to these limitations there was a need for network flexibility while keeping channel-level performance for block data. These are the origins of the Storage Area Network (SAN). The dominant technology used in modern SANs is Fibre Channel. The biggest benefits of FC are:

  • Speed up to 16 Gbit/sec
  • Initiator negotiates for access before transmitting to the target, which gives channel-like access to the target
  • all SCSI commands and user data are sent in Fibre Channel frames with a payload of up to 2112 bytes

Fabric => SAN

SAN and Fabric

SAN and Fabric

A Fabric is a collection of Fibre Channel switches, directors, and connected devices, such as server hosts and storage. The Fabric is the most popular implementation of a SAN.

SAN Components

A SAN provides any-to-any connectivity in the fabric. SANs employ fiber-optic and copper connections to create dedicated networks for servers and their storage systems. To do so, a couple of components are needed:

  • Fibre Channel Switches and/or Directors – note, the smallest Fabric is just one switch
  • HBAs (Host Bus Adapters) – HBAs are similar to NICs (Network Interface Cards). They are used to connect devices to Fibre Channel switches. They replaced SCSI controllers.


Host Bus Adapter

  • Storage System
  • Optical cables
  • Management Software
  • Tape Drives – for backup purposes

In the next couple of entries I will go a little deeper into SAN: I will explain in more detail what the Fabric is, how the Fabric can be scaled out, what the different ports in the Fabric are, etc.