Storage Performance – what to measure?

When you talk about performance, you need numbers. Many people get confused here and simply say “we need more IOPS” without understanding what IOPS are. If you want to analyze the performance of your environment, you have to know what to measure. In this short post I would like to describe some of the key definitions.

Bandwidth

The amount of data transferred along a channel in a given amount of time. It is very often measured in MB/s (megabytes per second). Here you are not concerned with how many IOPS you are dealing with, but with how much data is being sent in a fixed amount of time.

Throughput

The number of IO operations processed per second (IOPS), averaged over a period of time. Here you are not concerned with how much data is being moved, but with the number of transactions.

Bandwidth vs Throughput

So, which characteristic is more important? It all depends on what you are dealing with. If your workload is more or less sequential, bandwidth is much more crucial. If your workload is more random and the IO size is small, throughput matters much more. Of course, it is never a black-or-white situation, but in most cases you should be aware of which is more important for you: bandwidth or throughput.
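The two metrics are linked through the IO size: bandwidth is roughly IOPS multiplied by the average IO size. Below is a minimal sketch of that arithmetic. The function name and both sample workloads are made up for illustration, and it assumes every IO has the same size.

```python
def bandwidth_mb_per_s(iops: float, io_size_kb: float) -> float:
    """Approximate bandwidth in MB/s from IOPS and average IO size in KB.

    Assumes 1 MB = 1024 KB and a uniform IO size.
    """
    return iops * io_size_kb / 1024


# Hypothetical workloads, not measurements from a real array.
small_random = bandwidth_mb_per_s(iops=10_000, io_size_kb=8)     # many small IOs
large_sequential = bandwidth_mb_per_s(iops=400, io_size_kb=512)  # few large IOs

print(f"small random:     {small_random:.1f} MB/s")
print(f"large sequential: {large_sequential:.1f} MB/s")
```

In this made-up case the sequential workload moves more data with 25 times fewer IOs, which is why an IOPS figure alone says little about a sequential workload.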

Response Time

In simple words, it is the interval of time between submitting a command (request) and receiving a response. It is normally measured in milliseconds (ms).

Average Seek Distance

The amount of data that the disk traverses during a seek. The larger the distance, the more random the IO. Longer seek distances result in longer seek times and therefore higher response times. Average seek distance is usually measured in GB rather than in tracks, because different disk manufacturers may design their hard drives differently, so track counts would not be comparable when drives from different vendors are used in one storage system. With the average seek distance you can check how much randomness there is at the disk level. But remember, randomness at the disk level is not the same as randomness at the LUN level: one disk, being part of a RAID group (and/or a pool), can host many different LUNs, and you have to take that into account.
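To make the idea concrete, here is a rough sketch that computes an average seek distance from a list of IO start positions. How a given storage system actually calculates this metric is not covered here, so the formula (mean absolute jump between consecutive positions), the function name, and the sample positions are all assumptions for illustration.

```python
def average_seek_distance_gb(offsets_gb: list[float]) -> float:
    """Mean absolute distance between consecutive IO start positions, in GB.

    Simplification: the length of each IO is ignored, so a purely
    sequential stream shows small steps rather than exactly zero.
    """
    if len(offsets_gb) < 2:
        return 0.0  # no seek happens with fewer than two IOs
    jumps = [abs(b - a) for a, b in zip(offsets_gb, offsets_gb[1:])]
    return sum(jumps) / len(jumps)


# Hypothetical positions on one disk, in GB, in the order the IOs arrived.
sequential = average_seek_distance_gb([10.0, 10.5, 11.0, 11.5])
random_io = average_seek_distance_gb([10.0, 250.0, 40.0, 300.0])

print(f"sequential: {sequential:.2f} GB, random: {random_io:.2f} GB")
```

The second pattern jumps back and forth across the disk, so its average is hundreds of times larger than that of the first one.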

Queue Length

The number of requests within a certain time interval that are waiting to be served by a component. Recognizing the queue length is crucial because, by optimizing it, you can resolve many of your performance issues.
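Queue length, throughput and response time are tied together by Little's Law: in a steady state, the average number of requests in a system equals the arrival rate multiplied by the average time a request spends in it. The sketch below applies it to storage under the assumption that response time covers both waiting and service, so the result counts requests that are queued plus those being served. The function name and the numbers are hypothetical.

```python
def outstanding_requests(iops: float, response_time_ms: float) -> float:
    """Little's Law: average requests in the system = arrival rate * time in system.

    Assumes a steady state and a response time that includes queueing.
    """
    return iops * response_time_ms / 1000


# Same hypothetical load, two different response times.
light = outstanding_requests(iops=2_000, response_time_ms=5)
heavy = outstanding_requests(iops=2_000, response_time_ms=20)

print(f"5 ms:  {light:.0f} outstanding requests")
print(f"20 ms: {heavy:.0f} outstanding requests")
```

Read the other way round, the same relation suggests that at a fixed IOPS level a growing queue goes hand in hand with a higher response time.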

Utilization

Utilization measures the fraction of time that a device is busy serving requests, usually reported as a percentage busy.
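As a small illustration of that definition, the following sketch turns busy time and sampling interval into a percentage. The function name and the sample values are assumptions, not output from any particular monitoring tool.

```python
def utilization_percent(busy_seconds: float, interval_seconds: float) -> float:
    """Percentage of the sampling interval during which the device was busy."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return 100.0 * busy_seconds / interval_seconds


# Hypothetical sample: the device served requests for 45 s of a 60 s interval.
util = utilization_percent(busy_seconds=45.0, interval_seconds=60.0)
print(f"utilization: {util:.0f}% busy")
```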

 

 
