NetApp cDOT (ONTAP 9) – RDB and Cluster Replication Ring

In my last entry I wrote about quorum and epsilon, and explained that an ONTAP cluster always has a master node, often called the RDB master. In this entry I would like to explain what the RDB is. This article applies to both NetApp cDOT (ONTAP 8.x) and ONTAP 9.x.

RDB

A Replicated Database (often referred to as the RDB) is the basis of an ONTAP cluster. This database contains cluster configuration information divided into RDB units. The RDB does not contain any user data; instead, it provides a single point of management for all nodes within the cluster.

The RDB is divided into 5 RDB units:

  • mgwd (Management Gateway) – this unit contains the information that provides access to the CLI (command line). It also enables management of the cluster from any node. In the cluster ring show output this unit is listed as mgmt.
  • vifmgr – VIF (Virtual Interface) Manager. This unit stores and monitors LIF configuration, as well as LIF failover policies. It is also responsible for automatic and manual LIF failover.
  • vldb – Volume Location Database. The VLDB tracks where the volumes and aggregates are located across the cluster. It is an index of which aggregate owns a volume and which node owns that aggregate.
  • bcomd – Block Configuration and Operations Management. This unit stores LUN map definitions and igroup (initiator group) configuration.
  • crs – Configuration Replication Service. This unit is related to MetroCluster configurations. It is responsible for replicating configuration and operational data between clusters. This unit is available starting with the ONTAP 8.3 family.

Cluster Replication Ring

The RDB units contain the data necessary to manage the cluster. These databases are replicated to each node, and each copy is kept synchronized across all nodes within the cluster. RDB reads are performed locally, but an RDB write is performed only on the RDB master (master node). Each write is guaranteed to be applied to every replica of the RDB. If, for some reason, the replication of new data is not successful, the write is rolled back on all instances in the ring.

Each RDB unit has its own ring. An RDB ring is the set of all RDB units of one type across all nodes in the cluster. For instance, in a 6-node cluster, the 6 vifmgr units make up the vifmgr ring. Each RDB ring elects its own master, independently of the other rings. When data is written to a unit, it is written to the database on the master and then immediately replicated to the secondary databases on the other nodes. Only when the write has been successfully replicated to all database instances is it officially confirmed.
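To make the write path described above a bit more concrete, here is a minimal Python sketch of a single ring. It is only a toy model and not how ONTAP implements the RDB: the class names Replica and Ring, the reachable flag and the node names are all hypothetical, invented for illustration. It shows local reads, writes that go through the master first, and a rollback on every copy when replication to one node fails.

```python
class Replica:
    """One node's local copy of a single RDB unit (hypothetical model)."""

    def __init__(self, node):
        self.node = node
        self.reachable = True   # assumption: flag used only to simulate a failed replication
        self.data = {}

    def read(self, key):
        # Reads are served from the local copy.
        return self.data.get(key)


class Ring:
    """All replicas of one unit type across the cluster, with one master."""

    def __init__(self, unit, nodes, master):
        self.unit = unit
        self.master = master
        self.replicas = {node: Replica(node) for node in nodes}

    def write(self, key, value):
        # The write is applied on the master first, then on every secondary.
        order = [self.master] + [n for n in self.replicas if n != self.master]
        undo = []
        for node in order:
            replica = self.replicas[node]
            if not replica.reachable:
                # Replication failed: roll the write back on every copy touched so far.
                for done, had_key, old_value in reversed(undo):
                    if had_key:
                        done.data[key] = old_value
                    else:
                        del done.data[key]
                return False
            undo.append((replica, key in replica.data, replica.data.get(key)))
            replica.data[key] = value
        # Confirmed only after every replica has the new value.
        return True


ring = Ring("vldb", ["node1", "node2", "node3"], master="node2")
first_ok = ring.write("vol1", "aggr1")
ring.replicas["node3"].reachable = False
second_ok = ring.write("vol1", "aggr2")   # fails and is rolled back
```

In this sketch the second write is rejected, and a read of vol1 on any node still returns the value from the first write.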

To check the status of the cluster replication rings from the command line:

cluster1::> set advanced

Warning: These advanced commands are potentially dangerous; use them only when directed to do so by NetApp personnel.
Do you want to continue? {y|n}: y

cluster1::*> cluster ring show
Node        UnitName Epoch    DB Epoch DB Trnxs Master      Online
----------- -------- -------- -------- -------- ----------- ---------
cluster1_03 mgmt     4        4        148638   cluster1_04 secondary
cluster1_03 vldb     2        2        226467   cluster1_04 secondary
cluster1_03 vifmgr   4        4        8065     cluster1_04 secondary
cluster1_03 bcomd    2        2        13       cluster1_04 secondary
cluster1_03 crs      2        2        12       cluster1_04 secondary
cluster1_04 mgmt     4        4        148638   cluster1_04 master
cluster1_04 vldb     2        2        226467   cluster1_04 master
cluster1_04 vifmgr   4        4        8065     cluster1_04 master
cluster1_04 bcomd    2        2        13       cluster1_04 master
cluster1_04 crs      2        2        12       cluster1_04 master
cluster1_05 mgmt     4        4        148638   cluster1_04 secondary
cluster1_05 vldb     2        2        226467   cluster1_04 secondary
cluster1_05 vifmgr   4        4        8065     cluster1_04 secondary
cluster1_05 bcomd    2        2        13       cluster1_04 secondary
cluster1_05 crs      2        2        12       cluster1_04 secondary
cluster1_06 mgmt     4        4        148638   cluster1_04 secondary
cluster1_06 vldb     2        2        226467   cluster1_04 secondary
cluster1_06 vifmgr   4        4        8065     cluster1_04 secondary
cluster1_06 bcomd    2        2        13       cluster1_04 secondary
cluster1_06 crs      2        2        12       cluster1_04 secondary
20 entries were displayed.

cluster1::*>

In the example above you can see a 4-node cluster in which each of the 5 RDB units has its own ring, and for all of them the master is node cluster1_04. The mgmt ring, for example, consists of the four rows whose UnitName is mgmt, one per node.

The DB Trnxs column shows the DB transaction number for each application. When there is a write to a specific database (RDB unit), DB Trnxs is updated. Note that for each UnitName, DB Trnxs has exactly the same value on all nodes.
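If you save the cluster ring show output to a text file, the consistency described above can be checked with a few lines of Python. This is only a rough sketch: the function names parse_ring_rows and check_rings are my own, and the parser assumes that node names contain no spaces and that each data row fits on a single line with seven fields. The sample below reuses the four mgmt rows from the output above.

```python
from collections import defaultdict

SAMPLE = """\
Node      UnitName Epoch    DB Epoch DB Trnxs Master    Online
--------- -------- -------- -------- -------- --------- ---------
cluster1_03 mgmt   4        4        148638   cluster1_04 secondary
cluster1_04 mgmt   4        4        148638   cluster1_04 master
cluster1_05 mgmt   4        4        148638   cluster1_04 secondary
cluster1_06 mgmt   4        4        148638   cluster1_04 secondary
"""


def parse_ring_rows(text):
    """Turn data rows into dicts; header, separator and footer lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Assumption: a data row has exactly seven fields and three numeric counters.
        if len(parts) == 7 and all(p.isdigit() for p in parts[2:5]):
            node, unit, epoch, db_epoch, trnxs, master, role = parts
            rows.append({
                "node": node, "unit": unit, "epoch": int(epoch),
                "db_epoch": int(db_epoch), "trnxs": int(trnxs),
                "master": master, "role": role,
            })
    return rows


def check_rings(rows):
    """Return a list of problems found; an empty list means the rings look consistent."""
    by_unit = defaultdict(list)
    for row in rows:
        by_unit[row["unit"]].append(row)

    problems = []
    for unit, members in by_unit.items():
        # Every node should report the same DB Epoch and DB Trnxs for a unit.
        if len({(m["db_epoch"], m["trnxs"]) for m in members}) != 1:
            problems.append(f"{unit}: DB Epoch / DB Trnxs differ between nodes")
        # Every node should agree on who the master is.
        if len({m["master"] for m in members}) != 1:
            problems.append(f"{unit}: nodes disagree on the master")
        # Exactly one node should be online as master.
        if sum(1 for m in members if m["role"] == "master") != 1:
            problems.append(f"{unit}: expected exactly one master row")
    return problems


problems = check_rings(parse_ring_rows(SAMPLE))
print(problems if problems else "all rings consistent")
```

The check works on one ring at a time, so it does not care whether all rings share the same master node.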

If an RDB unit's daemon is restarted, the Epoch number is incremented and the DB Trnxs value is reset to 1. RDB applications can be restarted for many reasons, for example a node reboot.
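The relationship between the two counters can be illustrated with a tiny Python model. Again, this is just a sketch of the behaviour described above, not ONTAP code; the class UnitCounters and its method names are hypothetical. The starting values are the vifmgr numbers from the output above.

```python
class UnitCounters:
    """Tracks Epoch and DB Trnxs for one RDB unit (hypothetical model)."""

    def __init__(self, epoch, trnxs):
        self.epoch = epoch
        self.trnxs = trnxs

    def record_write(self):
        # Every write to the unit's database updates the transaction number.
        self.trnxs += 1

    def restart(self):
        # A daemon restart increments the epoch and resets DB Trnxs to 1.
        self.epoch += 1
        self.trnxs = 1


vifmgr = UnitCounters(epoch=4, trnxs=8065)
vifmgr.record_write()
after_write = (vifmgr.epoch, vifmgr.trnxs)     # (4, 8066)
vifmgr.restart()
after_restart = (vifmgr.epoch, vifmgr.trnxs)   # (5, 1)
```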
