Fabric OS Encryption Administrator's Guide

53-1002159-03
Encryption group merge and split use cases
Adjusting heartbeat signaling values
Encryption group nodes use heartbeat signaling to communicate with one another and with their
associated key vaults. A configurable threshold of heartbeat misses determines how long an
encryption group leader waits before declaring a member node unreachable. By default, the
threshold is three heartbeat misses, each followed by a two-second heartbeat time-out. If three
consecutive heartbeats are missed (by default, six seconds without a heartbeat signal), the
encryption group leader node declares the member node unreachable, which results in an
encryption group split (EG split).
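The arithmetic behind the detection window can be sketched in a few lines of Python. This is only an illustration, not part of Fabric OS or the cryptocfg command: the function name `detection_window` and the constant names are hypothetical, while the value ranges (1-15 misses, 1-30 seconds) and the 30-second ceiling on the product are the limits stated in this section.

```python
# Illustrative helper only; the names here are hypothetical and are not
# part of Fabric OS. The limits are the ones stated in this section.
HB_MISSES_RANGE = range(1, 16)    # valid heartbeat misses: 1-15
HB_TIMEOUT_RANGE = range(1, 31)   # valid heartbeat time-out: 1-30 seconds
MAX_WINDOW_SECONDS = 30           # misses * time-out must not exceed this


def detection_window(hb_misses: int, hb_timeout: int) -> int:
    """Return the seconds without a heartbeat before a node is declared unreachable.

    Raises ValueError if either value is out of range or if the combined
    window exceeds the 30-second limit.
    """
    if hb_misses not in HB_MISSES_RANGE:
        raise ValueError(f"heartbeat misses must be 1-15, got {hb_misses}")
    if hb_timeout not in HB_TIMEOUT_RANGE:
        raise ValueError(f"heartbeat time-out must be 1-30 seconds, got {hb_timeout}")
    window = hb_misses * hb_timeout
    if window > MAX_WINDOW_SECONDS:
        raise ValueError(f"combined window of {window} seconds exceeds 30 seconds")
    return window


print(detection_window(3, 2))  # defaults: prints 6
```

Checking a candidate pair this way before changing the values shows, for example, that 15 misses with a 2-second time-out is the largest window allowed at that time-out, while 15 misses with a 3-second time-out would be rejected.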
If the management network becomes congested or unreliable, resulting in excessive auto-recovery
processing or the need for manual recovery from EG splits, you can set larger heartbeat misses and
heartbeat time-out values to reduce the chance of an EG split while the network issues are being
addressed. Issue the following commands from the encryption group leader node to change the
heartbeat signaling values.
switch:admin-> cryptocfg --set -hbmisses <number>
switch:admin-> cryptocfg --set -hbtimeout <time>
Where:
<number>  Sets the number of heartbeat misses allowed in a node that is part of an
          encryption group before the node is declared unreachable. This value is
          set in conjunction with the time-out value. It must be configured at the
          group leader node and is distributed to all member nodes in the
          encryption group. The default value is 3. Valid values are integers in
          the range 1 through 15.
<time>    Sets the time-out value for the heartbeat, in seconds. This parameter must
          be configured at the group leader node and is distributed to all member
          nodes in the encryption group. The default value is 2 seconds. Valid
          values are integers in the range 1 through 30 seconds.
NOTE
The collective time allowed (the heartbeat time-out value multiplied by the number of heartbeat
misses) cannot exceed 30 seconds. Fabric OS enforces this limit.
EG split possibilities requiring manual recovery
If the encryption group (EG) splits and is unable to auto-recover, manual intervention is required
to reconverge the encryption group. While the encryption group is in a split condition, data
traffic is NOT impacted in any way. However, EG splits do impact the control plane, which means
that you cannot modify the encryption group configuration during an EG split.
When an EG split occurs, communication between one or more members of the encryption group is
lost, and EG islands form. An EG island is a group of EG nodes that can still communicate with
one another. As part of the normal recovery procedure, each EG island selects a group leader
(GL) node.
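One way to picture the EG islands described in this section is as the connected groups that remain once the failed management links are removed. The Python sketch below is illustrative only: the function name `eg_islands`, the node names, and the link list are hypothetical. It assumes that two nodes belong to the same island whenever a chain of working links joins them. It does not model how each island selects its group leader, because this section does not describe that rule.

```python
from collections import deque


def eg_islands(nodes, links):
    """Group nodes into islands: sets of nodes joined by working links.

    nodes: iterable of node names (hypothetical identifiers).
    links: iterable of (a, b) pairs that can still communicate.
    Returns a list of islands, each a sorted list of node names.
    """
    nodes = list(nodes)
    neighbors = {node: set() for node in nodes}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    islands = []
    seen = set()
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from start.
        island = []
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            island.append(node)
            for peer in neighbors[node] - seen:
                seen.add(peer)
                queue.append(peer)
        islands.append(sorted(island))
    return islands


# Four-node group in which only node1-node2 and node3-node4 can still talk.
nodes = ["node1", "node2", "node3", "node4"]
links = [("node1", "node2"), ("node3", "node4")]
print(eg_islands(nodes, links))  # [['node1', 'node2'], ['node3', 'node4']]
```

In this example the four-node group has split into two islands of two nodes each, and each island would select its own group leader as part of the normal recovery procedure.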