Service Manual

RDMA Over Converged Ethernet (RoCE) Overview
This functionality is supported on the platform.
RDMA is a technology that a virtual machine (VM) uses to transfer information directly to the memory of another VM, which enables
VMs to be connected to storage networks. With RoCE, RDMA forwards data without passing it through the CPU and the main-memory
path of the TCP/IP stack. In a deployment where the RoCE network and the normal IP network are two separate networks, Routable
RoCE (RRoCE) combines the two and sends RoCE frames over the IP network by encapsulating RoCE packets in IP packets. In
other words, RRoCE carries InfiniBand (IB) packets over IP. IB provides input and output connectivity for the Internet infrastructure.
InfiniBand enables network topologies to expand across large geographical boundaries and supports next-generation I/O
interconnect standards in servers.
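The encapsulation described above can be sketched in Python. This is a minimal illustration, assuming that RRoCE corresponds to RoCE v2, which carries the InfiniBand transport headers (beginning with the 12-byte Base Transport Header) inside UDP datagrams sent to UDP port 4791. The function names are hypothetical, and only the UDP layer is built; the IP and Ethernet headers are omitted.

```python
import struct

# Assumption: RRoCE corresponds to RoCE v2, which places the InfiniBand
# transport headers and payload inside a UDP datagram addressed to the
# IANA-assigned port 4791.
ROCEV2_UDP_PORT = 4791
UDP_HEADER_LEN = 8


def encapsulate_ib_packet(ib_packet: bytes, src_port: int) -> bytes:
    """Prefix an InfiniBand transport packet with a UDP header.

    The IP and Ethernet/802.1Q headers that would precede the UDP header
    on the wire are omitted to keep the sketch short. The UDP checksum is
    left at 0, which IPv4 permits.
    """
    length = UDP_HEADER_LEN + len(ib_packet)
    header = struct.pack("!HHHH", src_port, ROCEV2_UDP_PORT, length, 0)
    return header + ib_packet


def is_rocev2(udp_datagram: bytes) -> bool:
    """Return True if a UDP datagram is addressed to the RoCE v2 port."""
    if len(udp_datagram) < UDP_HEADER_LEN:
        return False
    _src, dst, _length, _checksum = struct.unpack("!HHHH", udp_datagram[:UDP_HEADER_LEN])
    return dst == ROCEV2_UDP_PORT


# Example: 12 zero bytes stand in for the IB Base Transport Header.
datagram = encapsulate_ib_packet(bytes(12), src_port=49152)
print(len(datagram), is_rocev2(datagram))  # prints: 20 True
```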
When a storage area network (SAN) is connected over an IP network, the following conditions must be satisfied:
Faster connectivity: QoS for RRoCE enables fast, lossless disk input and output services.
Lossless connectivity: VMs always require lossless connectivity to the storage network. During a planned upgrade of the
network nodes, especially top-of-rack (ToR) nodes that are a single point of failure for the VMs, disk I/O operations are
expected to succeed within 20 seconds. If the disk is not accessible within 20 seconds, the VMs exhibit unexpected and undefined
behavior. You can optimize the boot time of ToR nodes that are a single point of failure to reduce the outage in
traffic-handling operations.
RRoCE traffic is bursty and can use the full bandwidth of a 10-Gigabit Ethernet interface. Although RRoCE and normal data traffic
are usually carried in separate portions of the network, certain topologies may require both RRoCE and data traffic to share a
single network. RRoCE traffic is marked with dot1p priorities 3 and 4 (code points 011 and 100, respectively), and these queues are
strict and lossless. RRoCE traffic is not marked with DSCP code points. Both ECN and PFC are enabled for RRoCE traffic. Normal IP
(data) traffic that is not RRoCE consists of TCP and UDP packets, which can be marked with DSCP code points. Multicast is not
supported in this network.
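The marking scheme can be summarized as a classification on the 3-bit 802.1p priority. This is a sketch, assuming the switch classifies RRoCE on the dot1p value alone (RRoCE frames carry no DSCP marking); the traffic-class names are hypothetical labels.

```python
# dot1p priorities 3 and 4 (binary code points 011 and 100) map to the
# strict, lossless queues on which PFC and ECN are enabled.
LOSSLESS_RROCE_PRIORITIES = frozenset({0b011, 0b100})


def classify_by_dot1p(pcp: int) -> str:
    """Map a 3-bit 802.1p priority to a traffic class (names are hypothetical)."""
    if not 0 <= pcp <= 7:
        raise ValueError("802.1p priority must be in the range 0-7")
    if pcp in LOSSLESS_RROCE_PRIORITIES:
        return "rroce-lossless"
    # Other traffic is normal TCP/UDP data, which may carry DSCP markings.
    return "ip-data"


for pcp in range(8):
    print(pcp, format(pcp, "03b"), classify_by_dot1p(pcp))
```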
RRoCE packets are received and transmitted on specific interfaces called lite subinterfaces. These interfaces are similar to
normal Layer 3 physical interfaces, except for the extra provisioning that they offer to enable the VLAN ID for encapsulation.
You can configure a physical interface or a Layer 3 port-channel interface as a lite subinterface. On a lite subinterface, only
tagged IP packets with VLAN encapsulation are processed and routed; all other data packets are discarded.
A normal Layer 3 physical interface processes only untagged packets and makes routing decisions based on the default Layer 3
VLAN ID (4095).
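The ingress rules for the two interface types can be modeled as a small filter. This sketch is illustrative only: the function name is hypothetical, and it assumes that a lite subinterface accepts a tagged frame only when the tag matches the VLAN ID provisioned on that subinterface.

```python
from typing import Optional

DEFAULT_L3_VLAN = 4095


def ingress_vlan(frame_vlan: Optional[int], encap_vlan: Optional[int]) -> Optional[int]:
    """Return the VLAN an ingress frame is processed in, or None if it is discarded.

    frame_vlan: 802.1Q VLAN ID on the received frame, or None if untagged.
    encap_vlan: VLAN ID provisioned on a lite subinterface, or None for a
                normal Layer 3 physical interface.
    """
    if encap_vlan is None:
        # Normal Layer 3 interface: only untagged packets, routed in VLAN 4095.
        return DEFAULT_L3_VLAN if frame_vlan is None else None
    # Lite subinterface: only tagged packets are processed; untagged ones are
    # discarded. Assumption: the tag must match the provisioned VLAN ID.
    return encap_vlan if frame_vlan == encap_vlan else None


print(ingress_vlan(None, None))  # untagged frame on a normal Layer 3 interface
print(ingress_vlan(100, 100))    # tagged frame on a lite subinterface
print(ingress_vlan(None, 100))   # untagged frame on a lite subinterface
```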
To enable routing of RRoCE packets, VLAN translation maps the packet's VLAN ID to the default VLAN ID 4095. After VLAN
translation, RRoCE packets are processed in the same way as the normal IP packets that a Layer 3 interface receives and routes
in the egress direction. At the egress interface, the VLAN ID is appended to the packet, which is transmitted out of the interface
as a tagged packet with its dot1Q value preserved.
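The translate, route, and re-tag sequence can be sketched as follows. This is a simplified model, assuming that the VLAN ID re-appended at egress is the one received at ingress (the text states that the dot1Q value is preserved). The interface name, addresses, and exact-match route lookup are hypothetical simplifications, not the switch's actual forwarding logic.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple

DEFAULT_L3_VLAN = 4095


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    wire_vlan: Optional[int]          # 802.1Q VLAN ID on the wire; None if untagged
    internal_vlan: Optional[int] = None
    saved_vlan: Optional[int] = None  # VLAN ID kept for re-tagging at egress


def ingress_translate(pkt: Packet) -> Packet:
    """Map the tagged VLAN to default VLAN 4095 so the packet routes like untagged IP."""
    return replace(pkt, wire_vlan=None, internal_vlan=DEFAULT_L3_VLAN, saved_vlan=pkt.wire_vlan)


def egress_retag(pkt: Packet) -> Packet:
    """Append the saved VLAN ID so the packet leaves as a tagged frame."""
    return replace(pkt, wire_vlan=pkt.saved_vlan, internal_vlan=None, saved_vlan=None)


def forward(pkt: Packet, routes: Dict[str, str]) -> Tuple[str, Packet]:
    """Translate, look up the egress interface, then re-tag the packet."""
    routed = ingress_translate(pkt)
    egress_if = routes[routed.dst_ip]  # simplification: exact match, not longest-prefix
    return egress_if, egress_retag(routed)


out_if, out_pkt = forward(Packet("10.1.1.2", wire_vlan=100), {"10.1.1.2": "port-2"})
print(out_if, out_pkt.wire_vlan)  # prints: port-2 100
```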
To provide lossless service for RRoCE, you must configure the QoS service policy in both the ingress and egress directions on
lite subinterfaces.
Preserving 802.1Q VLAN Tag Value for Lite Subinterfaces
This functionality is supported on the platform.
All frames in a Layer 2 VLAN are identified by a tag defined in the IEEE 802.1Q standard, which determines the VLAN to which the
frames or traffic belong. Such frames are encapsulated with 802.1Q tags. If a single VLAN is configured in a network topology,
all traffic packets carry the same dot1q tag, which is the tag value in the 802.1Q header. If a VLAN is split into multiple
sub-VLANs, each VLAN is denoted by a unique 802.1Q tag so that the nodes that receive the traffic frames can determine the
VLAN for which the frames are destined.
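The 4-byte 802.1Q tag consists of the TPID 0x8100 followed by the Tag Control Information (TCI): a 3-bit priority (PCP, the dot1p value), a 1-bit Drop Eligible Indicator (DEI), and a 12-bit VLAN ID (VID). The VID is the field that lets a receiving node tell sub-VLANs apart. The sketch below builds and parses a tag; the helper names are hypothetical.

```python
import struct

TPID_8021Q = 0x8100


def build_dot1q(pcp: int, dei: int, vid: int) -> bytes:
    """Build a 4-byte 802.1Q tag from its PCP, DEI, and VID fields."""
    if not (0 <= pcp <= 7 and dei in (0, 1) and 0 <= vid <= 0xFFF):
        raise ValueError("802.1Q field out of range")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_dot1q(tag: bytes) -> tuple[int, int, int]:
    """Parse a 4-byte 802.1Q tag into (PCP, DEI, VID)."""
    if len(tag) != 4:
        raise ValueError("an 802.1Q tag is exactly 4 bytes")
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError(f"unexpected TPID 0x{tpid:04x}")
    pcp = tci >> 13          # top 3 bits: dot1p priority
    dei = (tci >> 12) & 0x1  # next bit: drop eligible indicator
    vid = tci & 0x0FFF       # low 12 bits: VLAN ID
    return pcp, dei, vid


tag = build_dot1q(pcp=3, dei=0, vid=100)
print(tag.hex(), parse_dot1q(tag))  # prints: 81006064 (3, 0, 100)
```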
Flex Hash and Optimized Boot-Up