Reference Guide

7 Dell EMC Ready Solutions for AI Deep Learning with NVIDIA | v1.0
2 Solution Architecture
The hardware comprises a cluster with a master node, compute nodes, shared storage and networks. The
master node, or head node, can take on several roles: deploying the cluster of compute nodes, managing the
compute nodes, handling user logins and access, providing a compilation environment, and submitting jobs to
the compute nodes. The compute nodes are the workhorses and execute the submitted jobs. Bright Cluster
Manager, software from Bright Computing, is used to deploy and manage the whole cluster.
Figure 2 shows a high-level overview of the cluster, which includes one head node, n compute nodes, the
local disks on the cluster head node exported over NFS, Isilon storage, and two networks. All compute nodes
are interconnected through an InfiniBand switch. The head node is also connected to the InfiniBand switch
because it uses IPoIB to export the NFS share to the compute nodes. All compute nodes and the head node are
also connected to a 1 Gigabit Ethernet management switch, which Bright Cluster Manager uses to administer
the cluster. An Isilon storage solution is connected to the FDR-40GigE Gateway switch so that it can be
accessed by the head node and all compute nodes.
Figure 2: The overview of the cluster
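The connectivity described above can be written down as a small data structure, which is sometimes handy for checking a cabling plan. The following is a minimal sketch only: the node names, the switch labels and the `cluster_links` helper are hypothetical, and how the gateway switch attaches to the InfiniBand fabric is not modeled.

```python
def cluster_links(n_compute):
    """Map each device name to the set of switches it is cabled to.

    Names and switch labels are illustrative, not taken from the guide.
    """
    nodes = ["head"] + [f"node{i:03d}" for i in range(1, n_compute + 1)]
    # The head node and every compute node sit on both networks:
    # InfiniBand for data (including NFS over IPoIB) and 1 GbE for management.
    links = {name: {"infiniband", "ethernet-1g"} for name in nodes}
    # The Isilon storage attaches to the FDR-40GigE gateway switch.
    links["isilon"] = {"fdr-40gige-gateway"}
    return links


links = cluster_links(4)
on_infiniband = sorted(n for n, sw in links.items() if "infiniband" in sw)
on_management = sorted(n for n, sw in links.items() if "ethernet-1g" in sw)
print(on_infiniband)
print(on_management)
```

With four compute nodes, both lists contain the head node and the four compute nodes, while the Isilon entry appears only on the gateway switch.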
2.1 Head Node Configuration
The Dell EMC PowerEdge R740xd is recommended for the role of the head node. This is a two-socket,
2U rack server that can support the memory capacities, I/O needs and network options required of the
head node. The head node serves as the cluster administration and management node, NFS server, user
login node and compilation node.
The suggested configuration of the PowerEdge R740xd is listed in Table 1. It includes 12 x 12TB NL SAS local
disks that are formatted as an XFS file system and exported via NFS to the compute nodes over IPoIB. RAID 50
is used instead of RAID 6 or RAID 60 because of its faster rebuild time and capacity advantages. Details of
each configuration choice are described in the following sections. For more information on this server model,
refer to the PowerEdge R740/740xd Technical Guide.
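To make the capacity comparison concrete, the sketch below works out the raw usable capacity of the 12 x 12TB disks under each RAID level. It is an illustration under stated assumptions: the guide does not say how many spans are used, so two spans are assumed for RAID 50 and RAID 60, and file system overhead and hot spares are ignored. The `usable_tb` helper is hypothetical.

```python
def usable_tb(disks, disk_tb, parity_per_span, spans=1):
    """Usable capacity in TB for `spans` equal parity groups striped together."""
    if disks % spans != 0:
        raise ValueError("disks must divide evenly across spans")
    per_span = disks // spans
    if per_span <= parity_per_span:
        raise ValueError("each span needs more disks than parity disks")
    # Each span gives up `parity_per_span` disks' worth of space to parity.
    return (per_span - parity_per_span) * spans * disk_tb


DISKS, DISK_TB = 12, 12  # 12 x 12TB NL SAS, as in the suggested configuration

# Span counts below are assumptions, not values from the guide.
raid50 = usable_tb(DISKS, DISK_TB, parity_per_span=1, spans=2)  # 2 x RAID 5
raid60 = usable_tb(DISKS, DISK_TB, parity_per_span=2, spans=2)  # 2 x RAID 6
raid6 = usable_tb(DISKS, DISK_TB, parity_per_span=2, spans=1)   # single RAID 6

print(f"RAID 50: {raid50} TB, RAID 60: {raid60} TB, RAID 6: {raid6} TB")
```

Under these assumptions RAID 50 yields 120 TB against 96 TB for RAID 60. A single 12-disk RAID 6 group also yields 120 TB, so against RAID 6 the argument for RAID 50 would rest on rebuild time rather than capacity.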