Administrator Guide

RAPIDS Scaling on Dell EMC PowerEdge Servers
4 Results on Multi-Node on C4140-M
To run RAPIDS in multi-node mode we used Dask-CUDA to extend Dask distributed with GPU support.
There are different ways to set up multi-node mode depending on the target cluster; for more
options, see the Dask documentation as reference [5-8]. In this case we set up the cluster with 2 nodes:
- The primary compute node, a C4140-M server, hosting the dask-scheduler
- Number of GPUs (workers) on the primary compute node: 4
- Jupyter notebook on the primary node
- A secondary compute node, a C4140-M server, with 4 additional GPUs (workers)
- Total GPUs in the cluster: 8
- An R740xd server hosting the NFS share with the dataset
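The setup above can be sketched with the standard Dask and Dask-CUDA command-line tools. This is a minimal sketch: the host names, export path, and mount point are illustrative assumptions, not values taken from this guide.

```shell
# On the primary C4140-M node: start the scheduler, then the CUDA workers
# (dask-cuda-worker spawns one worker per visible GPU, i.e. 4 here).
dask-scheduler --port 8786 &
dask-cuda-worker tcp://primary-node:8786 &

# On the secondary C4140-M node: point its 4 GPU workers at the same scheduler.
dask-cuda-worker tcp://primary-node:8786 &

# On the primary node: serve the notebook used to drive the workflow.
jupyter notebook --no-browser --ip=0.0.0.0 &

# On both compute nodes: mount the dataset exported over NFS by the R740xd
# (hypothetical export path and mount point).
mount -t nfs r740xd-nfs:/export/dataset /mnt/dataset
```

With all workers registered, the notebook connects a `dask.distributed` Client to `tcp://primary-node:8786` and sees all 8 GPUs as workers.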
Scale Out RAPIDS on C4140-M versus C4140-M Single Node:
The C4140-M 4x V100-SXM2-16GB server in multi-node mode was tested first with the 2014 dataset
only, and the results were compared with its single-node performance to determine the workflow
acceleration (with RMM disabled in both cases); in multi-node mode the system ran around 55%
faster than in single node. The main acceleration was in the ETL phase: 99.5 seconds versus
50.4 seconds. See Figure 11 below.
Figure 11. Performance on Server C4140-M in Single Node vs Multi Node
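The ETL timings reported above can be turned into speed-up figures with a quick calculation (a sketch of the arithmetic only; the ~55% figure in the text refers to the full workflow, not the ETL phase alone):

```shell
# ETL phase times from the text: 99.5 s single node, 50.4 s multi node.
single_etl=99.5
multi_etl=50.4

# Speed-up factor and percentage reduction in ETL time.
speedup=$(awk -v s="$single_etl" -v m="$multi_etl" 'BEGIN { printf "%.2f", s / m }')
reduction=$(awk -v s="$single_etl" -v m="$multi_etl" 'BEGIN { printf "%.0f", (1 - m / s) * 100 }')
echo "ETL speedup: ${speedup}x (${reduction}% less time)"
```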
Scale Out RAPIDS on C4140-M versus R940xa Single Node:
Since the C4140-M server in multi-node mode has a total device capacity of 128 GB, we compared it
against the R940xa server in single node, which has the same total device capacity of 128 GB; in
this case the servers were configured with RMM enabled. We found that, although both systems have
the same total device capacity, the C4140-M in multi node performed 58% faster than the R940xa in
single node. The larger speed-up stems from the number of GPUs allocated, i.e. the C4140-M with
8x GPUs in multi-node versus the R940xa with 4x GPUs. See Figure 12.