Administrator Guide

RAPIDS Scaling on Dell EMC PowerEdge Servers
4 Results on Multi-Node on C4140-M
To run RAPIDS in multi-node mode we used Dask-CUDA to extend Dask distributed with GPU support.
There are different ways to set up multi-node mode depending on the target cluster; for more
options, see the Dask documentation as reference [5-8]. In this case we set up the cluster with 2 nodes:
- The primary compute node, a C4140-M server, hosting the dask-scheduler
- Number of GPUs (workers) on the primary compute node: 4
- Jupyter notebook on the primary node
- A secondary compute node, a C4140-M server, with 4 additional GPUs (workers)
- Total GPUs in the cluster: 8
- An R740xd server hosting the NFS share with the dataset
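The setup above can be sketched with the standard Dask and Dask-CUDA command-line tools. This is a minimal sketch: the host names, export path, and mount point are illustrative assumptions, not values taken from this guide.

```shell
# On the primary C4140-M node: start the scheduler, then the CUDA workers
# (dask-cuda-worker spawns one worker per visible GPU, i.e. 4 here).
dask-scheduler --port 8786 &
dask-cuda-worker tcp://primary-node:8786 &

# On the secondary C4140-M node: point its 4 GPU workers at the same scheduler.
dask-cuda-worker tcp://primary-node:8786 &

# On the primary node: serve the notebook used to drive the workflow.
jupyter notebook --no-browser --ip=0.0.0.0 &

# On both compute nodes: mount the dataset exported over NFS by the R740xd
# (hypothetical export path and mount point).
mount -t nfs r740xd-nfs:/export/dataset /mnt/dataset
```

With all workers registered, the notebook connects a `dask.distributed` Client to `tcp://primary-node:8786` and sees all 8 GPUs as workers.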
Scale Out RAPIDS on C4140-M versus C4140-M Single Node:
The C4140-M 4x V100-SXM2-16GB server in multi-node mode was tested first with the 2014 dataset
only, and the results were compared with its single-node performance to determine the workflow
acceleration (with RMM disabled in both cases); in multi-node mode the system ran around 55%
faster than in single node. The main acceleration was in the ETL phase: 99.5 seconds versus
50.4 seconds. See Figure 11 below.
Figure 11. Performance on Server C4140-M in Single Node vs Multi Node
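The ETL timings reported above can be turned into speed-up figures with a quick calculation (a sketch of the arithmetic only; the ~55% figure in the text refers to the full workflow, not the ETL phase alone):

```shell
# ETL phase times from the text: 99.5 s single node, 50.4 s multi node.
single_etl=99.5
multi_etl=50.4

# Speed-up factor and percentage reduction in ETL time.
speedup=$(awk -v s="$single_etl" -v m="$multi_etl" 'BEGIN { printf "%.2f", s / m }')
reduction=$(awk -v s="$single_etl" -v m="$multi_etl" 'BEGIN { printf "%.0f", (1 - m / s) * 100 }')
echo "ETL speedup: ${speedup}x (${reduction}% less time)"
```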
Scale Out RAPIDS on C4140-M versus R940xa Single Node:
Since the C4140-M server in multi-node mode has a total device capacity of 128 GB, we compared it
against the R940xa server in single node, which has the same total device capacity of 128 GB; in
this case the servers were configured with RMM enabled. We found that, although both systems have
the same total device capacity, the C4140-M in multi node performed 58% faster than the R940xa in
single node. The larger speed-up stems from the number of GPUs allocated, i.e. the C4140-M with
8x GPUs in multi-node versus the R940xa with 4x GPUs. See Figure 12.