Administrator Guide

RAPIDS Scaling on Dell EMC PowerEdge Servers

B Terminology

RAPIDS: Suite of software libraries, built on CUDA-X AI, that gives the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs

End-to-end workflow: Data science pipeline that includes three phases: ETL (Extract, Transform, Load), data conversion, and training

Dask: Freely available open-source library that provides advanced parallelism for analytics. It is developed in coordination with other community projects such as NumPy, pandas, and scikit-learn

XGBoost: Open-source software library that provides a gradient boosting framework for C++, Java, Python, R, and Julia. It works on Linux, Windows, and macOS

Docker mounted volume: An existing directory on the host that is “mounted” to be available inside the container, useful for sharing files between the host and the container

Cluster: Group of computers communicating through fast interconnection

Node: Group of processors communicating through shared memory

Socket: Group of cores communicating through shared cache

Core: Group of functional units communicating through registers

Pipeline: Sequence of instructions sharing functional units

Threads: The smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system

Dask-scheduler: Coordinates and executes task graphs on parallel hardware

Dask-worker: Computes tasks as directed by the scheduler, and stores and serves computed results to other workers or clients

Dask-cuda: Allows deployment and management of Dask workers on CUDA-enabled systems

Diagnostic dashboard: Interactive dashboard containing several plots and tables with live information about task runtimes, communication, statistical profiling, load balancing, memory use, and so on

Bokeh: Interactive visualization library that targets modern web browsers for presentation
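
The scheduler, worker, and task graph terms above can be illustrated with a small sketch. This is not Dask code and does not use the Dask API: it is a toy written with the Python standard library only, assuming a task graph stored as a dictionary in which each entry names a function and the other tasks it depends on. The task names (raw, doubled, sum, count) and the run_graph helper are hypothetical and exist only for this illustration. A thread pool stands in for the workers.

```python
from concurrent.futures import ThreadPoolExecutor


def load():
    # Stand-in for an "extract" step; the data is made up for the example.
    return [1, 2, 3, 4]


def double(values):
    return [2 * v for v in values]


def total(values):
    return sum(values)


# Hypothetical task graph: task name -> (function, names of input tasks...)
graph = {
    "raw": (load,),
    "doubled": (double, "raw"),
    "sum": (total, "doubled"),
    "count": (len, "raw"),
}


def run_graph(graph, max_workers=2):
    """Toy scheduler: hand a task to a worker once all its inputs exist."""
    results = {}
    pending = dict(graph)
    with ThreadPoolExecutor(max_workers=max_workers) as workers:
        while pending:
            # Select the tasks whose dependencies have all been computed.
            ready = [
                name
                for name, (func, *deps) in pending.items()
                if all(dep in results for dep in deps)
            ]
            if not ready:
                raise ValueError("cycle or missing dependency in task graph")
            # Submit every ready task; independent tasks may run in parallel.
            futures = {}
            for name in ready:
                func, *deps = pending[name]
                futures[name] = workers.submit(func, *(results[d] for d in deps))
            # Collect the results so that dependent tasks can be scheduled.
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
    return results


results = run_graph(graph)
print(results["sum"], results["count"])
```

In this sketch the loop in run_graph plays the role of the scheduler and the pool threads play the role of workers. The doubled and count tasks depend only on raw, so they are submitted together and can run at the same time. A real Dask deployment distributes the same idea across processes and nodes.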