Users Guide

Table Of Contents
The GPU must be in the ready state before the command fetches the data. The GPUStatus field in Inventory shows whether the
GPU is available and responding. If the GPU is ready, GPUStatus shows OK; otherwise, it shows Unavailable.
The GPU offers multiple health parameters, which can be retrieved through the SMBPB interface of the NVIDIA controllers. This
feature is limited to NVIDIA cards. The following health parameters are retrieved from the GPU device:
Power
Temperature
Thermal
NOTE: This feature is limited to NVIDIA cards. This information is not available for any other GPU that the server may
support. The interval for polling the GPU cards over the PBI is 5 seconds.
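To illustrate the 5-second cadence mentioned in the note, the following is a minimal Python sketch of a client-side sampling loop. iDRAC performs its own polling internally; this sketch does not reproduce that mechanism. The function name poll_gpu_health and the read_sample callback are hypothetical and stand in for whatever method you use to read the values.

```python
import time
from typing import Callable, Dict, List

POLL_INTERVAL_SECONDS = 5  # interval stated in the note above


def poll_gpu_health(
    read_sample: Callable[[], Dict[str, float]],  # hypothetical reader callback
    cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, float]]:
    """Collect `cycles` samples, pausing POLL_INTERVAL_SECONDS between reads."""
    samples = []
    for i in range(cycles):
        samples.append(read_sample())
        if i < cycles - 1:  # no need to wait after the final read
            sleep(POLL_INTERVAL_SECONDS)
    return samples
```

Sampling more often than the polling interval is unlikely to yield newer values, which is why the sketch waits the full interval between reads.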
The host system must have the NVIDIA driver installed and running for the Power consumption, GPU target temperature,
Min GPU slowdown temperature, GPU shutdown temperature, Max memory operating temperature, and Max GPU operating
temperature features to be available. These values are shown as N/A if the GPU driver is not installed.
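The N/A rule above can be sketched in Python as follows. This is only an illustration of the described behavior, not iDRAC code: the function display_values, the driver_running flag, and the example field "Board temperature" are assumptions introduced here. Treating a missing (None) value as N/A is also an assumption.

```python
from typing import Dict, Optional

# Fields that the text above says depend on the NVIDIA host driver.
DRIVER_DEPENDENT_FIELDS = (
    "Power consumption",
    "GPU target temperature",
    "Min GPU slowdown temperature",
    "GPU shutdown temperature",
    "Max memory operating temperature",
    "Max GPU operating temperature",
)


def display_values(raw: Dict[str, Optional[float]], driver_running: bool) -> Dict[str, str]:
    """Return display strings, showing N/A for driver-dependent fields when needed."""
    shown = {}
    for name, value in raw.items():
        # Assumption: a missing reading is also displayed as N/A.
        if value is None or (name in DRIVER_DEPENDENT_FIELDS and not driver_running):
            shown[name] = "N/A"
        else:
            shown[name] = str(value)
    return shown
```

For example, display_values({"Power consumption": 250.0, "Board temperature": 40.0}, driver_running=False) masks only the driver-dependent field and leaves the other value unchanged.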
In Linux, when the card is unused, the driver down-trains the card and unloads to save power. In such cases, the
Power consumption, GPU target temperature, Min GPU slowdown temperature, GPU shutdown temperature, Max memory
operating temperature, and Max GPU operating temperature features are not available.
Enable persistence mode for the device to prevent the driver from unloading. You can enable it with the nvidia-smi tool by
running the command nvidia-smi -pm 1.
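If you prefer to script this step on the host, the following Python sketch wraps the same nvidia-smi command. The helper names are hypothetical. The optional -i argument, which restricts the change to one GPU, is a standard nvidia-smi option but is not mentioned in this guide. Running the command requires root privileges and an installed NVIDIA driver.

```python
import subprocess
from typing import List, Optional


def persistence_mode_command(enable: bool = True, gpu_index: Optional[int] = None) -> List[str]:
    """Build the nvidia-smi argument list for setting persistence mode."""
    cmd = ["nvidia-smi"]
    if gpu_index is not None:
        cmd += ["-i", str(gpu_index)]  # limit the change to one GPU
    cmd += ["-pm", "1" if enable else "0"]
    return cmd


def set_persistence_mode(enable: bool = True, gpu_index: Optional[int] = None) -> None:
    """Run the command; raises CalledProcessError if nvidia-smi reports failure."""
    subprocess.run(persistence_mode_command(enable, gpu_index), check=True)
```

Called with no arguments, persistence_mode_command() produces the same command as shown in the text, nvidia-smi -pm 1.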
You can generate GPU reports using Telemetry. For more information about the Telemetry feature, see Telemetry Streaming on page 206.
NOTE: In RACADM, you may see dummy GPU entries with empty values. This can happen if the device is not ready to respond
when iDRAC queries the GPU device for the information. Perform an iDRAC racreset operation to resolve this issue.
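When post-processing inventory output in a script, dummy entries of this kind can be detected before you decide whether a reset is needed. The following Python sketch assumes the output has already been parsed into one dictionary per GPU. The parsing step, the helper names, and the "Model" key are hypothetical; the actual RACADM output format is not reproduced here.

```python
from typing import Dict, List


def is_dummy_entry(entry: Dict[str, str]) -> bool:
    """True when every value in the entry is empty or whitespace."""
    return all(not value.strip() for value in entry.values())


def drop_dummy_entries(entries: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only entries that carry at least one non-empty value."""
    return [e for e in entries if not is_dummy_entry(e)]
```

Filtering only hides the empty entries in your own report. It does not replace the reset described in the note.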
FPGA Monitoring
Field-programmable Gate Array (FPGA) devices need real-time temperature sensor monitoring because they generate significant heat
when in use. Perform the following steps to get FPGA inventory information:
1. Power off the server.
2. Install the FPGA device on the riser card.
3. Power on the server.
4. Wait until POST is complete.
5. Log in to the iDRAC GUI.
6. Navigate to System > Overview > Accelerators. You can see both GPU and FPGA sections.
7. Expand the specific FPGA component to see the following sensor information:
Power consumption
Temperature details
NOTE: You must have iDRAC Login privilege to access FPGA information.
NOTE: Power consumption sensors are available only for supported FPGA cards and only with the Datacenter
license.
Checking the system for Fresh Air compliance
Fresh Air cooling uses outside air directly to cool systems in the data center. Fresh Air compliant systems can operate above their
normal ambient operating range, at temperatures up to 113 °F (45 °C).
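The two temperature figures quoted above are the same limit in different units. The following Python sketch shows the conversion and a simple check of an ambient reading against that limit. The names are hypothetical, and the check covers only the numeric limit: whether a given server is Fresh Air compliant depends on its configuration and is reported by iDRAC.

```python
FRESH_AIR_MAX_C = 45.0  # upper limit quoted above


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def within_fresh_air_limit(ambient_c: float) -> bool:
    """True if the ambient temperature does not exceed the quoted 45 °C limit."""
    return ambient_c <= FRESH_AIR_MAX_C
```

Here celsius_to_fahrenheit(45.0) evaluates to 113.0, matching the value in the text.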
NOTE: Some servers or certain configurations of a server may not be Fresh Air compliant. See the specific server manual
for details related to Fresh Air compliance, or contact Dell for more details.
To check the system for Fresh Air compliance:
1. In the iDRAC Web interface, go to System > Overview > Cooling > Temperature overview.
The Temperature overview page is displayed.
2. See the Fresh Air section, which indicates whether the server is Fresh Air compliant.
Viewing iDRAC and managed system information