White Papers

26 CheXNet Inference with Nvidia T4 on Dell EMC PowerEdge R7425 | Document ID
the best possible performance; and if not, it is recommended to convert it to CHW. Overall,
CHW generally performs better on GPUs, while HWC generally performs better on CPUs. [6]
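To illustrate the layout conversion described above, the following is a minimal sketch (the function name `hwcToChw` and the use of `std::vector<float>` buffers are illustrative assumptions, not part of the TensorRT API) of converting an interleaved HWC buffer, such as a decoded image, into the planar CHW layout:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert an interleaved HWC buffer (pixel-major: all channels of one pixel
// together) into planar CHW (channel-major: one full plane per channel).
std::vector<float> hwcToChw(const std::vector<float>& hwc,
                            std::size_t h, std::size_t w, std::size_t c)
{
    std::vector<float> chw(h * w * c);
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            for (std::size_t k = 0; k < c; ++k)
                // Source index walks pixels; destination index walks planes.
                chw[k * h * w + y * w + x] = hwc[(y * w + x) * c + k];
    return chw;
}
```

For a 1x2 image with 3 channels, the input `{1,2,3, 4,5,6}` (two pixels, channels interleaved) becomes `{1,4, 2,5, 3,6}` (three planes of two pixels each).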
Build the optimized runtime engine in FP16 or INT8 mode (calibration optional for
INT8 inference) [15]:
//Configure the builder
builder->setMaxBatchSize(gParams.batchSize);
builder->setMaxWorkspaceSize(gParams.workspaceSize << 20);
//To run in fp16 mode
if (gParams.fp16)
{
builder->setFp16Mode(gParams.fp16);
}
//To run in INT8 mode (calibration optional for INT8 inference)
if (gParams.int8)
{
builder->setInt8Mode(true);
builder->setInt8Calibrator(&calibrator);
}
//Build the engine
ICudaEngine* engine = builder->buildCudaEngine(*network);
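The `calibrator` object passed to `setInt8Calibrator` above must implement one of TensorRT's calibrator interfaces. As a non-authoritative sketch, a minimal calibrator derived from `nvinfer1::IInt8EntropyCalibrator2` might look as follows (the class name `SimpleCalibrator` and its members are illustrative assumptions; a real implementation would copy calibration batches to device memory in `getBatch`):

```cpp
#include "NvInfer.h"
#include <cstddef>

//Minimal INT8 calibrator skeleton for TensorRT
class SimpleCalibrator : public nvinfer1::IInt8EntropyCalibrator2
{
public:
    int getBatchSize() const override { return mBatchSize; }

    //Fill bindings with device pointers to the next calibration batch;
    //return false when no more batches remain.
    bool getBatch(void* bindings[], const char* names[], int nbBindings) override
    {
        return false; //no calibration data in this sketch
    }

    //Return a previously written calibration cache, or nullptr to recalibrate.
    const void* readCalibrationCache(std::size_t& length) override
    {
        length = 0;
        return nullptr;
    }

    //Persist the calibration table so later builds can skip calibration.
    void writeCalibrationCache(const void* cache, std::size_t length) override {}

private:
    int mBatchSize{1};
};
```

Caching the calibration table via `readCalibrationCache`/`writeCalibrationCache` lets subsequent engine builds skip the calibration pass entirely.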
Highlights:
After the network has been built, it runs in FP32 precision mode by default; for
example, inputs and outputs remain in 32-bit floating point.
Setting the builder’s FP16 mode flag enables 16-bit precision inference mode.
Setting the builder’s INT8 mode flag enables 8-bit precision inference mode. Calibration is an additional
step required when building networks for INT8. The application must provide TensorRT™ with