White Papers

26 CheXNet Inference with Nvidia T4 on Dell EMC PowerEdge R7425 | Document ID
the best possible performance; and if not, it is recommended to convert it to CHW. Overall,
CHW generally performs better on GPUs, while HWC generally performs better on CPUs. [6]
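To illustrate the layout conversion described above, the following is a minimal sketch (the function name `hwcToChw` and the use of `std::vector<float>` buffers are illustrative assumptions, not part of the TensorRT API) of converting an interleaved HWC buffer, such as a decoded image, into the planar CHW layout:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert an interleaved HWC buffer (pixel-major: all channels of one pixel
// together) into planar CHW (channel-major: one full plane per channel).
std::vector<float> hwcToChw(const std::vector<float>& hwc,
                            std::size_t h, std::size_t w, std::size_t c)
{
    std::vector<float> chw(h * w * c);
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            for (std::size_t k = 0; k < c; ++k)
                // Source index walks pixels; destination index walks planes.
                chw[k * h * w + y * w + x] = hwc[(y * w + x) * c + k];
    return chw;
}
```

For a 1x2 image with 3 channels, the input `{1,2,3, 4,5,6}` (two pixels, channels interleaved) becomes `{1,4, 2,5, 3,6}` (three planes of two pixels each).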
Build the optimized runtime engine in FP16 or INT8 mode (calibration optional for
INT8 inference) [15]:
//Configure the builder
builder->setMaxBatchSize(gParams.batchSize);
builder->setMaxWorkspaceSize(gParams.workspaceSize << 20);
//To run in fp16 mode
if (gParams.fp16)
{
builder->setFp16Mode(gParams.fp16);
}
//To run in INT8 mode (calibration optional for INT8 inference)
if (gParams.int8)
{
builder->setInt8Mode(true);
builder->setInt8Calibrator(&calibrator);
}
//Build the engine
ICudaEngine* engine = builder->buildCudaEngine(*network);
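The `calibrator` object passed to `setInt8Calibrator` above must implement one of TensorRT's calibrator interfaces. As a non-authoritative sketch, a minimal calibrator derived from `nvinfer1::IInt8EntropyCalibrator2` might look as follows (the class name `SimpleCalibrator` and its members are illustrative assumptions; a real implementation would copy calibration batches to device memory in `getBatch`):

```cpp
#include "NvInfer.h"
#include <cstddef>

//Minimal INT8 calibrator skeleton for TensorRT
class SimpleCalibrator : public nvinfer1::IInt8EntropyCalibrator2
{
public:
    int getBatchSize() const override { return mBatchSize; }

    //Fill bindings with device pointers to the next calibration batch;
    //return false when no more batches remain.
    bool getBatch(void* bindings[], const char* names[], int nbBindings) override
    {
        return false; //no calibration data in this sketch
    }

    //Return a previously written calibration cache, or nullptr to recalibrate.
    const void* readCalibrationCache(std::size_t& length) override
    {
        length = 0;
        return nullptr;
    }

    //Persist the calibration table so later builds can skip calibration.
    void writeCalibrationCache(const void* cache, std::size_t length) override {}

private:
    int mBatchSize{1};
};
```

Caching the calibration table via `readCalibrationCache`/`writeCalibrationCache` lets subsequent engine builds skip the calibration pass entirely.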
Highlights:
After the network has been built, it runs in FP32 precision mode by default; for
example, inputs and outputs remain in 32-bit floating point.
Setting the builder’s FP16 mode flag enables 16-bit precision inference mode.
Setting the builder’s INT8 mode flag enables 8-bit precision inference mode. Calibration is an additional
step required when building networks for INT8. The application must provide TensorRT™ with