Files used for development:
• Script: tensorrt_chexnet.py
• Base model script: tensorrt.py
• Labels file: labellist_chest_x_ray.json
4.2 TensorRT™ using TensorRT C++ API
In this section, we present how to run optimized inference on an existing TensorFlow
model using the TensorRT™ C++ API. The first step is to convert the frozen graph model to
the UFF file format, which the C++ UFF parser API (supporting TensorFlow models) can then
import; from there, follow the workflow in Figure 9 to create the TensorRT™ engine for
optimized inference (a code sketch of these steps follows the list):
• Create a TensorRT™ network definition from the existing trained model
• Invoke the TensorRT™ builder to create an optimized runtime engine from the network
• Serialize and deserialize the engine so that it can be rapidly recreated at runtime
• Feed the engine with data to perform inference
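The listing below is a minimal C++ sketch of the build-time portion of these steps under the UFF-based TensorRT™ 5.x API available for the T4 at the time of writing. The tensor names input_tensor and softmax_tensor, the 3x224x224 input shape, the file names chexnet.uff and chexnet_fp16.engine, and the batch/workspace settings are illustrative assumptions rather than values taken from our scripts; the actual tensor names are reported when the frozen graph is converted to UFF.

#include <fstream>
#include <iostream>
#include "NvInfer.h"
#include "NvUffParser.h"

using namespace nvinfer1;
using namespace nvuffparser;

// Minimal logger required by the TensorRT builder and runtime.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    // Step 1: create a network definition and populate it with the UFF parser.
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();
    IUffParser* parser = createUffParser();
    // Tensor names and input shape are assumptions for the CheXNet graph.
    parser->registerInput("input_tensor", Dims3(3, 224, 224), UffInputOrder::kNCHW);
    parser->registerOutput("softmax_tensor");
    parser->parse("chexnet.uff", *network, DataType::kFLOAT);

    // Step 2: invoke the builder to create an optimized runtime engine.
    builder->setMaxBatchSize(8);                // illustrative batch size
    builder->setMaxWorkspaceSize(1 << 30);      // 1 GB of scratch memory
    builder->setFp16Mode(true);                 // use the T4 Tensor Cores in FP16
    ICudaEngine* engine = builder->buildCudaEngine(*network);

    // Step 3: serialize the engine so it can be rapidly recreated at runtime.
    IHostMemory* plan = engine->serialize();
    std::ofstream out("chexnet_fp16.engine", std::ios::binary);
    out.write(static_cast<const char*>(plan->data()), plan->size());

    plan->destroy();
    engine->destroy();
    parser->destroy();
    network->destroy();
    builder->destroy();
    return 0;
}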
For the current implementation, we used the Nvidia sample trtexec.cpp and referenced the
TensorRT™ Developer Guide [15] to document the steps described below.
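For reference, the runtime half of the workflow (deserializing the engine and feeding it data) reduces to a few API calls. The sketch below assumes the plan file written by the build-time sketch above, a single input/output binding pair in that order, and FP32 host buffers already sized for the CheXNet input and its class scores; in practice the binding indices should be looked up with getBindingIndex().

#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>
#include "NvInfer.h"

using namespace nvinfer1;

// Same minimal logger as in the build-time sketch.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

// Runs one batch of inference with a previously serialized engine.
void infer(const std::vector<float>& input, std::vector<float>& output, int batchSize)
{
    // Deserialize the engine so it can be rapidly recreated at runtime.
    std::ifstream planFile("chexnet_fp16.engine", std::ios::binary);
    std::vector<char> plan((std::istreambuf_iterator<char>(planFile)),
                           std::istreambuf_iterator<char>());
    IRuntime* runtime = createInferRuntime(gLogger);
    ICudaEngine* engine = runtime->deserializeCudaEngine(plan.data(), plan.size(), nullptr);
    IExecutionContext* context = engine->createExecutionContext();

    // Device buffers; binding 0 is assumed to be the input, binding 1 the output.
    void* buffers[2];
    cudaMalloc(&buffers[0], input.size() * sizeof(float));
    cudaMalloc(&buffers[1], output.size() * sizeof(float));

    // Feed the engine with data to perform inference.
    cudaMemcpy(buffers[0], input.data(), input.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    context->execute(batchSize, buffers);
    cudaMemcpy(output.data(), buffers[1], output.size() * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(buffers[0]);
    cudaFree(buffers[1]);
    context->destroy();
    engine->destroy();
    runtime->destroy();
}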
Figure 9: Workflow for Creating a TensorRT Inference Graph using the TensorRT C++ API