Files used for development:
• Script: tensorrt_chexnet.py
• Base model script: tensorrt.py
• Labels file: labellist_chest_x_ray.json
4.2 TensorRT™ using TensorRT C++ API
In this section, we present how to run optimized inference with an existing TensorFlow
model using the TensorRT™ C++ API. The first step is to convert the frozen graph model to the
UFF file format, which the C++ UFF parser API accepts for TensorFlow models, and then to follow
the workflow in Figure 9 to create the TensorRT™ engine for optimized inference (a code sketch
of these steps follows the list):
• Create a TensorRT™ network definition from the existing trained model
• Invoke the TensorRT™ builder to create an optimized runtime engine from the network
• Serialize and deserialize the engine so that it can be rapidly recreated at runtime
• Feed the engine with data to perform inference
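The following is a minimal C++ sketch of the first three steps, assuming a TensorRT release with UFF support (e.g., 5.x); the UFF file name, input/output tensor names, and the 3x224x224 input shape are illustrative assumptions and must match the exported CheXNet graph:

#include "NvInfer.h"
#include "NvUffParser.h"
#include <fstream>
#include <iostream>

using namespace nvinfer1;
using namespace nvuffparser;

// Minimal logger required by the TensorRT builder and runtime
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    const char* uffFile    = "chexnet.uff";        // assumption: exported UFF model
    const char* inputName  = "input_1";            // assumption: input tensor name
    const char* outputName = "dense_1/Sigmoid";    // assumption: output tensor name

    // Step 1: create the network definition and populate it with the UFF parser
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();
    IUffParser* parser = createUffParser();
    parser->registerInput(inputName, Dims3(3, 224, 224), UffInputOrder::kNCHW);
    parser->registerOutput(outputName);
    parser->parse(uffFile, *network, DataType::kFLOAT);

    // Step 2: invoke the builder to create an optimized runtime engine
    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 30);          // 1 GB of scratch space
    // builder->setFp16Mode(true);                  // optional: FP16 on the T4
    ICudaEngine* engine = builder->buildCudaEngine(*network);

    // Step 3: serialize the engine to a plan file so it can be recreated quickly
    IHostMemory* serialized = engine->serialize();
    std::ofstream plan("chexnet.plan", std::ios::binary);
    plan.write(static_cast<const char*>(serialized->data()), serialized->size());

    serialized->destroy();
    engine->destroy();
    parser->destroy();
    network->destroy();
    builder->destroy();
    return 0;
}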
For the current implementation, we use the NVIDIA sample trtexec.cpp and reference the
TensorRT™ Developer Guide to document the steps described below [15].
Figure 9: Workflow for Creating a TensorRT Inference Graph using the TensorRT C++ API
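To complete the workflow, the sketch below deserializes the saved plan and feeds it with data (steps 3 and 4); trtexec.cpp exercises this same runtime path. The plan file name, binding tensor names, and the 14-class output size (one score per ChestX-ray14 pathology) are illustrative assumptions:

#include "NvInfer.h"
#include <cuda_runtime_api.h>
#include <fstream>
#include <iterator>
#include <iostream>
#include <vector>

using namespace nvinfer1;

// Same minimal logger as in the build sketch above
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    // Load the serialized plan produced by the build step
    std::ifstream planFile("chexnet.plan", std::ios::binary);
    std::vector<char> plan((std::istreambuf_iterator<char>(planFile)),
                           std::istreambuf_iterator<char>());

    // Step 3 (continued): deserialize the engine at runtime
    IRuntime* runtime = createInferRuntime(gLogger);
    ICudaEngine* engine = runtime->deserializeCudaEngine(plan.data(), plan.size(), nullptr);
    IExecutionContext* context = engine->createExecutionContext();

    // Step 4: feed the engine with data to perform inference
    const int inputIndex  = engine->getBindingIndex("input_1");          // assumption
    const int outputIndex = engine->getBindingIndex("dense_1/Sigmoid");  // assumption
    const int inputSize   = 3 * 224 * 224;   // preprocessed chest X-ray, CHW layout
    const int outputSize  = 14;              // one score per pathology

    std::vector<float> inputHost(inputSize, 0.0f);   // fill with the preprocessed image
    std::vector<float> outputHost(outputSize);

    void* buffers[2];
    cudaMalloc(&buffers[inputIndex], inputSize * sizeof(float));
    cudaMalloc(&buffers[outputIndex], outputSize * sizeof(float));
    cudaMemcpy(buffers[inputIndex], inputHost.data(),
               inputSize * sizeof(float), cudaMemcpyHostToDevice);

    context->execute(1, buffers);                    // synchronous inference, batch size 1

    cudaMemcpy(outputHost.data(), buffers[outputIndex],
               outputSize * sizeof(float), cudaMemcpyDeviceToHost);
    // outputHost now holds the per-class scores for the input image

    cudaFree(buffers[inputIndex]);
    cudaFree(buffers[outputIndex]);
    context->destroy();
    engine->destroy();
    runtime->destroy();
    return 0;
}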