Conclusions and Future Work
In this blog, we presented deep learning inference performance with the NVIDIA® TensorRT library on
P40 and M40 GPUs. INT8 mode on the P40 is about 3x faster than FP32 mode on the same GPU and 4.4x
faster than FP32 mode on the previous-generation M40. Inference performance scales linearly across
multiple GPUs because each GPU runs independently, with no inter-GPU communication or synchronization.
We also observed that larger batch sizes yield higher inference throughput, with the maximum batch size
limited only by GPU memory. In future work, we will evaluate inference performance with real-world
deep learning applications.
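
To make the INT8 and batch size settings above concrete, the sketch below shows how they would be configured with TensorRT's legacy C++ builder API (the IBuilder interface of the TensorRT 2/3 era, contemporary with the P40). The helper function name and its parameters are our own illustration, not the benchmark code used in this blog, and the exact API calls have changed across TensorRT versions.

```cpp
#include "NvInfer.h"

using namespace nvinfer1;

// Hypothetical helper (the name and parameters are ours, not the blog's).
// The builder is typically obtained from createInferBuilder(logger), and the
// network definition and INT8 calibrator are assumed to exist elsewhere.
ICudaEngine* buildInt8Engine(IBuilder* builder,
                             INetworkDefinition* network,
                             IInt8Calibrator* calibrator,
                             int maxBatchSize)
{
    // Larger batches amortize per-launch overhead and keep the GPU busy;
    // the practical upper bound on batch size is GPU memory capacity.
    builder->setMaxBatchSize(maxBatchSize);

    // Enable INT8 precision. The calibrator supplies the dynamic ranges
    // TensorRT needs to quantize FP32 activations to 8-bit integers.
    builder->setInt8Mode(true);
    builder->setInt8Calibrator(calibrator);

    return builder->buildCudaEngine(*network);
}
```

Because TensorRT engines and execution contexts are tied to a single device, the linear multi-GPU scaling described above amounts to repeating this build (or deserializing the same engine) once per GPU after cudaSetDevice(), with no communication between devices at inference time.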