Conclusions and Future Work
In this blog, we presented deep learning inference performance with the NVIDIA® TensorRT library on
P40 and M40 GPUs. INT8 mode on the P40 is about 3x faster than FP32 mode on the same GPU and 4.4x
faster than FP32 mode on the previous-generation M40. Inference performance scales linearly across
multiple GPUs because each GPU runs independently, with no inter-GPU communication or synchronization.
We also observed that larger batch sizes yield higher inference throughput, with the maximum batch size
limited only by GPU memory. In future work, we will evaluate inference performance with real-world
deep learning applications.
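
To make the INT8 and batch size settings above concrete, the sketch below shows how they would be configured with TensorRT's legacy C++ builder API (the IBuilder interface of the TensorRT 2/3 era, contemporary with the P40). The helper function name and its parameters are our own illustration, not the benchmark code used in this blog, and the exact API calls have changed across TensorRT versions.

```cpp
#include "NvInfer.h"

using namespace nvinfer1;

// Hypothetical helper (the name and parameters are ours, not the blog's).
// The builder is typically obtained from createInferBuilder(logger), and the
// network definition and INT8 calibrator are assumed to exist elsewhere.
ICudaEngine* buildInt8Engine(IBuilder* builder,
                             INetworkDefinition* network,
                             IInt8Calibrator* calibrator,
                             int maxBatchSize)
{
    // Larger batches amortize per-launch overhead and keep the GPU busy;
    // the practical upper bound on batch size is GPU memory capacity.
    builder->setMaxBatchSize(maxBatchSize);

    // Enable INT8 precision. The calibrator supplies the dynamic ranges
    // TensorRT needs to quantize FP32 activations to 8-bit integers.
    builder->setInt8Mode(true);
    builder->setInt8Calibrator(calibrator);

    return builder->buildCudaEngine(*network);
}
```

Because TensorRT engines and execution contexts are tied to a single device, the linear multi-GPU scaling described above amounts to repeating this build (or deserializing the same engine) once per GPU after cudaSetDevice(), with no communication between devices at inference time.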