Conclusions
In this white paper, we presented multimodal brain tumor analysis for medical diagnosis, highlighted its computing challenges, and introduced the 3D U-Net model for the task of volumetric image segmentation. We pre-calculated the memory requirement of the model and analyzed three server configurations with varying memory capacity, from a "dev server" with 192 GB of memory to a "memory-rich" server with over 1 TB of memory. On the memory-rich server, we trained the 3D U-Net model on the BraTS dataset (a medical segmentation benchmark) and achieved near state-of-the-art results: an accuracy of 0.997 and a Dice coefficient of 0.83. The maximum memory utilization observed during training also matched our pre-calculated memory requirement, suggesting that our estimation approach generalizes to other memory-bound deep learning workloads.
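
To illustrate the kind of pre-calculation described above, the sketch below tallies the dominant contributors to 3D U-Net training memory: the activations retained for backpropagation, plus the weights, their gradients, and the optimizer state. This is a minimal back-of-the-envelope estimate, not the paper's exact method; the input shape, depth, filter counts, and batch size are illustrative placeholders, and the paper's actual configuration would need to be substituted to reproduce its near 1 TB figure.

    BYTES_FP32 = 4  # bytes per float32 element

    def unet3d_training_memory_gib(input_shape=(144, 144, 144), in_channels=4,
                                   base_filters=32, depth=5, batch_size=4):
        """Rough upper bound on 3D U-Net training memory, in GiB.

        All shape and size arguments are illustrative assumptions.
        """
        d, h, w = input_shape
        activations = 0  # elements kept live for the backward pass
        params = 0       # trainable weights

        ch_in = in_channels
        for level in range(depth):
            ch_out = base_filters * (2 ** level)
            # two 3x3x3 convolutions per encoder level
            activations += 2 * d * h * w * ch_out
            params += (3 ** 3) * (ch_in * ch_out + ch_out * ch_out)
            ch_in = ch_out
            if level < depth - 1:
                d, h, w = d // 2, h // 2, w // 2  # 2x2x2 max pooling

        # the decoder roughly mirrors the encoder, and skip connections keep
        # encoder activations alive, so double both tallies
        activations *= 2
        params *= 2

        act_bytes = batch_size * activations * BYTES_FP32
        # weights + gradients + two Adam moment slots = 4 copies of the params
        param_bytes = 4 * params * BYTES_FP32
        return (act_bytes + param_bytes) / 2 ** 30

    # With these placeholder values the estimate is roughly 8 GiB; full-volume
    # inputs and larger batches scale the activation term toward the TB range.
    print(f"estimated training memory: {unet3d_training_memory_gib():.1f} GiB")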
To the best of our knowledge, the results presented in this paper represent the first milestone in training a deep neural network with a large memory footprint (close to 1 TB) on a single-node server without hardware accelerators such as GPUs. Further, by enabling Deep Neural Network Library (DNNL) optimizations, we achieved a speedup of 3.4x per training step compared to stock TensorFlow. By replicating the single-node, memory-rich configuration described in this paper across a multi-node CPU cluster, we expect further gains in training performance for the 3D U-Net model as well as for other complex 3D models and datasets, potentially reducing organizations' total cost of ownership (TCO) [13].
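
As a concrete starting point, the snippet below shows one common way to turn on the oneDNN (formerly DNNL) kernels in a stock TensorFlow build and to tune its CPU thread pools. It is a minimal sketch, not the paper's exact configuration; the thread counts and OpenMP settings are illustrative assumptions that should be matched to the core count and topology of the actual server.

    import os

    # Must be set before TensorFlow is imported. On recent stock TensorFlow
    # builds this flag enables the oneDNN (DNNL) optimized CPU kernels;
    # Intel-optimized builds enable them by default.
    os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

    # OpenMP settings commonly recommended for CPU training
    # (values below are illustrative, not taken from the paper).
    os.environ["OMP_NUM_THREADS"] = "56"   # assumed physical cores per node
    os.environ["KMP_BLOCKTIME"] = "1"
    os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"

    import tensorflow as tf

    # Map TensorFlow's own thread pools onto the hardware.
    tf.config.threading.set_intra_op_parallelism_threads(56)  # per-op threads
    tf.config.threading.set_inter_op_parallelism_threads(2)   # concurrent ops

A quick sanity check is to watch TensorFlow's startup log, which announces when the oneDNN-optimized operations are active.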
Acknowledgments
Center for Space High-Performance and Resilient Computing (SHREC), University of Florida
Université de Montréal
NEUROMOD
Dell EMC
Intel