Concept Guide
Addressing the Memory Bottleneck in AI Model Training for Healthcare
Conclusions
In this white paper, we presented multimodal brain tumor analysis for medical diagnosis,
highlighted its computing challenges, and presented the 3D U-Net model for the task of
volumetric image segmentation. We pre-calculated the memory requirement of the model and
analyzed three server configurations with varying memory capacity, from a “dev server” with
192 GB of memory to a “memory-rich” server with over 1 TB of memory. With the memory-rich
server, we trained the 3D U-Net model on the BraTS dataset (a medical segmentation
benchmark) and achieved close to state-of-the-art results: an accuracy of 0.997 and a Dice
coefficient of 0.83. The model's maximum memory utilization during training also matched our
pre-calculated memory requirement, suggesting that our approach generalizes to other memory-
bound deep learning algorithms.
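The kind of memory pre-calculation described above can be sketched as a simple per-layer accounting of activation sizes. The function below is an illustrative assumption, not the paper's actual method, and the layer shapes used in the example are hypothetical:

```python
# Hypothetical sketch: one building block of pre-calculating a model's
# training memory footprint is summing the bytes held by each layer's
# output activations. Shapes and dtype here are illustrative only.

def conv3d_activation_bytes(batch, depth, height, width, channels, dtype_bytes=4):
    """Bytes needed to hold one 3D conv layer's output activations
    (float32 by default). Gradients roughly double this during training."""
    return batch * depth * height * width * channels * dtype_bytes

# Example: a 144x144x144 volume with 32 feature maps at batch size 1.
layer_bytes = conv3d_activation_bytes(1, 144, 144, 144, 32)
print(layer_bytes)  # 382205952 bytes, i.e. about 0.36 GB for one layer
```

Summing such terms over every layer (and doubling for stored gradients) gives a rough lower bound on the server memory a training run will need, which is the kind of estimate that can be compared against observed peak utilization.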
To the best of our knowledge, the results presented in this paper represent the first milestone
in training a deep neural network with a large memory footprint (close to 1 TB) on a single-node
server without hardware accelerators such as GPUs. Further, by enabling Deep Neural Network
Library (DNNL) optimizations, we achieved a 3.4x speedup per training step compared to stock
TensorFlow. By replicating the single-node, memory-rich configuration described in this paper
across a multi-node CPU cluster, we can expect greatly enhanced training performance for the
3D U-Net model as well as for other complex 3D models and datasets, potentially reducing an
organization's TCO [13].
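As a hedged illustration of what enabling DNNL-style optimizations can look like in practice: in recent stock TensorFlow 2.x releases, the oneDNN library (formerly DNNL/MKL-DNN) can be toggled with an environment variable, and CPU threading is commonly tuned alongside it. The thread counts below are placeholders, not the configuration used in this paper, and `train_unet3d.py` is a hypothetical script name:

```shell
# Hypothetical sketch: enabling oneDNN (formerly DNNL) optimizations for
# stock TensorFlow 2.x on CPU. Thread values are illustrative, not tuned.
export TF_ENABLE_ONEDNN_OPTS=1      # turn on oneDNN graph/kernel optimizations
export OMP_NUM_THREADS=36           # e.g. one OpenMP thread per physical core
export KMP_BLOCKTIME=1              # short spin-wait after parallel regions
export KMP_AFFINITY=granularity=fine,compact,1,0  # pin threads to cores
# python train_unet3d.py            # hypothetical training entry point
```

Settings like these are workload- and hardware-dependent, so the actual speedup observed will vary with model shape and core count.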
Acknowledgments
Center for Space High-Performance and Resilient Computing (SHREC), University of Florida
Université de Montréal
NEUROMOD
Dell EMC
Intel