Concept Guide

7 Addressing the Memory Bottleneck in AI Model Training for Healthcare
3D U-Net Model
Convolutional neural networks (CNNs) such as U-Net have been widely successful in 2D
segmentation problems in computer vision [6]. However, most medical data used in clinical
practice consists of 3D volumes. Since only 2D slices can be displayed on a computer screen,
annotating these large volumes with segmentation labels in a slice-by-slice manner is
cumbersome and inefficient. 3D U-Net [7], based on U-Net architecture, performs volumetric
segmentation by taking 3D volumes as input and processing them with corresponding 3D
operations: 3D convolutions, 3D max-pooling, 3D up-sampling, and so on. The resulting trained
model generalizes reasonably well, since neighboring image slices contain mostly repetitive
structures with corresponding variation. In general, the 3D U-Net model is both computation- and
memory-intensive.
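To make the jump from 2D to 3D concrete, the sketch below compares the weight and activation counts of a single 2D convolution on one slice against a 3D convolution on the full volume. The channel counts (32 in, 64 out) are illustrative assumptions, not values from the 3D U-Net paper; only the 240x240x144 volume size and 3x3x3 kernel come from the text.

```python
# Rough back-of-envelope comparison of 2D vs. 3D convolution costs.
# Channel counts (32 -> 64) are illustrative assumptions; the volume
# size (240x240x144) and kernel (3x3x3) are taken from the text.

def conv_params(kernel, c_in, c_out):
    """Number of weights in a convolution (bias ignored)."""
    n = c_in * c_out
    for k in kernel:
        n *= k
    return n

def activation_floats(spatial, channels):
    """Number of float values in one output feature map."""
    n = channels
    for s in spatial:
        n *= s
    return n

# 2D: one 240x240 slice, 3x3 kernel
p2d = conv_params((3, 3), 32, 64)
a2d = activation_floats((240, 240), 64)

# 3D: full 240x240x144 volume, 3x3x3 kernel, same channels
p3d = conv_params((3, 3, 3), 32, 64)
a3d = activation_floats((240, 240, 144), 64)

print(f"2D conv: {p2d:,} weights, {a2d:,} activation floats")
print(f"3D conv: {p3d:,} weights, {a3d:,} activation floats")
# At 4 bytes per float32, the 3D activation alone is ~2.1 GB
# for a single feature map, before batching.
print(f"3D activation at fp32: {a3d * 4 / 1e9:.2f} GB")
```

The weights only grow by the extra kernel dimension (3x), but the activation grows by the full depth of the volume (144x), which is why volumetric training is dominated by activation memory rather than parameter storage.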
Memory Profiling
Memory footprint is as important to deep-learning training as raw processing throughput,
measured in floating-point operations per second (FLOPS), especially when dealing with
volumetric data and large models such as 3D U-Net. Table 1 shows the breakdown of the memory
requirement of the 3D U-Net model at the largest available image size (240x240x144 in the case
of the BraTS dataset) using a kernel size of 3x3x3. As indicated, the estimated system memory
requirement is just under 1 TB for a batch size of 16 MRI scans. On our development server,
equipped with only 192 GB of system memory (Table 2), training ran out of memory within a
couple of minutes of starting, and the whole experiment stalled.
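The order of magnitude of such a breakdown can be sanity-checked with a simple estimate. The sketch below sums stored activations over the encoder path of a hypothetical 3D U-Net, assuming float32, spatial dimensions halving and channels doubling per level, and two stored feature maps per level. The level count and channel widths are assumptions for illustration; this counts encoder activations only, so the true training footprint (decoder, gradients, optimizer state, convolution workspaces) is several times larger, as Table 1 reflects.

```python
# Minimal sketch of activation-memory estimation for a 3D U-Net-like
# encoder. Base channels, level count, and feature maps per level are
# illustrative assumptions; Table 1 in the text is the authoritative
# breakdown. Only encoder activations are counted here.

BYTES_PER_FLOAT = 4  # float32

def level_bytes(shape, channels, feature_maps=2):
    """Bytes of stored activations for one resolution level.

    feature_maps: stored outputs per level (e.g. two convolutions
    whose activations are kept for the backward pass).
    """
    d, h, w = shape
    return d * h * w * channels * feature_maps * BYTES_PER_FLOAT

def unet3d_activation_bytes(input_shape, base_channels=32, levels=4):
    """Sum activation bytes over encoder levels: spatial dimensions
    halve and channel counts double at each level."""
    total = 0
    d, h, w = input_shape
    c = base_channels
    for _ in range(levels):
        total += level_bytes((d, h, w), c)
        d, h, w, c = d // 2, h // 2, w // 2, c * 2
    return total

batch = 16
per_sample = unet3d_activation_bytes((144, 240, 240))
print(f"Encoder activations per sample: {per_sample / 1e9:.1f} GB")
print(f"Batch of {batch}: {batch * per_sample / 1e9:.1f} GB")
```

Even this deliberately conservative estimate lands in the tens of gigabytes for a batch of 16, so once the decoder path, gradients, and framework overhead are added, overrunning a 192 GB server is unsurprising.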
Figure 2. 3D U-Net architecture. Each box corresponds to a multi-channel feature map; the
arrows denote different operations [8].