Technical White Paper

Dell EMC Ready Solutions for HPC Digital Manufacturing with AMD EPYC™ Processors: Altair Performance

Abstract

This Dell EMC technical white paper discusses performance benchmarking results and analysis for Altair HyperWorks™ on Dell EMC Ready Solutions for HPC Digital Manufacturing with AMD EPYC™ processors.
Revisions

Date            Description
November 2019   Initial release with AMD EPYC™ 7002 series processors

Acknowledgements

This paper was produced by the following:

Authors: Joshua Weage, Martin Feyereisen

The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
1 Introduction

This technical white paper discusses the performance of two Altair HyperWorks™ products, Altair Radioss™ and Altair AcuSolve™, on Dell EMC Ready Solutions for HPC Digital Manufacturing with AMD EPYC™ processors. Dell EMC Ready Solutions for HPC were designed and configured specifically for Digital Manufacturing workloads, where Computer Aided Engineering (CAE) applications are critical for virtual product development.
2 System Building Blocks

The Dell EMC Ready Solutions for HPC Digital Manufacturing are designed using preconfigured building blocks. The building block architecture allows an HPC system to be optimally designed for specific end-user requirements, while still making use of standardized, domain-specific system recommendations. The available building blocks are infrastructure servers, storage, networking, and compute building blocks.
A recommended base configuration for infrastructure servers is:

• Dell EMC PowerEdge R6515 server
• AMD EPYC 7302P processor
• 128GB of RAM (8 x 16GB 3200 MTps DIMMs)
• PERC H335 RAID controller
• 2 x 480GB Mixed-Use SATA SSD RAID 1
• Dell EMC iDRAC9 Enterprise
• 2 x 550W Power Supplies
• Mellanox ConnectX-6 InfiniBand™ HCA (optional)

The recommended base configuration for the infrastructure server is described as follows. The PowerEdge R6515 server is suited for this role.
Table 1 Recommended Configurations for the Compute Building Block

Platforms:
• Dell EMC PowerEdge R6525
• Dell EMC PowerEdge C6525

Processor Options:
• Dual AMD EPYC 7302 (16 cores per socket)
• Dual AMD EPYC 7402 (24 cores per socket)
• Dual AMD EPYC 7452 (32 cores per socket)
• Dual AMD EPYC 7502 (32 cores per socket)
• Dual AMD EPYC 7552 (48 cores per socket)
• Dual AMD EPYC 7702 (64 cores per socket)

Memory Options:
• 256 GB (16 x 16GB 3200 MTps DIMMs)
• 512 GB (16 x 32GB 3200 MTps DIMMs)

Storage
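The processor and memory options in Table 1 imply a range of per-node core counts and memory-per-core ratios. The following is a minimal sketch of that arithmetic, assuming two sockets per node (every option is listed as "Dual"). The names `CORES_PER_SOCKET`, `MEMORY_OPTIONS_GB`, and `node_summary` are hypothetical and chosen only for illustration.

```python
# Cores per socket for each processor option, taken from Table 1.
CORES_PER_SOCKET = {
    "EPYC 7302": 16,
    "EPYC 7402": 24,
    "EPYC 7452": 32,
    "EPYC 7502": 32,
    "EPYC 7552": 48,
    "EPYC 7702": 64,
}
SOCKETS_PER_NODE = 2            # every option in Table 1 is dual-socket
MEMORY_OPTIONS_GB = (256, 512)  # 16 x 16GB and 16 x 32GB DIMMs


def node_summary(processor: str, memory_gb: int) -> tuple[int, float]:
    """Return (cores per node, GB of memory per core) for one configuration."""
    cores = CORES_PER_SOCKET[processor] * SOCKETS_PER_NODE
    return cores, memory_gb / cores


if __name__ == "__main__":
    for cpu in CORES_PER_SOCKET:
        for mem in MEMORY_OPTIONS_GB:
            cores, gb_per_core = node_summary(cpu, mem)
            print(f"Dual {cpu}, {mem} GB: {cores} cores, {gb_per_core:.1f} GB/core")
```

For example, a dual EPYC 7452 node has 64 cores, so the 256 GB memory option works out to 4 GB per core.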
Additionally, if more compute capability is required for each simulation run, two BBBs can be directly coupled together via a high-speed network cable, such as InfiniBand or Ethernet, without the need for an additional high-speed switch (HPC Couplet).
Operational storage is typically sized based on the number of expected users. For fewer than 30 users, a single storage server, such as the Dell EMC PowerEdge R7515, is often an appropriate choice. A suitably equipped storage server may be:

• Dell EMC PowerEdge R7515 server
• AMD EPYC 7302P processor
• 128GB of memory, 8 x 16GB 3200 MTps DIMMs
• PERC H745 RAID controller
• 2 x 240GB Mixed-use SATA SSD in RAID-1 (For OS)
• 12 x 12TB 3.
For customers desiring a shared high-performance parallel filesystem, the Dell EMC Ready Solutions for HPC Lustre Storage solution shown in Figure 3 is appropriate. This solution can scale up to multiple petabytes of storage.

Figure 3 Dell EMC Ready Solutions for Lustre Storage Reference Architecture

2.5 System Networks

Most HPC systems are configured with two networks: an administration network and a high-speed/low-latency switched fabric.
2.7 Services and Support

The Dell EMC Ready Solutions for HPC Digital Manufacturing are available with full hardware support and deployment services, including additional HPC system support options.

2.8 Workload Management

Workload management and job scheduling on the Dell EMC Ready Solutions for HPC Digital Manufacturing can be handled efficiently with Altair PBS Professional™, part of the Altair PBS Works™ suite.
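As an illustration of how a job's resource request might be expressed for PBS Professional, the sketch below builds a `select` string from a node count, a core count per node, and a number of threads per MPI rank. It uses the standard PBS Professional chunk resources `ncpus`, `mpiprocs`, and `ompthreads`. The helper name `pbs_select` and the example values are assumptions made for illustration; queue names, solver command lines, and site-specific settings are not covered.

```python
def pbs_select(nodes: int, cores_per_node: int, threads_per_rank: int = 1) -> str:
    """Build a PBS Professional select string that fills every core on each node.

    Hypothetical helper for illustration; not part of any Altair tool.
    """
    if nodes < 1 or cores_per_node < 1 or threads_per_rank < 1:
        raise ValueError("all arguments must be positive integers")
    if cores_per_node % threads_per_rank != 0:
        raise ValueError("threads per rank must divide the cores per node")
    mpiprocs = cores_per_node // threads_per_rank  # MPI ranks placed on each node
    return (f"select={nodes}:ncpus={cores_per_node}"
            f":mpiprocs={mpiprocs}:ompthreads={threads_per_rank}")


if __name__ == "__main__":
    # Example values only: 8 nodes of 64 cores, 4 threads per MPI rank.
    print("#PBS -l " + pbs_select(8, 64, 4))
```

The resulting line can be placed in a job script header or passed to `qsub` with the `-l` option.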
3 Reference System

Performance benchmarking was performed in the Dell EMC HPC and AI Innovation Lab using system configurations as listed in Table 2.
The software versions used for the benchmarks are listed in Table 4.

Table 4 Software Versions

Component                Version
Operating System         Red Hat Enterprise Linux 7.6
Kernel                   3.10.0-957.27.2.el7.x86_64
OFED                     Mellanox 4.6-1.0.1.1
Bright Cluster Manager   8.
4 Altair AcuSolve Performance

Altair AcuSolve is a Computational Fluid Dynamics (CFD) tool commonly used across a very wide range of CFD and multi-physics applications. AcuSolve is a robust solver with proprietary numerical methods that yield stable simulations and accurate results regardless of the quality and topology of mesh elements.
Figure 5: AcuSolve Parallel Scaling (performance relative to 64 cores versus number of cores (number of nodes), from 64(1) to 512(8); series: Riser, Windmill, Nozzle)

These benchmarks were carried out on a cluster of eight servers, each with dual AMD EPYC 7452 processors. The results are presented as performance relative to the single-node result.
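Relative performance of this kind is the baseline elapsed time divided by the elapsed time of each run, and dividing that ratio by the ideal linear speedup gives the parallel efficiency. The sketch below shows the calculation. The function names are hypothetical, and the timings in the example are placeholders for illustration only, not measured results from this paper.

```python
def relative_performance(elapsed_seconds: dict[int, float],
                         base_nodes: int = 1) -> dict[int, float]:
    """Performance of each run relative to the baseline run (baseline = 1.0)."""
    base = elapsed_seconds[base_nodes]
    return {nodes: base / t for nodes, t in sorted(elapsed_seconds.items())}


def parallel_efficiency(rel_perf: dict[int, float],
                        base_nodes: int = 1) -> dict[int, float]:
    """Relative performance divided by the ideal (linear) speedup."""
    return {nodes: perf / (nodes / base_nodes) for nodes, perf in rel_perf.items()}


if __name__ == "__main__":
    # Placeholder timings keyed by node count; these are NOT measured results.
    timings = {1: 1000.0, 2: 520.0, 4: 270.0, 8: 150.0}
    perf = relative_performance(timings)
    eff = parallel_efficiency(perf)
    for nodes in sorted(timings):
        print(f"{nodes} node(s), {nodes * 64} cores: "
              f"relative performance {perf[nodes]:.2f}, efficiency {eff[nodes]:.0%}")
```

An efficiency of 1.0 corresponds to linear scaling, and values above 1.0 would indicate super-linear scaling.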
Figure 6: AcuSolve Hybrid Parallel Scaling (performance relative to one node (64 cores) versus number of cores (number of nodes), from 64(1) to 512(8); series: R-1, W-1, N-1, R-2, W-2, N-2, R-4, W-4, N-4, R-8, W-8, N-8)

Again, the Riser (R) model initially shows better overall parallel scaling than the larger Windmill (W) and Nozzle (N) models, primarily because of cache effects. All models display similar behavior when the number of shared memory threads is varied.
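In a hybrid run, each node's cores are divided between MPI ranks and the shared memory threads inside each rank. The sketch below enumerates such layouts for 64-core nodes, assuming that the numeric suffix in the Figure 6 series labels (1, 2, 4, 8) is the number of threads per rank and that every core on a node is used. `HybridLayout` and `hybrid_layout` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

CORES_PER_NODE = 64  # dual 32-core EPYC 7452, as in the benchmark cluster


@dataclass(frozen=True)
class HybridLayout:
    nodes: int
    threads_per_rank: int
    ranks_per_node: int

    @property
    def total_ranks(self) -> int:
        return self.nodes * self.ranks_per_node

    @property
    def total_cores(self) -> int:
        return self.total_ranks * self.threads_per_rank


def hybrid_layout(nodes: int, threads_per_rank: int,
                  cores_per_node: int = CORES_PER_NODE) -> HybridLayout:
    """Fill every core on each node with ranks of `threads_per_rank` threads."""
    if cores_per_node % threads_per_rank != 0:
        raise ValueError("threads per rank must divide the cores per node")
    return HybridLayout(nodes, threads_per_rank, cores_per_node // threads_per_rank)


if __name__ == "__main__":
    for nodes in (1, 2, 4, 8):
        for threads in (1, 2, 4, 8):
            layout = hybrid_layout(nodes, threads)
            print(f"{nodes} node(s), {threads} thread(s)/rank: "
                  f"{layout.total_ranks} ranks, {layout.total_cores} cores")
```

With one thread per rank the layout reduces to a pure distributed memory MPI run. Larger thread counts reduce the number of MPI ranks while keeping the total core count constant.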
5 Altair Radioss Performance

Altair Radioss is a leading structural analysis solver for highly non-linear problems under dynamic loadings. It is used across all industries worldwide to improve the crashworthiness, safety, and manufacturability of structural designs. Radioss is similar to AcuSolve in that it typically scales well across multiple processor cores and servers, has modest memory capacity requirements, and performs minimal disk I/O while in the solver section.
Figure 8: Radioss Parallel Performance (performance relative to a single node versus number of cores (number of nodes), from 64(1) to 512(8); series: Neon, Taurus)

These benchmarks were carried out on a cluster of eight servers, each with dual AMD EPYC 7452 processors. The results are presented as performance relative to the single-node result. The parallel speedup for the large Taurus model is nearly linear up to eight nodes (512 cores).
Here, the results are significantly different from those obtained with the hybrid parallel version of AcuSolve. For the larger Taurus (T) model, the best performance was always obtained using a single shared memory thread (the same as non-hybrid distributed memory MPI). For the smaller Neon model, there was a small benefit from using more than one thread at four or more nodes.
6 Conclusion

This technical white paper presents the Dell EMC Ready Solutions for HPC Digital Manufacturing with AMD EPYC 7002 Series processors. The detailed analysis of the building block configurations demonstrates that the system is architected for a specific purpose: to provide a comprehensive HPC solution for the manufacturing domain. Use of this building block approach allows customers to easily deploy an HPC system optimized for their specific workload requirements.