Technical White Paper
Dell EMC Ready Solution for HPC Digital Manufacturing: Dassault Systèmes’ Simulia Abaqus Performance

Abstract
This Dell EMC technical white paper discusses performance benchmarking results and analysis for Simulia Abaqus on the Dell EMC Ready Solution for HPC Digital Manufacturing.
Revisions

Date          Description
January 2018  Initial release with Intel® Xeon® Scalable processors (code name Skylake)
June 2019     Revised with 2nd Generation Intel Xeon Scalable processors (code name Cascade Lake)

Acknowledgements

This paper was produced by the following:
Authors: Joshua Weage, Martin Feyereisen

The information in this publication is provided “as is.” Dell Inc.
Table of contents
Revisions
Acknowledgements
Table of contents
1 Introduction
This technical white paper discusses the performance of Dassault Systèmes’ Simulia Abaqus on the Dell EMC Ready Solution for HPC Digital Manufacturing. This Dell EMC Ready Solution for HPC was designed and configured specifically for Digital Manufacturing workloads, where Computer Aided Engineering (CAE) applications are critical for virtual product development.
2 System Building Blocks
The Dell EMC Ready Solution for HPC Digital Manufacturing is designed using preconfigured building blocks. The building block architecture allows an HPC system to be optimally designed for specific end-user requirements, while still making use of standardized, domain-specific system recommendations. The available building blocks are infrastructure servers, storage, networking, and compute building blocks.
A recommended base configuration for infrastructure servers is:
• Dell EMC PowerEdge R640 server
• Dual Intel® Xeon® Bronze 3106 processors
• 192 GB of RAM (12 x 16GB 2667 MTps DIMMs)
• PERC H330 RAID controller
• 2 x 480GB Mixed-Use SATA SSD RAID 1
• Dell EMC iDRAC9 Enterprise
• 2 x 750 W power supply units (PSUs)
• Mellanox EDR InfiniBand™ (optional)

The recommended base configuration for the infrastructure server is described as follows.
Table 1 Recommended Configurations for the Compute Building Block

Platforms
Processors
• Dual Intel Xeon Gold 6242 (16 cores per socket)
• Dual Intel Xeon Gold 6248 (20 cores per socket)
• Dual Intel Xeon Gold 6252 (24 cores per socket)
Memory Options
• 192 GB (12 x 16GB 2933 MTps DIMMs)
• 384 GB (12 x 32GB 2933 MTps DIMMs)
• 768 GB (24 x 32GB 2933 MTps DIMMs, R640 only)
Storage Options
• PERC H330, H730P or H740P RAID controller
• 2 x 480GB Mixed-Use SATA SSD RAID 0
• 4 x 480GB Mixed-Use SATA S
applications, such as implicit FEA, often have large file system I/O requirements, and four Mixed-Use SATA SSDs in RAID 0 are used to provide fast local I/O. The compute nodes do not normally require extensive OOB management capabilities; therefore, an iDRAC9 Express is recommended.
as the performance aspect of archival data tends to not impede HPC activities. Our experience in working with customers indicates that there is no ‘one size fits all’ operational and archival storage solution. Many customers rely on their corporate enterprise storage for archival purposes and instantiate a high-performance operational storage system dedicated to their HPC environment. Operational storage is typically sized based on the number of expected users.
For customers desiring a shared high-performance parallel filesystem, the Dell EMC Ready Solution for HPC Lustre Storage solution shown in Figure 3 is appropriate. This solution can scale up to multiple petabytes of storage.

Figure 3 Dell EMC Ready Solution for Lustre Storage Reference Architecture

2.5 System Networks
Most HPC systems are configured with two networks: an administration network and a high-speed/low-latency switched fabric.
2.7 Services and Support
The Dell EMC Ready Solution for HPC Digital Manufacturing is available with full hardware support and deployment services, including additional HPC system support options.
3 Reference System
The reference system was assembled in the Dell EMC HPC and AI Innovation Lab using the building blocks described in section 2. The building blocks used for the reference system are listed in Table 2.
The software versions used for the reference system are listed in Table 4.

Table 4 Software Versions

Component               Version
Operating System        RHEL 7.6; Windows Server 2016 (BBB)
Kernel                  3.10.0-957.el7.x86_64
OFED                    Mellanox 4.5-1.0.1.0
Bright Cluster Manager  8.
4 Abaqus Performance
Abaqus is a multi-physics Finite Element Analysis (FEA) software commonly used in multiple engineering disciplines. Depending on the specific problem types, FEA codes may or may not scale well across multiple processor cores and servers. Implicit FEA problems often place large demands on the memory and disk I/O sub-systems. Abaqus contains several solver options, both implicit and explicit.
Figure 5: Abaqus Explicit Performance (vertical axis: Solver Elapsed Time (sec); benchmarks E1 through E6; processors E5-2697Av4, 6142, 6242, 6248, 6252)

These results are consistent with the Standard results in Figure 4, where the newer Cascade Lake processors with the most cores performed the best.
Figure 6: "mp_host_split" Performance (vertical axis: Solver Elapsed Time (sec); models S2A, S4B, S4D, S6; series 1, 2, 4, 8)

For all of the models tested, substantial performance gains can be made by using multiple domains per node; using 8 domains (6 threads per domain) delivered the best performance. Users are encouraged to examine this option with their models to determine the optimal value.
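The relationship between domains per node and threads per domain is simple arithmetic, and a short sketch may help when planning which values to try. The example below only enumerates candidate splits; it does not call Abaqus or set "mp_host_split". The helper name `domain_splits` is hypothetical, and the 48-core node is an assumption inferred from the 8 domains x 6 threads case described above.

```python
def domain_splits(cores_per_node: int) -> list[tuple[int, int]]:
    """Return (domains, threads_per_domain) pairs that use every core on a node."""
    return [
        (domains, cores_per_node // domains)
        for domains in range(1, cores_per_node + 1)
        if cores_per_node % domains == 0
    ]


# Assumption: 48 cores per node, inferred from 8 domains x 6 threads per domain.
CORES_PER_NODE = 48

# Keep only the domain counts that appear in Figure 6 (1, 2, 4, 8).
candidates = [
    pair for pair in domain_splits(CORES_PER_NODE) if pair[0] in (1, 2, 4, 8)
]

for domains, threads in candidates:
    print(f"{domains} domain(s) per node -> {threads} threads per domain")
```

Each candidate keeps the product of domains and threads equal to the core count, so every run in such a sweep uses the whole node and only the MPI/thread balance changes.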
These results demonstrate that while the performance increase is not linear with respect to the number of cores per node, there can be a substantial benefit in using systems with large numbers of cores, and the best performance is typically obtained when using all of the available cores. The exception among these cases is the modal analysis model S3D, which is only thread parallel. The MPI mode is required to take full advantage of several cores per server.
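One common way to quantify "not linear with respect to the number of cores" is parallel efficiency: the measured speedup divided by the increase in core count. The sketch below assumes this standard definition; the function names are hypothetical, and the timings are placeholder values chosen for illustration, not measurements from this paper.

```python
def speedup(t_ref: float, t: float) -> float:
    """Speedup of a run with elapsed time t over a reference run with time t_ref."""
    return t_ref / t


def parallel_efficiency(t_ref: float, cores_ref: int, t: float, cores: int) -> float:
    """Speedup divided by the core-count ratio; 1.0 means perfectly linear scaling."""
    return speedup(t_ref, t) / (cores / cores_ref)


# Hypothetical elapsed times in seconds, keyed by cores per node (illustration only).
timings = {16: 1000.0, 32: 560.0, 48: 420.0}

ref_cores = min(timings)
for cores, elapsed in sorted(timings.items()):
    s = speedup(timings[ref_cores], elapsed)
    e = parallel_efficiency(timings[ref_cores], ref_cores, elapsed, cores)
    print(f"{cores} cores: speedup {s:.2f}, efficiency {e:.2f}")
```

An efficiency below 1.0 combined with a speedup above 1.0 corresponds to the situation described above: adding cores still shortens the run, but by less than the increase in core count.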
Figure 9: Abaqus GPU Performance (vertical axis: Solver Elapsed Time (sec); models S2A, S3D, S4B, S4D, S6; series GPU-0, GPU-1, GPU-2, GPU-4)

For each benchmark, the wall clock time (in sec) is shown. For the GPU-enabled runs, all 40 Xeon cores were used. Benchmarks were carried out using the base system with no GPU acceleration, and using 1, 2, and 4 GPUs.
Figure 10: Basic Building Block Performance (vertical axis: Performance (relative to Linux 6142 based R640); benchmarks S3D, S4B, S4D, S6, E1, E3, E6; series 32-core, 64-core)

Overall, these benchmarks show that the performance of the Windows-based Basic Building Block is comparable to or greater than that of the typical dual-socket Linux server.
5 Conclusion
This technical white paper presents the Dell EMC Ready Solution for HPC Digital Manufacturing. The detailed analysis of the building block configurations demonstrates that the system is architected for a specific purpose: to provide a comprehensive HPC solution for the manufacturing domain. Use of this building block approach allows customers to easily deploy an HPC system optimized for their specific workload requirements.