DELL EMC HPC Solution for Life Sciences v1.
Revisions

Date            Description
October 2016    Initial release

THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.

Copyright © 2016 Dell Inc. All rights reserved. Dell and the Dell EMC logo are trademarks of Dell Inc. in the United States and/or other jurisdictions.
Executive summary

In October 2015, Dell Technologies introduced the Genomic Data Analysis Platform (GDAP) v2.0 to meet the growing need for rapid genomic analysis driven by the availability of next-generation sequencing technologies. Following the successful implementation of GDAP v2.0, which can process up to 133 genomes per day while consuming 2 kilowatt-hours (kWh) per genome, we began to explore life sciences domains beyond genomics.
1 Introduction

The Dell HPC Solution for Life Sciences is a pre-integrated, tested, tuned, and purpose-built platform that leverages the most relevant products from Dell's High Performance Computing line together with best-in-class partner products, reflecting the high diversity of life sciences applications. It encompasses all the hardware resources required for a variety of life sciences data analyses while providing an optimal balance of compute density, energy efficiency, and performance from Dell's enterprise server line-up.
2 System Design

The first step in designing the system is to decide upon the following four basic design considerations:

• Type of workload
  - Genomics/NGS data analysis only
  - General purpose and Genomics/NGS data analysis
  - Adding molecular dynamics simulation capacity
• Parameter for sizing (a rough sizing sketch follows this list)
  - Number of compute nodes
  - Genomes per day to be analyzed
• Form factor of servers
  - 2U shared infrastructure of high density that can host 4 compute nodes in one chassis (C6320)
  - 2U shared infrastructure of v
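To make the sizing parameters concrete, the following is a minimal sizing sketch in Python. The per-node throughput value is a placeholder assumption rather than a benchmarked figure; substitute measured numbers for the chosen compute node and genomics pipeline when sizing a real configuration.

    # Rough sizing sketch (illustrative only). The per-node throughput below is a
    # placeholder assumption, not a benchmarked figure.
    import math

    GENOMES_PER_NODE_PER_DAY = 4.0   # hypothetical per-node throughput (assumption)
    NODES_PER_CHASSIS = 4            # e.g. four compute nodes per 2U C6320 chassis

    def size_cluster(target_genomes_per_day: float) -> dict:
        """Return a rough compute node and chassis count for a throughput target."""
        nodes = math.ceil(target_genomes_per_day / GENOMES_PER_NODE_PER_DAY)
        chassis = math.ceil(nodes / NODES_PER_CHASSIS)
        return {"compute_nodes": nodes, "chassis": chassis}

    # Example: size for the 133 genomes/day figure mentioned in the executive summary.
    print(size_cluster(133))   # {'compute_nodes': 34, 'chassis': 9}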
The master node controls OS imaging and administration of the cluster. One master node is the default, and high availability for the master node is optional. The configuration of the PowerEdge R430, which is the recommended server for a master node, is provided below:
• 8 x 16GB RDIMM, 2133 MT/s, Dual Rank
• 200GB Solid State Drive uSATA Mix Use Slim MLC 6Gbps 1.8in Hot-plug Drive
• Interconnect: Mellanox ConnectX-3 FDR mezzanine adapter
• iDRAC8 Enterprise

2.1.4 Common Internet File System (CIFS) gateway configuration

A Dell PowerEdge R430 is used as the CIFS gateway for transferring data generated by the next-generation sequencing machines into the storage. The configuration for the CIFS gateway is provided below:
• 4GB/8GB/16GB/32GB DDR4 up to 2400 MT/s
• Up to 2 x 1.8" SATA SSD boot drives
• Optional 96-lane PCIe 3.0 switch for certain accelerator configurations
• iDRAC8, Dell OpenManage Essentials
• 4 x K80 GPUs

2.2 Network Configuration

The Dell HPC Solution for Genomics is available in Intel OPA and two IB variants. There is also a Force10 S3048-ON GbE switch, used in both configurations, whose purpose is described here.
2.2.2 High-Speed Interconnects

In high performance computing, application performance depends on the number of CPU/GPU cores, memory, interconnect, storage performance, and so on. For servers to perform well, they need low latency and high bandwidth to communicate with each other. The type of network chosen for computational traffic depends on the latency, bandwidth, packet size at peak bandwidth, and message rate.
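To illustrate how these latency and bandwidth characteristics are typically measured on a given fabric, the following is a minimal two-node ping-pong microbenchmark sketch using mpi4py. It is not part of the solution stack; the message size and iteration count are arbitrary choices for illustration.

    # Minimal MPI ping-pong sketch (illustrative only, not part of the solution stack).
    # Run with two ranks on two different nodes, for example:
    #   mpirun -np 2 -host node1,node2 python pingpong.py
    from mpi4py import MPI
    import numpy as np
    import time

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    MSG_BYTES = 1 << 20          # 1 MiB message; arbitrary size for illustration
    ITERS = 100
    buf = np.zeros(MSG_BYTES, dtype=np.uint8)

    comm.Barrier()
    start = time.time()
    for _ in range(ITERS):
        if rank == 0:
            comm.Send(buf, dest=1)    # rank 0 sends, then waits for the echo
            comm.Recv(buf, source=1)
        else:
            comm.Recv(buf, source=0)  # rank 1 echoes the message back
            comm.Send(buf, dest=0)
    elapsed = time.time() - start

    if rank == 0:
        rtt = elapsed / ITERS
        # Each iteration moves the message twice (there and back).
        print(f"avg round trip: {rtt * 1e6:.1f} us")
        print(f"effective bandwidth: {2 * MSG_BYTES / rtt / 1e9:.2f} GB/s")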
Dell Networking S4820T

• 1U high-performance ToR switch that provides 48 1/10G BASE-T ports supporting 100Mb/1Gb/10Gb, and four 40GbE QSFP+ uplinks.
• Each 40GbE QSFP+ uplink can be broken out into four 10GbE ports using breakout cables.

Dell Networking N4032F SFP Switch
• The NFS servers were connected to the clients by using the public network. This network was Intel OPA.
• For the HA functionality of the NFS servers, a private 1 Gigabit Ethernet network was configured to monitor server health and heartbeat, and to provide a route for fencing operations, using a Dell Networking S3048-ON Gigabit Ethernet switch.
• Power to the NFS servers was provided by two APC switched PDUs on two separate power buses.

Figure 2 NSS7.0-HA test bed
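As an illustration of the kind of health and heartbeat monitoring carried over this private network, the following is a minimal sketch that periodically pings a peer NFS server. The hostname and interval are hypothetical; the actual solution relies on its HA software stack for heartbeat monitoring and fencing.

    # Illustrative heartbeat check over the private HA network (sketch only).
    # "nfs-peer-private" is a hypothetical hostname on the private 1 GbE network.
    import subprocess
    import time

    PEER = "nfs-peer-private"
    INTERVAL_S = 5

    def peer_alive(host: str) -> bool:
        """Return True if a single ICMP echo to the peer succeeds within 1 second."""
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0

    if __name__ == "__main__":
        while True:
            if not peer_alive(PEER):
                print(f"WARNING: no heartbeat from {PEER}; the HA stack would fence it")
            time.sleep(INTERVAL_S)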
Figure 3 Lustre based storage solution components

Figure 4 Dell HPC Lustre Storage Solution Components Overview
2.4 Software Configuration

Along with the hardware components, the solution includes the following software components:

• Bright Cluster Manager®
• BioBuilds

2.4.1 Bright Cluster Manager

Bright Cluster Manager, from Bright Computing, is commercial software that provides comprehensive solutions for deploying and managing HPC clusters, big data clusters, and OpenStack in the data center and in the cloud. Bright Cluster Manager can be used to deploy complete clusters over bare metal and manage them effectively.
3 Sample Architectures

3.1 Case 1: PowerEdge C6320 compute subsystem with Intel® OPA fabric

Figure 5 Dell HPC Solution for Life Sciences with PowerEdge C6320 rack servers and Intel® OPA fabric
3.1.1 Solution summary

This solution is nearly identical to the IB EDR and 10 GbE versions except for a few changes in the switching infrastructure and network adapters. As shown in Figure 5, this solution uses one 48U rack and requires an extra-deep enclosure. Bright Cluster Manager, a proprietary software solution stack from Bright Computing, is the default cluster management tool.
3.2 Case 2: PowerEdge FC430 compute subsystem with IB FDR fabric

Figure 6 Dell HPC Solution for Life Sciences with PowerEdge FC430 rack servers and IB FDR fabric
3.2.1 Solution summary

The FC430 solution with IB FDR interconnect is nearly identical to the 10 GbE version except for a couple of changes in the switching infrastructure and network adapters, and it has 2:1 blocking FDR connectivity to the top-of-rack FDR switch.

The port assignment of the Dell Networking S3048-ON switch for the Intel® OPA or IB versions of the solution is as follows.
4 Conclusion

The Dell HPC Life Science System Builder provides the minimum architecture that can achieve the targeted NGS workload, supporting informed decision making and increased efficiency. However, the configuration provided by the Dell HPC Life Science System Builder tool is intended to be used as a starting point only. Dell suggests that you contact your technical sales representative to review this quote for completeness and to include other variables not captured as inputs to the tool.