DELL EMC HPC SYSTEM FOR LIFE SCIENCES V1.1
Designed for genomics sequencing analysis, bioinformatics and computational biology

ABSTRACT
Designing a flexible HPC architecture requires the best technologies and clever strategies. Dell EMC HPC System for Life Sciences is the result of Dell EMC's ongoing effort to provide customers with the most suitable and cost-effective solution, improving variant analysis performance by more than 17% over Dell EMC HPC Solution for Genomics v2.
EXECUTIVE SUMMARY
In October 2015, Dell announced the Genomic Data Analysis Platform (GDAP) v2.0 to address the growing need for rapid genomic analysis driven by the availability of next-generation sequencing (NGS) technologies [1]. Upon the successful implementation of GDAP v2.0, which is capable of processing up to 163 genomes per day while consuming 2 kilowatt-hours (kWh) per genome, we started to explore life science domains beyond genomics.
The most accurate simulation of the human brain to date has been carried out on a Japanese supercomputer, with a single second's worth of activity from just one percent of the complex organ taking one of the world's most powerful supercomputers 40 minutes to calculate [3].
Figure 1: Dell EMC HPC System for Life Sciences with PowerEdge C6320 rack servers and Intel® OPA fabric
Compute and Management Components
There are several considerations when selecting the servers for the master node, login node, compute nodes, fat node and accelerator node. For the master and login nodes, the 1U PowerEdge R430 is recommended. The master node is responsible for managing the compute nodes and optimizing the overall compute capacity. The login node is used for user access, compilations and job submissions.
Dell EMC PowerEdge R430 for master node, login node and CIFS gateway The solution includes four Dell EMC PowerEdge R430 servers. Two of these servers are designated as login nodes. Users can log in to these nodes and submit, monitor or delete jobs. The other two nodes function as redundant head nodes for the cluster, which are used by Bright Cluster Manager® to provision, manage and monitor the cluster in a high availability (HA) configuration.
Dell EMC HPC Lustre Storage Solution
The Dell EMC Ready Bundle for HPC Lustre Storage, referred to as Dell EMC HPC Lustre Storage, is designed for academic and industry users who need to deploy a fully supported, easy-to-use, high-throughput, scale-out and cost-effective parallel file system storage solution. The solution uses the Intel® Enterprise Edition (EE) for Lustre® software v3.0 [8].
- Ports 01–04 and 27–52 are assigned to the cluster's private management network, used by Bright Cluster Manager® to connect the master, login, CIFS gateway and compute nodes. The PowerEdge C6320 servers' Ethernet and iDRAC connections account for the majority of these ports.
- Ports 06–09 are used for the private network associated with NSS7.0-HA.
- Port 05 and ports 12–26 are allocated to the Lustre solution for its private management network.
- Ports 10 and 11 are used for the PDUs.
- MPSS 3.6.1
- MLNX OFED 3.2
- OM 8.3 and DTK update
- Lab7 Bio-Builds
- Molecular dynamics simulation

Bright Cluster Manager® 7.2
Bright Cluster Manager® (BCM) for Dell EMC is a comprehensive solution for provisioning, monitoring, and managing Dell EMC clusters [10]. Two Dell EMC PowerEdge R430 servers are deployed as head nodes in an HA active-passive configuration, using the NSS7.0-HA solution as shared storage.
BWA shows stable scalability
Figure 4 shows the run times of BWA on various sequence data sizes, ranging from 2 to 208 million fragments (MF), with different numbers of threads. Oversubscription is avoided to ensure each thread runs on a single physical core. As shown in Figure 4 and Figure 5, BWA scales linearly over both input size and the number of cores.

Figure 4: BWA run times across thread counts for data sets of 2, 40, 84 and 167 million fragments
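A thread-scaling sweep like the one behind Figure 4 can be scripted directly from the BWA command listed in Appendix A. The sketch below is a minimal illustration only; the reference and fastq paths are placeholders, not the data sets used in this study.

#!/bin/bash
# Minimal sketch of a BWA thread-scaling sweep (paths are placeholders).
# For each core count, run the same alignment and log the elapsed wall time.
REF=reference.fasta        # reference genome (placeholder)
R1=sample_1.fastq          # read fastq 1 (placeholder)
R2=sample_2.fastq          # read fastq 2 (placeholder)

for t in 1 4 8 12 16 20 24; do
    start=$(date +%s)
    bwa mem -M -t "$t" -v 1 "$REF" "$R1" "$R2" > "aligned_${t}threads.sam"
    end=$(date +%s)
    echo "$t threads: $((end - start)) seconds" >> bwa_scaling.log
done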
Table 2: Speed-up by increasing parallelism on E5-2680 v3/DDR4-2133 (columns are number of cores; speed-up is relative to a single core)

| Sequence Data Size (Million Fragments) | 1 | 4 | 8 | 12 | 16 | 20 | 24 |
|---|---|---|---|---|---|---|---|
| 2  | 1.00 | 3.84 | 7.38 | 10.14 | 12.62 | 14.20 | 15.35 |
| 10 | 1.00 | 3.84 | 7.59 | 11.30 | 14.88 | 18.25 | 21.72 |
| 40 | 1.00 | 3.77 | 7.26 | 10.72 | 14.05 | 17.34 | 20.37 |
| 70 | 1.00 | 3.72 | 7.43 | 10.97 | 14.41 | 17.71 | 20.64 |

Table 3: Speed-up by increasing parallelism on E5-2690 v4/DDR4-2400 (columns are number of cores; speed-up is relative to a single core)

| Sequence Data Size (Million Fragments) | 1 | 4 | 8 | 12 | 16 | 20 | 24 |
|---|---|---|---|---|---|---|---|
| 2  | 1.00 | 3.48 | 6.26 | 8.47 | 10. | | |
| 10 | | | | | | | |
| 40 | | | | | | | |
| 70 | | | | | | | |
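The speed-up figures in Tables 2 and 3 are the single-core run time divided by the n-core run time for the same input, so perfect scaling on 24 cores would read 24.00. As a minimal sketch, assuming a timing log with lines of the form "<threads> threads: <seconds> seconds" (as written by the sweep above), the speed-ups can be computed with:

# Compute speed-up T(1)/T(n) from the timing log written by the sweep above.
awk '{ t[$1] = $3 } END { for (n in t) printf "%s cores: %.2fx\n", n, t[1] / t[n] }' bwa_scaling.log | sort -n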
- Best Practices Phase 1: Pre-processing
- Best Practices Phase 2A: Calling germline variants
- Best Practices Phase 2B: Calling somatic variants
- Best Practices Phase 3: Preliminary analyses

Here we tested Phase 1, Phase 2A and Phase 3 as a germline variant calling pipeline. The details of the commands used in the benchmark are in Appendix A.
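Because each step consumes the output of the previous one, a per-sample driver can simply chain the phases and stop at the first failure. A minimal sketch, assuming hypothetical wrapper scripts around the Appendix A command lines:

#!/bin/bash
# Per-sample driver for the germline pipeline (Phases 1, 2A and 3).
# run_phase*.sh are hypothetical wrappers around the Appendix A commands.
set -e                      # abort the sample if any step fails
sample=$1

run_phase1.sh  "$sample"    # pre-processing: aligning and sorting onward
run_phase2a.sh "$sample"    # germline calling: HaplotypeCaller, GenotypeGVCFs
run_phase3.sh  "$sample"    # preliminary analyses: recalibration steps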
The throughput of Dell EMC HPC System for Life Sciences
Total run time is the elapsed wall time from the earliest start of Phase 1, Step 1 to the latest completion of Phase 3, Step 2. The time for each step is measured from the latest completion time of the previous step to the latest completion time of the current step, as illustrated in Figure 6.
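Under concurrent per-sample runs, each step's time is therefore a difference between latest completion times, not a sum of per-sample durations. A minimal sketch of the total-run-time reduction, assuming each job appends "<sample> <step> <start_epoch> <end_epoch>" to a shared steps.log; per-step times follow the same pattern using the latest completion per step:

# Total run time: earliest start of any job to the latest completion of any job.
awk 'NR == 1 || $3 < min { min = $3 }   # earliest start seen so far
     $4 > max { max = $4 }              # latest completion seen so far
     END { print "total run time (s):", max - min }' steps.log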
GPUs are programmed primarily through CUDA, OpenACC and the OpenCL framework. Most of the life sciences community is not familiar with these frameworks, so few biologists or bioinformaticians can make efficient use of GPU architectures. However, GPUs have been making inroads into the molecular dynamics and electron microscopy fields.
Figure 8: Benchmark results from Amber
HOOMD-blue benchmark suite
HOOMD-blue is a general-purpose particle simulation toolkit. It scales from a single CPU core to thousands of GPUs. Performance results in Figure 9 are reported in hours to complete 10e6 (ten million) Monte Carlo sweeps, where one sweep is N trial moves on a system of N = 1,048,576 particles. HOOMD-blue shows almost linear scaling behavior on both CPUs and GPUs. Unlike the results from Amber, HOOMD-blue does not take advantage of Lustre for this hexagon benchmark.
Figure 11: F1-ATPase benchmark results
Figure 12: STMV benchmark results

CONCLUSION
This white paper focused on testing working solutions for diverse life sciences applications. Upon the successful iteration of Dell EMC's HPC Solution for Genomics v2.0, we incorporated a molecular dynamics simulation solution into the flexible architecture, in addition to improving the performance of the genomics data analysis platform.
APPENDIX A: Benchmark commands

BWA scaling test command
bwa mem -M -t [number of cores] -v 1 [reference] [read fastq 1] [read fastq 2] > [sam output file]

BWA-GATK commands
Phase 1. Pre-processing
Step 1. Aligning and sorting
bwa mem -c 250 -M -t [number of threads] -R '@RG\tID:noID\tPL:illumina\tLB:noLB\tSM:bar' [reference chromosome] [read fastq 1] [read fastq 2] | samtools view -bu - | sambamba sort -t [number of threads] -m 30G --tmpdir [path/to/temp] -o [sorted bam output] /dev/stdin
Step 2.
Step 2. GenotypeGVCFs
java -d64 -Xms8g -Xmx30g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -nt [number of threads] -R [reference chromosome] -V [gvcf output] -o [raw vcf]

Phase 3. Preliminary analyses
Step 1. Variant recalibration
java -d64 -Xms512m -Xmx2g -jar GenomeAnalysisTK.jar -T VariantRecalibrator -R [reference chromosome] --input [raw vcf] -an QD -an DP -an FS -an ReadPosRankSum -U LENIENT_VCF_PROCESSING --mode SNP --recal_file [raw vcf recalibration] --tranches_file [raw vcf tranches]
Step 2.
APPENDIX B: Test data descriptions

Sequence data for BWA scaling tests

Table 6: Data sets for BWA scaling tests

| Sample | File Size (GB) | Read Length | # of Million Fragments |
|---|---|---|---|
| SRR1060762 | 0.21 | 76 | 1.987 |
| SRR593165 | 1.59 | 102 | 10.083 |
| SRR786500 | 3.92 | 51 | 39.251 |
| ERR754356 | 11.58 | 102 | 69.225 |
| ERR754362 | 15.51 | 102 | 83.950 |
| SRR1299474 | 11.27 | 52 | 122.253 |
| ERR754364 | 29.61 | 102 | 167.366 |
| SRR1299472 | 20.24 | 52 | 207. |
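The accessions in Table 6 are public SRA/ENA runs, so the inputs can be re-fetched with the SRA Toolkit. A minimal sketch (for paired-end runs, --split-files writes <accession>_1.fastq and <accession>_2.fastq; single-end runs yield one file):

# Fetch and convert the BWA scaling test data sets from SRA.
for acc in SRR1060762 SRR593165 SRR786500 ERR754356 \
           ERR754362 SRR1299474 ERR754364 SRR1299472; do
    prefetch "$acc"                  # download the .sra archive
    fastq-dump --split-files "$acc"  # convert to fastq
done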
APPENDIX C: Running time details of BWA-GATK pipeline

Table 8: Detailed running time measurements for each BWA-GATK pipeline step (aligning and sorting through HaplotypeCaller, GenotypeGVCFs, Variant Recalibration and Apply Recalibration) on the Human 10x, Human 50x and Cow 12x data sets, comparing the C6320/Intel OPA and FC430/FDR configurations.
REFERENCES
[1] "Dell HPC System for Life Sciences White Paper," [Online]. Available: http://en.community.dell.com/techcenter/blueprints/blueprint_for_hpc/m/mediagallery/20441607.
[2] "Bioinformatics," [Online]. Available: https://en.wikipedia.org/wiki/Bioinformatics.
[3] "Supercomputer models one second of human brain activity," [Online]. Available: http://www.telegraph.co.uk/technology/10567942/Supercomputer-models-one-second-of-human-brain-activity.html.