DELL EMC READY BUNDLE FOR HPC LIFE SCIENCES Refresh with 14th Generation servers ABSTRACT Dell EMC's flexible HPC architecture for Life Sciences has been dramatically improved with the new Intel® Xeon® Scalable Processors. The Dell EMC Ready Bundle for HPC Life Sciences, equipped with 14G servers, faster CPUs, and more memory, delivers much higher throughput than the previous generation, especially in genomic data processing.
TABLE OF CONTENTS
EXECUTIVE SUMMARY
   AUDIENCE
INTRODUCTION
SOLUTION OVERVIEW
EXECUTIVE SUMMARY Since Dell EMC announced the Dell EMC HPC Solution for Life Science in September 2016, the solution has improved substantially: in our benchmarking, the current Dell EMC Ready Bundle for HPC Life Sciences can process 485 genomes per day with 64x C6420s and Dell EMC Isilon F800. This is roughly a twofold improvement over Dell EMC HPC System for Life Science v.1.
INTRODUCTION Although the successful completion of the Human Genome Project was announced on April 14, 2003 after a 13-year-long endeavor and numerous exciting breakthroughs in technology and medicine, much work remains to understand and make use of the human genome.
management easier. Figure 1 shows the components of two fully loaded racks using 64x Dell EMC PowerEdge C6420 servers as the compute subsystem, a Dell EMC PowerEdge R940 as a fat node, a Dell EMC PowerEdge C4130 as an accelerator node, Dell EMC Ready Bundle for HPC NFS Storage, Dell EMC Ready Bundle for HPC Lustre Storage, and Intel® OPA as the cluster's high-speed interconnect.
Ideally, the compute nodes in a cluster should be as identical as possible, since the performance of a parallel computation is bounded by the slowest component in the cluster. Heterogeneous clusters do work, but they require careful execution to achieve the best performance. However, for Life Sciences applications, a heterogeneous cluster makes perfect sense for handling completely independent workloads such as DNA-Seq, de novo assembly, or molecular dynamics simulations.
Dell EMC Ready Bundle for HPC NFS Storage (NSS7.0-HA) NSS 7.0 HA is designed to enhance the availability of storage services to the HPC cluster by using a pair of Dell EMC PowerEdge servers with Dell EMC PowerVault™ storage arrays and the Red Hat HA software stack. The two PowerEdge servers have shared access to disk-based Dell EMC PowerVault storage in a variety of capacities, and both are directly connected to the HPC cluster using Intel® OPA, IB, or 10GbE.
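As an illustration, a compute node would typically mount the NSS7.0-HA export over NFS along the following lines; the virtual server name, export path, mount point, and options below are placeholders rather than values from a specific deployment:

# Mount the NSS-HA NFS export on a compute node (names and paths are hypothetical)
mount -t nfs -o vers=3,rsize=1048576,wsize=1048576,hard nss-ha-vip:/nfsshare /mnt/nss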
Figure 3 Lustre-based storage solution components Dell EMC Isilon F800 All-flash and Dell EMC Isilon Hybrid Scale-out NAS H600 A single Isilon storage cluster can host multiple node types to maximize deployment flexibility. Node types range from the Isilon F (All-Flash) to H (Hybrid) and A (Archive) nodes. Each provides a different optimization point for capacity, performance, and cost.
seamlessly integrate with existing Isilon clusters to accelerate the performance of an enterprise data lake and lower the overall total cost of ownership (TCO) of a multi-tiered all-flash and high capacity SATA solution.
Management traffic typically communicates with the Baseboard Management Controller (BMC) on the compute nodes using IPMI (a brief example follows at the end of this section). The management network is used to push images or packages to the compute nodes from the master node and to report data from the clients back to the master node. Dell EMC Networking S3048-ON is recommended for the management network. Figure 7 Dell Networking S3048-ON Interconnect Figure 8 describes how the network components are configured for the different storage options.
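As a sketch of the IPMI management traffic mentioned above, the master node can query or reset a compute node's BMC with the standard ipmitool utility; the BMC address and credentials are placeholders:

# Query the power state of a compute node's BMC over the management network
ipmitool -I lanplus -H 10.0.0.101 -U root -P [password] chassis power status
# Power-cycle an unresponsive node remotely
ipmitool -I lanplus -H 10.0.0.101 -U root -P [password] chassis power cycle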
Bright Cluster Manager Bright Cluster Manager is commercial software from Bright Computing that provides comprehensive solutions for deploying and managing HPC clusters, big data clusters, and OpenStack in the data center and in the cloud (6). Bright Cluster Manager can be used to deploy complete clusters over bare metal and manage them effectively. Once the cluster is up and running, the graphical user interface monitors every single node and reports if it detects any software or hardware events.
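Beyond the graphical interface, Bright Cluster Manager also provides a command-line shell, cmsh, suitable for scripting routine checks; the following is a minimal hypothetical session, not a prescribed workflow:

# List all managed devices and their current states from the head node
cmsh -c "device; list"
# Show per-node status, e.g. to spot nodes that are down
cmsh -c "device; status"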
Hence, one might want to know what the best alignment software is; however, the question is hard to answer because the result depends on many different conditions. Even if we could compare all the different alignment software packages, there would be no single conclusion as to which alignment tool is best.
number of samples increases. A subtle pitfall is the storage cache effect. Since all of the simultaneous runs read and write at roughly the same time, the measured run time can be shorter than in real cases (one mitigation is sketched after this paragraph). Despite these built-in inaccuracies, this variant analysis performance test can provide valuable insight into estimating how many resources are required for an identical or similar analysis pipeline with a defined workload.
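One way to reduce the storage cache effect between repeated runs, assuming root access on the compute nodes, is to flush the Linux page cache before each benchmark iteration; this is a minimal sketch rather than part of the benchmark itself:

# Flush dirty pages to storage, then drop the page cache, dentries, and inodes
sync
echo 3 > /proc/sys/vm/drop_caches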
The tests used 64x C6420s and 63x C6320s (64x C6320s for testing H600). The number of samples per node was increased to reach the desired total number of samples processed concurrently. For the C6320 (13G), 3 samples per node was the maximum each node could process. The 64-, 104-, and 126-sample test results for the 13G system (blue) were obtained with 2 samples per node, while the 129-, 156-, 180-, 189-, and 192-sample test results were obtained with 3 samples per node.
Amber benchmark suite This suite includes the Joint Amber-Charmm (JAC) benchmark, which considers dihydrofolate reductase (DHFR) in an explicit water bath with cubic periodic boundary conditions. The major assumptions are that the DHFR molecule is present in water without surface effects and that its movement follows the microcanonical (NVE) ensemble, which holds the amount of substance (N), volume (V), and energy (E) constant.
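For reference, the JAC benchmark is typically launched with Amber's pmemd engine as shown below; the rank count is illustrative and the input file names follow the Amber benchmark suite conventions rather than this study's exact configuration:

# Run the JAC/DHFR NVE benchmark on 40 MPI ranks
mpirun -np 40 pmemd.MPI -O -i mdin -p prmtop -c inpcrd -o mdout.jac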
LAMMPS Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a classical molecular dynamics code and has potentials for solid-state materials (metals, semiconductors) and soft matter (biomolecules, polymers) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale.
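The Lennard-Jones melt input listed in Appendix B can be launched under MPI along these lines; the binary name (lmp) and rank count depend on the build and node configuration:

# Launch the LJ melt benchmark on 80 MPI ranks (binary name varies by build)
mpirun -np 80 lmp -in in.lj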
CRYO-EM PERFORMANCE The purpose of this study was to validate the optimized Relion (for REgularised LIkelihood OptimisatioN) on Dell EMC PowerEdge C6420s with Skylake CPUs. Relion was developed in the Scheres lab at the MRC Laboratory of Molecular Biology. It uses an empirical Bayesian approach to refine multiple 3D reconstructions or 2D class averages for data generated by cryo-electron microscopy (Cryo-EM).
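As a sketch of how such a refinement is typically launched (the input STAR file, reference map, and parameter values below are placeholders, not the exact settings used in this study):

# Distributed 3D auto-refinement with Relion; inputs and parameters are placeholders
mpirun -n 8 relion_refine_mpi --o Refine3D/run1 --i particles.star \
  --ref reference.mrc --ini_high 60 --ctf --particle_diameter 360 \
  --flatten_solvent --zero_mask --sym C1 --auto_refine --split_random_halves \
  --pool 3 --j 4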
CONCLUSION Overall, 14th-generation servers with Skylake processors and larger, faster memory (due to the higher number of memory channels compared with Broadwell) show better throughput on the BWA-GATK pipeline. The throughput for this type of work improved from four 30x genomes per day per C6320 to seven 30x genomes per day per C6420.
APPENDIX A
BWA scaling test command
bwa mem -M -t [number of cores] -v 1 [reference] [read fastq 1] [read fastq 2] > [sam output file]
BWA-GATK commands
Phase 1. Pre-processing
Step 1. Aligning and sorting
bwa mem -c 250 -M -t [number of threads] -R '@RG\tID:noID\tPL:illumina\tLB:noLB\tSM:bar' [reference chromosome] [read fastq 1] [read fastq 2] | samtools view -bu - | sambamba sort -t [number of threads] -m 30G --tmpdir [path/to/temp] -o [sorted bam output] /dev/stdin
Step 2.
java -d64 -Xms8g -Xmx30g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -nt [number of threads] -R [reference chromosome] -V [gvcf output] -o [raw vcf]
Phase 3. Preliminary analyses
Step 1. Variant recalibration
java -d64 -Xms512m -Xmx2g -jar GenomeAnalysisTK.jar -T VariantRecalibrator -R [reference chromosome] --input [raw vcf] -an QD -an DP -an FS -an ReadPosRankSum -U LENIENT_VCF_PROCESSING --mode SNP --recal_file [raw vcf recalibration] --tranches_file [raw vcf tranches]
Step 2.
APPENDIX B
# 3d Lennard-Jones melt
variable        N string off            # Newton Setting
variable        w equal 10              # Warmup Timesteps
variable        t equal 7900            # Main Run Timesteps
variable        m equal 1               # Main Run Timestep Multiplier
variable        n equal 0               # Use NUMA Mapping for Multi-Node
variable        p equal 0               # Use Power Measurement

variable        x equal 4
variable        y equal 2
variable        z equal 2

variable        xx equal 20*$x
variable        yy equal 20*$y
variable        zz equal 20*$z
variable        rr equal floor($t*$m)

newton          $N
if "$n > 0" then "processors * * * grid numa"
REFERENCES
1. Blueprint for High Performance Computing. Dell TechCenter. [Online] http://en.community.dell.com/techcenter/blueprints/blueprint_for_hpc/m/mediagallery/20443473.
2. ETL: The Silent Killer of Big Data Projects. insideBIGDATA. [Online] https://insidebigdata.com/2015/07/23/etl-the-silent-killer-of-big-data-projects/.
3. Dell EMC PowerEdge Servers. [Online] https://www.dellemc.com/en-us/servers/index.htm.
4. Dell EMC Ready Bundles for HPC Storage. [Online] https://si.cdn.dell.