Dell PowerEdge R730xd Performance and Sizing Guide for Red Hat Ceph Storage

This technical white paper provides an overview of Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of deploying Red Hat Ceph Storage on Dell servers, whose proven hardware components provide high scalability, enhanced ROI, and support for unstructured data.
Revisions

Date          Description
August 2016   Initial release

THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.

Copyright © 2016 Dell Inc. All rights reserved. Dell and the Dell logo are trademarks of Dell Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.
Contents

Revisions
Glossary
Executive Summary
1 Introduction
2 Overview of Red Hat Ceph Storage
3 Test Setup and Methodology
4 Benchmark Test Results
5 Dell Server Recommendations for Ceph
6 Conclusions
7 References
Glossary

CephFS   Ceph Filesystem; the portable operating system interface (POSIX) filesystem component of Ceph.
iDRAC    The integrated Dell Remote Access Controller, an embedded server management interface.
HDD      Hard disk drive.
KVM      Kernel-based Virtual Machine, a hypervisor.
MON      The Ceph monitor software.
Node     One of the servers in the cluster.
OSD      Object storage device, a physical or logical storage unit.
RBD      Ceph RADOS Block Device.
RGW      RADOS Gateway, the S3/Swift object gateway component of Ceph.
Executive Summary

Data storage requirements are staggering and growing at an ever-accelerating rate. These demanding capacity and growth trends are fueled in part by the enormous expansion of unstructured data, including music, images, video, and other media; database backups, log files, and other archives; financial and medical data; and large data sets, also known as "big data". These demands are compounded by the growing storage requirements expected from the rise of the Internet of Things (IoT).
1 Introduction Unstructured data has demanding storage requirements across the access, management, maintenance, and particularly the scalability dimensions. To address these requirements, Red Hat Ceph Storage provides native object-based data storage and enables support for object, block, and file storage.
The PowerEdge R730xd is an exceptionally flexible and scalable two-socket 2U rack server that delivers high-performance processing and a broad range of workload-optimized local storage possibilities, including hybrid tiering. Designed with an incredible range of configurability, the PowerEdge R730xd is well suited for Ceph. PowerEdge servers allow users to construct and manage highly efficient infrastructures for data centers and small businesses.
For the latest specifications and the full spec sheet for the PowerEdge R730xd server, see http://www.dell.com/us/business/p/poweredge-r730xd/pd. The Dell PowerEdge R730xd offers advantages that include the ability to drive peak performance by:
- Accelerating application performance with the latest technologies and dynamic local storage.
- Scaling quickly and easily with front-accessible devices, ranging from low-cost SATA hard drives to 2.
PowerEdge R730xd and Ceph Storage Configurations used for Benchmark Tests

- PowerEdge R730xd 12+3, 3x Replication: 12 hard disk drives (HDDs) and 3 solid state drives (SSDs), 3x data replication, single-drive RAID0 mode.
- PowerEdge R730xd 12+3, EC 3+2
- PowerEdge R730xd 16+1, 3x Replication
- PowerEdge R730xd 16+1, EC 3+2
- PowerEdge R730xd 16j+1, 3x Replication
- PowerEdge R730xd 16j+1, EC 3+2
- PowerEdge R730xd 16+0, 3x Replication
- PowerEdge R730xd 16+0, EC 3+2
2 Overview of Red Hat Ceph Storage A Ceph storage cluster is built from large numbers of Ceph nodes for scalability, fault-tolerance, and performance.
Red Hat Ceph Storage offers mature interfaces for enterprise block and object storage, making it well suited for active archive, rich media, and cloud infrastructure workloads like OpenStack®. Delivered as a unified, self-healing, and self-managing platform with no single point of failure, Red Hat Ceph Storage handles data management so businesses can focus on improving application availability.
Table: Ceph cluster design considerations, by optimization criteria (capacity-optimized and throughput-optimized).
Pools: A Ceph storage cluster stores data objects in logical, dynamic partitions called pools. Pools can be created for particular data types, such as block devices or object gateways, or simply to separate user groups. The Ceph pool configuration dictates the number of object replicas and the number of placement groups (PGs) in the pool. Ceph storage pools can be either replicated or erasure-coded, as appropriate for the application and cost model.
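As a hedged illustration (the pool names, placement-group counts, and erasure-code profile below are examples, not the settings used in this benchmark), both pool types can be created with the standard Ceph CLI:

    # Create a replicated pool (PG and PGP counts are illustrative)
    ceph osd pool create rbdpool 2048 2048 replicated

    # Define a 3+2 erasure-code profile, then create an erasure-coded pool on it
    ceph osd erasure-code-profile set ec32 k=3 m=2
    ceph osd pool create ecpool 2048 2048 erasure ec32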
Figure: RADOS Layer in the Ceph Architecture

Writing and reading data in a Ceph storage cluster is accomplished by using the Ceph client architecture. Ceph clients differ from competitive offerings in how they present data storage interfaces. A range of access methods are supported, including: RADOSGW: Bucket-based object storage gateway service with S3-compatible and OpenStack Swift-compatible RESTful interfaces.
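For example, an object storage user for the RADOSGW S3/Swift interfaces can be created with the gateway's admin utility (the user ID and display name below are illustrative); the generated access and secret keys can then be used with any S3-compatible client against the gateway endpoint:

    # Create a RADOSGW user for S3/Swift access
    radosgw-admin user create --uid=benchmark --display-name="Benchmark User"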
Replicated storage pools: Replication makes full copies of stored objects, and is ideal for quick recovery. In a replicated storage pool, Ceph configuration defaults to a replication factor of three, involving a primary OSD and two secondary OSDs. If two of the three OSDs in a placement group become unavailable, data may be read, but write operations are suspended until at least two OSDs are operational.
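This behavior maps onto two pool parameters, shown in a minimal sketch below (the pool name is illustrative): size sets the replication factor, and min_size sets how many copies must be available for I/O to proceed.

    # Replication factor of 3; writes pause when fewer than 2 OSDs
    # in a placement group are available
    ceph osd pool set rbdpool size 3
    ceph osd pool set rbdpool min_size 2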
3 Test Setup and Methodology

This section describes the Red Hat Ceph Storage on Dell PowerEdge R730xd testbed and the testing performed on it. The following subsections cover:
- Testbed hardware configuration
- Installation of the Red Hat Ceph Storage software
- Benchmarking procedure

3.1 Physical setup

Figure 6 illustrates the testbed for Red Hat Ceph Storage on Dell PowerEdge R730xd.
Figure 6: The 5-node Red Hat Ceph Storage cluster based on Dell PowerEdge R730xd servers
3.2 Hardware and Software Components

Tables 4 and 5 give details on the testbed hardware.

Hardware Components used for Testbed
- OSD nodes: Dell PowerEdge R730xd, 2x Intel Xeon E5-2630 v3 2.4 GHz
- MON/RGW nodes: Dell PowerEdge R630, 2x Intel Xeon E5-2650 v3 2.3 GHz
- Client nodes: Dell PowerEdge R220, 1x Intel Celeron G1820 2.
While the overall physical setup, server types, and number of systems remained unchanged, the configuration of the OSD nodes' storage subsystems was altered between tests. Throughout the benchmark tests, different I/O subsystem configurations were used to determine the best-performing configuration for a specific usage scenario. Table 6, Table 7, and Table 8 list the configurations used in the benchmark tests.
Server Configurations

Server configuration             OS disks         Data disk type
PowerEdge R730xd 12+3, 3xRep     2x 500 GB 2.5"   7.2K SAS HDD
PowerEdge R730xd 16+0, EC3+2     2x 500 GB 2.5"   7.2K SAS HDD
PowerEdge R730xd 16r+1, 3xRep    2x 500 GB 2.5"   7.2K SAS HDD
PowerEdge R730xd 16+1, EC 3+2    2x 500 GB 2.5"   7.2K SAS HDD
PowerEdge R730xd 16j+1, 3xRep    2x 500 GB 2.5"   7.2K SAS HDD
Previous benchmark data has shown that per-disk read-ahead settings had no effect on Ceph performance.

3.3 Deploying Red Hat Enterprise Linux (RHEL)

Red Hat Ceph Storage is a software-defined object storage technology that runs on RHEL. Thus, any system that can run RHEL and offer block storage devices can run Red Hat Ceph Storage. For the purpose of repeated execution, the configuration of the R730xd and R630 nodes, as well as the deployment of RHEL on them, was automated.
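As a minimal sketch, assuming a subscribed RHEL 7 system and the Red Hat Ceph Storage 1.3 repositories (the repository and package names below should be checked against the installed release), node preparation can be scripted along these lines:

    # Enable the RHEL and Red Hat Ceph Storage 1.3 repositories
    subscription-manager repos --enable=rhel-7-server-rpms
    subscription-manager repos --enable=rhel-7-server-rhceph-1.3-osd-rpms
    subscription-manager repos --enable=rhel-7-server-rhceph-1.3-mon-rpms

    # Install the Ceph packages (package names vary by release)
    yum -y install ceph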
3.4 Baseline Measurements

Network performance measurements were taken by running point-to-point connection tests in a fully-meshed fashion; that is, each server's connection was tested against every available endpoint on the other servers. The tests were run one at a time and therefore do not measure the switch backplane's combined throughput. Although the physical line rate of each individual link is 10,000 Mbit/s, the measured results are within ~1.5% of the throughput expected after TCP/IP overhead. The MTU used was 1500.
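A hedged sketch of such a mesh test using iperf3 (assuming iperf3 is installed and an iperf3 server is listening on every node; the hostnames follow this testbed's naming):

    # Run "iperf3 -s" on every node first, then on each node:
    for target in r730xd-01 r730xd-02 r730xd-03 r730xd-04 r730xd-05; do
        [ "$target" = "$(hostname -s)" ] && continue   # skip the local host
        iperf3 -c "$target" -t 30 -f m                 # 30-second TCP test, Mbit/s
    done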
Disk IO Baseline (results are averages)

- Seagate 4TB SAS (OSD data disk): random read 314 IOPS (8K blocks); random write 506 IOPS (8K blocks); sequential read 189.92 MB/s (4M blocks); sequential write 158.16 MB/s (4M blocks); read latency 12.
- Intel DC S3700 200GB (journal): random read 72,767 IOPS (4K blocks); random write 56,483 IOPS (4K blocks); sequential read 514.88 MB/s (4M blocks); sequential write 298.35 MB/s (4M blocks).
- Intel DC P3700 800GB (journal): random read 374,703 IOPS (4K blocks); random write 101,720 IOPS (4K blocks); sequential read 2,201 MB/s (4M blocks); sequential write 1,776 MB/s (4M blocks).
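A hedged example of how such baselines can be reproduced with fio (the device path, runtimes, and queue depths are illustrative, not the benchmark's actual parameters; the write test destroys data on the target device):

    # 8K random read baseline against a raw device
    fio --name=randread --filename=/dev/sdX --rw=randread --bs=8k \
        --ioengine=libaio --direct=1 --iodepth=32 --runtime=60 --time_based

    # 4M sequential write baseline (destructive!)
    fio --name=seqwrite --filename=/dev/sdX --rw=write --bs=4M \
        --ioengine=libaio --direct=1 --iodepth=16 --runtime=60 --time_based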
Figure: CBT diagram. The Ceph Benchmarking Tool (CBT) utility is installed on the admin VM.
The CBT job files are specified in YAML syntax, for example:

    cluster:
      user: 'cbt'
      head: "r220-01"
      clients: ["r220-01", "r220-02"]
      osds: ["r730xd-01", "r730xd-02", "r730xd-03", "r730xd-04", "r730xd-05"]
      mons:
        r630-01:
          a: "192.168.100.101:6789"
        r630-02:
          a: "192.168.100.102:6789"
        r630-03:
          a: "192.168.100.103:6789"
The file is divided into two sections: cluster and benchmarks. The first describes the cluster with the most essential data. The user specified here is a system user which must be present on all nodes and needs passwordless sudo access without the requirement for an interactive terminal. The head node, clients, and osds are listed by their domain names or IP addresses. The MONs are specified in a syntax that distinguishes between a front-end and back-end network for Ceph.
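A minimal sketch of the required sudo setup, assuming the 'cbt' user from the example above (the sudoers file name is arbitrary):

    # On every node: passwordless sudo for 'cbt', without requiring a TTY
    cat > /etc/sudoers.d/cbt <<'EOF'
    Defaults:cbt !requiretty
    cbt ALL=(ALL) NOPASSWD: ALL
    EOF
    chmod 0440 /etc/sudoers.d/cbt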
rados_seq_suite_<1..10>_clients.yml: librados-level benchmark on a replicated pool with a replication factor of 3. In each benchmark run, these files are called with CBT in a loop.

CAUTION: When executing multiple CBT runs consecutively in a loop, as in this benchmark, it is important to note that CBT will delete any existing pool with the same name. This is an asynchronous process that triggers purging of object structures on the backend file store.
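A hedged sketch of such a loop (the cbt.py path, archive directory, and the 300-second settle time are assumptions, not the benchmark's actual values):

    # Run the 1..10-client sequential suite, waiting between runs so the
    # asynchronous pool deletion/purge can finish before the next pool is created
    for n in $(seq 1 10); do
        ./cbt.py --archive=/results/rados_seq_${n} rados_seq_suite_${n}_clients.yml
        sleep 300
    done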
4 Benchmark Test Results

Many organizations are trying to understand how to configure hardware for optimized Ceph clusters that meet their unique needs. Red Hat Ceph Storage is able to run on myriad industry-standard hardware configurations; however, designing a successful Ceph cluster requires careful analysis of issues related to application, capacity, and workload.
4.1 Comparing Server Throughput in Different Configurations

This test compares the read and write throughput of all configurations tested. Reads: The replicated configurations generally yield higher read throughput than the erasure-coded configurations, because erasure-coded reads must reassemble data objects from their erasure-coded chunks.
4.2 Comparing Overall Solution Price/Performance

Based on the highest measured write price/performance, the R730xd 16+1, 3xRep configuration yielded the best price/performance for mixed read/write and throughput-oriented workloads. However, for read-mostly workloads, the R730xd 12+3, 3xRep configuration is an attractive alternative based on its superior read price/performance.
4.3 Comparing Overall Solution Price/Capacity

For capacity-archive workloads, erasure-coded configurations are significantly less expensive per GB of data archived. Write-heavy capacity archives should use the R730xd 16+1, EC configuration, because adding an SSD write journal increases total $/GB only slightly while increasing write performance.
4.4 Comparing Server Throughput in Replication vs. Erasure-coded Configurations

Keeping everything else constant, replicated reads perform much better than erasure-coded reads. However, erasure-coded writes perform better than replicated writes.

Figure: Comparison of server throughput in replication vs. erasure-coding, in MBps per server (4MB sequential IO, x-axis 0 to 1400 MBps), for the R730xd 16+1 EC8+3, R730xd 16+1 EC3+2, R730xd 16j+1 3xRep, and R730xd 16r+1 3xRep configurations.
4.5 Comparing Server Throughput in JBOD and RAID0 Modes

The R730xd configurations in this study used the PERC H730 RAID controller. Ceph OSDs are typically configured in a 1:1 ratio with HDDs. Therefore, the RAID controller can be configured either in JBOD mode or with each HDD as a single-drive RAID0 volume. Summary: RAID0 configurations provide better throughput than JBOD configurations.
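As a hedged illustration (the controller and enclosure:slot IDs below are placeholders, and the utility name can vary by PERC generation), single-drive RAID0 volumes can be created with the perccli utility:

    # One single-drive RAID0 virtual disk per HDD (enclosure:slot IDs vary)
    perccli /c0 add vd type=raid0 drives=32:0
    perccli /c0 add vd type=raid0 drives=32:1
    # ...repeat for each data drive. Alternatively, expose drives directly:
    perccli /c0 set jbod=on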
5 Dell Server Recommendations for Ceph Ceph operators frequently request simple, optimized cluster configurations for different workload types. Common requests are for throughput-optimized and capacity-optimized workloads. IOPS-intensive workloads on Ceph are also emerging. Based on extensive testing by Red Hat and Dell on various Dell PowerEdge server configurations, this matrix provides general guidance on sizing Ceph clusters built on Dell PowerEdge servers.
6 Conclusions

After testing different configurations of Red Hat Ceph Storage on Dell PowerEdge R730xd servers to provide a highly scalable enterprise storage solution, the following conclusions were made: The 3x replication configurations provided higher read throughput than the erasure-coded configurations, because erasure-coded reads have to reassemble data objects from the erasure-coded chunks.
7 References

Additional information can be obtained by emailing ceph_info@Dell.com. If you need additional services or implementation help, please contact your Dell sales representative.

Red Hat Ceph Storage 1.3 Hardware Guide: https://access.redhat.com/webassets/avalon/d/Red_Hat_Ceph_Storage-1.3-Hardware_Guide-en-US/Red_Hat_Ceph_Storage-1.3-Hardware_Guide-en-US.pdf

Ceph: http://ceph.com/ and http://docs.ceph.com/docs/master/architecture/

Dell PowerEdge R730xd: http://www.dell.com/us/business/p/poweredge-r730xd/pd