HP Data Protector Software Performance White Paper

RAID
The use of RAID for a disk backup should be carefully considered, because the raw I/O speed
of the disk backup device significantly affects the overall backup performance. Each RAID
level has its own strengths.
Five main levels of RAID are commonly referred to:
RAID 0 (Striping)
Data is striped across the available disks, using at least two disk drives. This method offers
increased performance, but there is no protection against disk failure. It will allow you to use
the maximum performance the disks have to offer, but without redundancy.
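As a rough illustration of striping, the following sketch maps logical block numbers to disks in round-robin order. The function name locate_block and the one-block stripe unit are assumptions made for this example; real controllers use configurable stripe sizes.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Assumes a stripe unit of exactly one block (illustrative only).
    """
    if num_disks < 2:
        raise ValueError("RAID 0 needs at least two disk drives")
    return logical_block % num_disks, logical_block // num_disks


# Six consecutive logical blocks spread over three disks.
layout = [locate_block(block, 3) for block in range(6)]
```

Consecutive blocks land on different disks, which is why sequential transfers can use all drives in parallel, and also why the loss of any one disk leaves gaps throughout the data.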
RAID 1+0 (Disk Drive Mirroring and Striping)
The disks are mirrored in pairs and data blocks are striped across the mirrored pairs, using at
least four disk drives. In each mirrored pair, the physical disk that is not busy answering other
requests answers any read request sent to the array; this behavior is called load balancing. If
a physical disk fails, the remaining disk in the mirrored pair can still provide all the necessary
data. Several disks in the array can fail without incurring data loss, as long as no two failed
disks belong to the same mirrored pair. This fault-tolerance method is useful when high
performance and data protection are more important than the cost of physical disks. The
major advantage of RAID 1+0 is the highest read and write performance of any fault-tolerant
configuration.
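The failure rule described above can be sketched as a small check. The pairing scheme (disks 2k and 2k+1 form mirrored pair k) and the function name survives are assumptions for this example, not a description of any particular controller.

```python
def survives(failed_disks: set[int], num_disks: int) -> bool:
    """Return True if a RAID 1+0 array still holds all data.

    Assumes disks 2k and 2k+1 form mirrored pair k (hypothetical numbering).
    """
    if num_disks < 4 or num_disks % 2 != 0:
        raise ValueError("RAID 1+0 needs an even number of disks, at least four")
    failed_pairs = [disk // 2 for disk in failed_disks]
    # Data is lost only when both members of the same pair have failed.
    return len(failed_pairs) == len(set(failed_pairs))
```

With six disks, survives({0, 2, 5}, 6) is True because each failure hits a different pair, while survives({0, 1}, 6) is False because both members of one pair are gone.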
RAID 1 (Mirroring)
Data is written to two or more disks simultaneously, providing a spare mirror copy in case
one disk fails. This implementation offers high assurance of disk transfers completing
successfully, but adds no performance benefit. It also adds cost, as you do not benefit from
the additional disk capacity.
RAID 3 (Striping and Dedicated Parity)
This type of array uses a separate data protection disk to store encoded data. RAID 3 is
designed to provide a high transfer rate. RAID 3 organizes data by segmenting a user data
record into either bit- or byte-sized chunks and evenly spreading the data across several
drives in parallel. One of the drives acts as a parity drive. In this manner, every record that is
accessed is delivered at the full media rate of the drives that comprise the stripe group. The
drawback is that every record I/O stripe accesses every drive in the group. RAID 3
architecture should only be chosen in a case where it is virtually guaranteed that there will be
only a single, long process accessing sequential data. Video servers and graphics servers are
good examples of appropriate RAID 3 applications. RAID 3 architecture is also beneficial for
backups but becomes a poor choice in most other cases due to its limitations.
RAID 5 (Striping and Distributed Parity)
Data is striped across all available drives, with parity information distributed across all of
them. This method provides high performance, combined with failure protection, but requires
at least three disk drives. If one disk fails, all data can be recovered from the remaining disks,
because the parity information allows the missing data to be reconstructed. The disk write
performance on RAID 5 will be slower than on RAID 0 because the parity has to be generated
and written to disk.
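To illustrate how parity allows recovery, the sketch below uses XOR parity, the scheme commonly used for RAID 5. The helper name xor_blocks and the sample data are assumptions for this example; it ignores how parity is rotated across the drives.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe (sample values) and their parity block.
data = [b"back", b"up!!", b"data"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[1]: XOR of the surviving
# data blocks and the parity block reproduces the missing block.
rebuilt = xor_blocks([data[0], data[2], parity])
```

The extra XOR computation and the additional parity write on every update are the source of the write penalty relative to RAID 0.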
Tape drive
The mechanical impact on tape drives of providing data at too slow a rate is generally
underestimated and can result in slow performance and broken-down tape drives.
The tape drive should be operating at or above its lowest streaming data rate to achieve the
best life for the head, mechanism, and tape. If the data is not sent fast enough, the internal
buffer will empty and the drive will not write a continuous stream of data. At that point, the drive
has to perform head repositioning. This is also known as shoe shining and causes excessive
wear to the tape, the tape drive heads, and the mechanical tape drive components. Tape drives
have buffers in the data path that are large enough to prevent head repositioning from explicitly
slowing the backup further; however, the increased error rate from worn heads and media
causes more tape to be used and additional retries to be performed. This slows the backup
down, and the effect gets worse over time.
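The relationship between supply rate and buffer underrun can be estimated with simple arithmetic. The sketch below is a simplified model: the function name seconds_until_underrun and all figures (buffer size, rates) are hypothetical and do not describe any specific drive, and it assumes the drive writes at its lowest streaming rate while the host supplies data at a constant rate.

```python
import math


def seconds_until_underrun(buffer_mb: float,
                           min_streaming_mb_s: float,
                           supply_mb_s: float) -> float:
    """Estimate how long a full buffer lasts before the drive must reposition.

    Returns math.inf when the host keeps up with the lowest streaming rate.
    """
    if supply_mb_s >= min_streaming_mb_s:
        return math.inf  # the drive can keep streaming
    # The buffer drains at the difference between write rate and supply rate.
    return buffer_mb / (min_streaming_mb_s - supply_mb_s)


# Hypothetical figures: 256 MB buffer, 40 MB/s lowest streaming rate.
slow_host = seconds_until_underrun(256, 40, 24)
fast_host = seconds_until_underrun(256, 40, 60)
```

Under these assumed figures, a host delivering 24 MB/s empties the buffer after 16 seconds and forces a repositioning cycle, whereas a host delivering 60 MB/s never does.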
Storage Area Network (SAN)
The standard topology for mid-size and large environments is SAN-based. The SAN has its own
components and data paths. Each of them could become a bottleneck:
Any host bus adapter (of server, disk, tape and tape library bridge)