During backup, these servers should not be under heavy load from other applications that run I/O- and CPU-intensive operations, such as virus scans or a large number of database transactions.
Backup servers demand special attention to proper sizing because they are central to the backup process and run the required agents. Data passes into and out of the server’s main memory as it is read from the disk subsystem or the network and written to tape, so the server memory should be sized accordingly, for example in the case of an online database backup, where the database itself already uses a large amount of memory. Backup servers that receive data over the network also rely on fast connections. If the connection is too slow, a dedicated backup LAN or a move to a SAN architecture can improve performance.
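As a rough illustration, the required backup throughput can be estimated from the data volume and the available backup window and compared with what the network can realistically deliver. The following Python sketch uses purely illustrative figures; the data volume, backup window, and LAN throughput are assumptions, not measurements.

    # Rough sizing check: can the network feed the backup server fast enough?
    # All figures below are illustrative assumptions, not measured values.
    data_volume_gb = 500          # amount of data to back up
    backup_window_h = 8           # allowed backup window in hours

    required_mb_s = data_volume_gb * 1024 / (backup_window_h * 3600)
    print("Required throughput: %.1f MB/s" % required_mb_s)   # ~17.8 MB/s

    # Practical throughput of a shared 100 Mbit/s LAN is roughly 8-10 MB/s,
    # so this load would call for a dedicated backup LAN or a SAN.
    lan_mb_s = 9
    if required_mb_s > lan_mb_s:
        print("Shared LAN is too slow; consider a dedicated backup LAN or SAN")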
Application servers without any backup devices depend primarily on the performance of the connected networks and disks. In some cases, file systems with millions of small files (such as Windows NTFS volumes) can become a bottleneck.
Backup application
For database applications (such as Oracle, Microsoft SQL Server, and Exchange), use the backup integration provided by those applications, as they are tuned to make the best use of their data structures.
Use concurrency (multi-threading) if possible; this allows multiple backups to be interleaved onto the tape, reducing the effect of slow APIs and disk seeks for each individual stream. Note that this can have an impact on restore times, because a particular file set is interleaved with other data.
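The following Python sketch is a simplified illustration of this multiplexing, not the actual Data Protector implementation: blocks from several concurrent streams are interleaved round-robin into one tape stream, which also shows why restoring a single stream must read past blocks belonging to the others. Stream names and block counts are hypothetical.

    # Minimal sketch of tape multiplexing: blocks from several concurrent
    # backup streams are interleaved round-robin into one tape stream.
    streams = {
        "fs_agent_1": ["A1", "A2", "A3"],
        "fs_agent_2": ["B1", "B2", "B3"],
        "db_agent":   ["C1", "C2", "C3"],
    }

    tape = []
    while any(streams.values()):
        for name, blocks in streams.items():
            if blocks:                      # slow streams simply contribute less often
                tape.append((name, blocks.pop(0)))

    print(tape)
    # [('fs_agent_1', 'A1'), ('fs_agent_2', 'B1'), ('db_agent', 'C1'),
    #  ('fs_agent_1', 'A2'), ...]
    # To restore only fs_agent_1, the drive must still read or space over
    # the interleaved blocks of the other streams, which slows the restore.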
File system
There is a significant difference between the raw read data rate of a disk and the file system
read rate. This is because traversing a file system requires multiple, random disk accesses
whereas a continuous read is limited only by the optimized data rates of the disk.
The difference between these two modes becomes more significant as the average file size decreases. For file systems where files are typically smaller than 64 KB, sequential backups (such as raw disk backups) could be considered in order to achieve the data rates required by high-speed tape drives.
File system fragmentation can also be an issue, as it causes additional seeks and lowers throughput.
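The effect can be observed with a simple comparison of the two read patterns. The following Python sketch times a sequential read of one large file against a walk over a directory tree of many small files; the two paths are placeholders and must be adjusted to existing data before running.

    # Rough comparison of sequential read rate versus file-by-file read rate.
    # The paths are placeholders; point them at a large file and a directory
    # tree with many small files before running.
    import os, time

    def read_sequential(path, block=1024 * 1024):
        total, start = 0, time.time()
        with open(path, "rb") as f:
            while True:
                chunk = f.read(block)
                if not chunk:
                    break
                total += len(chunk)
        return total / (time.time() - start) / 1e6   # MB/s

    def read_tree(root):
        total, start = 0, time.time()
        for dirpath, _dirs, files in os.walk(root):  # each file adds open/seek overhead
            for name in files:
                with open(os.path.join(dirpath, name), "rb") as f:
                    total += len(f.read())
        return total / (time.time() - start) / 1e6   # MB/s

    print("large file : %.1f MB/s" % read_sequential("/data/large.img"))
    print("small files: %.1f MB/s" % read_tree("/data/small_files"))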
Disk
If the system performing the backup has a single hard disk drive, the factor most likely to restrict backup performance is the maximum transfer rate of that single disk. In a typical environment, the maximum throughput of a single spindle can be as low as 8 MB/s.
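To put this figure into perspective, the following short calculation (all numbers are illustrative assumptions) shows how long a single spindle at 8 MB/s would need for a 200 GB backup.

    # Illustrative only: time needed to back up a 200 GB disk at 8 MB/s.
    disk_gb, rate_mb_s = 200, 8
    hours = disk_gb * 1024 / rate_mb_s / 3600
    print("%.1f hours" % hours)   # roughly 7.1 hours for a single spindle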
High capacity disk
A high capacity disk is still one spindle with its own physical limitations. Vendors tend to advertise such disks as offering the “best price per MB,” but a single spindle can cause serious problems in high performance environments.
Two smaller spindles provide twice the performance of one large spindle. The backup performance of a large disk may be acceptable when there is no application load, but if an application writes to the same disk in parallel, total disk throughput can drop below 5 MB/s and the hit ratio of a disk array read cache can fall below 60%.
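A simple model explains the drop: if concurrent application I/O forces the disk head to move away between backup reads, every backup read pays a seek penalty before any data is transferred. The seek time, streaming rate, and read size in the following sketch are typical assumptions, not measured values.

    # Simple model of how interleaved application I/O degrades backup reads:
    # every backup read of chunk_mb is preceded by a seek back to the backup data.
    seek_s = 0.010          # average seek plus rotational latency (assumed)
    stream_mb_s = 40.0      # undisturbed sequential read rate of the spindle (assumed)
    chunk_mb = 0.0625       # 64 KB read between application accesses (assumed)

    effective = chunk_mb / (seek_s + chunk_mb / stream_mb_s)
    print("Effective backup rate: %.1f MB/s" % effective)   # about 5 MB/s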
Disk array
Benchmarks have shown that the theoretical performance of a disk array cannot be achieved with standard backup tools. The problem lies in the concurrency of the read processes, which cannot be distributed equally among all I/O channels and disk drives. To the backup software, the disk array appears simply as a set of disks whose internal organization and configuration are hidden. High capacity disks can cause additional problems that intelligent disk array caches cannot overcome; the array is then unable to provide reasonable throughput for backup and restore tasks because the volume of sequential reads and writes is too high.
For this reason, planning for about 50% of the theoretical disk array performance during backup (the 50% backup performance rule) has become a standard for disk array sizing.
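Applied to sizing, the rule means planning with roughly half of the vendor-quoted aggregate throughput. The following sketch uses illustrative array and tape drive figures (assumptions, not specifications) to estimate how many tape drives such an array could keep streaming during backup.

    # Applying the 50% rule when sizing backup for a disk array.
    # Array and tape figures are illustrative assumptions.
    array_theoretical_mb_s = 200            # vendor-quoted aggregate throughput
    usable_mb_s = array_theoretical_mb_s * 0.5

    tape_drive_mb_s = 30                    # native rate of one tape drive
    drives_fed = int(usable_mb_s // tape_drive_mb_s)
    print("Plan for %.0f MB/s, enough to stream %d tape drives"
          % (usable_mb_s, drives_fed))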