Administrator Guide

Damaged
Hardware
Typical Symptom Detected By Possible Corrective Actions
Set up email alerts on group and
SAN Headquarters
As a best practice, use the SAN Headquarters GUI to help identify hardware-related issues. SAN Headquarters tracks the array
model, service tag, serial number, RAID status and policy, and firmware version. In particular, SAN Headquarters provides
information about:
Hardware alerts
The SAN Headquarters Alerts panel shows hardware problems that might affect performance, such as a failed disk or a network
connection that is not Gigabit Ethernet.
Network retransmissions
A sustained high TCP retransmit rate (greater than 1 percent) might indicate a network hardware failure, insufficient server
resources, or insufficient network bandwidth.
RAID status
A degraded, reconstructing, or verifying RAID set might adversely affect performance. In some cases, performance might return
to normal when an operation completes.
Low pool capacity
Make sure that free space in each pool does not fall below the smaller of the following levels:
5 percent of pool capacity
100GB times the number of pool members
Otherwise, load-balancing, member-removal, and replication operations do not perform optimally. Low free space also negatively
affects the performance of thin-provisioned volumes.
About Analyzing SAN
If you are sure that no hardware problems exist, it is best practice to use SAN Headquarters to review performance statistics to
identify other potential problems. These statistics provide a good indication of overall group performance and might help you identify
areas where performance can be optimized.
The following statistics provide common indicators of performance problems: I/O latency, I/O load, IOPS, I/O size, network load,
network rate, and queue depth.
Average I/O Latency
One of the leading indicators of a healthy SAN is latency. Latency is the time from the receipt of the I/O request to the time that the
I/O is returned to the server.
Latency must be considered along with the average I/O size, because large I/O operations take longer to process than small I/O
operations.
The following guidelines apply to I/O operations with an average size of 16KB or less:
Less than 20 ms — In general, average latencies of less than 20 ms are acceptable.
20 ms to 50 ms — Sustained average latencies between 20 ms and 50 ms should be monitored closely. You might want to
reduce the workload or add storage resources to handle the load.
51 ms to 80 ms — Sustained average latencies between 51 ms and 80 ms should be monitored closely. Applications might
experience problems and noticeable delays. You might want to reduce the workload or add storage resources to handle
the load.
Greater than 80 ms — An average latency of more than 80 ms indicates a problem, especially if this value is sustained over time.
Most enterprise applications will experience problems if latencies exceed 100 ms. You should reduce the workload or add
storage resources to handle the load.
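The latency bands above can be expressed as a small lookup. This is a minimal Python sketch, not a vendor tool: the function name and the band labels are hypothetical, and it assumes that values falling between two listed bands (for example, 50.5 ms) belong to the higher band. It returns no classification when the average I/O size exceeds 16KB, because the guidelines are stated only for smaller operations.

```python
from typing import Optional

SMALL_IO_LIMIT_KB = 16  # the guidelines cover average I/O sizes of 16KB or less


def classify_latency(avg_latency_ms: float, avg_io_size_kb: float) -> Optional[str]:
    """Map a sustained average latency to one of the guideline bands.

    Returns None when the average I/O size is above 16KB, since the
    bands might not apply to larger operations.
    """
    if avg_io_size_kb > SMALL_IO_LIMIT_KB:
        return None
    if avg_latency_ms < 20:
        return "acceptable"
    if avg_latency_ms <= 50:
        return "monitor closely"
    if avg_latency_ms <= 80:
        return "monitor closely; applications might see delays"
    return "problem; reduce workload or add storage resources"
```

A caller would pass averages taken over a sustained period, because the guidelines describe sustained latencies and not momentary spikes.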
If the average I/O operation size is greater than 16KB, these latency guidelines might not apply. If latency statistics indicate a
performance problem, examine the total IOPS in the pools. The storage array configuration (disk drives and RAID level) determines