Deployment Guide


53-1003147-01

27 June 2014

Monitoring and Alerting

Policy Suite

Administrator's Guide

Supporting Fabric OS v7.3.0


PAGE 2
© 2014, Brocade Communications Systems, Inc. All Rights Reserved. Brocade, the B-wing symbol, Brocade Assurance, ADX, AnyIO, DCX, Fabric OS, FastIron, HyperEdge, ICX, MLX, MyBrocade, NetIron, OpenScript, VCS, VDX, and Vyatta are registered trademarks, and The Effortless Network and the On-Demand Data Center are trademarks of Brocade Communications Systems, Inc., in the United States and in other countries. Other brands and product names mentioned may be trademarks of others.
PAGE 3
Contents
Preface ... 7
    Document conventions ... 7
        Text formatting conventions ... 7
        Command syntax conventions ... 7
        Notes, cautions, and warnings
PAGE 4
        Switch Policy Status ... 33
MAPS Groups, Policies, Rules, and Actions ... 35
    MAPS groups overview ... 35
        Viewing group information ... 35
        Predefined groups
PAGE 5
Additional MAPS Features ... 85
    Fabric performance monitoring using MAPS ... 85
        Enabling MAPS Fabric Performance Impact monitoring ... 86
        Bottleneck detection with the MAPS dashboard ... 86
        MAPS Fabric Performance Impact monitoring and legacy bottleneck monitoring
PAGE 7
Preface
● Document conventions ... 7
● Brocade resources ... 9
● Contacting Brocade Technical Support ... 9
● Document feedback
PAGE 8
Notes, cautions, and warnings
Convention    Description
value         In Fibre Channel products, a fixed value provided as input to a command option is printed in plain text, for example, --show WWN.
[]            Syntax components displayed within square brackets are optional. Default responses to system prompts are enclosed in square brackets.
{x|y|z}       A choice of required parameters is enclosed in curly brackets separated by vertical bars. You must select one of the options.
PAGE 9
Brocade resources
Visit the Brocade website to locate related documentation for your product and additional Brocade resources. You can download additional publications supporting your product at www.brocade.com. Select the Brocade Products tab to locate your product, then click the Brocade product name or image to open the individual product page. The user manuals are available in the resources module at the bottom of the page under the Documentation category.
PAGE 10
• Brocade Supplemental Support augments your existing OEM support contract, providing direct access to Brocade expertise. For more information, contact Brocade or your OEM.
• For questions regarding service levels and response times, contact your OEM/Solution Provider.

Document feedback
To send feedback and report errors in the documentation, you can use the feedback form posted with the document or you can e-mail the documentation team.
PAGE 11
About This Document
● Supported hardware and software ... 11
● What's new in this document ... 12

Supported hardware and software
In those instances in which procedures or parts of procedures documented here apply to some switches but not to others, this list identifies exactly which switches are supported and which are not.
PAGE 12
TABLE 2 Brocade DCX Backbone family
Gen 4 platform (8-Gbps)    Gen 5 platform (16-Gbps)
Brocade DCX                Brocade DCX 8510-4
Brocade DCX-4S             Brocade DCX 8510-8

What's new in this document
The following content is new or significantly revised for this release of this document:
• Added new options in mapsSam --show command to show specific detail
• Added new options in mapsRule and logicalGroup commands to force changes
• Added new options in mapsConfig comman
PAGE 13
Monitoring and Alerting Policy Suite Overview
● MAPS overview ... 13
● MAPS license requirements ... 14
● MAPS configuration files ... 14
● MAPS interoperability with other features
PAGE 14
MAPS license requirements
CAUTION
MAPS activation is a non-reversible process. Downgrading the switch firmware to an earlier version of Fabric OS will enable Fabric Watch with its last configured settings. If you then upgrade the switch firmware again to the later version (such as Fabric OS 7.3.0), Fabric Watch will continue to be enabled.

MAPS automatically monitors the management port (Eth0 or Bond0), as the rule for Ethernet port monitoring is present in all three default policies.
PAGE 15
Restrictions on MAPS monitoring
TABLE 3 Interactions between Fabric OS features and MAPS
Feature            MAPS interaction
Virtual Fabrics    When using Virtual Fabrics, different logical switches in a chassis can have different MAPS configurations.
Fabric Watch       MAPS cannot coexist with Fabric Watch. Refer to Fabric Watch to MAPS migration on page 16 for information on this migration.
PAGE 16
Firmware downgrade considerations
When downgrading from Fabric OS 7.3.0 to any previous version of the operating system, the following MAPS-related behaviors should be expected:
• When an active Control Processor (CP) is running Fabric OS 7.3.0 or 7.2.0 with MAPS disabled, and the standby device has an earlier version of Fabric OS, High Availability will be synchronized, but MAPS will not be allowed to be enabled until the firmware on the standby device is upgraded.
PAGE 17
Differences between Fabric Watch and MAPS configurations
The monitoring and alerting configurations available in MAPS are not as complex as those available in Fabric Watch; as a consequence, MAPS lacks some of the functionality available in Fabric Watch. The following table shows the differences between Fabric Watch and MAPS configurations and functionality.
PAGE 19
MAPS Setup and Operation
● Initial MAPS setup ... 19
● Monitoring across different time windows ... 21
● Setting the active MAPS policy ... 22
● Pausing MAPS monitoring
PAGE 20
Enabling MAPS without using Fabric Watch rules
The following example enables MAPS, loads the policy “dflt_conservative_policy”, sets the actions to “none”, and then sets approved actions.
switch:admin> mapsconfig --fwconvert
switch:admin> mapsconfig --enablemaps -policy dflt_conservative_policy
WARNING: This command enables MAPS and replaces all Fabric Watch configurations and monitoring. Once MAPS is enabled, the Fabric Watch configuration can't be converted to MAPS.
PAGE 21
Monitoring across different time windows
The following example enables MAPS, loads the policy “dflt_aggressive_policy”, sets the actions to “none”, and then sets approved actions.
switch:admin> mapsconfig --enablemaps -policy dflt_aggressive_policy
WARNING: This command enables MAPS and replaces all Fabric Watch configurations and monitoring. Once MAPS is enabled, the Fabric Watch configuration can't be converted to MAPS.
PAGE 22
Setting the active MAPS policy Both of the following cases could indicate potential issues in the fabric. Configuring rules to monitor these conditions allows you to correct issues before they become critical. In the following example, the definition for crc_severe specifies that if the change in the CRC counter in the last minute is greater than 5, it must trigger an e-mail alert and SNMP trap. This rule monitors for the severe condition.
PAGE 23
Pausing MAPS monitoring
The following example sets “dflt_moderate_policy” as the active MAPS policy.
switch:admin> mapspolicy --enable -policy dflt_moderate_policy
switch:admin> mapspolicy --show -summary
Policy Name                 Number of Rules
------------------------------------------------------------
dflt_aggressive_policy      : 196
dflt_conservative_policy    : 198
dflt_moderate_policy        : 198
fw_default_policy           : 109
fw_custom_policy            : 109
fw_active_policy            : 109
Active Policy is 'dflt_moderate_policy'.
PAGE 25
MAPS Elements and Categories
● MAPS structural elements ... 25
● MAPS monitoring categories ... 25

MAPS structural elements
The Monitoring and Alerting Policy Suite (MAPS) has the following structural elements: categories, groups, rules, and policies.
PAGE 26
Port Health
• Security Violations on page 28
• Fabric State Changes on page 29
• Switch Resource on page 30
• Traffic Performance on page 31
• FCIP Health on page 32
• Fabric Performance Impact on page 32
In addition to being able to set alerts and other actions based on these categories, the MAPS dashboard displays their status. Refer to MAPS dashboard overview on page 75 for information on using the MAPS dashboard.
PAGE 27
Port health and CRC monitoring
TABLE 6 Port Health category parameters (Continued)
Monitored parameter          Description
Class 3 timeouts (C3TXTO)    The number of Class 3 discard frames because of timeouts.
State changes (STATE_CHG)    The state of the port has changed for one of the following reasons:
SFP current (CURRENT)        The amperage supplied to the SFP transceiver in milliamps. Current area events indicate hardware failures.
SFP receive power (RXP)      The power of the incoming laser in microwatts (µW).
PAGE 28
Security Violations
The following table lists the monitored parameters in this category. Possible states for all FRU measures are faulty, inserted, on, off, ready, and up.
TABLE 7 FRU Health category parameters
Monitored parameter          Description
Power Supplies (PS_STATE)    State of a power supply has changed.
Fans (FAN_STATE)             State of a fan has changed.
Blades (BLADE_STATE)         State of a slot has changed.
SFPs (SFP_STATE)             State of the SFP transceiver has changed.
PAGE 29
TABLE 8 Security Violations category parameters (Continued)
Monitored parameter        Description
TS out of sync (SEC_TS)    Time Server (TS) violations, which occur when an out-of-synchronization error has been detected.

Fabric State Changes
The Fabric State Changes category contains areas of potential inter-device problems, such as zone changes, fabric segmentation, E_Port down, fabric reconfiguration, domain ID changes, and fabric logins.
PAGE 30
Switch Resource
TABLE 9 Fabric State Changes category parameters (Continued)
Monitored parameter                                                        Description
Percentage of devices in a FCR-enabled backbone fabric (LSAN_DEVCNT_PER)   Monitors the percentage of active devices in a Fibre Channel router-enabled backbone fabric relative to the maximum number of devices permitted in the metaSAN. This percentage includes devices imported from any attached edge fabrics.
PAGE 31
Traffic Performance
The Traffic Performance category groups areas that track the source and destination of traffic. You can use traffic thresholds and alarms to determine traffic load and flow and to reallocate resources appropriately. The following table lists the monitored parameters in this category.
TABLE 11 Traffic Performance category parameters
Monitored parameter    Description
Receive bandwidth      The percentage of port bandwidth being used by RX traffic.
PAGE 32
FCIP Health
The FCIP Health category enables you to define rules for FCIP health, including circuit state changes, circuit state utilization, and packet loss. The following tables list the monitored parameters in this category. The first table lists those FCIP Health parameters monitored on all Brocade platforms.
PAGE 33
Switch Policy Status
enough. To achieve this, MAPS monitors ports for the following states: IO_PERF_IMPACT and IO_FRAME_LOSS. The following table lists the monitored parameters in this category.
TABLE 14 Fabric Performance Impact category parameters
Monitored Parameter    Description
IO_PERF_IMPACT         When a port does not quickly clear the frames sent through it, this can cause a backup in the fabric.
PAGE 34
MAPS Elements and Categories
TABLE 15 Switch Policy Status category parameters (Continued)
Monitored parameter            Description
WWN (WWN_DOWN)                 Faulty WWN card (applies to modular switches only).
Core Blade (DOWN_CORE)         Faulty core blades (applies to modular switches only).
Faulty blades (FAULTY_BLADE)   Faulty blades (applies to modular switches only).
High Availability (HA_SYNC)    Switch does not have a redundant CP (this applies to modular switches only).
PAGE 35
MAPS Groups, Policies, Rules, and Actions
● MAPS groups overview ... 35
● MAPS policies overview ... 43
● MAPS conditions ... 49
● MAPS rules overview
PAGE 36
Predefined groups
logicalGroup --show fpm1 for the active Flow Vision flow “fpm1” that has been imported into, and is being monitored through, MAPS.
PAGE 37
MAPS Groups, Policies, Rules, and Actions
TABLE 16 Predefined MAPS groups (Continued)
Predefined group name    Object type    Description
ALL_TUNNELS              Tunnel         All FCIP tunnels in the switch. This is supported on the Brocade 7840 switch only.
ALL_SFP                  SFP            All small form-factor pluggable (SFP) transceivers.
ALL_10GSWL_SFP           SFP            All 10-Gbps Short Wavelength (SWL) SFP transceivers on FC Ports in the logical switch.
PAGE 38
User-defined groups
TABLE 16 Predefined MAPS groups (Continued)
Predefined group name    Object type    Description
ALL_TUNNEL_F_QOS         QoS
SWITCH                   Switch         Default group used for defining rules on parameters that are global for the whole switch level, for example, security violations or fabric health.
CHASSIS                  Chassis        Default group used for defining rules on parameters that are global for the whole chassis, for example, CPU or flash.
PAGE 39
Modifying a static user-defined group
add or remove a member from the group you would have to use the logicalGroup command and specify what you want to do (add or remove a member).
To create a static group containing a specific set of ports, complete the following steps.
1. Connect to the switch and log in using an account with admin permissions.
2. Enter logicalGroup --create group_name -type port -members "member_list".
PAGE 40
MAPS Groups, Policies, Rules, and Actions As an example of a dynamic definition, you could specify a port name or an attached device node WWN and all ports which match the port name or device node WWN will be automatically included in this group. As soon as a port meets the criteria, it is automatically added to the group. As soon as it ceases to meet the criteria, it is removed from the group.
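The dynamic membership behavior described above can be pictured with a short Python sketch. This is an illustration only and does not run on the switch: the function name dynamic_members is hypothetical, and the shell-style wildcard used here is an assumption for the example, not the pattern syntax that logicalGroup accepts.

```python
from fnmatch import fnmatchcase


def dynamic_members(port_names, pattern):
    """Return the port indexes whose current name matches the pattern.

    port_names maps a port index to its assigned port name. The wildcard
    matching is an assumption made for this sketch only.
    """
    return sorted(idx for idx, name in port_names.items()
                  if fnmatchcase(name, pattern))


# Hypothetical port names for illustration.
ports = {1: "ABC_port1", 2: "ABC_port2", 3: "XYZ_port3"}
before = dynamic_members(ports, "ABC_*")

# Renaming ports changes membership the next time the group is evaluated:
# port 3 now meets the criteria and port 1 no longer does.
ports[3] = "ABC_port3"
ports[1] = "spare"
after = dynamic_members(ports, "ABC_*")
```

Membership is recomputed from the current port attributes each time, which is why no explicit add or remove step appears in the sketch.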
PAGE 41
Modifying a dynamic user-defined group
MAPS allows you to change the definition pattern used to specify a dynamic user-defined group after you have created it. To modify a dynamic user-defined group after you have created it, complete the following steps.
NOTE
The values for group_name and feature_name must match existing values for the group and feature names. You can only specify one feature as part of a group definition.
1.
PAGE 42
Restoring group membership group and then deletes the group. If a logical group is present in user-defined rules, the -force option deletes all the rules that are configured with the given group and then deletes the group. The following example shows that the user-defined group GOBLIN_PORTS exists, deletes the group, and then shows that the group has been deleted.
PAGE 43
MAPS policies overview The following example restores all the deleted members and removes the added members of the GOBLIN_PORTS group. First it shows the detailed view of the modified GOBLIN_PORTS group, then restores the membership of the group and then it shows the post-restore group details. Notice the changes in the MemberCount, Members, Added Members, and Deleted Members fields between the two listings.
PAGE 44
Predefined policies
defALL_D_PORTSLOSS_SYNC_3        RASLOG,SNMP,EMAIL    ALL_D_PORTS(LOSS_SYNC/MIN>3
defALL_D_PORTSCRC_H90            RASLOG,SNMP,EMAIL
defALL_D_PORTSPE_H90             RASLOG,SNMP,EMAIL
defALL_D_PORTSITW_H90            RASLOG,SNMP,EMAIL
defALL_D_PORTSLF_H90             RASLOG,SNMP,EMAIL
defALL_D_PORTSLOSS_SYNC_H90      RASLOG,SNMP,EMAIL
defALL_D_PORTSCRC_D1500          RASLOG,SNMP,EMAIL
defALL_D_PORTSPE_D1500           RASLOG,SNMP,EMAIL
defALL_D_PORTSITW_D1500          RASLOG,SNMP,EMAIL
defALL_D_PORTSLF_D1500           RASLOG,SNMP,EMAIL
defALL_D_PORTSLOSS_SYNC_D1500    RASLOG,SNMP,EMAIL
PAGE 45
Fabric Watch legacy policies
When you migrate from Fabric Watch to MAPS, the following three policies are automatically created if you have used mapsConfig --fwconvert. If you do not use this command, then these policies are not created.
• fw_custom_policy
This policy contains all of the monitoring rules based on the custom thresholds configured in Fabric Watch.
PAGE 46
MAPS Groups, Policies, Rules, and Actions
The following example shows the result of using the --show -summary option.
switch:admin> mapsPolicy --show -summary
Policy Name                 Number of Rules
------------------------------------------------------------
dflt_aggressive_policy      : 196
dflt_conservative_policy    : 198
dflt_moderate_policy        : 198
fw_default_policy           : 109
fw_custom_policy            : 109
fw_active_policy            : 109
Active Policy is 'dflt_moderate_policy'.
PAGE 47
Creating a policy The following example shows an excerpted result of using the --show -all option. The entire listing is too long (over 930 lines) to include.
PAGE 48
Enabling a policy
• To create a rule, enter mapsRule --create rule_name -group group_name -monitor monitored_threshold -timebase timebase -op op_value -value value -action action -policy policy_name.
• To clone an existing rule, enter mapsRule --clone rule_name -name clone_rule_name.
• To modify existing rules, enter mapsRule --config rule_name parameters.
The following example creates a policy by cloning another policy, and then adds a rule to the new policy.
PAGE 49
Modifying a default policy The following example adds a rule to the policy named daily_policy, displays the policy, and then re-enables the policy so the change can become active.
PAGE 50
Threshold values this threshold must be exceeded during the 60-second time base. If the counter reaches 11 within that 60 seconds, the rule would trigger. NOTE MAPS conditions are applied on a per-port basis, not switch- or fabric-wide. For example, 20 ports that each get 1 CRC counter would not trigger a “greater than 10” rule. Threshold values Thresholds are the values at which potential problems may occur. In configuring a rule you can specify a threshold value that, when exceeded, triggers an action.
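The per-port nature of MAPS conditions can be sketched in a few lines of Python. This is a model of the evaluation described in the note above, not MAPS code; the function name triggered_ports and the shape of the input (one counter change per port for a single 60-second time base) are assumptions made for the illustration.

```python
def triggered_ports(crc_deltas, threshold=10):
    """Return the ports whose own CRC change exceeds the threshold.

    crc_deltas maps a port index to the change in its CRC counter during
    one 60-second time base. Each port is compared on its own; counts are
    never summed across ports.
    """
    return sorted(port for port, delta in crc_deltas.items()
                  if delta > threshold)


# Twenty ports with one CRC error each: 20 errors switch-wide, but no
# single port exceeds 10, so a "greater than 10" rule does not trigger.
spread = {port: 1 for port in range(20)}

# One port reaches 11 within the time base and triggers; a port at
# exactly 10 does not, because the comparison is strictly "greater than".
burst = {0: 11, 1: 10}

spread_result = triggered_ports(spread)
burst_result = triggered_ports(burst)
```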
PAGE 51
Enabling or disabling rule actions at a global level be issued but the port would not be fenced. To enable global actions, use the mapsConfig --actions commands. For more details, refer to Enabling or disabling rule actions at a global level on page 51. Refer to the Fabric OS Command Reference for further details on using the mapsConfig command.
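The relationship between rule-level actions and globally enabled actions can be modeled as a set intersection. The following Python sketch is an illustration under that assumption; effective_actions is a hypothetical helper, and the action names are the ones used in this guide.

```python
def effective_actions(rule_actions, global_actions):
    """Return the rule actions that are also enabled globally.

    An action named in a rule takes effect only when the same action is
    enabled at the global (switch) level.
    """
    return set(rule_actions) & set(global_actions)


# A rule asks for a RASLog message and port fencing, but only RASLog and
# e-mail are enabled globally: the message is issued, the port is not fenced.
result = effective_actions({"raslog", "fence"}, {"raslog", "email"})

# With no global actions enabled, no rule action takes effect.
nothing = effective_actions({"raslog", "fence"}, set())
```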
PAGE 52
RASLog messages To disable all actions, enter mapsConfig --actions none. The keyword none cannot be combined with any other action. The following example shows that RASLog notification (raslog) is not an active action on the switch, and then adds it to the list of allowed actions. switch:admin> mapsconfig --show Configured Notifications: EMAIL,DECOM Mail Recipient: admin@mycompany.
PAGE 53
E-mail alert SNMP MIB support MAPS requires SNMP management information base (MIB) support on the device for management information collection. For additional information on SNMP MIB support, refer to the Fabric OS Administrator's Guide. E-mail alert An “e-mail alert” action sends information about the event to one or more specified e-mail addresses. The e-mail alert specifies the threshold and describes the event, much like an error message.
PAGE 54
Port decommissioning and firmware downgrades BNA can decommission F_Ports based on CRC, ITW, PE, LR, STATE_CHG, or C3TXTO criteria. MAPS notifications are integrated with BNA, which in turn must coordinate with the switch and the end device to orchestrate the port decommissioning. If BNA is not configured on a switch, MAPS will fence the F_Port. For more information on port fencing, port decommissioning, and related failure codes, refer to the Fabric OS Administrator's Guide.
PAGE 55
Enabling port fencing
The following example enables port fencing and port decommissioning for a switch and then displays the confirmation.
switch246:FID128:admin> mapsconfig --actions fence,decom
switch246:admin> mapsconfig --show
Configured Notifications: FENCE,DECOM
Mail Recipient: Not Configured
Paused members :
===============
PORT :
CIRCUIT :
SFP :
The following example makes port fencing and port decommissioning part of a rule and then displays the confirmation.
PAGE 56
Switch critical The following example makes port fencing part of a rule and then displays the confirmation.
PAGE 57
Creating a rule The following example shows all rules on the switch. Notice that the policies are not shown in the output.
PAGE 58
Creating a rule for a flow
To accommodate creating a rule for a flow, mapsRule accepts a flow name as a value for the -group parameter. The following example illustrates the structure.
switch246:FID24:admin> mapsrule --create check_crc2 -monitor crc -group MyFlow -t min -op g -value 15 -action raslog -policy daily_policy2

Modifying a MAPS policy rule
You can modify only user-defined MAPS policy rules. You cannot modify the default MAPS policy rules.
PAGE 59
Cloning a rule Changing multiple parameters The following example modifies the rule “check_crc2” to generate a RASLog message and an e-mail message if the CRC counter for a group of critical ports is greater than 15 in an hour (rather than 10 in a minute). This rule is part of the active policy, so the policy is re-enabled for the change to take effect.
PAGE 60
Rule deletion Cloning a rule and changing its values When you clone a rule, you can also specify the parameters you want to be different from the old rule in the new rule. To modify the rule, use the --config keyword. The following example clones “myOldRule” as “myNewRule” and changes the flow that is being monitored to “flow2” and assigns it the monitor “monitor2”. It then displays the rule.
PAGE 61
Sending alerts using e-mail The following example shows that the rule port_test_rule35 exists in test_policy_1, deletes the rule from that policy using the -force keyword, and then shows that the rule has been deleted from the policy.
PAGE 62
E-mail alert testing
Specifying multiple e-mail addresses for alerts
The following example specifies multiple e-mail addresses for e-mail alerts on the switch, and then displays the settings. It assumes that you have already correctly configured and validated the e-mail server.
switch:admin> mapsconfig --emailcfg -address admin1@mycompany.com, admin2@mycompany.com
switch:admin> mapsconfig --show
Configured Notifications: RASLOG,EMAIL,FENCE,SW_CRITICAL
Mail Recipient: admin1@mycompany.com, admin2@mycompany.
PAGE 63
Viewing configured e-mail server information 1. Connect to the switch and log in using an account with admin permissions. 2. Enter relayConfig --config -rla_ip relay IP address -rla_dname “relay domain name”. The quotation marks are required. There is no confirmation of this action. 3. Optional: Enter relayConfig --show. This displays the configured e-mail server host address and domain name. The following example configures the relay host address and relay domain name for the switch, and then displays it.
PAGE 64
MAPS Groups, Policies, Rules, and Actions
The following example deletes the configured relay host address and relay domain name for the switch, and then shows that these items have been deleted.
switch:admin> relayconfig --delete
switch:admin> relayconfig --show
Relay Host:
Relay Domain Name:
For additional information on the relay host and the relayConfig command, refer to the Fabric OS Command Reference.
PAGE 65
Port Monitoring Using MAPS
● Port monitoring and pausing ... 65
● Monitoring similar ports using the same rules ... 65
● Port monitoring using port names ... 66
● Port monitoring using device WWNs
PAGE 66
Port monitoring using port names
Fabric OS allows you to monitor ports based on their assigned names. Because the port name is an editable attribute of a port, you can name ports based on the device to which they are connected. You can then group the ports based on their port names. For example, if ports 1 to 10 are connected to devices from the ABC organization, you can name these ports ABC_port1, ABC_port2, and so on through ABC_port10.
PAGE 67
Adding missing ports to a group 1. Connect to the switch and log in using an account with admin permissions. 2. Enter logicalGroup --addmember group_name -member member_list The element you want to add must be the same type as those already in the group (port, circuit, or SFP transceiver). You can specify either a single port, or specify multiple ports as either individual IDs separated by commas, or a range where the IDs are separated by a hyphen. 3.
PAGE 68
Removing ports from a group The following example walks through the steps above for the group ALL_HOST_PORTS, first showing that port 5 is not part of the group, then adding it to the group, then showing that it has been added to the group.
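Separately from the CLI walkthrough, the member-list format described in the steps (individual IDs separated by commas, or a range with the IDs separated by a hyphen) can be illustrated with a small Python parser. This is a sketch under assumptions: parse_member_list is a hypothetical helper, and it handles only plain port index numbers, not slot/port notation.

```python
def parse_member_list(spec):
    """Expand a member list such as "1,3,8-10" into sorted port indexes.

    Assumes plain integer port IDs. Items are separated by commas and a
    range is written as low-high, inclusive at both ends.
    """
    members = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a stray comma
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if low > high:
                raise ValueError(f"invalid range: {part}")
            members.update(range(low, high + 1))
        else:
            members.add(int(part))
    return sorted(members)


single = parse_member_list("5")
mixed = parse_member_list("1,3,8-10")
```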
PAGE 69
Port Monitoring Using MAPS Rules based on the ALL_D_PORTS group are part of the default policies, and have error thresholds spanning multiple time windows or bases. If any of the rules are triggered, MAPS triggers the action configured for the rule, alerts the fabric service module if appropriate, and caches the data in the dashboard.
PAGE 70
Port Monitoring Using MAPS Using the mapsDb --show command shows any error or rule violation during diagnostics tests on a D_Port.
PAGE 71
Monitoring Flow Vision Flows with MAPS
● Viewing Flow Vision Flow Monitor data with MAPS ... 71
● Examples of using MAPS to monitor traffic performance ... 73
● Examples of monitoring flows at the sub-flow level
PAGE 72
Restrictions on Flow Vision flow monitoring
‐ Number of SCSI I/O bytes read as recorded for the flow.
‐ Number of SCSI I/O bytes written as recorded for the flow.
For more information on Flow Vision, refer to the Flow Vision Administrator's Guide.
Statistics produced by the FV Flow Monitor feature are displayed in the MAPS dashboard in the “Switch Health Report” section's Traffic Performance subsection. This data is not included in the History Data section of the MAPS dashboard.
PAGE 73
RASLog message is generated. If you are certain that you want to import that flow and monitor it using the existing rules for that flow, you must use the -force keyword as part of the mapsConfig --import command. The following example demonstrates importing a flow named “myExFlow” using the -force keyword.
switch:admin> mapsconfig --import myExFlow -force

Sub-flow monitoring and MAPS
MAPS supports monitoring both static and learned flows (flows created using an asterisk (*)).
PAGE 74
Examples of monitoring flows at the sub-flow level
Monitoring frames for a specified set of criteria
The following example watches for frames in a flow going through a port that contain SCSI ABORT sequence markers.
switch246:admin> flow --create abtsflow -feature mon -ingrport 128 -frametype abts
switch246:admin> mapsconfig --import abtsflow
You can then define rules for this flow (group).
PAGE 75
MAPS Dashboard
● MAPS dashboard overview ... 75
● MAPS dashboard sections ... 76
● Viewing the MAPS dashboard
PAGE 76
MAPS dashboard sections
The MAPS dashboard output is divided into three main sections: high-level dashboard information, general switch health information, and categorized switch health information. A history section is displayed if you enter mapsDb --show all.

Dashboard high-level information section
The dashboard high-level information section displays basic dashboard data: the time the dashboard was started, the name of the active policy, and any fenced ports.
PAGE 77
MAPS Dashboard
• FCIP Health on page 32
• Fabric Performance Impact on page 32
The following output extract shows a sample Summary Report section.
3.
PAGE 78
Notes on dashboard data information recorded since the previous midnight. The historical data log stores the last seven days on which errors were recorded (not the last seven calendar days, but the last seven days, irrespective of any interval between these days). If a day has no errors, that day is not included in the count or the results. Using this information, you can get an idea of the errors seen on the switch even though none of the rules might have been violated.
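The retention rule for the historical data log (the last seven days on which errors were recorded, skipping error-free days) can be modeled with a short Python sketch. The class name ErrorHistory and its interface are hypothetical; the sketch assumes days are recorded in chronological order and says nothing about how MAPS stores the data internally.

```python
from collections import OrderedDict
from datetime import date


class ErrorHistory:
    """Keep error counts for the last N days on which errors occurred."""

    def __init__(self, max_days=7):
        self.max_days = max_days
        self.days = OrderedDict()  # day -> error count, oldest first

    def record(self, day, error_count):
        # A day with no errors is not stored and does not use up a slot.
        if error_count <= 0:
            return
        self.days[day] = self.days.get(day, 0) + error_count
        # Drop the oldest stored day once more than max_days are kept.
        while len(self.days) > self.max_days:
            self.days.popitem(last=False)


history = ErrorHistory()
for d in range(1, 10):
    # 4 June has no errors; every other day records d errors.
    history.record(date(2014, 6, d), 0 if d == 4 else d)

kept_days = [day.day for day in history.days]
```

In this run eight days have errors, so the oldest one (1 June) is dropped and the error-free day (4 June) never appears, leaving seven stored days that span eight calendar days.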
PAGE 79
Viewing a summary switch status report
1. Connect to the switch and log in using an account with admin permissions.
2. Enter mapsDb --show followed by the scope parameter: all, history, or details. Entering details allows you to specify either a specific day or a specific hour of the current day.
The following example shows a typical result of entering mapsDb --show all.
PAGE 80
1. Connect to the switch and log in using an account with admin permissions.
2. Enter mapsDb --show with no other parameters to display the summary status.
The following example displays the general status of the switch (CRITICAL) and lists the overall status of the monitoring categories for the current day (measured since midnight) and for the last seven days. If any of the categories are shown as being “Out of range”, the last five conditions that caused this status are listed.
PAGE 81
Viewing a detailed switch status report
Sub-flow rule violation summaries
In the MAPS dashboard you can view a summary of all sub-flows that have rule violations. When a rule is triggered, the corresponding RASLog rule trigger appears in the “Rules Affecting Health” sub-section of the dashboard as part of the Traffic Performance category. In this category, the five flows or sub-flows with the highest number of violations since the previous midnight are listed.
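The "five flows with the highest number of violations" selection can be illustrated with a short ranking sketch. The flow names and counts below are invented for the example, and `top_violators` is a hypothetical helper, not a MAPS function.

```python
from collections import Counter

TOP_N = 5  # the dashboard lists the five flows or sub-flows with the most violations


def top_violators(violations, n=TOP_N):
    """Return (flow_name, count) pairs for the n flows with the most violations."""
    counts = Counter(violations)
    return counts.most_common(n)


# Hypothetical data: one list entry per rule violation since midnight.
sample = (["flowA"] * 4 + ["flowB"] * 9 + ["flowC"] * 1 +
          ["flowD"] * 2 + ["flowE"] * 6 + ["flowF"] * 3)
ranked = top_violators(sample)
for name, count in ranked:
    print(f"{name}: {count}")
```

Six flows have violations in this sample, so the one with the fewest (flowC) does not appear in the ranked list.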
PAGE 82
1. Connect to the switch and log in using an account with admin permissions.
2. Enter mapsDb --show all to display the detailed status.
The following example shows the detailed switch status. The status includes the summary switch status, plus port performance data for the current day (measured since midnight). If a monitoring rule is triggered, the corresponding RASLog message appears under the summary section of the dashboard.
PAGE 83
Viewing historical data
To see a summarized status history of the switch since the previous midnight, enter mapsDb --show history.
NOTE The output of this command differs depending on the platform on which you run it. On fixed-port switches, ports are shown in port index format; on chassis-based platforms, ports are shown in slot/port format.
To view a summarized history of the switch status, complete the following steps.
1.
PAGE 84
The following example displays historical port performance data for January 9, 2014 for a chassis-based platform. Because the health status of the current switch policy is CRITICAL, the sections “Contributing Factors” and “Rules Affecting Health” are displayed. If the current switch policy status were HEALTHY, neither of these sections would be displayed. The column headings in the example have been edited slightly so that the example displays clearly.
PAGE 85
Additional MAPS Features
● Fabric performance monitoring using MAPS 85
● Scalability limit monitoring 88
● MAPS Service Availability Module 92
● Brocade 7840 FCIP monitoring using MAPS
PAGE 86
NOTE No existing bottleneck daemon logic or behaviors have been removed from Fabric OS 7.3.0.
Enabling MAPS Fabric Performance Impact monitoring
NOTE If you want to use MAPS Fabric Performance Impact (FPI) monitoring, the legacy bottleneck monitoring feature cannot be enabled.
Use the following steps to enable MAPS FPI monitoring. This is not necessary on new switches already running Fabric OS 7.3.
PAGE 87
In the following extract, the last three lines list bottlenecks, with the final bottleneck caused by a timeout rather than a numeric value. The column headings in the example have been edited slightly so that the example displays clearly.
4.
PAGE 88
MAPS Fabric Performance Impact monitoring and legacy bottleneck monitoring
The following conditions apply to MAPS Fabric Performance Impact (FPI) monitoring and legacy bottleneck monitoring:
• MAPS FPI monitoring and the legacy bottleneck monitoring feature are mutually exclusive.
PAGE 89
Layer 2 fabric device connection monitoring
Fabric State Changes |Out of operating range |No Errors |
Switch Resource |In operating range |Out of operating range |
Traffic Performance |In operating range |In operating range |
FCIP Health |Not applicable |Not applicable |
Fabric Performance Impact|In operating range |In operating range |
3.
PAGE 90
Zone configuration size monitoring
The following example shows a typical RASLog entry for exceeding the threshold for the number of Fibre Channel routers in the Backbone fabric:
2014/05/27-17:02:00, [MAPS-1003], 14816, SLOT 4 | FID 20, WARNING, switch_20, Switch, Condition=SWITCH(BB_FCR_CNT>12), Current Value:[BB_FCR_CNT,13], RuleName= defSWITCHBB_FCR_CNT_12, Dashboard Category=Fabric State Changes.
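If you collect such entries for offline analysis, a regular expression can split the entry above into fields. The following is a minimal sketch that assumes every entry has exactly the layout of this one example. The field names (`seq`, `origin`, `obj`, and so on) are labels chosen here for illustration, not terms from Fabric OS, and other MAPS messages may be laid out differently.

```python
import re

# The sample entry from this guide, reproduced verbatim.
ENTRY = ("2014/05/27-17:02:00, [MAPS-1003], 14816, SLOT 4 | FID 20, WARNING, "
         "switch_20, Switch, Condition=SWITCH(BB_FCR_CNT>12), "
         "Current Value:[BB_FCR_CNT,13], RuleName= defSWITCHBB_FCR_CNT_12, "
         "Dashboard Category=Fabric State Changes.")

# Assumed layout, inferred only from the single example above.
PATTERN = re.compile(
    r"^(?P<timestamp>\S+), \[(?P<msg_id>[A-Z]+-\d+)\], (?P<seq>\d+), "
    r"(?P<origin>[^,]+), (?P<severity>[A-Z]+), (?P<switch>[^,]+), (?P<obj>[^,]+), "
    r"Condition=(?P<condition>\S+), "
    r"Current Value:\[(?P<metric>\w+),\s*(?P<value>[\d.]+)\], "
    r"RuleName=\s*(?P<rule>\S+), "
    r"Dashboard Category=(?P<category>.+?)\.?$"
)


def parse_maps_raslog(line):
    """Return a dict of fields, or None if the line does not match the layout."""
    match = PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["value"] = float(fields["value"])  # reported metric value as a number
    return fields


parsed = parse_maps_raslog(ENTRY)
print(parsed["rule"], parsed["metric"], parsed["value"], parsed["category"])
```

Returning `None` for non-matching lines lets a caller skip unrelated log entries instead of failing on them.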
PAGE 91
• The “LSAN-imported device” metric is only monitored in switches that are a part of a Backbone fabric.
• Scalability limits that are determined internally by a device cannot be monitored by MAPS.
Default rules for scalability limit monitoring
The following table lists the scalability monitoring default rules in each of the default policies, and shows the actions and condition for each rule.
PAGE 92
MAPS Service Availability Module
Rule for LSAN device counts
In the following example, when the total device count in all switches that are part of the metaSAN (edge plus Backbone) fabric rises above 90 percent of the total permissible count in the fabric, MAPS reports the threshold violation using a RASLog message on that platform.
PAGE 93
Using only “--show”
In this form, the report lists the following information for each port:
• Port Number
• Port type
‐ D (disabled port)
‐ DIA (D_Port)
‐ DP (persistently disabled port)
‐ E (E_Port)
‐ F (F_Port)
‐ G (G_Port)
‐ T (Trunk port)
‐ TF (F_Port trunk)
‐ U (U_Port)
NOTE The MAPSSAM report does not include the health status of gigabit Ethernet (GbE) ports.
• Total up time: Percentage of time the port was up.
PAGE 94
Brocade 7840 FCIP monitoring using MAPS
Using “--show memory”
The following example shows the output for mapsSam --show memory.
switch:admin> mapssam --show memory
Showing Memory Usage:
Memory Usage : 22.0%
Used Memory : 225301k
Free Memory : 798795k
Total Memory : 1024096k
Using “--show flash”
The following example shows the output for mapsSam --show flash.
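Returning to the mapsSam --show memory output above: the figures are internally consistent, which the following sketch checks. It assumes the usage percentage is simply used memory divided by total memory, rounded to one decimal place. The guide does not state how the value is derived, so treat this as an illustration rather than the documented calculation.

```python
# Values copied from the mapsSam --show memory example (in kilobytes).
USED_KB = 225301
FREE_KB = 798795
TOTAL_KB = 1024096


def usage_percent(used_kb, total_kb):
    """Assumed derivation: used / total as a percentage, one decimal place."""
    return round(100.0 * used_kb / total_kb, 1)


print(usage_percent(USED_KB, TOTAL_KB))   # percentage to compare with "Memory Usage"
print(USED_KB + FREE_KB == TOTAL_KB)      # used plus free should equal total
```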
PAGE 95
When a rule is triggered, the corresponding RASLog message appears under the summary section of the dashboard. In the following example, there is one RASLog message, triggered by the rule “low_tunnel_mon”. This rule has the format “-group ALL_TUNNEL_LOW_QOS -monitor PKTLOSS -timebase HOUR -op ge -value 30 -action raslogs”.
3.
PAGE 96
PAGE 97
MAPS Threshold Values
● Viewing monitoring thresholds 97
● Fabric monitoring thresholds 98
● FCIP monitoring thresholds 99
● FRU state thresholds
PAGE 98
Fabric monitoring thresholds
The following example shows all the thresholds for the ALL_D_PORTS group in the policy named “dflt_conservative_policy”.
PAGE 99
FCIP monitoring thresholds
The following tables list the default monitoring thresholds for Fibre Channel over IP (FCIP) criteria used by MAPS. All actions are triggered when the reported value is greater than the threshold value.
PAGE 100
FRU state thresholds
TABLE 24 Default FCIP monitoring thresholds for Brocade 7840 devices (Continued)
MAPS thresholds and actions per policy
Monitoring statistic | Units | Aggressive | Moderate | Conservative | Actions
Circuit: Packet loss percentage (CIR_PKTLOSS) | Percentage per minute | 0.01 | 0.05 | 0.
PAGE 101
TABLE 25 Default Port Health monitoring thresholds for D_Ports (Aggressive Policy)
Monitoring statistic | Unit | Threshold | Actions
CRC Errors (defALL_D_PORTSCRC_1) | Min | 1 | EMAIL, SNMP, RASLOG
Protocol Errors (defALL_D_PORTSPE_1) | Min | 1 | EMAIL, SNMP, RASLOG
Invalid Transmit Words (defALL_D_PORTSITW_1) | Min | 1 | EMAIL, SNMP, RASLOG
Link Failure (defALL_D_PORTSLF_1) | Min | 1 | EMAIL, SNMP, RASLOG
Sync Loss (defALL_D_PORTSLOSS_SYNC_1) | Min | 1 | EMAIL, SNMP, RASLOG
CRC Errors (defALL_D_
PAGE 102
TABLE 26 Default Port Health monitoring thresholds for D_Ports (Moderate Policy) (Continued)
Monitoring statistic | Unit | Threshold | Actions
CRC Errors (defALL_D_PORTSCRC_H60) | Hour | 60 | EMAIL, SNMP, RASLOG
Protocol Errors (defALL_D_PORTSPE_H60) | Hour | 60 | EMAIL, SNMP, RASLOG
Invalid Transmit Words (defALL_D_PORTSITW_H60) | Hour | 60 | EMAIL, SNMP, RASLOG
Link Failure (defALL_D_PORTSLF_H60) | Hour | 60 | EMAIL, SNMP, RASLOG
Sync Loss (defALL_D_PORTSLOSS_SYNC_H60) | Hour | 60 | EMAIL, SNMP,
PAGE 103
TABLE 27 Default Port Health monitoring thresholds for D_Ports (Conservative Policy) (Continued)
Monitoring statistic | Unit | Threshold | Actions
Sync Loss (defALL_D_PORTSLOSS_SYNC_H90) | Hour | 90 | EMAIL, SNMP, RASLOG
CRC Errors (defALL_D_PORTSCRC_D1500) | Day | 1500 | EMAIL, SNMP, RASLOG
Protocol Errors (defALL_D_PORTSPE_D1500) | Day | 1500 | EMAIL, SNMP, RASLOG
Invalid Transmit Words (defALL_D_PORTSITW_D1500) | Day | 1500 | EMAIL, SNMP, RASLOG
Link Failure (defALL_D_PORTSLF_D1500) | Day | 15
PAGE 104
TABLE 28 Default Port Health monitoring thresholds for E_Ports (Continued)
MAPS E_Port high/low thresholds and actions per policy
Monitoring statistic | Aggressive | Moderate | Conservative | Actions
Loss of signal (LOSS_SIGNAL) | 0 | 3 | 5 | EMAIL, SNMP, RASLOG
Link Failure (LF) | 0 | 3 | 5 | EMAIL, SNMP, RASLOG
Sync Loss (LOSS_SYNC) | 0 | 3 | 5 | EMAIL, SNMP, RASLOG
RXP percentage | 60 | 75 | 90 | EMAIL, SNMP, RASLOG
TXP percentage | 60 | 75 | 90 | EMAIL, SNMP, RASLOG
Utilization percentage | 60 | 75
PAGE 105
TABLE 29 Default Port Health monitoring thresholds for Host F_Ports (Continued)
MAPS Host F_Port high/low thresholds and actions per policy
Monitoring statistic | Aggressive | Moderate | Conservative | Actions
State Change (STATE_CHG) | 2/4 | 5/10 | 11/20 | Low threshold: EMAIL, SNMP, RASLOG; High threshold: EMAIL, SNMP, FENCE, DECOM
Protocol Errors (PE) | 0/2 | 3/7 | 5/10 | Low threshold: EMAIL, SNMP, RASLOG; High threshold: EMAIL, SNMP, FENCE, DECOM
Loss of signal (LOSS_SIGNAL) | 0 | 3 | 5 | EMAIL
PAGE 106
TABLE 30 Default Port Health monitoring thresholds for Target F_Ports (Continued)
MAPS Target F_Port high/low thresholds and actions per policy
Monitoring statistic | Aggressive | Moderate | Conservative | Actions
Link Reset (LR) | 0/2 | 3/5 | 6/10 | Low threshold: EMAIL, SNMP, RASLOG; High threshold: EMAIL, SNMP, FENCE, DECOM
Protocol Errors (PE) | 0/2 | 3/7 | 8/15 | Low threshold: EMAIL, SNMP, RASLOG; High threshold: EMAIL, SNMP, FENCE, DECOM
State Change (STATE_CHG) | 0/2 | 3/4 | 5/6 | Low thresh
PAGE 107
Resource monitoring thresholds
TABLE 31 Default Port Health monitoring thresholds for non-F_Ports (Continued)
MAPS non-F_Port high/low thresholds and actions per policy
Monitoring statistic | Aggressive | Moderate | Conservative | Actions
Invalid Transmit Words (ITW) | 15/20 | 21/40 | 41/80 | Low threshold: EMAIL, SNMP, RASLOG; High threshold: EMAIL, SNMP, FENCE, DECOM
Link Reset (LR) | 2/4 | 5/10 | 11/20 | Low threshold: EMAIL, SNMP, RASLOG; High threshold: EMAIL, SNMP, FENCE, DECOM
State Change (STATE_CHG) | 2/4 | 5/
PAGE 108
TABLE 32 Default resource monitoring thresholds
MAPS thresholds and actions per policy
Monitoring statistic | Aggressive | Moderate | Conservative | Actions
Flash (percentage used) | 90 | 90 | 90 | RASLOG, SNMP, EMAIL
CPU (percentage used) | 80 | 80 | 80 | RASLOG, SNMP, EMAIL
Memory (percentage used) | 75 | 75 | 75 | RASLOG, SNMP, EMAIL
Management port (up or down) | Up/Down | Up/Down | Up/Down | RASLOG, SNMP, EMAIL
Security monitoring thresholds
The following table lists the default monito
PAGE 109
SFP monitoring thresholds
These are the default SFP monitoring thresholds used by the Monitoring and Alerting Policy Suite (MAPS). All SFP monitoring thresholds used by MAPS are triggered when the reported value exceeds the threshold value. For thresholds with both an upper value and a lower value, actions are triggered when the reported value exceeds the upper threshold value or drops below the lower threshold value.
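The trigger condition described above can be expressed as a small check. The following sketch uses the ALL_QSFP voltage and temperature ranges from Table 35 as sample limits. The function `sfp_violation` is a hypothetical helper written for this illustration, and the strict comparisons reflect the words "exceeds" and "drops below"; the behavior at exactly the threshold value is an assumption.

```python
def sfp_violation(value, lower=None, upper=None):
    """Return 'above', 'below', or None for a reading against optional limits."""
    if upper is not None and value > upper:
        return "above"   # reported value exceeds the upper threshold
    if lower is not None and value < lower:
        return "below"   # reported value drops below the lower threshold
    return None          # within range: no action triggered


# Sample limits taken from Table 35 (ALL_QSFP column).
QSFP_VOLTAGE_MV = (2940, 3600)
QSFP_TEMP_C = (-5, 85)

for reading in (3300, 3601, 2939):
    print(reading, sfp_violation(reading, *QSFP_VOLTAGE_MV))
```

Thresholds that have only an upper value, such as receive power, can be checked by passing `upper` alone.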
PAGE 110
TABLE 35 Default SFP monitoring thresholds for QSFPs and all other SFPs (Continued)
MAPS thresholds and actions (all policies)
Monitoring statistic | ALL_QSFP | ALL_OTHER_SFP | Actions
Receive Power (RXP) (μW) | 2180 | 5000 | RASLOG, SNMP, EMAIL
Transmit Power (TXP) (μW) | - | 5000 | RASLOG, SNMP, EMAIL
Voltage (VOLTAGE) (mV) | 2940 to 3600 | 2960 to 3630 | RASLOG, SNMP, EMAIL
Temperature (TEMP) (°C) | -5 to 85 | -13 to 85 | RASLOG, SNMP, EMAIL
Fabric Performance Impact threshold
PAGE 111
TABLE 37 Default Switch Status Policy thresholds for the MAPS aggressive policy
Monitoring statistic | MAPS thresholds (Marginal/Critical) | Actions
Bad Power | DCX, DCX+: -/3; DCX-4S, DCX-4S+: -/1; All other platforms: 1/2 | SW_CRITICAL, SNMP, EMAIL
Bad Temp | 1/2 | Low threshold: SW_MARGINAL, SNMP, EMAIL; High threshold: SW_CRITICAL, SNMP, EMAIL
Bad Fan | 1/2 | Low threshold: SW_MARGINAL, SNMP, EMAIL; High threshold: SW_CRITICAL, SNMP, EMAIL
Flash Usage | 90 | RASLOG, SNMP, EMAIL
Marginal Po
PAGE 112
TABLE 38 Default Switch Status Policy thresholds for the MAPS moderate policy
Monitoring statistic | MAPS thresholds (Marginal/Critical) | Actions
Bad Power | DCX, DCX+: -/3; DCX-4S, DCX-4S+: -/1; All other platforms: 1/2 | SW_CRITICAL, SNMP, EMAIL
Bad Temp | 1/2 | Low threshold: SW_MARGINAL, SNMP, EMAIL; High threshold: SW_CRITICAL, SNMP, EMAIL
Bad Fan | 1/2 | Low threshold: SW_MARGINAL, SNMP, EMAIL; High threshold: SW_CRITICAL, SNMP, EMAIL
Flash Usage | 90 | RASLOG, SNMP, EMAIL
Marginal Port
PAGE 113
Traffic Performance thresholds
TABLE 39 Default Switch Status Policy thresholds for the MAPS conservative policy
Monitoring statistic | MAPS thresholds (Marginal/Critical) | Actions
Bad Power | DCX, DCX+: -/3; DCX-4S, DCX-4S+: -/1; All other platforms: 1/2 | SW_CRITICAL, SNMP, EMAIL
Bad Temp | 1/2 | Low threshold: SW_MARGINAL, SNMP, EMAIL; High threshold: SW_CRITICAL, SNMP, EMAIL
Bad Fan | 1/2 | Low threshold: SW_MARGINAL, SNMP, EMAIL; High threshold: SW_CRITICAL, SNMP, EMAIL
Flash Usage | 90 | RASLOG, SNMP, EMAIL
PAGE 114
Monitoring statistic | Aggressive | Moderate | Conservative | Actions
Receive Bandwidth usage percentage (RX) | 60 | 75 | 90 | RASLOG, SNMP, EMAIL
Transmit Bandwidth usage percentage (TX) | 60 | 75 | 90 | RASLOG, SNMP, EMAIL
Trunk Utilization percentage (UTIL) | 60 | 75 | 90 | RASLOG, SNMP, EMAIL