AlliedWare™ OS How To | Configure EPSR (Ethernet Protection Switching Ring) to Protect a Ring from Loops
Introduction
Putting a ring of Ethernet switches at the core of a network is a simple way to increase the network’s resilience: such a network is no longer susceptible to a single point of failure. However, the ring must be protected from Layer 2 loops. Traditionally, STP-based technologies are used to protect rings, but they are relatively slow to recover from link failure.
Which products and software versions does it apply to?
• "Example 3: EPSR and RSTP" on page 17
• "Example 4: EPSR with Nested VLANs" on page 20
• "Example 5: EPSR with management stacking" on page 23
• "Example 6: EPSR with an iMAP" on page 26
Next, it discusses important implementation details in the following sections:
• "Classifiers and Hardware Filters" on page 29
• "Ports and Recovery Times" on page 30
• "IGMP Snooping and Recovery Times" on page 31
• "Health Message Priority" on page 31
Finally,
How EPSR Works
EPSR operates on physical rings of switches (note: not on meshed networks). When all nodes and links in the ring are up, EPSR prevents a loop by blocking data transmission across one port. When a node or link fails, EPSR detects the failure rapidly and responds by unblocking the blocked port so that data can flow around the ring. In EPSR, each ring of switches forms an EPSR domain. One of the domain’s switches is the master node and the others are transit nodes.
Establishing a Ring
Once you have configured EPSR on the switches, the following steps complete the EPSR ring:
1. The master node creates an EPSR Health message and sends it out its primary port. This increments the master node’s Transmit: Health counter, as displayed by the show epsr count command.
2. The first transit node receives the Health message on one of its two ring ports and, using a hardware filter, sends the message out its other ring port.
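To make the Health-message loop concrete, here is a minimal Python sketch (not Allied Telesis code) that walks one Health message from the master's primary port around the ring. The ring is modelled as a list of node names with the master first, a broken link as a pair of node names, and `Counters` is a hypothetical stand-in for the Health counters reported by show epsr count. If the message gets back to the master's secondary port, the ring is intact.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    # Hypothetical stand-ins for the Health counters in "show epsr count"
    tx_health: int = 0
    rx_health: int = 0


def circulate_health(ring: list[str], broken_links: set[frozenset[str]],
                     counters: Counters) -> bool:
    """Send one Health message around the ring.

    ring[0] is the master node: its primary port faces ring[1] and its
    secondary port faces ring[-1]. Returns True if the message gets back
    to the master's secondary port.
    """
    counters.tx_health += 1  # master sends Health out its primary port
    path = ring + [ring[0]]  # A -> B -> ... -> back to A
    for here, there in zip(path, path[1:]):
        if frozenset((here, there)) in broken_links:
            return False  # the message is lost at the broken link
        # a transit node would forward it out its other ring port here
    counters.rx_health += 1  # arrived on the master's secondary port
    return True


if __name__ == "__main__":
    c = Counters()
    print(circulate_health(["A", "B", "C"], set(), c))                    # True
    print(circulate_health(["A", "B", "C"], {frozenset(("B", "C"))}, c))  # False
    print(c)  # Counters(tx_health=2, rx_health=1)
```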
Detecting a Fault
EPSR uses a fault detection scheme that alerts the ring when a break occurs, instead of using a spanning-tree-like calculation to determine the best path. The ring then heals itself automatically by sending traffic over a protected reverse path.
new configuration, the nodes (master and transit) re-learn their Layer 2 addresses. During this period, the master node continues to send Health messages over the control VLAN. This situation continues until the faulty link or node is repaired. For a multidomain ring, this process occurs separately for each domain within the ring. The following figure shows the flow of control frames when a link breaks.
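Besides the Link-Down message that a transit node sends, the master node can infer a break when its own Health messages stop arriving back at its secondary port, as the debugging walkthrough later in this note shows. The following Python sketch models that check under one assumption: a single timeout value, here called `failover_timeout`, which is a hypothetical name rather than a documented parameter.

```python
class SecondaryPortWatch:
    """Tracks Health messages arriving on the master's secondary port.

    failover_timeout is a hypothetical parameter standing in for whatever
    timeout the switch actually uses.
    """

    def __init__(self, failover_timeout: float, now: float = 0.0) -> None:
        self.failover_timeout = failover_timeout
        self.last_seen = now

    def health_received(self, now: float) -> None:
        # A Health message made it all the way around the ring.
        self.last_seen = now

    def ring_broken(self, now: float) -> bool:
        # Nothing has come back for longer than the timeout: assume a break.
        return now - self.last_seen > self.failover_timeout


if __name__ == "__main__":
    watch = SecondaryPortWatch(failover_timeout=2.0)
    watch.health_received(now=1.0)
    print(watch.ring_broken(now=2.5))  # False: last Health was 1.5 s ago
    print(watch.ring_broken(now=3.5))  # True: 2.5 s without a Health message
```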
Restoring Normal Operation
Master Node
Once the fault has been fixed, the master node’s Health messages traverse the whole ring and arrive at the master node’s secondary port. The master node then restores normal conditions by:
1. declaring the ring to be in a state of Complete
2. blocking its secondary port for data VLAN traffic (but not for the control VLAN)
3. flushing its forwarding database for its two ring ports
4.
How To Configure EPSR
This section first outlines, step by step, how to configure EPSR. Then it discusses changing the settings for the control VLAN, if you need to do this after initial configuration.
Configuring EPSR
1. Connect your switches into a ring
EPSR does not in itself limit the number of nodes that can exist on any given ring. Each switch can participate in up to 16 rings.
iii. Remove the ring ports from the default VLAN
If you leave all the ring ports in the default VLAN (vlan1), they will create a loop, unless vlan1 is part of the EPSR domain. To avoid loops, do one of the following:
• make vlan1 a data VLAN, or
• remove the ring ports from vlan1, or
• remove at least one of the ring ports from vlan1 on at least one of the switches.
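The three options work because vlan1 loops only if it is carried over every link of the ring. Here is a small Python sketch of that rule, assuming the ring is fully up and nothing other than EPSR protects vlan1. The function name and the input layout (one pair of membership flags per switch) are hypothetical.

```python
def vlan1_forms_loop(ring_ports_in_vlan1: dict[str, tuple[bool, bool]],
                     vlan1_is_data_vlan: bool) -> bool:
    """ring_ports_in_vlan1 maps each switch to (first ring port in vlan1,
    second ring port in vlan1). Assumes the ring is fully up and nothing
    other than EPSR protects vlan1."""
    if vlan1_is_data_vlan:
        return False  # EPSR blocks the master's secondary port for vlan1 too
    # vlan1 loops only if it is carried over every ring link, which means
    # every ring port on every switch is still a member of vlan1.
    return all(first and second for first, second in ring_ports_in_vlan1.values())


if __name__ == "__main__":
    all_in = {"A": (True, True), "B": (True, True), "C": (True, True)}
    one_out = {"A": (True, True), "B": (True, True), "C": (True, False)}
    print(vlan1_forms_loop(all_in, vlan1_is_data_vlan=False))   # True
    print(vlan1_forms_loop(one_out, vlan1_is_data_vlan=False))  # False
    print(vlan1_forms_loop(all_in, vlan1_is_data_vlan=True))    # False
```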
Modifying the Control VLAN
You cannot modify the control VLAN while EPSR is enabled. If you try to add ports to or remove ports from the control VLAN, the switch generates an error message such as:
Manager> delete vlan=1000 port=1
Error (3089409): VLAN 1000 is a control VLAN in EPSR and cannot be modified
Disable the EPSR domain and then make the required changes. Note that disabling EPSR creates a loop, so it is not recommended on a network carrying live data.
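The rule above amounts to a guard on VLAN changes. This Python sketch mirrors it with hypothetical names (`EpsrDomain`, `ControlVlanLocked`, `check_vlan_change`); only the error text is taken from the switch output shown above.

```python
from dataclasses import dataclass


@dataclass
class EpsrDomain:
    name: str
    control_vid: int
    enabled: bool


class ControlVlanLocked(Exception):
    """Hypothetical exception mirroring the switch's error message."""


def check_vlan_change(vid: int, domains: list[EpsrDomain]) -> None:
    """Raise if vid is the control VLAN of any enabled EPSR domain."""
    for domain in domains:
        if domain.enabled and domain.control_vid == vid:
            raise ControlVlanLocked(
                f"VLAN {vid} is a control VLAN in EPSR and cannot be modified")


if __name__ == "__main__":
    domains = [EpsrDomain("test", control_vid=1000, enabled=True)]
    try:
        check_vlan_change(1000, domains)
    except ControlVlanLocked as err:
        print("refused:", err)
    domains[0].enabled = False        # disabling the domain opens a loop!
    check_vlan_change(1000, domains)  # now allowed
    print("allowed")
```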
Example 1: A Basic Ring
This example builds a simple 3-switch ring with one data VLAN, as shown in the following diagram. Control packets are transmitted around the ring on vlan1000 and data packets on vlan2.
[Figure epsr-example-basic-ring: Master Node (A) with port 1 as primary and port 2 as secondary; Transit Node (B) and Transit Node (C), each with ring ports 1 and 2; every switch also has end-user ports.]
Configure the Master Node (A)
1.
5. Remove the ring ports from the default VLAN
delete vlan=1 port=1-2
6. Create the EPSR domain
This step creates the domain, specifying that this switch is the master node. It also specifies which VLAN is the control VLAN and which port is the primary port.
create epsr=test mode=master controlvlan=vlan1000 primaryport=1
7. Add the data VLAN to the domain
add epsr=test datavlan=vlan2
8.
6. Create the EPSR domain
This step creates the domain, specifying that this switch is a transit node. It also specifies which VLAN is the control VLAN.
create epsr=test mode=transit controlvlan=vlan1000
7. Add the data VLAN to the domain
add epsr=test datavlan=vlan2
8.
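The per-node steps follow a fixed pattern, so a small generator can reproduce them. This is a hypothetical Python helper (`epsr_node_commands` is not an Allied Telesis tool): it only emits command text, in the order used in Example 1 and in the transit-node script of Example 2. Step 8 of Example 1 is cut off in this copy, so the final `enable epsr` line is assumed from Example 2.

```python
from typing import Optional


def epsr_node_commands(domain: str, mode: str, control_vid: int, data_vid: int,
                       ring_ports: str = "1-2",
                       primary_port: Optional[int] = None) -> list[str]:
    """Return the command sequence for one node, in the order used above."""
    if mode not in ("master", "transit"):
        raise ValueError("mode must be 'master' or 'transit'")
    if mode == "transit" and primary_port is not None:
        raise ValueError("only the master node takes a primary port")
    create = f"create epsr={domain} mode={mode} controlvlan=vlan{control_vid}"
    if primary_port is not None:
        create += f" primaryport={primary_port}"
    return [
        f"create vlan=vlan{control_vid} vid={control_vid}",
        f"add vlan={control_vid} port={ring_ports} frame=tagged",
        f"create vlan=vlan{data_vid} vid={data_vid}",
        f"add vlan={data_vid} port={ring_ports} frame=tagged",
        f"delete vlan=1 port={ring_ports}",  # keep vlan1 off the ring ports
        create,
        f"add epsr={domain} datavlan=vlan{data_vid}",
        f"enable epsr={domain}",             # assumed final step, as in Example 2
    ]


if __name__ == "__main__":
    for line in epsr_node_commands("test", "master", 1000, 2, primary_port=1):
        print(line)
```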
Example 2: A Double Ring
This example builds on the previous ring to create two domains, as shown in the following diagram.
[Figure epsr-example-double-ring: Domain 1 (control VLAN 1000, data VLAN 2) contains Master Node (A), with port 1 as primary and port 2 as secondary, Transit Node (B), and Transit Node (E). Domain 2 (control VLAN 40, data VLAN 50) contains Master Node (C), with port 4 as primary and port 5 as secondary, Transit Node (D), and Transit Node (E).]
1.
2. Configure the transit node (switch B) that belongs just to domain 1
This transit node is the same as in the previous example (except that the domain has been renamed).
create vlan=vlan1000 vid=1000
add vlan=1000 port=1-2 frame=tagged
create vlan=vlan2 vid=2
add vlan=2 port=1-2 frame=tagged
delete vlan=1 port=1-2
create epsr=domain1 mode=transit controlvlan=vlan1000
add epsr=domain1 datavlan=vlan2
enable epsr=domain1
3.
Configure EPSR:
create epsr=domain2 mode=transit controlvlan=vlan40
add epsr=domain2 datavlan=vlan50
enable epsr=domain2
5. Configure the transit node (switch E) that belongs to both domains
Two separate EPSR domains are configured on this switch.
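Switch E keeps two independent EPSR domains, and, as noted under "How EPSR Works", each domain is processed separately. This Python sketch assumes control frames are told apart by the control VLAN they arrive on (vlan1000 for domain1, vlan40 for domain2). The class and method names are hypothetical; the 16-domain limit is the one stated in the configuration section.

```python
from dataclasses import dataclass, field

MAX_DOMAINS = 16  # "Each switch can participate in up to 16 rings"


@dataclass
class Domain:
    name: str
    control_vid: int
    data_vids: set[int]
    health_seen: int = 0


@dataclass
class Switch:
    domains: dict[int, Domain] = field(default_factory=dict)

    def add_domain(self, domain: Domain) -> None:
        if len(self.domains) >= MAX_DOMAINS:
            raise ValueError("switch already participates in 16 rings")
        if domain.control_vid in self.domains:
            raise ValueError(f"VLAN {domain.control_vid} is already a control VLAN")
        self.domains[domain.control_vid] = domain

    def on_control_frame(self, vid: int) -> str:
        """The VLAN a control frame arrives on decides which domain handles it."""
        domain = self.domains.get(vid)
        if domain is None:
            return "ignored: not an EPSR control VLAN"
        domain.health_seen += 1  # state is kept per domain, never shared
        return domain.name


if __name__ == "__main__":
    switch_e = Switch()
    switch_e.add_domain(Domain("domain1", control_vid=1000, data_vids={2}))
    switch_e.add_domain(Domain("domain2", control_vid=40, data_vids={50}))
    print(switch_e.on_control_frame(40))    # domain2
    print(switch_e.on_control_frame(1000))  # domain1
```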
Example 3: EPSR and RSTP
This example uses EPSR to protect one ring and RSTP to protect another, overlapping ring.
[Figure epsr-example-rstp: EPSR Domain 1 (control VLAN 1000, data VLAN 2) contains Master Node (A), with port 1 as primary and port 2 as secondary, Transit Node (B), and switch E (ports 1 and 2). The RSTP ring (STP VLAN 10) contains RSTP Switch (C), RSTP Switch (D), and switch E, using ports 10 and 11.]
1.
2. Configure the transit node (switch B) that belongs just to the EPSR domain
This transit node (B) is the same as in the previous example.
create vlan=vlan1000 vid=1000
add vlan=1000 port=1-2 frame=tagged
create vlan=vlan2 vid=2
add vlan=2 port=1-2 frame=tagged
delete vlan=1 port=1-2
create epsr=domain1 mode=transit controlvlan=vlan1000
add epsr=domain1 datavlan=vlan2
enable epsr=domain1
3.
4.
Example 4: EPSR with Nested VLANs
In this example:
• client switches A and C are in the same end-user VLAN (vlan20)
• client switches B and D are in the same end-user VLAN (vlan200)
• traffic for vlan20 and vlan200 is nested inside vlan50 for transmission around the core
• vlan50 is the data VLAN for the EPSR domain
• vlan100 is the control VLAN for the EPSR domain
[Figure: Client Switch (E) and Client Switch (H) attach to the ring (ports 20, 10, and 22 are labelled); the ring shows port 1 as primary and port 2 as secondary.]
1. Configure the master node (switch A) for the EPSR domain
Configure the EPSR control VLAN:
create vlan=vlan100 vid=100
add vlan=100 port=1-2 frame=tagged
Configure vlan50. This VLAN acts as both the nested VLAN and the EPSR data VLAN.
3. Configure client switch E (connected to the master node)
create vlan=vlan20 vid=20
add vlan=20 port=20 frame=tagged
enable ip
add ip interface=vlan20 ip=192.168.20.10
4. Configure client switch F (connected to transit node B)
create vlan=vlan200 vid=200
add vlan=200 port=10 frame=tagged
enable ip
add ip interface=vlan200 ip=192.168.200.1
5.
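Nesting keeps the EPSR configuration small: the ring only has to carry vlan50, whatever end-user VLANs ride inside it. This Python sketch shows the tag stack on a frame from client switch E as it enters the core. It assumes both tags use TPID 0x8100, a simplification for illustration only, and the helper names are hypothetical.

```python
import struct

TPID = 0x8100  # assumption: same tag protocol ID for inner and outer tags


def push_tag(frame: bytes, vid: int, tpid: int = TPID) -> bytes:
    """Insert a 4-byte VLAN tag after the destination and source MACs."""
    if not 1 <= vid <= 4094:
        raise ValueError("VID must be 1-4094")
    return frame[:12] + struct.pack("!HH", tpid, vid) + frame[12:]


def pop_tag(frame: bytes) -> bytes:
    """Remove the outermost VLAN tag."""
    return frame[:12] + frame[16:]


def tag_stack(frame: bytes, tpid: int = TPID) -> list[int]:
    """Return the VIDs from outermost to innermost."""
    vids, offset = [], 12
    while len(frame) >= offset + 4 and struct.unpack_from("!H", frame, offset)[0] == tpid:
        vids.append(struct.unpack_from("!H", frame, offset + 2)[0] & 0x0FFF)
        offset += 4
    return vids


if __name__ == "__main__":
    untagged = bytes(6) + bytes(6) + b"\x08\x00" + b"payload"
    from_client_e = push_tag(untagged, 20)  # client E sends vlan20 tagged
    in_core = push_tag(from_client_e, 50)   # core nests it inside vlan50
    print(tag_stack(in_core))               # [50, 20]
    print(tag_stack(pop_tag(in_core)))      # [20]
```

EPSR on the core switches only ever inspects the outer VID (50), which is why vlan50 is the only data VLAN the domain needs.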
Example 5: EPSR with management stacking
In this example:
• three switches are stacked together, so you can manage all three switches by entering commands into the CLI of any one of them
• the three switches are also configured as an EPSR domain
• the data VLAN for EPSR is vlan20
• vlan1000 is used as the stacking VLAN and as the EPSR control VLAN.
1. Configure stacking on the master node for the EPSR domain (host1)
The following commands must be entered into the CLI of this particular switch.
First, give the switch a host ID number so that the stack can identify it:
set system hostid=1 serialnumber=12345678
set system name=host1
Create the stacking VLAN and add the ring ports to it. Note the port numbering notation: these are ports 1 and 2 on stacking host 1.
4. Configure the other VLANs on the stacked switches
The stack now exists, so you can configure all three switches from the CLI of the master node (or of any of the other switches). However, the ports and IP addresses are different for each switch, so you need to make most of the commands host-directed.
Create the EPSR data VLAN. This command propagates to all three switches:
create vlan=vlan20 vid=20
Assign ports and an IP address to the data VLAN on each switch.
Example 6: EPSR with an iMAP
This example is the same as "Example 1: A Basic Ring" on page 11 except that one of the three switches is an iMAP. We used an AT-TN7100 iMAP running 6.1.10. The ring ports on the iMAP are 5.0 and 5.1.
The example first shows the configuration script for the iMAP as the master node, then as a transit node. For the configuration of the other two switches, see Example 1.
Checking the Master Node Configuration
To see a summary, use the command:
show epsr
The following diagram shows the expected output.
--- EPSR Domain Information ---------------------------------------------------
EPSR Domain      Node Type  Domain Status/  Control  Interface(s) (PhysicalState,
                            State           Vlan     Type, State)
---------------  ---------  --------------  -------  ----------------------------
test             MASTER     EN/COMPLETE     1000     5.0 (UP,DNSTRM,FWDING ), 5.
Configure the AT-TN7100 iMAP as a Transit Node
The following diagram shows a partial script for the iMAP, with the commands for configuring it as a transit node.
CREATE EPSR=test TRANSIT
#
CREATE VLAN=vlan2 VID=2 FORWARDINGMODE=STD
CREATE VLAN=vlan1000 VID=1000 FORWARDINGMODE=STD
#
DISABLE INTERFACE=0.0-0.15,1.0-1.15,2.0-2.15,4.0-4.1,5.0-5.1
#
ADD VLAN=2 INTERFACE=ETH:[5.0-1] FRAME=TAGGED
ADD VLAN=1000 INTERFACE=ETH:[5.0-1] FRAME=TAGGED
#
DELETE VLAN=1 INTERFACE=ETH:[5.
Classifiers and Hardware Filters
To see details, use the command:
show epsr=test
The following diagram shows the expected output.
--- EPSR Domain Information ---------------------------------------------------
EPSR Domain Name......................
EPSR Domain Node Type.................
EPSR Domain State.....................
MAC Address of Master Node............
EPSR Domain Status....................
Control Vlan..........................
Ring Interface # 1....................
Ports and Recovery Times
In practice, recovery time in an EPSR ring is generally between 50 and 100 ms. However, it depends on the port type, because the port type determines how long the port takes to report that it is down, and therefore how soon a Link-Down message is sent.
IGMP Snooping and Recovery Times
Since Software Version 281-03, IGMP snooping includes query solicitation, a new feature that minimises loss of multicast data after a topology change. When IGMP snooping is enabled on a VLAN and EPSR changes the underlying link-layer topology of that VLAN, multicast data flow can be interrupted for a significant length of time. Query solicitation prevents this by monitoring the VLAN for any topology changes.
EPSR State and Settings
To display the EPSR state, the attached VLANs, the ring ports, and the timer values, use the command:
show epsr
Master Node in a Complete Ring
The following diagram shows the output for a master node in a ring that is in a state of Complete. As well as giving the state as Complete, it also shows that port 1 is the primary port and port 2 is the secondary port. Note that the secondary port is blocked, so it does not forward packets over the data VLAN (vlan2).
Master Node in a Failed Ring
In contrast, the following diagram shows the output for a master node in a ring that is in a Failed state. Both ring ports are now forwarding.
EPSR Information
------------------------------------------------------------------------
Name ........................ domain1
Mode ........................ Master
Status ...................... Enabled
State ....................... Failed
Control Vlan ................ vlan1000 (1000)
Data VLAN(s) .........
SNMP Traps
You can use SNMP traps to notify you when events occur in the EPSR ring. Download the latest version of the Allied Telesis Enterprise MIB from www.alliedtelesis.co.nz/support/updates/patches.html. The EPSR Group is contained in the sub-file called atr-epsr.mib. The EPSR Group has the object identifier prefix epsr ({ modules 136}), and contains a collection of objects and traps for monitoring EPSR states.
Counters
The EPSR counters record the number of EPSR messages that the CPU received and transmitted. To display the counters, use the command:
show epsr=domain1 count
Master node in a Complete ring
The following diagram shows the counters for a master node in a ring that has never had a link or node fail.
Debugging
This section walks you through the EPSR debugging output as links go down and come back up again. The debugging output comes from the ring in "Example 1: A Basic Ring" on page 11.
2. The master node continues sending Health messages
The master node continues sending Health messages, and increments the Hello Sequence number with each message. If all nodes and links in the ring are intact, these Health messages are the only debugging output you see.
. . .
4. The master node receives a Link-Down message on its secondary port
The master node receives a Link-Down message on its secondary port (port 2) from transit node B, which is at the other end of the broken link.
6. The Hello timer expires
The Hello timer expires, which would normally trigger the master node to send a Health message out its primary port. However, the link between the primary port and the neighbouring transit node is down, so the master node does not send the Health message.
9. The master node receives the Health message on its secondary port
The master node receives the Health message on its secondary port (port 2). This tells it that all links on the ring are up again.
12. The master node transmits and receives Health messages
The master node continues transmitting and receiving Health messages for as long as the ring stays in a state of Complete.
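Steps 2 and 6 above describe the master node's Hello timer: each expiry normally sends a Health message with the next Hello Sequence number, but nothing goes out while the primary port's link is down. This Python sketch models only that behaviour. Whether the real switch advances the sequence number when it skips a message is not stated here, so the sketch assumes it advances only when a message is actually sent; `MasterHello` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class MasterHello:
    primary_port_up: bool = True
    sequence: int = 0
    sent: list[int] = field(default_factory=list)

    def on_hello_timer(self) -> bool:
        """Called whenever the Hello timer expires; True if a Health message went out."""
        if not self.primary_port_up:
            return False  # the primary port's link is down, so nothing is sent
        self.sequence += 1  # assumption: advances only for messages actually sent
        self.sent.append(self.sequence)
        return True


if __name__ == "__main__":
    master = MasterHello()
    master.on_hello_timer()
    master.on_hello_timer()
    master.primary_port_up = False  # link between primary port and node B fails
    print(master.on_hello_timer())  # False
    master.primary_port_up = True   # link repaired
    master.on_hello_timer()
    print(master.sent)              # [1, 2, 3]
```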
Transit Node (Node B) Debug Output
The following debugging shows the same events as the previous section, but on the transit node instead of the master node. It starts with the ring established and in a state of Complete.
1. The transit node receives Health messages
The transit node receives Health messages on port 1, because that port is connected to the master node’s primary port.
2. Port 1 on the transit node goes down
The transit node detects that port 1 (between the transit node and the master node) has gone down. The transit node flushes its forwarding database, blocks port 1 for the data VLAN (to prevent a loop from forming when the link comes back up), sends a Link-Down message towards the master node, sends a trap, and changes the EPSR state to Link-Down. This is the packet shown in step 4 on page 38 of the master node debug output.
4. Port 1 comes back up
The transit node detects that port 1 has come back up. It sends a trap and changes the EPSR state to Pre-forwarding. Note that it leaves port 1 blocked for vlan2, to make sure there are no loops.
Manager 9924-B>
Block EPSR:test port:1 VLAN:2
EPSR test, Port 1 port up
EPSR INFO: Send trap EPSR:test oldState:LINK-DOWN newState:PRE-FORWARDING nodeType:TRANSIT
EPSR test oldState:LINK-DOWN newState:PRE-FORWARDING
5.
6. The transit node receives a Ring-Up-Flush-FDB message
The Health message from the previous step reaches the master node and shows it that all links in the ring are now up, so the master node sends a Ring-Up-Flush-FDB message. When it receives the message, the transit node unblocks port 1 for vlan2, flushes its FDB, sends a trap, and changes the state to Link-Up. This is the packet shown in step 10 on page 40 of the master node debug output.
7. The transit node receives Health messages
The transit node continues receiving Health messages for as long as the ring stays in a state of Complete. This is the packet shown in step 12 on page 41 of the master node debug output.
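The transit node's side of this walkthrough is a small state machine: Link-Up, then Link-Down when a ring port fails, then Pre-forwarding when it recovers (port still blocked), then Link-Up again once the Ring-Up-Flush-FDB message arrives. This Python sketch models those transitions and the actions listed in steps 2, 4, and 6. It is an illustration, not the switch's implementation; the class name and log strings are hypothetical.

```python
from enum import Enum


class TransitState(Enum):
    LINK_UP = "Link-Up"
    LINK_DOWN = "Link-Down"
    PRE_FORWARDING = "Pre-forwarding"


class TransitNode:
    def __init__(self, data_vlans: set[int]) -> None:
        self.state = TransitState.LINK_UP
        self.data_vlans = data_vlans
        self.blocked: set[tuple[int, int]] = set()  # (port, VLAN) pairs
        self.log: list[str] = []

    def port_down(self, port: int) -> None:
        self.log.append("flush FDB")
        # block the port for data VLANs so no loop forms when the link returns
        self.blocked |= {(port, vlan) for vlan in self.data_vlans}
        self.log.append("send Link-Down towards master")
        self._change(TransitState.LINK_DOWN)

    def port_up(self, port: int) -> None:
        # the port stays blocked until the master confirms the ring is whole
        self._change(TransitState.PRE_FORWARDING)

    def ring_up_flush_fdb(self) -> None:
        self.blocked.clear()  # unblock the port for the data VLANs
        self.log.append("flush FDB")
        if self.state is not TransitState.LINK_UP:
            self._change(TransitState.LINK_UP)

    def _change(self, new_state: TransitState) -> None:
        self.log.append(f"trap {self.state.value} -> {new_state.value}")
        self.state = new_state


if __name__ == "__main__":
    b = TransitNode(data_vlans={2})
    b.port_down(1)
    b.port_up(1)
    print(b.blocked)  # {(1, 2)}: still blocked in Pre-forwarding
    b.ring_up_flush_fdb()
    print(b.state, b.blocked)
```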
Link Down Between Two Transit Nodes
This section shows the debugging output when the link between transit node B and transit node C goes down and comes back up again. It shows the debugging output for the complete failure and recovery cycle:
• on the master node, and then
• on transit node B
Master Node (Node A) Debug Output
The following debugging output starts with the ring established and in a state of Complete.
1.
2. The link between the two transit nodes goes down
When the link goes down, the master node transmits a Health message but does not receive it on its secondary port.
4. The master node transmits a Ring-Down-Flush-FDB message
In response to the Link-Down message, the master node transmits a Ring-Down-Flush-FDB message out both its primary and secondary ports. The message has to go out both ports to make sure it reaches the nodes on both sides of the broken link. The master node also unblocks its secondary port for vlan2, flushes its forwarding database, sends a trap, and changes the EPSR state to Failed.
6. The master node continues sending Health messages
The master node continues sending Health messages out its primary port. It does not receive any of these on its secondary port, which tells it that the link is still down.
8. The master node returns the ring to a state of Complete
Now that the ring is back up, the master node blocks its secondary port for the data VLAN, transmits a Ring-Up-Flush-FDB message, flushes its FDB, sends a trap, and changes the EPSR state to Complete.
10. The master node transmits and receives Health messages
The master node continues transmitting and receiving Health messages for as long as the ring stays in a state of Complete.
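Steps 4 and 8 list what the master node does on each transition, in order. This Python sketch records those actions for the two transitions (Complete to Failed, and Failed back to Complete). It is a model of the walkthrough only; `MasterNode` and its method names are hypothetical, and the trigger for `ring_failed` is either cause described above (a Link-Down message, or Health messages no longer returning).

```python
from dataclasses import dataclass, field


@dataclass
class MasterNode:
    state: str = "Complete"
    secondary_blocked: bool = True  # for data VLANs only; the control VLAN always passes
    actions: list[str] = field(default_factory=list)

    def ring_failed(self) -> None:
        """Link-Down message received, or Health messages stopped coming back."""
        if self.state == "Failed":
            return  # already handling this break
        self.actions.append("send Ring-Down-Flush-FDB out primary and secondary")
        self.secondary_blocked = False  # let data flow the other way around the ring
        self.actions.append("flush FDB")
        self.actions.append("send trap")
        self.state = "Failed"

    def health_on_secondary(self) -> None:
        """A Health message got all the way around: every link is up again."""
        if self.state == "Complete":
            return  # normal steady state, nothing to change
        self.secondary_blocked = True
        self.actions.append("send Ring-Up-Flush-FDB")
        self.actions.append("flush FDB")
        self.actions.append("send trap")
        self.state = "Complete"


if __name__ == "__main__":
    a = MasterNode()
    a.ring_failed()
    print(a.state, a.secondary_blocked)  # Failed False
    a.health_on_secondary()
    print(a.state, a.secondary_blocked)  # Complete True
```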
Transit Node (Node B) Debug Output
The following debugging shows the same events as the previous section, but on the transit node instead of the master node. It starts with the ring established and in a state of Complete.
1. The transit node receives Health messages
The transit node receives Health messages on port 1, because that port is connected to the master node’s primary port. Note that the message shows that the ring state is Complete.
3. The transit node receives a Ring-Down-Flush-FDB message
Meanwhile, the master node has received a Link-Down message from the switch at the other end of the broken link (in step 3 on page 48). Therefore, the master node realises that the ring is broken and acts accordingly. As part of the recovery process, the master node sends a Ring-Down-Flush-FDB message. The transit node receives this message and flushes its forwarding database.
5. The transit node receives Health messages
The transit node receives Health messages from the master node. These have a state of Failed, which shows that the ring is still broken. This is the packet shown in step 6 on page 50 of the master node debug output.
8. The transit node receives a Ring-Up-Flush-FDB message
The transit node receives a Ring-Up-Flush-FDB message, which indicates that the master node knows that all links in the ring are up again. The transit node unblocks port 2 for vlan2, flushes its FDB, sends a trap, and changes the EPSR state to Link-Up. This is the packet shown in step 8 on page 51 of the master node debug output.
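Seen from node B, this walkthrough is a series of control messages from the master: Health messages that carry the ring state, Ring-Down-Flush-FDB, and Ring-Up-Flush-FDB. This Python sketch dispatches on those three message types and records the actions described in steps 1, 3, 5, and 8. The starting state in the usage (Pre-forwarding, with port 2 blocked after the link returns) is an assumption carried over from the previous walkthrough, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TransitControl:
    state: str = "Link-Up"
    blocked_ports: set[int] = field(default_factory=set)
    ring_state_seen: Optional[str] = None
    actions: list[str] = field(default_factory=list)

    def handle(self, msg_type: str, ring_state: Optional[str] = None) -> None:
        if msg_type == "Health":
            # Health messages carry the master's view of the ring.
            self.ring_state_seen = ring_state
        elif msg_type == "Ring-Down-Flush-FDB":
            self.actions.append("flush FDB")
        elif msg_type == "Ring-Up-Flush-FDB":
            for port in sorted(self.blocked_ports):
                self.actions.append(f"unblock port {port} for data VLANs")
            self.blocked_ports.clear()
            self.actions.append("flush FDB")
            if self.state != "Link-Up":
                self.actions.append("send trap")
                self.state = "Link-Up"
        else:
            raise ValueError(f"unknown control message {msg_type!r}")


if __name__ == "__main__":
    # Node B after its port 2 came back up: assumed Pre-forwarding, port 2 blocked.
    b = TransitControl(state="Pre-forwarding", blocked_ports={2})
    b.handle("Health", ring_state="Failed")
    b.handle("Ring-Up-Flush-FDB")
    print(b.actions)  # ['unblock port 2 for data VLANs', 'flush FDB', 'send trap']
    print(b.state)    # Link-Up
```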