Uplink Failure Detection
Uplink failure detection detects failures on uplink interfaces and propagates this information to the downlink interfaces, so that connected servers can switch over to another interface and avoid traffic loss. The topics below describe how uplink failure detection works and how to configure and verify it.
Overview of Uplink Failure Detection
Uplink failure detection allows a switch to detect link failure on uplink interfaces and to propagate this information to the downlink interfaces, so that servers connected to those downlinks can switch over to secondary interfaces.
Uplink failure detection supports network adapter teaming and provides network redundancy. In network adapter teaming, all of the network interface cards (NICs) on a server are configured in a primary or secondary relationship and share the same IP address. When the primary link goes down, the server transparently shifts the connection to the secondary link. With uplink failure detection, the switch monitors uplink interfaces for link failures. When it detects a failure, it disables the downlink interfaces. When the server detects disabled downlink interfaces, it switches over to the secondary link to help ensure that the traffic of the failed link is not dropped.
Uplink Failure Detection Configuration
Uplink failure detection allows switches to monitor uplink interfaces to spot link failures. When a switch detects a link failure, it automatically disables the downlink interfaces bound to the uplink interface. A server that is connected to the disabled downlink interface triggers a network adapter failover to a secondary link to avoid any traffic loss.
Figure 1 illustrates a typical setup for uplink failure detection.
For uplink failure detection, you specify a group of uplink interfaces to be monitored and downlink interfaces to be brought down when an uplink fails. The downlink interfaces are bound to the uplink interfaces within the group. If all uplink interfaces in a group go down, then the switch brings down all downlink interfaces within that group. If any uplink interface returns to service, then the switch brings all downlink interfaces in that group back to service.
The switch can monitor both physical interface links and logical interface links for uplink failures, but you must put the two types of interfaces into separate groups.
For logical interfaces, the server must send keepalives to the switch so that the failure of a logical link can be detected.
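The group rule described above (downlinks go down only when every monitored uplink is down, and return when any uplink recovers) can be sketched as a small model. This is an illustration only, not how Junos OS implements the feature: the `UfdGroup` class and the second uplink `xe-0/0/2` are hypothetical, and the debounce interval is ignored here.

```python
from dataclasses import dataclass, field


@dataclass
class UfdGroup:
    """Hypothetical model of one uplink failure detection group."""
    name: str
    # Maps each link-to-monitor interface name to True when its link is up.
    uplinks: dict = field(default_factory=dict)
    # Link-to-disable interfaces bound to the uplinks in this group.
    downlinks: list = field(default_factory=list)

    def downlinks_enabled(self) -> bool:
        # Downlinks stay in service while at least one monitored uplink is up.
        return any(self.uplinks.values())


group = UfdGroup(
    name="group1",
    uplinks={"xe-0/0/0": True, "xe-0/0/2": True},  # xe-0/0/2 is made up
    downlinks=["xe-0/0/1"],
)

group.uplinks["xe-0/0/0"] = False
one_uplink_left = group.downlinks_enabled()    # one uplink still up

group.uplinks["xe-0/0/2"] = False
no_uplinks_left = group.downlinks_enabled()    # all uplinks down

group.uplinks["xe-0/0/2"] = True
uplink_restored = group.downlinks_enabled()    # one uplink back in service
```

In this model, losing a single uplink leaves the downlinks enabled; only the loss of the last uplink would trigger the server failover.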
Failure Detection Pair
Uplink failure detection requires that you create pairs of uplink and downlink interfaces in a group. Each pair includes one of each of the following:
- A link-to-monitor interface: The link-to-monitor interfaces specify the uplinks the switch monitors. You can configure a maximum of 48 uplink interfaces as link-to-monitor interfaces for a group.
- A link-to-disable interface: The link-to-disable interfaces specify the downlinks the switch disables when the switch detects an uplink failure. You can configure a maximum of 48 downlinks to disable in the group.
The link-to-disable interfaces are bound to the link-to-monitor interfaces within the group. When a link-to-monitor interface returns to service, the switch automatically enables all link-to-disable interfaces in the group.
Debounce Interval
The debounce interval is the amount of time, in seconds, that elapses before the downlink interfaces are brought up after corresponding state changes of the uplink interfaces. You can configure the debounce interval for the uplink failure detection group. In absence of the debounce interval configuration, the downlink interfaces are brought up immediately after a state change of the uplink interfaces, which might introduce unnecessary state changes of the downlink interfaces, as well as unnecessary failovers on the servers connected to these ports.
The debounce timer starts when the uplink interface comes back up. If the uplink interface goes down again before the debounce interval expires, the debounce timer restarts when the uplink interface next comes back up.
Any change you make to the debounce interval takes effect immediately. If you change the debounce interval while the debounce timer is running, the new value applies if the resulting expiry time is still in the future. Otherwise, the timer stops immediately.
If uplink failure detection restarts during the debounce interval, the debounce timer resets, and the time that elapsed before uplink failure detection restarted is lost. The link-to-disable interface comes up without waiting for the debounce interval to elapse.
The link-to-disable interface might not come up at the exact moment the debounce timer expires. There can be some latency between the time the debounce timer expires and the time when the link-to-disable interface actually comes up.
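The timer behavior described in this section can be illustrated with a small simulation. This is a conceptual sketch under stated assumptions, not the Junos OS implementation: the `DebounceModel` class and its method names are hypothetical, time is passed in explicitly as seconds, and the restart and reconfiguration cases are not modeled.

```python
class DebounceModel:
    """Hypothetical model: downlinks return to service only after the
    uplink has stayed up for one full debounce interval."""

    def __init__(self, interval_s: float):
        self.interval_s = interval_s
        self.uplink_is_up = True
        self.downlinks_are_up = True
        self.timer_started_at = None  # None means no timer is running

    def uplink_down(self, now: float) -> None:
        # An uplink failure disables the downlinks and cancels any timer.
        self.uplink_is_up = False
        self.downlinks_are_up = False
        self.timer_started_at = None

    def uplink_restored(self, now: float) -> None:
        # The debounce timer starts (or restarts) when the uplink comes up.
        self.uplink_is_up = True
        self.timer_started_at = now

    def tick(self, now: float) -> bool:
        # Bring the downlinks up once the timer has run for the interval.
        if (self.uplink_is_up and self.timer_started_at is not None
                and now - self.timer_started_at >= self.interval_s):
            self.downlinks_are_up = True
            self.timer_started_at = None
        return self.downlinks_are_up


model = DebounceModel(interval_s=20)
model.uplink_down(now=0)
model.uplink_restored(now=5)
early = model.tick(now=15)           # 10 s after recovery
model.uplink_down(now=18)            # uplink flaps before the timer expires
model.uplink_restored(now=30)        # timer restarts from this point
still_waiting = model.tick(now=45)   # 15 s after the second recovery
recovered = model.tick(now=50)       # 20 s after the second recovery
```

With an interval of 0 the model brings the downlinks up on the first `tick` after recovery, which corresponds to the unconfigured case in which every uplink flap is passed straight through to the servers.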
Configuring Interfaces for Uplink Failure Detection
You can configure uplink failure detection to help ensure balanced traffic flow. Using this feature, switches can monitor and detect link failure on uplink interfaces and can propagate the failure information to downlink interfaces, so that servers connected to those downlinks can switch over to secondary interfaces.
Follow these configuration guidelines:
- Configure an interface in only one group.
- Configure a maximum of 48 groups for each switch.
- Configure a maximum of 48 uplinks to monitor and a maximum of 48 downlinks to disable in each group.
- Configure physical links and logical links in separate groups.
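As a planning aid, the guidelines above can be checked against a proposed group layout before anything is configured. The following is a minimal sketch: `check_ufd_groups` and the dictionary layout are hypothetical, and it assumes that an interface name with a unit suffix (for example `xe-0/0/2.0`) denotes a logical interface. The switch itself remains the authority on what it accepts.

```python
MAX_GROUPS = 48           # groups per switch, from the guidelines
MAX_LINKS_PER_ROLE = 48   # uplinks or downlinks per group, from the guidelines


def check_ufd_groups(groups: dict) -> list:
    """Return guideline violations for a hypothetical per-switch layout.

    `groups` maps a group name to {"monitor": [...], "disable": [...]}.
    """
    problems = []
    if len(groups) > MAX_GROUPS:
        problems.append(f"{len(groups)} groups exceeds the limit of {MAX_GROUPS}")

    owner = {}  # interface name -> first group that uses it
    for name, links in groups.items():
        for role in ("monitor", "disable"):
            members = links.get(role, [])
            if len(members) > MAX_LINKS_PER_ROLE:
                problems.append(
                    f"{name} has {len(members)} link-to-{role} interfaces"
                )
            for ifname in members:
                if ifname in owner and owner[ifname] != name:
                    problems.append(
                        f"{ifname} is in both {owner[ifname]} and {name}"
                    )
                owner.setdefault(ifname, name)

        # Assumption: a "." in the name marks a logical interface.
        all_members = links.get("monitor", []) + links.get("disable", [])
        if len({"." in ifname for ifname in all_members}) > 1:
            problems.append(f"{name} mixes physical and logical interfaces")
    return problems


ok = check_ufd_groups(
    {"group1": {"monitor": ["xe-0/0/0"], "disable": ["xe-0/0/1"]}}
)
bad = check_ufd_groups({
    "group1": {"monitor": ["xe-0/0/0"], "disable": ["xe-0/0/1"]},
    "group2": {"monitor": ["xe-0/0/0"], "disable": ["xe-0/0/2.0"]},
})
```

Here `ok` is empty, while `bad` reports that `xe-0/0/0` is shared between two groups and that `group2` mixes physical and logical interfaces.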
To configure uplink failure detection on a switch:
After you have configured an uplink failure detection group, use the show uplink-failure-detection group group-name command to verify that all interfaces in the group are up. If the interfaces are down, uplink failure detection does not work.
Example: Configuring Interfaces for Uplink Failure Detection
Uplink failure detection allows a switch to detect link failure on uplink interfaces and to propagate the failure information to the downlink interfaces. All of the network interface cards (NICs) on a server are configured as being either the primary link or the secondary link and share the same IP address. When the primary link goes down, the server transparently shifts the connection to the secondary link to ensure that the traffic on the failed link is not dropped.
This example describes:
- Requirements
- Overview and Topology
- Configuring Uplink Failure Detection on Both Switches
- Verification
Requirements
This example uses the following software and hardware components:
- Junos OS Release 19.2R1 or later for the QFX Series
- Two QFX5100, QFX5110, QFX5120, QFX5200, or QFX5210 switches
- Two aggregation switches
- One dual-homed server
Overview and Topology
The topology in this example illustrates how to configure uplink failure detection on Switch 1 and Switch 2. Switch 1 and Switch 2 are both configured with a link-to-monitor interface (the uplink interface to the aggregation switch) and a link-to-disable interface (the downlink interface to the server). For simplicity, only one group of link-to-monitor interfaces and link-to-disable interfaces is configured for each switch. The server is dual-homed to both Switch 1 and Switch 2. In this scenario, if the link-to-monitor interface on Switch 1 goes down, Switch 1 disables its downlink to the server, and the server uses its link to Switch 2 instead.
This example does not describe how to configure the dual-homed server or the aggregation switches. Please refer to the documentation for each of these devices for more information.
Figure 2 illustrates a typical setup for uplink failure detection.
Table 1 lists the uplink failure detection settings for each switch.
Topology
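Switch 1 | Switch 2 |
---|---|
Group name: group1 | Group name: group2 |
Link-to-monitor interface: xe-0/0/0 | Link-to-monitor interface: xe-0/0/0 |
Link-to-disable interface: xe-0/0/1 | Link-to-disable interface: xe-0/0/1 |
Debounce interval: 20 seconds | Debounce interval: 20 seconds |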
Configuring Uplink Failure Detection on Both Switches
To configure uplink failure detection on both switches, perform these tasks.
Procedure
CLI Quick Configuration
To quickly configure uplink failure detection on Switch 1 and Switch 2, copy the following commands and paste them into the switch terminal window:
[edit protocols]
set uplink-failure-detection group group1
set uplink-failure-detection group group2
set uplink-failure-detection group group1 link-to-monitor xe-0/0/0
set uplink-failure-detection group group1 debounce-interval 20
set uplink-failure-detection group group2 link-to-monitor xe-0/0/0
set uplink-failure-detection group group2 debounce-interval 20
set uplink-failure-detection group group1 link-to-disable xe-0/0/1
set uplink-failure-detection group group2 link-to-disable xe-0/0/1
Step-by-Step Procedure
To configure uplink failure detection on both switches:
Specify a name for the uplink failure detection group on Switch 1:
[edit protocols]
user@switch# set uplink-failure-detection group group1
Add an uplink interface to the group on Switch 1:
[edit protocols]
user@switch# set uplink-failure-detection group group1 link-to-monitor xe-0/0/0
Add a downlink interface to the group on Switch 1:
[edit protocols]
user@switch# set uplink-failure-detection group group1 link-to-disable xe-0/0/1
Configure the debounce interval for group1 on Switch 1:
[edit protocols]
user@switch# set uplink-failure-detection group group1 debounce-interval 20
Specify a name for the uplink failure detection group on Switch 2:
[edit protocols]
user@switch# set uplink-failure-detection group group2
Add an uplink interface to the group on Switch 2:
[edit protocols]
user@switch# set uplink-failure-detection group group2 link-to-monitor xe-0/0/0
Configure the debounce interval for group2 on Switch 2:
[edit protocols]
user@switch# set uplink-failure-detection group group2 debounce-interval 20
Add a downlink interface to the group on Switch 2:
[edit protocols]
user@switch# set uplink-failure-detection group group2 link-to-disable xe-0/0/1
Results
Display the results of the configuration:
uplink-failure-detection {
    group {
        group1 {
            debounce-interval 20;
            link-to-monitor {
                xe-0/0/0;
            }
            link-to-disable {
                xe-0/0/1;
            }
        }
        group2 {
            debounce-interval 20;
            link-to-monitor {
                xe-0/0/0;
            }
            link-to-disable {
                xe-0/0/1;
            }
        }
    }
}
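When the same group layout has to be applied to several switches, the set commands from this example can be generated rather than typed. The helper below is a hypothetical sketch (`ufd_set_commands` is not part of Junos OS); it only assembles the command strings already used in this example, relative to the [edit protocols] hierarchy, and leaves pasting or pushing them to the operator.

```python
def ufd_set_commands(group, monitor, disable, debounce_interval=None):
    """Build `set` commands, relative to [edit protocols], for one group."""
    prefix = f"set uplink-failure-detection group {group}"
    commands = [f"{prefix} link-to-monitor {ifname}" for ifname in monitor]
    commands += [f"{prefix} link-to-disable {ifname}" for ifname in disable]
    if debounce_interval is not None:
        commands.append(f"{prefix} debounce-interval {debounce_interval}")
    return commands


# Settings for Switch 1, taken from this example.
switch1 = ufd_set_commands("group1", ["xe-0/0/0"], ["xe-0/0/1"], 20)
print("\n".join(switch1))
# Prints:
# set uplink-failure-detection group group1 link-to-monitor xe-0/0/0
# set uplink-failure-detection group group1 link-to-disable xe-0/0/1
# set uplink-failure-detection group group1 debounce-interval 20
```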
Verification
To verify that uplink failure detection is working correctly, perform the following tasks on Switch 1 and Switch 2:
Verifying That Uplink Failure Detection Is Working Correctly
Purpose
Verify that the switch disables the downlink interface when it detects an uplink failure.
Action
1. View the current uplink failure detection status:
user@switch> show uplink-failure-detection
Group             : group1
Uplink            : xe-0/0/0*
Downlink          : xe-0/0/1*
Failure Action    : Inactive
Debounce Interval : 20
Note: The asterisk (*) indicates that the link is up.
2. Disable the uplink interface:
[edit]
user@switch# set interfaces xe-0/0/0 disable
3. Save the configuration on the switch.
4. View the current uplink failure detection status:
user@switch> show uplink-failure-detection
Group             : group1
Uplink            : xe-0/0/0
Downlink          : xe-0/0/1
Failure Action    : Active
Debounce Interval : 20
Meaning
The output in Step 1 shows that the uplink interface is up, and hence that the downlink interface is also up, and that the status of Failure Action is Inactive.
The output in Step 4 shows that both the uplink and downlink interfaces are down (there are no asterisks after the interface name) and that the status of Failure Action is changed to Active. This output shows that uplink failure detection is working.
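The same check can be scripted against captured command output. The sketch below assumes the single-group, one-interface-per-field layout shown in this example; `parse_ufd_status` and `link_is_up` are hypothetical helpers, and output with several groups or several interfaces per field would need additional handling.

```python
def parse_ufd_status(text: str) -> dict:
    """Parse `show uplink-failure-detection` output for a single group."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip the prompt line and blank lines
        status[key.strip()] = value.strip()
    return status


def link_is_up(field: str) -> bool:
    # A trailing asterisk in the output marks a link that is up.
    return field.endswith("*")


# Output captured after the uplink was disabled (Step 4 in this example).
sample = """user@switch> show uplink-failure-detection
Group             : group1
Uplink            : xe-0/0/0
Downlink          : xe-0/0/1
Failure Action    : Active
Debounce Interval : 20
"""

status = parse_ufd_status(sample)
failure_propagated = (
    status["Failure Action"] == "Active"
    and not link_is_up(status["Uplink"])
    and not link_is_up(status["Downlink"])
)
```

For the sample above, `failure_propagated` evaluates to `True`, matching the interpretation given for Step 4.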