Help us improve your experience.

Let us know what you think.

Do you have time for a two-minute survey?

 
 

Backup Liveness Detection on EVPN Dual Homing

By default, spine-and-leaf devices in an EVPN network implement the core isolation feature. When a device loses all of its EVPN BGP peering sessions it triggers the core isolation feature. Using the LACP, the core isolation feature shuts down all L2 ESI LAG interfaces.

In some situations, the core isolation feature produces a favorable outcome. In other situations, the core isolation feature produces an undesired outcome, which you can prevent by disabling it. However, in a dual-homed architecture, disabling the core isolation feature on one or both peers can cause other issues. You can address these issues by configuring node liveness detection along with the default core isolation feature.

EVPN Core Isolation Configurations

Figure 1 displays an EVPN-VXLAN topology with two spine devices with the default configuration for core isolation. The QFX5110 switches acting as leaf devices are multihomed in active/active mode through ESI-LAG interfaces to the spine devices.

Figure 1: EVPN-VXLAN Default Core IsolationEVPN-VXLAN Default Core Isolation

If the link between Spine 0 and Spine 1 goes down, the last established BGP peering session also goes down. The default core isolation feature triggers LACP to set the leaf-facing interfaces on Spines 0 and 1 to standby mode. This causes data traffic to and from both leaf devices to be dropped halting traffic within the data center, which is an undesired outcome.

Figure 2 displays an EVPN-VXLAN topology with two spine devices with no-core-isolation configured. The QFX5110 switches acting as leaf devices are multihomed in active/active mode through ESI-LAG interfaces to the spine devices.

Figure 2: EVPN-VXLAN No Core IsolationEVPN-VXLAN No Core Isolation

Disabling the core isolation feature could cause the following issues:

  • With no-core-isolation configured on both Spine devices the links to the leaf devices will remain up even when BGP is down. But L3 traffic might fail because ARP/ND resolution will fail. When Spine 0 generates an ARP/ND request, the ARP/ND reply might get load balanced to Spine 1. But since BGP is down, the control plane will not synchronize resulting in ARP resolution failures.

  • With no-core-isolation configured only on Spine 0, only Spine 1 would shut down its leaf facing interface when BGP is down. However, if Spine 0 goes down, Spine 1 still brings its leaf facing interfaces down stopping the data traffic to and from both leaf devices.

Figure 3 displays an EVPN-VXLAN topology with two spine devices with node-detection (backup liveness detection) configured. The QFX5110 switches acting as leaf devices are multihomed in active/active mode through ESI-LAG interfaces to the spine devices.

Figure 3: EVPN Node Detection (Backup Liveness Detection)EVPN Node Detection (Backup Liveness Detection)

The node liveness detection uses BFD configured between the EVPN peers with one side configured to take action based on the BFD and BGP status. This action ensures traffic forwarding continues through the network. The BFD liveness session can run over standard interfaces or over the management interfaces between the EVPN BGP peer devices. Node liveness needs to be detected even when the link between the Spines is down. So, node liveness detection should run over a separate network.

Note:

Junos OS Evolved platforms do not support BFD over management ports.

Benefits of Backup Liveness Detection

  • Can keep the leaf facing links up on one of the devices in a dual-homed environment during a core isolation event.

  • Ensures internal traffic continues even with a BGP failure on the Spine devices.

Behavior

The node liveness feature behaves as follows. When the BGP session is up, both peers keep their leaf facing ESI-LAG interfaces up. When the BGP session is down but the BFD session is up. The Spine with the action configured brings its leaf facing ESI-LAG interfaces down. The Spine which has no action configured will keep its ESI-LAG facing the leaf devices up. When the BGP session and the BFD session tracking the node liveness are down, it indicates one of the Spine devices has gone down. Then the Spine with the action configured will keep its leaf facing ESI-LAG interfaces up if it is active. When the BFD session comes back up, the Spine with action configured will again bring its leaf facing ESI-LAG interfaces down.

Table 1 summarizes the state on each of the Spine devices in the different scenarios. The actions configured will be either laser-off or trigger-node-isolation. With laser-off as the action the leaf facing ESI-LAG would receive a laser-off signal on core-isolation. If the action configured is trigger-node-isolation the l2 process will place the leaf facing ESI-LAG links in link-down state.

Table 1: Backup Liveness Behavior

BGP State

Node Liveness BFD State

On Spine with action Configured

On Spine with action Not Configured

Comments

Up -> Down

Down

ESI-LAG remains up

ESI-LAG remains up

For example, on Spine 0 both BGP and liveness are down Which means that Spine 1 is unavailable. Keep ESI LAG UP regardless of action configuration

Up -> Down

Up

Bring ESI-LAG down

ESI-LAG remains up

BGP state is down. But BFD is up. This indicates a split-brain scenario. Only one ESI LAG must be up. So shut down the ESI LAG on the device with the action configured.

Down -> Up

Down

If ESI LAG is down, bring it up.

ESI-LAG remains up

BGP status takes precedence

Down -> Up

Up

If ESI LAG is down, bring it up.

ESI-LAG remains up

BGP status takes precedence

Limitations

  • Junos OS Evolved platforms do not support BFD over management ports, due to a Junos OS Evolved limitation that prevents sending protocol packets over a management port.

  • EVPN node liveness detection does not support Multihop BFD. Configure node liveness detection over directly connected interfaces.

  • Configuring no-core-isolation overrides node-detection configurations.

Configure EVPN Backup Liveness Detection

Configure the following elements to enable the backup liveness detection feature for an EVPN peer:

1. Configure node-detection on both EVPN peers with the action configured only on one peer.

  • Peer 1 with action configured

  • Peer 2 without action configured

2. If the laser-off statement is used, then you must configure asychronous-notification for the members of the ESI LAG.

3. If the trigger-node-isolation statement is used, then you must configure network-isolation with the link-down action to bring the necessary interfaces down on core isolation.

Verifying Status

You can view the BFD from the peer devices by using the show bfd session command.

When the action set is laser-off, you can examine the system log messages for state information.