Configure Graceful Restart and Long-Lived Graceful Restart
SUMMARY This topic describes graceful restart and long-lived graceful restart (LLGR) in Juniper® Cloud-Native Contrail Networking (CN2) Release 23.2 and later, deployed in a Kubernetes-orchestrated environment.
Overview
In CN2, whenever the control node detects that a peer session is down, it deletes all routes learned from that peer and immediately withdraws those routes from the peers it advertised them to. This immediately disrupts traffic among the worker nodes. To prevent this, you can configure graceful restart and long-lived graceful restart (LLGR). With these features enabled, learned routes are not immediately deleted and withdrawn from the advertised peers. Instead, the routes are kept and marked as stale. Consequently, if the sessions come back up and the routes are relearned, the overall impact on the network is minimized.
Graceful restart allows a routing device undergoing a restart to inform its adjacent neighbors and peers of its condition. LLGR is a mechanism used to preserve routing details for a longer period of time in the event of a failed peer. LLGR retains stale routes for a much longer time than that allowed by graceful restart alone. With LLGR, route preference is retained and best paths are recomputed.
-
Graceful restart and LLGR support IPv4, IPv6, and EVPN Type 2 routes.
-
Neither graceful restart nor LLGR supports multicast traffic.
BGP Helper Mode
You can use BGP helper mode to minimize routing churn whenever a BGP session flaps or a BGP peer restarts. This is especially helpful if the SDN gateway router goes down gracefully, as in an rpd crash or restart on the device. In that case, CN2 acts as a helper to the gateway by retaining the routes learned from the gateway and advertising them to the rest of the network, as applicable. For this to work, the restarting router (here, the SDN gateway) must support graceful restart and have it configured for all of the address families in use.
BGP helper mode is also supported for BGP-as-a-Service (BGPaaS) clients. When configured, CN2 provides BGP helper mode to a restarting BGPaaS client.
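The following fragment is a minimal sketch of enabling graceful restart and LLGR with BGP helper mode only, for example when the goal is to help a restarting SDN gateway or BGPaaS client. The field names come from the gracefulRestartParameters example in the Configure Graceful Restart and LLGR section, and the timer values simply reuse that example's values. The assumption that setting xmppHelperEnable to false leaves XMPP helper mode disabled is not confirmed by this topic.

```yaml
# Hypothetical fragment: graceful restart and LLGR with BGP helper mode only.
# Applied by editing default-global-system-config (kubectl edit gsc default-global-system-config).
spec:
  gracefulRestartParameters:
    enable: true                # turn on graceful restart and LLGR
    restartTime: 60             # graceful restart time (assumed seconds)
    longLivedRestartTime: 1800  # how long LLGR retains stale routes (seconds)
    endOfRibTimeout: 90         # End of RIB timer (assumed seconds)
    bgpHelperEnable: true       # help restarting BGP peers (SDN gateway, BGPaaS clients)
    xmppHelperEnable: false     # assumption: false leaves XMPP helper mode off
```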
XMPP Helper Mode
The Contrail vRouter datapath agent supports route retention with its controller peer when LLGR with XMPP helper mode is enabled. Route retention allows the datapath agent to keep the last route paths received from the Contrail controller when an XMPP connection is lost. The agent holds these route paths until a new XMPP connection is established to one of the Contrail controllers. Once the new XMPP connection is up and has been stable for a predefined duration, the route paths from the old XMPP connection are flushed. Route retention therefore allows a controller to go down gracefully, although some forwarding interruption can occur when connectivity to a controller is restored.
When enabling graceful restart and LLGR with XMPP helper mode, consider the following:
-
You can enable graceful restart and LLGR with XMPP helper mode without enabling BGP helper mode, and vice versa.
-
Do not use LLGR and XMPP subsecond timers (used for fast convergence) at the same time.
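As a minimal sketch of the first consideration above, the following fragment assumes you want XMPP helper mode (vRouter agent route retention) without BGP helper mode. It reuses the field names and timer values from the gracefulRestartParameters example in the Configure Graceful Restart and LLGR section; treating bgpHelperEnable: false as leaving BGP helper mode off is an assumption.

```yaml
# Hypothetical fragment: graceful restart and LLGR with XMPP helper mode only.
# Applied by editing default-global-system-config (kubectl edit gsc default-global-system-config).
spec:
  gracefulRestartParameters:
    enable: true                # turn on graceful restart and LLGR
    restartTime: 60             # graceful restart time (assumed seconds)
    longLivedRestartTime: 1800  # how long LLGR retains stale routes (seconds)
    endOfRibTimeout: 90         # End of RIB timer (assumed seconds)
    xmppHelperEnable: true      # vRouter agents retain routes when the XMPP connection drops
    bgpHelperEnable: false      # assumption: false leaves BGP helper mode off
```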
Configure Graceful Restart and LLGR
To configure graceful restart and LLGR, edit the global-system-config spec using the following command:
kubectl edit gsc default-global-system-config
Whenever you enable or disable gracefulRestartParameters, the GlobalSystemConfigSpec is updated and pushed down to all control nodes, and from there to the worker nodes. Because the peers exchange the new configuration, the BGP and XMPP sessions flap (restart). As a result, you might observe a small drop in traffic while the peers restart.
The following is an example of how to enable graceful restart and LLGR and assign timer values.
spec:
  gracefulRestartParameters:
    enable: true
    longLivedRestartTime: 1800
    restartTime: 60
    xmppHelperEnable: true
    bgpHelperEnable: true
    endOfRibTimeout: 90
In the above example, enable is set to true to enable graceful restart and LLGR. The bgpHelperEnable and xmppHelperEnable parameters are set to true to enable helper mode for BGP and XMPP peers, respectively.
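To turn the features off again, a minimal sketch is to change the single enable field in the same spec, assuming that enable: false disables both graceful restart and LLGR. Like enabling, this change is pushed to the control and worker nodes, so expect the BGP and XMPP session flap described earlier in this section.

```yaml
# Hypothetical fragment: disable graceful restart and LLGR.
# Assumption: enable: false turns both features off; the other fields can stay as they are.
spec:
  gracefulRestartParameters:
    enable: false
```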
See Table 1 for descriptions of the timers and their associated behaviors.
Timer | Description |
---|---|
Restart time | The restartTime specifies the graceful restart time. We recommend that you specify a nonzero value; a nonzero restart time is required to advertise graceful restart and long-lived graceful restart capabilities to peers. |
LLGR restart time | The longLivedRestartTime indicates the amount of time LLGR retains stale routes. The default is 1800 seconds. When graceful restart and LLGR are both configured, the duration of the LLGR timer is the sum of both timers. |
End of RIB timer | The endOfRibTimeout specifies the End of RIB (EOR) timer. The EOR timer starts when the vRouter agent receives an End of Config message. When the EOR timer expires, the vRouter agent sends an EOR message to the control node. The control node then removes from its RIB the stale routes that were previously advertised by the vRouter agent. |
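As a worked example of the rule in the LLGR restart time row, take the timer values from the configuration example above (restartTime: 60 and longLivedRestartTime: 1800) and assume, as the 1800-second default suggests, that both values are in seconds. The total time stale routes are retained is then:

```latex
T_{\text{LLGR}} = \texttt{restartTime} + \texttt{longLivedRestartTime}
               = 60\,\text{s} + 1800\,\text{s}
               = 1860\,\text{s} = 31\,\text{min}
```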
Verify Your Configuration
To verify your configuration, run the following command:
kubectl get gsc default-global-system-config -o yaml
The following output is an example of the GlobalSystemConfig YAML with graceful restart and LLGR configured:
kubectl get gsc default-global-system-config -o yaml
apiVersion: core.contrail.juniper.net/v4
kind: GlobalSystemConfig
metadata:
  annotations:
    conversion.globalsystemconfig.gracefulrestartparameters: '{"BgpHelperEnable":true,"Enable":true,"EndOfRibTimeout":90,"LongLivedRestartTime":1800,"RestartTime":60,"XmppHelperEnable":true}'
    core.juniper.net/description: This cluster object defines all the global Contrail configurations. This object must be unique for a Contrail deployment.
    core.juniper.net/display-name: Default Global System Config
  creationTimestamp: "2023-05-02T19:01:23Z"
  generation: 25
  labels:
    back-reference.core.juniper.net/28c27a9ee0c01b57b940db85e38aef6526e506189cfa69d69b019349: BGPRouter_contrail_tmallick-k8s-ctrl-2.englab.juniper.net
    back-reference.core.juniper.net/31b3717b8b83c3cedd435758e30860370ff90c20542563beb1fab793: BGPRouter_contrail_tmallick-k8s-ctrl-1.englab.juniper.net
    back-reference.core.juniper.net/d2e220fd972a5acfc95a0cccaf8d973e97427606046a10f0b04297aa: BGPRouter_contrail_tmallick-k8s-ctrl-3.englab.juniper.net
  name: default-global-system-config
  resourceVersion: "31673"
  uid: 3dfaefbc-a0e5-4e14-b6b3-73d03f09bf1b
spec:
  autonomousSystem: 64512
  bgpRouterReferences:
  - apiVersion: core.contrail.juniper.net/v2
    fqName:
    - default-domain
    - contrail
    - ip-fabric
    - default
    - jdoe-k8s-ctrl-1.englab.juniper.net
    kind: BGPRouter
    name: jdoe-k8s-ctrl-1.englab.juniper.net
    namespace: contrail
    uid: 698aa03c-6c5c-4e7a-bb75-b89baef75ea6
  - apiVersion: core.contrail.juniper.net/v2
    fqName:
    - default-domain
    - contrail
    - ip-fabric
    - default
    - jdoe-k8s-ctrl-2.englab.juniper.net
    kind: BGPRouter
    name: jdoe-k8s-ctrl-2.englab.juniper.net
    namespace: contrail
    uid: 8afddddb-56ad-44fb-95a5-1cfe0fc10361
  - apiVersion: core.contrail.juniper.net/v2
    fqName:
    - default-domain
    - contrail
    - ip-fabric
    - default
    - jdoe-k8s-ctrl-3.englab.juniper.net
    kind: BGPRouter
    name: jdoe-k8s-ctrl-3.englab.juniper.net
    namespace: contrail
    uid: af2f0faa-31f6-4869-8f16-ad2ba4d4c013
  enable4bytesAS: true
  fqName:
  - default-global-system-config
  gracefulRestartParameters:
    bgpHelperEnable: true
    enable: true
    endOfRibTimeout: 90
    longLivedRestartTime: 1800
    restartTime: 60
    xmppHelperEnable: true
  ibgpAutoMesh: true
status:
  observation: ""