Understanding Routing Engine Redundancy
Routing Engine redundancy ensures the continued functionality of your network. If the primary Routing Engine is taken offline (either by failover or switchover), the backup Routing Engine takes over all routing functions.
Routing Engine Redundancy Overview
Redundant Routing Engines are two Routing Engines that are installed in the same routing platform. One functions as the primary, while the other stands by as a backup should the primary Routing Engine fail. On routing platforms with dual Routing Engines, network reconvergence takes place more quickly than on routing platforms with a single Routing Engine.
When a Routing Engine is configured as primary, it has full functionality. It receives and transmits routing information, builds and maintains routing tables, communicates with interfaces and Packet Forwarding Engine components, and has full control over the chassis. When a Routing Engine is configured to be the backup, it does not communicate with the Packet Forwarding Engine or chassis components.
On devices running Junos OS Release 8.4 or later, you cannot configure both Routing Engines as primary at the same time. Such a configuration causes the commit check to fail.
A failover from the primary Routing Engine to the backup Routing Engine occurs automatically when the primary Routing Engine experiences a hardware failure or when you have configured the software to support a change in primary role based on specific conditions. You can also manually switch Routing Engine primary role by issuing one of the request chassis routing-engine commands. In this topic, the term failover refers to an automatic event, whereas switchover refers to either an automatic or a manual event.
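As a minimal sketch of a manual switchover, the operational-mode commands below show the commonly documented forms of request chassis routing-engine master. The user@host prompt is a placeholder, lines beginning with # are annotations rather than input, and the options available can vary by platform and release, so treat this as illustrative only.

```
# Toggle primary role between the two Routing Engines.
user@host> request chassis routing-engine master switch

# Issued on the backup Routing Engine: take primary role.
user@host> request chassis routing-engine master acquire

# Issued on the primary Routing Engine: give up primary role.
user@host> request chassis routing-engine master release

# Check which Routing Engine currently holds primary role.
user@host> show chassis routing-engine
```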
When a failover or a switchover occurs, the backup Routing Engine takes control of the system as the new primary Routing Engine.
- If graceful Routing Engine switchover is not configured, when the backup Routing Engine becomes primary, it resets the switch plane and downloads its own version of the microkernel to the Packet Forwarding Engine components. Traffic is interrupted while the Packet Forwarding Engine is reinitialized. All kernel and forwarding processes are restarted.
- If graceful Routing Engine switchover is configured, interface and kernel information is preserved. The switchover is faster because the Packet Forwarding Engines are not restarted. The new primary Routing Engine restarts the routing protocol process (rpd). All hardware and interfaces are acquired by a process that is similar to a warm restart.
- If graceful Routing Engine switchover and nonstop active routing (NSR) are configured, traffic is not interrupted during the switchover. Interface, kernel, and routing protocol information is preserved.
- If graceful Routing Engine switchover and graceful restart are configured, traffic is not interrupted during the switchover. Interface and kernel information is preserved. Graceful restart protocol extensions quickly collect and restore routing information from the neighboring routers.
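The combinations described above might be configured along the following lines. This is a sketch, not a complete configuration: it assumes the standard graceful-switchover, nonstop-routing, and graceful-restart statements, and it assumes that nonstop active routing and graceful restart are alternatives that should not be configured together. Lines beginning with # are annotations. Verify statement support for your platform and release before committing.

```
# Graceful Routing Engine switchover (used by both options below)
set chassis redundancy graceful-switchover

# Option A: nonstop active routing, with commits synchronized
# to both Routing Engines
set routing-options nonstop-routing
set system commit synchronize

# Option B: graceful restart instead of nonstop active routing
set routing-options graceful-restart
```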
Conditions That Trigger a Routing Engine Failover
The following events can result in an automatic change in Routing Engine primary role, depending on your configuration:
- The routing platform experiences a hardware failure. A change in Routing Engine primary role occurs if either the Routing Engine or the associated host module or subsystem is abruptly powered off. You can also configure the backup Routing Engine to take primary role if it detects a hard disk error on the primary Routing Engine. To enable this feature, include the failover on-disk-failure statement at the [edit chassis redundancy] hierarchy level.
- The routing platform experiences a software failure, such as a kernel crash or a CPU lock. You must configure the backup Routing Engine to take primary role when it detects a loss of keepalive signal. To enable this failover method, include the failover on-loss-of-keepalives statement at the [edit chassis redundancy] hierarchy level.
- The routing platform experiences an em0 interface failure on the primary Routing Engine. You must configure the backup Routing Engine to take primary role when it detects the em0 interface failure. To enable this failover method, include the on-re-to-fpc-stale statement at the [edit chassis redundancy failover] hierarchy level.
- A specific software process fails. You can configure the backup Routing Engine to take primary role when one or more specified processes fail at least four times within 30 seconds. Include the failover other-routing-engine statement at the [edit system processes process-name] hierarchy level.
If any of these conditions is met, a message is logged and the backup Routing Engine attempts to take primary role. By default, an alarm is generated when the backup Routing Engine becomes active. After the backup Routing Engine takes primary role, it continues to function as primary even after the originally configured primary Routing Engine has successfully resumed operation. You must manually restore it to its previous backup status. (However, if at any time one of the Routing Engines is not present, the other Routing Engine becomes primary automatically, regardless of how redundancy is configured.)
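For illustration, the four failover triggers listed in this section could be enabled with configuration-mode commands such as the following. The statements and hierarchy levels are the ones named above. The process name routing (the routing protocol process) is an assumption chosen for the example; substitute the process you want to monitor.

```
[edit]
user@host# set chassis redundancy failover on-disk-failure
user@host# set chassis redundancy failover on-loss-of-keepalives
user@host# set chassis redundancy failover on-re-to-fpc-stale
user@host# set system processes routing failover other-routing-engine
user@host# commit
```

Each statement is independent, so you can enable only the triggers that match your failure model.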
Default Routing Engine Redundancy Behavior
By default, Junos OS uses re0 as the primary Routing Engine and re1 as the backup Routing Engine. Unless otherwise specified in the configuration, re0 always becomes primary when the acting primary Routing Engine is rebooted.
A single Routing Engine in the chassis always becomes the primary Routing Engine even if it was previously the backup Routing Engine.
Perform the following steps to see how the default Routing Engine redundancy setting works:
- Ensure that re0 is the primary Routing Engine.
- Manually switch the state of Routing Engine primary role by issuing the request chassis routing-engine master switch command from the primary Routing Engine. re0 is now the backup Routing Engine and re1 is the primary Routing Engine.
  Note: On the next reboot of the primary Routing Engine, Junos OS returns the router to the default state because you have not configured the Routing Engines to maintain this state after a reboot.
- Reboot the primary Routing Engine re1.
  The Routing Engine boots up and reads the configuration. Because you have not specified in the configuration which Routing Engine is the primary, re1 uses the default configuration as the backup. Now both re0 and re1 are in a backup state. Junos OS detects this conflict and, to prevent a no-primary state, reverts to the default configuration to direct re0 to become primary.
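To keep a non-default role assignment across reboots, the roles can be stated explicitly in the configuration. The sketch below assumes the routing-engine statement at the [edit chassis redundancy] hierarchy level and assigns primary role to re1. The slot numbers and the user@host prompt are illustrative.

```
[edit]
user@host# set chassis redundancy routing-engine 0 backup
user@host# set chassis redundancy routing-engine 1 master
user@host# commit
```

Only one Routing Engine can be configured as primary. Assigning that role to both causes the commit check to fail.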
Situations That Require You to Halt Routing Engines
Before you shut the power off to a routing platform that has two Routing Engines or before you remove the primary Routing Engine, you must first halt the backup Routing Engine and then halt the primary Routing Engine. Otherwise, you might need to reinstall Junos OS. You can use the request system halt both-routing-engines command on the primary Routing Engine, which first shuts down the primary Routing Engine and then shuts down the backup Routing Engine. To shut down only the backup Routing Engine, issue the request system halt command on the backup Routing Engine.
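As a brief illustration of the two halt commands described above, the sketch below shows where each would be issued. The user@host prompt is a placeholder, and the {master} and {backup} banners are shown only to indicate which Routing Engine the session is on.

```
{master}
user@host> request system halt both-routing-engines

{backup}
user@host> request system halt
```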
If you halt the primary Routing Engine and do not power it off or remove it, the backup Routing Engine remains inactive unless you have configured it to become the primary when it detects a loss of keepalive signal from the primary Routing Engine.
To restart the router, you must log in to the console port (rather than the Ethernet management port) of the Routing Engine. When you log in to the console port of the primary Routing Engine, the system automatically reboots. After you log in to the console port of the backup Routing Engine, press Enter to reboot it.
If you have upgraded the backup Routing Engine, first reboot it and then reboot the primary Routing Engine.