Configuring Routing Engine Redundancy
SUMMARY Follow the steps and examples below to configure routing engine redundancy.
To complete the tasks in the following sections, re0 and re1 configuration groups must be defined. For more information about configuration groups, see the Junos OS CLI User Guide.
Modifying the Default Routing Engine Primary Role
For routers with two Routing Engines, you can configure which Routing Engine is the primary and which is the backup. By default, the Routing Engine in slot 0 is the primary (re0) and the one in slot 1 is the backup (re1).
In systems with two Routing Engines, you cannot configure both Routing Engines as primary at the same time. Such a configuration causes the commit check to fail.
To modify the default configuration, include the routing-engine statement at the [edit chassis redundancy] hierarchy level:
[edit chassis redundancy]
routing-engine slot-number (master | backup | disabled);
slot-number can be 0 or 1. To configure the Routing Engine to be the primary, specify the master option. To configure it to be the backup, specify the backup option. To disable a Routing Engine, specify the disabled option.
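As an illustration of the syntax above, the following sketch reverses the default roles so that the Routing Engine in slot 1 is the primary and the one in slot 0 is the backup. The choice of roles is only an example; adapt it to your chassis.

```
[edit chassis redundancy]
# Example role assignment only; the default is slot 0 as primary.
routing-engine 0 backup;
routing-engine 1 master;
```

Only one slot is set to master, so this configuration does not trigger the commit check failure described above.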
To switch between the primary and the backup Routing Engines, see Manually Switching Routing Engine Primary Role.
Configuring Automatic Failover to the Backup Routing Engine
The following sections describe how to configure automatic failover to the backup Routing Engine when certain failures occur on the primary Routing Engine.
- Without Interruption to Packet Forwarding
- On Detection of a Hard Disk Error on the Primary Routing Engine
- On Detection of a Broken LCMD Connectivity Between the VM and RE
- On Detection of a Loss of Keepalive Signal from the Primary Routing Engine
- On Detection of the em0 Interface Failure on the Primary Routing Engine
- When a Software Process Fails
Without Interruption to Packet Forwarding
For routers with two Routing Engines, you can configure graceful Routing Engine switchover (GRES). When graceful switchover is configured, socket reconnection occurs seamlessly without interruption to packet forwarding. For information about how to configure graceful Routing Engine switchover, see Configuring Graceful Routing Engine Switchover.
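As a minimal sketch, GRES is typically enabled with the graceful-switchover statement at the [edit chassis redundancy] hierarchy level. This statement is not shown elsewhere in this topic; platform requirements and related settings are covered in Configuring Graceful Routing Engine Switchover.

```
[edit chassis redundancy]
# Minimal sketch; see the GRES topic for platform-specific requirements.
graceful-switchover;
```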
On Detection of a Hard Disk Error on the Primary Routing Engine
After you configure a backup Routing Engine, you can direct it to take primary role automatically if it detects a hard disk error from the primary Routing Engine. To enable this feature, include the on-disk-failure statement at the [edit chassis redundancy failover] hierarchy level.
[edit chassis redundancy failover]
on-disk-failure;
The on-disk-failure statement at the [edit chassis redundancy failover] hierarchy level is not supported on PTX platforms running Junos Evolved. These platforms default to a switchover when a disk failure is detected.
On Detection of a Broken LCMD Connectivity Between the VM and RE
You can configure an automatic RE switchover when the LCMD connectivity between the VM and the RE is broken. To enable this feature, include the on-loss-of-vm-host-connection statement at the [edit chassis redundancy failover] hierarchy level.
[edit chassis redundancy failover]
on-loss-of-vm-host-connection;
If the LCMD process is crashing on the primary, the system switches over after one minute, provided that the backup RE LCMD connection is stable. The system does not switch over if the backup RE LCMD connection is unstable. If the current primary has only just gained primary role, the switchover is delayed and happens only after four minutes.
On Detection of a Loss of Keepalive Signal from the Primary Routing Engine
After you configure a backup Routing Engine, you can direct it to take primary role automatically if it detects a loss of keepalive signal from the primary Routing Engine.
To enable failover on receiving a loss of keepalive signal, include the on-loss-of-keepalives statement at the [edit chassis redundancy failover] hierarchy level:
[edit chassis redundancy failover]
on-loss-of-keepalives;
The on-loss-of-keepalives statement at the [edit chassis redundancy failover] hierarchy level is not supported on PTX platforms running Junos Evolved. These platforms default to a switchover when keepalive messages are not detected.
When graceful Routing Engine switchover is not configured, by default, failover occurs after 300 seconds (5 minutes). You can configure a shorter or longer time interval.
The keepalive time period is reset to 360 seconds when the primary Routing Engine has been manually rebooted or halted.
To change the keepalive time period, include the keepalive-time statement at the [edit chassis redundancy] hierarchy level:
[edit chassis redundancy]
keepalive-time seconds;
The range for keepalive-time is 2 through 10,000 seconds.
The following example describes the sequence of events if you configure the backup Routing Engine to detect a loss of keepalive signal in the primary Routing Engine:
- Manually configure a keepalive-time of 25 seconds.
- After the Packet Forwarding Engine connection to the primary Routing Engine is lost and the keepalive timer expires, packet forwarding is interrupted.
- After 25 seconds of keepalive loss, a message is logged, and the backup Routing Engine attempts to take primary role. An alarm is generated when the backup Routing Engine becomes active, and the display is updated with the current status of the Routing Engine.
- After the backup Routing Engine takes primary role, it continues to function as primary.
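The configuration behind the sequence above can be sketched as follows. It combines the keepalive-time and on-loss-of-keepalives statements described in this section, using the 25-second value from the example, and assumes that graceful Routing Engine switchover is not configured.

```
[edit chassis redundancy]
# 25 seconds is the example value; the valid range is 2 through 10,000.
keepalive-time 25;
failover {
    on-loss-of-keepalives;
}
```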
When graceful Routing Engine switchover is configured, the keepalive signal is automatically enabled and the failover time is set to 2 seconds (4 seconds on M20 routers). You cannot manually reset the keepalive time.
When you halt or reboot the primary Routing Engine, Junos OS resets the keepalive time to 360 seconds, and the backup Routing Engine does not take over primary role until the 360-second keepalive time period expires.
A former primary Routing Engine becomes a backup Routing Engine if it returns to service after a failover to the backup Routing Engine. To restore primary status to the former primary Routing Engine, you can use the request chassis routing-engine master switch operational mode command.
If at any time one of the Routing Engines is not present, the remaining Routing Engine becomes primary automatically, regardless of how redundancy is configured.
On Detection of the em0 Interface Failure on the Primary Routing Engine
After you configure a backup Routing Engine, you can direct it to take primary role automatically if the em0 interface fails on the primary Routing Engine. To enable this feature, include the on-re-to-fpc-stale statement at the [edit chassis redundancy failover] hierarchy level.
[edit chassis redundancy failover]
on-re-to-fpc-stale;
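The failover triggers described in the preceding sections can be combined. The following sketch enters three of them as set commands in configuration mode. Whether each statement is available depends on the platform and release, so treat the combination as an assumption to check against your device rather than a recommended baseline.

```
# Assumed combination; confirm that each statement is supported on your platform.
[edit]
user@host# set chassis redundancy failover on-disk-failure
user@host# set chassis redundancy failover on-loss-of-keepalives
user@host# set chassis redundancy failover on-re-to-fpc-stale
user@host# commit
```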
When a Software Process Fails
To configure automatic switchover to the backup Routing Engine if a software process fails, include the failover other-routing-engine statement at the [edit system processes process-name] hierarchy level:
[edit system processes process-name]
failover other-routing-engine;
process-name is one of the valid process names. If this statement is configured for a process, and that process fails four times within 30 seconds, the router reboots from the other Routing Engine. Another statement available at the [edit system processes] hierarchy level is failover alternate-media. For information about the alternate media option, see the Junos OS Administration Library for Routing Devices.
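For example, the following sketch applies the statement to the routing protocol process, using routing as the process name. The choice of process is illustrative; substitute any valid process name for your release.

```
[edit system processes]
# "routing" is an example process name chosen for illustration.
routing {
    failover other-routing-engine;
}
```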
Manually Switching Routing Engine Primary Role
To manually switch Routing Engine primary role, use one of the following commands:
- On the backup Routing Engine, request that the backup Routing Engine take primary role by issuing the request chassis routing-engine master acquire command.
- On the primary Routing Engine, request that the backup Routing Engine take primary role by using the request chassis routing-engine master release command.
- On either Routing Engine, switch primary role by issuing the request chassis routing-engine master switch command.
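A manual switchover might look like the following sketch. The show chassis routing-engine command is not mentioned in this topic; it is assumed here as a way to check the resulting state of each Routing Engine. Output is omitted because it varies by platform and release.

```
# Run from either Routing Engine; the CLI may ask for confirmation.
user@host> request chassis routing-engine master switch
# Assumed verification step, not part of the procedure above.
user@host> show chassis routing-engine
```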
Verifying Routing Engine Redundancy Status
A separate log file is provided for redundancy logging at /var/log/mastership. To view the log, use the file show /var/log/mastership command. Table 1 lists the primary role log event codes and descriptions.
Event Code | Description |
---|---|
E_NULL = 0 | The event is a null event. |
E_CFG_M | The Routing Engine is configured as primary. |
E_CFG_B | The Routing Engine is configured as backup. |
E_CFG_D | The Routing Engine is configured as disabled. |
E_MAXTRY | The maximum number of tries to acquire or release primary role was exceeded. |
E_REQ_C | A claim primary role request was sent. |
E_ACK_C | A claim primary role acknowledgement was received. |
E_NAK_C | A claim primary role request was not acknowledged. |
E_REQ_Y | Confirmation of primary role is requested. |
E_ACK_Y | Primary role is acknowledged. |
E_NAK_Y | Primary role is not acknowledged. |
E_REQ_G | A release primary role request was sent by a Routing Engine. |
E_ACK_G | The Routing Engine acknowledged release of primary role. |
E_CMD_A | The command request chassis routing-engine master acquire was issued from the backup Routing Engine. |
E_CMD_F | The command request chassis routing-engine master acquire force was issued from the backup Routing Engine. |
E_CMD_R | The command request chassis routing-engine master release was issued from the primary Routing Engine. |
E_CMD_S | The command request chassis routing-engine master switch was issued from a Routing Engine. |
E_NO_ORE | No other Routing Engine is detected. |
E_TMOUT | A request timed out. |
E_NO_IPC | Routing Engine connection was lost. |
E_ORE_M | Other Routing Engine state was changed to primary. |
E_ORE_B | Other Routing Engine state was changed to backup. |
E_ORE_D | Other Routing Engine state was changed to disabled. |
Initial Routing Engine Configuration Example
You can use configuration groups to ensure that the correct IP addresses are used for each Routing Engine and to maintain a single configuration file for both Routing Engines.
The following example defines configuration groups re0 and re1 with separate IP addresses. These well-known configuration group names take effect only on the appropriate Routing Engine.
groups {
    re0 {
        system {
            host-name my-re0;
        }
        interfaces {
            fxp0 {
                description "10/100 Management interface";
                unit 0 {
                    family inet {
                        address 10.255.2.40/24;
                    }
                }
            }
        }
    }
    re1 {
        system {
            host-name my-re1;
        }
        interfaces {
            fxp0 {
                description "10/100 Management interface";
                unit 0 {
                    family inet {
                        address 10.255.2.41/24;
                    }
                }
            }
        }
    }
}
You can assign an additional IP address to the management Ethernet interface (fxp0 in this example) on both Routing Engines. The assigned address uses the master-only keyword and is identical for both Routing Engines, ensuring that the IP address for the primary Routing Engine can be accessed at any time. The address is active only on the primary Routing Engine's management Ethernet interface. During a Routing Engine switchover, the address moves over to the new primary Routing Engine.
For example, on re0, the configuration is:
[edit groups re0 interfaces fxp0]
unit 0 {
    family inet {
        address 10.17.40.131/25 {
            master-only;
        }
        address 10.17.40.132/25;
    }
}
On re1, the configuration is:
[edit groups re1 interfaces fxp0]
unit 0 {
    family inet {
        address 10.17.40.131/25 {
            master-only;
        }
        address 10.17.40.133/25;
    }
}
For more information about the initial configuration of dual Routing Engines, see the Junos OS Software Installation and Upgrade Guide. For more information about assigning an additional IP address to the management Ethernet interface with the master-only keyword on both Routing Engines, see the Junos OS CLI User Guide.
Copying a Configuration File from One Routing Engine to the Other
You can use either the console port or the management Ethernet port to establish connectivity between the two Routing Engines. You can then copy the configuration from the primary to the backup, or use FTP to transfer it, and then load and commit the file in the normal way.
To connect to the other Routing Engine using the management Ethernet port, issue the following command:
user@host> request routing-engine login (other-routing-engine | re0 | re1)
On a TX Matrix router, to make connections to the other Routing Engine using the management Ethernet port, issue the following command:
user@host> request routing-engine login (backup | lcc number | master | other-routing-engine | re0 | re1)
For more information about the request routing-engine login command, see the CLI Explorer.
To copy a configuration file from one Routing Engine to the other, issue the file copy command:
user@host> file copy source destination
In this case, source is the name of the configuration file. These files are stored in the directory /config. The active configuration is /config/juniper.conf, and older configurations are in /config/juniper.conf {1...9}. The destination is a file on the other Routing Engine.
The following example copies a configuration file from Routing Engine 0 to Routing Engine 1:
user@host> file copy /config/juniper.conf re1:/var/tmp/copied-juniper.conf
The following example copies a configuration file from Routing Engine 0 to Routing Engine 1 on a TX Matrix router:
user@host> file copy /config/juniper.conf scc-re1:/var/tmp/copied-juniper.conf
To load the configuration file, enter the load replace command at the [edit] hierarchy level:
[edit]
user@host# load replace /var/tmp/copied-juniper.conf
Make sure you change any IP addresses specified in the management Ethernet interface configuration on Routing Engine 0 to addresses appropriate for Routing Engine 1.
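The steps in this section can be sketched end to end as follows, assuming you start on Routing Engine 0 and reuse the file name from the examples above. The configure and show | compare commands are not part of the original procedure; they are included here as an assumed way to enter configuration mode and review the loaded changes before committing.

```
# On Routing Engine 0: copy the active configuration to Routing Engine 1.
user@host> file copy /config/juniper.conf re1:/var/tmp/copied-juniper.conf
# Log in to Routing Engine 1 over the management Ethernet port.
user@host> request routing-engine login re1
# On Routing Engine 1: load the copied file.
user@host> configure
[edit]
user@host# load replace /var/tmp/copied-juniper.conf
# Assumed review step: adjust the management addresses for re1 before committing.
user@host# show | compare
user@host# commit
```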
Loading a Software Package from the Other Routing Engine
You can load a package from the other Routing Engine onto the local Routing Engine using the existing request system software add package-name command:
user@host> request system software add re(0|1):/filename
In the re portion of the URL, specify the number of the other Routing Engine. In the filename portion of the URL, specify the path to the package. Packages are typically in the directory /var/sw/pkg.
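For instance, the following sketch loads a package onto Routing Engine 1 from the /var/sw/pkg directory on Routing Engine 0. The file name package-name.tgz is a placeholder for illustration, not a real package.

```
# Run on Routing Engine 1; package-name.tgz is a hypothetical file name.
user@host> request system software add re0:/var/sw/pkg/package-name.tgz
```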