Configuring Routing Engine Redundancy

Modifying the Default Routing Engine Primary Role

For routers with two Routing Engines, you can configure which Routing Engine is the primary and which is the backup. By default, the Routing Engine in slot 0 is the primary (re0) and the one in slot 1 is the backup (re1).

Note:

In a system with two Routing Engines, you cannot configure both Routing Engines as primary at the same time. Such a configuration causes the commit check to fail.

To modify the default configuration, include the routing-engine statement at the [edit chassis redundancy] hierarchy level:

slot-number can be 0 or 1. To configure the Routing Engine to be the primary, specify the master option. To configure it to be the backup, specify the backup option. To disable a Routing Engine, specify the disabled option.
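
As a minimal sketch, assuming you want to reverse the default roles so that the Routing Engine in slot 1 is primary, the statements could be entered in configuration mode as follows. The user@host prompt is a placeholder:

```
[edit]
user@host# set chassis redundancy routing-engine 0 backup
user@host# set chassis redundancy routing-engine 1 master
user@host# commit
```

Both slots are set in the same commit so that the configuration never names two primaries, which would fail the commit check.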

Note:

To switch between the primary and the backup Routing Engines, see Manually Switching Routing Engine Primary Role.

Configuring Automatic Failover to the Backup Routing Engine

The following sections describe how to configure automatic failover to the backup Routing Engine when certain failures occur on the primary Routing Engine.

Without Interruption to Packet Forwarding
On Detection of a Hard Disk Error on the Primary Routing Engine
On Detection of a Broken LCMD Connectivity Between the VM and RE
On Detection of a Loss of Keepalive Signal from the Primary Routing Engine
On Detection of the em0 Interface Failure on the Primary Routing Engine
When a Software Process Fails

Without Interruption to Packet Forwarding

For routers with two Routing Engines, you can configure graceful Routing Engine switchover (GRES). When graceful switchover is configured, socket reconnection occurs seamlessly without interruption to packet forwarding. For information about how to configure graceful Routing Engine switchover, see Configuring Graceful Routing Engine Switchover.
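
As a minimal sketch, graceful Routing Engine switchover is enabled with a single statement under the same redundancy hierarchy. Any platform-specific prerequisites are covered in the referenced topic and are not shown here:

```
[edit]
user@host# set chassis redundancy graceful-switchover
user@host# commit
```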

On Detection of a Hard Disk Error on the Primary Routing Engine

After you configure a backup Routing Engine, you can direct it to take primary role automatically if it detects a hard disk error from the primary Routing Engine. To enable this feature, include the on-disk-failure statement at the [edit chassis redundancy failover] hierarchy level.
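
A minimal sketch of this configuration, assuming a platform that supports the statement:

```
[edit]
user@host# set chassis redundancy failover on-disk-failure
user@host# commit
```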

Note:

The on-disk-failure statement at the [edit chassis redundancy failover] hierarchy level is not supported on PTX platforms running Junos OS Evolved. These platforms switch over by default when a disk failure is detected.

On Detection of a Broken LCMD Connectivity Between the VM and RE

You can configure an automatic Routing Engine (RE) switchover that is triggered when the LCMD connectivity between the VM and the RE is broken. To enable this feature, include the on-loss-of-vm-host-connection statement at the [edit chassis redundancy failover] hierarchy level.

If the LCMD process is crashing on the primary, the system switches over after one minute, provided that the LCMD connection on the backup RE is stable. The system does not switch over if the LCMD connection on the backup RE is unstable. If the current primary has only just gained primary role, the switchover is deferred and happens only after four minutes.
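
A minimal sketch of this configuration, assuming a platform with a VM host architecture that supports the statement:

```
[edit]
user@host# set chassis redundancy failover on-loss-of-vm-host-connection
user@host# commit
```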

On Detection of a Loss of Keepalive Signal from the Primary Routing Engine

After you configure a backup Routing Engine, you can direct it to take primary role automatically if it detects a loss of keepalive signal from the primary Routing Engine.

To enable failover on receiving a loss of keepalive signal, include the on-loss-of-keepalives statement at the [edit chassis redundancy failover] hierarchy level:
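
A minimal sketch of this configuration, entered as a set command in configuration mode:

```
[edit]
user@host# set chassis redundancy failover on-loss-of-keepalives
user@host# commit
```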

Note:

The on-loss-of-keepalives statement at the [edit chassis redundancy failover] hierarchy level is not supported on PTX platforms running Junos OS Evolved. These platforms switch over by default when keepalive messages are not detected.

When graceful Routing Engine switchover is not configured, by default, failover occurs after 300 seconds (5 minutes). You can configure a shorter or longer time interval.

Note:

The keepalive time period is reset to 360 seconds when the primary Routing Engine has been manually rebooted or halted.

To change the keepalive time period, include the keepalive-time statement at the [edit chassis redundancy] hierarchy level:

The range for keepalive-time is 2 through 10,000 seconds.
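
As a minimal sketch, the following sets the keepalive time period to 25 seconds, an illustrative value within the supported range. It assumes that graceful Routing Engine switchover is not configured:

```
[edit]
user@host# set chassis redundancy keepalive-time 25
user@host# commit
```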

The following example describes the sequence of events if you configure the backup Routing Engine to detect a loss of keepalive signal in the primary Routing Engine:

Manually configure a keepalive-time of 25 seconds.
After the Packet Forwarding Engine connection to the primary Routing Engine is lost and the keepalive timer expires, packet forwarding is interrupted.
After 25 seconds of keepalive loss, a message is logged, and the backup Routing Engine attempts to take primary role. An alarm is generated when the backup Routing Engine becomes active, and the display is updated with the current status of the Routing Engine.
After the backup Routing Engine takes primary role, it continues to function as primary.

Note:

When graceful Routing Engine switchover is configured, the keepalive signal is automatically enabled and the failover time is set to 2 seconds (4 seconds on M20 routers). You cannot manually reset the keepalive time.

Note:

When you halt or reboot the primary Routing Engine, Junos OS resets the keepalive time to 360 seconds, and the backup Routing Engine does not take over primary role until the 360-second keepalive time period expires.

A former primary Routing Engine becomes a backup Routing Engine if it returns to service after a failover to the backup Routing Engine. To restore primary status to the former primary Routing Engine, you can use the request chassis routing-engine master switch operational mode command.

If at any time one of the Routing Engines is not present, the remaining Routing Engine becomes primary automatically, regardless of how redundancy is configured.

On Detection of the em0 Interface Failure on the Primary Routing Engine

After you configure a backup Routing Engine, you can direct it to take primary role automatically if the em0 interface fails on the primary Routing Engine. To enable this feature, include the on-re-to-fpc-stale statement at the [edit chassis redundancy failover] hierarchy level.
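
A minimal sketch of this configuration, assuming a platform that supports the statement:

```
[edit]
user@host# set chassis redundancy failover on-re-to-fpc-stale
user@host# commit
```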

When a Software Process Fails

To configure automatic switchover to the backup Routing Engine if a software process fails, include the failover other-routing-engine statement at the [edit system processes process-name] hierarchy level:

process-name is one of the valid process names. If this statement is configured for a process, and that process fails four times within 30 seconds, the router reboots from the other Routing Engine. Another statement available at the [edit system processes] hierarchy level is failover alternate-media. For information about the alternate media option, see the Junos OS Administration Library for Routing Devices.
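
As a minimal sketch, the following configures switchover for the routing process. The process name routing is chosen for illustration; substitute any valid process name:

```
[edit]
user@host# set system processes routing failover other-routing-engine
user@host# commit
```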

Manually Switching Routing Engine Primary Role

To manually switch Routing Engine primary role, use one of the following commands:

On the backup Routing Engine, request that the backup Routing Engine take primary role by issuing the request chassis routing-engine master acquire command.
On the primary Routing Engine, request that the backup Routing Engine take primary role by using the request chassis routing-engine master release command.
On either Routing Engine, switch primary role by issuing the request chassis routing-engine master switch command.
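
As a minimal sketch, the following toggles primary role from either Routing Engine and then checks the resulting state. The show chassis routing-engine command is added here only as a suggested verification step:

```
user@host> request chassis routing-engine master switch
user@host> show chassis routing-engine
```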

Verifying Routing Engine Redundancy Status

A separate log file is provided for redundancy logging at /var/log/mastership. To view the log, use the file show /var/log/mastership command. Table 1 lists the primary role log event codes and descriptions.

Table 1: Routing Engine Primary Role Log
Event Code	Description
E_NULL = 0	The event is a null event.
E_CFG_M	The Routing Engine is configured as primary.
E_CFG_B	The Routing Engine is configured as backup.
E_CFG_D	The Routing Engine is configured as disabled.
E_MAXTRY	The maximum number of tries to acquire or release primary role was exceeded.
E_REQ_C	A claim primary role request was sent.
E_ACK_C	A claim primary role acknowledgement was received.
E_NAK_C	A claim primary role request was not acknowledged.
E_REQ_Y	Confirmation of primary role is requested.
E_ACK_Y	Primary role is acknowledged.
E_NAK_Y	Primary role is not acknowledged.
E_REQ_G	A release primary role request was sent by a Routing Engine.
E_ACK_G	The Routing Engine acknowledged release of primary role.
E_CMD_A	The command request chassis routing-engine master acquire was issued from the backup Routing Engine.
E_CMD_F	The command request chassis routing-engine master acquire force was issued from the backup Routing Engine.
E_CMD_R	The command request chassis routing-engine master release was issued from the primary Routing Engine.
E_CMD_S	The command request chassis routing-engine master switch was issued from a Routing Engine.
E_NO_ORE	No other Routing Engine is detected.
E_TMOUT	A request timed out.
E_NO_IPC	Routing Engine connection was lost.
E_ORE_M	Other Routing Engine state was changed to primary.
E_ORE_B	Other Routing Engine state was changed to backup.
E_ORE_D	Other Routing Engine state was changed to disabled.

Check Overall CPU and Memory Usage

Purpose

You can use the show system processes extensive command to display exhaustive system process information about software processes that are running on the router and have controlling terminals. This command is equivalent to the UNIX top command. However, the UNIX top command shows real-time memory usage, with the memory values constantly changing, while the show system processes extensive command provides a snapshot of memory usage at a given moment.

Action

To check overall CPU and memory usage, enter the following Junos OS command-line interface (CLI) command:

Sample Output

user@R1> show system processes extensive

Meaning

The sample output shows the amount of virtual memory used by the Routing Engine and software processes. For example, 118 MB of physical memory is free and 512 MB of the swap file is free, indicating that the router is not short of memory. The processes field shows that most of the 58 processes are in the sleeping state, with 1 in the running state. The process or command that is running is the top command.

The commands column lists the processes that are currently running. For example, the chassis process (chassisd) has a process identifier (PID) of 4480, with a current priority (PRI) of 2. A lower priority number indicates a higher priority.

The processes are listed according to level of activity, with the most active process at the top of the output. For example, the chassis (chassisd) process is consuming the largest amount of CPU resource at 2.34 percent.

The memory field (Mem) shows the virtual memory managed by the Routing Engine and used by processes. The value in the memory field is in KB and MB, and is broken down as follows:

Active—Memory that is allocated and actually in use by programs.
Inact—Memory that is either allocated but not recently used or memory that was freed by programs. Inactive memory is still mapped in the address space of one or more processes and, therefore, counts toward the resident set size of those processes.
Wired—Memory that is not eligible to be swapped, and is usually used for Routing Engine memory structures or memory physically locked by a process.
Cache—Memory that is not associated with any program and does not need to be swapped before being reused.
Buf—The size of the memory buffer used to hold data recently called from disk.
Free—Memory that is not associated with any programs. Memory freed by a process can become Inactive, Cache, or Free, depending on the method used by the process to free the memory.

When the system is under memory pressure, the pageout process reuses memory from the free, cache, inactive and, if necessary, active pages.

The Swap field shows the total swap space available and how much is unused. In the example, the output shows 512 MB of total swap space and 512 MB of free swap space.

Finally, the memory usage of each process is listed. The SIZE field indicates the size of the virtual address space, and the RES field indicates the amount of the program in physical memory, which is also known as RSS or Resident Set Size. In the sample output, the chassis (chassisd) process has 3728 KB of virtual address space and 1908 KB of physical memory.

Initial Routing Engine Configuration Example

You can use configuration groups to ensure that the correct IP addresses are used for each Routing Engine and to maintain a single configuration file for both Routing Engines.

The following example defines configuration groups re0 and re1 with separate IP addresses. These well-known configuration group names take effect only on the appropriate Routing Engine.

You can assign an additional IP address to the management Ethernet interface (fxp0 in this example) on both Routing Engines. The assigned address uses the master-only keyword and is identical for both Routing Engines, ensuring that the IP address for the primary Routing Engine can be accessed at any time. The address is active only on the primary Routing Engine's management Ethernet interface. During a Routing Engine switchover, the address moves over to the new primary Routing Engine.

For example, on re0, the configuration is:

On re1, the configuration is:
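
A minimal sketch of what such a configuration can look like, shown as a single file that covers both groups. The host names and the 192.0.2.0/24 addresses are hypothetical placeholders; the shared address 192.0.2.10 carries the master-only keyword in both groups:

```
groups {
    re0 {
        system {
            host-name router-re0;
        }
        interfaces {
            fxp0 {
                unit 0 {
                    family inet {
                        address 192.0.2.11/24;
                        address 192.0.2.10/24 {
                            master-only;
                        }
                    }
                }
            }
        }
    }
    re1 {
        system {
            host-name router-re1;
        }
        interfaces {
            fxp0 {
                unit 0 {
                    family inet {
                        address 192.0.2.12/24;
                        address 192.0.2.10/24 {
                            master-only;
                        }
                    }
                }
            }
        }
    }
}
apply-groups [ re0 re1 ];
```

Because each group takes effect only on its own Routing Engine, the same file can be committed on both without the per-engine addresses colliding.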

For more information about the initial configuration of dual Routing Engines, see the Junos OS Software Installation and Upgrade Guide. For more information about assigning an additional IP address to the management Ethernet interface with the master-only keyword on both Routing Engines, see the Junos OS CLI User Guide.

Copying a Configuration File from One Routing Engine to the Other

You can use either the console port or the management Ethernet port to establish connectivity between the two Routing Engines. You can then copy the configuration, or use FTP to transfer it, from the primary to the backup, and then load and commit the file in the normal way.

To connect to the other Routing Engine using the management Ethernet port, issue the following command:
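
As a minimal sketch, the following logs in to whichever Routing Engine is not the one you are currently on, using the other-routing-engine option:

```
user@host> request routing-engine login other-routing-engine
```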

On a TX Matrix router, to make connections to the other Routing Engine using the management Ethernet port, issue the following command:

For more information about the request routing-engine login command, see the CLI Explorer.

To copy a configuration file from one Routing Engine to the other, issue the file copy command:

In this case, source is the name of the configuration file. These files are stored in the directory /config. The active configuration is /config/juniper.conf, and older configurations are in /config/juniper.conf {1...9}. The destination is a file on the other Routing Engine.

The following example copies a configuration file from Routing Engine 0 to Routing Engine 1:
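
A minimal sketch of such a copy, issued on Routing Engine 0. The destination file name /var/tmp/copied-juniper.conf is a hypothetical choice:

```
user@host> file copy /config/juniper.conf re1:/var/tmp/copied-juniper.conf
```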

The following example copies a configuration file from Routing Engine 0 to Routing Engine 1 on a TX Matrix router:

To load the configuration file, enter the load replace command at the [edit] hierarchy level:
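
A minimal sketch, issued on the Routing Engine that received the file and assuming the file was copied to the hypothetical path /var/tmp/copied-juniper.conf:

```
[edit]
user@host# load replace /var/tmp/copied-juniper.conf
user@host# commit
```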

CAUTION:

Make sure you change any IP addresses specified in the management Ethernet interface configuration on Routing Engine 0 to addresses appropriate for Routing Engine 1.

Loading a Software Package from the Other Routing Engine

You can load a package from the other Routing Engine onto the local Routing Engine using the existing request system software add package-name command:

In the re portion of the URL, specify the number of the other Routing Engine. In the filename portion of the URL, specify the path to the package. Packages are typically in the directory /var/sw/pkg.
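
As a minimal sketch, the following run on Routing Engine 0 would load a package stored on Routing Engine 1. The file name package-name.tgz is a hypothetical placeholder for the actual package file:

```
user@host> request system software add re1:/var/sw/pkg/package-name.tgz
```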
