High Availability User Guide

Configuring Routing Engine Redundancy

20-Dec-24

Follow the steps and examples below to configure routing engine redundancy.

Note:

To complete the tasks in the following sections, re0 and re1 configuration groups must be defined. For more information about configuration groups, see the Junos OS CLI User Guide.

Modifying the Default Routing Engine Primary Role

For routers with two Routing Engines, you can configure which Routing Engine is the primary and which is the backup. By default, the Routing Engine in slot 0 is the primary (re0) and the one in slot 1 is the backup (re1).

Note:

In systems with two Routing Engines, you cannot configure both Routing Engines as primary at the same time. Doing so causes the commit check to fail.

To modify the default configuration, include the routing-engine statement at the [edit chassis redundancy] hierarchy level:

[edit chassis redundancy]
routing-engine slot-number (master | backup | disabled);

slot-number can be 0 or 1. To configure the Routing Engine to be the primary, specify the master option. To configure it to be the backup, specify the backup option. To disable a Routing Engine, specify the disabled option.
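
As a minimal sketch of the statement above, assume you want to reverse the default roles so that the Routing Engine in slot 1 is primary and the one in slot 0 is backup. The slot assignments here are illustrative only:

[edit chassis redundancy]
routing-engine 0 backup;
routing-engine 1 master;

Setting both roles explicitly in the same commit avoids a configuration in which both slots are master.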

Note:

To switch between the primary and the backup Routing Engines, see Manually Switching Routing Engine Primary Role.

Configuring Automatic Failover to the Backup Routing Engine

The following sections describe how to configure automatic failover to the backup Routing Engine when certain failures occur on the primary Routing Engine.

Without Interruption to Packet Forwarding

For routers with two Routing Engines, you can configure graceful Routing Engine switchover (GRES). When graceful switchover is configured, socket reconnection occurs seamlessly without interruption to packet forwarding. For information about how to configure graceful Routing Engine switchover, see Configuring Graceful Routing Engine Switchover.

On Detection of a Hard Disk Error on the Primary Routing Engine

After you configure a backup Routing Engine, you can direct it to take primary role automatically if it detects a hard disk error from the primary Routing Engine. To enable this feature, include the on-disk-failure statement at the [edit chassis redundancy failover] hierarchy level.

[edit chassis redundancy failover]
on-disk-failure;

Note:

The on-disk-failure statement at the [edit chassis redundancy failover] hierarchy level is not supported on PTX platforms running Junos Evolved. These platforms default to a switchover when a disk failure is detected.

On Detection of a Broken LCMD Connectivity Between the VM and RE

You can configure an automatic Routing Engine (RE) switchover when the LCMD connectivity between the VM and the RE is broken. To enable this feature, include the on-loss-of-vm-host-connection statement at the [edit chassis redundancy failover] hierarchy level.

[edit chassis redundancy failover]
on-loss-of-vm-host-connection;

If the LCMD process is crashing on the primary, the system switches over after one minute, provided the backup RE LCMD connection is stable. The system does not switch over if the backup RE LCMD connection is unstable or if the current primary has just gained the primary role. In the latter case, the switchover happens only after four minutes.

On Detection of a Loss of Keepalive Signal from the Primary Routing Engine

After you configure a backup Routing Engine, you can direct it to take primary role automatically if it detects a loss of keepalive signal from the primary Routing Engine.

To enable failover on receiving a loss of keepalive signal, include the on-loss-of-keepalives statement at the [edit chassis redundancy failover] hierarchy level:

[edit chassis redundancy failover]
on-loss-of-keepalives;

Note:

The on-loss-of-keepalives statement at the [edit chassis redundancy failover] hierarchy level is not supported on PTX platforms running Junos Evolved. These platforms default to a switchover when keepalive messages are not detected.

When graceful Routing Engine switchover is not configured, by default, failover occurs after 300 seconds (5 minutes). You can configure a shorter or longer time interval.

Note:

The keepalive time period is reset to 360 seconds when the primary Routing Engine has been manually rebooted or halted.

To change the keepalive time period, include the keepalive-time statement at the [edit chassis redundancy] hierarchy level:

[edit chassis redundancy]
keepalive-time seconds;

The range for keepalive-time is 2 through 10,000 seconds.

The following example describes the sequence of events if you configure the backup Routing Engine to detect a loss of keepalive signal in the primary Routing Engine:

  1. Manually configure a keepalive-time of 25 seconds.

  2. After the Packet Forwarding Engine connection to the primary Routing Engine is lost and the keepalive timer expires, packet forwarding is interrupted.

  3. After 25 seconds of keepalive loss, a message is logged, and the backup Routing Engine attempts to take primary role. An alarm is generated when the backup Routing Engine becomes active, and the display is updated with the current status of the Routing Engine.

  4. After the backup Routing Engine takes primary role, it continues to function as primary.
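
The configuration behind step 1 of this sequence might look like the following sketch. It assumes graceful Routing Engine switchover is not configured, and it combines the keepalive-time and on-loss-of-keepalives statements described above. The 25-second value is the one used in the example:

[edit chassis redundancy]
keepalive-time 25;
failover {
    on-loss-of-keepalives;
}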

Note:

When graceful Routing Engine switchover is configured, the keepalive signal is automatically enabled and the failover time is set to 2 seconds (4 seconds on M20 routers). You cannot manually reset the keepalive time.

Note:

When you halt or reboot the primary Routing Engine, Junos OS resets the keepalive time to 360 seconds, and the backup Routing Engine does not take over primary role until the 360-second keepalive time period expires.

A former primary Routing Engine becomes a backup Routing Engine if it returns to service after a failover to the backup Routing Engine. To restore primary status to the former primary Routing Engine, you can use the request chassis routing-engine master switch operational mode command.

If at any time one of the Routing Engines is not present, the remaining Routing Engine becomes primary automatically, regardless of how redundancy is configured.

On Detection of the em0 Interface Failure on the Primary Routing Engine

After you configure a backup Routing Engine, you can direct it to take the primary role automatically if the em0 interface fails on the primary Routing Engine. To enable this feature, include the on-re-to-fpc-stale statement at the [edit chassis redundancy failover] hierarchy level.

[edit chassis redundancy failover]
on-re-to-fpc-stale;

When a Software Process Fails

To configure automatic switchover to the backup Routing Engine if a software process fails, include the failover other-routing-engine statement at the [edit system processes process-name] hierarchy level:

[edit system processes process-name]
failover other-routing-engine;

process-name is one of the valid process names. If this statement is configured for a process, and that process fails four times within 30 seconds, the router reboots from the other Routing Engine. Another statement available at the [edit system processes] hierarchy level is failover alternate-media. For information about the alternate media option, see the Junos OS Administration Library for Routing Devices.
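
For example, a configuration along these lines would request a switchover when the routing protocol process fails repeatedly. It assumes that routing is among the valid process names on your platform; substitute the process you want to protect:

[edit system processes routing]
failover other-routing-engine;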

Manually Switching Routing Engine Primary Role

To manually switch Routing Engine primary role, use one of the following commands:

  • On the backup Routing Engine, request that the backup Routing Engine take primary role by issuing the request chassis routing-engine master acquire command.

  • On the primary Routing Engine, request that the backup Routing Engine take primary role by using the request chassis routing-engine master release command.

  • On either Routing Engine, switch primary role by issuing the request chassis routing-engine master switch command.
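
A manual switchover from the CLI might look like the following sketch. The show chassis routing-engine command is used here only to check which Routing Engine holds the primary role before and after the switch. Output is omitted because it varies by platform, and the CLI may ask you to confirm the switch:

user@host> show chassis routing-engine
user@host> request chassis routing-engine master switch
user@host> show chassis routing-engine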

Verifying Routing Engine Redundancy Status

A separate log file is provided for redundancy logging at /var/log/mastership. To view the log, use the file show /var/log/mastership command. Table 1 lists the primary role log event codes and descriptions.

Table 1: Routing Engine Primary Role Log

Event Code    Description
E_NULL = 0    The event is a null event.
E_CFG_M       The Routing Engine is configured as primary.
E_CFG_B       The Routing Engine is configured as backup.
E_CFG_D       The Routing Engine is configured as disabled.
E_MAXTRY      The maximum number of tries to acquire or release primary role was exceeded.
E_REQ_C       A claim primary role request was sent.
E_ACK_C       A claim primary role acknowledgement was received.
E_NAK_C       A claim primary role request was not acknowledged.
E_REQ_Y       Confirmation of primary role is requested.
E_ACK_Y       Primary Role is acknowledged.
E_NAK_Y       Primary Role is not acknowledged.
E_REQ_G       A release primary role request was sent by a Routing Engine.
E_ACK_G       The Routing Engine acknowledged release of primary role.
E_CMD_A       The command request chassis routing-engine master acquire was issued from the backup Routing Engine.
E_CMD_F       The command request chassis routing-engine master acquire force was issued from the backup Routing Engine.
E_CMD_R       The command request chassis routing-engine master release was issued from the primary Routing Engine.
E_CMD_S       The command request chassis routing-engine master switch was issued from a Routing Engine.
E_NO_ORE      No other Routing Engine is detected.
E_TMOUT       A request timed out.
E_NO_IPC      Routing Engine connection was lost.
E_ORE_M       Other Routing Engine state was changed to primary.
E_ORE_B       Other Routing Engine state was changed to backup.
E_ORE_D       Other Routing Engine state was changed to disabled.
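
To look at only the most recent entries in the log, you can pipe the command through the standard CLI last filter. This is a sketch; the line count of 20 is arbitrary, and the log contents depend on the events recorded on your router:

user@host> file show /var/log/mastership | last 20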

Check Overall CPU and Memory Usage

Purpose

The show system processes extensive command displays exhaustive information about software processes that are running on the router and have controlling terminals. It is equivalent to the UNIX top command. However, the UNIX top command shows real-time memory usage, with the memory values constantly changing, while the show system processes extensive command provides a snapshot of memory usage at a given moment.

Action

To check overall CPU and memory usage, enter the following Junos OS command-line interface (CLI) command:

user@host> show system processes extensive

Sample Output

user@R1> show system processes extensive
last pid:  5251;  load averages:  0.00,  0.00,  0.00  up 4+20:22:16    10:44:41
58 processes:  1 running, 57 sleeping
Mem: 57M Active, 54M Inact, 17M Wired, 184K Cache, 35M Buf, 118M Free
Swap:  512M Total, 512M Free
  PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
 4480 root       2   0  3728K  1908K select 231:17  2.34%  2.34% chassisd
 4500 root       2   0  1896K   952K select   0:36  0.00%  0.00% fud
 4505 root       2   0  1380K   736K select   0:35  0.00%  0.00% irsd
 4481 root       2   0  1864K   872K select   0:32  0.00%  0.00% alarmd
 4488 root       2   0  8464K  4600K kqread   0:28  0.00%  0.00% rpd
 4501 root       2 -15  1560K   968K select   0:21  0.00%  0.00% ppmd
 4510 root       2   0  1372K   812K select   0:13  0.00%  0.00% bfdd
    5 root      18   0     0K     0K syncer   0:09  0.00%  0.00% syncer
 4485 root       2   0  3056K  1776K select   0:07  0.00%  0.00% snmpd
 4499 root       2   0  3688K  1676K select   0:05  0.00%  0.00% kmd
 4486 root       2   0  3760K  1748K select   0:05  0.00%  0.00% mib2d
 4493 root       2   0  1872K   928K select   0:03  0.00%  0.00% pfed
 4507 root       2   0  1984K  1052K select   0:02  0.00%  0.00% fsad
 4518 root       2   0  3780K  2400K select   0:02  0.00%  0.00% dcd
    8 root     -18   0     0K     0K psleep   0:02  0.00%  0.00% vmuncachedaemo
    4 root     -18   0     0K     0K psleep   0:02  0.00%  0.00% bufdaemon
 4690 root       2   0     0K     0K peer_s   0:01  0.00%  0.00% peer proxy
 4504 root       2   0  1836K   968K select   0:01  0.00%  0.00% dfwd
 4477 root       2   0   992K   320K select   0:01  0.00%  0.00% watchdog
 4354 root       2   0  1116K   604K select   0:01  0.00%  0.00% syslogd
 4492 root      10   0  1004K   400K nanslp   0:01  0.00%  0.00% tnp.sntpd
 4446 root      10   0  1108K   616K nanslp   0:01  0.00%  0.00% cron
 4484 root       2   0 15716K  7468K select   0:01  0.00%  0.00% mgd
 4494 root       2  15  2936K  2036K select   0:01  0.00%  0.00% sampled
 5245 remote     2   0  8340K  3472K select   0:01  0.00%  0.00% cli
    2 root     -18   0     0K     0K psleep   0:00  0.00%  0.00% pagedaemon
 4512 root       2   0  2840K  1400K select   0:00  0.00%  0.00% l2tpd
    1 root      10   0   852K   580K wait     0:00  0.00%  0.00% init
 5244 root       2   0  1376K   784K select   0:00  0.00%  0.00% telnetd
 4509 root      10   0  1060K   528K nanslp   0:00  0.00%  0.00% eccd
 4508 root       2   0  2264K  1108K select   0:00  0.00%  0.00% spd
 2339 root      10   0   514M 17260K mfsidl   0:00  0.00%  0.00% newfs
 4497 root       2   0  2432K  1152K select   0:00  0.00%  0.00% cosd
 4490 root       2 -15  2356K  1020K select   0:00  0.00%  0.00% apsd
 4496 root       2   0  2428K  1108K select   0:00  0.00%  0.00% rmopd
 4491 root       2   0  2436K  1104K select   0:00  0.00%  0.00% vrrpd
 4487 root       2   0 15756K  7648K sbwait   0:00  0.00%  0.00% mgd
 5246 root       2   0 15776K  8336K select   0:00  0.00%  0.00% mgd
    0 root     -18   0     0K     0K sched    0:00  0.00%  0.00% swapper
 5251 root      30   0 21732K   840K RUN      0:00  0.00%  0.00% top
 4511 root       2   0  1964K   908K select   0:00  0.00%  0.00% pgmd
 4502 root       2   0  1960K   956K select   0:00  0.00%  0.00% lmpd
 4495 root       2   0  1884K   876K select   0:00  0.00%  0.00% ilmid
 4482 root       2   0  1772K   776K select   0:00  0.00%  0.00% craftd
 4503 root      10   0  1040K   492K nanslp   0:00  0.00%  0.00% smartd
    6 root      28   0     0K     0K sleep    0:00  0.00%  0.00% netdaemon
 4498 root       2   0  1736K   932K select   0:00  0.00%  0.00% nasd
 4506 root       2   0  1348K   672K select   0:00  0.00%  0.00% rtspd
 4489 root       2   0  1160K   668K select   0:00  0.00%  0.00% inetd
 4478 root       2   0  1108K   608K select   0:00  0.00%  0.00% tnetd
 4483 root       2   0  1296K   540K select   0:00  0.00%  0.00% ntpd
 4514 root       3   0  1080K   540K ttyin    0:00  0.00%  0.00% getty
 4331 root       2   0   416K   232K select   0:00  0.00%  0.00% pccardd
    7 root       2   0     0K     0K pfeacc   0:00  0.00%  0.00% if_pfe_listen
   11 root       2   0     0K     0K picacc   0:00  0.00%  0.00% if_pic_listen
    3 root      18   0     0K     0K psleep   0:00  0.00%  0.00% vmdaemon
    9 root       2   0     0K     0K scs_ho   0:00  0.00%  0.00% scs_housekeepi
   10 root       2   0     0K     0K cb-pol   0:00  0.00%  0.00% cb_poll

Meaning

The sample output shows the amount of virtual memory used by the Routing Engine and software processes. For example, 118 MB of physical memory is free and 512 MB of the swap file is free, indicating that the router is not short of memory. The processes field shows that most of the 58 processes are in the sleeping state, with 1 in the running state. The process or command that is running is the top command.

The COMMAND column lists the processes that are currently running. For example, the chassis process (chassisd) has a process identifier (PID) of 4480, with a current priority (PRI) of 2. A lower priority number indicates a higher priority.

The processes are listed according to level of activity, with the most active process at the top of the output. For example, the chassis (chassisd) process is consuming the largest amount of CPU resource at 2.34 percent.

The memory field (Mem) shows the virtual memory managed by the Routing Engine and used by processes. The value in the memory field is in KB and MB, and is broken down as follows:

  • Active—Memory that is allocated and actually in use by programs.

  • Inact—Memory that is either allocated but not recently used or memory that was freed by programs. Inactive memory is still mapped in the address space of one or more processes and, therefore, counts toward the resident set size of those processes.

  • Wired—Memory that is not eligible to be swapped, and is usually used for Routing Engine memory structures or memory physically locked by a process.

  • Cache—Memory that is not associated with any program and does not need to be swapped before being reused.

  • Buf—The size of the memory buffer used to hold data recently called from disk.

  • Free—Memory that is not associated with any programs. Memory freed by a process can become Inactive, Cache, or Free, depending on the method used by the process to free the memory.

When the system is under memory pressure, the pageout process reuses memory from the free, cache, inactive and, if necessary, active pages.

The Swap field shows the total swap space available and how much is unused. In the example, the output shows 512 MB of total swap space and 512 MB of free swap space.

Finally, the memory usage of each process is listed. The SIZE field indicates the size of the virtual address space, and the RES field indicates the amount of the program in physical memory, which is also known as RSS or Resident Set Size. In the sample output, the chassis (chassisd) process has 3728 KB of virtual address space and 1908 KB of physical memory.
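
To focus on a single process rather than the full listing, you can filter the output with the standard CLI match pipe. The sketch below keeps only lines that contain PID or chassisd, which includes the column header. The process name is taken from the sample output above and is only an example:

user@host> show system processes extensive | match "PID|chassisd"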

Initial Routing Engine Configuration Example

You can use configuration groups to ensure that the correct IP addresses are used for each Routing Engine and to maintain a single configuration file for both Routing Engines.

The following example defines configuration groups re0 and re1 with separate IP addresses. These well-known configuration group names take effect only on the appropriate Routing Engine.

groups {
    re0 {
        system {
            host-name my-re0;
        }
        interfaces {
            fxp0 {
                description "10/100 Management interface";
                unit 0 {
                    family inet {
                        address 10.255.2.40/24;
                    }
                }
            }
        }
    }
    re1 {
        system {
            host-name my-re1;
        }
        interfaces {
            fxp0 {
                description "10/100 Management interface";
                unit 0 {
                    family inet {
                        address 10.255.2.41/24;
                    }
                }
            }
        }
    }
}
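
Defining the groups does not by itself apply them. As a sketch of the usual companion statement, assuming the group names re0 and re1 from the example above, the groups are inherited into the configuration with apply-groups at the top level:

[edit]
apply-groups [ re0 re1 ];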

You can assign an additional IP address to the management Ethernet interface (fxp0 in this example) on both Routing Engines. The assigned address uses the master-only keyword and is identical for both Routing Engines, ensuring that the IP address for the primary Routing Engine can be accessed at any time. The address is active only on the primary Routing Engine's management Ethernet interface. During a Routing Engine switchover, the address moves over to the new primary Routing Engine.

For example, on re0, the configuration is:

[edit groups re0 interfaces fxp0]
unit 0 {
    family inet {
        address 10.17.40.131/25 {
            master-only;
        }
        address 10.17.40.132/25;
    }
}

On re1, the configuration is:

[edit groups re1 interfaces fxp0]
unit 0 {
    family inet {
        address 10.17.40.131/25 {
            master-only;
        }
        address 10.17.40.133/25;
    }
}

For more information about the initial configuration of dual Routing Engines, see the Junos OS Software Installation and Upgrade Guide. For more information about assigning an additional IP address to the management Ethernet interface with the master-only keyword on both Routing Engines, see the Junos OS CLI User Guide.

Copying a Configuration File from One Routing Engine to the Other

You can use either the console port or the management Ethernet port to establish connectivity between the two Routing Engines. You can then copy or use FTP to transfer the configuration from the primary to the backup, and load the file and commit it in the normal way.

To connect to the other Routing Engine using the management Ethernet port, issue the following command:

user@host> request routing-engine login (other-routing-engine | re0 | re1)

On a TX Matrix router, to make connections to the other Routing Engine using the management Ethernet port, issue the following command:

user@host> request routing-engine login (backup | lcc number | master | other-routing-engine | re0 | re1)

For more information about the request routing-engine login command, see the CLI Explorer.

To copy a configuration file from one Routing Engine to the other, issue the file copy command:

user@host> file copy source destination

In this case, source is the name of the configuration file. These files are stored in the directory /config. The active configuration is /config/juniper.conf, and older configurations are in /config/juniper.conf {1...9}. The destination is a file on the other Routing Engine.

The following example copies a configuration file from Routing Engine 0 to Routing Engine 1:

user@host> file copy /config/juniper.conf re1:/var/tmp/copied-juniper.conf

The following example copies a configuration file from Routing Engine 0 to Routing Engine 1 on a TX Matrix router:

user@host> file copy /config/juniper.conf scc-re1:/var/tmp/copied-juniper.conf

To load the configuration file, enter the load replace command at the [edit] hierarchy level:

[edit]
user@host# load replace /var/tmp/copied-juniper.conf

CAUTION:

Make sure you change any IP addresses specified in the management Ethernet interface configuration on Routing Engine 0 to addresses appropriate for Routing Engine 1.
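
Putting the steps in this section together, a session might look like the following sketch. It assumes you start on Routing Engine 0, that the file path /var/tmp/copied-juniper.conf from the earlier example is used, and that user@host stands in for the prompt on each Routing Engine. Adjust the management addresses as described in the caution above before you commit:

user@host> file copy /config/juniper.conf re1:/var/tmp/copied-juniper.conf
user@host> request routing-engine login re1
user@host> configure
[edit]
user@host# load replace /var/tmp/copied-juniper.conf
[edit]
user@host# commit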

Loading a Software Package from the Other Routing Engine

You can load a package from the other Routing Engine onto the local Routing Engine using the existing request system software add package-name command:

user@host> request system software add re(0|1):/filename

In the re portion of the URL, specify the number of the other Routing Engine. In the filename portion of the URL, specify the path to the package. Packages are typically in the directory /var/sw/pkg.
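
For example, to install a package stored on Routing Engine 1 while logged in to Routing Engine 0, the command might look like the following. Here package-name is a placeholder for the actual package file, not a real filename:

user@host> request system software add re1:/var/sw/pkg/package-name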
