Configuring Routing Engine Redundancy
Follow the steps and examples below to configure routing engine redundancy.
To complete the tasks in the following sections, re0 and re1 configuration groups must be defined. For more information about configuration groups, see the Junos OS CLI User Guide.
Modifying the Default Routing Engine Primary Role
For routers with two Routing Engines, you can configure which Routing Engine is the primary and which is the backup. By default, the Routing Engine in slot 0 is the primary (re0) and the one in slot 1 is the backup (re1).
In systems with two Routing Engines, you cannot configure both Routing Engines as primary at the same time. Such a configuration causes the commit check to fail.
To modify the default configuration, include the routing-engine statement at the [edit chassis redundancy] hierarchy level:
[edit chassis redundancy]
routing-engine slot-number (master | backup | disabled);
slot-number can be 0 or 1. To configure the Routing Engine to be the primary, specify the master option. To configure it to be the backup, specify the backup option. To disable a Routing Engine, specify the disabled option.
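As a rough illustration of the rules above, the following Python sketch validates a slot-to-role mapping and renders it as set-format commands. The function name redundancy_set_commands and the dictionary layout are hypothetical. Only the statement syntax, the slot range, and the restriction on two primaries come from the text.

```python
VALID_ROLES = {"master", "backup", "disabled"}


def redundancy_set_commands(roles):
    """Return set-format commands for a {slot: role} mapping (hypothetical helper).

    Raises ValueError for an unknown slot or role, and for the one
    combination the text says fails the commit check: both slots as master.
    """
    for slot, role in roles.items():
        if slot not in (0, 1):
            raise ValueError(f"slot-number must be 0 or 1, got {slot!r}")
        if role not in VALID_ROLES:
            raise ValueError(f"unknown role {role!r}")
    if roles.get(0) == "master" and roles.get(1) == "master":
        raise ValueError("both Routing Engines cannot be primary at the same time")
    return [
        f"set chassis redundancy routing-engine {slot} {role}"
        for slot, role in sorted(roles.items())
    ]


# Swap the default roles: slot 1 becomes primary, slot 0 becomes backup.
for command in redundancy_set_commands({0: "backup", 1: "master"}):
    print(command)
```

The check runs before any command is produced, which mirrors how a failed commit check leaves the active configuration unchanged.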
To switch between the primary and the backup Routing Engines, see Manually Switching Routing Engine Primary Role.
Configuring Automatic Failover to the Backup Routing Engine
The following sections describe how to configure automatic failover to the backup Routing Engine when certain failures occur on the primary Routing Engine.
- Without Interruption to Packet Forwarding
- On Detection of a Hard Disk Error on the Primary Routing Engine
- On Detection of a Broken LCMD Connectivity Between the VM and RE
- On Detection of a Loss of Keepalive Signal from the Primary Routing Engine
- On Detection of the em0 Interface Failure on the Primary Routing Engine
- When a Software Process Fails
Without Interruption to Packet Forwarding
For routers with two Routing Engines, you can configure graceful Routing Engine switchover (GRES). When graceful switchover is configured, socket reconnection occurs seamlessly without interruption to packet forwarding. For information about how to configure graceful Routing Engine switchover, see Configuring Graceful Routing Engine Switchover.
On Detection of a Hard Disk Error on the Primary Routing Engine
After you configure a backup Routing Engine, you can direct it to take primary role automatically if it detects a hard disk error from the primary Routing Engine. To enable this feature, include the on-disk-failure statement at the [edit chassis redundancy failover] hierarchy level:
[edit chassis redundancy failover]
on-disk-failure;
The on-disk-failure statement at the [edit chassis redundancy failover] hierarchy level is not supported on PTX platforms running Junos Evolved. These platforms switch over by default when a disk failure is detected.
On Detection of a Broken LCMD Connectivity Between the VM and RE
You can configure an automatic Routing Engine switchover that is triggered when the LCMD connectivity between the VM and the Routing Engine is broken. To enable this feature, include the on-loss-of-vm-host-connection statement at the [edit chassis redundancy failover] hierarchy level:
[edit chassis redundancy failover]
on-loss-of-vm-host-connection;
If the LCMD process is crashing on the primary Routing Engine, the system switches over after one minute, provided that the LCMD connection on the backup Routing Engine is stable. The system does not switch over if the LCMD connection on the backup Routing Engine is unstable. It also does not switch over immediately if the current primary has just gained primary role. In that case, the switchover happens only after four minutes.
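The timing rules in the preceding paragraph can be summarized in a few lines of Python. This is only one reading of the text, not a description of how Junos OS implements the check. The function name and both parameters are hypothetical.

```python
from typing import Optional


def lcmd_switchover_delay(backup_lcmd_stable: bool,
                          primary_just_acquired: bool) -> Optional[int]:
    """Return the delay in seconds before switchover, or None if none occurs.

    Assumed reading of the text:
      - unstable LCMD connection on the backup -> no switchover
      - primary has just gained primary role   -> switchover after 4 minutes
      - otherwise                              -> switchover after 1 minute
    """
    if not backup_lcmd_stable:
        return None
    if primary_just_acquired:
        return 4 * 60
    return 60


print(lcmd_switchover_delay(backup_lcmd_stable=True, primary_just_acquired=False))
```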
On Detection of a Loss of Keepalive Signal from the Primary Routing Engine
After you configure a backup Routing Engine, you can direct it to take primary role automatically if it detects a loss of keepalive signal from the primary Routing Engine.
To enable failover on receiving a loss of keepalive signal, include the on-loss-of-keepalives statement at the [edit chassis redundancy failover] hierarchy level:
[edit chassis redundancy failover]
on-loss-of-keepalives;
The on-loss-of-keepalives statement at the [edit chassis redundancy failover] hierarchy level is not supported on PTX platforms running Junos Evolved. These platforms switch over by default when keepalive messages are not detected.
When graceful Routing Engine switchover is not configured, by default, failover occurs after 300 seconds (5 minutes). You can configure a shorter or longer time interval.
The keepalive time period is reset to 360 seconds when the primary Routing Engine has been manually rebooted or halted.
To change the keepalive time period, include the keepalive-time statement at the [edit chassis redundancy] hierarchy level:
[edit chassis redundancy]
keepalive-time seconds;
The range for keepalive-time is 2 through 10,000 seconds.
The following example describes the sequence of events if you configure the backup Routing Engine to detect a loss of keepalive signal in the primary Routing Engine:
- Manually configure a keepalive-time of 25 seconds.
- After the Packet Forwarding Engine connection to the primary Routing Engine is lost and the keepalive timer expires, packet forwarding is interrupted.
- After 25 seconds of keepalive loss, a message is logged, and the backup Routing Engine attempts to take primary role. An alarm is generated when the backup Routing Engine becomes active, and the display is updated with the current status of the Routing Engine.
- After the backup Routing Engine takes primary role, it continues to function as primary.
When graceful Routing Engine switchover is configured, the keepalive signal is automatically enabled and the failover time is set to 2 seconds (4 seconds on M20 routers). You cannot manually reset the keepalive time.
When you halt or reboot the primary Routing Engine, Junos OS resets the keepalive time to 360 seconds, and the backup Routing Engine does not take over primary role until the 360-second keepalive time period expires.
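The keepalive values described in this section interact in a way that is easy to misread, so the following Python sketch collects them in one place. It is a model of the documented numbers only. The function name and parameters are hypothetical, and the precedence of the 360-second reset over the graceful Routing Engine switchover value is an assumption that the text does not state.

```python
from typing import Optional

DEFAULT_KEEPALIVE = 300        # seconds, without graceful Routing Engine switchover
RESET_KEEPALIVE = 360          # seconds, after the primary is halted or rebooted
KEEPALIVE_RANGE = (2, 10_000)  # valid range for keepalive-time


def effective_keepalive_time(gres_enabled: bool,
                             configured: Optional[int] = None,
                             primary_halted_or_rebooted: bool = False,
                             is_m20: bool = False) -> int:
    """Return the keepalive time in seconds that the backup waits before failover."""
    if primary_halted_or_rebooted:
        # Assumption: the reset applies regardless of other settings.
        return RESET_KEEPALIVE
    if gres_enabled:
        # The text says the value cannot be set manually in this case,
        # so any configured value is ignored here.
        return 4 if is_m20 else 2
    if configured is None:
        return DEFAULT_KEEPALIVE
    low, high = KEEPALIVE_RANGE
    if not low <= configured <= high:
        raise ValueError(f"keepalive-time must be {low} through {high} seconds")
    return configured


print(effective_keepalive_time(gres_enabled=False, configured=25))
```

With the 25-second example from this section, the function returns 25. After a manual halt or reboot of the primary it returns 360 instead.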
A former primary Routing Engine becomes a backup Routing Engine if it returns to service after a failover to the backup Routing Engine. To restore primary status to the former primary Routing Engine, you can use the request chassis routing-engine master switch operational mode command.
If at any time one of the Routing Engines is not present, the remaining Routing Engine becomes primary automatically, regardless of how redundancy is configured.
On Detection of the em0 Interface Failure on the Primary Routing Engine
After you configure a backup Routing Engine, you can direct it to take primary role automatically if the em0 interface fails on the primary Routing Engine. To enable this feature, include the on-re-to-fpc-stale statement at the [edit chassis redundancy failover] hierarchy level:
[edit chassis redundancy failover]
on-re-to-fpc-stale;
When a Software Process Fails
To configure automatic switchover to the backup Routing Engine if a software process fails, include the failover other-routing-engine statement at the [edit system processes process-name] hierarchy level:
[edit system processes process-name]
failover other-routing-engine;
process-name is one of the valid process names. If this statement is configured for a process, and that process fails four times within 30 seconds, the router reboots from the other Routing Engine. Another statement available at the [edit system processes] hierarchy level is failover alternate-media. For information about the alternate media option, see the Junos OS Administration Library for Routing Devices.
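The "four times within 30 seconds" condition is a sliding-window count. The Python sketch below shows one way to evaluate it over a list of failure timestamps. The function name is hypothetical, and treating a span of exactly 30 seconds as inside the window is an assumption.

```python
from collections import deque


def should_fail_over(failure_times, threshold=4, window=30.0):
    """Return True if `threshold` failures fall within any `window`-second span.

    failure_times: failure timestamps in seconds, in any order.
    """
    recent = deque()
    for t in sorted(failure_times):
        recent.append(t)
        # Drop failures that are older than the window relative to this one.
        while t - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False


print(should_fail_over([0, 5, 10, 15]))    # four failures inside 15 seconds
print(should_fail_over([0, 20, 40, 60]))   # never four inside 30 seconds
```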
Manually Switching Routing Engine Primary Role
To manually switch Routing Engine primary role, use one of the following commands:
- On the backup Routing Engine, request that the backup Routing Engine take primary role by issuing the request chassis routing-engine master acquire command.
- On the primary Routing Engine, request that the backup Routing Engine take primary role by using the request chassis routing-engine master release command.
- On either Routing Engine, switch primary role by issuing the request chassis routing-engine master switch command.
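For scripted operations, the choice among the three commands above reduces to a lookup on the role of the Routing Engine you are logged in to. The helper below is a hypothetical Python sketch of that lookup. It only builds the command string and does not run anything on a device.

```python
def mastership_command(local_role=None):
    """Return the operational command that moves primary role.

    local_role: "backup", "primary", or None when the role is not known.
    """
    base = "request chassis routing-engine master"
    if local_role == "backup":
        return f"{base} acquire"   # issued on the backup Routing Engine
    if local_role == "primary":
        return f"{base} release"   # issued on the primary Routing Engine
    return f"{base} switch"        # valid on either Routing Engine


print(mastership_command("backup"))
```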
Verifying Routing Engine Redundancy Status
A separate log file is provided for redundancy logging at /var/log/mastership. To view the log, use the file show /var/log/mastership command. Table 1 lists the primary role log event codes and descriptions.
Event Code | Description
---|---
E_NULL = 0 | The event is a null event.
E_CFG_M | The Routing Engine is configured as primary.
E_CFG_B | The Routing Engine is configured as backup.
E_CFG_D | The Routing Engine is configured as disabled.
E_MAXTRY | The maximum number of tries to acquire or release primary role was exceeded.
E_REQ_C | A claim primary role request was sent.
E_ACK_C | A claim primary role acknowledgement was received.
E_NAK_C | A claim primary role request was not acknowledged.
E_REQ_Y | Confirmation of primary role is requested.
E_ACK_Y | Primary role is acknowledged.
E_NAK_Y | Primary role is not acknowledged.
E_REQ_G | A release primary role request was sent by a Routing Engine.
E_ACK_G | The Routing Engine acknowledged release of primary role.
E_CMD_A | The command request chassis routing-engine master acquire was issued from the backup Routing Engine.
E_CMD_F | The command request chassis routing-engine master acquire force was issued from the backup Routing Engine.
E_CMD_R | The command request chassis routing-engine master release was issued from the primary Routing Engine.
E_CMD_S | The command request chassis routing-engine master switch was issued from a Routing Engine.
E_NO_ORE | No other Routing Engine is detected.
E_TMOUT | A request timed out.
E_NO_IPC | Routing Engine connection was lost.
E_ORE_M | Other Routing Engine state was changed to primary.
E_ORE_B | Other Routing Engine state was changed to backup.
E_ORE_D | Other Routing Engine state was changed to disabled.
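When reading a saved copy of the log, it can help to annotate the event codes automatically. The Python sketch below looks up any code from Table 1 that appears in a piece of text. The layout of lines in /var/log/mastership is not described here, so the function makes no assumption about it and only scans for the code tokens. The function name and the sample string are hypothetical.

```python
import re

EVENT_CODES = {
    "E_NULL": "The event is a null event.",
    "E_CFG_M": "The Routing Engine is configured as primary.",
    "E_CFG_B": "The Routing Engine is configured as backup.",
    "E_CFG_D": "The Routing Engine is configured as disabled.",
    "E_MAXTRY": "The maximum number of tries to acquire or release primary role was exceeded.",
    "E_REQ_C": "A claim primary role request was sent.",
    "E_ACK_C": "A claim primary role acknowledgement was received.",
    "E_NAK_C": "A claim primary role request was not acknowledged.",
    "E_REQ_Y": "Confirmation of primary role is requested.",
    "E_ACK_Y": "Primary role is acknowledged.",
    "E_NAK_Y": "Primary role is not acknowledged.",
    "E_REQ_G": "A release primary role request was sent by a Routing Engine.",
    "E_ACK_G": "The Routing Engine acknowledged release of primary role.",
    "E_CMD_A": "master acquire was issued from the backup Routing Engine.",
    "E_CMD_F": "master acquire force was issued from the backup Routing Engine.",
    "E_CMD_R": "master release was issued from the primary Routing Engine.",
    "E_CMD_S": "master switch was issued from a Routing Engine.",
    "E_NO_ORE": "No other Routing Engine is detected.",
    "E_TMOUT": "A request timed out.",
    "E_NO_IPC": "Routing Engine connection was lost.",
    "E_ORE_M": "Other Routing Engine state was changed to primary.",
    "E_ORE_B": "Other Routing Engine state was changed to backup.",
    "E_ORE_D": "Other Routing Engine state was changed to disabled.",
}


def describe_events(text):
    """Return (code, description) pairs for each known event code found in text."""
    found = re.findall(r"\bE_[A-Z_]+\b", text)
    return [(code, EVENT_CODES[code]) for code in found if code in EVENT_CODES]


# Hypothetical input; real log lines may be formatted differently.
for code, meaning in describe_events("... E_CMD_S ... E_ORE_M ..."):
    print(code, "-", meaning)
```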
Check Overall CPU and Memory Usage
Purpose
You can display exhaustive system process information about software processes that are running on the router and have controlling terminals. The show system processes extensive command is equivalent to the UNIX top command. However, the UNIX top command shows real-time memory usage, with the memory values constantly changing, while the show system processes extensive command provides a snapshot of memory usage at a given moment.
Action
To check overall CPU and memory usage, enter the following Junos OS command-line interface (CLI) command:
user@host> show system processes extensive
Sample Output
user@R1> show system processes extensive
last pid: 5251; load averages: 0.00, 0.00, 0.00 up 4+20:22:16 10:44:41
58 processes: 1 running, 57 sleeping
Mem: 57M Active, 54M Inact, 17M Wired, 184K Cache, 35M Buf, 118M Free
Swap: 512M Total, 512M Free

  PID USERNAME PRI NICE    SIZE     RES STATE     TIME   WCPU    CPU COMMAND
 4480 root       2    0   3728K   1908K select  231:17  2.34%  2.34% chassisd
 4500 root       2    0   1896K    952K select    0:36  0.00%  0.00% fud
 4505 root       2    0   1380K    736K select    0:35  0.00%  0.00% irsd
 4481 root       2    0   1864K    872K select    0:32  0.00%  0.00% alarmd
 4488 root       2    0   8464K   4600K kqread    0:28  0.00%  0.00% rpd
 4501 root       2  -15   1560K    968K select    0:21  0.00%  0.00% ppmd
 4510 root       2    0   1372K    812K select    0:13  0.00%  0.00% bfdd
    5 root      18    0      0K      0K syncer    0:09  0.00%  0.00% syncer
 4485 root       2    0   3056K   1776K select    0:07  0.00%  0.00% snmpd
 4499 root       2    0   3688K   1676K select    0:05  0.00%  0.00% kmd
 4486 root       2    0   3760K   1748K select    0:05  0.00%  0.00% mib2d
 4493 root       2    0   1872K    928K select    0:03  0.00%  0.00% pfed
 4507 root       2    0   1984K   1052K select    0:02  0.00%  0.00% fsad
 4518 root       2    0   3780K   2400K select    0:02  0.00%  0.00% dcd
    8 root     -18    0      0K      0K psleep    0:02  0.00%  0.00% vmuncachedaemo
    4 root     -18    0      0K      0K psleep    0:02  0.00%  0.00% bufdaemon
 4690 root       2    0      0K      0K peer_s    0:01  0.00%  0.00% peer proxy
 4504 root       2    0   1836K    968K select    0:01  0.00%  0.00% dfwd
 4477 root       2    0    992K    320K select    0:01  0.00%  0.00% watchdog
 4354 root       2    0   1116K    604K select    0:01  0.00%  0.00% syslogd
 4492 root      10    0   1004K    400K nanslp    0:01  0.00%  0.00% tnp.sntpd
 4446 root      10    0   1108K    616K nanslp    0:01  0.00%  0.00% cron
 4484 root       2    0  15716K   7468K select    0:01  0.00%  0.00% mgd
 4494 root       2   15   2936K   2036K select    0:01  0.00%  0.00% sampled
 5245 remote     2    0   8340K   3472K select    0:01  0.00%  0.00% cli
    2 root     -18    0      0K      0K psleep    0:00  0.00%  0.00% pagedaemon
 4512 root       2    0   2840K   1400K select    0:00  0.00%  0.00% l2tpd
    1 root      10    0    852K    580K wait      0:00  0.00%  0.00% init
 5244 root       2    0   1376K    784K select    0:00  0.00%  0.00% telnetd
 4509 root      10    0   1060K    528K nanslp    0:00  0.00%  0.00% eccd
 4508 root       2    0   2264K   1108K select    0:00  0.00%  0.00% spd
 2339 root      10    0    514M  17260K mfsidl    0:00  0.00%  0.00% newfs
 4497 root       2    0   2432K   1152K select    0:00  0.00%  0.00% cosd
 4490 root       2  -15   2356K   1020K select    0:00  0.00%  0.00% apsd
 4496 root       2    0   2428K   1108K select    0:00  0.00%  0.00% rmopd
 4491 root       2    0   2436K   1104K select    0:00  0.00%  0.00% vrrpd
 4487 root       2    0  15756K   7648K sbwait    0:00  0.00%  0.00% mgd
 5246 root       2    0  15776K   8336K select    0:00  0.00%  0.00% mgd
    0 root     -18    0      0K      0K sched     0:00  0.00%  0.00% swapper
 5251 root      30    0  21732K    840K RUN       0:00  0.00%  0.00% top
 4511 root       2    0   1964K    908K select    0:00  0.00%  0.00% pgmd
 4502 root       2    0   1960K    956K select    0:00  0.00%  0.00% lmpd
 4495 root       2    0   1884K    876K select    0:00  0.00%  0.00% ilmid
 4482 root       2    0   1772K    776K select    0:00  0.00%  0.00% craftd
 4503 root      10    0   1040K    492K nanslp    0:00  0.00%  0.00% smartd
    6 root      28    0      0K      0K sleep     0:00  0.00%  0.00% netdaemon
 4498 root       2    0   1736K    932K select    0:00  0.00%  0.00% nasd
 4506 root       2    0   1348K    672K select    0:00  0.00%  0.00% rtspd
 4489 root       2    0   1160K    668K select    0:00  0.00%  0.00% inetd
 4478 root       2    0   1108K    608K select    0:00  0.00%  0.00% tnetd
 4483 root       2    0   1296K    540K select    0:00  0.00%  0.00% ntpd
 4514 root       3    0   1080K    540K ttyin     0:00  0.00%  0.00% getty
 4331 root       2    0    416K    232K select    0:00  0.00%  0.00% pccardd
    7 root       2    0      0K      0K pfeacc    0:00  0.00%  0.00% if_pfe_listen
   11 root       2    0      0K      0K picacc    0:00  0.00%  0.00% if_pic_listen
    3 root      18    0      0K      0K psleep    0:00  0.00%  0.00% vmdaemon
    9 root       2    0      0K      0K scs_ho    0:00  0.00%  0.00% scs_housekeepi
   10 root       2    0      0K      0K cb-pol    0:00  0.00%  0.00% cb_poll
Meaning
The sample output shows the amount of virtual memory used by the Routing Engine and software processes. For example, 118 MB of physical memory is free and 512 MB of the swap file is free, indicating that the router is not short of memory. The processes field shows that most of the 58 processes are in the sleeping state, with 1 in the running state. The process or command that is running is the top command.
The COMMAND column lists the processes that are currently running. For example, the chassis process (chassisd) has a process identifier (PID) of 4480, with a current priority (PRI) of 2. A lower priority number indicates a higher priority.
The processes are listed according to level of activity, with the most active process at the top of the output. For example, the chassis (chassisd) process is consuming the largest amount of CPU resource at 2.34 percent.
The memory field (Mem) shows the virtual memory managed by the Routing Engine and used by processes. The value in the memory field is in KB and MB, and is broken down as follows:
- Active: Memory that is allocated and actually in use by programs.
- Inact: Memory that is either allocated but not recently used or memory that was freed by programs. Inactive memory is still mapped in the address space of one or more processes and, therefore, counts toward the resident set size of those processes.
- Wired: Memory that is not eligible to be swapped, and is usually used for Routing Engine memory structures or memory physically locked by a process.
- Cache: Memory that is not associated with any program and does not need to be swapped before being reused.
- Buf: The size of the memory buffer used to hold data recently called from disk.
- Free: Memory that is not associated with any programs. Memory freed by a process can become Inactive, Cache, or Free, depending on the method used by the process to free the memory.
When the system is under memory pressure, the pageout process reuses memory from the free, cache, inactive and, if necessary, active pages.
The Swap field shows the total swap space available and how much is unused. In the example, the output shows 512 MB of total swap space and 512 MB of free swap space.
Finally, the memory usage of each process is listed. The SIZE field indicates the size of the virtual address space, and the RES field indicates the amount of the program in physical memory, which is also known as RSS or Resident Set Size. In the sample output, the chassis (chassisd) process has 3728 KB of virtual address space and 1908 KB of physical memory.
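If you capture the command output to a file, the Mem line can be split into its fields for comparison over time. The following Python sketch parses the Mem line from the sample output into kilobytes. The function name is hypothetical, the G suffix is an assumption because the sample shows only K and M, and 1 MB is taken to be 1024 KB.

```python
import re

UNIT_TO_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def parse_mem_line(line):
    """Parse 'Mem: 57M Active, 54M Inact, ...' into {field name: kilobytes}."""
    fields = {}
    for amount, unit, name in re.findall(r"(\d+)([KMG])\s+(\w+)", line):
        fields[name] = int(amount) * UNIT_TO_KB[unit]
    return fields


sample = "Mem: 57M Active, 54M Inact, 17M Wired, 184K Cache, 35M Buf, 118M Free"
mem = parse_mem_line(sample)
print(mem["Free"] // 1024, "MB free")
```

The same function also handles the Swap line, because it uses the same "amount, unit, name" layout.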
Initial Routing Engine Configuration Example
You can use configuration groups to ensure that the correct IP addresses are used for each Routing Engine and to maintain a single configuration file for both Routing Engines.
The following example defines configuration groups re0 and re1 with separate IP addresses. These well-known configuration group names take effect only on the appropriate Routing Engine.
groups {
    re0 {
        system {
            host-name my-re0;
        }
        interfaces {
            fxp0 {
                description "10/100 Management interface";
                unit 0 {
                    family inet {
                        address 10.255.2.40/24;
                    }
                }
            }
        }
    }
    re1 {
        system {
            host-name my-re1;
        }
        interfaces {
            fxp0 {
                description "10/100 Management interface";
                unit 0 {
                    family inet {
                        address 10.255.2.41/24;
                    }
                }
            }
        }
    }
}
You can assign an additional IP address to the management Ethernet interface (fxp0 in this example) on both Routing Engines. The assigned address uses the master-only keyword and is identical for both Routing Engines, ensuring that the IP address for the primary Routing Engine can be accessed at any time. The address is active only on the primary Routing Engine's management Ethernet interface. During a Routing Engine switchover, the address moves over to the new primary Routing Engine.
For example, on re0, the configuration is:
[edit groups re0 interfaces fxp0]
unit 0 {
    family inet {
        address 10.17.40.131/25 {
            master-only;
        }
        address 10.17.40.132/25;
    }
}
On re1, the configuration is:
[edit groups re1 interfaces fxp0]
unit 0 {
    family inet {
        address 10.17.40.131/25 {
            master-only;
        }
        address 10.17.40.133/25;
    }
}
For more information about the initial configuration of dual Routing Engines, see the Junos OS Software Installation and Upgrade Guide. For more information about assigning an additional IP address to the management Ethernet interface with the master-only keyword on both Routing Engines, see the Junos OS CLI User Guide.
Copying a Configuration File from One Routing Engine to the Other
You can use either the console port or the management Ethernet port to establish connectivity between the two Routing Engines. You can then copy or use FTP to transfer the configuration from the primary to the backup, and load the file and commit it in the normal way.
To connect to the other Routing Engine using the management Ethernet port, issue the following command:
user@host> request routing-engine login (other-routing-engine | re0 | re1)
On a TX Matrix router, to make connections to the other Routing Engine using the management Ethernet port, issue the following command:
user@host> request routing-engine login (backup | lcc number | master | other-routing-engine | re0 | re1)
For more information about the request routing-engine login command, see the CLI Explorer.
To copy a configuration file from one Routing Engine to the other, issue the file copy command:
user@host> file copy source destination
In this case, source is the name of the configuration file. These files are stored in the directory /config. The active configuration is /config/juniper.conf, and older configurations are in /config/juniper.conf {1...9}. The destination is a file on the other Routing Engine.
The following example copies a configuration file from Routing Engine 0 to Routing Engine 1:
user@host> file copy /config/juniper.conf re1:/var/tmp/copied-juniper.conf
The following example copies a configuration file from Routing Engine 0 to Routing Engine 1 on a TX Matrix router:
user@host> file copy /config/juniper.conf scc-re1:/var/tmp/copied-juniper.conf
To load the configuration file, enter the load replace command at the [edit] hierarchy level:
[edit]
user@host# load replace /var/tmp/copied-juniper.conf
Make sure you change any IP addresses specified in the management Ethernet interface configuration on Routing Engine 0 to addresses appropriate for Routing Engine 1.
Loading a Software Package from the Other Routing Engine
You can load a package from the other Routing Engine onto the local Routing Engine using the existing request system software add package-name command:
user@host> request system software add re(0|1):/filename
In the re portion of the URL, specify the number of the other Routing Engine. In the filename portion of the URL, specify the path to the package. Packages are typically in the directory /var/sw/pkg.