Software Upgrade in Multinode High Availability
Overview
In a Multinode High Availability setup, you can upgrade your SRX Series Firewalls between two different Junos OS releases with minimal disruption of traffic.
Starting in Junos OS Release 22.3R1, we support a software upgrade method that uses the CLI.
From Junos OS Release | To Junos OS Release | Software Upgrade Method Supported |
---|---|---|
20.4 | Any release after 20.4 | No |
22.3 | Next Junos OS release | Yes |
For information about upgrade and downgrade support for Junos OS releases, see Upgrade and Downgrade Support Policy for Junos OS Releases and Extended End-Of-Life Releases in the Release Notes.
When you upgrade SRX Series Firewalls in Multinode High Availability from an earlier Junos OS release to Junos OS Release 22.4R1 or a later release, use the Isolated Nodes Upgrade Procedure. During a regular upgrade, Junos OS Release 22.4R1 and later releases cannot synchronize sessions with earlier Junos OS releases.
When you are upgrading an SRX Series Firewall from Junos OS Release 22.3 to the next version of the Junos OS release, you may experience some disruption in traffic.
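The 22.4R1 boundary described above can be expressed as a small check when you plan an upgrade. The following Python sketch is illustrative only: the release-string format, the helper names (`parse_release`, `needs_isolated_nodes_upgrade`), and the choice to compare only the numeric parts of a release are assumptions, not a Juniper tool.

```python
import re
from typing import NamedTuple


class Release(NamedTuple):
    """Simplified view of a release string such as 22.4R1 or 22.3R1.3 (hypothetical)."""
    major: int
    minor: int
    rtype: str   # release type letter, for example "R"
    build: int
    spin: int    # optional ".N" suffix; 0 when absent


# Assumed format: <major>.<minor><letter><build>[.<spin>]
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)([A-Z])(\d+)(?:\.(\d+))?$")


def parse_release(text: str) -> Release:
    m = _RELEASE_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized release string: {text!r}")
    major, minor, rtype, build, spin = m.groups()
    return Release(int(major), int(minor), rtype, int(build), int(spin or 0))


def _order_key(r: Release) -> tuple[int, int, int, int]:
    # Assumption: the numeric parts alone define the ordering; the type letter is ignored.
    return (r.major, r.minor, r.build, r.spin)


ISOLATED_NODES_THRESHOLD = parse_release("22.4R1")


def needs_isolated_nodes_upgrade(current: str, target: str) -> bool:
    """True when upgrading from a release before 22.4R1 to 22.4R1 or later."""
    threshold = _order_key(ISOLATED_NODES_THRESHOLD)
    return (_order_key(parse_release(current)) < threshold
            <= _order_key(parse_release(target)))
```

For example, `needs_isolated_nodes_upgrade("22.3R1.1", "22.4R1")` returns `True`, while the 22.3R1.1 to 22.3R1.3 upgrade used in this topic returns `False`.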
When you upgrade Junos OS on SRX Series Firewalls in a Multinode High Availability setup, the show chassis high-availability information command output displays the following message even though the upgrade process completes successfully:
Peer Hardware Incompatible: SPU SLOT MISMATCH
This message is displayed when you upgrade from Junos OS Release 21.4R1 to any later Junos OS release.
In Junos OS releases earlier than 23.4R2, newly established NAT sessions are not synchronized during the interim upgrade stage of the Multinode High Availability setup, while the two nodes run different major Junos OS versions.
You must install the same version of Junos OS on both SRX Series Firewalls in a Multinode High Availability setup. Therefore, when you upgrade Junos OS on one device, ensure that you also upgrade the other device to the same version.
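One way to confirm that both nodes run the same release is to compare the `Junos:` field of the show version output collected from each device. The following Python sketch assumes that you have captured the output as text and that the field appears as in the samples in this topic (for example, `Junos: 22.3R1.1`); `junos_version` and `versions_match` are hypothetical helper names.

```python
import re

# Assumption: the release appears after "Junos:" as in this topic's show version samples.
_JUNOS_RE = re.compile(r"\bJunos:\s*(\S+)")


def junos_version(show_version_output: str) -> str:
    """Return the release string reported by show version."""
    m = _JUNOS_RE.search(show_version_output)
    if m is None:
        raise ValueError("no 'Junos:' field found in show version output")
    return m.group(1)


def versions_match(node_a_output: str, node_b_output: str) -> bool:
    """True when both nodes report exactly the same Junos OS release string."""
    return junos_version(node_a_output) == junos_version(node_b_output)
```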
We support the following upgrade methods in a Multinode High Availability setup:
- For Layer 3 deployments: The install-on-failure-route configuration (recommended). In this method, you divert traffic by changing the route. Traffic can still pass through the node, and the interfaces remain up. Go to Upgrade Software using install-on-failure-route for details. You can also use the shutdown-on-failure interfaces method for Layer 3 deployments.
- For hybrid and default gateway (Layer 2/switching) deployments: The shutdown-on-failure interfaces method. In this method, you divert traffic by shutting down interfaces on the node. Traffic cannot pass through the node. Go to Upgrade Software using shutdown-on-failure interface for details.
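The choice of method described in the list above depends only on the deployment type, so it can be captured in a lookup. This Python sketch is a planning aid under the assumption that you classify each setup into one of the three deployment types named in this topic; the `Deployment` enum and `upgrade_methods` function are hypothetical.

```python
from enum import Enum


class Deployment(Enum):
    """Deployment types named in this topic (hypothetical labels)."""
    LAYER3 = "layer3"
    HYBRID = "hybrid"
    DEFAULT_GATEWAY = "default-gateway"   # Layer 2/switching


def upgrade_methods(deployment: Deployment) -> list[str]:
    """Return the supported traffic-diversion methods, preferred method first."""
    if deployment is Deployment.LAYER3:
        # install-on-failure-route is recommended; shutdown-on-failure is also supported.
        return ["install-on-failure-route", "shutdown-on-failure"]
    # Hybrid and default gateway deployments divert traffic by shutting down interfaces.
    return ["shutdown-on-failure"]
```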
In the following procedure, we show you how to upgrade two SRX Series Firewalls (SRX-01 and SRX-02) from Junos OS Release 22.3R1.1 to Junos OS Release 22.3R1.3 by using the CLI. To avoid downtime when upgrading SRX Series Firewalls in a Multinode High Availability setup, we upgrade one device at a time.
Best Practices for Upgrading Junos OS
Consider the following best practices when you plan your software upgrade:
- Ensure both nodes are online and have the same version of Junos OS.
- Prepare your SRX Series Firewalls for an upgrade using the checklist available in Preparing for Software Installation and Upgrade (Junos OS).
- Check whether both nodes have sufficient storage in the /var file system by using the show system storage command.
- Check the status of all the cards on both the devices by using the show chassis fpc pic-status command.
- Verify that there are no major alarms on the devices by using the show chassis alarms command.
- Ensure that there are no uncommitted changes.
- Back up the active configuration and license keys.
We recommend that you perform software upgrades during a maintenance window.
Preinstallation Steps
Complete the following tasks before you start the software upgrade.
- Check the current Junos OS software version on your device.
user@srx-01> show version
Hostname: srx-01
Model: vSRX
Junos: 22.3R1.1
- Download the Junos OS image from the Juniper Networks Support page and save it in the /var/tmp directory on both SRX Series Firewalls.
- Use the show chassis high-availability information command to verify that your Multinode High Availability setup is healthy and functional, and that the interchassis link (ICL) is up.
On SRX-01 Device
user@srx-01> show chassis high-availability information
Node failure codes:
  HW Hardware monitoring    LB Loopback monitoring
  MB Mbuf monitoring        SP SPU monitoring
  CS Cold Sync monitoring   SU Software Upgrade
Node Status: ONLINE
Local-id: 1
Local-IP: 10.22.0.1
HA Peer Information:
  Peer Id: 2   IP address: 10.22.0.2   Interface: ge-0/0/2.0
  Routing Instance: default
  Encrypted: YES   Conn State: UP
  Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ONLINE
  Peer Information:
    Peer Id: 2
SRG failure event codes:
  BF BFD monitoring
  IP IP monitoring
  IF Interface monitoring
  CP Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: ACTIVE
  Activeness Priority: 200
  Preemption: ENABLED
  Process Packet In Backup State: NO
  Control Plane State: READY
  System Integrity Check: N/A
  Failure Events: NONE
  Peer Information:
    Peer Id: 2
    Status : BACKUP
    Health Status: HEALTHY
    Failover Readiness: READY
On SRX-02 Device
user@srx-02> show chassis high-availability information
Node failure codes:
  HW Hardware monitoring    LB Loopback monitoring
  MB Mbuf monitoring        SP SPU monitoring
  CS Cold Sync monitoring   SU Software Upgrade
Node Status: ONLINE
Local-id: 2
Local-IP: 10.22.0.2
HA Peer Information:
  Peer Id: 1   IP address: 10.22.0.1   Interface: ge-0/0/2.0
  Routing Instance: default
  Encrypted: YES   Conn State: UP
  Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ONLINE
  Peer Information:
    Peer Id: 1
SRG failure event codes:
  BF BFD monitoring
  IP IP monitoring
  IF Interface monitoring
  CP Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: BACKUP
  Activeness Priority: 1
  Preemption: DISABLED
  Process Packet In Backup State: NO
  Control Plane State: READY
  System Integrity Check: COMPLETE
  Failure Events: NONE
  Peer Information:
    Peer Id: 1
    Status : ACTIVE
    Health Status: HEALTHY
    Failover Readiness: N/A
These output samples confirm that the two SRX Series Firewalls in the Multinode High Availability setup are in a healthy state and are operating normally.
You are now ready to proceed with the software upgrade.
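If you collect the show chassis high-availability information output from both nodes as text, a small parser can check the fields that the preinstallation step relies on. The following Python sketch works under stated assumptions: the labels appear exactly as in the samples above (Node Status, Conn State, Cold Sync Status, Services Redundancy Group, Status), Conn State: UP is read as the ICL being up, and ACTIVE or BACKUP for SRG1 is treated as a normal role. The names `HaSummary`, `parse_ha_info`, and `is_healthy_pair` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class HaSummary:
    node_status: str
    conn_state: str                  # ICL connection state
    cold_sync: str
    srg_status: dict[int, str]       # local status per SRG (SRG1 and higher)
    srg_peer_status: dict[int, str]  # peer status per SRG, as reported locally


def _field(pattern: str, text: str) -> str:
    m = re.search(pattern, text, re.DOTALL)
    if m is None:
        raise ValueError(f"field not found: {pattern}")
    return m.group(1)


def parse_ha_info(output: str) -> HaSummary:
    """Extract the fields checked in this topic; labels are assumed to match the samples."""
    summary = HaSummary(
        node_status=_field(r"Node Status:\s*(\w+)", output),
        conn_state=_field(r"Conn State:\s*(\w+)", output),
        cold_sync=_field(r"Cold Sync Status:\s*(\w+)", output),
        srg_status={},
        srg_peer_status={},
    )
    # Every SRG section begins with "Services Redundancy Group: <id>".
    for block in output.split("Services Redundancy Group:")[1:]:
        srg_id = int(block.split()[0])
        if srg_id == 0:
            continue  # SRG0 reports "Current State" rather than "Status"
        # The first "Status" in the block is the local one; the peer's status
        # follows the "Peer Information:" label.
        summary.srg_status[srg_id] = _field(r"\bStatus\s*:\s*([A-Z]+)", block)
        summary.srg_peer_status[srg_id] = _field(
            r"Peer Information:.*?\bStatus\s*:\s*([A-Z]+)", block)
    return summary


def is_healthy_pair(s: HaSummary) -> bool:
    """Node online, ICL up, cold sync complete, and SRG1 ACTIVE or BACKUP locally."""
    return (s.node_status == "ONLINE" and s.conn_state == "UP"
            and s.cold_sync == "COMPLETE"
            and s.srg_status.get(1) in ("ACTIVE", "BACKUP"))
```

Because the parser relies only on label and value pairs, it accepts both the multi-line output printed by the device and a flattened single-line copy.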
Upgrade Software using install-on-failure-route
Prerequisite for Diverting Transit Traffic
Check whether your device has the configuration required to divert transit traffic by changing the route, as described in Configuring Multinode High Availability In a Layer 3 Network. If this configuration is not present, complete the following steps:
- Create a dedicated custom virtual router for the route used for diverting traffic during the upgrade.
set routing-instances MNHA-signal-routes instance-type virtual-router
- Configure the install-on-failure-route statement for SRG0. Here, you configure the route with IP address 10.39.1.3 as the route to install when the node fails. The example also configures 10.39.1.1 and 10.39.1.2 as the active and backup signal routes for SRG1.
set routing-instances MNHA-signal-routes instance-type virtual-router
set chassis high-availability services-redundancy-group 0 install-on-failure-route 10.39.1.3 routing-instance MNHA-signal-routes
set chassis high-availability services-redundancy-group 1 active-signal-route 10.39.1.1 routing-instance MNHA-signal-routes
set chassis high-availability services-redundancy-group 1 backup-signal-route 10.39.1.2 routing-instance MNHA-signal-routes
When the node fails, the route specified in the install-on-failure-route statement is installed in the routing table.
- Configure a matching routing policy and define policy conditions based on the existence of routes. Here, you include the route 10.39.1.3 as the route match condition for the if-route-exists condition failure_route_exists. The conditions for the active (10.39.1.1) and backup (10.39.1.2) signal routes are defined the same way.
set policy-options condition active_route_exists if-route-exists address-family inet 10.39.1.1/32
set policy-options condition active_route_exists if-route-exists address-family inet table MNHA-signal-routes.inet.0
set policy-options condition backup_route_exists if-route-exists address-family inet 10.39.1.2/32
set policy-options condition backup_route_exists if-route-exists address-family inet table MNHA-signal-routes.inet.0
set policy-options condition failure_route_exists if-route-exists address-family inet 10.39.1.3/32
set policy-options condition failure_route_exists if-route-exists address-family inet table MNHA-signal-routes.inet.0
Create a policy statement that references the condition in one of its match terms.
set policy-options policy-statement mnha-route-policy term 4 from protocol static
set policy-options policy-statement mnha-route-policy term 4 from protocol direct
set policy-options policy-statement mnha-route-policy term 4 from condition failure_route_exists
set policy-options policy-statement mnha-route-policy term 4 then metric 100
set policy-options policy-statement mnha-route-policy term 4 then accept
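The signal-route addresses must match between the SRG statements and the policy conditions in the steps above, so you might generate both from one set of parameters. The following Python sketch reproduces the set commands shown in these steps. The function name is hypothetical, and it emits only term 4 of mnha-route-policy because the other terms of that policy are not shown in this topic.

```python
def signal_route_config(instance: str, active: str, backup: str, failure: str) -> list[str]:
    """Build this topic's signal-route set commands for one instance (hypothetical helper)."""
    ha = "set chassis high-availability services-redundancy-group"
    cmds = [
        f"set routing-instances {instance} instance-type virtual-router",
        f"{ha} 0 install-on-failure-route {failure} routing-instance {instance}",
        f"{ha} 1 active-signal-route {active} routing-instance {instance}",
        f"{ha} 1 backup-signal-route {backup} routing-instance {instance}",
    ]
    # One if-route-exists condition per signal route, each looked up in the
    # instance's inet.0 table.
    table = f"{instance}.inet.0"
    for name, addr in (("active_route_exists", active),
                       ("backup_route_exists", backup),
                       ("failure_route_exists", failure)):
        cond = f"set policy-options condition {name} if-route-exists address-family inet"
        cmds.append(f"{cond} {addr}/32")
        cmds.append(f"{cond} table {table}")
    # Term 4 raises the advertised metric while the failure route is installed.
    term = "set policy-options policy-statement mnha-route-policy term 4"
    cmds += [
        f"{term} from protocol static",
        f"{term} from protocol direct",
        f"{term} from condition failure_route_exists",
        f"{term} then metric 100",
        f"{term} then accept",
    ]
    return cmds
```

Calling `signal_route_config("MNHA-signal-routes", "10.39.1.1", "10.39.1.2", "10.39.1.3")` yields the same commands as the example configuration, which you could paste in configuration mode after review.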
Upgrade Multinode High Availability Software
Let's upgrade the device that is acting as the backup node (SRX-02).
- Initiate the software upgrade process and commit the configuration.
user@srx-02# set chassis high-availability software-upgrade
user@srx-02# commit
This command initiates a local failure for SRG0 and transitions SRG1 (if configured) to the INELIGIBLE state on the local device. The peer device transitions to, or stays in, the active state for SRG1. On the local node, the active and backup signal routes of SRG1 are removed. If you've configured the install-on-failure-route statement, the signal route associated with that configuration is installed. With the appropriate routing policies, the local device can advertise higher route metrics, diverting traffic away from itself and steering it toward the peer device.
- Verify the status of Multinode High Availability.
user@srx-02> show chassis high-availability information
Node failure codes:
  HW Hardware monitoring    LB Loopback monitoring
  MB Mbuf monitoring        SP SPU monitoring
  CS Cold Sync monitoring   SU Software Upgrade
Node Status: OFFLINE [ SU ]
Local-id: 1
Local-IP: 10.22.0.1
HA Peer Information:
  Peer Id: 2   IP address: 10.22.0.2   Interface: ge-0/0/2.0
  Routing Instance: default
  Encrypted: YES   Conn State: UP
  Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ONLINE
  Peer Information:
    Peer Id: 2
SRG failure event codes:
  BF BFD monitoring
  IP IP monitoring
  IF Interface monitoring
  CP Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: INELIGIBLE
  Activeness Priority: 200
  Preemption: ENABLED
  Process Packet In Backup State: NO
  Control Plane State: N/A
  System Integrity Check: IN PROGRESS
  Failure Events: NONE
  Peer Information:
    Peer Id: 2
    Status : ACTIVE
    Health Status: HEALTHY
    Failover Readiness: N/A
The output shows Node Status: OFFLINE [ SU ], which indicates that the node is ready for the software upgrade. You can also see that the status of SRG1 has changed to INELIGIBLE.
- Confirm that the other device (SRX-01) is in the active role and is functioning normally.
user@srx-01> show chassis high-availability information
Node failure codes:
  HW Hardware monitoring    LB Loopback monitoring
  MB Mbuf monitoring        SP SPU monitoring
  CS Cold Sync monitoring   SU Software Upgrade
Node Status: ONLINE
Local-id: 2
Local-IP: 10.22.0.2
HA Peer Information:
  Peer Id: 1   IP address: 10.22.0.1   Interface: ge-0/0/2.0
  Routing Instance: default
  Encrypted: YES   Conn State: UP
  Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ONLINE
  Peer Information:
    Peer Id: 1
SRG failure event codes:
  BF BFD monitoring
  IP IP monitoring
  IF Interface monitoring
  CP Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: ACTIVE
  Activeness Priority: 1
  Preemption: DISABLED
  Process Packet In Backup State: NO
  Control Plane State: READY
  System Integrity Check: N/A
  Failure Events: NONE
  Peer Information:
    Peer Id: 1
    Status : INELIGIBLE
    Health Status: UNHEALTHY
    Failover Readiness: NOT READY
The command output shows that the status of SRG1 is ACTIVE. Also note that under the Peer Information section of SRG1, the status is INELIGIBLE, which indicates that the other node is in the ineligible state.
- Install the Junos OS software on the SRX-02 device.
user@srx-02> request system software add /var/tmp/junos-install-vsrx3-x86-64-22.3R1.3.tgz no-copy
- After a successful installation, reboot the device using the request system reboot command.
- After the reboot, check the Junos OS version using the show version command.
user@srx-02> show version
Hostname: srx-02
Model: vSRX
Junos: 22.3R1.3
The output confirms that the device is upgraded to the correct Junos OS version.
- Check the status of Multinode High Availability on the device.
user@srx-02> show chassis high-availability information
Node failure codes:
  HW Hardware monitoring    LB Loopback monitoring
  MB Mbuf monitoring        SP SPU monitoring
  CS Cold Sync monitoring   SU Software Upgrade
Node Status: OFFLINE [ SU ]
Local-id: 1
Local-IP: 10.22.0.1
HA Peer Information:
  Peer Id: 2   IP address: 10.22.0.2   Interface: ge-0/0/2.0
  Routing Instance: default
  Encrypted: YES   Conn State: UP
  Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ONLINE
  Peer Information:
    Peer Id: 2
SRG failure event codes:
  BF BFD monitoring
  IP IP monitoring
  IF Interface monitoring
  CP Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: INELIGIBLE
  Activeness Priority: 200
  Preemption: ENABLED
  Process Packet In Backup State: NO
  Control Plane State: N/A
  System Integrity Check: COMPLETE
  Failure Events: NONE
  Peer Information:
    Peer Id: 2
    Status : ACTIVE
    Health Status: HEALTHY
    Failover Readiness: N/A
The output continues to display the node status as OFFLINE [ SU ] and the SRG1 status as INELIGIBLE.
- Remove the software-upgrade statement and commit the configuration.
user@srx-02# delete chassis high-availability software-upgrade
user@srx-02# commit
When you remove the software-upgrade statement, the local failure state and the installed routes are removed.
- Check the Multinode High Availability status again to confirm that the device is online and that the overall status is healthy and functioning.
user@srx-02> show chassis high-availability information
Node failure codes:
  HW Hardware monitoring    LB Loopback monitoring
  MB Mbuf monitoring        SP SPU monitoring
  CS Cold Sync monitoring   SU Software Upgrade
Node Status: ONLINE
Local-id: 1
Local-IP: 10.22.0.1
HA Peer Information:
  Peer Id: 2   IP address: 10.22.0.2   Interface: ge-0/0/2.0
  Routing Instance: default
  Encrypted: YES   Conn State: UP
  Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ONLINE
  Peer Information:
    Peer Id: 2
SRG failure event codes:
  BF BFD monitoring
  IP IP monitoring
  IF Interface monitoring
  CP Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: BACKUP
  Activeness Priority: 200
  Preemption: ENABLED
  Process Packet In Backup State: NO
  Control Plane State: READY
  System Integrity Check: IN PROGRESS
  Failure Events: NONE
  Peer Information:
    Peer Id: 2
    Status : ACTIVE
    Health Status: HEALTHY
    Failover Readiness: N/A
The output shows Node Status: ONLINE and the SRG1 status as BACKUP, which indicates that the node is back online and is functioning normally in the backup role.
- Check interfaces, routing protocols, advertised routes, and so on to confirm that your setup is operating normally.
Now you can proceed to upgrade the other device (SRX-01) using the same procedure.
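The status transitions in the procedure above can be tracked as a checklist of expected values at each checkpoint on the node being upgraded. The following Python sketch encodes the node status, local SRG1 status, and peer SRG1 status shown in the sample outputs; the checkpoint names and the `check_checkpoint` helper are hypothetical, and you would feed it values read from your own show chassis high-availability information output.

```python
# Expected (node status, local SRG1 status, peer SRG1 status) on the node being
# upgraded, taken from the sample outputs in this procedure. Checkpoint names
# are hypothetical labels.
EXPECTED: dict[str, tuple[str, str, str]] = {
    "before software-upgrade": ("ONLINE", "BACKUP", "ACTIVE"),
    "after set software-upgrade": ("OFFLINE", "INELIGIBLE", "ACTIVE"),
    "after install and reboot": ("OFFLINE", "INELIGIBLE", "ACTIVE"),
    "after delete software-upgrade": ("ONLINE", "BACKUP", "ACTIVE"),
}

FIELDS = ("node status", "SRG1 status", "peer SRG1 status")


def check_checkpoint(step: str, node: str, srg1: str, peer_srg1: str) -> list[str]:
    """Return mismatch messages for one checkpoint; an empty list means it matches."""
    if step not in EXPECTED:
        raise KeyError(f"unknown checkpoint: {step}")
    observed = (node, srg1, peer_srg1)
    return [
        f"{step}: {field} is {got}, expected {want}"
        for field, got, want in zip(FIELDS, observed, EXPECTED[step])
        if got != want
    ]
```

A non-empty result at any checkpoint is a signal to stop and investigate before you continue, for example before removing the software-upgrade statement.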
If you face any issues and cannot complete the upgrade, you can roll back the software on the device and then reboot the system. Use the request system software rollback command to restore the previously installed software version.
Upgrade Software using shutdown-on-failure interface
Prerequisite to Divert Transit Traffic
Check whether your SRX Series Firewalls include the configuration required to isolate traffic by shutting down interfaces, as described in Configuring Multinode High Availability In a Default Gateway Deployment. If the feature is not configured:
- Configure all traffic interfaces under the shutdown-on-failure option. Use the following syntax:
set chassis high-availability services-redundancy-group 0 shutdown-on-failure interface-name
Example:
[edit]
set chassis high-availability services-redundancy-group 0 shutdown-on-failure ge-0/0/0
set chassis high-availability services-redundancy-group 0 shutdown-on-failure ge-0/0/1
set chassis high-availability services-redundancy-group 0 shutdown-on-failure ge-0/0/3
set chassis high-availability services-redundancy-group 0 shutdown-on-failure ge-0/0/4
CAUTION: Do not use interfaces assigned to the interchassis link (ICL).
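To respect the caution above when you generate this configuration for many interfaces, you can refuse any interface that carries the ICL. The following Python sketch builds the shutdown-on-failure set commands shown in the example; the function name is hypothetical, and it assumes that you know the ICL interface names (ge-0/0/2.0 in the sample outputs) and that comparing physical interface names is sufficient to detect a clash.

```python
def shutdown_on_failure_config(interfaces: list[str], icl_interfaces: set[str]) -> list[str]:
    """Build SRG0 shutdown-on-failure set commands, rejecting any ICL interface."""
    # Compare physical interface names so that an ICL unit such as ge-0/0/2.0
    # also blocks its parent ge-0/0/2 (assumption about how the ICL is named).
    icl_physical = {name.split(".")[0] for name in icl_interfaces}
    prefix = "set chassis high-availability services-redundancy-group 0 shutdown-on-failure"
    cmds = []
    for ifname in interfaces:
        if ifname.split(".")[0] in icl_physical:
            raise ValueError(f"{ifname} carries the ICL; do not add it to shutdown-on-failure")
        cmds.append(f"{prefix} {ifname}")
    return cmds
```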
Upgrade Multinode High Availability Software
Let's upgrade the device that is acting as the backup node (SRX-02).
- Initiate the software upgrade and commit the configuration.
user@srx-02# set chassis high-availability software-upgrade
user@srx-02# commit
This command shuts down the interfaces configured under shutdown-on-failure and transitions SRG1 to the INELIGIBLE state.
- Check the Multinode High Availability status.
user@srx-02> show chassis high-availability information
Node failure codes:
  HW Hardware monitoring    LB Loopback monitoring
  MB Mbuf monitoring        SP SPU monitoring
  CS Cold Sync monitoring   SU Software Upgrade
Node Status: OFFLINE [ SU ]
Local-id: 2
Local-IP: 10.22.0.2
HA Peer Information:
  Peer Id: 1   IP address: 10.22.0.1   Interface: ge-0/0/2.0
  Routing Instance: default
  Encrypted: YES   Conn State: UP
  Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ISOLATED [ Node Failure ]
  Peer Information:
    Peer Id: 1
  Shut-on-failures interfaces: ge-0/0/4 ge-0/0/3 ge-0/0/1 ge-0/0/0
SRG failure event codes:
  BF BFD monitoring
  IP IP monitoring
  IF Interface monitoring
  CP Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: INELIGIBLE
  Activeness Priority: 1
  Preemption: DISABLED
  Process Packet In Backup State: NO
  Control Plane State: N/A
  System Integrity Check: COMPLETE
  Failure Events: NONE
  Peer Information:
    Peer Id: 1
    Status : ACTIVE
    Health Status: HEALTHY
    Failover Readiness: N/A
The output shows Node Status: OFFLINE [ SU ], which indicates that the node is ready for the software upgrade. You can also see the SRG0 status as ISOLATED [ Node Failure ] and the SRG1 status as INELIGIBLE.
- Check the status of the interfaces.
user@srx-02> show interfaces terse
Interface      Admin Link Proto Local Remote
ge-0/0/0       down  down
ge-0/0/1       down  down
ge-0/0/2       up    up
ge-0/0/2.0     up    up   inet  10.22.0.2/24
ge-0/0/3       down  down
ge-0/0/3.0     up    down inet  10.3.0.2/16
ge-0/0/4       down  down
ge-0/0/4.0     up    down inet  10.5.0.1/16
The output shows that the interfaces configured under shutdown-on-failure are down.
- Confirm that the other device (SRX-01) is in the active role and is functioning normally.
user@srx-01> show chassis high-availability information
Node failure codes:
  HW Hardware monitoring    LB Loopback monitoring
  MB Mbuf monitoring        SP SPU monitoring
  CS Cold Sync monitoring   SU Software Upgrade
Node Status: ONLINE
Local-id: 1
Local-IP: 10.22.0.1
HA Peer Information:
  Peer Id: 2   IP address: 10.22.0.2   Interface: ge-0/0/2.0
  Routing Instance: default
  Encrypted: YES   Conn State: UP
  Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ONLINE
  Peer Information:
    Peer Id: 2
SRG failure event codes:
  BF BFD monitoring
  IP IP monitoring
  IF Interface monitoring
  CP Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: ACTIVE
  Activeness Priority: 200
  Preemption: ENABLED
  Process Packet In Backup State: NO
  Control Plane State: READY
  System Integrity Check: N/A
  Failure Events: NONE
  Peer Information:
    Peer Id: 2
    Status : INELIGIBLE
    Health Status: UNHEALTHY
    Failover Readiness: NOT READY
The output shows that the status of SRG1 is ACTIVE. Also note that under the Peer Information section of SRG1, the status is INELIGIBLE, which indicates that the other node is in the ineligible state.
- Install the Junos OS image on SRX-02.
user@srx-02> request system software add /var/tmp/junos-install-vsrx3-x86-64-22.3R1.3.tgz no-copy
- After a successful installation, reboot the device using the request system reboot command.
- Check the Junos OS version.
user@srx-02> show version
Hostname: srx-02
Model: vSRX
Junos: 22.3R1.3
The output confirms that the device is upgraded to the correct Junos OS version.
- Check the status of Multinode High Availability on the device.
user@srx-02> show chassis high-availability information
Node failure codes:
  HW Hardware monitoring    LB Loopback monitoring
  MB Mbuf monitoring        SP SPU monitoring
  CS Cold Sync monitoring   SU Software Upgrade
Node Status: OFFLINE [ SU ]
Local-id: 2
Local-IP: 10.22.0.2
HA Peer Information:
  Peer Id: 1   IP address: 10.22.0.1   Interface: ge-0/0/2.0
  Routing Instance: default
  Encrypted: YES   Conn State: UP
  Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ISOLATED [ Node Failure ]
  Peer Information:
    Peer Id: 1
  Shut-on-failures interfaces: ge-0/0/4 ge-0/0/3 ge-0/0/1 ge-0/0/0
SRG failure event codes:
  BF BFD monitoring
  IP IP monitoring
  IF Interface monitoring
  CP Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: INELIGIBLE
  Activeness Priority: 1
  Preemption: DISABLED
  Process Packet In Backup State: NO
  Control Plane State: N/A
  System Integrity Check: COMPLETE
  Failure Events: NONE
  Peer Information:
    Peer Id: 1
    Status : ACTIVE
    Health Status: HEALTHY
    Failover Readiness: N/A
The command output continues to display the node status as OFFLINE [ SU ] and the SRG0 status as ISOLATED [ Node Failure ].
- Remove the software-upgrade statement and commit the configuration.
user@srx-02# delete chassis high-availability software-upgrade
user@srx-02# commit
- Check the Multinode High Availability status again on the device and confirm that the device is online and that the overall status is healthy.
user@srx-02> show chassis high-availability information
Node failure codes:
  HW Hardware monitoring    LB Loopback monitoring
  MB Mbuf monitoring        SP SPU monitoring
  CS Cold Sync monitoring   SU Software Upgrade
Node Status: ONLINE
Local-id: 2
Local-IP: 10.22.0.2
HA Peer Information:
  Peer Id: 1   IP address: 10.22.0.1   Interface: ge-0/0/2.0
  Routing Instance: default
  Encrypted: YES   Conn State: UP
  Cold Sync Status: COMPLETE
Services Redundancy Group: 0
  Current State: ONLINE
  Peer Information:
    Peer Id: 1
  Shut-on-failures interfaces: ge-0/0/4 ge-0/0/3 ge-0/0/1 ge-0/0/0
SRG failure event codes:
  BF BFD monitoring
  IP IP monitoring
  IF Interface monitoring
  CP Control Plane monitoring
Services Redundancy Group: 1
  Deployment Type: ROUTING
  Status: BACKUP
  Activeness Priority: 1
  Preemption: DISABLED
  Process Packet In Backup State: NO
  Control Plane State: READY
  System Integrity Check: COMPLETE
  Failure Events: NONE
  Peer Information:
    Peer Id: 1
    Status : ACTIVE
    Health Status: HEALTHY
    Failover Readiness: N/A
The output shows Node Status: ONLINE, the SRG0 status as ONLINE, and the SRG1 status as BACKUP, which indicates that the node is back online and is functioning normally.
- Verify the status of the interfaces.
user@srx-02> show interfaces terse
Interface      Admin Link Proto Local Remote
ge-0/0/0       up    up
gr-0/0/0       up    up
ge-0/0/1       up    up
ge-0/0/2       up    up
ge-0/0/2.0     up    up   inet  10.22.0.2/24
ge-0/0/3       up    up
ge-0/0/3.0     up    up   inet  10.3.0.2/16
ge-0/0/4       up    up
ge-0/0/4.0     up    up   inet  10.5.0.1/16
.............................
The output shows that the interfaces that were previously down are now up.
- Check interfaces, routing protocols, advertised routes, and so on to confirm that your setup is operating normally.
Now you can proceed to upgrade the other device (SRX-01) using the same procedure.
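The interface checks in this procedure (interfaces down while the software-upgrade statement is active, and up again after you remove it) can be automated against captured show interfaces terse output. The following Python sketch assumes one interface per line with the Admin and Link columns immediately after the interface name, as in the samples above; `parse_terse` and `check_shutdown_interfaces` are hypothetical helpers, and they check only the Link column.

```python
def parse_terse(output: str) -> dict[str, tuple[str, str]]:
    """Map interface name -> (admin, link) from show interfaces terse text."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have up/down in the Admin and Link columns; the header and
        # trailing filler lines do not, so they are skipped.
        if len(fields) >= 3 and fields[1] in ("up", "down") and fields[2] in ("up", "down"):
            states[fields[0]] = (fields[1], fields[2])
    return states


def check_shutdown_interfaces(output: str, interfaces: list[str], expect_up: bool) -> list[str]:
    """Return problems for shutdown-on-failure interfaces whose link state is unexpected."""
    states = parse_terse(output)
    want = "up" if expect_up else "down"
    problems = []
    for ifname in interfaces:
        if ifname not in states:
            problems.append(f"{ifname}: not found")
        elif states[ifname][1] != want:
            problems.append(f"{ifname}: link {states[ifname][1]}, expected {want}")
    return problems
```

During the upgrade you would call it with `expect_up=False`, and after removing the software-upgrade statement with `expect_up=True`, passing the same interface list you configured under shutdown-on-failure.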