Troubleshooting a Redundancy Group that Does Not Fail Over in an SRX Chassis Cluster
Problem
Description
A redundancy group (RG) in a high-availability (HA) SRX chassis cluster does not fail over.
Environment
SRX chassis cluster
Diagnosis
From the command prompt of the SRX Series Services Gateway that is part of the chassis cluster, run the show chassis cluster status command.
Sample output:
Cluster ID: 1
Node                  Priority     Status      Preempt  Manual failover

Redundancy group: 0 , Failover count: 0
    node0             150          primary     no       no
    node1             100          secondary   no       no

Redundancy group: 1 , Failover count: 0
    node0             255          primary     yes      no
    node1             100          secondary   yes      no
In the output, check the priority of the redundancy group that does not fail over:
- If the Priority is 255 and the Manual failover field is yes, proceed to Redundancy Group Manual Failover.
- If the priority is 0 or anything between 1 and 254, proceed to Redundancy Group Auto Failover.
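If you collect this output from many clusters, the decision above can be scripted. The following Python sketch is a hypothetical helper, not a Juniper tool: it assumes the output follows the layout of the sample output in this article (a "Redundancy group: n , Failover count: c" header followed by one row per node), and it treats a redundancy group as manually failed over if any node shows priority 255 with Manual failover set to yes. The names parse_cluster_status and diagnose are illustrative.

```python
import re

# Matches either a redundancy-group header or a node row from
# "show chassis cluster status". Assumption: the layout matches the sample
# output in this article; other Junos releases may print it differently.
TOKEN = re.compile(
    r"Redundancy group:\s*(?P<rg>\d+)\s*,\s*Failover count:\s*\d+"
    r"|(?P<node>node\d+)\s+(?P<prio>\d+)\s+(?P<status>\S+)"
    r"\s+(?P<preempt>yes|no)\b\s+(?P<manual>yes|no)\b"
)


def parse_cluster_status(text):
    """Return {rg: [(node, priority, status, manual_failover), ...]}."""
    groups, current = {}, None
    for m in TOKEN.finditer(text):
        if m.group("rg") is not None:
            current = int(m.group("rg"))
            groups[current] = []
        elif current is not None:
            groups[current].append((m.group("node"), int(m.group("prio")),
                                    m.group("status"), m.group("manual") == "yes"))
    return groups


def diagnose(text, rg):
    """Apply the Diagnosis rules above to one redundancy group."""
    rows = parse_cluster_status(text)[rg]
    if any(prio == 255 and manual for _, prio, _, manual in rows):
        return "Redundancy Group Manual Failover"
    if all(0 <= prio <= 254 for _, prio, _, _ in rows):
        return "Redundancy Group Auto Failover"
    return None  # combination not covered by the rules, e.g. 255 without the flag


# Sample output from the Redundancy Group Manual Failover section.
sample = ("Cluster ID: 1 Node Priority Status Preempt Manual failover "
          "Redundancy group: 0 , Failover count: 0 "
          "node0 150 primary yes no node1 100 secondary yes no "
          "Redundancy group: 1 , Failover count: 0 "
          "node0 255 primary no yes node1 100 secondary no yes")
print(diagnose(sample, 1))  # Redundancy Group Manual Failover
print(diagnose(sample, 0))  # Redundancy Group Auto Failover
```

A single regular expression scanned with finditer lets the same code read both the multi-line CLI output and a flattened copy pasted into a ticket.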
Resolution
Redundancy Group Manual Failover
1. Check whether a manual failover of the redundancy group was initiated earlier by using the show chassis cluster status command. Sample output:
Cluster ID: 1
Node                  Priority     Status      Preempt  Manual failover

Redundancy group: 0 , Failover count: 0
    node0             150          primary     yes      no
    node1             100          secondary   yes      no

Redundancy group: 1 , Failover count: 0
    node0             255          primary     no       yes
    node1             100          secondary   no       yes
In the sample output, the Priority value of redundancy group 1 (RG1) is 255 and the Manual failover status is yes, which means that a manual failover of the redundancy group was initiated earlier. You must reset the redundancy group priority.
Note: After a manual failover of a redundancy group, we recommend that you reset the manual failover flag in the cluster status to allow further failovers.
2. Reset the redundancy group priority by using the request chassis cluster failover reset redundancy-group <1-128> command. For example:
root@srx> request chassis cluster failover reset redundancy-group 1
node0:
--------------------------------------------------------------------------
Successfully reset manual failover for redundancy group 1

node1:
--------------------------------------------------------------------------
No reset required for redundancy group 1.
This should resolve the issue and allow further redundancy group failovers. If it does not, proceed to What's Next.
To manually initiate a failover of redundancy group x (redundancy groups are numbered 1 through 128), see Understanding Chassis Cluster Redundancy Group Manual Failover.
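When the reset is run from a script or a change ticket, it can help to build the command with the documented 1 through 128 range check and to confirm each node's reply. The Python sketch below is hypothetical: it only builds and reads text and does not connect to the device, and it assumes the command output looks like the example above (a "nodeN:" label, a dashed rule, and either "Successfully reset manual failover" or "No reset required"). The names reset_command and reset_results are illustrative.

```python
import re


def reset_command(rg):
    """Build the reset command from this article for redundancy group rg."""
    # The command accepts redundancy groups 1 through 128 (<1-128>).
    if not 1 <= rg <= 128:
        raise ValueError(f"redundancy group must be 1-128, got {rg}")
    return f"request chassis cluster failover reset redundancy-group {rg}"


# Assumption: each node's section is "nodeN:", a dashed rule, and one of the
# two messages shown in the example output above.
RESULT = re.compile(
    r"(?P<node>node\d+):\s*-+\s*"
    r"(?P<msg>Successfully reset manual failover|No reset required)"
    r" for redundancy group (?P<rg>\d+)"
)


def reset_results(output):
    """Return {node: (rg, reset_done)} parsed from the command output."""
    return {
        m.group("node"): (int(m.group("rg")),
                          m.group("msg").startswith("Successfully"))
        for m in RESULT.finditer(output)
    }


output = """root@srx> request chassis cluster failover reset redundancy-group 1
node0:
--------------------------------------------------------------------------
Successfully reset manual failover for redundancy group 1

node1:
--------------------------------------------------------------------------
No reset required for redundancy group 1."""

print(reset_command(1))
print(reset_results(output))  # {'node0': (1, True), 'node1': (1, False)}
```

"No reset required" on one node is expected, because the manual failover flag is cleared on the node where the failover was initiated.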
Redundancy Group Auto Failover
1. Check the configuration and link status of the control and fabric links by using the show chassis cluster interfaces command. Sample output for a branch SRX Series Services Gateway:
{primary:node0}
root@SRX_Branch> show chassis cluster interfaces
Control link 0 name: fxp1
Control link status: Up

Fabric interfaces:
    Name    Child-interface    Status
    fab0    ge-0/0/2           down
    fab0
    fab1    ge-9/0/2           down
    fab1
Fabric link status: down
Sample output for a high-end SRX Series Services Gateway:
{primary:node0}
root@SRX_HighEnd> show chassis cluster interfaces
Control link 0 name: em0
Control link 1 name: em1
Control link status: up

Fabric interfaces:
    Name    Child-interface    Status
    fab0    ge-0/0/5           down
    fab0
Fabric link status: down
- If the control link is down, see KB article KB20698 to troubleshoot and bring up the control link, and then proceed to step 3.
- If the fabric link is down, see Troubleshooting a Fabric Link Failure in an SRX Chassis Cluster to troubleshoot and bring up the fabric link, and then proceed to step 3.
2. If both the control link and the fabric link are up, proceed to step 3.
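The link checks in steps 1 and 2 come down to two status lines in the command output. The Python sketch below is a hypothetical helper that reads them and lists the matching actions; it assumes the output contains "Control link status:" and "Fabric link status:" lines as in the samples above, and it compares states case-insensitively because the branch sample prints "Up" while the high-end sample prints "up". The names link_states and link_actions are illustrative.

```python
import re

# Assumption: the output of "show chassis cluster interfaces" contains
# "Control link status: <state>" and "Fabric link status: <state>" lines,
# as in the sample outputs above.
STATUS = re.compile(r"(Control|Fabric) link status:\s*(\w+)", re.IGNORECASE)


def link_states(text):
    """Return {'control': state, 'fabric': state}, lowercased."""
    return {kind.lower(): state.lower() for kind, state in STATUS.findall(text)}


def link_actions(text):
    """List the actions from steps 1 and 2 for the given interface output."""
    states = link_states(text)
    actions = []
    # A missing status line is treated like a link that is not up.
    if states.get("control") != "up":
        actions.append("control link down: see KB20698, then go to step 3")
    if states.get("fabric") != "up":
        actions.append("fabric link down: see Troubleshooting a Fabric Link "
                       "Failure in an SRX Chassis Cluster, then go to step 3")
    return actions or ["both links up: go to step 3"]


# Branch sample output from step 1.
branch = """Control link 0 name: fxp1
Control link status: Up

Fabric interfaces:
    Name    Child-interface    Status
    fab0    ge-0/0/2           down
    fab0
    fab1    ge-9/0/2           down
    fab1
Fabric link status: down"""

for action in link_actions(branch):
    print(action)
```

Checking only the summary status lines keeps the sketch independent of how many control links or fabric child interfaces a given platform reports.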
3. Check any interface monitoring or IP monitoring configured for the redundancy group. If a configuration is incorrect, correct it. If the configurations are correct, proceed to step 4.
4. Check the priority of each node in the output of the show chassis cluster status command:
   - If the priority is 0, see KB article KB16869 for JSRP (Junos OS Services Redundancy Protocol) chassis clusters and KB article KB19431 for branch SRX Series Firewalls.
   - If the priority is 255, see Redundancy Group Manual Failover.
   - If the priority is between 1 and 254 and the redundancy group still does not fail over, proceed to What's Next.
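The guidance in step 4 is a direct mapping from node priority to the next action. The short Python sketch below restates that mapping as a hypothetical helper, priority_action, so that it can be applied to every node of a redundancy group; it covers only the three cases this article describes (0, 1 through 254, and 255) and rejects anything else.

```python
def priority_action(priority):
    """Map a node priority from 'show chassis cluster status' to step 4."""
    # This article covers priority 0, 1 through 254, and 255 only.
    if not 0 <= priority <= 255:
        raise ValueError(f"unexpected priority {priority}")
    if priority == 0:
        return ("see KB16869 (JSRP chassis clusters) or "
                "KB19431 (branch SRX Series Firewalls)")
    if priority == 255:
        return "see Redundancy Group Manual Failover"
    return "if the group still does not fail over, see What's Next"


# Node priorities of redundancy group 1 from the Diagnosis sample output.
for node, priority in [("node0", 255), ("node1", 100)]:
    print(node, priority_action(priority))
```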
What's Next
If these steps do not resolve the issue, see KB article KB15911 for redundancy group failover tips.
To debug further, see KB article KB21164 to check the debug logs.
Before you open a JTAC case with the Juniper Networks Support team, see Data Collection for Customer Support for the data to collect to assist in troubleshooting.