Troubleshooting an SRX Chassis Cluster with One Node in the Primary State and the Other Node in the Disabled State

28-Nov-23

Problem

Description

The nodes of the SRX chassis cluster are in primary and disabled states.

Environment

SRX chassis cluster

Symptoms

One node of the cluster is in the primary state and the other node is in the disabled state. Run the show chassis cluster status command on each node to view the status of the node. Here is a sample output:

{primary:node0}
root@primary-srx> show chassis cluster status 
Monitor Failure codes:
    CS  Cold Sync monitoring        FL  Fabric Connection monitoring
    GR  GRES monitoring             HW  Hardware monitoring
    IF  Interface monitoring        IP  IP monitoring
    LB  Loopback monitoring         MB  Mbuf monitoring
    NH  Nexthop monitoring          NP  NPC monitoring              
    SP  SPU monitoring              SM  Schedule monitoring
    CF  Config Sync monitoring      RE  Relinquish monitoring
 
Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures

Redundancy group: 0 , Failover count: 1
node0  255      primary        no      no       None           
node1  129      disabled       no      no       None

Redundancy group: 1 , Failover count: 1
node0  255      primary        no      no       None           
node1  129      disabled       no      no       None
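
If you check several clusters, you can detect this symptom from saved command output. The following Python sketch parses text in the format of the sample above. The function names (`parse_cluster_status`, `has_primary_disabled_symptom`) are hypothetical, and the parser assumes the column layout shown in this sample, which might differ between Junos OS releases.

```python
import re

# Assumed layout: a "Redundancy group: N , Failover count: M" header
# followed by one row per node, as in the sample output above.
RG_HEADER = re.compile(r"^Redundancy group:\s*(\d+)\s*,")
NODE_ROW = re.compile(r"^(node\d+)\s+(\d+)\s+(\S+)\s+(yes|no)\s+(yes|no)\s+(\S.*)$")

SAMPLE = """\
Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures

Redundancy group: 0 , Failover count: 1
node0  255      primary        no      no       None
node1  129      disabled       no      no       None

Redundancy group: 1 , Failover count: 1
node0  255      primary        no      no       None
node1  129      disabled       no      no       None
"""


def parse_cluster_status(output):
    """Return {redundancy_group: {node_name: status}} from the command output."""
    groups = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        header = RG_HEADER.match(line)
        if header:
            current = int(header.group(1))
            groups[current] = {}
            continue
        row = NODE_ROW.match(line)
        if row and current is not None:
            groups[current][row.group(1)] = row.group(3)
    return groups


def has_primary_disabled_symptom(groups):
    """True if any redundancy group has one primary node and one disabled node."""
    return any(
        sorted(nodes.values()) == ["disabled", "primary"]
        for nodes in groups.values()
    )


if __name__ == "__main__":
    print(has_primary_disabled_symptom(parse_cluster_status(SAMPLE)))
```

The sketch only reports the symptom; it does not replace the diagnosis steps that follow.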

Diagnosis

1. Run the show chassis cluster interfaces command to verify the status of the control and fabric links. Are any of the links down?

Here are sample outputs for a branch SRX Series Firewall and a high-end SRX Series Firewall.

root@Branch-SRX> show chassis cluster interfaces
Control link 0 name: fxp1
Control link status: Up

Fabric interfaces:
Name Child-interface Status
fab0 ge-0/0/2 up
fab0 ge-2/0/6 up
fab1 ge-9/0/2 up
fab1 ge-11/0/6 up
Fabric link status: Up

{primary:node0}
root@High-end-SRX> show chassis cluster interfaces
Control link 0 name: em0
Control link 1 name: em1
Control link status: Up

Fabric interfaces:
Name Child-interface Status
fab0 ge-2/0/0 down
fab0
fab1
fab1
Fabric link status: Up

- Yes: See Troubleshooting a Fabric Link Failure in an SRX Chassis Cluster or Troubleshooting a Control Link Failure in an SRX Chassis Cluster.
- No: Proceed to Step 2.
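
The link check in this step can also be run against saved output. The following Python sketch flags any control or fabric status line, and any fabric child interface, that is not reported as up. The function name `find_down_links` is hypothetical, and the parser assumes the three-column fabric table shown in the sample outputs above; other platforms or releases might print additional columns.

```python
import re

# Assumed row formats, taken from the sample outputs above.
STATUS_ROW = re.compile(r"^(Control|Fabric) link status:\s*(\S+)$")
FAB_ROW = re.compile(r"^(fab\d+)\s+(\S+)\s+(\S+)$")

BRANCH = """\
Control link 0 name: fxp1
Control link status: Up

Fabric interfaces:
Name Child-interface Status
fab0 ge-0/0/2 up
fab0 ge-2/0/6 up
fab1 ge-9/0/2 up
fab1 ge-11/0/6 up
Fabric link status: Up
"""

HIGH_END = """\
Control link 0 name: em0
Control link 1 name: em1
Control link status: Up

Fabric interfaces:
Name Child-interface Status
fab0 ge-2/0/0 down
fab0
fab1
fab1
Fabric link status: Up
"""


def find_down_links(output):
    """Return a description of every link or child interface that is not up."""
    problems = []
    for raw in output.splitlines():
        line = raw.strip()
        status = STATUS_ROW.match(line)
        if status:
            if status.group(2).lower() != "up":
                problems.append(f"{status.group(1)} link status: {status.group(2)}")
            continue
        child = FAB_ROW.match(line)
        if child and child.group(3).lower() != "up":
            problems.append(f"{child.group(1)} {child.group(2)}: {child.group(3)}")
    return problems


if __name__ == "__main__":
    for name, text in (("branch", BRANCH), ("high-end", HIGH_END)):
        print(name, find_down_links(text))
```

Checking child interfaces separately is a deliberate choice: in the high-end sample, the summary line reports the fabric link as Up while a child interface is down.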

2. Reboot the disabled node. Does the node come up in the disabled state after the reboot?
- Yes: There might be hardware issues. Proceed to Step 3.
- No: The issue is resolved.
3. Check the node for any hardware issues. Run the show chassis fpc pic-status command on both nodes, and ensure that the FPCs are online. Do you see the status of any FPC listed as Present, OK, or Offline?

Here is a sample output.
```
{primary:node1}
root@J-SRX> show chassis fpc pic-status
node0:
--------------------------------------------------------------------------
Slot 0  Online  FPC
  PIC 0 Online  4x GE Base PIC
Slot 2  Online  FPC
  PIC 0 Online  24x GE gPIM
Slot 6  Online  FPC
  PIC 0 Online  2x 10G gPIM

node1:
--------------------------------------------------------------------------
Slot 0  Online  FPC
  PIC 0 Online  4x GE Base PIC
Slot 2  Online  FPC
  PIC 0 Online  24x GE gPIM
Slot 6  Online  FPC
  PIC 0 Online  2x 10G gPIM
```
- Yes: Reseat the cards and reboot the node. If this does not resolve the issue, open a case with your technical support representative for further troubleshooting. See Data Collection for Customer Support.
- No: Proceed to Step 4.
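
To scan longer outputs for FPCs that are not online, a small parser can help. The following Python sketch lists every slot whose state is anything other than Online, which covers the Present, OK, and Offline cases named in this step. The function name `fpcs_not_online` is hypothetical, and the row format is assumed from the sample output above.

```python
import re

# Assumed formats: "node0:" section headers and "Slot N  <State>  FPC" rows.
NODE_HEADER = re.compile(r"^(node\d+):$")
SLOT_ROW = re.compile(r"^Slot\s+(\d+)\s+(\S+)")

SAMPLE = """\
node0:
--------------------------------------------------------------------------
Slot 0  Online  FPC
  PIC 0 Online  4x GE Base PIC
Slot 2  Online  FPC
  PIC 0 Online  24x GE gPIM

node1:
--------------------------------------------------------------------------
Slot 0  Online  FPC
  PIC 0 Online  4x GE Base PIC
Slot 2  Online  FPC
  PIC 0 Online  24x GE gPIM
"""


def fpcs_not_online(output):
    """Return (node, slot, state) for every FPC slot that is not Online."""
    flagged = []
    node = None
    for raw in output.splitlines():
        line = raw.strip()
        header = NODE_HEADER.match(line)
        if header:
            node = header.group(1)
            continue
        slot = SLOT_ROW.match(line)
        if slot and slot.group(2) != "Online":
            flagged.append((node, int(slot.group(1)), slot.group(2)))
    return flagged


if __name__ == "__main__":
    print(fpcs_not_online(SAMPLE))
```

An empty result corresponds to the No branch of this step; any entry corresponds to the Yes branch.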
4. Run the show chassis cluster statistics command on both nodes, and analyze the output.
```
{primary:node0}
root@J-SRX> show chassis cluster statistics
Control link statistics:
    Control link 0:
        Heartbeat packets sent: 418410
        Heartbeat packets received: 418406
        Heartbeat packet errors: 0
Fabric link statistics:
    Probes sent: 418407
    Probes received: 414896
    Probe errors: 0
```
Does the Heartbeat packets received field show a non-increasing value or zero (0), or does the Heartbeat packet errors field show a non-zero value?
- Yes: Open a case with your technical support representative for further troubleshooting. See Data Collection for Customer Support.
- No: Proceed to Step 5.
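
Deciding whether a counter is increasing requires two readings taken some time apart. The following Python sketch compares two captures of the command output and applies the test from this step. The function names (`read_counter`, `control_link_needs_support`) are hypothetical, and the sketch reads only the first control link in the output, so a cluster with two control links needs an extended parser.

```python
import re


def read_counter(output, label):
    """Return the first integer that follows 'label:' in the command output."""
    match = re.search(rf"{re.escape(label)}:\s*(\d+)", output)
    if match is None:
        raise ValueError(f"counter not found: {label}")
    return int(match.group(1))


def control_link_needs_support(earlier, later):
    """Apply the Step 4 test to two captures taken some time apart.

    Returns True when heartbeats received are zero or not increasing,
    or when the heartbeat error counter is non-zero.
    """
    received_before = read_counter(earlier, "Heartbeat packets received")
    received_after = read_counter(later, "Heartbeat packets received")
    errors = read_counter(later, "Heartbeat packet errors")
    return received_after == 0 or received_after <= received_before or errors != 0


# First capture: values from the sample output above.
EARLIER = """\
Control link statistics:
    Control link 0:
        Heartbeat packets sent: 418410
        Heartbeat packets received: 418406
        Heartbeat packet errors: 0
"""

# Second capture: hypothetical values, invented for illustration only.
LATER = """\
Control link statistics:
    Control link 0:
        Heartbeat packets sent: 418470
        Heartbeat packets received: 418466
        Heartbeat packet errors: 0
"""

if __name__ == "__main__":
    print(control_link_needs_support(EARLIER, LATER))
```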
5. Configure the hidden option set chassis cluster no-fabric-monitoring and commit the configuration to temporarily disable fabric monitoring while you troubleshoot. Reboot the disabled node. After the node reboots, run the show chassis cluster statistics command. Are the probes still lost?
- Yes: Open a case with your technical support representative for further troubleshooting. See Data Collection for Customer Support.
- No: Delete the chassis cluster no-fabric-monitoring configuration, commit the change, and verify that everything is operational. If you notice any issue, open a case with your technical support representative for further troubleshooting. See Data Collection for Customer Support.
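
One way to judge whether probes are still being lost is to compare the fabric probe counters in two captures taken after the reboot. The following Python sketch does that. It rests on an assumption that this guide does not state: that probes count as lost when Probes sent grows faster than Probes received over the same interval. The function names (`read_counter`, `probes_lost_between`) and the second capture are hypothetical.

```python
import re


def read_counter(output, label):
    """Return the first integer that follows 'label:' in the command output."""
    match = re.search(rf"{re.escape(label)}:\s*(\d+)", output)
    if match is None:
        raise ValueError(f"counter not found: {label}")
    return int(match.group(1))


def probes_lost_between(earlier, later):
    """Estimate probes lost between two captures (assumed definition of loss)."""
    sent = read_counter(later, "Probes sent") - read_counter(earlier, "Probes sent")
    received = (
        read_counter(later, "Probes received")
        - read_counter(earlier, "Probes received")
    )
    # Clamp at zero: replies to earlier probes can arrive between captures.
    return max(sent - received, 0)


# First capture: values from the sample output in Step 4.
EARLIER = """\
Fabric link statistics:
    Probes sent: 418407
    Probes received: 414896
    Probe errors: 0
"""

# Second capture: hypothetical values, invented for illustration only.
LATER = """\
Fabric link statistics:
    Probes sent: 418507
    Probes received: 414996
    Probe errors: 0
"""

if __name__ == "__main__":
    print(probes_lost_between(EARLIER, LATER))
```

Comparing deltas rather than absolute values avoids counting probes that were lost before the reboot.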

Chassis Cluster User Guide for SRX Series Devices
