Troubleshooting an SRX Chassis Cluster with One Node in the Primary State and the Other Node in the Disabled State
Problem
Description
The nodes of the SRX chassis cluster are in primary and disabled states.
Environment
SRX chassis cluster
Symptoms
One node of the cluster is in the primary state and the other node is in the disabled state. Run the show chassis cluster status command on each node to view the status of the node. Here is a sample output:
{primary:node0}
root@primary-srx> show chassis cluster status
Monitor Failure codes:
    CS  Cold Sync monitoring        FL  Fabric Connection monitoring
    GR  GRES monitoring             HW  Hardware monitoring
    IF  Interface monitoring        IP  IP monitoring
    LB  Loopback monitoring         MB  Mbuf monitoring
    NH  Nexthop monitoring          NP  NPC monitoring
    SP  SPU monitoring              SM  Schedule monitoring
    CF  Config Sync monitoring      RE  Relinquish monitoring

Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures

Redundancy group: 0 , Failover count: 1
node0  255      primary        no      no       None
node1  129      disabled       no      no       None

Redundancy group: 1 , Failover count: 1
node0  255      primary        no      no       None
node1  129      disabled       no      no       None
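If you collect this output from many clusters, a small script can flag the symptom for you. The following is a minimal sketch, assuming the per-node rows look like the sample output above. The names `parse_cluster_status` and `disabled_nodes` are hypothetical helpers written for this illustration and are not part of Junos OS or any Juniper library.

```python
import re

# Text as printed by `show chassis cluster status`, trimmed to the rows
# that matter here. The layout is assumed to match the sample output.
SAMPLE_STATUS = """\
Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures

Redundancy group: 0 , Failover count: 1
node0  255      primary        no      no       None
node1  129      disabled       no      no       None

Redundancy group: 1 , Failover count: 1
node0  255      primary        no      no       None
node1  129      disabled       no      no       None
"""

RG_RE = re.compile(r"^Redundancy group:\s*(\d+)")
NODE_RE = re.compile(r"^(node\d+)\s+(\d+)\s+(\S+)")


def parse_cluster_status(text):
    """Return {redundancy_group: {node_name: status}} from the CLI text."""
    groups = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        rg = RG_RE.match(line)
        if rg:
            current = int(rg.group(1))
            groups[current] = {}
            continue
        node = NODE_RE.match(line)
        if node and current is not None:
            groups[current][node.group(1)] = node.group(3)
    return groups


def disabled_nodes(groups):
    """Return the sorted names of nodes that are disabled in any group."""
    return sorted({name
                   for nodes in groups.values()
                   for name, status in nodes.items()
                   if status == "disabled"})


status = parse_cluster_status(SAMPLE_STATUS)
print(disabled_nodes(status))  # prints ['node1']
```

A non-empty result means at least one redundancy group reports a node in the disabled state, which is the condition this article troubleshoots.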
Diagnosis
-
Run the show chassis cluster interfaces command to verify the status of the control and fabric links. Are any of the links down? Here are sample outputs for a branch SRX Series Firewall and a high-end SRX Series Firewall.
root@Branch-SRX> show chassis cluster interfaces
Control link 0 name: fxp1
Control link status: Up

Fabric interfaces:
Name    Child-interface    Status
fab0    ge-0/0/2           up
fab0    ge-2/0/6           up
fab1    ge-9/0/2           up
fab1    ge-11/0/6          up
Fabric link status: Up

{primary:node0}
root@High-end-SRX> show chassis cluster interfaces
Control link 0 name: em0
Control link 1 name: em1
Control link status: Up

Fabric interfaces:
Name    Child-interface    Status
fab0    ge-2/0/0           down
fab0
fab1
fab1
Fabric link status: Up
-
Yes: See Troubleshooting a Fabric Link Failure in an SRX Chassis Cluster or Troubleshooting a Control Link Failure in an SRX Chassis Cluster.
-
No: Proceed to Step 2.
-
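The link check in Step 1 can be sketched in code as follows. This is an illustration only, assuming the output layout of the high-end sample above. `parse_cluster_interfaces` and `links_needing_attention` are hypothetical names introduced here.

```python
import re

# Text as printed by `show chassis cluster interfaces` on the high-end
# sample device. The layout is assumed to match that sample.
SAMPLE_INTERFACES = """\
Control link 0 name: em0
Control link 1 name: em1
Control link status: Up

Fabric interfaces:
Name    Child-interface    Status
fab0    ge-2/0/0           down
fab0
fab1
fab1
Fabric link status: Up
"""

LINK_STATUS_RE = re.compile(r"^(Control|Fabric) link status:\s*(\S+)")
CHILD_RE = re.compile(r"^(fab\d+)\s+(\S+)\s+(\S+)$")


def parse_cluster_interfaces(text):
    """Return (link_states, child_rows) parsed from the CLI text."""
    links = {}
    children = []
    for line in text.splitlines():
        line = line.strip()
        m = LINK_STATUS_RE.match(line)
        if m:
            links[m.group(1).lower()] = m.group(2).lower()
            continue
        m = CHILD_RE.match(line)
        if m:
            # (fabric name, child interface, state); rows without a
            # child interface do not match and are skipped.
            children.append((m.group(1), m.group(2), m.group(3).lower()))
    return links, children


def links_needing_attention(links, children):
    """List every link or fabric child interface that is not up."""
    problems = [f"{name} link is {state}"
                for name, state in links.items() if state != "up"]
    problems += [f"{fab} child {ifname} is {state}"
                 for fab, ifname, state in children if state != "up"]
    return problems


links, children = parse_cluster_interfaces(SAMPLE_INTERFACES)
print(links_needing_attention(links, children))
```

The sketch reports child interfaces separately from the summary lines, because in the high-end sample a fabric child interface is down while the summary still reads Up.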
-
Reboot the disabled node. Does the node come up in the disabled state after the reboot?
-
Yes: There might be hardware issues. Proceed to Step 3.
-
No: The issue is resolved.
-
-
Check the node for any hardware issues. Run the show chassis fpc pic-status command on both nodes, and ensure that the FPCs are online. Do you see the status of any FPC listed as Present, OK, or Offline? Here is a sample output.
{primary:node1}
root@J-SRX> show chassis fpc pic-status
node0:
--------------------------------------------------------------------------
Slot 0   Online       FPC
  PIC 0  Online       4x GE Base PIC
Slot 2   Online       FPC
  PIC 0  Online       24x GE gPIM
Slot 6   Online       FPC
  PIC 0  Online       2x 10G gPIM

node1:
--------------------------------------------------------------------------
Slot 0   Online       FPC
  PIC 0  Online       4x GE Base PIC
Slot 2   Online       FPC
  PIC 0  Online       24x GE gPIM
Slot 6   Online       FPC
  PIC 0  Online       2x 10G gPIM
-
Yes: Reseat the cards and reboot the node. If this does not resolve the issue, open a case with your technical support representative for further troubleshooting. See Data Collection for Customer Support.
-
No: Proceed to Step 4.
-
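The FPC check in Step 3 amounts to scanning the Slot rows for any state other than Online. Here is a minimal sketch, assuming the output layout of the sample above. `fpcs_not_online` is a hypothetical helper name.

```python
import re

# Text as printed by `show chassis fpc pic-status`, shortened from the
# sample output. The layout is assumed to match that sample.
SAMPLE_PIC_STATUS = """\
node0:
--------------------------------------------------------------------------
Slot 0   Online       FPC
  PIC 0  Online       4x GE Base PIC
Slot 2   Online       FPC
  PIC 0  Online       24x GE gPIM
node1:
--------------------------------------------------------------------------
Slot 0   Online       FPC
  PIC 0  Online       4x GE Base PIC
Slot 2   Online       FPC
  PIC 0  Online       24x GE gPIM
"""

NODE_RE = re.compile(r"^(node\d+):$")
SLOT_RE = re.compile(r"^Slot\s+(\d+)\s+(\S+)")


def fpcs_not_online(text):
    """Return (node, slot, state) for each FPC slot that is not Online."""
    node = None
    flagged = []
    for line in text.splitlines():
        line = line.strip()
        m = NODE_RE.match(line)
        if m:
            node = m.group(1)
            continue
        m = SLOT_RE.match(line)
        if m and m.group(2) != "Online":
            flagged.append((node, int(m.group(1)), m.group(2)))
    return flagged


print(fpcs_not_online(SAMPLE_PIC_STATUS))  # prints []
```

An empty list corresponds to the "No" branch of this step. Any entry, such as a slot reported as Present or Offline, corresponds to the "Yes" branch.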
-
Run the show chassis cluster statistics command on both nodes, and analyze the output.
{primary:node0}
root@J-SRX> show chassis cluster statistics
Control link statistics:
    Control link 0:
        Heartbeat packets sent: 418410
        Heartbeat packets received: 418406
        Heartbeat packet errors: 0
Fabric link statistics:
    Probes sent: 418407
    Probes received: 414896
    Probe errors: 0
Does the Heartbeat packets received field show a non-increasing value or zero (0), or does the Heartbeat packet errors field show a non-zero value?
-
Yes: Open a case with your technical support representative for further troubleshooting. See Data Collection for Customer Support.
-
No: Proceed to Step 5.
-
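Deciding whether the Heartbeat packets received counter is increasing requires two readings taken some time apart. The sketch below compares two snapshots of show chassis cluster statistics. It assumes a single control link, as in the sample output, and the second snapshot is invented for illustration. `parse_counters` and `heartbeat_problem` are hypothetical names.

```python
import re

# First snapshot: counters from the sample output.
FIRST_SNAPSHOT = """\
Control link statistics:
    Control link 0:
        Heartbeat packets sent: 418410
        Heartbeat packets received: 418406
        Heartbeat packet errors: 0
"""

# Second snapshot: hypothetical values, as if read some seconds later.
SECOND_SNAPSHOT = """\
Control link statistics:
    Control link 0:
        Heartbeat packets sent: 418470
        Heartbeat packets received: 418466
        Heartbeat packet errors: 0
"""

COUNTER_RE = re.compile(r"^\s*([A-Za-z ]+):\s*(\d+)\s*$")


def parse_counters(text):
    """Return {counter name: value} for every 'name: number' line."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER_RE.match(line)
        if m:
            counters[m.group(1).strip()] = int(m.group(2))
    return counters


def heartbeat_problem(first, second):
    """True if heartbeats are zero, not increasing, or show errors."""
    before = first["Heartbeat packets received"]
    now = second["Heartbeat packets received"]
    if now == 0 or now <= before:
        return True
    return second["Heartbeat packet errors"] != 0


first = parse_counters(FIRST_SNAPSHOT)
second = parse_counters(SECOND_SNAPSHOT)
print(heartbeat_problem(first, second))  # prints False
```

A result of True corresponds to the "Yes" branch of this step. On devices with two control links the counter names repeat per link, so this simple dictionary would need a per-link key.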
-
Configure set chassis cluster no-fabric-monitoring (hidden option) and commit the configuration to temporarily disable fabric monitoring during the troubleshooting process. Reboot the disabled node. After the node reboots, run the show chassis cluster statistics command. Are the probes still lost?
-
Yes: Open a case with your technical support representative for further troubleshooting. See Data Collection for Customer Support.
-
No: Delete the set chassis cluster no-fabric-monitoring configuration, commit the change, and verify that everything is operational. If you notice any issue, open a case with your technical support representative for further troubleshooting. See Data Collection for Customer Support.
-
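One way to judge whether probes are still being lost in Step 5 is to compare the gap between Probes sent and Probes received across two readings. The sketch below is an assumption about how to make that comparison, not a documented Juniper procedure. The earlier reading borrows the values from the sample output, the later readings are invented, and `probe_gap`, `probes_still_lost`, and the `tolerance` parameter are hypothetical.

```python
def probe_gap(counters):
    """Probes sent minus probes received for one reading."""
    return counters["Probes sent"] - counters["Probes received"]


def probes_still_lost(earlier, later, tolerance=0):
    """True if the sent/received gap grew by more than `tolerance`."""
    return probe_gap(later) - probe_gap(earlier) > tolerance


# Earlier reading: fabric counters from the sample output.
earlier = {"Probes sent": 418407, "Probes received": 414896}

# Later readings: hypothetical values for illustration.
healthy = {"Probes sent": 418507, "Probes received": 414996}
lossy = {"Probes sent": 418507, "Probes received": 414896}

print(probe_gap(earlier))                   # prints 3511
print(probes_still_lost(earlier, healthy))  # prints False
print(probes_still_lost(earlier, lossy))    # prints True
```

A small non-zero `tolerance` may be reasonable because a probe can be in flight when the counters are read.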