Troubleshooting an SRX Chassis Cluster with One Node in the Primary State and the Other Node in the Lost State
Problem
Description
The nodes of the SRX chassis cluster are in primary and lost states.
Environment
SRX chassis cluster
Symptoms
One node of the cluster is in the primary state and the other node is in the lost state. Run the show chassis cluster status command on each node to view the status of the nodes. Here is a sample output:

{primary:node0}
root@primary-srx> show chassis cluster status
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                  100         primary        no       no
    node1                    0         lost           no       no

Redundancy group: 1 , Failover count: 1
    node0                  100         primary        no       no
    node1                    0         lost           no       no
Diagnosis
Step 1: Is the node that is in the lost state powered on?
Yes: Are you able to access the node that is in the lost state through a console port? Do not use Telnet or SSH to access the node.
If you are able to access the node, proceed to Step 3.
If you are unable to access the node and the device is at a remote location, arrange console access to the node for further troubleshooting. If you have console access but do not see any output, it might indicate a hardware issue. Open a case with your technical support representative for further troubleshooting. See Data Collection for Customer Support.
No: Power on the node and proceed to Step 2.
Step 2: After both nodes are powered on, run the show chassis cluster status command again. Is the node still in the lost state?
Yes: Are you able to access the node that is in the lost state through a console port? Do not use Telnet or SSH to access the node.
If you are able to access the node, proceed to Step 3.
If you are unable to access the node and the node is at a remote location, arrange console access to the node for further troubleshooting. If you have console access but do not see any output, it might indicate a hardware issue. Open a case with your technical support representative for further troubleshooting. See Data Collection for Customer Support.
No: Powering on the device has resolved the issue.
Step 3: Connect a console to the primary node, and run the show chassis cluster status command. Does the output show this node as primary and the other node as lost?
Yes: This might indicate a split-brain scenario, in which each node shows itself as primary and the other node as lost. Run the following commands to verify which node is processing the traffic (see the illustrative example after this step):
show security monitoring
show security flow session summary
monitor interface traffic
Isolate the node that is not processing the traffic. You can isolate the node from the network by removing all the cables except the control and fabric links. Proceed to Step 4.
No: Proceed to Step 4.
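For example, comparing the flow session counts from each node's console can show which node is actually handling traffic. The following output is a sketch only; the hostname is a placeholder and the exact counters vary by platform and Junos OS release:

{primary:node0}
root@primary-srx> show security flow session summary
node0:
--------------------------------------------------------------------------
Unicast-sessions: 2354
Multicast-sessions: 0
Failed-sessions: 0
Sessions-in-use: 2360
Maximum-sessions: 524288

A node whose session count stays near zero and does not increase while traffic is flowing is not processing the traffic and can be isolated as described above.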
Step 4: Verify that all the FPCs are online on the node that is in the lost state by running the show chassis fpc pic-status command (see the illustrative example after this step). Are all the FPCs online?
Yes: Proceed to Step 5.
No: Open a case with your technical support representative for further troubleshooting. See Data Collection for Customer Support.
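The output below is a sketch of what a healthy node looks like; the hostname, slot count, and card names are placeholders and differ by SRX model:

root@lost-srx> show chassis fpc pic-status
node1:
--------------------------------------------------------------------------
Slot 0   Online       FPC
  PIC 0  Online       GE SFP

Any FPC or PIC reported as Offline or Present rather than Online points to a hardware or boot problem on that node.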
Step 5: Are the nodes connected through a switch?
Yes: See Troubleshooting a Fabric Link Failure in an SRX Chassis Cluster and Troubleshooting a Control Link Failure in an SRX Chassis Cluster. (The example after this step shows how to check the current control and fabric link status.)
No: Proceed to Step 6.
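Before following those topics, you can check the state of the control and fabric links from either node with the show chassis cluster interfaces command. The output below is only a sketch; the interface names and columns depend on the SRX model, Junos OS release, and your fabric configuration:

{primary:node0}
root@primary-srx> show chassis cluster interfaces
Control link status: Up
Control interfaces:
    Index   Interface   Monitored-Status   Security
    0       em0         Up                 Disabled
Fabric link status: Up
Fabric interfaces:
    Name    Child-interface    Status
    fab0    ge-0/0/2           Up / Up
    fab1    ge-9/0/2           Up / Up

A Down control or fabric link points to the cabling or the intermediate switch rather than to the node itself.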
Step 6: Create a backup of the configuration from the node that is currently primary:

{primary:node0}
root@primary-srx> show configuration | save /var/tmp/cfg-bkp.txt
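If you want to confirm that the file was written before copying it, you can list it. The size and timestamp shown here are placeholders:

{primary:node0}
root@primary-srx> file list detail /var/tmp/cfg-bkp.txt
-rw-r--r--  1 root  wheel  24680 Feb 3 10:15 /var/tmp/cfg-bkp.txt
total files: 1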
Step 7: Copy the configuration to the node that is in the lost state, and load the configuration:

root@lost-srx# load override <terminal | filename>

Note: If you use the terminal option, paste the complete configuration into the window, and press Ctrl+D at the end of the configuration. If you use the filename option, provide the path to the configuration file (for example, /var/tmp/Primary_saved.conf), and press Enter. The example after this step walks through the terminal option.

When you connect to the node in the lost state through a console, you might see its state as either primary or hold/disabled. If the node is in the hold/disabled state, a fabric link failure might have occurred before the device went into the lost state. To troubleshoot this issue, follow the steps in Troubleshooting a Fabric Link Failure in an SRX Chassis Cluster.
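Here is a sketch of the terminal option from the console of the lost node; the hostname is a placeholder, and the pasted text is the configuration you saved in Step 6:

root@lost-srx> configure
Entering configuration mode

[edit]
root@lost-srx# load override terminal
[Type ^D at a new line to end input]
<paste the configuration saved in /var/tmp/cfg-bkp.txt here>
^D
load complete

[edit]
root@lost-srx# show | compare

Reviewing show | compare before committing lets you confirm that the loaded configuration matches the primary node's configuration.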
Step 8: Commit the changes after the configuration is loaded. If the problem persists, replace the existing control and fabric link cables on this device with new cables and reboot the node:

{primary:node1}[edit]
root@lost-srx# run request system reboot
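If the reboot resolves the problem, the rebooted node should rejoin the cluster as secondary. The output below is a sketch; the priorities reflect whatever is configured on your cluster:

{primary:node0}
root@primary-srx> show chassis cluster status
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                  100         primary        no       no
    node1                    1         secondary      no       no

Redundancy group: 1 , Failover count: 1
    node0                  100         primary        no       no
    node1                    1         secondary      no       no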
Is the issue resolved?
Yes: You have resolved the issue.
No: Open a case with your technical support representative for further troubleshooting. See Data Collection for Customer Support.