Understanding Chassis Cluster Resiliency
Junos OS uses a layered model to provide resiliency on SRX Series Firewalls that are configured in a chassis cluster. If a software or hardware component fails, the layered model ensures that system performance is not impacted.
Layer 1 for Detecting Hardware Faults and Software Failures
Layer 1 detects and identifies the hardware and software components whose faults or failures are impacting system performance. An alarm, a syslog message, or an SNMP trap is triggered to notify you of the failure.
Layer 2 for Probing Critical Paths
Layer 2 probes the system’s critical paths to detect hardware and software failures that are not detected by Layer 1.
System health status is communicated using heartbeat messages sent from one end of a path to the other end. These heartbeats validate the state of the path between its two endpoints: if any component in the path fails, heartbeat communication is lost and the failure is detected.
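The heartbeat check described above can be sketched as follows. This is a minimal illustration, not the Junos OS implementation: the class name `PathMonitor`, the one-second interval, and the threshold of three missed heartbeats are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class PathMonitor:
    """Tracks heartbeats received from the far end of one critical path (hypothetical)."""

    interval_s: float = 1.0    # assumed spacing between heartbeats
    miss_threshold: int = 3    # assumed consecutive misses before declaring failure
    last_seen_s: float = 0.0   # timestamp of the most recent heartbeat

    def record_heartbeat(self, now_s: float) -> None:
        """Note that a heartbeat arrived at time now_s."""
        self.last_seen_s = now_s

    def missed(self, now_s: float) -> int:
        """Number of whole heartbeat intervals elapsed since the last heartbeat."""
        return int((now_s - self.last_seen_s) // self.interval_s)

    def is_healthy(self, now_s: float) -> bool:
        """The path counts as healthy until miss_threshold intervals pass in silence."""
        return self.missed(now_s) < self.miss_threshold
```

A caller would invoke `record_heartbeat` whenever a heartbeat arrives and poll `is_healthy` periodically. Requiring several consecutive misses, rather than one, is a common way to avoid reacting to a single delayed message.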
Layer 3 for Detecting Control Link and Fabric Link Failure
Layer 3 determines system health from the information that Layer 1 and Layer 2 provide, and shares the health status between the two nodes over the control links and fabric links. It makes the failover decision based on the health status of the two nodes and the heartbeat status of the control links and fabric links. An alarm, a syslog message, or an SNMP trap is triggered to notify you of any failure.
Layer 3 addresses the following issues:
em0 is flapping
A control path hardware or software component fails
The fabric link is down and the control link is alive
The control link is down and the fabric link is alive
Both the control link and the fabric link are down
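The three link-state cases in the list above can be told apart from the heartbeat status of each link. The sketch below only classifies the situation; it does not model which failover action Junos OS takes in each case. The names `LinkScenario` and `classify_links` are hypothetical and exist only for this illustration.

```python
from enum import Enum


class LinkScenario(Enum):
    """Link-state combinations between the two cluster nodes (illustrative names)."""

    BOTH_ALIVE = "control link and fabric link are both alive"
    FABRIC_DOWN = "fabric link is down and control link is alive"
    CONTROL_DOWN = "control link is down and fabric link is alive"
    BOTH_DOWN = "both the control link and the fabric link are down"


def classify_links(control_alive: bool, fabric_alive: bool) -> LinkScenario:
    """Map the heartbeat status of the two links to one of the scenarios."""
    if control_alive and fabric_alive:
        return LinkScenario.BOTH_ALIVE
    if control_alive:
        # Control heartbeats still arrive, so only the fabric link has failed.
        return LinkScenario.FABRIC_DOWN
    if fabric_alive:
        # Fabric heartbeats still arrive, so only the control link has failed.
        return LinkScenario.CONTROL_DOWN
    return LinkScenario.BOTH_DOWN
```

Keeping the classification separate from the failover decision makes each case easy to log and test on its own.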
Benefits
Improves failover time and stability.
Identifies the exact location of a fault or failure.