Example: Configuring an SRX Series Services Gateway as a Full Mesh Chassis Cluster

This example shows how to set up basic active/passive full mesh chassis clustering on a high-end SRX Series device.

Requirements

This example uses the following hardware and software components:

  • Two Juniper Networks SRX5800 Services Gateways with identical hardware configurations running Junos OS Release 9.6 or later.

  • Two Juniper Networks MX480 3D Universal Edge Routers running Junos OS Release 9.6 or later.

  • Two Juniper Networks EX9214 Ethernet Switches running Junos OS Release 9.6 or later.

Note:

This configuration example has been tested using the software release listed and is assumed to work on all later releases.

Before you begin:

  • Physically connect the two SRX Services Gateways (back-to-back for the fabric and control ports).

Overview

This example shows how to set up basic active/passive full mesh chassis clustering on a pair of high-end SRX Series devices. Full mesh active/passive clustering allows you to set up an environment that has no single point of failure, not only on the SRX Series devices but also on the surrounding network devices. The main difference between the full mesh deployment described in this example and the basic active/passive deployment described in Configuring an Active/Passive Chassis Cluster Deployment is that additional design elements must be considered to accommodate recovery from possible failure scenarios.

Full mesh chassis clustering requires you to configure reth interfaces for each node and ensure that they are connected together by one or more switches. In this scenario, shown in Figure 1, there are four reth interfaces (reth0, reth1, reth2, and reth3). A reth interface bundles the two physical interfaces (one from each node) together. A reth interface is part of a redundancy group. Only the member that is on the primary node (active) for the redundancy group is active. The member on the secondary (passive) node is completely inactive, that is, it does not send or receive any traffic.

Each reth interface can have one or more logical interfaces or subinterfaces (for example, reth0.0, reth0.1, and so forth). Each must use a different VLAN tag.
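
As a minimal sketch of the point above, the following configuration mode commands define two VLAN-tagged logical interfaces on reth0. The unit numbers, VLAN IDs, and addresses are hypothetical and are not taken from this example's topology.

```
# Hypothetical values for illustration only
set interfaces reth0 vlan-tagging
set interfaces reth0 unit 1 vlan-id 10
set interfaces reth0 unit 1 family inet address 203.0.113.1/28
set interfaces reth0 unit 2 vlan-id 20
set interfaces reth0 unit 2 family inet address 203.0.113.17/28
```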

The full mesh active/passive chassis cluster consists of two devices:

  • One device actively provides routing, firewall, NAT, VPN, and security services, along with maintaining control of the chassis cluster.

  • The other device passively maintains its state for cluster failover capabilities should the active device become inactive.

Figure 1 shows the topology used in this example.

Figure 1: Full Mesh Active/Passive Chassis Clustering Topology on a Pair of High-End SRX Series Devices

Configuration

To configure this example, perform the following procedures:

Configuring the Control Ports

Step-by-Step Procedure

Select FPC 1/13 for the control ports (FPC 1 on node 0, which is numbered FPC 13 on node 1). The central point (CP) is always on the lowest SPC/SPU in the cluster (slot 0 in this example), so for maximum reliability, place the control ports on a separate SPC from the central point (the SPC in slot 1 in this example). You must enter the commands on both devices.

Note:

Control port configuration is required for SRX5600 and SRX5800 devices.

To configure the control port on each device and commit the configuration:

  1. Configure the control port for the SRX5800-1 (node 0) and commit the configuration.

  2. Configure the control port for the SRX5800-2 (node 1) and commit the configuration.
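
The two steps above might look like the following sketch, entered in configuration mode on each device. The FPC numbers follow the text above; the port number (port 0) is an assumption and depends on how the control links are cabled.

```
# Enter on SRX5800-1 (node 0), then repeat on SRX5800-2 (node 1).
# Port 0 is an assumed value; use the SPC port that is actually cabled.
set chassis cluster control-ports fpc 1 port 0
set chassis cluster control-ports fpc 13 port 0
commit
```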

Enabling Cluster Mode

Step-by-Step Procedure

Set the two devices to cluster mode. A reboot is required to enter cluster mode after the cluster ID and node ID are set. You can have the system reboot automatically by including the reboot parameter in the command. You must enter the operational mode commands on both devices. When the systems boot, both nodes come up as a cluster.

Note:

Since there is only a single cluster on the segments, this example uses cluster ID 1 with Device SRX5800-1 as node 0 and Device SRX5800-2 as node 1.

To set the two devices in cluster mode:

  1. Enable cluster mode on the SRX5800-1 (node 0).

  2. Enable cluster mode on the SRX5800-2 (node 1).

    Note:

    If you have multiple SRX device clusters on a single broadcast domain, make sure that you assign different cluster IDs to each cluster to avoid a MAC address conflict.

    The cluster ID is the same on both devices, but the node ID must be different because one device is node 0 and the other device is node 1. The range for the cluster ID is 1 through 15. Setting a cluster ID to 0 is equivalent to disabling a cluster.

    Now the devices are a pair. From this point forward, configuration of the cluster is synchronized between the node members, and the two separate devices function as one device.
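
Using the cluster ID and node IDs stated in this section, the two operational mode commands can be sketched as follows. The reboot parameter is included so that each device restarts into cluster mode immediately.

```
# On SRX5800-1 (operational mode)
set chassis cluster cluster-id 1 node 0 reboot

# On SRX5800-2 (operational mode)
set chassis cluster cluster-id 1 node 1 reboot
```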

Configuring Cluster Mode

Step-by-Step Procedure

Note:

In cluster mode, the configuration is synchronized between the nodes when you execute a commit command. All commands are applied to both nodes regardless of which device they are entered on.

To configure a chassis cluster on a high-end SRX Series device:

  1. Configure the fabric (data) ports of the cluster that are used to pass real-time objects (RTOs) in active/passive mode. Define two fabric interfaces, one on each chassis, to connect together.

  2. Because the SRX Services Gateway chassis cluster configuration is contained within a single common configuration, use the Junos OS node-specific configuration method called groups to assign some elements of the configuration to a specific member only.

    The set apply-groups ${node} command uses the node variable to define how the groups are applied to the nodes. Each node recognizes its number and accepts the configuration accordingly. You must also configure out-of-band management on the fxp0 interface of the SRX5800 Services Gateway using separate IP addresses for the individual control planes of the cluster.

    Note:

    Configuring the backup router destination address as x.x.x.0/0 is not allowed.

  3. Configure redundancy groups for chassis clustering. Each node has interfaces in a redundancy group, and those interfaces are active on the node where the redundancy group is active (multiple active interfaces can exist in one redundancy group).

    Redundancy group 0 controls the control plane, and redundancy groups 1 and higher control the data plane and include the data plane ports. For any active/passive mode cluster, only redundancy groups 0 and 1 need to be configured. Use four reth interfaces, all of which are members of redundancy group 1. Besides redundancy groups, you must also define:

    • Redundant Ethernet Interface count—Configure how many redundant Ethernet interfaces (reth) can possibly be configured so that the system can allocate the appropriate resources for it.

    • Priority for control plane and data plane—Define which device has priority (for chassis cluster, high priority is preferred) for the control plane, and which device is preferred to be active for the data plane.

      Note:

      In active/passive or active/active mode, the control plane (redundancy group 0) can be active on a different chassis from the data plane (redundancy groups 1 and higher). However, because latency is introduced when traffic passes through the fabric link to reach the other member node, we recommend for this example that both the control plane and the data plane be active on the same chassis member.

  4. Configure the data interfaces on the platform so that in the event of a data plane failover, the other chassis cluster member can take over the connection seamlessly.

    Seamless transition to a new active node occurs with data plane failover. In case of control plane failover, all the daemons are restarted on the new node. Because of this, enabling graceful restart for relevant routing protocols is strongly recommended to avoid losing neighborships with peers. This promotes a seamless transition to the new node without any packet loss.

    Define the following items:

    • Membership information of the member interfaces to the reth interface.

    • Which redundancy group the reth interface is a member of. For this active/passive example, it is always 1.

    • The reth interface information such as the IP address of the interface.

  5. Configure the chassis cluster behavior in case of a failure.

    Each interface is configured with a weight value that is deducted from the redundancy group threshold of 255 upon a link loss. The failover threshold is hard coded at 255 and cannot be changed. You can alter an interface link’s weight to determine the impact on the chassis failover.

    When a redundancy group threshold reaches 0, that redundancy group fails over to the secondary node.

    Enter the following commands on the SRX5800-1:

    This step completes the chassis cluster configuration part of the active/passive mode example for the SRX5800. The rest of this procedure describes how to configure the zone, virtual router, routing, the EX9214, and the MX480 to complete the deployment scenario.
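
The five steps above can be sketched with the following configuration mode commands. This is an illustrative outline under stated assumptions, not the exact configuration of this example: every interface name, IP address, and priority value is hypothetical, and only reth0 is shown (reth1 through reth3 would follow the same pattern).

```
# Step 1: fabric (data) ports, one member interface per chassis (names assumed)
set interfaces fab0 fabric-options member-interfaces ge-11/3/0
set interfaces fab1 fabric-options member-interfaces ge-23/3/0

# Step 2: node-specific groups and out-of-band management on fxp0
set groups node0 system host-name SRX5800-1
set groups node0 system backup-router 192.0.2.254 destination 203.0.113.128/25
set groups node0 interfaces fxp0 unit 0 family inet address 192.0.2.1/24
set groups node1 system host-name SRX5800-2
set groups node1 system backup-router 192.0.2.254 destination 203.0.113.128/25
set groups node1 interfaces fxp0 unit 0 family inet address 192.0.2.2/24
set apply-groups "${node}"

# Step 3: reth count and priorities (the higher priority is preferred)
set chassis cluster reth-count 4
set chassis cluster redundancy-group 0 node 0 priority 129
set chassis cluster redundancy-group 0 node 1 priority 128
set chassis cluster redundancy-group 1 node 0 priority 129
set chassis cluster redundancy-group 1 node 1 priority 128

# Step 4: reth membership, redundancy group, and addressing
set interfaces xe-6/0/0 gigether-options redundant-parent reth0
set interfaces xe-18/0/0 gigether-options redundant-parent reth0
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth0 unit 0 family inet address 198.51.100.1/24
set routing-options graceful-restart

# Step 5: interface monitoring weights, deducted from the threshold of 255
set chassis cluster redundancy-group 1 interface-monitor xe-6/0/0 weight 255
set chassis cluster redundancy-group 1 interface-monitor xe-18/0/0 weight 255
```

With a weight of 255 on each monitored link, the loss of a single link reduces the redundancy group threshold from 255 to 0 and triggers a failover. A smaller weight, such as 128, would require two monitored links to fail before the threshold reaches 0.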

Configuring Zones, Security Policy, and Protocols

Step-by-Step Procedure

Configure zones and add the appropriate reth interfaces, and configure OSPF.

To configure zones and OSPF:

  1. Configure two zones and add the appropriate reth interfaces.

  2. Permit the appropriate protocols and services to reach interfaces in the trust and untrust zones.

  3. Configure a security policy to permit traffic from the trust zone to the untrust zone.

  4. Configure OSPF.
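
A minimal sketch of these four steps follows. The assignment of reth interfaces to zones, the permitted host-inbound services, and the policy name allow-all are assumptions made for illustration.

```
# Steps 1 and 2: zones, reth membership, and host-inbound traffic (assignments assumed)
set security zones security-zone trust interfaces reth0.0
set security zones security-zone trust interfaces reth1.0
set security zones security-zone trust host-inbound-traffic system-services all
set security zones security-zone trust host-inbound-traffic protocols ospf
set security zones security-zone untrust interfaces reth2.0
set security zones security-zone untrust interfaces reth3.0
set security zones security-zone untrust host-inbound-traffic protocols ospf

# Step 3: permit traffic from trust to untrust (policy name is hypothetical)
set security policies from-zone trust to-zone untrust policy allow-all match source-address any
set security policies from-zone trust to-zone untrust policy allow-all match destination-address any
set security policies from-zone trust to-zone untrust policy allow-all match application any
set security policies from-zone trust to-zone untrust policy allow-all then permit

# Step 4: OSPF on the reth interfaces
set protocols ospf area 0.0.0.0 interface reth0.0
set protocols ospf area 0.0.0.0 interface reth1.0
set protocols ospf area 0.0.0.0 interface reth2.0
set protocols ospf area 0.0.0.0 interface reth3.0
```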

Configuring the EX9214-1

Step-by-Step Procedure

For the EX9214 Ethernet switches, the following commands provide only an outline of the applicable configuration as it pertains to this active/passive full mesh example for the SRX5800 Services Gateway; most notably the VLANs, routing, and interface configuration.

To configure the EX9214-1:

  1. Configure the interfaces.

  2. Configure VRRP between the two EX switches.

  3. Configure the VLANs.

  4. Configure the protocols.
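
As a rough outline of the steps above, the following sketch shows one access port, one VLAN with a Layer 3 interface, a VRRP group, and OSPF. The interface names, VLAN name and ID, addresses, and VRRP priority are all hypothetical, and the syntax assumes the enhanced Layer 2 software (ELS) style of configuration.

```
# Interface and VLAN (names and IDs assumed)
set interfaces xe-0/0/0 unit 0 family ethernet-switching interface-mode access
set interfaces xe-0/0/0 unit 0 family ethernet-switching vlan members v10
set vlans v10 vlan-id 10
set vlans v10 l3-interface irb.10

# VRRP between the two EX switches (the peer would use a different address and priority)
set interfaces irb unit 10 family inet address 10.10.0.2/24 vrrp-group 1 virtual-address 10.10.0.1
set interfaces irb unit 10 family inet address 10.10.0.2/24 vrrp-group 1 priority 200
set interfaces irb unit 10 family inet address 10.10.0.2/24 vrrp-group 1 accept-data

# Protocols
set protocols ospf area 0.0.0.0 interface irb.10
```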

Configuring the EX9214-2

Step-by-Step Procedure

To configure the EX9214-2:

  1. Configure the interfaces.

  2. Configure VRRP between the two EX switches.

  3. Configure the VLANs.

  4. Configure the protocols.

Configuring the MX480-1

Step-by-Step Procedure

For the MX480 Edge Routers, the following commands provide only an outline of the applicable configuration as it pertains to this active/passive mode example for the SRX5800 Services Gateway; most notably you must use an IRB interface within a virtual switch instance on the switch.

To configure the MX480-1:

  1. Configure the downstream interfaces.

  2. Configure the upstream interface.

  3. Configure the IRB interface.

  4. Configure a static route and graceful restart.

  5. Configure the bridge domain.

  6. Configure OSPF.
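
The six steps above might be outlined as follows. All interface names, the VLAN ID, the bridge domain name, and the addresses are hypothetical. For brevity the bridge domain is shown at the top level of the configuration rather than inside a virtual switch routing instance.

```
# Step 1: downstream interface toward the SRX Series cluster (name assumed)
set interfaces ge-1/0/0 unit 0 family bridge interface-mode access
set interfaces ge-1/0/0 unit 0 family bridge vlan-id 10

# Step 2: upstream interface
set interfaces ge-1/1/0 unit 0 family inet address 10.30.0.1/30

# Step 3: IRB interface
set interfaces irb unit 0 family inet address 10.20.0.2/24

# Step 4: static route and graceful restart
set routing-options static route 0.0.0.0/0 next-hop 10.30.0.2
set routing-options graceful-restart

# Step 5: bridge domain tied to the IRB interface
set bridge-domains bd10 vlan-id 10
set bridge-domains bd10 routing-interface irb.0

# Step 6: OSPF
set protocols ospf area 0.0.0.0 interface irb.0
```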

Configuring the MX480-2

Step-by-Step Procedure

To configure the MX480-2:

  1. Configure the downstream interfaces.

  2. Configure the upstream interface.

  3. Configure the IRB interface.

  4. Configure a static route and graceful restart.

  5. Configure the bridge domain.

  6. Configure OSPF.

Configuring Miscellaneous Settings

Step-by-Step Procedure

This full mesh chassis clustering example for the SRX5800 does not describe in detail miscellaneous configurations such as how to configure NAT, security policies, or VPNs. They are essentially the same as they would be for standalone configurations.

However, if you are performing proxy ARP in chassis cluster configurations, you must apply the proxy ARP configurations to the reth interfaces rather than the member interfaces because the reth interfaces hold the logical configurations.
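
For example, a proxy ARP entry applied to a reth logical interface rather than to a member interface could be sketched as follows. The interface unit and address are hypothetical.

```
# Proxy ARP is attached to reth0.0, not to the physical member interfaces
set security nat proxy-arp interface reth0.0 address 198.51.100.10/32
```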

You can also configure separate logical interface configurations using VLANs and trunked interfaces in the SRX5800. These configurations are similar to the standalone implementations using VLANs and trunked interfaces.

Verification

To confirm that the configuration is working properly, perform these tasks:

Verify Chassis Cluster Status

Purpose

Verify the chassis cluster status, failover status, and redundancy group information.

Action

From operational mode, enter the show chassis cluster status command.

Meaning

The sample output shows the status of the primary and secondary nodes and that there are no manual failovers.

Verify Chassis Cluster Interfaces

Purpose

Verify information about chassis cluster interfaces.

Action

From operational mode, enter the show chassis cluster interfaces command.

Meaning

The sample output shows each interface’s status, weight value, and the redundancy group to which that interface belongs.

Verify Chassis Cluster Statistics

Purpose

Verify information about chassis cluster services and control link statistics (heartbeats sent and received), fabric link statistics (probes sent and received), and the number of real-time objects (RTOs) sent and received for services.

Action

From operational mode, enter the show chassis cluster statistics command.

Meaning

Use the sample output to:

  • Verify that the Heartbeat packets sent value is incrementing.

  • Verify that the Heartbeat packets received value is close to the Heartbeat packets sent value.

  • Verify that the Heartbeat packets errors value is zero.

This verifies that the heartbeat packets are being transmitted and received without errors.

Verify Chassis Cluster Control Plane Statistics

Purpose

Verify information about chassis cluster control plane statistics (heartbeats sent and received) and the fabric link statistics (probes sent and received).

Action

From operational mode, enter the show chassis cluster control-plane statistics command.

Meaning

Use the sample output to:

  • Verify that the Heartbeat packets sent value is incrementing.

  • Verify that the Heartbeat packets received value is close to the Heartbeat packets sent value.

  • Verify that the Heartbeat packets errors value is zero.

This verifies that the heartbeat packets are being transmitted and received without errors.

Verify Chassis Cluster Data Plane Statistics

Purpose

Verify information about the number of real-time objects (RTOs) sent and received for services.

Action

From operational mode, enter the show chassis cluster data-plane statistics command.

Meaning

The sample output shows the RTOs sent and received for various services.

Verify Chassis Cluster Redundancy Group Status

Purpose

Verify the state and priority of both nodes in a cluster and information about whether the primary node has been preempted or whether there has been a manual failover.

Action

From operational mode, enter the show chassis cluster status redundancy-group command.

Meaning

The sample output shows the status of the primary and secondary nodes and that there are no manual failovers.

Verify Connection on EX Device

Purpose

Verify connectivity from the EX device.

Action

From operational mode, enter the ping 192.168.1.1 count 2 and traceroute 192.168.1.1 commands.

Troubleshoot with Logs

Purpose

Look at the system logs to identify any chassis cluster issues. You should look at the system log files on both nodes.

Action

From operational mode, enter these show log commands.
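
The specific commands are not listed here. As a hedged starting point, the following log files are commonly examined when troubleshooting a chassis cluster; the selection is an assumption rather than this example's original list, and the files present can vary by release.

```
# Run on both nodes; the log file names are assumed to exist on the device
show log jsrpd
show log chassisd
show log messages
show log dcd
```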