Juniper Scale-Out Stateful Firewall and CGNAT for SP Edge — JVD

Use Case and Reference Architecture

19-Dec-24
JVD-MSE-SCALEOUT-CGNAT-SP-01-01

Solution Functional Elements

Juniper Off-Box Security Services solution architecture includes two main functional blocks:

The security services device, formed by standalone vSRX virtual network functions or SRX4600 firewalls, or by redundant pairs of the same device. This section focuses on the standalone use case; the redundant solution architectures are detailed in the deployment scenarios.

The MX Series Router acting as the load balancer. Juniper MX Series Routers provide 100G or 400G interfaces toward the servers hosting vSRX instances or toward the SRX4600 firewalls that form the services complex. Both access side and Internet side peering (see Figure 1) use dedicated MX Series Router ports for high throughput.

Figure 1: Scale-Out Solution Functional Blocks

With the new Trio 6 MX10004 and 10008 systems, capacity per slot is up to 9.6 Tbps and with compact MX304 systems, capacity per slot is up to 4.8 Tbps, enabling a high number of 100G ports. An MX304 router can provide up to 48 x 100G interfaces and an LC9600 line card in a modular MX10000 system, up to 96 x 100G ports.
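The port counts quoted above follow directly from the capacity figures. As a quick check, the arithmetic can be sketched as follows; the function name is hypothetical, and the calculation ignores oversubscription and breakout options.

```python
def max_100g_ports(capacity_tbps: float) -> int:
    """Number of line-rate 100G ports that fit in a given capacity (in Tbps)."""
    capacity_gbps = round(capacity_tbps * 1000)
    return capacity_gbps // 100


# Capacity figures taken from the paragraph above.
lc9600_ports = max_100g_ports(9.6)
mx304_ports = max_100g_ports(4.8)
print(lc9600_ports, mx304_ports)
```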

To optimize port usage, it is recommended to implement an intermediate distribution layer with two (or more) QFX Series switches that aggregate multiple SRX Series Firewall or vSRX nodes into bundled 400GE links on the MX Series Router.

If vSRX is chosen as the security element, it can be deployed on a KVM or VMware hypervisor running on open compute servers. You can bring your own server based on the prescribed server specifications (CPU cores, memory, Linux OS, KVM versions). For more information about the server specifications, see vSRX server specifications in the references.

vSRX is a Virtual Network Function (VNF) running on KVM or VMware hypervisors, with flexible compute allocation by number of cores (up to 32) and memory (up to 64 GB). For networking, vSRX can use virtio or SR-IOV with smart NICs such as the Mellanox ConnectX-6.

A complete Off-Box solution requires implementation at three fundamental layers: Data, Control, and Management. This solution enables consistent traffic flow through the service complex in both directions, addresses high availability requirements, and simplifies operations and management of multiple systems.

For this JVD, an external BGP (eBGP) protocol with BFD provides a routing and control function between network elements of the complex while implementing load balancing with two approaches:

  • Equal Cost Multi-Path (ECMP) load balancing function with Consistent Hashing (CHASH)
  • RE based traffic load balancer function (TLB) on MX Series Router

Two routing instances, Access and Internet, are used on the MX Series Router to peer with the corresponding network segments of the SP network infrastructure and with the security node. eBGP enables scalable and flexible exchange of routing information for the access side and the Internet side routing (see Figure 2). Failure detection is based on BFD with timers as low as 100 ms, enabling fast reconvergence and fast, automatic adjustment of the ECMP load balancing.
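To make the BFD timer concrete: BFD declares a session down after a number of consecutive missed packets, so the detection time is the transmit interval multiplied by the detect multiplier. The sketch below assumes a multiplier of 3, which is not stated in this JVD; the function name is hypothetical.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int = 3) -> int:
    """Worst-case time to declare a BFD session down.

    interval_ms: negotiated transmit/receive interval in milliseconds.
    multiplier:  detect multiplier (assumed to be 3 here, not from the JVD).
    """
    if interval_ms <= 0 or multiplier <= 0:
        raise ValueError("interval and multiplier must be positive")
    return interval_ms * multiplier


# With the 100 ms timer mentioned above and the assumed multiplier of 3.
detection_ms = bfd_detection_time_ms(100)
print(f"failure detected after about {detection_ms} ms")
```

Once the session is declared down, the next hop toward the failed node is withdrawn from the ECMP group, which is what triggers the rebalancing of flows.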

To maintain a higher level of security in future applications such as a Managed Enterprise Firewall service, where injecting your routes into the security layer is not desirable, static routes with BFD protection are the preferred control and traffic distribution method.

Access side traffic is load balanced dynamically between service nodes using ECMP with consistent hashing on the source IPv4 or IPv6 address. For CGNAT and SFW on the Internet side, eBGP routing and BFD failure detection are required. For stateful firewall services without NAT, the Internet side uses ECMP consistent hashing (CHASH) on the destination IPv4 or IPv6 address.
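Hashing on the source address on the access side and on the destination address on the Internet side means that, without NAT, both directions of a flow are keyed on the same subscriber address and therefore reach the same service node. A minimal sketch of that symmetry follows. The node names and addresses are hypothetical, and a plain modulo hash is used only to show the symmetry; it is not the consistent hashing function the MX Series Router uses.

```python
import hashlib
import ipaddress

NODES = ["srx-1", "srx-2", "srx-3"]  # hypothetical service nodes


def pick_node(key_ip: str, nodes=NODES) -> str:
    """Map an IPv4 or IPv6 address to a node (illustrative modulo hash)."""
    digest = hashlib.sha256(ipaddress.ip_address(key_ip).packed).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def access_side_node(src_ip: str, dst_ip: str) -> str:
    """Access side: hash on the source address (the subscriber)."""
    return pick_node(src_ip)


def internet_side_node(src_ip: str, dst_ip: str) -> str:
    """Internet side, no NAT: hash on the destination address (the subscriber)."""
    return pick_node(dst_ip)


subscriber, server = "10.0.0.42", "203.0.113.7"
upstream = access_side_node(src_ip=subscriber, dst_ip=server)
downstream = internet_side_node(src_ip=server, dst_ip=subscriber)
print(upstream, downstream)  # the same node in both directions
```

With CGNAT the return traffic is addressed to a translated address rather than to the subscriber, which is why the text relies on eBGP routing for that case.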

In essence, ECMP with CHASH limits the impact on existing traffic flows when a service node fails or a new service node is added to the complex. When a service node fails, only the flows that were on that node are rehashed and rebalanced. When a new service node is added, a limited and roughly equal number of flows from each existing member is rehashed and moved to the new member, which limits the impact while maintaining equal-cost load balancing.
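The behavior described above can be illustrated with rendezvous (highest random weight) hashing, which has the same minimal-disruption property. This is a swapped-in technique chosen for brevity, not the algorithm implemented on the MX Series Router; the node names and flow keys are hypothetical.

```python
import hashlib


def score(flow_key: str, node: str) -> int:
    """Pseudo-random weight of a (flow, node) pair."""
    digest = hashlib.sha256(f"{node}|{flow_key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign(flow_key: str, nodes: list) -> str:
    """Each flow goes to the node with the highest weight for that flow."""
    return max(nodes, key=lambda node: score(flow_key, node))


# 3000 hypothetical subscriber addresses used as hash keys.
flows = [f"10.0.{i // 256}.{i % 256}" for i in range(3000)]
before = {f: assign(f, ["srx-1", "srx-2", "srx-3"]) for f in flows}

# Node failure: srx-2 leaves the group.
after_failure = {f: assign(f, ["srx-1", "srx-3"]) for f in flows}
moved_on_failure = [f for f in flows if before[f] != after_failure[f]]

# Node addition: srx-4 joins the original group.
after_add = {f: assign(f, ["srx-1", "srx-2", "srx-3", "srx-4"]) for f in flows}
moved_on_add = [f for f in flows if before[f] != after_add[f]]

print(f"moved on failure: {len(moved_on_failure)} of {len(flows)}")
print(f"moved on addition: {len(moved_on_add)} of {len(flows)}")
```

In this sketch every flow that moves on failure was previously on the failed node, and every flow that moves on addition lands on the new node. Flows on surviving nodes keep their assignment, which is the property that protects existing sessions.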

Figure 2: CHASH Based Network Architecture

This architecture allows you to scale the service complex to tens of service nodes (SRX Series Firewalls or vSRX), with efficient load balancing of flows between service nodes and a minimal blast radius when a single node fails. eBGP routing on the MX Series Router, in turn, scales to millions of routes, beyond the size of the full Internet table, if required.

Solution Deployment Scenarios

Following the suggested solution architecture, a few deployment scenarios are considered in which MX Series Routers and SRX Series Firewalls are connected either standalone or as redundant pairs (see Figure 3). The architecture uses network redundancy mechanisms to provide flow resiliency between the MX forwarding layer and the SRX Series Firewall services layer (Multinode High Availability, or MNHA, also known as an L3 cluster, is explained later in the document). On dual MX with ECMP, a Service Redundancy Daemon (SRD) monitors failure events and triggers a failover to the second MX Series Router. SRD is not required with TLB. In addition, BFD provides faster routing failover when any other failure occurs. When SRX Series Firewall MNHA provides session synchronization (stateful sessions) between two nodes, existing traffic and tunnels can continue to operate uninterrupted.

The following diagram shows the four main topologies covered in this JVD, combining standalone/dual MX with standalone/MNHA for SRX Series Firewall, each on a particular load balancing mechanism (ECMP or TLB). It uses three SRX Series Firewalls for the first topology and doubles them to three pairs for the other topologies.

Figure 3: Validated Topologies

There are numerous trade-offs with each of the architectural choices. In general, complexity increases as more redundancy is added. For example, SRX Series Firewall MNHA pairs introduce requirements such as a network link for HA communications. There are also dependencies on which load balancing method is used on the MX Series Router (namely ECMP CHASH or TLB). This selection of topologies covers the most important considerations, from simple scenarios to more redundant ones.

  • ECMP CHASH is simple to use and leverages standard protocols and the well-known ECMP mechanism, which might make it the preferable option for some SP or enterprise network operations departments. However, this method is limited in its failover capabilities.
  • TLB provides more load balancing capabilities (at the time of publishing this JVD), leverages services for load balancing, offers better redundancy capabilities, and can be instantiated multiple times with different local groups. It is useful when you need to combine different use cases on the same architecture. This method may not be backward compatible with older Junos releases.

Table 1: Validated Features Combination

| Load Balancing Method | Junos for MX | Number of MX Routers | Security Features | SRX in MNHA Cluster Mode | SRX in Standalone Mode |
| --- | --- | --- | --- | --- | --- |
| ECMP with CHASH | 23.4R2 | Single MX | SFW/CGNAT | No | Yes |
| ECMP with CHASH | 23.4R2 | Dual MX (SRD) | SFW/CGNAT | Yes | No |
| Traffic Load Balancer (TLB) | 23.4R2 | Single MX | SFW/CGNAT | Yes | Yes |
| Traffic Load Balancer (TLB) | 23.4R2 | Dual MX with Health Checking | SFW/CGNAT | Yes | Yes |
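
When planning a deployment, the validated combinations in Table 1 can be encoded as a small lookup so that a proposed design is checked against them. The sketch below is only a convenience; the key names (`"ECMP-CHASH"`, `"single"`, `"mnha"`, and so on) are hypothetical labels, and the values mirror the table.

```python
# (load balancing method, number of MX routers) -> validated SRX modes (Table 1)
VALIDATED = {
    ("ECMP-CHASH", "single"): {"standalone": True, "mnha": False},
    ("ECMP-CHASH", "dual"): {"standalone": False, "mnha": True},
    ("TLB", "single"): {"standalone": True, "mnha": True},
    ("TLB", "dual"): {"standalone": True, "mnha": True},
}


def is_validated(method: str, mx_count: str, srx_mode: str) -> bool:
    """Return True if the combination appears as validated in Table 1."""
    return VALIDATED.get((method, mx_count), {}).get(srx_mode, False)


print(is_validated("ECMP-CHASH", "single", "standalone"))
print(is_validated("ECMP-CHASH", "single", "mnha"))
```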

Note that the scale-out solution only uses standard mechanisms and protocols between the components and does not require any special proprietary protocols. The exception is how load balancing is implemented internally (how the MX Series Router handles and distributes sessions). From a networking point of view, this solution uses standard protocols.

Following are some recommendations that may help you in selecting the deployment method.

Deployment Scenario 1 – ECMP CHASH – Single MX Router with Scaled Out Standalone SRXs (Multiple Individual SRX Series Firewalls)

This topology is the simplest and least redundant. Resiliency is provided within the MX Series Router through redundant REs, PSUs, and so on; however, there is no protection against failure of the MX node itself. The deployment protects against service node failure by redistributing traffic flows between the two remaining security nodes. However, there is no session synchronization between the SRX Series Firewalls, which leads to a longer restoration time for the affected flows.

Figure 4: Deployment scenario 1 – ECMP CHASH - Single MX, Standalone SRXs

Network operators that are not concerned about stateful failover may want to simply augment security service capacities by adding more SRX Series Firewalls. The application sessions may be short lived anyway (for example, a redundancy mechanism may be handled at the application level so session sync between two different firewalls is not required).

  • Pros: Simplicity and scaling with each individual SRX Series Firewalls
  • Cons: No redundancy

Deployment Scenario 2 – ECMP CHASH – Dual MX with Scaled Out MNHA SRX pairs (Multiple Pairs of SRX Series Firewall)

This topology does offer redundancy at both the MX Series Router and for each redundant SRX Series Firewall pair. The redundant pair of MX Series Router uses an SRD mechanism providing monitoring of physical elements of the network and/or the MX Series Router itself, as well as any other routing and system events that may need to trigger a failover to the other MX Series Router.

Figure 5: Topology 2 – ECMP CHASH - Dual MX with SRD, SRX MNHA Pairs

If the active MX Series Router detects a network failure, the second MX Series Router takes over the active role and all traffic is redirected to it. Traffic is then sent to the SRX Series Firewalls attached to that router, which were previously the backup nodes and now take over traffic handling for their MNHA pairs. This architecture uses only one SRX Series Firewall of each pair at a time, namely the one connected to the currently active MX Series Router. After a failover, traffic continues across the second node of each MNHA pair.
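The failover logic described above can be modeled in a few lines. This is a toy model, not the SRD implementation: the wiring table, device names, and function names are hypothetical, and a real SRD configuration monitors specific interfaces, routes, and system events.

```python
# Hypothetical wiring: each MX connects to one node of every MNHA pair.
WIRING = {
    "mx-1": ["srx-1a", "srx-2a", "srx-3a"],
    "mx-2": ["srx-1b", "srx-2b", "srx-3b"],
}


def forwarding_nodes(active_mx: str) -> list:
    """SRX nodes that carry traffic while the given MX is active."""
    return WIRING[active_mx]


def on_monitored_failure(active_mx: str, failed_mx: str) -> str:
    """Return the MX that is active after a monitored failure event.

    A failure seen on the standby router does not change the active role.
    """
    if failed_mx != active_mx:
        return active_mx
    return next(mx for mx in WIRING if mx != active_mx)


active = "mx-1"
active = on_monitored_failure(active, failed_mx="mx-1")
print(active, forwarding_nodes(active))
```

The model shows why only half of the firewalls carry traffic at any time in this scenario: the set of forwarding nodes is determined entirely by which MX Series Router holds the active role.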

On the SRX Series Firewall side, MNHA allows both firewalls of a pair to handle and synchronize sessions and to support any requested security services. Because this topology uses SRG0 as the cluster mode, a failure detected by the MX Series Router does not require one SRX Series Firewall to fail over to the other (a failover is needed only when the failure is detected by the SRX Series Firewall itself). Session synchronization allows whichever firewall receives traffic from the active MX Series Router (as selected by SRD) to process both existing sessions and new sessions.

  • Pros: Simple redundancy and scaling with each SRX Series Firewalls pair
  • Cons: Only half of the architecture is active at a time

Deployment Scenario 3 – TLB – Single MX Scaled Out MNHA SRX Pairs (Multiple Pairs of SRX Series Firewalls)

This topology offers redundancy for the SRX Series Firewalls but not for the MX Series Router. The single MX chassis can still have a second Routing Engine (RE) installed in the appropriate slot, but there is no second MX chassis.

Figure 6: Topology 3 – TLB - Single MX, SRX MNHA Pairs

MNHA offers session synchronization within a cluster and helps in any failure scenario.

  • Pros: Redundancy and scaling with each SRX Series Firewalls pair
  • Cons: No redundancy on the router (except using dual RE)

Deployment Scenario 4 – TLB – Dual MX Scaled Out MNHA SRX Pairs (Multiple Pairs of SRX Series Firewalls)

This last topology offers the most redundancy for both MX Series Router and SRX Series Firewalls nodes and takes advantage of having all components used at the same time. Any failover scenario can be covered.

Figure 7: Topology 4 – TLB - Dual MX, SRX MNHA Pairs

The MX Series Routers can handle traffic on either router, while the SRX Series Firewalls can operate in either an Active/Backup or an Active/Active role, the latter making use of both nodes at the same time. Active/Active increases the capacity of the network during normal operation; however, when a failure occurs within an MNHA cluster, only one node of that cluster remains active.
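The capacity trade-off between the two roles can be estimated with simple arithmetic. The sketch below assumes identical nodes and uses a made-up per-node throughput of 100 Gbps purely for illustration; the function and parameter names are hypothetical.

```python
def complex_capacity_gbps(pairs: int, per_node_gbps: float, mode: str,
                          degraded_pairs: int = 0) -> float:
    """Aggregate capacity of a set of MNHA pairs.

    mode:           "active-backup" (one node per pair carries traffic) or
                    "active-active" (both nodes of a pair carry traffic).
    degraded_pairs: pairs that have lost one node and run on the survivor.
    """
    if mode not in ("active-backup", "active-active"):
        raise ValueError(f"unknown mode: {mode}")
    if not 0 <= degraded_pairs <= pairs:
        raise ValueError("degraded_pairs must be between 0 and pairs")
    active_nodes_per_healthy_pair = 2 if mode == "active-active" else 1
    healthy_pairs = pairs - degraded_pairs
    return (healthy_pairs * active_nodes_per_healthy_pair + degraded_pairs) * per_node_gbps


# Three pairs with a hypothetical 100 Gbps per node.
for mode in ("active-backup", "active-active"):
    normal = complex_capacity_gbps(3, 100, mode)
    degraded = complex_capacity_gbps(3, 100, mode, degraded_pairs=1)
    print(f"{mode}: normal={normal} Gbps, one node failed={degraded} Gbps")
```

Under these assumptions, Active/Active doubles the normal capacity but loses one node's worth of capacity per failure, so a design that must absorb a failure without loss would size the offered load against the degraded figure.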

Each SRX Series Firewall is connected to both MX Series Routers. If one node fails within a cluster, that SRX Series Firewall pair can fail over independently of the other SRX Series Firewall pairs and of the MX Series Routers.

  • Pros: Full redundancy and scaling for MX Series Router and SRX Series Firewall pairs.
  • Cons: More interfaces are used on the MX Series Router if the firewalls are directly connected. An optional distribution layer can cover the additional connectivity needs as the SRX Series Firewall count grows.