Results Summary and Analysis
This JVD shows that a scale-out design can leverage essential functions on both the MX Series Router and the SRX Series Firewalls, each in its intended role:
- The MX Series Router is used as a load balancer, with two options: ECMP consistent hashing (CHASH) and TLB.
- The SRX Series Firewall provides the security service and integrates simply with the MX Series Router.
- Physical and virtual SRX Series Firewalls are used in the same way.
- Simple network integration using BGP and BFD improves convergence time.
- Adding new service nodes lets the architecture scale in several dimensions (performance, capacity, and so on) without disturbing the overall service.
Performance/Scale
Although the maximum performance and scale of the system is out of scope for this JVD, the validation demonstrated the scale-out property of the complex: performance and scale grow linearly as new service elements are added. The initial test runs a typical traffic mix (100Gbps) through a single SRX Series Firewall pair as a baseline. A second SRX Series Firewall pair is then added to validate that it contributes the same capacity as the first pair (see the table later showing tested scaling/performance per SRX Series Firewall pair).
In this case, performance and scale grow linearly as SRX Series Firewall pairs are added, because the MX Series Router is agnostic to the number of sessions. As long as the amount of traffic stays within the MX PFE throughput limits, every new MNHA pair adds a similar amount of performance to the scale-out complex.
To see how maximum performance/capacity is reached, consider an example. SRX Series Firewalls can be added until either the forwarding capacity of the router is reached (for example, 3.2Tbps with redundant REs or 4.8Tbps with a single RE, which is high) or its maximum port capacity is reached (for example, 16 x 100GE links per line card, with up to two line cards with redundant REs or three line cards with a single RE).
On the SRX Series Firewall side, scaling also depends on the traffic type and on how much content inspection is required: the more content there is to analyze, the more work each firewall has to do. Taking the tested figure of 200Gbps, an MX304 with two line cards reaches its limit at 3.2Tbps / 200Gbps = 16 SRX Series Firewalls, and with three line cards at 4.8Tbps / 200Gbps = 24 SRX Series Firewalls. The second MX Series Router and the second member of each SRX Series Firewall pair act as backups, so that the complex can still handle the full load after a large failure.
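The division above can be reproduced with a short sketch. The function name and its parameters are hypothetical, and the figures are the ones quoted in the text; the sketch follows the text in treating 200Gbps as the load carried by each firewall.

```python
def max_firewalls_by_throughput(router_capacity_gbps: int, per_firewall_gbps: int) -> int:
    """Whole number of firewalls whose combined load fits in the router capacity."""
    if per_firewall_gbps <= 0:
        raise ValueError("per_firewall_gbps must be positive")
    return router_capacity_gbps // per_firewall_gbps


# Figures quoted in the text for the MX304 example.
TESTED_GBPS = 200
for label, capacity_gbps in (("two line cards", 3200), ("three line cards", 4800)):
    count = max_firewalls_by_throughput(capacity_gbps, TESTED_GBPS)
    print(f"MX304 with {label}: {capacity_gbps} Gbps / {TESTED_GBPS} Gbps = {count} SRX")
```

A heavier inspection profile would lower the per-firewall figure and raise the number of firewalls needed to fill the same router capacity.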
Counting available ports (without a distribution layer such as QFX), an MX304 provides 2 line cards (with 2 REs) * 16 ports = 32 ports, or 3 line cards (with 1 RE) * 16 ports = 48 ports. These are theoretical limits: they do not account for the use of an aggregated interface (2 ports) per SRX Series Firewall, which divides the number of attachable firewalls by 2.
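The port-count limit can be sketched in the same way. The helper below is hypothetical; it only encodes the arithmetic from the paragraph above (16 ports per line card, optionally 2 ports per firewall when an aggregated interface is used).

```python
def max_firewalls_by_ports(line_cards: int, ports_per_card: int = 16,
                           ports_per_firewall: int = 1) -> int:
    """Number of firewalls that can be attached given the available router ports."""
    if ports_per_firewall <= 0:
        raise ValueError("ports_per_firewall must be positive")
    total_ports = line_cards * ports_per_card
    return total_ports // ports_per_firewall


for cards in (2, 3):
    single = max_firewalls_by_ports(cards)
    bundled = max_firewalls_by_ports(cards, ports_per_firewall=2)
    print(f"{cards} line cards: {single} firewalls on single links, "
          f"{bundled} with a 2-port aggregated interface each")
```

With a 2-port aggregated interface per firewall, the port limit comes out at 16 and 24 firewalls, the same values as the throughput division in the text.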
Load Balancing
ECMP Consistent Hashing has shown steady restoration times in milliseconds.
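The property that makes consistent hashing attractive here is that losing one next hop only remaps the flows that were using it. The sketch below illustrates that property with rendezvous (highest-random-weight) hashing, a generic technique chosen for brevity; it is not the hashing algorithm used by the MX Series Router, and the flow keys and next-hop names are hypothetical.

```python
import hashlib


def pick_next_hop(flow: str, next_hops: list[str]) -> str:
    """Rendezvous hashing: pick the next hop with the highest score for this flow."""
    def score(hop: str) -> int:
        digest = hashlib.sha256(f"{flow}|{hop}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(next_hops, key=score)


# Hypothetical flow keys and next hops (one per SRX).
flows = [f"10.0.{i // 256}.{i % 256}:{1024 + i}" for i in range(2000)]
hops = ["srx-1", "srx-2", "srx-3", "srx-4"]

before = {flow: pick_next_hop(flow, hops) for flow in flows}

# Simulate the loss of one next hop.
survivors = [hop for hop in hops if hop != "srx-3"]
after = {flow: pick_next_hop(flow, survivors) for flow in flows}

moved = [flow for flow in flows if before[flow] != after[flow]]
on_failed_hop = [flow for flow in flows if before[flow] == "srx-3"]

print(f"{len(moved)} of {len(flows)} flows were remapped")
print("all remapped flows were on the failed hop:",
      all(before[flow] == "srx-3" for flow in moved))
```

A plain modulo hash over the number of next hops would instead remap most flows when the count changes, which matters for stateful services such as firewalling and CGNAT.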
The use of TLB on MX Series Router platforms indicates that it also works on MX Series Routers not tested here, since TLB uses a control function that runs on the RE (as on the MX304) or on a service card (for example, the MS-MPC on the MX240). TLB has been available since Junos OS Release 18.1R1, when BGP acquired the multipath function. Because of this connection with BGP, service providers often use it both internally and externally.
The TLB scenario works with restoration timers and offers flexible deployment options (for example, a single or dual MX Series Router) as well as better handling of SRX Series Firewalls in an MNHA cluster.
Security Services
The SRX Series Firewall features leveraged in this JVD focus on stateful firewall and CGNAT; higher-layer security features are not covered. The fact that the scale-out architecture handles both standalone SRX Series Firewalls and SRX clusters, distributing traffic evenly among them without disturbing it, shows that SRX Layer 7 security services can easily be added to this design.
Note that with ECMP, all SRX Series Firewalls need to be of the same model, whereas TLB can use TLB groups to dedicate firewalls to a given usage (for example, some SRX Series Firewalls in an SFW group and others in a CGNAT group). The number of groups is around 2,000 per MX Series Router, and the number of SRX Series Firewall members is around 256. These numbers leave ample room for future growth.
Carrier Grade NAT
For CGNAT, the logging capability is not specifically covered in this JVD, but it can be a key factor. A scalable syslog environment can be built on both the MX Series Router and the SRX Series Firewalls, using their ability to generate logs at the PFE level at a high rate. Local laws in various countries require logging of many security events and, more simply, of which IP addresses took part in specific events, so the IP address and NAT allocation are important in those logs. Depending on the CGNAT mode used and the number of security policies actively logging, the solution can generate a substantial volume of logs.
CGNAT can use deterministic NAT to limit the need for logging of NAT and port allocation, since the mapping is determined by a known algorithm. Another option is Port Block Allocation (PBA), where blocks of ports are allocated for a period of time and each allocation event is logged (begin, update, and release).
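To illustrate why a deterministic mapping reduces logging, the sketch below assigns each private address a fixed public address and port block, and recovers the private address from a public address and port without consulting any log. This is a generic illustration, not the algorithm implemented on SRX Series Firewalls; the function names, prefixes, block size, and first port are all assumptions.

```python
import ipaddress

FIRST_PORT = 1024  # assumed first usable port


def to_public(private_ip: str, private_net: str, public_net: str, block_size: int):
    """Map a private address to a fixed (public IP, first port, last port)."""
    priv = ipaddress.ip_network(private_net)
    pub = ipaddress.ip_network(public_net)
    addr = ipaddress.ip_address(private_ip)
    if addr not in priv:
        raise ValueError(f"{private_ip} is not in {private_net}")
    blocks_per_ip = (65536 - FIRST_PORT) // block_size
    index = int(addr) - int(priv.network_address)
    ip_index, block_index = divmod(index, blocks_per_ip)
    if ip_index >= pub.num_addresses:
        raise ValueError("public pool too small for this subscriber")
    start = FIRST_PORT + block_index * block_size
    return str(pub[ip_index]), start, start + block_size - 1


def to_private(public_ip: str, port: int, private_net: str, public_net: str,
               block_size: int) -> str:
    """Reverse lookup: recover the private address from a public IP and port."""
    priv = ipaddress.ip_network(private_net)
    pub = ipaddress.ip_network(public_net)
    blocks_per_ip = (65536 - FIRST_PORT) // block_size
    ip_index = int(ipaddress.ip_address(public_ip)) - int(pub.network_address)
    block_index = (port - FIRST_PORT) // block_size
    return str(priv[ip_index * blocks_per_ip + block_index])


# Hypothetical pools: 256 subscribers share 4 public addresses, 1008 ports each.
PRIVATE, PUBLIC, BLOCK = "100.64.0.0/24", "203.0.113.0/30", 1008
print(to_public("100.64.0.70", PRIVATE, PUBLIC, BLOCK))
print(to_private("203.0.113.1", 7500, PRIVATE, PUBLIC, BLOCK))
```

With PBA, by contrast, the block a subscriber receives depends on allocation order, so the allocation and release events have to be logged to answer the same question later.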
Each security policy on an SRX Series Firewall can have the actions “log session-init”, “log session-update”, and “log session-close”. Each action generates a log containing the real source/destination IP/port and the NATed source/destination IP/port. This is the feature likely to generate the largest volume of logs. Also, for CGNAT, “category nat” needs to be logged in “security log stream” to record the PBA “ALLOC” and “RELEASE” messages.
Scale-Out vs Chassis
This scale-out solution is an alternative to the monolithic scale-up approach based on chassis-based SRX Series Firewalls or on security services on the MX960/MX480 with MX-SPC3 service cards. However, nothing prevents the two approaches from being combined to benefit from both: the ability to add new services and the power of the existing platforms. Upcoming small platforms such as the MX304 and SRX4700 help create an architecture with a smaller footprint.
Management and Automation
On the management front, configuration automation is not covered in this JVD. However, automation was used to help build and test the solution across the various use cases. Scripting relies on NETCONF access to Junos. A large body of scripts already exists in the field (and in Juniper automation repositories on GitHub) using Ansible, Terraform, Python, PyEZ (Python Easy for Junos), and so on. Some advanced users, mostly in the service provider space, have already scripted their Junos operations, because APIs are important for integration with their own management frameworks. Tools can be created to help with management and automation, running either on-box (on the router itself, as it is Python capable) or off-box on a Linux server.
Security Director (on-premises or Security Director Cloud) plays a major role in delivering common configurations to the security service layer (such as security policies, address objects, and NAT pools) and in providing visibility into the security events and logs generated by each SRX Series Firewall.
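As an illustration of off-box scripting over NETCONF, the sketch below uses PyEZ to retrieve the BGP summary from a device and reduce it to a peer-to-state mapping. It is not one of the scripts used for this JVD. The host and credentials are placeholders, and the XML element names (`bgp-peer`, `peer-address`, `peer-state`) are assumed to match the reply of the device; the parsing helper is kept separate so it can be tried offline on a sample document.

```python
import xml.etree.ElementTree as ET


def summarize_bgp_peers(reply) -> dict:
    """Return {peer address: peer state} from a BGP summary XML element."""
    peers = {}
    for peer in reply.iter("bgp-peer"):
        address = (peer.findtext("peer-address") or "").strip()
        state = (peer.findtext("peer-state") or "").strip()
        peers[address] = state
    return peers


def fetch_bgp_peers(host: str, user: str, password: str) -> dict:
    """Query a device over NETCONF with PyEZ (requires the junos-eznc package)."""
    # Imported here so the parsing helper can be used without PyEZ installed.
    from jnpr.junos import Device

    with Device(host=host, user=user, passwd=password) as dev:
        reply = dev.rpc.get_bgp_summary_information()
    return summarize_bgp_peers(reply)


# Offline example with a hand-written sample reply (hypothetical peers).
SAMPLE = """
<bgp-information>
  <bgp-peer>
    <peer-address>192.0.2.1</peer-address>
    <peer-state>Established</peer-state>
  </bgp-peer>
  <bgp-peer>
    <peer-address>192.0.2.2</peer-address>
    <peer-state>Active</peer-state>
  </bgp-peer>
</bgp-information>
"""
print(summarize_bgp_peers(ET.fromstring(SAMPLE)))
```

Against a lab device, `fetch_bgp_peers("mx-lab.example.net", "user", "secret")` would return the same kind of mapping; the same helper can run on-box or off-box.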
Routing
Junos integration through BGP peering between the MX Series Router and the SRX Series Firewall, with appropriate BFD timers, lets all Juniper components work together seamlessly. The redundancy of each router and security element keeps traffic steady while allowing new capacity to be added in a simple way. Similar configuration statements on MX Series Routers and SRX Series Firewalls make the solution simple to manage.
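The effect of BFD timers on failure detection can be estimated with the asynchronous-mode rule from RFC 5880: the local detection time is the detect multiplier received from the peer multiplied by the larger of the local required minimum receive interval and the peer's desired minimum transmit interval. The sketch below applies that rule; the timer values are hypothetical and are not the ones used in this JVD.

```python
def bfd_detection_time_ms(local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int,
                          remote_detect_mult: int) -> int:
    """Asynchronous-mode BFD detection time at the local system, in milliseconds."""
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


# Hypothetical timer choices: (required min RX, peer desired min TX, peer multiplier).
for rx_ms, tx_ms, mult in ((300, 300, 3), (100, 100, 3), (100, 300, 3)):
    detect_ms = bfd_detection_time_ms(rx_ms, tx_ms, mult)
    print(f"rx={rx_ms} ms, tx={tx_ms} ms, multiplier={mult}: detection in {detect_ms} ms")
```

Shorter intervals detect a failed peer sooner, at the cost of more control traffic and a higher risk of false positives under load.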