Understanding the Algorithm Used to Load Balance Traffic on MX Series Routers
When a packet is received on the ingress interface of a device, the Packet Forwarding Engine (PFE) performs a lookup to identify the forwarding next hop. If there are multiple equal-cost multipath (ECMP) paths to the same destination, the ingress PFE can be configured to distribute the flows among the next hops. Likewise, traffic may need to be distributed among the member links of an aggregated interface such as aggregated Ethernet. The actual forwarding next hop is selected based on the result of a hash computed over selected packet header fields and several internal fields such as the interface index. You can configure some of the fields that are used by the hashing algorithm.
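The selection step described above can be sketched in a few lines of Python. This is only an illustrative model: CRC32 stands in for the PFE hash function, which is not specified here, and the next-hop names and the `pick_next_hop` helper are hypothetical.

```python
import zlib

def pick_next_hop(fields, next_hops):
    """Map a tuple of header fields to one of several equal-cost next hops."""
    # CRC32 is an assumption used for illustration, not the actual PFE hash.
    key = "|".join(str(f) for f in fields).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

# Hypothetical next hops and a hypothetical TCP flow
# (source address, destination address, protocol, source port, destination port).
next_hops = ["nh-a", "nh-b", "nh-c"]
flow = ("192.0.2.1", "203.0.113.1", 6, 49152, 443)

chosen = pick_next_hop(flow, next_hops)
# Every packet carrying the same field values maps to the same next hop.
print(chosen == pick_next_hop(flow, next_hops))
```

Because the choice depends only on the hashed fields, adding or removing a field from the hash input changes how finely traffic is split across the next hops.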
For MX Series routers with Modular Port Concentrators (MPCs) and Type 5 FPCs, configure the hash for the supported traffic types at the forwarding-options enhanced-hash-key hierarchy level. Details on which fields are included by default for which traffic family can be found below. In Junos OS Release 18.3R1, the default method for calculating the enhanced hash was changed to provide improved entropy for IP tunnels, IPv6 flows, and PPPoE payloads transmitted as family multiservice. These defaults can be disabled by setting their respective no- commands.
For MX Series routers with DPCs, configure the hash for the supported traffic types at the forwarding-options hash-key hierarchy level.
Junos supports different types of load balancing.
- Per-prefix load balancing – Each prefix is mapped to only one forwarding next hop.
- Per-packet load balancing – All next-hop addresses for a destination in the active route are installed in the forwarding table (the term per-packet load balancing in Junos is equivalent to what other vendors may call per-flow load balancing). See Configuring Per-Packet Load Balancing for more information.
- Random packet load balancing – Next hops are picked randomly for each packet. This method is available on MX routers with MPC line cards for aggregated Ethernet interfaces and ECMP paths. To configure per-packet random spray load balancing, include the per-packet statement at the [edit interfaces aex aggregated-ether-options load-balance] hierarchy level. See Example: Configuring Aggregated Ethernet Load Balancing for more information.
- Per-packet random spray load balancing – When the adaptive load-balancing option fails, per-packet random spray load balancing serves as a last resort. It ensures that the members of ECMP are equally loaded without taking bandwidth into consideration. Per-packet spraying causes packet reordering and hence is recommended only if the applications can absorb reordering. Per-packet random spray eliminates traffic imbalance that occurs as a result of software errors, except for packet hash.
Starting in Junos OS Release 20.2R1, you can configure per-packet random load balancing on MX240, MX480, and MX960 routers with MPC10E (MPC10E-15C-MRATE and MPC10E-10C-MRATE) line cards and on MX2010 and MX2020 routers with the MX2K-MPC11E line card.
- Adaptive load balancing – Adaptive load balancing (ALB) corrects a genuine traffic imbalance by using a feedback mechanism to distribute traffic across the links in an aggregated Ethernet (AE) bundle and across equal-cost multipath (ECMP) next hops. ALB optimizes traffic distribution when packet flows have widely varying traffic rates, adjusting the bandwidth and packet streams on links within an AE bundle.
- ALB on multiple Packet Forwarding Engines for aggregated Ethernet bundles
Starting in Junos OS Release 20.1R1, on MX Series MPCs, ALB redistributes the traffic on aggregated Ethernet bundles evenly across multiple ingress Packet Forwarding Engines (PFEs) on the same line card. In earlier releases, ALB was limited to a single PFE while redistributing traffic in an AE bundle. This impacted flexibility and redundancy. ALB is disabled by default.
You can configure ALB by setting the adaptive statement at the [edit interfaces ae-interface aggregated-ether-options load-balance] hierarchy level. See Configuring Adaptive Load Balancing for more information.
- ALB on multiple PFEs for ECMP next hops
Starting in Junos OS Release 20.1R1, you can configure ALB for ECMP next hops across multiple ingress PFEs on the same line card for even distribution of the traffic and redundancy. In earlier releases, ALB for ECMP next hops was limited to a single PFE. This limitation impacted flexibility and redundancy. ALB dynamically monitors the traffic load contributed by each flow in relation to overall ECMP link loading levels, and then takes corrective action when the threshold is reached.
You can configure ALB for ECMP next hops by configuring the ecmp-alb statement at the [edit chassis] hierarchy level. See ecmp-alb for more information.
Note: ALB works for multiple PFEs residing on the same line card. This feature is not supported for PFEs residing on different line cards. For PFEs residing on different line cards, ingress traffic can cause an uneven load on the egress ports, even if ALB is enabled.
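The trade-off between hash-based distribution and per-packet random spray, as described in the list above, can be illustrated with a small Python sketch. It is a toy model only: the member link names are hypothetical, CRC32 stands in for the real hash, and the random generator is seeded purely to make the run repeatable.

```python
import random
import zlib

# Hypothetical member links of an aggregated Ethernet bundle.
LINKS = ["xe-0/0/0", "xe-0/0/1", "xe-0/0/2", "xe-0/0/3"]

def hashed_link(flow):
    # Hash-based selection: every packet of a flow takes the same link.
    return LINKS[zlib.crc32(repr(flow).encode()) % len(LINKS)]

def sprayed_link(rng):
    # Random spray: each packet picks a link independently of its flow.
    return rng.choice(LINKS)

flow = ("192.0.2.1", "203.0.113.1", 6, 49152, 443)
rng = random.Random(0)  # seeded only so the sketch is repeatable

hashed = {hashed_link(flow) for _ in range(100)}
sprayed = {sprayed_link(rng) for _ in range(100)}
print(len(hashed), len(sprayed))
```

In this model the hashed flow stays on a single link, while the sprayed packets of the same flow are spread over several links. That spreading is what evens out the load and also what makes reordering possible.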
Several additional configuration options are also available:
- Per-slot hash function configuration – This method is based on a unique load-balance hash value for each PIC slot and is only valid for M120, M320, and MX Series routers with DPCE and MS-DPC line cards.
- Symmetrical load balancing – This method provides symmetrical load balancing on an 802.3ad LAG. The hash used for symmetrical load balancing is set at the interface level of the hierarchy. It ensures that a given flow of duplex traffic traverses the same devices in both directions, and is available on MX Series routers.
MX MPC and T-Series Type 5 FPC Specifics
The hash computation algorithm on MX MPC and T Series Type 5 FPCs produces identical results for packets with swapped layer 3 addresses or layer 4 transport ports. For example, the hash computation result for a packet with source address 192.0.2.1 and destination address 203.0.113.1 is identical to the hash computation result for a packet with source address 203.0.113.1 and destination address 192.0.2.1.
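One common way to obtain this symmetry is to put each source/destination pair into a canonical order before hashing. The Python sketch below shows the idea under stated assumptions: it is not the MPC hash itself, CRC32 is a stand-in, and `symmetric_hash` is a hypothetical helper.

```python
import zlib

def symmetric_hash(src_addr, dst_addr, src_port, dst_port, protocol):
    # Sort each pair so that swapping source and destination gives the same key.
    # This ordering trick is an assumption used to illustrate symmetry.
    addrs = sorted([src_addr, dst_addr])
    ports = sorted([src_port, dst_port])
    key = f"{addrs}|{ports}|{protocol}".encode()
    return zlib.crc32(key)

forward = symmetric_hash("192.0.2.1", "203.0.113.1", 49152, 443, 6)
reverse = symmetric_hash("203.0.113.1", "192.0.2.1", 443, 49152, 6)
print(forward == reverse)
```

With this property, both directions of a conversation produce the same hash result, which is what the example with 192.0.2.1 and 203.0.113.1 describes.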
To avoid possible packet reordering, layer 4 transport protocol ports are never used in the hash computation for fragmented IPv4 packets. This applies both to the first fragment of a flow, identified by the more fragments (MF) bit in the IPv4 header, and to all subsequent fragments, identified by a non-zero fragment offset. The first fragment and subsequent fragments are always forwarded over the same next hop.
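The fragment rule can be expressed as a field-selection step that runs before the hash. The following Python sketch assumes a hypothetical `ipv4_hash_fields` helper and already-parsed header values; it models only which fields are selected, not the hash itself.

```python
TCP, UDP = 6, 17

def ipv4_hash_fields(src, dst, protocol, src_port, dst_port,
                     more_fragments, fragment_offset):
    """Return the fields that would feed the hash for one IPv4 packet."""
    # A packet is a fragment if MF is set (first or middle fragment)
    # or if the fragment offset is non-zero (any later fragment).
    is_fragment = more_fragments or fragment_offset != 0
    fields = [src, dst, protocol]
    if protocol in (TCP, UDP) and not is_fragment:
        fields += [src_port, dst_port]
    return tuple(fields)

# First fragment: ports are present in the packet but are left out of the hash.
first = ipv4_hash_fields("192.0.2.1", "203.0.113.1", UDP, 5000, 6000, True, 0)
# Later fragment: no layer 4 header is present at all.
later = ipv4_hash_fields("192.0.2.1", "203.0.113.1", UDP, None, None, False, 185)
print(first == later)
```

Because both fragments yield the same field set, they hash to the same next hop, which is the behavior the paragraph above describes.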
Hashing Algorithm Used in Junos 18.3R1 and later
In most cases, including layer 3 and layer 4 field information in the hash calculation produces results that are good enough for an equitable distribution of traffic. However, in cases such as IP-in-IP or GRE tunneling, layer 3 and layer 4 field information alone may not provide enough entropy for load balancing. For example, in a deployment where MX Series routers transit GRE flows, the traffic in a GRE tunnel typically appears as a single flow with the same source and destination and the same GRE key. Fat flows can also markedly increase the imbalance in link utilization as traffic volume over the tunnels increases. Another example is when MX Series routers are used as VPLS PE devices in a subscriber edge deployment, where the routers backhaul broadband subscriber traffic from the access devices to a central broadband network gateway (BNG). In such a case, only the subscriber MAC addresses and the BNG router MAC addresses are available for hashing. With few BNG MACs and relatively few subscriber MACs, the fields available for hashing are not sufficient to create a hash for optimal load balancing.
Therefore, for MX Series routers with Trio MPCs running Junos OS Release 18.3R1 or later, the default enhanced-hash-key calculation has changed. A summary of the changes is listed here:
- For GRE packets, if the outer IP packet is not a fragmented packet (first fragment or any subsequent fragment), and the inner packet is IPv4 or IPv6, then the source and destination addresses from the inner packet are used in the hash computation in addition to the outer source and destination addresses. Layer 4 ports of the inner packet are also included if the protocol of the inner IP packet is TCP or UDP, and the inner IP packet is not a fragment (first fragment or any subsequent fragment). Likewise, if the outer IP packet is not a fragment, and the inner packet is MPLS, then the top inner label is included in the hash computation.
- For PPPoE packets, if the inner packet is IPv4 or IPv6, then the source and destination addresses from the inner packet are included. Layer 4 ports are included if the protocol of the inner IP packet is TCP or UDP, and the inner IP packet is not a fragment. Inclusion of the PPPoE inner packet fields can be disabled by configuring the no-payload option at the forwarding-options enhanced-hash-key family multiservice hierarchy level.
- For IPv6, the IPv6 header flow label field is included in the hash computation. RFC 6437 describes the 20-bit flow label field in the IPv6 header. Set the no-flow-label option at the forwarding-options enhanced-hash-key family inet6 hierarchy level to disable the new default.
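The GRE rule in the summary above can be read as a small decision procedure. The Python sketch below models it with plain dictionaries. The dictionary keys and the `gre_hash_fields` helper are hypothetical, and the sketch only selects fields; it does not reproduce the actual hash.

```python
TCP, UDP = 6, 17

def gre_hash_fields(outer, inner=None):
    """Select hash inputs for a GRE packet from hypothetical parsed headers."""
    fields = [outer["src"], outer["dst"], outer.get("gre_key")]
    # Inner fields are only considered when the outer packet is not a fragment.
    if inner is None or outer.get("is_fragment"):
        return fields
    if inner["type"] in ("ipv4", "ipv6"):
        fields += [inner["src"], inner["dst"]]
        # Inner ports count only for unfragmented TCP or UDP payloads.
        if inner.get("protocol") in (TCP, UDP) and not inner.get("is_fragment"):
            fields += [inner["src_port"], inner["dst_port"]]
    elif inner["type"] == "mpls":
        fields.append(inner["top_label"])
    return fields

tunnel = {"src": "198.51.100.1", "dst": "198.51.100.2", "gre_key": 1}
flow_a = {"type": "ipv4", "src": "192.0.2.1", "dst": "203.0.113.1",
          "protocol": TCP, "src_port": 49152, "dst_port": 443}
flow_b = dict(flow_a, src="192.0.2.2")

# Two inner flows in the same tunnel now yield different hash inputs.
print(gre_hash_fields(tunnel, flow_a) != gre_hash_fields(tunnel, flow_b))
```

Without the inner fields, both flows would reduce to the same outer addresses and GRE key, which is the single fat flow problem that the changed default is meant to address.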
Hash fields used for GRE traffic sent over IPv4
The lists show the fields used in the hash calculation for non-fragmented packets in Junos 18.3R1 and later. By default, a field is used in the hash calculation unless otherwise noted. Where noted, the IP and port fields used in the hash are symmetric; that is, swapping the fields does not change the hash result.
IPv4, GRE
GRE Key
Source and destination address; symmetric
Protocol
DSCP (disabled)
Incoming Interface Index (disabled)
IPv4 in IPv4, GRE
Payload (inner IPv4: source and destination ports, IP addresses); symmetric
GRE Key
GRE Protocol = IPv4
Source and destination address; symmetric
Protocol
DSCP (disabled)
Incoming Interface Index (disabled)
IPv6 in IPv4, GRE
Payload (inner IPv6: source and destination ports, IP addresses); symmetric
GRE Key
GRE Protocol = IPv6
Source and destination address; symmetric
Protocol
DSCP (disabled)
Incoming Interface Index (disabled)
MPLS in IPv4, GRE
Payload (inner MPLS: top label)
GRE Key
GRE Protocol = MPLS
Source and destination address; symmetric
Protocol
DSCP (disabled)
Incoming Interface Index (disabled)
IPv4, L2TPv2 (Junos 17.2 and later)
Inclusion of the L2TPv2 tunnel ID and session ID can be enabled by configuring the forwarding-options enhanced-hash-key family inet l2tp-tunnel-session-identifier option. Note that Juniper does not recommend enabling this option by default. This is because L2TP session identification is based on the destination UDP port match (1701), and this port may not be exclusively used for L2TP transport, so the extraction of the tunnel and session ID fields from the packet may not always be accurate.
Session ID
Tunnel ID
Source and destination port
Source and destination address; symmetric
Protocol (UDP)
DSCP (disabled)
Incoming Interface Index (disabled)
Hash fields used for GRE traffic sent over IPv6
The list shows the fields used in the hash calculation for non-fragmented packets. By default, a field is used in the hash calculation unless otherwise noted. Where noted, the IP and port fields used in the hash are symmetric; that is, swapping the fields does not change the hash result.
IPv6, GRE
GRE Key
Source and destination address; symmetric
Next header
Flow label (Junos 18.3 and later)
Traffic class (disabled)
Incoming Interface Index (disabled)
IPv4 in IPv6, GRE (Junos 18.3 and later)
Payload (inner IPv4: source and destination ports, IP addresses); symmetric
GRE Key
GRE Protocol = IPv4
Source and destination address; symmetric
Next header
Flow label (Junos 18.3 and later)
Traffic class (disabled)
Incoming Interface Index (disabled)
IPv6 in IPv6, GRE (Junos 18.3 and later)
Payload (inner IPv6: source and destination ports, IP addresses); symmetric
GRE Key
GRE Protocol = IPv6
Source and destination address; symmetric
Next header
Flow label (Junos 18.3 and later)
Traffic class (disabled)
Incoming Interface Index (disabled)
MPLS in IPv6, GRE (Junos 18.3 and later)
Payload (inner MPLS: top labels); symmetric
GRE Key
GRE Protocol = MPLS
Source and destination address; symmetric
Next header
Flow label
Traffic class (disabled)
Incoming Interface Index (disabled)
Hash fields used for IPv4
The list shows the fields used in the hash calculation for non-fragmented packets, except where noted. By default, a field is used in the hash calculation unless otherwise noted. Where noted, the IP and port fields used in the hash are symmetric; that is, swapping the fields does not change the hash result.
IPv4, not TCP or UDP, or fragmented packets
Source and destination address; symmetric
Protocol
DSCP (disabled)
Incoming Interface Index (disabled)
IPv4, TCP and UDP, non fragmented packets
Source and destination port; symmetric
Source and destination address; symmetric
Protocol
DSCP (disabled)
Incoming Interface Index (disabled)
IPv4, PPTP
16 least significant bits of the GRE Key
Source and destination address; symmetric
Protocol
DSCP (disabled)
Incoming Interface Index (disabled)
IPv4, GTP, UDP traffic to destination port 2152
Inclusion of the GPRS tunneling protocol (GTP) tunnel endpoint identifier (TEID) can be enabled by configuring the forwarding-options enhanced-hash-key family inet gtp-tunnel-endpoint-identifier option. Note that Juniper does not recommend enabling this option by default. This is because GTP session identification is based on the destination UDP port match (2152), and this port may not be exclusively used for GTP transport, so the extraction of the TEID field from the packet may not always be accurate.
GTP TEID (disabled)
Source and destination port
Source and destination address; symmetric
Protocol
DSCP (disabled)
Incoming Interface Index (disabled)
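The caution about the GTP TEID option for IPv4 can be made concrete with a short Python sketch. It assumes the GTPv1-U header layout, in which the TEID occupies bytes 4 through 7 of the UDP payload. The `gtp_hash_fields` helper and its `use_teid` flag are hypothetical stand-ins for the configuration option.

```python
GTP_U_PORT = 2152
UDP = 17

def gtp_hash_fields(src, dst, src_port, dst_port, udp_payload, use_teid=False):
    """Select hash inputs for a UDP packet, optionally adding the GTP TEID."""
    fields = [src, dst, UDP, src_port, dst_port]
    # GTP is recognized by destination port alone, so any UDP traffic to
    # port 2152 is treated as GTP, whether or not it really is.
    if use_teid and dst_port == GTP_U_PORT and len(udp_payload) >= 8:
        teid = int.from_bytes(udp_payload[4:8], "big")
        fields.append(teid)
    return fields

# Two hypothetical GTPv1-U headers that differ only in the TEID.
pkt_a = bytes([0x30, 0xFF, 0x00, 0x00]) + (0x1001).to_bytes(4, "big")
pkt_b = bytes([0x30, 0xFF, 0x00, 0x00]) + (0x1002).to_bytes(4, "big")

off_a = gtp_hash_fields("192.0.2.1", "203.0.113.1", 2152, 2152, pkt_a)
off_b = gtp_hash_fields("192.0.2.1", "203.0.113.1", 2152, 2152, pkt_b)
on_a = gtp_hash_fields("192.0.2.1", "203.0.113.1", 2152, 2152, pkt_a, use_teid=True)
on_b = gtp_hash_fields("192.0.2.1", "203.0.113.1", 2152, 2152, pkt_b, use_teid=True)
print(off_a == off_b, on_a == on_b)
```

The same extraction would run on non-GTP traffic that happens to use port 2152, pulling four arbitrary payload bytes into the hash. That is the inaccuracy the recommendation warns about.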
Hash fields used for IPv6
The list shows the fields used in the hash calculation for non-fragmented packets, except where noted. By default, a field is used in the hash calculation unless otherwise noted. Where noted, the IP and port fields used in the hash are symmetric; that is, swapping the fields does not change the hash result.
IPv6, non TCP and UDP packet, or TCP and UDP packet fragmented by the originator
Source and destination address; symmetric
Next header
Flow label (Junos 18.3 and later)
Traffic class (disabled)
Incoming Interface Index (disabled)
IPv6, non fragmented TCP and UDP packet
Source and destination port; symmetric
Source and destination address; symmetric
Next header
Flow label (Junos 18.3 and later)
Traffic class (disabled)
Incoming Interface Index (disabled)
IPv6, PPTP
16 least significant bits of the GRE Key
Source and destination address; symmetric
Next header
Flow label (Junos 18.3 and later)
Traffic class (disabled)
Incoming Interface Index (disabled)
IPv6, GTP
Inclusion of the GPRS tunneling protocol (GTP) tunnel endpoint identifier (TEID) can be enabled at the forwarding-options enhanced-hash-key family inet gtp-tunnel-endpoint-identifier hierarchy level. Note that Juniper does not recommend enabling this option by default. This is because GTP session identification is based on the destination UDP port match (2152), and this port may not be exclusively used for GTP transport, so the extraction of the TEID field from the packet may not always be accurate.
GTP TEID (disabled)
Source and destination port
Source and destination address; symmetric
Next header
Flow label (Junos 18.3 and later)
Traffic class (disabled)
Incoming Interface Index (disabled)
Hash fields used for multiservice
Family multiservice hash configuration applies to packets entering the router as family ccc, vpls, or bridge. The list shows the fields used in the hash calculation for non-fragmented packets. By default, a field is used in the hash calculation unless otherwise noted. Where noted, the IP and port fields used in the hash are symmetric; that is, swapping the fields does not change the hash result.
Ethernet, non-IP or non-MPLS
If configured, payload information is extracted from untagged packets or packets with up to two VLAN tags.
Outer 802.1p (disabled)
Source and destination MAC; symmetric
Incoming Interface Index (disabled)
Ethernet, IPv4
Payload (inner IPv4: source and destination ports, IP addresses); symmetric
Outer 802.1p (disabled)
Source and destination MAC; symmetric
Incoming Interface Index (disabled)
Ethernet, IPv6
Payload (inner IPv6: source and destination ports, IP addresses); symmetric
Outer 802.1p (disabled)
Source and destination MAC; symmetric
Incoming Interface Index (disabled)
Ethernet, MPLS
Payload (inner MPLS: top labels plus inner IPv4 and IPv6 fields); symmetric. See Hash fields used for MPLS, Junos 18.3 and later, below, for related information.
Outer 802.1p (disabled)
Source and destination MAC; symmetric
Incoming Interface Index (disabled)
IPv4 in PPPoE (data packet)
Payload (inner IPv4: source and destination ports, IP addresses); symmetric
PPP protocol IPv4 version 0x1, type 0x1
Outer 802.1p (disabled)
Source and destination MAC; symmetric
Incoming Interface Index (disabled)
IPv6 in PPPoE (data packet)
Payload (inner IPv6: source and destination ports, IP addresses); symmetric
PPP protocol IPv6 version 0x1, type 0x1
Outer 802.1p (disabled)
Source and destination MAC; symmetric
Incoming Interface Index (disabled)
Hash fields used for MPLS, Junos 18.3 and later
The list shows the fields used in the hash calculation for non-fragmented packets. By default, a field is used in the hash calculation unless otherwise noted. Where noted, the IP and port fields used in the hash are symmetric; that is, swapping the fields does not change the hash result.
MPLS, Encapsulated IPv4 or IPv6
Payload (inner IPv4: source and destination ports, IP addresses); symmetric
Payload (inner IPv6: source and destination ports, IP addresses, next header); symmetric
Label 1..16 (20 bits)
Outer Label EXP (disabled)
Incoming Interface Index (disabled)
MPLS, IPv4 or IPv6 in Ethernet pseudo-wire
Payload (IPv4/IPv6 in Ethernet pseudo-wire)
Label 2..16 (20 bits)
Outer Label EXP (disabled)
Label 1 (20 bits)
Incoming Interface Index (disabled)
MPLS, MPLS in Ethernet pseudo-wire
Payload (two top labels of MPLS label stack entry in Ethernet pseudo-wire)
Label 2..16 (20 bits)
Outer Label EXP (disabled)
Label 1 (20 bits)
Incoming Interface Index (disabled)
MPLS, entropy label
When an entropy label is detected, the payload field is not processed, and the indicator is not included in the hash computation.
Label 1..16 (20 bits)
Outer Label EXP (disabled)
Incoming Interface Index (disabled)
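The entropy label case in the list above can be sketched as a variation of the MPLS field selection. The Python below is a simplified model: it assumes the entropy label indicator is the reserved label value 7 defined in RFC 6790, treats the label stack as a plain list of integers, and uses a hypothetical `mpls_hash_fields` helper.

```python
ELI = 7  # entropy label indicator, reserved label value from RFC 6790

def mpls_hash_fields(labels, payload_fields, max_depth=16):
    """Select hash inputs for an MPLS packet (illustrative model only)."""
    stack = list(labels[:max_depth])
    if ELI in stack:
        # Entropy label present: skip the payload and leave the indicator
        # itself out of the hash; the remaining labels carry the entropy.
        return [label for label in stack if label != ELI]
    return stack + list(payload_fields)

payload = ("192.0.2.1", "203.0.113.1", 49152, 443)
plain = mpls_hash_fields([100, 200], payload)
with_el = mpls_hash_fields([100, ELI, 524000, 200], payload)
print(plain)
print(with_el)
```

In the second call, the label that follows the indicator (524000 here, an arbitrary value) supplies the per-flow variation, so the payload does not need to be parsed.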
Hash fields used for MPLS from Junos 14.1 to Junos 18.3
The list shows the fields used in the hash calculation for non-fragmented packets. By default, a field is used in the hash calculation unless otherwise noted. Where noted, the IP and port fields used in the hash are symmetric; that is, swapping the fields does not change the hash result.
MPLS, Encapsulated IPv4 or IPv6
Payload (inner IPv4: source and destination ports, IP addresses); symmetric
Payload (inner IPv6: source and destination ports, IP addresses, next header); symmetric
Label 2..8 (20 bits)
Outer Label EXP (disabled)
Label 1 (20 bits)
Incoming Interface Index (disabled)
MPLS, IPv4 or IPv6 in Ethernet pseudo-wire
Payload (IPv4/IPv6 in Ethernet pseudo-wire)
Label 2..8 (20 bits)
Outer Label EXP (disabled)
Label 1 (20 bits)
Incoming Interface Index (disabled)
MPLS, MPLS in Ethernet pseudo-wire
Payload (two top labels of MPLS label stack entry in Ethernet pseudo-wire)
Label 2..16 (20 bits)
Outer Label EXP (disabled)
Label 1 (20 bits)
Incoming Interface Index (disabled)
MPLS, entropy label
When an entropy label is detected, the payload field is not processed, and the indicator is not included in the hash computation.
Label 2..8 (20 bits)
Outer Label EXP (disabled)
Label 1 (20 bits)
Incoming Interface Index (disabled)
List of Junos Updates for Hash Calculation and Load Balancing for MX series routers with MPCs
| Junos Release | Change |
| --- | --- |
| 18.3R1 | Includes IPv6 flow label, inner GRE header, and inner PPPoE in default hash computation. Increases MPLS label stack depth to 16 labels. |
| 17.2R1 | Load balancing for L2TP encapsulated IPv4 and IPv6 packets. |
| 16.1R1 | Includes EoMPLS payload hash with control word. Introduces source-only and destination-only based hashing. |
| 15.1R1 | Provides targeted distribution of static interfaces across AE member links. Includes source, destination, and MAC of MPLS encapsulated PPPoE payload in the default hash computation. |
| 14.2R3 | Increases scaling of LAG and MC-LAG. |
| 14.2R2 | Provides aggregate Ethernet bundle with 10G, 40G and 100G links. |
| 14.1R1 | Decouples aeX interface creation from Increases aggregate Ethernet interface name space. Provides adaptive load balancing for ECMP next hops. |
| 13.3R1 | Includes enhancements for adaptive, per-packet-random, and periodic-rebalance load balancing. |
| 11.4R1 | Provides load sharing across ECMP next hops. |