Understanding Default CoS Scheduling on QFabric System Interconnect Devices (Junos OS Release 13.1 and Later Releases)

The default class-of-service (CoS) properties on QFabric system Interconnect device interfaces are optimized to make the best use of fabric resources. You cannot configure CoS properties on QFabric system Interconnect device interfaces.

Hierarchical CoS Architecture Across a QFabric System Interconnect Device

Because Interconnect devices support traffic from multiple Node devices that have multiple CoS configurations, CoS on Interconnect device fabric interfaces differs from CoS on Node device access and fabric interfaces.

The hierarchical CoS scheduling structure on the Interconnect device interfaces consists of two tiers:

  1. Fabric forwarding class sets—Similar to fc-sets on Node devices, fabric fc-sets group traffic for transport across the Interconnect device fabric. Fabric fc-sets are global and apply to all traffic that crosses the fabric from all Node devices. See Understanding CoS Fabric Forwarding Class Sets for a detailed description of fabric fc-sets.
  2. Class groups—Fabric fc-sets are grouped into class groups for transport across the Interconnect device.

Node devices and Interconnect devices each have a two-tier hierarchical CoS scheduling architecture. The architectures are slightly different, but each tier of the scheduling hierarchy performs analogous functions, as shown in Table 1.

Table 1: Hierarchical Scheduler Architecture on Node Devices and Interconnect Devices

Bandwidth Pool | Bandwidth Configuration on Node Devices | Bandwidth Configuration on Interconnect Devices
Port: entire amount of bandwidth available to traffic on a port | Access (xe) or fabric (fte) interfaces | Fabric (fte) or Clos fabric (bfte) interfaces
Priority group: group of traffic types that require similar CoS treatment; each priority group receives a portion of the total available port bandwidth | Forwarding class set (fc-set) | Class group
Priority: most granular tier of bandwidth allocation; each priority receives a portion of the total available priority group bandwidth | Forwarding class (mapped to output queue) | Fabric fc-set (mapped to output queue)

Fabric FC-Sets

Fabric fc-sets are groups of forwarding classes that receive similar CoS treatment across the Interconnect device. Fabric fc-sets are global to the QFabric system and apply to all traffic that traverses the fabric, from all connected Node devices. The CoS on a fabric fc-set applies to all the traffic that belongs to that fabric fc-set.

For example, a fabric fc-set that includes the best-effort forwarding class handles all of the best-effort traffic from all of the connected Node devices that traverses the Interconnect device fabric.

There are 12 default fabric fc-sets: 5 visible fabric fc-sets and 7 hidden fabric fc-sets. The five visible fabric fc-sets have forwarding classes mapped to them by default. By default, the seven hidden fabric fc-sets do not carry traffic, but you can map forwarding classes to them if you want to use them.

You can configure the forwarding class membership of each fabric fc-set. However, you cannot create new fabric fc-sets, and you cannot delete the 12 default fabric fc-sets.

Each fabric fc-set is mapped to an output queue. Each fabric interface has 12 output queues, one for each of the 12 fabric fc-sets. The traffic from all of the forwarding classes mapped to a fabric fc-set uses that fabric fc-set’s output queue.
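To make the inventory concrete, the following Python sketch models the 12 default fabric fc-sets as plain data. The fc-set names come from this topic; the functions remap and output_queue and the queue numbering are hypothetical illustrations, not a Junos OS interface. The sketch shows the two rules described above: forwarding class membership can change, but the set of fabric fc-sets is fixed, and every forwarding class in a fabric fc-set shares that fc-set's output queue.

```python
# Hypothetical model of the fabric fc-set inventory; not a Junos API.
VISIBLE_FCSETS = (
    "fabric_fcset_strict_high",
    "fabric_fcset_be",
    "fabric_fcset_noloss1",
    "fabric_fcset_noloss2",
    "fabric_fcset_multicast1",
)
HIDDEN_FCSETS = (
    "fabric_fcset_noloss3",
    "fabric_fcset_noloss4",
    "fabric_fcset_noloss5",
    "fabric_fcset_noloss6",
    "fabric_fcset_multicast2",
    "fabric_fcset_multicast3",
    "fabric_fcset_multicast4",
)
ALL_FCSETS = VISIBLE_FCSETS + HIDDEN_FCSETS

# One output queue per fabric fc-set; this numbering is illustrative only.
OUTPUT_QUEUE = {name: index for index, name in enumerate(ALL_FCSETS)}


def remap(membership, forwarding_class, fcset):
    """Return a copy of membership with forwarding_class moved to fcset.

    membership maps a fabric fc-set name to a set of forwarding class names.
    The 12 fabric fc-sets are fixed, so an unknown target name is rejected.
    """
    if fcset not in OUTPUT_QUEUE:
        raise ValueError(f"unknown fabric fc-set: {fcset}")
    updated = {name: set(members) for name, members in membership.items()}
    for members in updated.values():
        # Assumption: a forwarding class belongs to exactly one fabric fc-set.
        members.discard(forwarding_class)
    updated.setdefault(fcset, set()).add(forwarding_class)
    return updated


def output_queue(membership, forwarding_class):
    """Return the output queue shared by all classes in the same fabric fc-set."""
    for name, members in membership.items():
        if forwarding_class in members:
            return OUTPUT_QUEUE[name]
    raise KeyError(f"{forwarding_class} is not mapped to a fabric fc-set")


# Start with a lossless class on a visible fc-set, then move it to a hidden one.
membership = {"fabric_fcset_noloss1": {"fcoe"}}
membership = remap(membership, "fcoe", "fabric_fcset_noloss3")
print(output_queue(membership, "fcoe"), membership)
```

Rejecting unknown names in remap mirrors the rule that you cannot create or delete fabric fc-sets, only change which forwarding classes they contain.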

Fabric fc-sets are grouped into class groups for transport across the Interconnect device.

Class Groups for Fabric FC-Sets

To transport traffic across the fabric, the fabric organizes the fabric fc-sets into three classes called class groups. Class groups are not user-configurable. The three class groups are:

  • Strict-high priority—All traffic in the fabric fc-set fabric_fcset_strict_high. This class group includes the traffic in the strict-high priority and network-control forwarding classes, and in any forwarding classes you create on a Node device that consist of strict-high priority traffic.
  • Unicast—All traffic in the fabric fc-sets fabric_fcset_be, fabric_fcset_noloss1, and fabric_fcset_noloss2. This class group includes the traffic in the best-effort, fcoe, and no-loss forwarding classes, and the traffic in any forwarding classes you create on a Node device that consist of best-effort or lossless traffic. If you use any of the hidden lossless fabric fc-sets (fabric_fcset_noloss3, fabric_fcset_noloss4, fabric_fcset_noloss5, or fabric_fcset_noloss6), that traffic is part of this class group.
  • Multidestination—All traffic in the fabric fc-set fabric_fcset_multicast1. This class group includes the traffic in the mcast forwarding class and in any forwarding classes you create on a Node device that consist of multidestination traffic. If you use any of the hidden multidestination fabric fc-sets (fabric_fcset_multicast2, fabric_fcset_multicast3, or fabric_fcset_multicast4), that traffic is part of this class group.
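The class group membership rules can be expressed as a small lookup. This is a hypothetical Python helper (class_group is not a Junos OS command); it only restates the mapping listed above, including the hidden fabric fc-sets, and it does not let you change the grouping because class groups are not user-configurable.

```python
# Hypothetical helper that restates the fixed class group membership.
LOSSLESS_FCSETS = {f"fabric_fcset_noloss{i}" for i in range(1, 7)}
MULTIDESTINATION_FCSETS = {f"fabric_fcset_multicast{i}" for i in range(1, 5)}


def class_group(fcset: str) -> str:
    """Return the class group that carries traffic from a fabric fc-set."""
    if fcset == "fabric_fcset_strict_high":
        return "strict-high priority"
    if fcset == "fabric_fcset_be" or fcset in LOSSLESS_FCSETS:
        return "unicast"
    if fcset in MULTIDESTINATION_FCSETS:
        return "multidestination"
    raise ValueError(f"unknown fabric fc-set: {fcset}")


for name in ("fabric_fcset_strict_high", "fabric_fcset_noloss5", "fabric_fcset_multicast3"):
    print(name, "->", class_group(name))
```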

Default CoS on Interconnect Device Fabric Interfaces

The Interconnect device interfaces use the default CoS configuration as described in these sections:

Default Class Group Scheduling

Default class group bandwidth scheduling is analogous to default fc-set (priority group) scheduling on a Node device. Default class group scheduling uses weighted round-robin (WRR) scheduling, in which each class group receives a portion of the total available fabric interface bandwidth, based on the class group’s traffic type, as shown in Table 2:

Table 2: Class Group Default Scheduling Properties and Membership

Strict-high priority class group
  • Fabric fc-sets: fabric_fcset_strict_high
  • Forwarding classes (default mapping): all strict-high priority forwarding classes, and network-control
  • Scheduling properties (weight): Traffic in the strict-high priority class group is served first. This class group receives all of the bandwidth it needs to empty its queues and therefore can starve other types of traffic during periods of high-volume strict priority traffic. Plan carefully and use caution when determining how much traffic to configure as strict-high priority traffic.

Unicast class group
  • Fabric fc-sets: fabric_fcset_be, fabric_fcset_noloss1, and fabric_fcset_noloss2, plus the hidden lossless fabric fc-sets fabric_fcset_noloss3, fabric_fcset_noloss4, fabric_fcset_noloss5, and fabric_fcset_noloss6 if they are used
  • Forwarding classes (default mapping): best-effort, fcoe, and no-loss. No forwarding classes are mapped to the hidden lossless fabric fc-sets by default.
  • Scheduling properties (weight): Traffic in the unicast class group receives an 80% weight in the weighted round-robin (WRR) calculations. After the strict-high priority class group has been served, the unicast class group receives 80% of the remaining fabric bandwidth. (If more bandwidth is available, the unicast class group can use more bandwidth.)

Multidestination class group
  • Fabric fc-sets: fabric_fcset_multicast1, plus the hidden multidestination fabric fc-sets fabric_fcset_multicast2, fabric_fcset_multicast3, and fabric_fcset_multicast4 if they are used
  • Forwarding classes (default mapping): mcast. No forwarding classes are mapped to the hidden multidestination fabric fc-sets by default.
  • Scheduling properties (weight): Traffic in the multidestination class group receives a 20% weight in the WRR calculations. After the strict-high priority class group has been served, the multidestination class group receives 20% of the remaining fabric bandwidth. (If more bandwidth is available, the multidestination class group can use more bandwidth.)
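As a rough sketch of this first scheduling tier, the following Python function computes the guaranteed minimum bandwidth of each class group on one fabric interface. It assumes that strict-high priority traffic can be represented as a fixed average load in Gbps; in practice that load fluctuates. The function name and parameters are hypothetical.

```python
# Default WRR weights for the two weighted class groups (Table 2).
# The strict-high priority class group has no weight; it is served first.
CLASS_GROUP_WEIGHTS = {"unicast": 80, "multidestination": 20}


def class_group_minimums(port_gbps, strict_high_gbps):
    """Guaranteed minimum bandwidth (Gbps) per class group on one fabric interface.

    strict_high_gbps models the strict-high load as a fixed average. The
    bandwidth that remains is divided by WRR weight. These are minimums only:
    a class group can use more when the other class group leaves bandwidth idle.
    """
    if not 0 <= strict_high_gbps <= port_gbps:
        raise ValueError("strict-high load must be between 0 and the port rate")
    remaining = port_gbps - strict_high_gbps
    total_weight = sum(CLASS_GROUP_WEIGHTS.values())
    minimums = {"strict-high priority": strict_high_gbps}
    for group, weight in CLASS_GROUP_WEIGHTS.items():
        minimums[group] = remaining * weight / total_weight
    return minimums


print(class_group_minimums(40, 4))   # light strict-high load
print(class_group_minimums(40, 40))  # strict-high traffic starves the other groups
```

The second call illustrates the caution in Table 2: when strict-high traffic needs the whole port, the weighted class groups have no guaranteed bandwidth left.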

Only the five visible fabric fc-sets have traffic mapped to them by default. The fabric fc-sets within each class group are weighted by their transmit rates (guaranteed minimum bandwidth), and they receive bandwidth from the class group’s total bandwidth using weighted round-robin (WRR) scheduling.

Default Fabric FC-Set Scheduling

Default fabric fc-set bandwidth scheduling is analogous to default forwarding class (priority) scheduling on a Node device. Each fabric fc-set receives a guaranteed minimum percentage of the port bandwidth that the class group receives. The guaranteed minimum percentage is called the transmit rate.

Table 3 shows the default transmit rate for each of the default fabric fc-sets.

Table 3: Default Fabric FC-Set Scheduler Configuration

Default Fabric FC-Set | Transmit Rate (Percentage of Class Group Bandwidth)
fabric_fcset_strict_high | N/A (see the note that follows this table)
fabric_fcset_noloss1 | 35%
fabric_fcset_noloss2 | 35%
fabric_fcset_be | 10%
fabric_fcset_multicast1 | 20%

Note: fabric_fcset_strict_high has no transmit rate. Strict-high priority traffic is served first, before any other traffic is served. Strict-high priority traffic receives all of the bandwidth it needs to empty its queues and therefore can starve other types of traffic during periods of high-volume strict priority traffic. Plan carefully and use caution when determining how much traffic to configure as strict-high priority traffic.

Each fabric fc-set belongs to a class group. Each class group receives a portion of the total available port bandwidth. Each fabric fc-set in a class group receives a portion of the total available class group bandwidth based on the transmit rate (weight) of the fabric fc-set.

Traffic in fabric_fcset_strict_high does not have a default transmit rate because fabric_fcset_strict_high receives all of the bandwidth needed to empty its queue before other queues are served. Traffic in each of the remaining fabric fc-sets receives bandwidth in a ratio proportional to the default transmit rate of its fabric fc-set.

Each of the following hidden fabric fc-sets receives a default scheduling weight of 1:

  • fabric_fcset_noloss3
  • fabric_fcset_noloss4
  • fabric_fcset_noloss5
  • fabric_fcset_noloss6
  • fabric_fcset_multicast2
  • fabric_fcset_multicast3
  • fabric_fcset_multicast4

You must explicitly map forwarding classes to hidden fabric fc-sets if you want to use the hidden fabric fc-sets.

Caution: Bandwidth is allocated to fabric fc-sets based on scheduling weight. The scheduling weights of the visible (default) fabric fc-sets are the same as their transmit rates, so in the unicast class group, fabric_fcset_noloss1 and fabric_fcset_noloss2 each have a weight of 35 and fabric_fcset_be has a weight of 10. In the multidestination class group, the default fabric_fcset_multicast1 has a weight of 20. The hidden multicast and noloss fabric fc-sets each have a scheduling weight of 1.

The scheduling weights mean that when the visible fabric fc-sets are fully utilizing their allocated bandwidth:

  • The hidden noloss fc-sets (fabric_fcset_noloss3, fabric_fcset_noloss4, fabric_fcset_noloss5, and fabric_fcset_noloss6) receive bandwidth at a proportional rate of 1:35 compared to the default noloss fc-sets.
  • The hidden multicast fc-sets (fabric_fcset_multicast2, fabric_fcset_multicast3, and fabric_fcset_multicast4) receive bandwidth at a proportional rate of 1:20 compared to the default multicast fc-set (fabric_fcset_multicast1).

If you map traffic to a hidden fabric fc-set, that fabric fc-set receives the proportional amount of class group bandwidth that corresponds to its scheduling weight (1). The amount of bandwidth allocated to a hidden fabric fc-set depends on how much bandwidth the other fc-sets in the same class group consume. When the visible fabric fc-sets fully utilize their bandwidth, hidden fabric fc-sets receive only the minimum bandwidth that corresponds to their weight. (However, even a low scheduling weight results in a relatively large absolute bandwidth allocation because each fabric port is a 40-Gbps port.)

For example, if fabric_fcset_noloss1 and fabric_fcset_noloss2 each consume all of the 35 percent of bandwidth allocated to them, and fabric_fcset_be consumes all of the 10 percent of bandwidth allocated to it, then fabric_fcset_noloss3, fabric_fcset_noloss4, fabric_fcset_noloss5, and fabric_fcset_noloss6 each receive bandwidth at a rate of 1:80 compared to the combined bandwidth of the three visible unicast fabric fc-sets (35 + 35 + 10 = 80), which is 1:35 compared to each visible noloss fabric fc-set. (If the visible fabric fc-sets do not use all of their allocated bandwidth, then the hidden fabric fc-sets receive more bandwidth.)
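These ratios follow from dividing each fabric fc-set's weight by the sum of the weights of all fc-sets in the class group that have traffic queued. The following Python sketch assumes every listed fc-set is backlogged (always has traffic waiting) and uses exact fractions so the ratios can be checked directly: 1:35 against one visible noloss fc-set, 1:80 against the three visible unicast fc-sets combined, and 1:20 against fabric_fcset_multicast1. backlogged_shares is a hypothetical helper, not part of Junos OS.

```python
from fractions import Fraction

# Default scheduling weights: visible fc-sets use their transmit rates,
# hidden fc-sets use a weight of 1.
WEIGHTS = {
    "fabric_fcset_noloss1": 35,
    "fabric_fcset_noloss2": 35,
    "fabric_fcset_be": 10,
    "fabric_fcset_noloss3": 1,
    "fabric_fcset_noloss4": 1,
    "fabric_fcset_noloss5": 1,
    "fabric_fcset_noloss6": 1,
    "fabric_fcset_multicast1": 20,
    "fabric_fcset_multicast2": 1,
    "fabric_fcset_multicast3": 1,
    "fabric_fcset_multicast4": 1,
}


def backlogged_shares(active):
    """Fraction of class group bandwidth per fc-set when all of them are backlogged.

    active lists fc-sets from one class group that have traffic mapped to them.
    """
    total = sum(WEIGHTS[name] for name in active)
    return {name: Fraction(WEIGHTS[name], total) for name in active}


unicast = backlogged_shares([
    "fabric_fcset_noloss1", "fabric_fcset_noloss2", "fabric_fcset_be",
    "fabric_fcset_noloss3", "fabric_fcset_noloss4",
    "fabric_fcset_noloss5", "fabric_fcset_noloss6",
])
visible = (unicast["fabric_fcset_noloss1"] + unicast["fabric_fcset_noloss2"]
           + unicast["fabric_fcset_be"])
print(unicast["fabric_fcset_noloss3"] / unicast["fabric_fcset_noloss1"])  # 1/35
print(unicast["fabric_fcset_noloss3"] / visible)                          # 1/80

multi = backlogged_shares(["fabric_fcset_multicast1", "fabric_fcset_multicast2"])
print(multi["fabric_fcset_multicast2"] / multi["fabric_fcset_multicast1"])  # 1/20
```

With all four hidden lossless fc-sets in use, each receives 1/84 of the unicast class group bandwidth; with the 28.8 Gbps of unicast bandwidth in the example later in this topic, that is roughly 0.34 Gbps per hidden fc-set.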

As another example, suppose that you map lossless traffic to fabric_fcset_noloss3 and fabric_fcset_noloss4, and that the visible unicast fabric fc-sets use less than their allocations: fabric_fcset_noloss1 uses 10 percent of its 35 percent allocation of unicast class group bandwidth, fabric_fcset_noloss2 uses 15 percent of its 35 percent allocation, and fabric_fcset_be uses 5 percent of its allocated bandwidth. In this case, fabric_fcset_noloss3 and fabric_fcset_noloss4 can use the remaining unicast class group bandwidth allocated to lossless traffic. However, if the traffic on fabric_fcset_noloss1, fabric_fcset_noloss2, or fabric_fcset_be increases, the bandwidth allocated to the hidden fabric fc-sets decreases.
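The following Python sketch models this behavior with weighted max-min fair sharing, a common way to describe a work-conserving weighted scheduler. It is an approximation of the fabric's WRR behavior, not the Interconnect device's hardware algorithm. It assumes that the percentages in the example are percentages of unicast class group bandwidth and that the hidden fc-sets have more traffic than they can send. In this model, any bandwidth that the visible fc-sets leave unused, including unused fabric_fcset_be bandwidth, is offered to the hidden fc-sets. weighted_fair_share is a hypothetical name.

```python
def weighted_fair_share(capacity, demands, weights):
    """Work-conserving weighted allocation (weighted max-min fairness).

    Each round offers the unassigned capacity to the unsatisfied fc-sets in
    proportion to their weights. An fc-set whose demand fits within its offer
    is capped at its demand, and its surplus is offered again in the next round.
    """
    alloc = {name: 0.0 for name in demands}
    active = {name for name, demand in demands.items() if demand > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_weight = sum(weights[name] for name in active)
        offer = {name: remaining * weights[name] / total_weight for name in active}
        satisfied = {name for name in active if demands[name] <= offer[name]}
        if not satisfied:
            # Every remaining fc-set wants more than its offer: split by weight.
            for name in active:
                alloc[name] = offer[name]
            break
        for name in satisfied:
            alloc[name] = float(demands[name])
            remaining -= demands[name]
        active -= satisfied
    return alloc


# Default weights of the visible unicast fc-sets plus two hidden ones.
WEIGHTS = {"fabric_fcset_noloss1": 35, "fabric_fcset_noloss2": 35,
           "fabric_fcset_be": 10, "fabric_fcset_noloss3": 1, "fabric_fcset_noloss4": 1}

# Demands in percent of unicast class group bandwidth (assumed interpretation).
light = {"fabric_fcset_noloss1": 10, "fabric_fcset_noloss2": 15, "fabric_fcset_be": 5,
         "fabric_fcset_noloss3": 100, "fabric_fcset_noloss4": 100}
busier = dict(light, fabric_fcset_noloss1=35)  # noloss1 traffic increases

print(weighted_fair_share(100, light, WEIGHTS))
print(weighted_fair_share(100, busier, WEIGHTS))
```

In the first case each hidden fc-set receives 35 percent of the class group bandwidth. When fabric_fcset_noloss1 rises to its full 35 percent, each hidden fc-set drops to 22.5 percent, which illustrates how the hidden fc-sets lose bandwidth as the visible fc-sets get busier.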

Similarly, if you map traffic to a hidden multidestination fabric fc-set (fabric_fcset_multicast2, fabric_fcset_multicast3, or fabric_fcset_multicast4), that fabric fc-set receives the proportional amount of class group bandwidth that corresponds to its scheduling weight (1). The amount of bandwidth allocated to a hidden multidestination fabric fc-set depends on how much bandwidth the other fc-sets in the multidestination class group consume. When fabric_fcset_multicast1 (the visible fabric fc-set) fully utilizes its bandwidth, the hidden multidestination fabric fc-sets receive only the minimum bandwidth that corresponds to their weight. For example, if fabric_fcset_multicast1 uses its full bandwidth allocation, then each hidden multidestination fabric fc-set receives bandwidth at a rate of 1:20 compared to fabric_fcset_multicast1.

Default Class Group and Fabric FC-Set Scheduling Example

The following example shows how default scheduling allocates the total port bandwidth among the class groups and their fabric fc-sets. In the example, traffic is mapped to each of the forwarding classes in the five visible fabric fc-sets, and the strict-high priority class group consumes an average of 10 percent of the 40-Gbps fabric interface bandwidth (4 Gbps), leaving 90 percent of the fabric interface bandwidth (36 Gbps) for the remaining class groups.

In this scenario, by default, the strict-high priority class group includes one fabric fc-set (fabric_fcset_strict_high), the unicast class group includes three fabric fc-sets (fabric_fcset_be, fabric_fcset_noloss1, and fabric_fcset_noloss2), and the multidestination class group includes one fabric fc-set (fabric_fcset_multicast1). Each individual fabric fc-set receives the following treatment:

  • Strict-high priority class group (fabric_fcset_strict_high)—This group is assumed to average 10 percent (4 Gbps) for the purposes of this example. Because the strict-high priority class group is served first and receives all of the bandwidth it requires to empty its queue, in real networks the amount of bandwidth it requires fluctuates and affects the amount of bandwidth available to the other class groups.

    Tip: To prevent strict-high priority traffic from using too much bandwidth, you can set a maximum bandwidth limit by configuring a scheduler shaping rate for the fabric_fcset_strict_high fabric fc-set.

  • Unicast class group (fabric_fcset_be, fabric_fcset_noloss1, and fabric_fcset_noloss2)—Each of these fabric fc-sets receives a weighted portion of the 80 percent of the total port bandwidth available after the strict-high traffic has been served. The weight corresponds to the transmit rate of each fabric fc-set. The following calculations show the minimum port bandwidth allocated to each of the unicast class group fabric fc-sets:
    • fabric_fcset_be

      (10 / (35 + 35 + 10)) of 80% of the available port bandwidth (12.5 percent of 80 percent of the available port bandwidth)

      The numerator, 10, is the transmit rate weight of fabric_fcset_be. The denominator, (35 + 35 + 10), is the sum of the transmit rate weights of the three fabric fc-sets in the unicast class group.

      The 80 percent is the unicast class group's share of the port bandwidth that remains after strict-high priority traffic is served (36 Gbps).

      The resulting equation is:

      (10 / 80) x (0.8 x 36 Gbps) = 0.125 x 28.8 Gbps = 3.6 Gbps

    • fabric_fcset_noloss1 and fabric_fcset_noloss2

      The default minimum bandwidth for the two visible lossless fabric fc-sets is the same because both of these fabric fc-sets have the same transmit rate weight.

      (35 / (35 + 35 + 10)) of 80% of the available port bandwidth (43.75 percent of 80 percent of the available port bandwidth)

      The numerator, 35, is the transmit rate weight of each of the two lossless fabric fc-sets. The denominator, (35 + 35 + 10), is the sum of the transmit rate weights of the three fabric fc-sets in the unicast class group.

      The 80 percent is the unicast class group's share of the port bandwidth that remains after strict-high priority traffic is served (36 Gbps).

      The resulting equation for each of the two fabric fc-sets is:

      (35 / 80) x (0.8 x 36 Gbps) = 0.4375 x 28.8 Gbps = 12.6 Gbps

  • Multidestination class group (fabric_fcset_multicast1)—Because only one fabric fc-set is in the multidestination class group by default, it receives 100 percent of the 20 percent of the port bandwidth available to the multidestination class group after the strict-high traffic has been served:

    100% of 20% of the available port bandwidth

    The resulting equation is:

    1 x (0.2 x 36 Gbps) = 7.2 Gbps
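The arithmetic in this example can be reproduced with a short Python sketch. It hard-codes the example's assumptions (a 40-Gbps port and an average strict-high load of 4 Gbps) and the default weights from Tables 2 and 3, and it assumes that every queue is backlogged so each fabric fc-set receives exactly its minimum. default_minimums is a hypothetical helper.

```python
PORT_GBPS = 40.0
STRICT_HIGH_GBPS = 4.0  # assumed average strict-high load in this example

# Default class group weights (Table 2) and fabric fc-set transmit rates (Table 3).
CLASS_GROUPS = {
    "unicast": (80, {"fabric_fcset_be": 10,
                     "fabric_fcset_noloss1": 35,
                     "fabric_fcset_noloss2": 35}),
    "multidestination": (20, {"fabric_fcset_multicast1": 20}),
}


def default_minimums(port_gbps, strict_high_gbps):
    """Minimum bandwidth (Gbps) per fabric fc-set when every queue is backlogged."""
    remaining = port_gbps - strict_high_gbps  # served after strict-high traffic
    group_total = sum(weight for weight, _ in CLASS_GROUPS.values())
    result = {"fabric_fcset_strict_high": strict_high_gbps}
    for group_weight, rates in CLASS_GROUPS.values():
        group_gbps = remaining * group_weight / group_total
        rate_total = sum(rates.values())
        for name, rate in rates.items():
            result[name] = group_gbps * rate / rate_total
    return result


for name, gbps in default_minimums(PORT_GBPS, STRICT_HIGH_GBPS).items():
    print(f"{name}: {gbps:.1f} Gbps")
# fabric_fcset_strict_high: 4.0 Gbps
# fabric_fcset_be: 3.6 Gbps
# fabric_fcset_noloss1: 12.6 Gbps
# fabric_fcset_noloss2: 12.6 Gbps
# fabric_fcset_multicast1: 7.2 Gbps
```

The five values add up to the full 40-Gbps port, which is a quick consistency check on the per-fc-set equations.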

Default PFC and Lossless Transport Across the Interconnect Device

The Interconnect device incorporates flow control mechanisms to support lossless transport during periods of congestion on the fabric. To support the priority-based flow control (PFC) feature on the Node devices, the Interconnect device fabric supports lossless transport for up to six IEEE 802.1p priorities when the following two configuration constraints are met:

  1. The IEEE 802.1p priority used for the traffic that requires lossless transport is mapped to a lossless forwarding class (a forwarding class configured with the no-loss parameter or the default fcoe or no-loss forwarding class).
  2. The lossless forwarding class must be mapped to one of the lossless fabric fc-sets (fabric_fcset_noloss1, fabric_fcset_noloss2, fabric_fcset_noloss3, fabric_fcset_noloss4, fabric_fcset_noloss5, or fabric_fcset_noloss6). If you do not explicitly map lossless forwarding classes to fabric fc-sets, lossless forwarding classes are mapped by default to lossless fabric fc-sets fabric_fcset_noloss1 and fabric_fcset_noloss2.

When traffic meets these two constraints, the fabric propagates back-pressure from egress queues during periods of congestion. However, to achieve end-to-end lossless transport across the QFabric system, you must also configure a congestion notification profile to enable PFC on the Node device ingress interfaces. To achieve end-to-end lossless transport across the network, you must configure PFC on all of the devices in the lossless traffic path.

For all other combinations of IEEE 802.1p priority to forwarding class mapping and all other combinations of forwarding class to fabric fc-set mapping, the default congestion control mechanism is normal packet drop. For example:

  • Case 1—If the IEEE 802.1p priority 5 is mapped to the lossless fcoe forwarding class, and the fcoe forwarding class is mapped to the fabric_fcset_noloss1 fabric fc-set, then the congestion control mechanism is PFC.
  • Case 2—If the IEEE 802.1p priority 5 is mapped to the lossless fcoe forwarding class, and the fcoe forwarding class is mapped to the fabric_fcset_be fabric fc-set, then the congestion control mechanism is packet drop, and the traffic does not receive lossless treatment.
  • Case 3—If the IEEE 802.1p priority 5 is mapped to the lossless no-loss forwarding class, and the no-loss forwarding class is mapped to the fabric_fcset_noloss2 fabric fc-set, then the congestion control mechanism is PFC.
  • Case 4—If the IEEE 802.1p priority 5 is mapped to the lossless no-loss forwarding class, and the no-loss forwarding class is mapped to the fabric_fcset_be fabric fc-set, then the congestion control mechanism is packet drop, and the traffic does not receive lossless treatment.
  • Case 5—If the IEEE 802.1p priority 5 is mapped to the lossy best-effort forwarding class, and the best-effort forwarding class is mapped to the fabric_fcset_be fabric fc-set, then the congestion control mechanism is packet drop.
  • Case 6—If the IEEE 802.1p priority 5 is mapped to the lossy best-effort forwarding class, and the best-effort forwarding class is mapped to the fabric_fcset_noloss1 fabric fc-set, then the congestion control mechanism is packet drop.
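The rule behind these six cases is that both constraints must hold at once. The following Python sketch encodes that rule and replays the cases. It models only the fabric side: end-to-end lossless transport also requires a congestion notification profile that enables PFC on the Node device ingress interfaces. LOSSLESS_CLASSES holds only the default fcoe and no-loss forwarding classes (an assumption for brevity); a forwarding class that you create with the no-loss parameter would also need to be added. The function name is hypothetical.

```python
LOSSLESS_FCSETS = {f"fabric_fcset_noloss{i}" for i in range(1, 7)}
# Default lossless forwarding classes only; add any class created with no-loss.
LOSSLESS_CLASSES = {"fcoe", "no-loss"}


def fabric_congestion_control(forwarding_class, fcset):
    """Congestion control the fabric applies to one IEEE 802.1p priority's traffic."""
    if forwarding_class in LOSSLESS_CLASSES and fcset in LOSSLESS_FCSETS:
        return "PFC"
    return "packet drop"


# The six cases above; IEEE 802.1p priority 5 is mapped to forwarding_class.
CASES = [
    ("fcoe", "fabric_fcset_noloss1"),
    ("fcoe", "fabric_fcset_be"),
    ("no-loss", "fabric_fcset_noloss2"),
    ("no-loss", "fabric_fcset_be"),
    ("best-effort", "fabric_fcset_be"),
    ("best-effort", "fabric_fcset_noloss1"),
]
for number, (fc, fcset) in enumerate(CASES, start=1):
    print(f"Case {number}: {fc} -> {fcset}: {fabric_congestion_control(fc, fcset)}")
```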

Note: Lossless transport across the fabric must also meet the following two conditions:

  1. The maximum cable length between the Node device and the Interconnect device is 150 meters of fiber cable.
  2. The maximum frame size is 9216 bytes.

If the MTU is 9216 bytes, in some cases the QFabric system supports only five lossless forwarding classes instead of six because of headroom buffer limitations.

The number of IEEE 802.1p priorities (forwarding classes) the QFabric system can support for lossless transport across the Interconnect device fabric depends on several factors:

  • Approximate fiber cable length—The longer the fiber cable that connects Node device fabric (FTE) ports to the Interconnect device fabric ports, the more data the connected ports need to buffer when a pause is asserted. (The longer the fiber cable, the more frames are traversing the cable when a pause is asserted. Each port must be able to store all of the “in transit” frames in the buffer to preserve lossless behavior and avoid dropping frames.)
  • MTU size—The larger the maximum frame size that the buffer must hold, the more buffer space each frame consumes and the fewer frames the buffer can hold.
  • Total number of Node device fabric ports connected to the Interconnect device—The higher the number of connected fabric ports, the more headroom buffer space the Node device needs on those fabric ports to support the lossless flows that traverse the Interconnect device. Because more buffer space is used on the Node device fabric ports, less buffer space is available for the Node device access ports, and fewer total lossless flows are supported.

The QFabric system supports six lossless priorities (forwarding classes) under most conditions. The priority group headroom that remains after allocating headroom to lossless flows is sufficient to support best-effort and multidestination traffic.

Table 4 shows how many lossless priorities the QFabric system supports under different conditions (fiber cable lengths and MTUs) in cases when the QFabric system supports fewer than six lossless priorities. The number of lossless priorities is the same regardless of how many Node device FTE ports are connected to the Interconnect device. However, the higher the number of FTE ports connected to the Interconnect device, the lower the number of total lossless flows supported. In all cases that are not shown in Table 4, the QFabric system supports six lossless priorities.

Note: The system does not perform a configuration commit check that compares available system resources with the number of lossless forwarding classes configured. If you commit a configuration with more lossless forwarding classes than the system resources can support, frames in lossless forwarding classes might be dropped.

Table 4: Lossless Priority (Forwarding Class) Support for QFX3500 and QFX3600 Node Devices When Fewer than Six Lossless Priorities Are Supported

MTU in Bytes | Fiber Cable Length in Meters (Approximate) | Maximum Number of Lossless Priorities (Forwarding Classes) on the Node Device
9216 (9K) | 100 | 5
9216 (9K) | 150 | 5

Note: The total number of lossless flows decreases as resource consumption increases. For a Node device, the higher the number of FTE ports connected to the Interconnect device, the larger the MTU, and the longer the fiber cable length, the fewer total lossless flows the QFabric system can support.
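Because the system does not perform a commit check for this limit, you might want a pre-commit sanity check of your own. The following Python sketch is a hypothetical helper, not a Junos OS feature. It encodes only Table 4 and the 150-meter and 9216-byte limits stated above, and it makes one labeled assumption: with a 9216-byte MTU, any cable length from 100 to 150 meters is treated as supporting five lossless priorities, because Table 4 lists only approximate 100-meter and 150-meter rows. It does not model the number of connected FTE ports, which affects total lossless flows rather than the number of lossless priorities.

```python
MAX_MTU_BYTES = 9216
MAX_CABLE_M = 150


def max_lossless_priorities(mtu_bytes, cable_m):
    """Lossless priorities supported, based only on Table 4.

    Assumption: with a 9216-byte MTU, any cable from 100 m to 150 m is treated
    like the Table 4 rows (5 priorities). All other cases within the limits
    return 6, as the text states for cases not shown in the table.
    """
    if mtu_bytes > MAX_MTU_BYTES or cable_m > MAX_CABLE_M:
        raise ValueError("outside the limits for lossless transport across the fabric")
    if mtu_bytes == MAX_MTU_BYTES and cable_m >= 100:
        return 5
    return 6


def precommit_check(lossless_classes, mtu_bytes, cable_m):
    """Hypothetical check comparing configured lossless classes with support."""
    supported = max_lossless_priorities(mtu_bytes, cable_m)
    if lossless_classes > supported:
        return f"warning: {lossless_classes} lossless classes, only {supported} supported"
    return "ok"


print(precommit_check(6, 9216, 150))
print(precommit_check(6, 1500, 150))
```

Running a check like this before committing a configuration can flag the case where frames in lossless forwarding classes might be dropped.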

Published: 2013-09-26