Understanding CoS Hierarchical Port Scheduling (ETS)
Scheduling defines the class-of-service (CoS) properties of output queues. Output queues are mapped to forwarding classes. CoS scheduler properties include the amount of interface bandwidth assigned to the queue, the queue priority, and the drop profiles associated with the queue.
Hierarchical port scheduling is a two-tier process that provides better port bandwidth utilization and greater flexibility to allocate resources to queues (forwarding classes) and to groups of queues (forwarding class sets). Hierarchical scheduling includes the Junos OS implementation of enhanced transmission selection (ETS), as described in IEEE 802.1Qaz.
Hierarchical Scheduling Tiers
The two tiers used in hierarchical scheduling are priorities and priority groups, as shown in Table 1.
Junos OS Configuration Construct | Equivalent ETS Construct | Description |
---|---|---|
Forwarding class | Priority | Think about priorities (forwarding classes) as output queues. You map forwarding classes to queues, so each forwarding class represents an output queue. When you use a classifier to map a forwarding class to an IEEE 802.1p code point, the code point identifies that traffic’s priority for priority-based flow control (PFC). Thus the forwarding class, the queue mapped to the forwarding class, and the priority (code point) mapped to the forwarding class all identify the same traffic. |
Forwarding class set | Priority group | Priority groups (forwarding class sets) are groups of priorities (forwarding classes). Forwarding class membership in a forwarding class set defines the priority group to which each priority belongs. You can configure up to three unicast priority groups and one multicast priority group. |
You apply scheduling properties to each hierarchical scheduling tier as described in the next section.
If you explicitly configure one or more priority groups on an interface, any priority (forwarding class) that is not assigned to a priority group (forwarding class set) on that interface is assigned to an automatically generated default priority group and receives no bandwidth. This means that if you configure hierarchical scheduling on an interface, every forwarding class that you want to forward traffic on that interface must belong to a forwarding class set.
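One way to catch this condition before committing a configuration is to compare the forwarding classes in use against the configured forwarding class sets. The following Python sketch is an illustration only: the function name, the data layout, and the forwarding class and set names are assumptions made for the example and are not part of Junos OS.

```python
def find_starved_classes(forwarding_classes, fc_sets):
    """Return the forwarding classes that belong to no forwarding class set.

    forwarding_classes: forwarding classes expected to carry traffic on the port.
    fc_sets: dict of forwarding class set name -> list of member forwarding classes
             (assumes at least one set is explicitly configured on the port).

    Classes returned here would land in the automatically generated default
    priority group and receive no bandwidth.
    """
    assigned = {fc for members in fc_sets.values() for fc in members}
    return [fc for fc in forwarding_classes if fc not in assigned]


# Hypothetical example: "no-loss" was left out of every forwarding class set.
fc_sets = {
    "lan-pg": ["best-effort", "network-control"],
    "san-pg": ["fcoe"],
}
in_use = ["best-effort", "fcoe", "no-loss", "network-control"]
starved = find_starved_classes(in_use, fc_sets)
print(starved)
```

An empty result means every forwarding class in use is a member of some forwarding class set on that interface.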
Hierarchical Scheduling and ETS
Two-tier hierarchical scheduling manages bandwidth efficiently by enabling you to define the CoS properties for each priority group and for each priority. The first tier of the hierarchical scheduler allocates port bandwidth to a priority group. The second tier of the hierarchical scheduler determines the portion of the priority group bandwidth that a priority (queue) can use.
The CoS properties of a priority group define the amount of port bandwidth resources available to the queues in that priority group. The CoS properties you configure for each queue specify the amount of the bandwidth available to the queue from the bandwidth allocated to the priority group. Figure 1 shows the relationship of port resource allocation to priority groups, and priority group resource allocation to queues (priorities).
If a queue (priority) does not use its allocated bandwidth, ETS shares the unused bandwidth among the other queues in the priority group in proportion to the minimum guaranteed rate (transmit rate) scheduled for each queue. If a priority group does not use its allocated bandwidth, ETS shares the unused bandwidth among the priority groups on the port in proportion to the minimum guaranteed rate (guaranteed rate) scheduled for each priority group.
In this way, ETS improves link bandwidth utilization, and it provides each queue and each priority group with the maximum available bandwidth. For example, priorities that consist of bursty traffic can share bandwidth during periods of low traffic transmission, instead of reserving their entire bandwidth allocation when traffic loads are light.
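The proportional sharing described above can be approximated with a small Python model. This is a simplified sketch under stated assumptions, not the switch's scheduler: it assumes rates in Gbps, known offered loads, and an iterative redistribution of spare bandwidth weighted by each member's minimum guaranteed rate. The function and parameter names are hypothetical.

```python
def share_bandwidth(total, guaranteed, demand):
    """Simplified model of ETS-style sharing within one scheduling tier.

    total:      bandwidth available to the tier (port or priority group).
    guaranteed: dict of member name -> minimum guaranteed rate.
    demand:     dict of member name -> offered load.
    Returns a dict of member name -> allocated bandwidth.
    """
    # Every member first receives its demand, capped at its guaranteed rate.
    alloc = {n: min(demand[n], guaranteed[n]) for n in guaranteed}
    spare = total - sum(alloc.values())
    hungry = {n for n in guaranteed if demand[n] > alloc[n]}

    # Hand out spare bandwidth in proportion to the guaranteed rates of the
    # members that still have traffic to send.
    while spare > 1e-9 and hungry:
        weight = sum(guaranteed[n] for n in hungry)
        if weight == 0:
            break
        granted = 0.0
        for n in list(hungry):
            offer = spare * guaranteed[n] / weight
            take = min(offer, demand[n] - alloc[n])
            alloc[n] += take
            granted += take
            if demand[n] - alloc[n] <= 1e-9:
                hungry.discard(n)
        spare -= granted
    return alloc


# Queue "a" is lightly loaded, so its unused 4 Gbps is split between
# "b" and "c" in a 3:2 ratio (their transmit rates).
queues = share_bandwidth(
    total=10.0,
    guaranteed={"a": 5.0, "b": 3.0, "c": 2.0},
    demand={"a": 1.0, "b": 10.0, "c": 10.0},
)
print(queues)
```

The same function can model the first tier by passing the port bandwidth as `total` and the guaranteed rates of the priority groups as `guaranteed`.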
The available link bandwidth is the bandwidth remaining after servicing strict-high priority flows. Strict-high priority takes precedence over all other traffic. We recommend that you configure a shaping-rate (transmit-rate on QFX10000 switches) to limit the maximum amount of bandwidth that a strict-high priority forwarding class can use to prevent starving other queues.
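The effect of capping strict-high priority traffic can be shown with simple arithmetic. The Python sketch below is a hypothetical model: the function and parameter names are assumptions, rates are in Gbps, and the cap stands in for the configured shaping-rate (transmit-rate on QFX10000 switches).

```python
def available_link_bandwidth(port_gbps, strict_high_load_gbps, strict_high_cap_gbps=None):
    """Bandwidth left for ETS after strict-high priority traffic is served.

    Without a cap, strict-high traffic can consume the entire port.
    """
    served = min(strict_high_load_gbps, port_gbps)
    if strict_high_cap_gbps is not None:
        served = min(served, strict_high_cap_gbps)
    return port_gbps - served


# A 12-Gbps strict-high burst on a 10-Gbps port, without and with a 4-Gbps cap.
uncapped = available_link_bandwidth(10, 12)
capped = available_link_bandwidth(10, 12, strict_high_cap_gbps=4)
print(uncapped, capped)
```

In this model, the uncapped burst leaves nothing for the other queues, while the cap preserves 6 Gbps for them.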
ETS Advertisement in DCBX
When you configure hierarchical scheduling on a port, Data Center Bridging Capability Exchange protocol (DCBX) advertises:
- Each priority group
- The priorities in each priority group
- The bandwidth properties of each priority group and priority
When you configure hierarchical scheduling on a port, any priority that is not part of an explicitly configured priority group is assigned to the automatically generated default priority group and receives no bandwidth. The default priority group is transparent. It does not appear in the configuration.
Hierarchical Scheduling Process
Hierarchical scheduling consists of multiple configuration steps that create the priorities and the priority groups, schedule their resources, and assign them to interfaces. The steps below correspond to the six blocks in the packet flow diagram shown in Figure 2:
1. Packet classification:
   - Configure classification of incoming traffic into forwarding classes (priorities). This consists of either using the default classifiers or configuring classifiers to map code points and loss priorities to the forwarding classes.
   - Apply the classifiers to ingress interfaces or use the default classifiers. Applying a classifier to an interface groups incoming traffic on the interface into forwarding classes and loss priorities, by applying the classifier code point mapping to the incoming traffic.
2. Configure the output queues for the forwarding classes (priorities). This consists of either using the default forwarding classes and forwarding-class-to-queue mapping, or creating your own forwarding classes and mapping them to output queues.
3. Allocate resources to the forwarding classes:
   - Define resources for the priorities. This consists of configuring schedulers to set minimum guaranteed bandwidth, maximum bandwidth, drop profiles for Weighted Random Early Detection (WRED), and bandwidth priority to apply to a forwarding class. Extra bandwidth is shared among queues in proportion to the minimum guaranteed bandwidth (transmit rate) of each queue.
   - Map resources to priorities. This consists of mapping forwarding classes to schedulers, using a scheduler map.
4. Configure priority groups. This consists of mapping forwarding classes (priorities) to forwarding class sets (priority groups) to define the priorities that belong to each priority group.
5. Define resources for the priority groups. This consists of configuring traffic control profiles to set minimum guaranteed bandwidth (guaranteed-rate) and maximum bandwidth (shaping-rate on switches other than QFX10000 switches, transmit-rate on QFX10000 switches) for a priority group. Traffic control profiles also specify a scheduler map, which defines the resources (schedulers) mapped to the priorities in the priority group. Extra port bandwidth is shared among priority groups in proportion to the minimum guaranteed bandwidth of each priority group.

   The traffic control profile bandwidth settings determine the port resources available to the priority group. The schedulers specified in the scheduler map determine the amount of priority group resources that each priority receives.

   Note: QFX10000 switches do not support defining a shaping rate for priority groups. Instead, set the maximum bandwidth for a priority group by defining a transmit rate. See transmit-rate.
6. Apply hierarchical scheduling to a port. This consists of attaching one or more priority groups (forwarding class sets) to an interface. For each priority group, you also attach a traffic control profile, which contains the scheduling properties of the priority group and the priorities in the priority group. Different priority groups on the same port can use different traffic control profiles, which provides fine-tuned control of scheduling for each queue on each interface.
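To see how the pieces configured in the steps above relate to one another, the following Python sketch models them as plain data structures and resolves the scheduling context for one packet. It is a conceptual illustration, not Junos OS configuration syntax: every class, variable, and function name is hypothetical, all rates and mappings are example values in Gbps, and a single global classifier stands in for classifiers applied to ingress interfaces.

```python
from dataclasses import dataclass


@dataclass
class Scheduler:
    transmit_rate_gbps: float    # minimum guaranteed bandwidth for the queue
    priority: str = "low"        # bandwidth priority


@dataclass
class TrafficControlProfile:
    guaranteed_rate_gbps: float  # minimum guaranteed bandwidth for the group
    max_rate_gbps: float         # shaping-rate (transmit-rate on QFX10000)
    scheduler_map: dict          # forwarding class -> Scheduler


# Packet classification: IEEE 802.1p code point -> forwarding class.
classifier = {0: "best-effort", 3: "fcoe"}

# Output queues: forwarding class -> queue number.
fc_to_queue = {"best-effort": 0, "fcoe": 3}

# Resources for priorities, mapped to forwarding classes by scheduler maps.
lan_map = {"best-effort": Scheduler(transmit_rate_gbps=4.0)}
san_map = {"fcoe": Scheduler(transmit_rate_gbps=5.0)}

# Priority groups: forwarding class set -> member forwarding classes.
fc_sets = {"lan-pg": ["best-effort"], "san-pg": ["fcoe"]}

# Resources for priority groups: traffic control profiles.
profiles = {
    "lan-tcp": TrafficControlProfile(4.0, 10.0, lan_map),
    "san-tcp": TrafficControlProfile(5.0, 10.0, san_map),
}

# Apply to a port: interface -> {forwarding class set: traffic control profile}.
port_bindings = {"xe-0/0/1": {"lan-pg": "lan-tcp", "san-pg": "san-tcp"}}


def resolve(egress_interface, code_point):
    """Return the scheduling context for a packet, or None if its forwarding
    class is in no forwarding class set attached to the egress interface."""
    fc = classifier[code_point]
    for set_name, profile_name in port_bindings[egress_interface].items():
        if fc in fc_sets[set_name]:
            profile = profiles[profile_name]
            return {
                "forwarding_class": fc,
                "queue": fc_to_queue[fc],
                "priority_group": set_name,
                "group_guaranteed_gbps": profile.guaranteed_rate_gbps,
                "queue_transmit_gbps": profile.scheduler_map[fc].transmit_rate_gbps,
            }
    return None


context = resolve("xe-0/0/1", code_point=3)
print(context)
```

A `None` result corresponds to a forwarding class that falls into the default priority group and therefore receives no bandwidth on that port.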
Strict-High Priority Queues and Hierarchical Scheduling
If you configure a strict-high priority queue, you must observe the following rules:
- You must create a separate forwarding class set (priority group) for the strict-high priority queue.
- Only one forwarding class set can contain strict-high priority queues.
- Strict-high priority queues cannot belong to the same forwarding class set as queues that are not strict-high priority.
- A strict-high priority queue cannot belong to a multidestination forwarding class set.
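The rules above lend themselves to an automated check of a planned design. The Python sketch below is one hypothetical way to express them: the function name, arguments, and message wording are assumptions for illustration, and the check reflects only the rules listed in this section, not the validation that Junos OS performs at commit time.

```python
def check_strict_high_rules(fc_sets, strict_high, multidestination_sets=frozenset()):
    """Return a list of rule violations (empty if the design follows the rules).

    fc_sets:               dict of forwarding class set name -> member forwarding classes.
    strict_high:           set of forwarding classes scheduled as strict-high priority.
    multidestination_sets: names of multidestination forwarding class sets.
    """
    problems = []
    # Forwarding class sets that contain at least one strict-high priority queue.
    holders = [
        name for name, members in fc_sets.items()
        if any(fc in strict_high for fc in members)
    ]
    if len(holders) > 1:
        problems.append(
            "more than one forwarding class set contains strict-high queues: "
            + ", ".join(sorted(holders))
        )
    for name in holders:
        if any(fc not in strict_high for fc in fc_sets[name]):
            problems.append(f"{name} mixes strict-high and other queues")
        if name in multidestination_sets:
            problems.append(f"{name} is multidestination and cannot hold strict-high queues")
    return problems


# Hypothetical designs: the first keeps strict-high traffic in its own set,
# the second mixes it with best-effort traffic.
valid = check_strict_high_rules(
    {"sh-pg": ["voice"], "lan-pg": ["best-effort"]}, strict_high={"voice"}
)
mixed = check_strict_high_rules(
    {"lan-pg": ["voice", "best-effort"]}, strict_high={"voice"}
)
print(valid, mixed)
```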
We recommend that you always apply a shaping-rate (transmit-rate on QFX10000 switches) to strict-high priority queues to limit the amount of bandwidth a strict-high priority queue can use. If you do not limit the amount of bandwidth a strict-high priority queue can use, then the strict-high priority queue can use all of the available port bandwidth and starve other queues on the port.
On a QFabric system, if a fabric (fte) interface handles strict-high priority traffic, you must define a separate forwarding class set (priority group) for strict-high priority traffic. Strict-high priority traffic cannot be mixed with traffic of other priorities in a forwarding class set. For example, you might choose to create different forwarding class sets for best effort, lossless, strict-high priority, and multidestination traffic.
Default Hierarchical Scheduling
There is no default hierarchical scheduling on QFX10000 switches. QFX10000 switches use port scheduling by default, and you must explicitly configure hierarchical scheduling to enable ETS. Also on QFX10000 switches, changing from port scheduler to ETS or from ETS to port scheduler requires a reboot.
If you do not explicitly configure hierarchical scheduling, the switch uses the default settings:
- The switch automatically creates a default forwarding class set that contains all of the forwarding classes on the switch. The switch assigns 100 percent of the port output bandwidth to the default forwarding class set. The default forwarding class set is transparent. It does not appear in the configuration and is used for Data Center Bridging Capability Exchange protocol (DCBX) advertisement.
- Ingress traffic is classified based on the default classifier settings.
- The forwarding classes (queues) in the default forwarding class set receive bandwidth based on the default scheduler settings.