Understanding Default CoS Scheduling and Classification
If you do not explicitly configure classifiers and apply them to interfaces, the switch uses the default classifier to group ingress traffic into forwarding classes. If you do not configure scheduling on an interface, the switch uses the default schedulers to provide egress port resources for traffic. Default classification maps all traffic into default forwarding classes (best-effort, fcoe, no-loss, network-control, and mcast). Each default forwarding class has a default scheduler, so that the traffic mapped to each default forwarding class receives port bandwidth, prioritization, and packet drop characteristics.
The switch supports direct port scheduling. Except on QFX5200 and QFX5210 switches, it also supports enhanced transmission selection (ETS), also known as hierarchical port scheduling.
Hierarchical scheduling groups IEEE 802.1p priorities (IEEE 802.1p code points, which classifiers map to forwarding classes, which in turn are mapped to output queues) into priority groups (forwarding class sets). If you use only the default traffic scheduling and classification, the switch automatically creates a default priority group that contains all of the priorities (which are mapped to forwarding classes and output queues), and assigns 100 percent of the port output bandwidth to that priority group. The forwarding classes (queues) in the default forwarding class set receive bandwidth based on the default scheduler settings. The default priority group is transparent. It does not appear in the configuration and is used for Data Center Bridging Capability Exchange (DCBX) protocol advertisement.
If you explicitly configure one or more priority groups on an interface, any forwarding class that is not assigned to a priority group on that interface receives no bandwidth. This means that if you configure hierarchical scheduling on an interface, every forwarding class (priority) that you want to forward traffic on that interface must belong to a forwarding class set (priority group). ETS is not supported on QFX5200 or QFX5210 switches.
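The rule above can be checked mechanically. The following is a minimal Python sketch, not switch code: the function name and the forwarding class set names (`lan-pg`, `san-pg`) are hypothetical, and the sketch only models membership, not bandwidth.

```python
def classes_without_bandwidth(forwarding_classes, forwarding_class_sets):
    """Return the forwarding classes that belong to no forwarding class set.

    forwarding_class_sets maps a (hypothetical) set name to the forwarding
    classes it contains. If no sets are configured, the automatically created
    default priority group contains every priority, so nothing is left out.
    """
    if not forwarding_class_sets:
        return set()
    assigned = set().union(*forwarding_class_sets.values())
    return set(forwarding_classes) - assigned


classes = ["best-effort", "fcoe", "no-loss", "network-control"]
configured = {
    "lan-pg": {"best-effort", "network-control"},  # hypothetical name
    "san-pg": {"fcoe"},                            # hypothetical name
}
# no-loss is in neither set, so it would receive no bandwidth on this interface.
starved = classes_without_bandwidth(classes, configured)
```

Running a check like this against a planned configuration is one way to catch a forwarding class that was left out of every priority group.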
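The following sections describe default classification, default scheduling, default DCBX advertisement, and a summary of default scheduling and classification.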
Default Classification
On switches other than QFX10000 switches and NFX Series devices, the default classifiers assign unicast and multicast best-effort and network-control ingress traffic to default forwarding classes and loss priorities. The switch applies default unicast IEEE 802.1, unicast DSCP, and multidestination classifiers to each interface that does not have explicitly configured classifiers.
On QFX10000 switches and NFX Series devices, the default classifiers assign ingress traffic to default forwarding classes and loss priorities. The switch applies default IEEE 802.1, DSCP, and DSCP IPv6 classifiers to each interface that does not have explicitly configured classifiers.
If you do not configure and apply EXP classifiers for MPLS traffic to logical interfaces, MPLS traffic on interfaces configured as family mpls uses the IEEE classifier.
If you explicitly configure one type of classifier but not other types of classifiers, the system uses only the configured classifier and does not use default classifiers for other types of traffic. There are two default IEEE 802.1 classifiers: a trusted classifier for ports that are in trunk mode or tagged-access mode, and an untrusted classifier for ports that are in access mode.
The default classifiers apply to unicast traffic except on QFX10000 switches and NFX Series devices. Tagged-access mode does not apply to QFX10000 switches or NFX Series devices.
Table 1 shows the default mapping of IEEE 802.1 code-point values to forwarding classes and loss priorities for ports in trunk mode or tagged-access mode.
| Code Point | Forwarding Class | Loss Priority |
|---|---|---|
| be (000) | best-effort | low |
| be1 (001) | best-effort | low |
| ef (010) | best-effort | low |
| ef1 (011) | fcoe | low |
| af11 (100) | no-loss | low |
| af12 (101) | best-effort | low |
| nc1 (110) | network-control | low |
| nc2 (111) | network-control | low |
Table 2 shows the default mapping of IEEE 802.1p code-point values to forwarding classes and loss priorities for ports in access mode (all incoming traffic is mapped to best-effort forwarding classes).
Table 2 applies only to unicast traffic except on QFX10000 switches and NFX Series devices.
| Code Point | Forwarding Class | Loss Priority |
|---|---|---|
| 000 | best-effort | low |
| 001 | best-effort | low |
| 010 | best-effort | low |
| 011 | best-effort | low |
| 100 | best-effort | low |
| 101 | best-effort | low |
| 110 | best-effort | low |
| 111 | best-effort | low |
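Tables 1 and 2 amount to two lookup tables selected by port mode. The following Python sketch illustrates that selection for unicast traffic only; the function name and the `port_mode` strings are illustrative assumptions, not Junos identifiers.

```python
# Default IEEE 802.1 classifier for trunk or tagged-access ports (Table 1).
# Keys are 3-bit code points; values are (forwarding class, loss priority).
TRUSTED = {
    0b000: ("best-effort", "low"),
    0b001: ("best-effort", "low"),
    0b010: ("best-effort", "low"),
    0b011: ("fcoe", "low"),
    0b100: ("no-loss", "low"),
    0b101: ("best-effort", "low"),
    0b110: ("network-control", "low"),
    0b111: ("network-control", "low"),
}

# Default classifier for access ports (Table 2): every code point is best-effort.
UNTRUSTED = {code_point: ("best-effort", "low") for code_point in range(8)}

# Hypothetical labels for the port modes named in the text.
TRUSTED_MODES = {"trunk", "tagged-access"}
UNTRUSTED_MODES = {"access"}


def classify(code_point, port_mode):
    """Return (forwarding class, loss priority) for a unicast frame."""
    if not 0 <= code_point <= 7:
        raise ValueError("IEEE 802.1p code point must be in the range 0..7")
    if port_mode in TRUSTED_MODES:
        return TRUSTED[code_point]
    if port_mode in UNTRUSTED_MODES:
        return UNTRUSTED[code_point]
    raise ValueError(f"unknown port mode: {port_mode!r}")
```

For example, code point 011 maps to fcoe on a trunk port but to best-effort on an access port.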
Table 3 shows the default mapping of IEEE 802.1 code-point values to multidestination (multicast, broadcast, and destination lookup fail traffic) forwarding classes and loss priorities.
Table 3 does not apply to QFX10000 switches or NFX Series devices.
| Code Point | Forwarding Class | Loss Priority |
|---|---|---|
| be (000) | mcast | low |
| be1 (001) | mcast | low |
| ef (010) | mcast | low |
| ef1 (011) | mcast | low |
| af11 (100) | mcast | low |
| af12 (101) | mcast | low |
| nc1 (110) | mcast | low |
| nc2 (111) | mcast | low |
Table 4 shows the default mapping of DSCP code-point values to forwarding classes and loss priorities for DSCP IP and DSCP IPv6.
Table 4 applies only to unicast traffic except on QFX10000 switches and NFX Series devices.
| Code Point | Forwarding Class | Loss Priority |
|---|---|---|
| ef (101110) | best-effort | low |
| af11 (001010) | best-effort | low |
| af12 (001100) | best-effort | low |
| af13 (001110) | best-effort | low |
| af21 (010010) | best-effort | low |
| af22 (010100) | best-effort | low |
| af23 (010110) | best-effort | low |
| af31 (011010) | best-effort | low |
| af32 (011100) | best-effort | low |
| af33 (011110) | best-effort | low |
| af41 (100010) | best-effort | low |
| af42 (100100) | best-effort | low |
| af43 (100110) | best-effort | low |
| be (000000) | best-effort | low |
| cs1 (001000) | best-effort | low |
| cs2 (010000) | best-effort | low |
| cs3 (011000) | best-effort | low |
| cs4 (100000) | best-effort | low |
| cs5 (101000) | best-effort | low |
| nc1 (110000) | network-control | low |
| nc2 (111000) | network-control | low |
There are no default DSCP IP or DSCP IPv6 classifiers for multidestination traffic, and DSCP IPv6 multidestination classifiers are not supported.
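As an illustration of how Table 4 is applied, the sketch below takes the DSCP value from the upper six bits of the IPv4 ToS or IPv6 Traffic Class byte and looks it up. The function names are hypothetical. Table 4 does not say how unlisted code points are treated, so the sketch returns `None` for them rather than guessing.

```python
# Code points listed in Table 4, keyed by alias.
DSCP_CODE_POINTS = {
    "ef": 0b101110,
    "af11": 0b001010, "af12": 0b001100, "af13": 0b001110,
    "af21": 0b010010, "af22": 0b010100, "af23": 0b010110,
    "af31": 0b011010, "af32": 0b011100, "af33": 0b011110,
    "af41": 0b100010, "af42": 0b100100, "af43": 0b100110,
    "be": 0b000000,
    "cs1": 0b001000, "cs2": 0b010000, "cs3": 0b011000,
    "cs4": 0b100000, "cs5": 0b101000,
    "nc1": 0b110000, "nc2": 0b111000,
}
LISTED = set(DSCP_CODE_POINTS.values())
NETWORK_CONTROL = {DSCP_CODE_POINTS["nc1"], DSCP_CODE_POINTS["nc2"]}


def dscp_from_tos(tos_byte):
    """Extract the 6-bit DSCP from an 8-bit ToS / Traffic Class byte."""
    return (tos_byte >> 2) & 0x3F


def classify_dscp(dscp):
    """Return (forwarding class, loss priority), or None if not in Table 4."""
    if dscp not in LISTED:
        return None
    if dscp in NETWORK_CONTROL:
        return ("network-control", "low")
    return ("best-effort", "low")
```

Note that in the default mapping even ef (101110) is classified as best-effort; only nc1 and nc2 map to network-control.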
Table 5 shows the default mapping of MPLS EXP code-point values to forwarding classes and loss priorities, which apply only on QFX10000 switches and NFX Series devices.
| Code Point | Forwarding Class | Loss Priority |
|---|---|---|
| 000 | best-effort | low |
| 001 | best-effort | high |
| 010 | expedited-forwarding | low |
| 011 | expedited-forwarding | high |
| 100 | assured-forwarding | low |
| 101 | assured-forwarding | high |
| 110 | network-control | low |
| 111 | network-control | high |
Default Scheduling
The default schedulers allocate egress bandwidth resources to egress traffic as shown in Table 6:
| Default Scheduler and Queue Number | Transmit Rate (Guaranteed Minimum Bandwidth) | Shaping Rate (Maximum Bandwidth) | Excess Bandwidth Sharing | Priority | Buffer Size |
|---|---|---|---|---|---|
| best-effort forwarding class scheduler (queue 0) | 5%; 15% (QFX10000, NFX Series) | None | 5%; 15% (QFX10000, NFX Series) | low | 5%; 15% (QFX10000, NFX Series) |
| fcoe forwarding class scheduler (queue 3) | 35% | None | 35% | low | 35% |
| no-loss forwarding class scheduler (queue 4) | 35% | None | 35% | low | 35% |
| network-control forwarding class scheduler (queue 7) | 5%; 15% (QFX10000, NFX Series) | None | 5%; 15% (QFX10000, NFX Series) | low | 5%; 15% (QFX10000, NFX Series) |
| (Excluding QFX10000 and NFX Series) mcast forwarding class scheduler (queue 8) | 20% | None | 20% | low | 20% |
By default, the minimum guaranteed bandwidth (transmit rate) determines the amount of excess (extra) bandwidth that a queue can share. Extra bandwidth is allocated to queues in proportion to the transmit rate of each queue. On switches that support the excess-rate statement, you can override the default setting and configure the excess bandwidth percentage independently of the transmit rate on queues that are not strict-high priority queues.
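The proportional sharing described above can be sketched in a few lines of Python. This is a simplified model, not the switch's scheduler: it assumes every active queue always has traffic to send and every idle queue releases its whole share. The rates are the QFX10000 and NFX Series defaults from Table 6, and the function name is hypothetical.

```python
# Default transmit rates (percent) on QFX10000 and NFX Series, from Table 6.
DEFAULT_TRANSMIT_RATE = {
    "best-effort": 15,
    "fcoe": 35,
    "no-loss": 35,
    "network-control": 15,
}


def share_with_excess(transmit_rate, active):
    """Return the percent of port bandwidth each active queue ends up with.

    Simplifying assumption: active queues are always backlogged, so the
    bandwidth of idle queues is redistributed in proportion to the transmit
    rate of the active queues.
    """
    total = sum(transmit_rate[queue] for queue in active)
    if total == 0:
        raise ValueError("at least one active queue needs a nonzero transmit rate")
    return {queue: 100 * transmit_rate[queue] / total for queue in active}


# fcoe is idle, so its 35 percent is split 15:35:15 among the other queues.
shares = share_with_excess(
    DEFAULT_TRANSMIT_RATE, {"best-effort", "no-loss", "network-control"}
)
```

In this model, no-loss grows from 35 percent to roughly 54 percent of the port, while best-effort and network-control each grow from 15 percent to roughly 23 percent.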
By default, only the four (QFX10000 switches and NFX Series devices) or five (other switches) default schedulers shown in Table 6 have traffic mapped to them. Only the forwarding classes and queues associated with the default schedulers receive default bandwidth, based on the default scheduler transmit rate. (You can configure schedulers and forwarding classes to allocate bandwidth to other queues or to change the bandwidth and other scheduling properties of a default queue.)
On QFX10000 switches and NFX Series devices, if a forwarding class does not transport traffic, the bandwidth allocated to that forwarding class is available to other forwarding classes. Unicast and multidestination (multicast, broadcast, and destination lookup fail) traffic use the same forwarding classes and output queues.
On switches other than QFX10000 and NFX Series devices, multidestination queue 11 receives enough bandwidth from the default multidestination scheduler to handle CPU-generated multidestination traffic.
On QFX10000 and NFX Series devices, default scheduling is port scheduling. Default hierarchical scheduling, known as enhanced transmission selection (ETS, defined in IEEE 802.1Qaz), allocates the total port bandwidth to the four default forwarding classes as defined by the four default schedulers. The result is the same as direct port scheduling. Configuring hierarchical port scheduling, however, enables you to group forwarding classes that carry similar types of traffic into forwarding class sets (also called priority groups), and to assign port bandwidth to each forwarding class set. The port bandwidth assigned to the forwarding class set is then assigned to the forwarding classes within the forwarding class set. This hierarchy enables you to control port bandwidth allocation with greater granularity, and enables hierarchical sharing of extra bandwidth to better utilize link bandwidth.
Except on QFX10000 switches and NFX Series devices, default hierarchical scheduling divides the total port bandwidth between two groups of traffic: unicast traffic and multidestination traffic. By default, unicast traffic consists of queue 0 (best-effort forwarding class), queue 3 (fcoe forwarding class), queue 4 (no-loss forwarding class), and queue 7 (network-control forwarding class). Unicast traffic receives and shares a total of 80 percent of the port bandwidth. By default, multidestination traffic (mcast queue 8) receives a total of 20 percent of the port bandwidth. So on a 10-Gigabit port, unicast traffic receives 8 Gbps of bandwidth and multidestination traffic receives 2 Gbps of bandwidth.
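The 10-Gigabit example can be broken down per queue with the default transmit rates from Table 6 (the values for switches other than QFX10000 and NFX Series). The following Python sketch is plain arithmetic; the variable and function names are illustrative.

```python
PORT_GBPS = 10

# Default transmit rate (percent) by queue number, from Table 6.
DEFAULT_RATE_PERCENT = {0: 5, 3: 35, 4: 35, 7: 5, 8: 20}
UNICAST_QUEUES = (0, 3, 4, 7)
MULTIDESTINATION_QUEUES = (8,)


def guaranteed_gbps(queue, port_gbps=PORT_GBPS):
    """Guaranteed minimum bandwidth of a default queue, in Gbps."""
    return port_gbps * DEFAULT_RATE_PERCENT[queue] / 100


unicast_gbps = sum(guaranteed_gbps(q) for q in UNICAST_QUEUES)
multidestination_gbps = sum(guaranteed_gbps(q) for q in MULTIDESTINATION_QUEUES)

for queue in sorted(DEFAULT_RATE_PERCENT):
    print(f"queue {queue}: {guaranteed_gbps(queue)} Gbps")
print(f"unicast total: {unicast_gbps} Gbps")
print(f"multidestination total: {multidestination_gbps} Gbps")
```

Queues 0 and 7 are each guaranteed 0.5 Gbps and queues 3 and 4 are each guaranteed 3.5 Gbps, which together account for the 8 Gbps unicast share.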
Except on QFX5200, QFX5210, and QFX10000 switches and NFX Series devices, which do not support queue 11, multidestination queue 11 also receives a small amount of default bandwidth from the multidestination scheduler. CPU-generated multidestination traffic uses queue 11, so you might see a small number of packets egress from queue 11. In addition, in the unlikely case that firewall filter match conditions map multidestination traffic to a unicast forwarding class, that traffic uses queue 11.
Default scheduling uses weighted round-robin (WRR) scheduling. Each queue receives a portion (weight) of the total available interface bandwidth. The scheduling weight is based on the transmit rate of the default scheduler for that queue. For example, queue 7 receives a default scheduling weight of 5 percent, or 15 percent on QFX10000 and NFX Series devices, of the available bandwidth, and queue 4 receives a default scheduling weight of 35 percent of the available bandwidth. Queues are mapped to forwarding classes, so forwarding classes receive the default bandwidth for the queues to which they are mapped.
On QFX10000 switches and NFX Series devices, for example, queue 7 is mapped to the network-control forwarding class and queue 4 is mapped to the no-loss forwarding class. Each forwarding class receives the default bandwidth for the queue to which it is mapped. Unused bandwidth is shared with other default queues.
If you want non-default (unconfigured) queues to forward traffic, you should explicitly map traffic to those queues (configure the forwarding classes and queue mapping) and create schedulers to allocate bandwidth to those queues. By default, queues 1, 2, 5, and 6 are unconfigured.
Except on QFX5200, QFX5210, and QFX10000 switches and NFX Series devices, which do not support them, multidestination queues 9, 10, and 11 are unconfigured. Unconfigured queues have a default scheduling weight of 1 so that they can receive a small amount of bandwidth in case they need to forward traffic. However, queue 11 can use more of the default multidestination scheduler bandwidth if necessary to handle CPU-generated multidestination traffic.
All four (two on QFX5200 and QFX5210 switches) multidestination queues have a scheduling weight of 1. Because by default multidestination traffic goes to queue 8, queue 8 receives almost all of the multidestination bandwidth. (There is no traffic on queue 9 and queue 10, and very little traffic on queue 11, so there is almost no competition for multidestination bandwidth.)
However, if you explicitly configure queue 9, 10, or 11 (by mapping code points to the unconfigured multidestination forwarding classes using the multidestination classifier), the explicitly configured queues share the multidestination scheduler bandwidth equally with default queue 8, because all of the queues have the same scheduling weight (1). To ensure that multidestination bandwidth is allocated to each queue properly and that the bandwidth allocation to the default queue (8) is not reduced too much, we strongly recommend that you configure a scheduler if you explicitly classify traffic into queue 9, 10, or 11.
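To see why a scheduler is recommended, consider what equal weights do to queue 8. The sketch below is a simplified model that assumes every queue carrying traffic is always backlogged and ignores the extra bandwidth queue 11 can take for CPU-generated traffic. The 2 Gbps figure is the multidestination share of a 10-Gigabit port, and the function name is hypothetical.

```python
# All multidestination queues have a default scheduling weight of 1.
MULTIDESTINATION_WEIGHT = {8: 1, 9: 1, 10: 1, 11: 1}


def multidestination_share(active_queues, multidestination_gbps=2.0):
    """Split the multidestination bandwidth among backlogged queues by weight."""
    total_weight = sum(MULTIDESTINATION_WEIGHT[q] for q in active_queues)
    if total_weight == 0:
        raise ValueError("at least one active queue is required")
    return {
        q: multidestination_gbps * MULTIDESTINATION_WEIGHT[q] / total_weight
        for q in active_queues
    }


default_case = multidestination_share([8])      # only queue 8 carries traffic
with_queue_9 = multidestination_share([8, 9])   # traffic also classified to queue 9
```

In this model, classifying traffic into queue 9 without a scheduler cuts the bandwidth available to queue 8 from 2 Gbps to 1 Gbps, and using all four queues cuts it to 0.5 Gbps.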
If you map traffic to an unconfigured queue, the queue receives only the amount of excess bandwidth proportional to its default weight (1). The actual amount of bandwidth an unconfigured queue gets depends on how much bandwidth the other queues are using.
If some queues use less than their allocated amount of bandwidth, the unconfigured queues can share the unused bandwidth. Sharing unused bandwidth is one of the key advantages of hierarchical port scheduling. Configured queues have higher priority for bandwidth than unconfigured queues, so if a configured queue needs more bandwidth, then less bandwidth is available for unconfigured queues. Unconfigured queues always receive a minimum amount of bandwidth based on their scheduling weight (1). To allocate bandwidth to an unconfigured queue to which you map traffic, configure a scheduler for the forwarding class that is mapped to that queue.
Default DCBX Advertisement
When you configure hierarchical scheduling on an interface, DCBX advertises each priority group, the priorities in each priority group, and the bandwidth properties of each priority and priority group.
If you do not configure hierarchical scheduling on an interface, DCBX advertises the automatically created default priority group and its priorities. DCBX also advertises the default bandwidth allocation of the priority group, which is 100 percent of the port bandwidth.
Default Scheduling and Classification Summary
If you do not configure scheduling on an interface:
Default classifiers classify ingress traffic.
Default schedulers schedule egress traffic.
DCBX advertises a single default priority group with 100 percent of the port bandwidth allocated to that priority group. All priorities (forwarding classes) are assigned to the default priority group and receive bandwidth based on their default schedulers. The default priority group is generated automatically and is not user-configurable.