Understanding CoS IEEE 802.1p Priorities for Lossless Traffic Flows
The switch supports up to six lossless forwarding classes. (Junos OS Release 12.3 increased support for lossless priorities from two lossless forwarding classes, the default fcoe and no-loss forwarding classes, to a maximum of six lossless forwarding classes.) Each forwarding class is mapped to an IEEE 802.1p code point (priority).
Junos OS Release 13.1 introduced support for up to six lossless forwarding classes on QFabric systems. Throughout this document, features introduced on standalone switches in Junos OS Release 12.3 are introduced on QFabric systems in Junos OS Release 13.1 unless otherwise noted.
Only switches with native Fibre Channel (FC) interfaces, such as the QFX3500, support native FC traffic and configuration as an FCoE-FC gateway. Throughout this document, features that pertain to native FC traffic and to FCoE-FC gateway configuration apply only to switches that support native FC interfaces.
The default configuration is the same as the default configuration in Junos OS Release 12.2 and is backward-compatible. If you need only two (or fewer) lossless forwarding classes, use the default configuration, in which the fcoe and no-loss forwarding classes are lossless. If you need more than two lossless forwarding classes, you can use the two default lossless forwarding classes and configure additional lossless forwarding classes. If you do not want to use the default lossless forwarding classes, you can change them, or use only the lossless forwarding classes that you explicitly configure.
Default Lossless Priority Configuration
If you do not explicitly configure forwarding classes, the system uses the default forwarding class configuration, which provides two default lossless forwarding classes (fcoe and no-loss). (If you change the forwarding class configuration, the changes apply to all traffic on that device because forwarding classes are global to a particular device.)
If you do not explicitly configure classifiers, and you do not explicitly configure flow control to pause output queues (configured in the output stanza of the congestion notification profile, or CNP), the default classifier and the default output queue pause configurations are applied to all Ethernet interfaces on the switches (or Node devices). You can override the default classifier and the default output queue pause configuration on a per-interface basis by applying an explicit configuration to an Ethernet interface. The default configuration is used on all Ethernet interfaces that do not have an explicit configuration.
If you do not configure flow control on output queues, the default configuration uses a one-to-one mapping of IEEE 802.1p code points (priorities) to output queues by number. For example, priority 0 (code point 000) is mapped to queue 0, priority 1 (code point 001) is mapped to queue 1, and so on. If you do not use the default configuration, you must explicitly configure flow control on each output queue that you want to enable for PFC pause in the output stanza of the CNP.
In the default configuration, only queue 3 and queue 4 are enabled to respond to pause messages from the connected peer. For queue 3 to respond to pause messages, priority 3 (code point 011) must be enabled for PFC in the input stanza of the CNP. For queue 4 to respond to pause messages, priority 4 (code point 100) must be enabled for PFC in the input stanza of the CNP.
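The default mapping described above can be modeled in a few lines. The following Python sketch is illustrative only and is not Junos code. The helper names (code_point, default_queue, default_pause_queues) are hypothetical, and the sketch assumes the default configuration, with no explicit output stanza in any CNP.

```python
# Illustrative model of the default priority-to-queue mapping (not Junos code).
# All names here are hypothetical helpers for this example.

DEFAULT_PAUSE_QUEUES = {3, 4}  # queues enabled to respond to pause by default


def code_point(priority: int) -> str:
    """Return the 3-bit IEEE 802.1p code point for a priority (0-7)."""
    if not 0 <= priority <= 7:
        raise ValueError("IEEE 802.1p priorities range from 0 through 7")
    return format(priority, "03b")


def default_queue(priority: int) -> int:
    """Default configuration: priority N maps to output queue N."""
    code_point(priority)  # reuse the range check
    return priority


def default_pause_queues(input_pfc_priorities: set) -> set:
    """Queues that respond to pause messages in the default configuration:
    queues 3 and 4, and only when PFC is also enabled on the matching
    priority in the CNP input stanza."""
    return {default_queue(p) for p in input_pfc_priorities
            if default_queue(p) in DEFAULT_PAUSE_QUEUES}


print(code_point(3), default_queue(3))        # 011 3
print(sorted(default_pause_queues({3, 5})))   # [3]
```

In the second call, priority 5 has PFC enabled at ingress, but queue 5 is not pause-enabled by default, so only queue 3 responds.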
The default configuration provides the following lossless behavior:
- Two default lossless forwarding classes (the no-loss packet drop attribute is applied to these forwarding classes automatically):
  - fcoe: mapped to output queue 3
  - no-loss: mapped to output queue 4
- A default classifier that maps the fcoe forwarding class to IEEE 802.1p priority 3 (011) and the no-loss forwarding class to IEEE 802.1p priority 4 (100).
- Priority-based flow control (PFC) enabled on Ethernet interface output queues 3 and 4 when those queues carry lossless traffic (traffic that is mapped to the fcoe and no-loss forwarding classes, respectively).
- On switches that can be configured as an FCoE-FC gateway, native FC interfaces (NP_Ports), with default flow control enabled on output queue 3 (IEEE 802.1p priority 3) for FCoE/FC traffic.
- DCBX is enabled on all interfaces in autonegotiation mode, and automatically exchanges FCoE application protocol type, length, and values (TLVs) on interfaces that carry FCoE traffic. However, if you explicitly configure DCBX protocol TLV exchange for any application, then you must explicitly configure protocol TLV exchange for every application for which you want DCBX to exchange TLVs, including FCoE.
- On Ethernet ports, PFC buffer calculations use the following default values to determine the headroom buffer size:
  - Cable length: 100 meters (approximately 328 feet)
  - MRU for priority 3 traffic: 2500 bytes
  - MRU for priority 4 traffic: 9216 bytes
  - Maximum transmission unit (MTU): 1522 (or the configured MTU value for the interface)
Note: If you configure flow control on a priority that is not one of the default flow control priorities, the default MRU value is 2500 bytes. For example, if you configure flow control on priority 5 and you do not configure an MRU value, the default MRU value is 2500 bytes.
In addition, to support lossless transport, PFC must be enabled explicitly on the lossless IEEE 802.1p priorities (code points) on ingress Ethernet interfaces; no default PFC configuration is applied at ingress interfaces. If you do not enable PFC on lossless priorities, those priorities might experience packet loss during periods of congestion. For example, if you want lossless FCoE traffic and you are using the default fcoe forwarding class, you use a CNP to enable PFC on priority 3 (code point 011), and apply that CNP to all ingress interfaces that carry FCoE traffic.
The default CoS configuration is backward-compatible with the default CoS configuration of software releases before Junos OS Release 12.3. If you explicitly configure lossless transport, ensure that the input and output queues corresponding to the lossless forwarding classes are explicitly configured for PFC pause.
Table 1 summarizes the default forwarding classes and their mapping to output queues, IEEE 802.1p priorities, and drop attributes.
Forwarding Class Name | Output Queue | Priority | Drop Attribute |
---|---|---|---|
best-effort | 0 | 0 | drop |
fcoe | 3 | 3 | no-loss |
no-loss | 4 | 4 | no-loss |
network-control | 7 | 7 | drop |
On switches that use the same forwarding classes and output queues for unicast and multidestination (multicast, broadcast, and destination lookup fail) traffic, these forwarding classes carry both unicast and multidestination traffic. Only unicast traffic is treated as lossless traffic. Multidestination traffic is not treated as lossless traffic, even on lossless output queues.
On switches that use different forwarding classes and output queues for unicast and multidestination traffic, there is one default multidestination forwarding class named mcast, which is mapped to output queue 8 with a drop attribute of drop. (Incoming multidestination traffic on all IEEE 802.1p priorities is mapped to the mcast forwarding class by default.)
Configuring Lossless Priorities
To configure more than two lossless priorities (forwarding classes), or to change the default mapping of lossless forwarding classes to priorities and paused output queues, you must explicitly configure the switch instead of using the default configuration. Configuring lossless priorities includes:
- Configuring forwarding classes with the no-loss packet drop attribute.
- Using a CNP to configure PFC on ingress interfaces and flow control (PFC) on egress interfaces.
- Configuring a classifier to map IEEE 802.1p priorities (code points) to the correct forwarding classes (the forwarding classes for which you want lossless transport).
If you expect a large amount of lossless traffic on your network and configure multiple lossless traffic classes, ensure that you reserve enough scheduling resources (bandwidth) and buffer space to support the lossless flows. (For switches that support shared buffer configuration, Understanding CoS Buffer Configuration describes how to configure buffers and provides a recommended buffer configuration for networks with larger amounts of lossless traffic. Buffer optimization is automatic on switches that use virtual output queues.)
In addition, on Ethernet interfaces, DCBX must exchange the appropriate application protocol TLVs for the lossless traffic. On switches that can act as an FCoE-FC gateway, you need to remap the FCoE priority on native FC interfaces if your network uses a priority other than 3 (IEEE code point 011) for FCoE traffic. This section describes:
- Configuring Lossless Forwarding Classes (Packet Drop Attribute)
- Congestion Notification Profiles (PFC Configuration)
- Configuring DCBX (Application Protocol TLV Exchange)
- Fate Sharing Among Traffic Classes
- Transit Switch Configuration Versus FCoE-FC Gateway Configuration
- Configuration Results and Commit Checks
Configuring Lossless Forwarding Classes (Packet Drop Attribute)
Junos OS Release 12.3 introduced the no-loss parameter for forwarding class configuration. (Although it uses the same name, this is not the no-loss default forwarding class. It is a packet drop attribute you can specify to configure any forwarding class as a lossless forwarding class.)
On switches that use different forwarding classes for unicast and multidestination traffic, the forwarding class must be a unicast forwarding class. On switches that use the same forwarding classes for unicast and multidestination traffic, only unicast traffic receives lossless treatment.
You can configure up to six forwarding classes (depending on system architecture and the availability of system resources) as lossless forwarding classes by including the no-loss drop attribute at the [edit class-of-service forwarding-classes class forwarding-class-name queue-num queue-number] hierarchy level.
If you use the default fcoe or no-loss forwarding classes, they include the no-loss drop attribute by default. If you explicitly configure the fcoe or no-loss forwarding classes and you want to retain their lossless behavior, you must include the no-loss drop attribute in the configuration.
All forwarding classes mapped to the same output queue must have the same packet drop attribute. (All forwarding classes mapped to the same output queue must be either lossy or lossless. You cannot map both a lossy and a lossless forwarding class to the same queue.)
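The two rules above (at most six lossless forwarding classes, and one packet drop attribute per output queue) can be sketched as a small validator. This Python model is illustrative only and is not Junos code. The dictionary layout and the function name check_forwarding_classes are assumptions made for this example, and the model uses six as a fixed upper bound even though the real limit depends on system architecture and resources.

```python
# Illustrative check of two forwarding class rules (not Junos code).
# The dictionary layout and function name are assumptions for this example.

MAX_LOSSLESS_CLASSES = 6  # upper bound; the real limit can be lower


def check_forwarding_classes(classes: dict) -> list:
    """classes maps a forwarding class name to (queue_number, is_no_loss).
    Returns a list of human-readable rule violations (empty if none)."""
    errors = []

    lossless = [name for name, (_, no_loss) in classes.items() if no_loss]
    if len(lossless) > MAX_LOSSLESS_CLASSES:
        errors.append(f"{len(lossless)} lossless classes exceed the maximum "
                      f"of {MAX_LOSSLESS_CLASSES}")

    # Group the drop attributes seen on each queue.
    attrs_by_queue = {}
    for name, (queue, no_loss) in classes.items():
        attrs_by_queue.setdefault(queue, set()).add(no_loss)
    for queue, attrs in sorted(attrs_by_queue.items()):
        if len(attrs) > 1:
            errors.append(f"queue {queue} mixes lossy and lossless classes")

    return errors


ok = {"fcoe": (3, True), "no-loss": (4, True), "best-effort": (0, False)}
bad = {"fc1": (1, True), "fc2": (1, False)}
print(check_forwarding_classes(ok))   # []
print(check_forwarding_classes(bad))  # ['queue 1 mixes lossy and lossless classes']
```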
To avoid fate sharing (a congested flow affecting an uncongested flow), use a one-to-one mapping of lossless forwarding classes to IEEE 802.1p code points (priorities) and queues. Map each lossless forwarding class to a different queue, and classify incoming traffic into forwarding classes so that each forwarding class transports traffic of only one priority (code point).
The fcoe and no-loss forwarding classes are special cases, because in the default configuration, they are configured for lossless behavior (providing that you also enable PFC on the priorities mapped to the fcoe and no-loss forwarding classes in the CNP input stanza).
Table 2 summarizes the possible configurations of the fcoe and no-loss forwarding classes in Junos OS Release 12.3 and later, and the result of those configurations in terms of lossless traffic behavior. It is assumed that PFC, DCBX, and classifiers are properly configured.
Explicit (User-Configured) or Default Forwarding Class Configuration | Packet Drop Attribute | Result and Notes |
---|---|---|
Default | Default | The fcoe and no-loss forwarding classes are lossless. Note: Even if you explicitly configure other forwarding classes (lossy or lossless forwarding classes), the fcoe and no-loss forwarding classes remain lossless because they are not explicitly configured. |
Explicit | Not specified in the explicit forwarding class configuration | The fcoe and no-loss forwarding classes are lossy because they do not include the no-loss drop attribute. |
Explicit | No-loss | The fcoe and no-loss forwarding classes are lossless. |
Explicit, configured in Junos OS Release 12.2 or earlier | Not specified (packet drop attribute was not available before Junos OS Release 12.3) | The fcoe and no-loss forwarding classes are lossy in Junos OS Release 12.3 and later because they do not include the no-loss drop attribute. Note: To retain lossless behavior, before you upgrade to Junos OS Release 12.3, delete the explicit configuration so that the system uses the default configuration. Alternatively, you can reconfigure the forwarding classes with the no-loss packet drop attribute after upgrading to Junos OS Release 12.3 or later. |
For all forwarding classes other than the fcoe and no-loss forwarding classes, you must explicitly configure lossless transport by specifying the no-loss packet drop attribute, because the default configuration for those forwarding classes is lossy (the no-loss packet drop attribute is not applied).
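The outcomes in Table 2 and the rule for all other forwarding classes reduce to a short decision function. The following Python sketch is illustrative only and is not Junos code. The function and parameter names are hypothetical, and the model assumes, as Table 2 does, that PFC, DCBX, and classifiers are properly configured.

```python
# Illustrative model of Table 2 and the rule for other classes (not Junos code).
# Function and parameter names are hypothetical.

DEFAULT_LOSSLESS = {"fcoe", "no-loss"}


def is_lossless(name: str, explicitly_configured: bool,
                has_no_loss_attribute: bool = False) -> bool:
    """Return True if the forwarding class is lossless in Junos OS Release
    12.3 and later, assuming PFC, DCBX, and classifiers are set up correctly."""
    if explicitly_configured:
        # An explicit configuration is lossless only with the no-loss
        # attribute, even for the fcoe and no-loss forwarding classes.
        return has_no_loss_attribute
    # Without an explicit configuration, only the two defaults are lossless.
    return name in DEFAULT_LOSSLESS


print(is_lossless("fcoe", explicitly_configured=False))         # True
print(is_lossless("fcoe", explicitly_configured=True))          # False
print(is_lossless("fcoe", explicitly_configured=True,
                  has_no_loss_attribute=True))                  # True
print(is_lossless("best-effort", explicitly_configured=False))  # False
```

The second call corresponds to the upgrade case in Table 2: an explicit fcoe configuration carried over from Junos OS Release 12.2 has no drop attribute and is therefore lossy.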
Congestion Notification Profiles (PFC Configuration)
Use CNPs to configure lossless PFC characteristics on input and output interfaces.
The input stanza of a CNP enables PFC on specified IEEE 802.1p priorities (code points) and fine-tunes headroom buffer settings by configuring the maximum receive unit (MRU) value and cable length on ingress interfaces.
The output stanza of a CNP enables PFC (flow control) on output queues for specified IEEE 802.1p priorities so that the queues can respond to PFC pause messages from the connected peer on the priority of your choice. (By default, output queues 3 and 4 respond to received PFC messages when those queues carry lossless traffic in the fcoe and no-loss forwarding classes, respectively.)
To achieve lossless transport, the priority paused at the ingress interfaces must match the priority paused at the egress interfaces for a given traffic flow. For example, if you configure ingress interfaces to pause traffic tagged with IEEE 802.1p priority 5 (code point 101) and priority 5 traffic is mapped to output queue 5, then you must also configure the corresponding output interfaces to pause priority 5 on queue 5. In addition, the forwarding class mapped to queue 5 must be configured as a lossless forwarding class (using the no-loss drop attribute).
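The three conditions in the preceding paragraph (PFC at ingress, a matching pause on the egress queue, and a no-loss forwarding class on that queue) can be checked together. The following Python sketch is illustrative only and is not Junos code. The data layout and the function name lossless_path_problems are assumptions made for this example.

```python
# Illustrative consistency check for one priority (not Junos code).
# The data layout and names are assumptions made for this example.

def lossless_path_problems(priority, input_pfc, output_flow_control,
                           queue_of_priority, lossless_queues):
    """Return the reasons why traffic on `priority` is not lossless.

    input_pfc           -- set of priorities with PFC enabled (CNP input stanza)
    output_flow_control -- dict: priority -> paused output queue (CNP output stanza)
    queue_of_priority   -- dict: priority -> output queue used by its forwarding class
    lossless_queues     -- set of queues whose forwarding classes are no-loss
    """
    problems = []
    queue = queue_of_priority[priority]
    if priority not in input_pfc:
        problems.append("PFC is not enabled on the priority at ingress")
    if output_flow_control.get(priority) != queue:
        problems.append("egress does not pause the queue that carries the priority")
    if queue not in lossless_queues:
        problems.append("the forwarding class on the queue lacks no-loss")
    return problems


# The priority 5 example from the text: paused at ingress, mapped to queue 5,
# paused on queue 5 at egress, and carried by a no-loss forwarding class.
print(lossless_path_problems(5, {5}, {5: 5}, {5: 5}, {5}))  # []
# Same setup, but the output stanza does not cover priority 5.
print(lossless_path_problems(5, {5}, {}, {5: 5}, {5}))
```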
Any change to the PFC configuration on a port temporarily blocks the entire port (not just the priorities affected by the PFC change) so that the port can implement the change, then unblocks the port. Blocking the port stops ingress and egress traffic, and causes packet loss on all queues on the port until the port is unblocked.
A change to the PFC configuration means any change to a CNP, including changing the input portion of the CNP (enabling or disabling PFC on a priority, or changing the MRU or cable-length values) or changing the output portion of the CNP that enables or disables output flow control on a queue. A PFC configuration change only affects ports that use the changed CNP.
The following actions change the PFC configuration:
- Deleting or disabling a PFC configuration (input or output) in a CNP that is in use on one or more interfaces. For example:
  - An existing CNP with an input stanza that enables PFC on priorities 3, 5, and 6 is configured on interfaces xe-0/0/20 and xe-0/0/21.
  - You disable the PFC configuration for priority 6 in the CNP input stanza, and then commit the configuration.
  - The PFC configuration change causes all traffic on interfaces xe-0/0/20 and xe-0/0/21 to stop until the PFC change has been implemented. When the PFC change has been implemented, traffic resumes.
- Configuring a CNP on an interface. (This changes the PFC state by enabling PFC on one or more priorities.)
- Deleting a CNP from an interface. (This changes the PFC state by disabling PFC on one or more priorities.)
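Because a PFC change affects only the ports that use the changed CNP, you can work out the impact before you commit. The following Python sketch is illustrative only and is not Junos code. The profile names cnp-a and cnp-b and the function name blocked_interfaces are hypothetical.

```python
# Illustrative model of which ports a CNP change blocks (not Junos code).
# The mapping of interfaces to profile names is hypothetical.

def blocked_interfaces(changed_cnp: str, cnp_by_interface: dict) -> list:
    """Return the interfaces that are temporarily blocked when `changed_cnp`
    is modified: every port that uses the profile, and no others."""
    return sorted(ifname for ifname, cnp in cnp_by_interface.items()
                  if cnp == changed_cnp)


bindings = {
    "xe-0/0/20": "cnp-a",   # hypothetical profile names
    "xe-0/0/21": "cnp-a",
    "xe-0/0/22": "cnp-b",
}
print(blocked_interfaces("cnp-a", bindings))  # ['xe-0/0/20', 'xe-0/0/21']
```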
Configuring Input Interface Flow Control (PFC and Headroom Buffer Calculation)
On Ethernet interfaces, the input stanza of the CNP enables PFC on specified priorities so that the ingress interface can send a pause message to the connected peer during periods of congestion. Input CNPs also fine-tune the headroom buffers used for PFC support by allowing you to configure the MRU value and cable length (if you do not want to use the default configuration).
Headroom buffers support lossless transport by storing the traffic that arrives at an interface after the interface sends a PFC flow control message to pause incoming traffic. Until the connected peer receives the flow control message and pauses traffic, the interface continues to receive traffic and must buffer it (and the traffic that is still on the wire after the peer pauses) to prevent packet loss.
The system uses the MRU and the length of the attached physical cable to calculate buffer headroom allocation. The default configuration values are:
- MRU for priority 3 traffic: 2500 bytes
- MRU for priority 4 traffic: 9216 bytes
- Cable length: 100 meters (approximately 328 feet)
If you configure flow control on a priority that is not one of the default flow control priorities, the default MRU value is 2500 bytes. For example, if you configure flow control on priority 5 and you do not explicitly configure an MRU value, the default MRU value is 2500 bytes.
You can fine-tune the MRU and the cable length to adjust the size of the headroom buffer on an interface. The switch has a shared global buffer pool and dynamically allocates headroom buffer space to lossless queues as needed.
A lower MRU or a shorter cable length reduces the amount of headroom buffer required on an interface and leaves more headroom buffer space for other interfaces. A higher MRU or a longer cable length increases the amount of headroom buffer space required on an interface and leaves less headroom buffer space for other interfaces.
In many cases, you can better utilize the headroom buffers by reducing the MRU value (for example, an MRU of 2180 is sufficient for most FCoE networks) and by reducing the cable length value if the physical cable is less than 100 meters long.
When you configure the headroom buffers by changing the MRU or the cable length, and commit the configuration, the system performs a commit check and rejects the configuration if sufficient headroom buffer space is not available.
However, the system does not perform a commit check but instead returns a syslog error if:
- The buffers are configured on a LAG interface.
- The default classifier is used on the interface (instead of a user-configured classifier).
- The interface has not been created yet.
Configuring Output Interface Flow Control (PFC)
On Ethernet interfaces, you can use the output stanza of the CNP to configure flow control on output queues and enable PFC pause response on specified IEEE 802.1p priorities.
On switches that use different output queues for unicast and multidestination traffic, the queue must be a unicast output queue.
By default, output queues 3 and 4 are enabled for PFC pause on priorities 3 (IEEE 802.1p code point 011) and 4 (IEEE 802.1p code point 100). The default PFC pause response supports the default lossless forwarding class configuration, which maps the fcoe forwarding class to queue 3 and priority 3, and maps the no-loss forwarding class to queue 4 and priority 4.
Configuring PFC on output queues enables you to pause any priority on any output queue on any Ethernet interface. Output flow control enables you to use more than two output queues to support lossless traffic flows (you can configure up to six lossless forwarding classes and map them to different output queues that are enabled for PFC pause). Output queue flow control also enables you to support multiple lossless forwarding classes (each mapped to a different priority and output queue) for one class of traffic.
Output flow control works only when PFC is enabled in the CNP input stanza on the corresponding priorities on the interface. For example, if you enable output flow control on priority 5 (IEEE 802.1p code point 101), then you must also enable PFC on priority 5 in the CNP input stanza.
For example, if the converged Ethernet network uses two different priorities for FCoE traffic (such as priority 3 and priority 5), then you can classify those priorities into different lossless forwarding classes that are mapped to different output queues:
- Configure two lossless forwarding classes for FCoE traffic, with each forwarding class mapped to a different output queue. For example, you could use the default fcoe forwarding class, which is mapped to queue 3, and you could configure a second lossless forwarding class called fcoe1 and map it to queue 5. The fcoe forwarding class is for priority 3 FCoE traffic (code point 011), and the fcoe1 forwarding class is for priority 5 (code point 101) FCoE traffic.
- Configure a classifier that maps each forwarding class to the desired IEEE 802.1p code point (priority). If FCoE traffic on both priorities uses one interface, the classifier must classify both forwarding classes to the correct priorities. If FCoE traffic of different priorities uses different interfaces, the classifier configuration on each interface must map the correct priority to the corresponding lossless forwarding class.
- Apply the classifier to the interfaces that carry FCoE traffic. The classifier determines the mapping of forwarding classes to priorities on each interface.
To configure lossless transport for these forwarding classes, you also need to:
- Enable PFC on the two priorities (3 and 5 in this example) at the ingress interfaces in the CNP input stanza.
- Configure PFC on the output queues and priorities for the forwarding classes in the CNP output stanza so that the interface can respond to pause messages received from the connected peer.
  Note: When you configure the CNP on an interface, all ingress and egress traffic is blocked until the configuration is implemented, then the interface is unblocked and traffic resumes. During the time the interface is blocked, all queues on the interface experience packet loss.
- Configure DCBX to exchange application protocol TLVs on both FCoE priorities.
If you do not configure flow control to pause output queues, the default configuration uses a one-to-one mapping of IEEE 802.1p code points (priorities) to output queues by number. For example, priority 0 (code point 000) is mapped to queue 0, priority 1 (code point 001) is mapped to queue 1, and so on. By default, only queues 3 and 4 are enabled to respond to pause messages from the connected peer, and you must explicitly enable PFC on the corresponding priorities in the CNP input stanza to achieve lossless behavior.
If you do not use the default configuration, you must explicitly configure flow control on each output queue that you want to enable for PFC pause. For example, if you explicitly configure flow control on output queue 5, the default configuration is no longer valid, and only output queue 5 is enabled for PFC pause. Output queues 3 and 4 are no longer enabled for PFC pause, so traffic using those queues no longer responds to PFC pause messages even if the corresponding forwarding class is configured with the no-loss drop attribute. To retain the pause configuration on output queues 3 and 4 and configure flow control on queue 5, you need to explicitly configure flow control on queues 3, 4, and 5.
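The override behavior described above is easy to get wrong, so it helps to see it as a rule: any explicit output stanza replaces the default profile instead of adding to it. The following Python sketch is illustrative only and is not Junos code. The function name effective_output_profile and the dictionary representation are assumptions made for this example.

```python
# Illustrative model of output flow control override (not Junos code).

DEFAULT_OUTPUT_PROFILE = {3: 3, 4: 4}  # priority -> paused output queue


def effective_output_profile(user_output: dict) -> dict:
    """Return the priority -> queue pause map in effect on an interface.

    An empty user_output means the CNP has no output stanza, so the default
    profile applies. Any explicit output stanza replaces the default entirely."""
    return dict(user_output) if user_output else dict(DEFAULT_OUTPUT_PROFILE)


print(effective_output_profile({}))                  # {3: 3, 4: 4}
print(effective_output_profile({5: 5}))              # {5: 5}
print(effective_output_profile({3: 3, 4: 4, 5: 5}))  # {3: 3, 4: 4, 5: 5}
```

The second call shows the pitfall: configuring only queue 5 removes the pause response on queues 3 and 4.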
On switches that use different output queues for unicast and multidestination traffic, you cannot configure flow control to pause a multidestination output queue. You can configure flow control to pause only unicast output queues. On switches that use the same output queues for unicast and multidestination traffic, only unicast traffic receives lossless treatment.
Output Interface Flow Control Profiles
Configuring the CNP output stanza creates an output flow control profile that tells egress ports the queues on which the Ethernet interface should respond to PFC pause messages. Although you can create an unlimited number of CNPs that contain input stanzas only, the number of CNPs that you can configure with output stanzas is limited:
- For standalone switches that are not part of a QFabric system, you can configure up to two output interface flow control profiles. (You can configure up to two CNPs with output stanzas.)
- For QFabric systems, you can configure one output interface flow control profile per Node device. (You can configure one CNP with an output stanza per Node device.)
There are a total of four output flow control profiles.
The system has a default output flow control profile that is applied to all Ethernet interfaces when the CNP attached to the interface has only an input stanza and does not include an output stanza. The default profile responds to PFC pause messages received on queue 3 (for priority 3, for the default fcoe forwarding class) and on queue 4 (for priority 4, for the default no-loss forwarding class), and is effective only if PFC is configured on those priorities in the CNP input stanza.
Additionally, the system has two internal output flow control profiles that it applies automatically to fabric (FTE) ports and to native FC interfaces (NP_Ports). When the switch is not part of a QFabric system, the profile normally used for FTE ports is available for user configuration and provides a second user-configurable profile. (That is why standalone switches have two user-configurable output flow control profiles, but Node devices on a QFabric system have only one user-configurable output flow control profile.)
Because one output CNP can configure PFC pause response on multiple output queues (priorities), one user-configurable output CNP is usually flexible enough to specify the desired PFC response on all programmed interfaces.
Each port can use one output flow control profile. You cannot apply more than one profile to one port.
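The limits on user-configurable output flow control profiles can be expressed as a simple count. The following Python sketch is illustrative only and is not Junos code. The device type labels ("standalone", "qfabric-node"), the CNP names, and the function name are hypothetical.

```python
# Illustrative check of the output flow control profile limits (not Junos code).

USER_OUTPUT_PROFILE_LIMIT = {"standalone": 2, "qfabric-node": 1}


def output_profiles_within_limit(cnps: dict, device_type: str) -> bool:
    """cnps maps a CNP name to True if it has an output stanza.
    CNPs with input stanzas only do not count against the limit."""
    with_output = sum(1 for has_output in cnps.values() if has_output)
    return with_output <= USER_OUTPUT_PROFILE_LIMIT[device_type]


cnps = {"cnp-in-1": False, "cnp-in-2": False, "cnp-out-1": True, "cnp-out-2": True}
print(output_profiles_within_limit(cnps, "standalone"))    # True
print(output_profiles_within_limit(cnps, "qfabric-node"))  # False
```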
Output flow control profiles can be expressed in table format. For example, Table 3 shows the default output flow control profile that pauses priorities 3 and 4 on queues 3 and 4 (remember that PFC must also be enabled on code points 3 and 4 in the CNP input stanza in order for PFC to work):
IEEE 802.1p Priority Specified in Received PFC Frame | Paused Output Queue |
---|---|
0 (000) | none |
1 (001) | none |
2 (010) | none |
3 (011) | 3 |
4 (100) | 4 |
5 (101) | none |
6 (110) | none |
7 (111) | none |
Table 4 is an example of a user-configured output flow control profile. Using the example from the preceding section, the CNP output stanza configures flow control on output queue 5, and also explicitly configures output flow control on queues 3 and 4 for the fcoe and no-loss forwarding classes. (If you explicitly configure an output CNP, you must explicitly configure every output queue that you want to respond to PFC messages, because the user-configured profile overrides the default profile. If this example did not include queues 3 and 4, those queues would no longer respond to received PFC messages.)
IEEE 802.1p Priority Specified in Received PFC Frame | Paused Output Queue |
---|---|
0 (000) | none |
1 (001) | none |
2 (010) | none |
3 (011) | 3 |
4 (100) | 4 |
5 (101) | 5 |
6 (110) | none |
7 (111) | none |
Remember that you must also enable PFC on code points 3, 4, and 5 in the CNP input stanza for this configuration to work. When you configure the CNP on an interface, all ingress and egress traffic is blocked until the configuration is implemented, then the interface is unblocked and traffic resumes. During the time the interface is blocked, all queues on the interface experience packet loss.
Configuring PFC Across Layer 3 Interfaces on QFX5210, QFX5200, QFX5100, EX4600, and QFX10000 Switches
Enabling PFC on traffic flows is based on the IEEE 802.1p code point (priority) in the priority code point (PCP) field of the Ethernet frame header (sometimes known as the CoS bits). To enable PFC on traffic that crosses Layer 3 interfaces, the traffic must be classified by its IEEE 802.1p code point, not by its DSCP (or DSCP IPv6) code point.
See Understanding PFC Functionality Across Layer 3 Interfaces for a conceptual overview of how to enable PFC on traffic across Layer 3 interfaces. See Example: Configuring PFC Across Layer 3 Interfaces for an example of how to configure PFC on traffic that traverses Layer 3 interfaces.
Configuring DCBX (Application Protocol TLV Exchange)
For applications that require lossless transport, DCBX exchanges application protocol TLVs with the connected peer interface. By default, DCBX advertises FCoE application protocol TLVs on all interfaces that are enabled for DCBX, and by default, DCBX is enabled on all interfaces. DCBX advertises no other applications by default.
For each application (for example, iSCSI) that you want to configure for lossless transport, you must enable the interfaces which carry that application traffic to exchange DCBX protocol TLVs with the connected peer. The TLV exchange allows the peer interfaces to negotiate a compatible configuration to support the application.
If you configure DCBX to advertise any application, the default DCBX advertisement is overridden, and DCBX advertises only the configured applications. If you want an interface to advertise only the FCoE application, you do not have to configure DCBX application protocol TLV exchange; instead, you can use the default configuration.
If you want DCBX to advertise other applications, you must explicitly configure an application map and apply it to the interfaces on which you want to exchange protocol TLVs for those applications. If you want to exchange FCoE application protocol TLVs in addition to other application protocol TLVs, you must also explicitly configure the FCoE application in the application map. Understanding DCBX Application Protocol TLV Exchange describes how application mapping works.
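The DCBX advertisement rule follows the same override pattern as output flow control: an explicit application map replaces the default FCoE advertisement. The following Python sketch is illustrative only and is not Junos code. The function name and the use of plain strings for application names are assumptions made for this example.

```python
# Illustrative model of DCBX application advertisement (not Junos code).

def advertised_applications(application_map: list) -> list:
    """Return the applications DCBX advertises on an interface.

    With no application map, only FCoE is advertised (the default). Any
    explicit application map overrides the default, so FCoE is advertised
    only if the map lists it."""
    if not application_map:
        return ["fcoe"]
    return list(application_map)


print(advertised_applications([]))                 # ['fcoe']
print(advertised_applications(["iscsi"]))          # ['iscsi']
print(advertised_applications(["iscsi", "fcoe"]))  # ['iscsi', 'fcoe']
```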
Lossless transport also requires that you enable PFC on the correct priority (IEEE 802.1p code point) on the ingress interfaces using an input CNP. If the priority you pause at the ingress interfaces is not mapped to queue 3 or queue 4 (the two output queues that are enabled for PFC pause flow control by default), then you must also use the output stanza of the CNP to enable pause on the output queues that correspond to the paused input priorities.
Fate Sharing Among Traffic Classes
You can configure different lossless (or lossy) traffic flows to share fate—that is, to receive the same CoS treatment.
Fate sharing is not desirable for I/O convergence. Instead of independent control of the fate of each type of flow, different types of flows receive the same treatment. Fate sharing is particularly undesirable for lossless flows. If one lossless flow experiences congestion and must be paused, that affects flows that share fate with the congested flow even if the other flows are not experiencing congestion, and also can cause ingress port congestion. If your network requires that all 802.1p priorities be lossless, you can achieve that by allowing some fate sharing among the eight priorities by spreading them across up to six lossless forwarding classes.
If the number of lossless priorities is less than or equal to the number of configured lossless forwarding classes, then you can avoid fate sharing by configuring a one-to-one mapping of forwarding classes to IEEE 802.1p code points (priorities) and output queues. (Each forwarding class should be mapped to a different output queue and classified to a different priority.)
If you want to configure different traffic flows to share fate, two fate-sharing configurations are supported: mapping one forwarding class to more than one IEEE 802.1p code point (priority), and mapping two forwarding classes to the same output queue:
- If you map one lossless forwarding class to more than one priority, the traffic tagged with each of the priorities uses the same CoS properties (the CoS properties associated with the forwarding class). For example, configuring a forwarding class called fc1, mapping it to queue 1, and mapping it to code points 101 and 110 using a classifier named classify1 results in the traffic tagged with priorities 101 and 110 sharing fate:
  user@switch# set class-of-service forwarding-classes class fc1 queue-num 1 no-loss
  user@switch# set class-of-service classifiers ieee-802.1 classify1 forwarding-class fc1 loss-priority low code-points 101
  user@switch# set class-of-service classifiers ieee-802.1 classify1 forwarding-class fc1 loss-priority low code-points 110
  In this case, if the traffic mapped to either priority experiences congestion, both priorities are paused because they are mapped to the same forwarding class and are therefore treated similarly.
If you map multiple lossless forwarding classes to the same output queue, the traffic mapped to the forwarding classes uses the same output queue. This increases the amount of traffic on the queue, and can create congestion that affects all of the traffic flows that are mapped to the queue. For example, configuring two forwarding classes called fc1 and fc2, mapping both forwarding classes to queue 1, and mapping the forwarding classes to code points 101 and 110 (respectively) using a classifier named classify1, results in the traffic tagged with priorities 101 and 110 sharing fate on the same output queue:
user@switch# set class-of-service forwarding-classes class fc1 queue-num 1 no-loss
user@switch# set class-of-service forwarding-classes class fc2 queue-num 1 no-loss
user@switch# set class-of-service classifiers ieee-802.1 classify1 forwarding-class fc1 loss-priority low code-points 101
user@switch# set class-of-service classifiers ieee-802.1 classify1 forwarding-class fc2 loss-priority low code-points 110
In this case, even though the two forwarding classes use different IEEE 802.1p priorities, if one forwarding class experiences congestion, it affects the other forwarding class. The reason is that if the output queue is paused because of congestion on either forwarding class, all traffic that uses that queue is paused. Since both forwarding classes are mapped to the queue, the traffic mapped to both forwarding classes is paused.
Note: If you map more than one forwarding class to a queue, all of the forwarding classes mapped to that queue must have the same packet drop attribute (all of them must be lossy, or all of them must be lossless).
Transit Switch Configuration Versus FCoE-FC Gateway Configuration
On a transit switch (all Ethernet ports, no native FC ports) that forwards FCoE traffic (or other traffic that requires lossless transport across the Ethernet network), the configuration of classifiers, lossless forwarding classes, DCBX, and PFC on ingress and egress interfaces to support lossless transport is as described in this document.
When a switch acts as an FCoE-FC gateway (if native FC interfaces are supported on your switch), the system uses native FC interfaces (NP_Ports) to connect to the FC switch (or FCoE forwarder) at the FC network edge. You cannot apply CNPs or DCBX to native FC interfaces, only to Ethernet interfaces.
On an FCoE-FC gateway, the Ethernet interface configuration of classifiers, DCBX, and PFC is the same as the Ethernet interface configuration on a transit switch. The configuration of lossless forwarding classes is also the same.
However, supporting lossless transport on native FC interfaces requires that you rewrite the IEEE 802.1p priority value if your network uses any priority other than 3 (IEEE code point 011) for FCoE traffic. If your network uses priority 3 for FCoE traffic, you can and should use the default configuration on native FC interfaces.
By default, native FC interfaces tag packets with priority 3 when they encapsulate the incoming FC packets in Ethernet. If your FCoE network uses a different priority than 3 for FCoE traffic, you need to rewrite the priority value to the value that your network uses on the FC interface, classify the FCoE traffic to the correct priority on the Ethernet interfaces, and enable PFC on the correct priority on the Ethernet interfaces, as described in Understanding CoS IEEE 802.1p Priority Remapping on an FCoE-FC Gateway.
Configuration Results and Commit Checks
Different configurations of forwarding classes and their drop attributes, classifiers, CNPs (PFC flow control), and Ethernet PAUSE (IEEE 802.3X flow control) result in different system behaviors.
Table 5 describes the results of the possible lossless transport configurations in each case. The assumption in the Result column is that the system’s buffer headroom calculation resulted in a successful configuration.
However, if the system calculates that there is insufficient buffer space to support the configuration, a commit check prevents you from committing the configuration on an individual Ethernet interface. For LAG interfaces, the system does not issue a commit check error but instead issues a syslog message.
After you configure lossless transport for a LAG interface, be sure to check the syslog messages to confirm that the commit was successful.
Classifier Configuration | Congestion Notification Profile Configuration | Ethernet PAUSE (IEEE 802.3X) Configuration | Result |
---|---|---|---|
None (default classifier) | None | None | System default configuration. No flows are lossless. To achieve lossless behavior for the default fcoe and no-loss forwarding classes, you must configure an input CNP to enable PFC on their IEEE 802.1p code points (011 and 100, respectively). |
Classifier with no lossless forwarding classes | None | None | No lossless traffic flows are configured; all traffic is best effort. |
Classifier with at least one lossless forwarding class | None | None | Because no CNP is attached to interfaces, PFC is not enabled on the code point of the lossless traffic and no headroom buffer is allocated to the lossless queue, so packets can drop during periods of congestion. This configuration does not achieve lossless behavior. |
None (default classifier) | PFC enabled on the fcoe and no-loss forwarding class code points (priorities) | None | The default classifier classifies traffic into two lossless forwarding classes, fcoe and no-loss. The CNP enables PFC on the priorities mapped to both lossless forwarding classes, resulting in lossless behavior for traffic mapped to the fcoe and no-loss forwarding classes. |
None (default classifier) | None | Flow control enabled | The system calculates buffer headroom for the physical link based on the interface MTU and the default cable length. The system does not calculate buffer headroom for individual output queues. Because Ethernet PAUSE is enabled on the link instead of PFC being enabled on the lossless priorities, the entire link is paused during periods of congestion. This configuration results in lossless behavior for all of the forwarding classes on the link, but because all traffic is paused, this can cause greater overall network congestion. |
Classifier with at least one lossless forwarding class | PFC enabled on the lossless forwarding class code points (priorities) | None | Headroom buffer allocated only to priorities that are mapped to the lossless forwarding classes and on which PFC is enabled. This configuration achieves lossless behavior for the lossless forwarding classes. |
Classifier with no lossless forwarding classes | None | Flow control enabled | The system calculates buffer headroom for the physical link based on the interface MTU and the default cable length, and it pauses all traffic on the link during periods of congestion. |
Classifier with at least one lossless forwarding class | None | Flow control enabled | The system calculates buffer headroom for the physical link based on the interface MTU and the default cable length, and it pauses all traffic on the link during periods of congestion. |
Classifier with at least one lossless forwarding class | PFC enabled on the lossless forwarding class code points (priorities) | Flow control enabled on a different interface than the interface with the CNP | The system checks the available buffer space for both the PFC-enabled priorities and for the other link. If sufficient buffer space is available, the lossless forwarding classes configured with PFC on one interface and also all of the traffic on the link with Ethernet PAUSE enabled achieve lossless behavior. |
If you attempt to configure both PFC and Ethernet PAUSE on a link, the system returns a commit error. PFC and Ethernet PAUSE are mutually exclusive configurations on an interface.
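The last case in the table, PFC on one interface and Ethernet PAUSE on a different interface, could be sketched as follows. The names cnp1, classify1, xe-0/0/10, and xe-0/0/20 are hypothetical, the sketch assumes the classifier and the profile are already defined, and the flow-control statement shown is an assumption to verify against the interfaces CLI reference for your platform:

```
# Hypothetical names: cnp1, classify1, xe-0/0/10, xe-0/0/20
# Interface 1: PFC, through a classifier plus a congestion notification profile
user@switch# set class-of-service interfaces xe-0/0/10 unit 0 classifiers ieee-802.1 classify1
user@switch# set class-of-service interfaces xe-0/0/10 congestion-notification-profile cnp1
# Interface 2: link-level Ethernet PAUSE (IEEE 802.3X), with no CNP applied
user@switch# set interfaces xe-0/0/20 ether-options flow-control
```

Applying both the profile and Ethernet PAUSE to the same interface is the combination that the commit check rejects.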
Configuration Rules and Recommendations
Keep in mind the following configuration rules and recommendations when you configure lossless traffic flows:
You can configure a maximum of six lossless forwarding classes (forwarding classes with the no-loss packet drop attribute).
All forwarding classes that you map to the same queue must have the same packet drop attribute (all of the forwarding classes must be lossy, or all of the forwarding classes must be lossless).
Do not configure weighted random early detection (WRED) on lossless forwarding classes. (Do not associate a drop profile with a forwarding class that has the no-loss packet drop attribute.)
On switches that use different forwarding classes and output queues for unicast and multidestination traffic, you cannot configure flow control to pause a multidestination output queue. You can configure PFC flow control only to pause unicast output queues.
On switches that use different forwarding classes and output queues for unicast and multidestination traffic, forwarding classes mapped to multidestination queues (queues 8 through 11) cannot have the no-loss packet drop attribute. (Multidestination forwarding classes cannot be configured as lossless forwarding classes.)
Lossless Transport Features Introduced in Junos OS Release 12.3 (Legacy Non-ELS CLI)
Support for lossless transport introduced in Junos OS Release 12.3 includes:
Configuring up to six lossless forwarding classes.
Configuring PFC pause on output queues to program the output queues that can respond to PFC pause messages received from the connected peer. The priorities you pause on output queues must match the priorities on which you enable PFC on the corresponding ingress interfaces. For example, if you program output queues to pause priorities 3 (011) and 5 (101), then you must also enable pause on priorities 3 and 5 on the corresponding ingress interfaces. Configuring flow control on the output queues and enabling PFC on the corresponding input queues allows you to pause up to six priorities (forwarding classes).
Controlling the headroom buffer on Ethernet interfaces by configuring the maximum receive unit (MRU) size for the traffic mapped to an IEEE 802.1p priority (configured per priority) and the length of the attached cable (configured per interface). The MRU size can range up to full jumbo packet size (9216 bytes).
Remapping (rewriting) IEEE 802.1p priorities on native Fibre Channel (FC) interfaces when the system is acting as an FCoE-FC gateway. If the Ethernet (FCoE) network uses a different IEEE 802.1p priority than priority 3 (011) for FCoE traffic, then you can use priority remapping to classify FCoE traffic into a lossless forwarding class mapped to that different priority (see Understanding CoS IEEE 802.1p Priority Remapping on an FCoE-FC Gateway).
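The headroom buffer controls in the list above (MRU per priority, cable length per interface) might be configured along these lines. The profile name cnp1 and the values are illustrative assumptions; the cable length is assumed to be expressed in meters, and the exact option names should be confirmed in the CLI reference for your release:

```
# Hypothetical name: cnp1; values are examples only
# Per priority: enable PFC on priority 3 (code point 011) with a jumbo MRU
user@switch# set class-of-service congestion-notification-profile cnp1 input ieee-802.1 code-point 011 pfc mru 9216
# Per interface (through the profile): length of the attached cable
user@switch# set class-of-service congestion-notification-profile cnp1 input cable-length 150
```

A larger MRU or a longer cable increases the headroom the system reserves, so the values should reflect the real traffic and cabling rather than the maximums.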
Lossless transport still requires configuring previously existing features, including enabling PFC on the lossless priorities on ingress interfaces, and configuring classifiers to classify incoming traffic into lossless forwarding classes based on the IEEE 802.1p priority tag of the packet.
If you expect a large amount of lossless traffic on your network and configure multiple lossless traffic classes, ensure that you reserve enough scheduling resources (bandwidth) and lossless headroom buffer space to support the lossless flows. (Understanding CoS Buffer Configuration describes how to configure buffers and provides a recommended buffer configuration for networks with larger amounts of lossless traffic.)
Backward Compatibility with Junos OS Releases Earlier Than Release 12.3 (Legacy Non-ELS CLI)
The addition of the no-loss packet drop attribute to forwarding class configuration means that when you upgrade from an earlier release to Junos OS Release 12.3, the new software might not preserve the lossless forwarding class configuration of the fcoe and no-loss forwarding classes.
If you used the default forwarding class configuration for the fcoe and no-loss forwarding classes, the CoS configuration is backward-compatible. You do not have to do anything to preserve the lossless behavior of traffic that uses those forwarding classes when you upgrade to Junos OS Release 12.3. (This is because the default configuration of these two forwarding classes includes the no-loss packet drop attribute.)
However, if you explicitly configured the fcoe or the no-loss forwarding class by including the set forwarding-classes class forwarding-class-name queue-num queue-number statement at the [edit class-of-service] hierarchy level, then those forwarding classes are no longer lossless; they are lossy. (They are lossy because explicit configuration in releases earlier than Junos OS Release 12.3 did not use the no-loss packet drop attribute.) In Junos OS Release 12.3 and later, you must include the no-loss packet drop attribute in explicit forwarding class configurations to configure a lossless forwarding class.
For example, before Junos OS Release 12.3, the following explicit configuration resulted in a lossless forwarding class:
user@switch# set class-of-service forwarding-classes class fcoe queue-num 3
However, in Junos OS Release 12.3, this configuration is lossy because it does not include the no-loss packet drop attribute. To preserve lossless behavior, after upgrading to Junos OS Release 12.3, you need to add the no-loss drop attribute:
user@switch# set class-of-service forwarding-classes class fcoe queue-num 3 no-loss
Alternatively, you can delete the explicit configuration before you upgrade to Junos OS Release 12.3 so that the system uses the default forwarding class, which is lossless:
user@switch# delete class-of-service forwarding-classes class fcoe queue-num 3
The explicit configuration of other forwarding classes does not affect the lossless (or lossy) state of the fcoe and no-loss forwarding classes, because only the fcoe and no-loss forwarding classes were lossless forwarding classes before Junos OS Release 12.3. For example, if you explicitly configured the best-effort forwarding class but you used the default fcoe and no-loss forwarding classes in Junos OS Release 12.2, then when you upgrade to Junos OS Release 12.3, the fcoe and no-loss forwarding classes are still lossless (and the best-effort forwarding class retains its explicit configuration).
To achieve lossless behavior for the traffic belonging to any forwarding class, you must also use a CNP to enable PFC on the IEEE 802.1p priority mapped to the forwarding class and apply the CNP to the relevant interfaces, and ensure that DCBX exchanges the protocol TLVs for the application with the connected peer.