Understanding CoS Flow Control (Ethernet PAUSE and PFC)

Flow control supports lossless transmission by regulating traffic flows so that frames are not dropped during periods of congestion. Flow control stops and resumes the transmission of network traffic between two nodes on a full-duplex Ethernet physical link. Controlling the flow prevents buffers on the nodes from overflowing and dropping frames. You configure flow control on a per-interface basis.

The QFX Series supports two methods of flow control:

  • IEEE 802.3x Ethernet PAUSE
  • IEEE 802.1Qbb priority-based flow control (PFC)

Both Ethernet PAUSE and PFC are link-level flow control mechanisms.

Ethernet PAUSE pauses transmission of all traffic on a link.

PFC decouples the pause function from the physical link and enables you to divide traffic on one physical link into eight priorities. You can think of the eight priorities as eight “lanes” of traffic that are mapped to forwarding classes and output queues. Each priority is mapped to a 3-bit IEEE 802.1p CoS code point in the VLAN header. You can enable PFC on one or more priorities (IEEE 802.1p code points) on a link. When PFC-enabled traffic is paused on a link, traffic that is not PFC-enabled continues to flow (or is dropped if congestion is severe enough).
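As an illustration of where the priority lives in a frame, the following minimal Python sketch extracts the 3-bit IEEE 802.1p code point from the 16-bit tag control information (TCI) field of an 802.1Q VLAN tag. The function names are hypothetical and are not part of any Junos OS interface; the sketch assumes the standard TCI layout of 3 priority bits, 1 drop-eligible bit, and 12 VLAN ID bits.

```python
def build_tci(priority: int, vlan_id: int, dei: int = 0) -> int:
    """Pack priority (3 bits), DEI (1 bit), and VLAN ID (12 bits) into a TCI."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must be 0 through 7")
    if dei not in (0, 1):
        raise ValueError("dei must be 0 or 1")
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("vlan_id must fit in 12 bits")
    return (priority << 13) | (dei << 12) | vlan_id


def pcp_from_tci(tci: int) -> int:
    """Return the 3-bit IEEE 802.1p code point from a 16-bit TCI value."""
    if not 0 <= tci <= 0xFFFF:
        raise ValueError("TCI must fit in 16 bits")
    return (tci >> 13) & 0x7


# A frame tagged with VLAN 100 and priority 3 (the priority commonly used for FCoE).
tci = build_tci(priority=3, vlan_id=100)
print(pcp_from_tci(tci))
```

Because the code point is only three bits wide, a link can carry at most eight priorities, which is why PFC offers eight lanes.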

Use Ethernet PAUSE when you want to prevent packet loss on all of the traffic on a link. Use PFC to prevent traffic loss only on specified types of traffic (for example, Fibre Channel over Ethernet traffic).

Note: Depending on the amount of traffic on a link or assigned to a priority, pausing traffic can cause head-of-line blocking and spread congestion through the network.

If you attempt to configure both Ethernet PAUSE and PFC on a link, the switch returns a commit error. Ethernet PAUSE and PFC are mutually exclusive configurations on an interface.

By default, all forms of flow control are disabled. You must explicitly enable flow control on interfaces to pause traffic.

Ethernet PAUSE

Ethernet PAUSE is a congestion relief feature that works by providing link-level flow control for all traffic on a full-duplex Ethernet link. Ethernet PAUSE works in both directions on the link. In one direction, an interface generates and sends PAUSE messages to stop the connected peer from sending more traffic. In the other direction, the interface responds to PAUSE messages it receives from the connected peer to stop sending traffic. Ethernet PAUSE also works on aggregated Ethernet interfaces. For example, if the connected peer interfaces are called Node A and Node B:

  • When the receive buffers on interface Node A reach a certain level of fullness, the interface generates and sends a PAUSE message to the connected peer (interface Node B) to tell the peer to stop sending frames. The Node B buffers store frames until the time period specified in the PAUSE frame elapses; then Node B resumes sending frames to Node A.
  • When interface Node A receives a PAUSE message from interface Node B, interface Node A stops transmitting frames until the time period specified in the PAUSE frame elapses; then Node A resumes transmission. (The Node A transmit buffers store frames until Node A resumes sending frames to Node B.)

    In this scenario, if Node B sends a PAUSE frame with a time value of 0 to Node A, the 0 time value indicates to Node A that it can resume transmission. This happens when the Node B buffer empties to below a certain threshold and the buffer can once again accept traffic.
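The Node A and Node B exchange described above can be sketched as a small Python model. This is an illustration only, not the switch implementation: the class name, the threshold names (`xoff`, `xon`), and the fill-level units are assumptions made for the example. The one external fact the sketch relies on is that IEEE 802.3 expresses the PAUSE time as a 16-bit count of pause quanta, where one quantum is 512 bit times.

```python
PAUSE_QUANTUM_BITS = 512  # one pause quantum is 512 bit times (IEEE 802.3)


def pause_duration_s(quanta: int, link_bps: float) -> float:
    """Convert a PAUSE time value in quanta to seconds for a given link speed."""
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("PAUSE time is a 16-bit value")
    return quanta * PAUSE_QUANTUM_BITS / link_bps


class ReceiveBuffer:
    """Hypothetical receive buffer that decides when to send PAUSE frames."""

    def __init__(self, xoff: int, xon: int, pause_quanta: int = 0xFFFF):
        if not 0 <= xon < xoff:
            raise ValueError("xon must be below xoff")
        self.xoff = xoff              # fill level that triggers a PAUSE frame
        self.xon = xon                # fill level below which traffic may resume
        self.pause_quanta = pause_quanta
        self.peer_paused = False

    def update(self, fill: int):
        """Record a new fill level.

        Returns the PAUSE time to send to the peer (0 means "resume"),
        or None if no PAUSE frame needs to be sent.
        """
        if not self.peer_paused and fill >= self.xoff:
            self.peer_paused = True
            return self.pause_quanta
        if self.peer_paused and fill < self.xon:
            self.peer_paused = False
            return 0
        return None


buf = ReceiveBuffer(xoff=80, xon=20)
for level in (50, 85, 60, 15):
    print(level, buf.update(level))
```

The sketch omits one behavior a real device needs: if the buffer is still full when the advertised PAUSE time is about to expire, the receiver must send another PAUSE frame to keep the peer paused.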

Symmetric flow control means an interface has the same PAUSE configuration in both directions. The PAUSE generation and PAUSE response functions are both configured as enabled or they are both disabled. You configure symmetric flow control by including the flow-control statement at the [edit interfaces interface-name ether-options] hierarchy level.

Asymmetric flow control allows you to configure the PAUSE functionality in each direction independently on an interface. The configuration for generating PAUSE messages and for responding to PAUSE messages does not have to be the same. It can be enabled in both directions, disabled in both directions, or enabled in one direction and disabled in the other direction. You configure asymmetric flow control by including the configured-flow-control statement at the [edit interfaces interface-name ether-options] hierarchy level.

On any particular interface, symmetric and asymmetric flow control are mutually exclusive. Asymmetric flow control overrides and disables symmetric flow control. (If PFC is configured on an interface, the PFC configuration overrides Ethernet PAUSE flow control.) The QFX Series supports both symmetric and asymmetric flow control.

Symmetric Flow Control

Symmetric flow control configures both the receive and transmit buffers in the same state. Either the interface can both send PAUSE messages and respond to them (flow control is enabled), or the interface can neither send PAUSE messages nor respond to them (flow control is disabled).

When you enable symmetric flow control on an interface, the PAUSE behavior depends on the configuration of the connected peer. With symmetric flow control enabled, the interface can perform any PAUSE functions that the connected peer can perform. (When symmetric flow control is disabled, the interface does not send or respond to PAUSE messages.)

Asymmetric Flow Control

Asymmetric flow control enables you to specify independently whether or not the interface receive buffer generates and sends PAUSE messages to stop the connected peer from transmitting traffic, and whether or not the interface transmit buffer responds to PAUSE messages it receives from the connected peer and stops transmitting traffic. The receive buffer configuration determines if the interface transmits PAUSE messages, and the transmit buffer configuration determines if the interface receives and responds to PAUSE messages:

  • Receive buffers on—Enable PAUSE transmission (generate and send PAUSE frames)
  • Transmit buffers on—Enable PAUSE reception (respond to received PAUSE frames)

You must explicitly set the flow control for both the receive buffer and the transmit buffer (on or off) to configure asymmetric PAUSE. Table 1 describes the configured flow control state when you set the receive (Rx) and transmit (Tx) buffers on an interface:

Table 1: Asymmetric Ethernet PAUSE Flow Control Configuration

| Receive (Rx) Buffer | Transmit (Tx) Buffer | Configured Flow Control State |
|---|---|---|
| On | Off | Interface generates and sends Ethernet PAUSE messages. Interface does not respond to PAUSE messages (interface continues to transmit even if peer requests that the interface stop sending traffic). |
| Off | On | Interface responds to Ethernet PAUSE messages received from the connected peer, but does not generate or send PAUSE messages. (The interface does not request that the connected peer stop sending traffic.) |
| On | On | Same functionality as symmetric Ethernet PAUSE. Interface generates and sends PAUSE messages and responds to received PAUSE messages. |
| Off | Off | Ethernet PAUSE flow control is disabled. |

The configured flow control is the PAUSE state configured on the interface.

On 1-Gigabit Ethernet interfaces, the QFX Series supports autonegotiation of Ethernet PAUSE with the connected peer. (The QFX Series does not support autonegotiation on 10-Gigabit Ethernet interfaces.) Autonegotiation enables the interface to exchange state advertisements with the connected peer so that the two devices can agree on the PAUSE configuration. Each interface advertises its flow control state to the connected peer using a combination of the PAUSE and ASM_DIR bits, as described in Table 2:

Table 2: Flow Control State Advertised to the Connected Peer (Autonegotiation)

| Rx Buffer State | Tx Buffer State | PAUSE Bit | ASM_DIR Bit | Description |
|---|---|---|---|---|
| Off | Off | 0 | 0 | The interface advertises no PAUSE capability. This is equivalent to disabling flow control on an interface. |
| On | On | 1 | 0 | The interface advertises symmetric flow control (both the transmission of PAUSE messages and the ability to receive and respond to PAUSE messages). |
| On | Off | 0 | 1 | The interface advertises asymmetric flow control (the transmission of PAUSE messages, but not the ability to receive and respond to PAUSE messages). |
| Off | On | 1 | 1 | The interface advertises both symmetric and asymmetric flow control. Although the interface does not generate and send PAUSE requests to the peer, the interface supports both symmetric and asymmetric PAUSE configuration on the peer because the peer is not affected if the peer does not receive PAUSE requests. (If the interface responds to the peer’s PAUSE requests, that is sufficient to support either symmetric or asymmetric flow control on the peer.) |
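The advertisement rules in Table 2 amount to a four-entry lookup from the configured buffer states to the two advertised bits. The following Python sketch encodes that lookup; the function and constant names are hypothetical and chosen for the example.

```python
# (rx_buffer_on, tx_buffer_on) -> (PAUSE bit, ASM_DIR bit), as listed in Table 2.
ADVERTISED_BITS = {
    (False, False): (0, 0),  # no PAUSE capability
    (True, True): (1, 0),    # symmetric flow control
    (True, False): (0, 1),   # asymmetric: sends PAUSE, does not respond to it
    (False, True): (1, 1),   # both symmetric and asymmetric
}


def advertised_bits(rx_buffer_on: bool, tx_buffer_on: bool) -> tuple:
    """Return the (PAUSE, ASM_DIR) bits advertised during autonegotiation."""
    return ADVERTISED_BITS[(bool(rx_buffer_on), bool(tx_buffer_on))]


print(advertised_bits(rx_buffer_on=True, tx_buffer_on=False))
```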

The flow control configuration on each switch interface interacts with the flow control configuration of the connected peer. Each peer advertises its state to the other peer. The interaction of the flow control configuration of the peers determines the flow control behavior (resolution) between them, as shown in Table 3. The first four columns show the PAUSE configuration on the local QFX Series and on the connected peer (also known as the link partner). The last two columns show the PAUSE resolution that results from the local and peer configurations on each interface. This illustrates how the PAUSE configuration of each interface affects the PAUSE behavior on the other interface.

Note: In the Resolution columns of the table, disabling PAUSE transmit means that the interface receive buffers do not generate and send PAUSE messages to the peer. Disabling PAUSE receive means that the interface transmit buffers do not respond to PAUSE messages received from the peer.

Table 3: Asymmetric Ethernet PAUSE Behavior on Local and Peer Interfaces

| Local Interface (QFX Series) PAUSE Bit | Local Interface (QFX Series) ASM_DIR Bit | Peer Interface PAUSE Bit | Peer Interface ASM_DIR Bit | Local Resolution | Peer Resolution |
|---|---|---|---|---|---|
| 0 | 0 | Don’t care | Don’t care | Disable PAUSE transmit and receive | Disable PAUSE transmit and receive |
| 0 | 1 | 0 | Don’t care | Disable PAUSE transmit and receive | Disable PAUSE transmit and receive |
| 0 | 1 | 1 | 0 | Disable PAUSE transmit and receive | Disable PAUSE transmit and receive |
| 0 | 1 | 1 | 1 | Enable PAUSE transmit and disable PAUSE receive | Disable PAUSE transmit and enable PAUSE receive |
| 1 | 0 | 0 | Don’t care | Disable PAUSE transmit and receive | Disable PAUSE transmit and receive |
| 1 | 0 | 1 | Don’t care | Enable PAUSE transmit and receive | Enable PAUSE transmit and receive |
| 1 | 1 | 0 | 0 | Disable PAUSE transmit and receive | Disable PAUSE transmit and receive |
| 1 | 1 | 0 | 1 | Enable PAUSE receive and disable PAUSE transmit | Enable PAUSE transmit and disable PAUSE receive |
| 1 | 1 | Don’t care | Don’t care | Enable PAUSE transmit and receive | Enable PAUSE transmit and receive |

Note: For your convenience, Table 3 replicates Table 28B-3 of Section 2 of the IEEE 802.3 specification.
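The resolution in Table 3 can be sketched in Python as an ordered list of patterns in which `None` stands for “Don’t care” and the first matching row wins. The names `Resolution` and `resolve_pause` are hypothetical, and the sketch assumes that the rows are evaluated top to bottom, so the final all-purpose row applies only when the peer advertises a PAUSE bit of 1.

```python
from typing import NamedTuple


class Resolution(NamedTuple):
    local_tx: bool  # local interface sends PAUSE messages
    local_rx: bool  # local interface responds to PAUSE messages
    peer_tx: bool   # peer sends PAUSE messages
    peer_rx: bool   # peer responds to PAUSE messages


DISABLED = Resolution(False, False, False, False)
SYMMETRIC = Resolution(True, True, True, True)

# (local PAUSE, local ASM_DIR, peer PAUSE, peer ASM_DIR) -> resolution.
# Rows follow Table 3 in order; None means "Don't care".
_ROWS = [
    ((0, 0, None, None), DISABLED),
    ((0, 1, 0, None), DISABLED),
    ((0, 1, 1, 0), DISABLED),
    ((0, 1, 1, 1), Resolution(True, False, False, True)),
    ((1, 0, 0, None), DISABLED),
    ((1, 0, 1, None), SYMMETRIC),
    ((1, 1, 0, 0), DISABLED),
    ((1, 1, 0, 1), Resolution(False, True, True, False)),
    ((1, 1, None, None), SYMMETRIC),
]


def resolve_pause(local_pause, local_asm_dir, peer_pause, peer_asm_dir):
    """Return the PAUSE resolution for the advertised bits of both peers."""
    bits = (local_pause, local_asm_dir, peer_pause, peer_asm_dir)
    if any(bit not in (0, 1) for bit in bits):
        raise ValueError("each bit must be 0 or 1")
    for pattern, result in _ROWS:
        if all(p is None or p == b for p, b in zip(pattern, bits)):
            return result
    raise AssertionError("Table 3 covers every combination")


# Local interface sends PAUSE only (0, 1); peer accepts either mode (1, 1).
print(resolve_pause(0, 1, 1, 1))
```

A useful sanity check on the table is that it is mirror-symmetric: swapping the local and peer advertisements swaps the local and peer resolutions.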

PFC

PFC is a lossless transport and congestion relief feature that works by providing granular link-level flow control for each IEEE 802.1p code point (priority) on a full-duplex Ethernet link. When the switch receive buffer fills to a threshold, the switch sends a pause frame to the sender to temporarily stop the sender from transmitting more frames. The buffer threshold must be low enough so that the sender has time to stop transmitting frames and the receiver can accept the frames already on the wire before the buffer overflows. The switch automatically sets queue buffer thresholds to prevent frame loss.
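To see why the pause threshold must leave room below the top of the buffer, the following Python sketch gives a rough estimate of how many bytes can still arrive after the switch decides to send a pause frame. It is a simplified model under stated assumptions, not the calculation the switch performs: the function name is hypothetical, the propagation delay of 5 ns per meter is an assumed typical value, and the allowance of two maximum-size frames (one delaying the outgoing pause frame, one already being transmitted by the peer) is an assumption of the example.

```python
def pfc_headroom_bytes(
    link_bps: int,
    cable_m: int,
    max_frame_bytes: int,
    peer_response_ns: int = 0,
    prop_delay_ns_per_m: int = 5,  # assumed propagation delay per meter
) -> int:
    """Estimate the buffer space needed above the PFC pause threshold."""
    # The pause frame travels to the peer, and data already sent by the peer
    # is still travelling toward the switch: one round trip on the cable.
    round_trip_ns = 2 * cable_m * prop_delay_ns_per_m
    total_ns = round_trip_ns + peer_response_ns
    # Ceiling division keeps the estimate conservative.
    in_flight_bits = -(-(link_bps * total_ns) // 10**9)
    in_flight_bytes = -(-in_flight_bits // 8)
    # Assumed allowance: one frame that delays the pause frame on the local
    # side, plus one frame the peer had already started to transmit.
    return in_flight_bytes + 2 * max_frame_bytes


# 10-Gigabit link, 150-meter cable, 2500-byte maximum frame (example values).
print(pfc_headroom_bytes(10_000_000_000, 150, 2500))
```

The estimate grows with both link speed and cable length, which is why lossless transport places limits on cable length.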

All of the other priorities on the link continue to send frames. Only the frames sent with the paused priority are not transmitted. When the receive buffer empties below another threshold, the switch sends a message that starts the flow again.

Configure PFC for a priority end to end along the entire data path to create a lossless lane of traffic on the network. You can selectively pause the traffic in any queue without pausing the traffic for other queues on the same link. You can create lossless lanes for traffic such as Fibre Channel over Ethernet (FCoE), LAN backup, or management while using standard frame-drop congestion management for IP traffic on the same link.

Potential consequences of link-level flow control are:

  • Head-of-line blocking
  • A paused priority that causes upstream devices to pause the same priority, thus spreading congestion back through the network

By definition, PFC supports symmetric pause only (as opposed to Ethernet PAUSE, which supports symmetric and asymmetric pause). With symmetric pause, a device can:

  • Transmit pause frames to pause incoming traffic.
  • Receive pause frames and stop sending traffic to a device whose buffer is too full to accept more frames.

When it receives a PFC frame, the QFX Series pauses traffic on egress queues based on the priorities that the PFC pause frame identifies. The priorities are 0 through 7. By default, the priorities map to queue numbers 0 through 7, respectively, and to specific forwarding classes, as shown in Table 4:

Table 4: Default PFC Priority to Queue and Forwarding Class Mapping

| IEEE 802.1p Priority (Code Point) | Queue | Forwarding Class |
|---|---|---|
| 0 (000) | 0 | best-effort |
| 1 (001) | 1 | best-effort |
| 2 (010) | 2 | best-effort |
| 3 (011) | 3 | fcoe |
| 4 (100) | 4 | no-loss |
| 5 (101) | 5 | best-effort |
| 6 (110) | 6 | network-control |
| 7 (111) | 7 | network-control |

For example, when the QFX Series receives a PFC pause frame that pauses priority 3, queue 3 is paused. If you do not want to use the default configuration, you can configure customized mapping of priorities to different queues and forwarding classes.
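The default mapping in Table 4 and the priority 3 example can be sketched in Python as follows. The function name and data layout are hypothetical. The sketch assumes the IEEE 802.1Qbb frame layout, in which a PFC pause frame carries a priority-enable vector (bit n set means the entry for priority n is valid) and one pause time per priority, and it treats a pause time of 0 as a request to resume.

```python
# Default mapping from Table 4: priority -> (queue, forwarding class).
DEFAULT_PRIORITY_MAP = {
    0: (0, "best-effort"),
    1: (1, "best-effort"),
    2: (2, "best-effort"),
    3: (3, "fcoe"),
    4: (4, "no-loss"),
    5: (5, "best-effort"),
    6: (6, "network-control"),
    7: (7, "network-control"),
}


def paused_queues(enable_vector, pause_times, priority_map=DEFAULT_PRIORITY_MAP):
    """Return the egress queues to pause for a received PFC pause frame."""
    if not 0 <= enable_vector <= 0xFF:
        raise ValueError("the enable vector covers priorities 0 through 7")
    if len(pause_times) != 8:
        raise ValueError("a PFC frame carries one pause time per priority")
    queues = []
    for priority in range(8):
        enabled = enable_vector & (1 << priority)
        # A pause time of 0 resumes the priority instead of pausing it.
        if enabled and pause_times[priority] > 0:
            queues.append(priority_map[priority][0])
    return queues


# A PFC frame that pauses only priority 3.
times = [0, 0, 0, 0xFFFF, 0, 0, 0, 0]
print(paused_queues(0b00001000, times))
```

Passing a different `priority_map` models a customized mapping of priorities to queues and forwarding classes.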

Note: By convention, deployments with converged server access typically use IEEE 802.1p priority 3 for FCoE traffic. The default forwarding class configuration sets the fcoe forwarding class as a lossless forwarding class that is mapped to queue 3, and default classifiers map incoming priority 3 traffic to the fcoe forwarding class. However, you must apply PFC to the entire FCoE data path to configure the end-to-end lossless behavior that FCoE traffic requires. If your network uses priority 3 for FCoE traffic, we recommend that you use the default configuration. If your network uses a priority other than 3 for FCoE traffic, you can configure lossless FCoE transport on any IEEE 802.1p priority as described in Understanding CoS IEEE 802.1p Priorities for Lossless Traffic Flows and Understanding CoS IEEE 802.1p Priority Remapping on an FCoE-FC Gateway.

You enable PFC on a priority on an ingress interface by configuring an input congestion notification profile (CNP) and then associating the CNP with an interface. The CNP enables PFC on a specified IEEE 802.1p code point. When you associate the CNP with an interface, ingress traffic mapped to that priority uses PFC to pause transmission whenever the queue buffer fills to the pause threshold.

Although unicast traffic and multidestination (multicast, broadcast, and destination lookup fail) traffic must use different classifiers, you can map a unicast queue (queue 0 through 7) and a multidestination queue (queue 8, 9, 10, or 11) to the same PFC priority so that both unicast and multidestination traffic use that priority. Do not map multidestination traffic to lossless priorities. Starting with Junos OS Release 12.3, you can map one priority to multiple output queues.

Note: You can attach a maximum of one CNP to an interface, but you can create an unlimited number of CNPs.

Lossless Transport Support

The QFX Series supports up to six lossless forwarding classes. For lossless transport, you must enable PFC on the IEEE 802.1p priority (code point) of lossless forwarding classes.

The following limitation on lossless transport applies to QFabric systems only:

  • The internal cable length from the QFabric system Node device to the QFabric system Interconnect device cannot exceed 150 meters.

The default CoS configuration provides two lossless forwarding classes, fcoe and no-loss. If you explicitly configure lossless forwarding classes, you must include the no-loss packet drop attribute to enable lossless behavior, or the traffic is not lossless. For both default and explicit configuration, you must configure CNPs to enable PFC on the priority of the lossless traffic.
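The requirements for lossless transport described above (at most six lossless forwarding classes, the no-loss attribute on each, and PFC enabled on each lossless priority) can be expressed as a simple consistency check. The following Python sketch is illustrative only; the `ForwardingClass` type and the `check_lossless` function are hypothetical and do not correspond to any Junos OS API.

```python
from dataclasses import dataclass

MAX_LOSSLESS_CLASSES = 6  # the QFX Series supports up to six lossless classes


@dataclass(frozen=True)
class ForwardingClass:
    name: str
    priority: int          # IEEE 802.1p priority mapped to this class
    no_loss: bool = False  # True if the no-loss drop attribute is set


def check_lossless(classes, pfc_priorities):
    """Return a list of problems that would prevent lossless transport.

    classes: forwarding classes in the configuration.
    pfc_priorities: priorities on which a CNP enables PFC.
    """
    problems = []
    lossless = [fc for fc in classes if fc.no_loss]
    if len(lossless) > MAX_LOSSLESS_CLASSES:
        problems.append(
            f"{len(lossless)} lossless classes exceed the limit of "
            f"{MAX_LOSSLESS_CLASSES}"
        )
    for fc in lossless:
        if fc.priority not in pfc_priorities:
            problems.append(
                f"{fc.name}: PFC is not enabled on priority {fc.priority}"
            )
    return problems


classes = [
    ForwardingClass("fcoe", priority=3, no_loss=True),
    ForwardingClass("no-loss", priority=4, no_loss=True),
    ForwardingClass("best-effort", priority=0),
]
print(check_lossless(classes, pfc_priorities={3}))
```

In this example the check reports the no-loss class, because no CNP enables PFC on priority 4.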

Note: Junos OS Release 12.2 introduced changes to the way the QFX Series handles lossless forwarding classes (the fcoe and no-loss forwarding classes).

In Junos OS Release 12.1, both explicitly configuring the fcoe and no-loss forwarding classes and using the default configuration for these forwarding classes resulted in the same lossless behavior for traffic mapped to those forwarding classes.

However, in Junos OS Release 12.2, if you explicitly configure the fcoe or the no-loss forwarding class, that forwarding class is no longer treated as a lossless forwarding class. Traffic mapped to these forwarding classes is treated as lossy (best-effort) traffic. This is true even if the explicit configuration is exactly the same as the default configuration.

If your CoS configuration from Junos OS Release 12.1 or earlier includes the explicit configuration of the fcoe or the no-loss forwarding class, then when you upgrade to Junos OS Release 12.2, those forwarding classes are not lossless. To preserve the lossless treatment of these forwarding classes, delete the explicit fcoe and no-loss forwarding class configuration before you upgrade to Junos OS Release 12.2.

See Overview of CoS Changes Introduced in Junos OS Release 12.2 for detailed information about this change and how to delete an existing lossless configuration.

In Junos OS Release 12.3, the default behavior of the fcoe and no-loss forwarding classes is the same as in Junos OS Release 12.2. However, in Junos OS Release 12.3, you can configure up to six lossless forwarding classes. All explicitly configured lossless forwarding classes must include the new no-loss packet drop attribute or the forwarding class is lossy.

Published: 2013-01-16