PFC Using DSCP at Layer 3 for Untagged Traffic

Overview

AI and ML applications are rapidly expanding in data centers. One critical challenge with AI and ML workloads is the sheer size of the data sets. Offloading computation to graphics processing units (GPUs) can significantly speed up processing. However, the data set and the model, especially for large language models (LLMs), often exceed the memory capacity of a single GPU. As a result, you typically need multiple GPUs to achieve reasonable job completion times, especially for training.

The performance of an AI data center depends on the number of GPUs in use and on the efficiency of the network that connects them. Slowdowns in the network can lead to underutilized GPUs and longer job completion times. Ethernet-based networks are becoming more popular as an alternative to InfiniBand for AI data center networking. One such Ethernet-based solution is Remote Direct Memory Access (RDMA) over Converged Ethernet version 2 (RoCEv2).

RoCEv2 encapsulates RDMA protocol packets within UDP packets for transport over Ethernet networks. RoCEv2 relies on priority-based flow control (PFC) to establish a drop-free network, while data center quantized congestion notification (DCQCN) provides end-to-end congestion control for RoCEv2. Junos OS Evolved supports DCQCN by combining explicit congestion notification (ECN) and PFC to enable end-to-end lossless AI Ethernet networking.
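
To make the PFC mechanism more concrete, the following Python sketch builds the pause frame that a congested port sends to its upstream neighbor. The frame layout (destination MAC 01:80:C2:00:00:01, EtherType 0x8808, opcode 0x0101, a priority-enable vector, and eight 16-bit pause timers) follows IEEE 802.1Qbb. The function name, the source MAC address, and the example priority are illustrative assumptions and are not taken from Junos OS Evolved.

```python
import struct
from typing import Dict

PFC_DEST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808                # MAC control frames
PFC_OPCODE = 0x0101                           # priority-based flow control


def build_pfc_frame(src_mac: bytes, pause_quanta: Dict[int, int]) -> bytes:
    """Build a PFC frame (without FCS) that pauses the given priorities.

    pause_quanta maps a PFC priority (0-7) to a pause time in quanta,
    where one quantum is 512 bit times. Hypothetical helper for illustration.
    """
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")

    enable_vector = 0
    timers = [0] * 8
    for priority, quanta in pause_quanta.items():
        if not 0 <= priority <= 7:
            raise ValueError("PFC priority must be in 0..7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("pause time must fit in 16 bits")
        enable_vector |= 1 << priority   # mark this priority's timer as valid
        timers[priority] = quanta

    header = PFC_DEST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *timers)
    # Pad to the 60-byte minimum Ethernet frame size (FCS excluded).
    return (header + payload).ljust(60, b"\x00")


# Example: pause priority 3 for the maximum time (assumed values).
frame = build_pfc_frame(bytes.fromhex("020000000001"), {3: 0xFFFF})
```

Because the frame pauses individual priorities rather than the whole link, only the traffic mapped to the paused priority stops, which is why each lossless forwarding class needs its own PFC priority.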

To support lossless IPv6 traffic across Layer 3 (L3) connections to Layer 2 (L2) subnetworks, you can configure PFC to operate using 6-bit Differentiated Services code point (DSCP) values from L3 headers of untagged VLAN traffic. You can use PFC with DSCP as an alternative to IEEE 802.1p priority values in L2 VLAN-tagged packet headers. You need DSCP-based PFC to support RoCEv2.
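
As a rough illustration of where the 6-bit DSCP value lives, the Python sketch below extracts it from the start of an IPv4 or IPv6 header. In both versions the DSCP occupies the upper six bits of the DS byte (the IPv4 ToS field or the IPv6 Traffic Class field), and the lower two bits carry ECN. The function name and the sample header bytes are assumptions made for the example.

```python
from typing import Tuple


def dscp_and_ecn(ip_header: bytes) -> Tuple[int, int]:
    """Return (dscp, ecn) from the first bytes of an IPv4 or IPv6 header."""
    if len(ip_header) < 2:
        raise ValueError("need at least the first two header bytes")

    version = ip_header[0] >> 4
    if version == 4:
        # IPv4: the second header byte is the ToS (DS) field.
        ds_byte = ip_header[1]
    elif version == 6:
        # IPv6: Traffic Class straddles the first two bytes
        # (4-bit version, 8-bit Traffic Class, 20-bit flow label).
        ds_byte = ((ip_header[0] & 0x0F) << 4) | (ip_header[1] >> 4)
    else:
        raise ValueError("not an IPv4 or IPv6 header")

    return ds_byte >> 2, ds_byte & 0x03


# Example: an IPv4 header start with ToS 0x68 and the equivalent IPv6 bytes.
ipv4_dscp, ipv4_ecn = dscp_and_ecn(bytes([0x45, 0x68]))
ipv6_dscp, ipv6_ecn = dscp_and_ecn(bytes([0x66, 0x80]))
```

Because the DSCP is read from the L3 header, it is available even when the frame carries no VLAN tag and therefore no IEEE 802.1p priority field.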

Benefits

  • Utilize Ethernet-based networks for AI-ML data center networking.

  • Improve network efficiency for large data sets.

  • Enable end-to-end lossless AI-ML Ethernet networking.

Configuration

Enable DSCP-Based PFC

  1. Map a forwarding class (FC) to a PFC priority using the pfc-priority statement.
  2. Define a congestion notification profile to enable PFC on traffic specified by a 6-bit DSCP value. Map the code-point configuration to no-loss queues.
  3. Set up a classifier for the DSCP value and the PFC-mapped FC.
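
The three steps above fit together as a chain of lookups, which the following Python sketch models conceptually. It is not Junos OS Evolved configuration syntax and does not reflect how the device is implemented. The forwarding class name, queue number, code point, and PFC priority are hypothetical values chosen only for illustration.

```python
from typing import Dict, NamedTuple, Optional, Set


class ForwardingClass(NamedTuple):
    name: str
    queue: int
    no_loss: bool
    pfc_priority: Optional[int]


# Step 1 (assumed values): a no-loss forwarding class mapped to PFC priority 3.
FORWARDING_CLASSES: Dict[str, ForwardingClass] = {
    "fc-roce": ForwardingClass("fc-roce", queue=3, no_loss=True, pfc_priority=3),
}

# Step 2 (assumed values): code points on which the congestion
# notification profile enables PFC.
CNP_PFC_CODE_POINTS: Set[int] = {0b011010}

# Step 3 (assumed values): the DSCP classifier sends the code point
# to the PFC-mapped forwarding class.
DSCP_CLASSIFIER: Dict[int, str] = {0b011010: "fc-roce"}


def pfc_priority_for(dscp: int) -> Optional[int]:
    """Return the PFC priority to pause for this DSCP, or None if not lossless."""
    if dscp not in CNP_PFC_CODE_POINTS:
        return None  # the profile does not enable PFC on this code point
    fc_name = DSCP_CLASSIFIER.get(dscp)
    if fc_name is None:
        return None  # no classifier entry for this code point
    fc = FORWARDING_CLASSES[fc_name]
    if not fc.no_loss:
        return None  # PFC applies only to no-loss queues
    return fc.pfc_priority


lossless = pfc_priority_for(0b011010)  # code point covered by all three steps
best_effort = pfc_priority_for(0)      # code point with no PFC mapping
```

In this model, a code point is treated as lossless only when all three mappings agree, which mirrors why the same DSCP value appears in both the congestion notification profile and the classifier.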

Verify the Configuration

  1. Check the ingress port.
  2. Check the ingress port.
  3. Display the DSCP-based input congestion notification profile (CNP).
  4. Display which FCs are mapped to each PFC priority.

Platform Support

See Feature Explorer for platform and release support.