Data Center Bridging Exchange Protocol Technology

1 Overview

DCBX (Data Center Bridging Exchange) is a protocol defined by the IEEE 802.1Qaz standard. It uses LLDP (Link Layer Discovery Protocol) as its transport mechanism to automatically negotiate and configure DCB (Data Center Bridging) features between network devices, thereby achieving lossless Ethernet transmission and enhanced QoS (Quality of Service) capabilities.

1.1 Functional Scenarios

The DCBX protocol is a key enabling technology for data center network automation. In data center network convergence scenarios, the PFC and ETS parameter configurations on both ends of a link must be consistent to achieve lossless Ethernet transmission. Relying on manual configuration is not only labor-intensive but also error-prone. DCBX, carried over the LLDP link discovery protocol, enables the devices on both ends of a link to discover and exchange DCB configuration information, greatly reducing the administrator’s workload and lowering the probability of network failures caused by configuration errors.

1.2 Related Technologies

1.2.1 DCBX Protocol Main Functions

  • Discover the DCB configuration information of peer devices
  • Synchronize the peer device’s DCB parameters to the local device
  • Monitor changes in device DCB configuration

1.2.2 Supported DCB Configuration Information for Exchange

  • PFC (Priority-based Flow Control): Provides lossless transmission for specific priority classes, controls traffic congestion, and reduces packet loss
  • ETS (Enhanced Transmission Selection): Enables bandwidth allocation and priority control of different traffic flows, thereby achieving service quality management for different types of traffic
  • Application Priority: Advertises application-to-priority mappings, carried in the Application Priority (APP) TLV
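
To make the exchanged objects concrete, the following minimal Python sketch models the three configuration sets above; the class and field names are illustrative assumptions, not the API of any particular network OS:

```python
# A minimal, illustrative model of the DCB configuration sets that DCBX
# exchanges. Class and field names are assumptions for this sketch.
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class PfcConfig:
    willing: bool = False                # accept the peer's configuration?
    enabled_priorities: Set[int] = field(default_factory=set)  # priorities 0-7

@dataclass
class EtsConfig:
    bandwidth_percent: Dict[int, int] = field(default_factory=dict)  # traffic class -> % bandwidth
    priority_to_tc: Dict[int, int] = field(default_factory=dict)     # priority (0-7) -> traffic class

@dataclass
class AppPriorityEntry:
    protocol_id: int                     # e.g. an Ethertype or L4 port number
    priority: int                        # priority (0-7) assigned to the application
```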

1.2.3 LLDP DCBX Extension TLV

The DCBX protocol is implemented as an extension of LLDP: DCB information is encapsulated in organizationally specific LLDP extension TLVs. The Type field is fixed at 127, and the OUI field is 0x0080C2 (the IEEE 802.1 OUI).

Figure: LLDP Data Center Bridging Exchange Extension TLV

The DCBX TLV includes the ETS Configuration TLV, ETS Recommendation TLV, PFC Configuration TLV, and Application Priority TLV. The details are shown in the table below:

| TLV | SubType | Length (octets) | Value |
|-----|---------|-----------------|-------|
| ETS Configuration TLV | 09 | 25 | ETS parameter configuration and bandwidth allocation |
| ETS Recommendation TLV | 0A | 25 | ETS recommended configuration parameters |
| PFC Configuration TLV | 0B | 6 | PFC enable/disable configuration and PFC capability negotiation |
| Application Priority TLV | 0C | Variable | Application-to-priority mapping |
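
As an illustration of this encapsulation, the hedged Python sketch below builds and parses a PFC Configuration TLV (Type 127, OUI 0x0080C2, subtype 0x0B, 6-octet information string). The helper names are assumptions for this sketch; the bit layout follows IEEE 802.1Qaz:

```python
# A hedged sketch of encoding/decoding the DCBX PFC Configuration TLV carried
# in an LLDPDU. Helper names are illustrative; offsets follow IEEE 802.1Qaz.
import struct

LLDP_TLV_ORG_SPECIFIC = 127      # Type field fixed at 127
IEEE_8021_OUI = 0x0080C2         # OUI for IEEE 802.1 organizationally specific TLVs
DCBX_SUBTYPE_PFC = 0x0B          # PFC Configuration TLV subtype

def build_pfc_tlv(willing: bool, pfc_cap: int, enabled_priorities: set) -> bytes:
    """Encode the TLV: 2-octet header plus a 6-octet information string."""
    info_len = 6                                       # OUI(3) + subtype(1) + flags(1) + enable(1)
    header = (LLDP_TLV_ORG_SPECIFIC << 9) | info_len   # 7-bit type, 9-bit length
    flags = (0x80 if willing else 0x00) | (pfc_cap & 0x0F)  # Willing bit + PFC capability
    enable_bitmap = 0
    for prio in enabled_priorities:                    # one bit per priority 0-7
        enable_bitmap |= 1 << prio
    return struct.pack(">H3sBBB", header, IEEE_8021_OUI.to_bytes(3, "big"),
                       DCBX_SUBTYPE_PFC, flags, enable_bitmap)

def parse_pfc_tlv(tlv: bytes) -> dict:
    header, oui, subtype, flags, enable_bitmap = struct.unpack(">H3sBBB", tlv)
    assert header >> 9 == LLDP_TLV_ORG_SPECIFIC
    assert oui == IEEE_8021_OUI.to_bytes(3, "big") and subtype == DCBX_SUBTYPE_PFC
    return {
        "willing": bool(flags & 0x80),
        "pfc_cap": flags & 0x0F,
        "enabled_priorities": {p for p in range(8) if enable_bitmap & (1 << p)},
    }

# Example: a willing port, capable of 2 lossless classes, PFC on priorities 3 and 4.
print(parse_pfc_tlv(build_pfc_tlv(willing=True, pfc_cap=2, enabled_priorities={3, 4})))
```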

2 Technical Implementation

2.1 DCBX State Machine

The configuration announcement, negotiation, and update behaviors of Data Center Bridging Exchange (DCBX) are implemented through a state machine that runs on each DCBX-enabled port. The state machine has five states, with transitions as shown in the table below:

| Current State | State Processing | Default Flow: Condition → Next State | Optional Flow: Condition → Next State |
|---|---|---|---|
| Local Configuration Collection | Initialize the local configuration, local capability, and local consensus | Peer exists → Local Configuration Advertisement | Peer does not exist → Configuration Change Monitoring |
| Local Configuration Advertisement | Advertise the local configuration to the peer | Peer exists, and the local configuration is willing with consensus → Peer Configuration Collection | Peer does not exist, or the local configuration is unwilling with consensus → Configuration Change Monitoring |
| Peer Configuration Collection | Initialize the peer configuration, peer capability, and peer consensus | None → Local Configuration Update | Peer does not exist, or the local configuration is unwilling with consensus, or the peer is willing with consensus and the local MAC address is smaller → Configuration Change Monitoring |
| Local Configuration Update | Negotiate the peer configuration against the local configuration; if the negotiated result differs from the local configuration, update the configuration in the database | None → Configuration Change Monitoring | Peer does not exist, or the local configuration is unwilling with consensus, or the peer is willing with consensus and the local MAC address is smaller → Configuration Change Monitoring |
| Configuration Change Monitoring | Monitor local and peer configuration changes | None → Configuration Change Monitoring | Peer appears or its state changes, the local configuration changes, or the peer configuration changes → Local Configuration Collection |
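
The transitions above can be captured compactly in code. Below is a minimal Python sketch of the five states and a simplified transition function; the predicates condense the default and optional flow conditions and are illustrative, not a production implementation:

```python
# A minimal sketch of the five-state DCBX port state machine in the table
# above. The transition predicates are simplified illustrations of the
# "default flow" and "optional flow" conditions, not a production FSM.
from enum import Enum, auto

class DcbxState(Enum):
    LOCAL_CONFIG_COLLECTION = auto()
    LOCAL_CONFIG_ADVERTISEMENT = auto()
    PEER_CONFIG_COLLECTION = auto()
    LOCAL_CONFIG_UPDATE = auto()
    CONFIG_CHANGE_MONITORING = auto()

def next_state(state: DcbxState, *, peer_exists: bool,
               willing_with_consensus: bool, config_changed: bool) -> DcbxState:
    S = DcbxState
    if state is S.LOCAL_CONFIG_COLLECTION:
        return S.LOCAL_CONFIG_ADVERTISEMENT if peer_exists else S.CONFIG_CHANGE_MONITORING
    if state is S.LOCAL_CONFIG_ADVERTISEMENT:
        if peer_exists and willing_with_consensus:
            return S.PEER_CONFIG_COLLECTION
        return S.CONFIG_CHANGE_MONITORING
    if state is S.PEER_CONFIG_COLLECTION:
        # Default flow is unconditional; the optional flow falls through to
        # monitoring when the peer vanishes or negotiation cannot proceed.
        return (S.LOCAL_CONFIG_UPDATE
                if peer_exists and willing_with_consensus
                else S.CONFIG_CHANGE_MONITORING)
    if state is S.LOCAL_CONFIG_UPDATE:
        return S.CONFIG_CHANGE_MONITORING
    # CONFIG_CHANGE_MONITORING: re-collect on any local or peer change.
    return S.LOCAL_CONFIG_COLLECTION if config_changed else S.CONFIG_CHANGE_MONITORING
```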

2.2 Data Center Bridging Exchange (DCBX) Workflow

The following details the DCBX state machine workflow, using PFC configuration as an example:

Figure: Data Center Bridging Exchange (DCBX) workflow
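
As a complement to the figure, the following hedged Python sketch captures the core PFC negotiation rule implied by the state machine: a willing port adopts an unwilling peer’s PFC enable set, and when both sides are willing, the MAC-address tie-break from the table above decides which side keeps its own configuration. The function and parameter names are illustrative assumptions:

```python
# A hedged sketch of the PFC negotiation rule implied by the state machine:
# a willing port adopts the peer's PFC enable set; if both sides are willing,
# the side with the numerically smaller MAC address keeps its own configuration.
def negotiate_pfc(local_willing: bool, peer_willing: bool,
                  local_pfc: set, peer_pfc: set,
                  local_mac: str, peer_mac: str) -> set:
    """Return the PFC priority set the local port should run after negotiation."""
    if local_willing and not peer_willing:
        return set(peer_pfc)                 # adopt the unwilling peer's configuration
    if local_willing and peer_willing:
        # Both willing: MAC tie-break, the lower MAC keeps its local configuration.
        if local_mac.lower() < peer_mac.lower():
            return set(local_pfc)
        return set(peer_pfc)
    return set(local_pfc)                    # not willing: keep the local configuration

# Example: a willing local port learns PFC priorities 3 and 4 from its peer.
print(negotiate_pfc(True, False, {5}, {3, 4},
                    "00:11:22:33:44:55", "00:11:22:33:44:66"))  # -> {3, 4}
```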

3 Application Cases

3.1 High-Performance Computing Applications

In modern large-scale, multi-cloud interconnected data centers, the network carries a wide variety of traffic. This includes mission-critical business traffic that is highly sensitive to latency and packet loss (such as storage, HPC, and real-time computing), as well as general data traffic that can tolerate some delay. Therefore, different priorities need to be assigned to different types of traffic to ensure the quality of service for key applications.

In the traditional model, administrators have to manually configure DCB features (such as PFC, ETS, and CN) on each switch. This method is not only inefficient but also highly prone to inconsistent configurations, which can lead to faults and even network outages. As shown in the figure below, packet loss or the spread of congestion can result when PFC is not enabled end to end.

Figure: Fault caused by inconsistent PFC configuration
  1. A switch experiences congestion and sends a PFC Pause frame to the server.
  2. The server, with PFC not enabled, continues to send traffic to the switch.
  3. The switch’s buffer usage exceeds its limit, leading to packet loss; the resulting retransmissions increase latency and can trigger a network outage.

As shown in the figure below, the DCBX protocol ensures the end-to-end consistency of DCB functions by enabling bidirectional capability discovery and configuration negotiation between devices. This greatly simplifies network deployment and maintenance, while also reducing the likelihood of network outages caused by inconsistent configurations.

Figure: DCBX configuration exchange between server and switch
  1. The switch configures its PFC parameters and enables DCBX.
  2. The server enables DCBX in willing mode to accept configurations, optionally with its own PFC parameters.
  3. The two sides complete the configuration exchange using LLDP extension TLVs.

Furthermore, as shown in the figure below, switches can also complete configuration exchanges with each other via the DCBX protocol, ensuring the consistency of DCB configurations along the forwarding path.

Figure: DCBX configuration exchange between switches
  1. The local switch configures PFC on priority queues 3 and 4 and enables DCBX in willing mode to accept configurations.
  2. The peer switch configures PFC on priority queues 6 and 7 and enables DCBX.
  3. The local side discovers that the peer’s PFC configuration is inconsistent with its own and synchronizes the peer’s PFC configuration to the local side, as sketched below.
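
A minimal sketch of this switch-to-switch synchronization, assuming the local switch is in willing mode; adopt_peer_pfc is a hypothetical helper, not a real device API:

```python
# An illustrative replay of the scenario above: the willing local switch
# (PFC on 3 and 4) synchronizes to the peer's PFC set (6 and 7) learned via
# DCBX. adopt_peer_pfc is a hypothetical helper, not a real device API.
def adopt_peer_pfc(local_pfc: set, peer_pfc: set, local_willing: bool) -> set:
    return set(peer_pfc) if local_willing else set(local_pfc)

print(adopt_peer_pfc({3, 4}, {6, 7}, local_willing=True))  # -> {6, 7}
```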

In the AI era, the massive scale of training and inference processes places extremely high demands on networks. DCBX achieves automated configuration and unified deployment of PFC and ETS lossless networking features, providing stable communication with low latency, zero packet loss, and high bandwidth. This is one of the key requirements for supporting high-efficiency AI compute clusters.

In a multi-pod, large-scale cluster environment, beyond the DCBX configurations supported by the switches, the RoCE configurations between devices must also be kept in sync, which is equally critical. DCBX lays the foundation for achieving end-to-end RoCE configuration consistency in large-scale network deployment scenarios.