INT Technology: Buffer Drop Capture (BDC) and High Delay Capture (HDC)
1 Overview
With the rapid development of high-performance applications such as AI large model training and distributed computing, AI computing networks face increasingly stringent requirements for real-time performance and stability. Issues such as network latency and buffer overflow directly impact training efficiency and model accuracy. Traditional monitoring technologies (e.g., SNMP) struggle to meet complex scenario demands due to limitations such as low collection precision and insufficient granularity.
INT (In-band Network Telemetry), as a next-generation network quality analysis technology, achieves millisecond-level data collection through an active device “push” mode, accurately capturing network microbursts and anomalies. Within INT, BDC (Buffer Drop Capture) and HDC (High Delay Capture) are core sub-solutions focusing on buffer packet loss and high-latency monitoring, respectively.
- BDC enables users to record information about data packets dropped due to buffer capacity limitations.
- HDC allows users to record information about data packets experiencing high latency caused by queue congestion within devices.
Both BDC and HDC provide sampling mechanisms, such as probabilistic capture and microburst capture, that can be applied to the processing of BDC and HDC packets.
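To make these sampling modes concrete, the following minimal sketch models probabilistic capture and microburst capture in Python. It is an illustration only: the sampling probability, window length, and burst threshold are hypothetical parameters, not the knobs actually exposed by the BDC/HDC configuration.

```python
import random
import time


def probabilistic_capture(capture_probability: float) -> bool:
    """Hypothetical probabilistic capture: export only a fraction of captured events."""
    return random.random() < capture_probability


class MicroburstCapture:
    """Hypothetical microburst capture: export only when events cluster in time."""

    def __init__(self, window_s: float, burst_threshold: int) -> None:
        self.window_s = window_s                  # observation window length (illustrative)
        self.burst_threshold = burst_threshold    # events per window treated as a burst (illustrative)
        self.events: list[float] = []

    def should_capture(self, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        # Keep only events that fall inside the current window, then add this one.
        self.events = [t for t in self.events if now - t <= self.window_s]
        self.events.append(now)
        return len(self.events) >= self.burst_threshold
```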
1.1 Functional Scenarios
BDC and HDC are essential technologies for data center network and AI computing network operations and troubleshooting.
Through BDC technology, when a data packet is dropped due to buffer capacity limitations, the switching device captures the first 150 bytes of the dropped packet and appends metadata, then sends it as a BDC packet to a remote collector or the local switch CPU.
Through HDC technology, the switching device captures all queue-congested packets exceeding the user-defined latency threshold, packages the first 150 bytes of the original packet along with metadata into an HDC packet, and sends it to a remote collector or the local switch CPU, while the original packet continues normal transmission.
1.2 Fundamental Concepts
1.2.1 BDC Packet Format
Figure 1: BDC Packet Format
- L2/IPv4
Users specify the outer Layer 2 and IPv4 headers in the BDC configuration.
- GRE Header
Figure 2 shows the GRE Header packet format, with Table 1 describing each field.
Figure 2: GRE Header
Table 1: BDC GRE Header Information
- BDC Shim Header
Figure 3 shows the BDC Shim Header packet format, with Table 2 describing each field.
Figure 3: BDC Shim Header
| Field | Length (bits) | Description |
| --- | --- | --- |
| Next Header | 8 | Indicates the next header. For Ethernet II, the value is 3. |
| Length | 4 | Shim Header length in 4-byte units. For BDC, this value is 7 (i.e., 7 × 4 = 28 bytes). |
| Switch ID | 16 | Identifies the device's Switch ID. |
| Extension Header | 6 | Type of extension header. For BDC, this value is 6. |
| Sinfo* | 12 | Information about the port through which the packet entered the device. |
| Dinfo* | 14 | Information about the destination port and queue where the packet was dropped. |
| Dev Class | 6 | Unique device identifier encoding used to decode packet information. |
| Queue Size Info* | 12 | Information about queue size. |
Table 2: BDC Shim Header Information
*Note: Decoding actual values from raw data depends on Dev Class.
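To make the layout in Table 2 concrete, the sketch below extracts the listed fields from the raw bytes of a BDC Shim Header, assuming they are packed contiguously, most-significant bit first, in the order shown. The remaining bytes of the 28-byte header and the interpretation of the starred fields depend on Dev Class, so treat this as an illustration rather than a complete decoder.

```python
def bits(value: int, total_bits: int, offset: int, width: int) -> int:
    """Extract `width` bits starting `offset` bits from the MSB of `value`."""
    shift = total_bits - offset - width
    return (value >> shift) & ((1 << width) - 1)


def parse_bdc_shim_header(raw: bytes) -> dict:
    """Parse the leading fields of a BDC Shim Header per Table 2.

    Assumes the listed fields are packed contiguously, MSB first; the rest of
    the 28-byte header and the meaning of the starred fields depend on
    Dev Class and are not modeled here.
    """
    if len(raw) < 10:
        raise ValueError("need at least 10 bytes to cover the listed fields")
    value = int.from_bytes(raw[:10], "big")   # the 78 bits of listed fields fit in 10 bytes
    layout = [("next_header", 8), ("length", 4), ("switch_id", 16),
              ("extension_header", 6), ("sinfo", 12), ("dinfo", 14),
              ("dev_class", 6), ("queue_size_info", 12)]
    fields, offset = {}, 0
    for name, width in layout:
        fields[name] = bits(value, 80, offset, width)
        offset += width
    return fields


# Example: for a well-formed BDC Shim Header, fields["length"] should be 7
# (7 x 4 = 28 bytes) and fields["extension_header"] should be 6.
```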
1.2.2 HDC Packet Format
Figure 4: HDC Packet Format
- L2/IPv4
Users specify the outer Layer 2 and IPv4 headers in the HDC configuration.
- GRE Header
Figure 5: GRE Header
Table 3: HDC GRE Header Information
- HDC Shim Header
Figure 6 shows the HDC Shim Header packet format, with Table 4 describing each field.
Figure 6: HDC Shim Header
Table 4: HDC Shim Header Information
*Note: Decoding actual values from raw data depends on Dev Class.
2 Operating Principles
Figure 7: BDC and HDC Operating Principle Diagram
2.1 Configuration Distribution
The green line segments in Figure 7 represent the BDC/HDC configuration distribution flow. First, BDC and HDC configuration information is distributed via CLI or APP to the Control APP in the AsterNOS system’s Telemetry container. The configuration is then written into the SDK of the syncd process through the Innovium Shell in the Syncd container, passed via ioctl to the Innovium Driver, and finally delivered to the ASIC over the PCIe channel, where it takes effect.
2.2 Feature Triggering
BDC: When a packet is dropped due to buffer capacity limitations, the BDC feature triggers. The switching device takes the first 150 bytes of the original packet, adds the outer encapsulation and the BDC Shim Header, and sends the result out as a BDC packet.
HDC: When a packet forwarded through the switching device experiences queue-congestion latency exceeding the user-defined threshold, the HDC feature triggers. The ASIC clones the first 150 bytes of the original packet, adds the outer encapsulation and the HDC Shim Header, and sends the result out as an HDC packet, while the original packet continues normal transmission.
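The following minimal sketch models the HDC triggering step described above: it clones the first 150 bytes of a packet once its queueing latency exceeds the configured threshold and prepends a placeholder encapsulation. The outer headers and Shim Header bytes are stand-ins, since their real contents come from the user configuration and the formats in Section 1.2.

```python
HDC_SNAP_LEN = 150  # the device clones only the first 150 bytes of the original packet


def build_hdc_packet(original: bytes, latency_ns: int, threshold_ns: int,
                     outer_headers: bytes, shim_header: bytes) -> bytes | None:
    """Return an HDC packet if the latency threshold is exceeded, else None.

    `outer_headers` (outer L2/IPv4 + GRE) and `shim_header` are opaque
    placeholders; their real contents are defined by the HDC configuration
    and the Shim Header format in Section 1.2.2. The original packet is left
    untouched and continues normal forwarding.
    """
    if latency_ns <= threshold_ns:
        return None                      # latency within threshold: no HDC trigger
    cloned = original[:HDC_SNAP_LEN]     # clone the first 150 bytes
    return outer_headers + shim_header + cloned
```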
2.3 Packet Collection
Both BDC and HDC support two collection modes: remote collection and local collection, as shown by the yellow line segments in Figure 7.
- Remote Collection
Using the outer packet header information (L2/IPv4) configured by the user, the ASIC looks up forwarding entries to forward Telemetry Packets to the corresponding egress interface, ultimately reaching the server specified by the destination IP through network forwarding.
- Local Collection
After BDC and HDC are triggered, the ASIC transmits packets to the control plane CPU via the PCIe channel. The Telemetry APP in the control plane Telemetry container reads packet information through a Unix Domain Socket (a minimal reader sketch is shown below).
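For local collection, a minimal reader sketch on the control plane side is shown below. The socket path and the one-datagram-per-packet framing are assumptions made for illustration; the actual path and protocol between the driver and the Telemetry APP are internal to AsterNOS.

```python
import socket

# Hypothetical socket path; the real path used inside the Telemetry container
# is not documented here.
TELEMETRY_SOCKET_PATH = "/var/run/telemetry/bdc_hdc.sock"


def read_telemetry_packets() -> None:
    """Receive locally collected BDC/HDC packets and print a short summary."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(TELEMETRY_SOCKET_PATH)
    try:
        while True:
            packet = sock.recv(4096)  # assume one captured packet per datagram
            # Each packet carries the outer headers, the Shim Header, and the
            # first 150 bytes of the original packet (see Section 1.2).
            print(f"received {len(packet)} bytes of BDC/HDC telemetry")
    finally:
        sock.close()
```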
3 Typical Application Scenarios
3.1 AI Computing Network Operations and Troubleshooting
In an AI training center, a GPU cluster exceeding one thousand cards is deployed for large-scale distributed model training. Multiple GPU servers within the cluster are interconnected via CX-N series switches, relying on high-performance networks to achieve inter-node data synchronization (such as All Reduce operations). In this setting, low latency and zero packet loss are critical to training efficiency. BDC and HDC can play important roles in the following scenarios:
3.1.1 BDC: Buffer Overflow Alerting and Optimization
Figure 8: Multiple Servers Transmitting Traffic Simultaneously Through the Same Switch Port
During training, GPU servers transmit gradient data to switches at high frequency via the RoCEv2 protocol. When multiple servers simultaneously send data through the same switch port, the egress queue buffer may approach overflow due to instantaneous traffic spikes. BDC monitors the buffer size, QP, and other information of that port queue in real time. If buffer overflow is detected, it immediately triggers an alert and records critical data such as node ID and queue number. Network operations personnel can quickly identify the problematic queue through the AsterNOS Exporter + Grafana visualization monitoring platform (as shown in Figures 9 and 10), and prevent significant data loss, which would delay All Reduce synchronization, by adjusting buffer allocation strategies (such as increasing the burst traffic handling capacity for that queue).
Figure 9: BDC Packet Information Example
Figure 10: HDC/BDC Traffic Statistics Example
3.1.2 HDC: Rapid High-Latency Node Localization
During a training session, the AI platform reported abnormally slow model convergence. HDC monitored switch forwarding latency and, combined with node ID information, identified elevated forwarding latency at a Leaf switch. Operations personnel quickly located the problematic node through the AsterNOS Exporter + Grafana visualization monitoring platform (as shown in Figures 11 and 12), and subsequently adjusted traffic forwarding paths. Additionally, HDC supports full-queue configuration, ensuring all queues that might impact training are covered, preventing overall performance degradation due to single-queue latency.
Figure 11: HDC Packet Information Example
Figure 12: HDC/BDC Traffic Statistics Example