Asterfusion Easy RoCE: Enabling Lossless Ethernet with a Single Command Line
written by Asterfusion
RDMA (Remote Direct Memory Access) allows memory data to be transferred directly between computers, bypassing the CPU and the operating system. It frees up memory bandwidth and CPU cycles and enables communication between nodes with lower latency and higher throughput. RDMA is now widely used in HPC, AI workloads, storage, and many other scenarios.
RoCEv1 runs directly on the Ethernet link layer and relies on flow control in the switches for reliable transmission. RoCEv2 runs on top of UDP, which makes it routable across IP networks; it works around some of the limitations of InfiniBand and opens RDMA to a wider range of applications.
Why Does RoCEv2 Need Lossless Ethernet?
Compared with TCP, UDP is faster and consumes fewer resources, but it lacks TCP's mechanisms for reliable delivery, such as sliding windows and acknowledgements. In a RoCEv2 network, when a packet is lost, the receiving NIC discards the packets that arrive after it, so the sender has to retransmit the lost packet and everything sent after it. To keep packets from being dropped in the first place, RoCEv2 networks usually rely on PFC (Priority Flow Control) and ECN (Explicit Congestion Notification).
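To make the retransmission cost concrete, here is a minimal Python sketch of go-back-N style recovery. The function name, the `in_flight` parameter, and the simplified loss model (each listed packet is dropped once, the first time the receiver is expecting it) are assumptions made for illustration; this is not a model of any particular NIC.

```python
def go_back_n_transmissions(total_packets: int, lost: set[int], in_flight: int) -> int:
    """Count packets put on the wire to deliver `total_packets` in order.

    lost      -- sequence numbers dropped once (hypothetical loss pattern)
    in_flight -- packets already sent behind a lost packet before the
                 sender learns about the loss (hypothetical window size)
    """
    sent = 0
    expected = 0                 # next sequence number the receiver accepts
    pending_loss = set(lost)
    while expected < total_packets:
        if expected in pending_loss:
            pending_loss.discard(expected)
            # The lost packet and everything in flight behind it are wasted:
            # the receiver discards out-of-order arrivals.
            wasted_behind = min(in_flight, total_packets - expected - 1)
            sent += 1 + wasted_behind
            # `expected` does not advance; the sender goes back and resends.
        else:
            sent += 1
            expected += 1
    return sent


if __name__ == "__main__":
    print(go_back_n_transmissions(10, {2}, in_flight=4))
```

With 10 packets, a single loss at sequence number 2, and four packets in flight behind it, the sketch counts 15 transmissions, where resending only the lost packet would take 11. The gap grows with the amount of data in flight, which is why avoiding drops matters so much on fast links.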
Configuring those functions on Ethernet switches requires familiarity with QoS mechanisms, configuration logic, and relevant command lines. It may not be difficult for an engineer who has been configuring RoCEv2 networks for customers for a long time.
But the technicians I know who work in high-performance computing and storage usually concentrate on servers, and this “must-have” network configuration has caused them a lot of trouble. Even engineers who run IB networks need to spend time learning it.
Routine Deployment Steps on a SONiC Switch
When deploying a RoCEv2 network, be sure to confirm your network hardware conditions first (e.g., low-latency network switches that support PFC and ECN, NICs that support RoCEv2). Then start configuring and debugging.
- Enabling and disabling: Configure PFC and ECN separately.
- Troubleshooting or status checking: You usually have to enter several different command-line views and run the “show” command multiple times to determine the current queue mapping, buffers, enabled queues, thresholds, queue throughput, and Pause and CNP triggers.
So, let’s take a quick look at the basics of configuring a RoCEv2 network on Asterfusion SONiC switches the conventional way. First, make sure the server NIC is working in RoCEv2 mode, then configure PCP or DSCP and enable ECN for RDMA traffic.
# Setting up the NIC RDMA CM work mode
[root@server ~]# cma_roce_mode -d mlx5_0 -p 1 -m
#Set the NIC's priority type to DSCP
[root@server ~]# mlnx_qos -i enp1s0f0 --trust=dscp
DCBX mode: OS controlled
Priority trust state: dscp
#Enable PFC on queue 3
[root@server ~]# mlnx_qos -i enp1s0f0 -f 0,0,0,1,0,0,0,0
#Enable DCQCN on queue 3
[root@server ~]# echo 1 > /sys/class/net/enp1s0f0/ecn/roce_np/enable/3
[root@server ~]# echo 1 > /sys/class/net/enp1s0f0/ecn/roce_rp/enable/3
#Setting up CNP DSCP
[root@server ~]# echo 48 >
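The per-priority values used in the NIC commands can be derived rather than typed by hand. The sketch below builds the comma-separated PFC list in the form passed to `mlnx_qos -f` and maps a DSCP value to a priority. Both helper names are hypothetical, and the DSCP mapping assumes the common default in which the priority is the top three bits of the DSCP value; check the mapping your NIC actually reports before relying on it.

```python
def pfc_mask(lossless_priorities, num_priorities: int = 8) -> str:
    """Build a list such as '0,0,0,1,0,0,0,0' with a 1 for each lossless priority."""
    lossless = set(lossless_priorities)
    invalid = [p for p in lossless if p not in range(num_priorities)]
    if invalid:
        raise ValueError(f"priorities out of range: {sorted(invalid)}")
    return ",".join("1" if p in lossless else "0" for p in range(num_priorities))


def default_priority_for_dscp(dscp: int) -> int:
    """Assumed default mapping: priority = top three bits of the 6-bit DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    return dscp >> 3


if __name__ == "__main__":
    print(pfc_mask({3}))                   # PFC on priority 3 only
    print(default_priority_for_dscp(48))   # priority that CNP traffic lands in
```

Under that assumed mapping, DSCP 48 falls into priority 6, which is consistent with the switch-side configuration that maps ip-dscp 48 to cos 6 for CNP packets.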
Then configure the switch ports. PFC and ECN must be enabled on the designated queues (matching the server side) on the Ethernet switch, and the buffer and thresholds adjusted accordingly.
sonic(config)# buffer-profile pg_lossless_100000_100m_profile
sonic(config-buffer-profile-pg_lossless_100000_100m_profile)# mode lossless dynamic -2 size 1518 xon 0 xoff 46496 xon-offset 13440
sonic(config-buffer-profile-pg_lossless_100000_100m_profile)# exit
sonic(config)# priority-flow-control enable 3
sonic(config)# priority-flow-control enable 4
sonic(config)# exit
sonic(config)# wred roce-ecn
sonic(config-wred-roce-ecn)# mode ecn gmin 15360 gmax 750000 gprobability 10
sonic(config-wred-roce-ecn)# exit
#Configure DSCP to COS mapping to ensure that CNP arrives in time when congestion occurs. (using queue 6)
sonic(config)# diffserv-map type ip-dscp roce-dmap
sonic(config-diffservmap-roce-dmap)# ip-dscp 48 cos 6
#Specify the queue to be enabled by class map
sonic(config)# class-map roce-cmap
sonic(config-cmap-roce-cmap)# match cos 3 4
sonic(config-cmap-roce-cmap)# exit
#Bind PFC, ECN, and queue mapping configurations to RoCE network policies
sonic(config)# policy-map roce-pmap
sonic(config-pmap-roce-pmap)# class roce-cmap
sonic(config-pmap-c)# wred roce-ecn
sonic(config-pmap-c)# priority-group-buffer pg_lossless_100000_100m_profile
sonic(config-pmap-c)# exit
sonic(config-pmap-roce-pmap)# set cos dscp diffserv roce-dmap
sonic(config-pmap-roce-pmap)# exit
#Enter interface view, and enable the configuration by binding the policy
sonic(config)# interface ethernet 0/0
sonic(config-if-0/0)# service-policy roce-pmap
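To show what the `gmin`, `gmax`, and `gprobability` values in the `wred roce-ecn` profile express, here is a minimal sketch of the textbook WRED/ECN marking curve. It assumes the thresholds are queue depths in bytes and that `gprobability` is a percentage; the function name is hypothetical, and the exact behaviour of a given switch ASIC may differ.

```python
def ecn_mark_probability(queue_bytes: int,
                         gmin: int = 15360,
                         gmax: int = 750000,
                         gprobability: float = 10.0) -> float:
    """Probability (0.0 to 1.0) that an ECN-capable packet is marked.

    Below gmin nothing is marked; between gmin and gmax the probability
    rises linearly up to gprobability percent; at or above gmax every
    packet is marked.
    """
    if queue_bytes <= gmin:
        return 0.0
    if queue_bytes >= gmax:
        return 1.0
    fraction = (queue_bytes - gmin) / (gmax - gmin)
    return (gprobability / 100.0) * fraction


if __name__ == "__main__":
    for depth in (10_000, 100_000, 400_000, 800_000):
        print(depth, round(ecn_mark_probability(depth), 4))
```

Read this way, a low `gmin` starts signalling congestion early, while the gap between `gmin` and `gmax` controls how gradually the marking rate ramps up before PFC has to step in.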
Easy RoCE on AsterNOS: Multi-command Line Encapsulation and Templating
Asterfusion has launched the “EasyRoCE” feature on AsterNOS, which is optimized for the configuration and deployment of RoCE networks.
Enabling Lossless Ethernet with a Single Command Line
Troubleshooting or Status Checking
The Easy RoCE feature of AsterNOS provides show roce commands for a one-stop view (global or per interface) of RoCE configurations and counters, as well as a command for clearing all of the counters.
# Check RoCE configurations
sonic# show qos roce
# View port-specific RoCE counters
sonic# show counters qos roce interface 0/0 queue 3
# Clear RoCE counters
sonic# clear counters qos roce
Automation and Visibility
The commands above let you configure lossless Ethernet quickly. If you need to fine-tune the parameters, you can also change the default template provided by the device. In addition, this RoCE template can be pushed down to devices from cloud OS platforms or AI platforms.
Based on the open architecture of AsterNOS, we have also developed a containerized roce_exporter that extracts device RoCE-related information and seamlessly interfaces with Prometheus to enhance network visibility.
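As a rough picture of what an exporter of this kind hands to Prometheus, the sketch below renders per-queue counters in the Prometheus text exposition format. It is not the actual roce_exporter: the metric names, label names, and input dictionary layout are all hypothetical and chosen only for illustration.

```python
# Hypothetical metric names mapped to (input key, help text).
METRICS = {
    "roce_pfc_pause_frames_total": (
        "pfc_pause_frames", "PFC pause frames seen on a queue (hypothetical metric)."),
    "roce_ecn_marked_packets_total": (
        "ecn_marked_packets", "Packets ECN-marked on a queue (hypothetical metric)."),
}


def render_roce_metrics(counters: dict) -> str:
    """Render counters keyed by (interface, queue) as Prometheus text format."""
    lines = []
    for metric, (key, help_text) in METRICS.items():
        lines.append(f"# HELP {metric} {help_text}")
        lines.append(f"# TYPE {metric} counter")
        for (interface, queue), values in sorted(counters.items()):
            labels = f'interface="{interface}",queue="{queue}"'
            lines.append(f"{metric}{{{labels}}} {values[key]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = {("0/0", 3): {"pfc_pause_frames": 12, "ecn_marked_packets": 340}}
    print(render_roce_metrics(sample), end="")
```

A real exporter would serve this text over HTTP for Prometheus to scrape and would read the values from the switch instead of a hard-coded sample.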
Configuration Video
This video demonstrates the complete process of quickly configuring RoCEv2 on Asterfusion SONiC data center switches:
Asterfusion RoCEv2 Data Center Network Solution
Your AI/ML computing cluster hungers for speed and reliability, and it is fueled by the power of 800G/400G/200G Ethernet, RoCEv2 technology, and intelligent load balancing. This isn’t just any connection; it’s a high-performance, low-TCO network powerhouse, courtesy of Asterfusion. No matter the size of your cluster, we’ve got you covered with our one-stop solution.
For more about the Asterfusion AI network solution: https://cloudswit.ch/solution/rocev2-ai-solution-with-dgx-superpod/
Related Products (All CX-N series switch products from 100G-800G support RoCE and EasyRoCE)