On-site Test Result: Asterfusion RoCEv2-enabled SONiC Switches Vs. InfiniBand

written by Asterfusion

January 26, 2024

In early deployments of AI and ML, NVIDIA’s InfiniBand networking solution was the preferred choice because of its excellent low network latency. Over time, however, networking vendors and hyperscalers have been actively exploring Ethernet alternatives to meet the demands of rapidly expanding AI workloads. Replacing InfiniBand with RoCEv2 to carry RDMA traffic has now become a common approach in the industry. Let’s explore it in the following article!

What is InfiniBand?

InfiniBand is a network communication protocol that moves data and messages by creating a dedicated, protected channel directly between nodes through a switch. RDMA and send/receive offloads are managed and performed by InfiniBand adapters. One end of the adapter connects to the CPU via PCIe, and the other end connects to the InfiniBand subnet via an InfiniBand network port. This provides significant advantages over other network communication protocols, including higher bandwidth, lower latency, and better scalability.

Challenges of InfiniBand Network Deployments

  • Vendor lock-in: Only one vendor offers mature IB products and solutions, and they are expensive.
  • Compatibility: InfiniBand uses its own protocol rather than TCP/IP, so it requires a specialized network with separate cabling and switches.
  • Availability: IB switches generally have long delivery times.
  • Service: O&M depends on the manufacturer, which makes faults difficult to locate and slow to resolve.
  • Expansion and upgrading: Depends on the pace of the vendor’s product releases.

What is RoCE and RoCEv2?

RDMA (Remote Direct Memory Access) is a groundbreaking technology that significantly increases data throughput and reduces latency. It was initially deployed on InfiniBand, but has since been successfully integrated into Ethernet infrastructure.

Currently, the prevailing solution for high-performance networks is to build an RDMA-capable network on the RoCEv2 (RDMA over Converged Ethernet version 2) protocol. This approach relies on two key technologies, Priority Flow Control (PFC) and Explicit Congestion Notification (ECN), to provide lossless, low-latency transport over standard Ethernet. It offers several advantages, including cost-effectiveness, excellent scalability, and freedom from vendor lock-in.
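
One quick way to confirm that a host is ready for RoCEv2 is to check which GID types its RDMA device advertises. The snippet below is a minimal sketch, assuming a Linux host with the standard /sys/class/infiniband sysfs layout and a Mellanox-style device; the device name and port are placeholders, and the exact type strings may vary by kernel version.

```python
# Minimal sketch: list the GID types advertised by an RDMA device to confirm
# that "RoCE v2" entries are present. Assumes a Linux host with the standard
# /sys/class/infiniband sysfs layout; the device name below is a placeholder.
from pathlib import Path

DEVICE = "mlx5_0"  # hypothetical device name -- check `ls /sys/class/infiniband`
PORT = "1"

gid_types_dir = Path(f"/sys/class/infiniband/{DEVICE}/ports/{PORT}/gid_attrs/types")

def list_gid_types():
    """Return {gid_index: type_string} for every populated GID slot."""
    types = {}
    for entry in sorted(gid_types_dir.iterdir(), key=lambda p: int(p.name)):
        try:
            types[int(entry.name)] = entry.read_text().strip()
        except OSError:
            continue  # unpopulated GID slots raise an error when read; skip them
    return types

if __name__ == "__main__":
    gid_types = list_gid_types()
    for index, gid_type in gid_types.items():
        print(f"GID {index}: {gid_type}")
    if any(t == "RoCE v2" for t in gid_types.values()):
        print("RoCEv2 GIDs found -- the adapter can carry RoCEv2 traffic.")
    else:
        print("No RoCEv2 GIDs found -- check the NIC mode and driver configuration.")
```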

For more about RoCE, please read: https://cloudswit.ch/blogs/roce-rdma-over-converged-ethernet-for-high-efficiency-network-performance/#what-is-ro-ce%EF%BC%9F

Next, we’ll take a look at how Asterfusion RoCEv2 low-latency SONiC switches perform against InfiniBand switches and other Ethernet switches in AIGC, distributed storage, and HPC networks.

RoCEv2 Switches Test Result in AIGC Network

We built an AIGC network test environment using the Asterfusion CX664D-N ultra-low-latency switch and another brand’s RoCEv2-capable switch (Switch A) to compare NCCL performance.

NCCL is short for the NVIDIA Collective Communications Library, and nccl-tests is NVIDIA’s open-source tool for testing collective communication. It can be used to verify that collective communication works correctly and to stress-test the collective communication rate.

https://github.com/NVIDIA/nccl-tests

https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/overview.html
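
As an illustration of how such a measurement can be driven, the sketch below wraps nccl-tests’ all_reduce_perf binary from Python and extracts the average bus bandwidth it reports. It assumes the binary has already been built from the repository linked above; the path, GPU count, and message sizes are placeholders rather than the exact parameters used in this test.

```python
# Minimal sketch: run nccl-tests' all_reduce_perf and extract the average bus
# bandwidth it prints. Assumes the binary has been built from the nccl-tests
# repository linked above; the path and parameters below are illustrative only.
import re
import subprocess

ALL_REDUCE_PERF = "./build/all_reduce_perf"  # hypothetical build location

def run_allreduce(num_gpus=8, min_bytes="8", max_bytes="128M", step_factor=2):
    """Run an all_reduce sweep and return the reported average bus bandwidth (GB/s)."""
    cmd = [
        ALL_REDUCE_PERF,
        "-b", min_bytes,         # smallest message size
        "-e", max_bytes,         # largest message size
        "-f", str(step_factor),  # multiply the message size by this factor each step
        "-g", str(num_gpus),     # number of GPUs per process
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # nccl-tests ends its output with a summary line such as
    # "# Avg bus bandwidth    : 11.24"
    match = re.search(r"Avg bus bandwidth\s*:\s*([\d.]+)", result.stdout)
    return float(match.group(1)) if match else None

if __name__ == "__main__":
    busbw = run_allreduce()
    print(f"Average bus bandwidth: {busbw} GB/s")
```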

| Item | Model | Specs | Quantity | Remark |
| --- | --- | --- | --- | --- |
| Switch | Asterfusion CX664D-N (64-port 200G) | \ | 1 | \ |
| Switch | Switch A (48-port 200G, as a test control) | \ | 1 | \ |
| Server | X86_64 | CPU: Intel Xeon Silver 4214, Memory: 256G | 2 | Need 200G NIC and corresponding driver |
| Optical Module | 100G | QSFP28 | 4 | \ |
| Fiber | Multi-mode | \ | 2 | \ |
| NIC | \ | MT28800 Family CX-5 Lx | 2 | Switch to Ethernet Mode (see the sketch after this table) |
| OS | Ubuntu 20.04 | \ | \ | Kernel version > 5.10 |
| Mellanox Driver | MLNX_OFED-5.0 | \ | \ | NIC driver adapted to host kernel |
| Switch OS | AsterNOS (SONiC 201911.R0314P02) | \ | \ | \ |
| NCCL | V2.13 | \ | \ | \ |
| OpenMPI | V4.0.3 | \ | \ | \ |
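
The “Switch to Ethernet Mode” remark in the table above refers to the ConnectX port type: the same adapter can run in InfiniBand or Ethernet mode, and RoCEv2 requires Ethernet. The sketch below shows one common way to check and change this with the NVIDIA/Mellanox mlxconfig tool (part of the MFT package); the PCI address is a placeholder, a reboot or firmware reset is usually needed before the change takes effect, and the exact procedure used in this test is not documented here.

```python
# Minimal sketch: check (and optionally set) the port type of a ConnectX NIC so
# that it runs in Ethernet mode, which RoCEv2 requires. Assumes the
# NVIDIA/Mellanox MFT tools (mlxconfig) are installed; the PCI address is a
# placeholder, and a reboot/firmware reset is usually needed after a change.
import subprocess

DEVICE = "0000:3b:00.0"  # hypothetical PCI address -- see `lspci | grep Mellanox`

def query_link_type():
    """Print the current LINK_TYPE_P1 setting (1 = IB, 2 = ETH on ConnectX NICs)."""
    subprocess.run(["mlxconfig", "-d", DEVICE, "query", "LINK_TYPE_P1"], check=True)

def set_ethernet_mode():
    """Request Ethernet mode on port 1; takes effect after a reboot/firmware reset."""
    subprocess.run(["mlxconfig", "-y", "-d", DEVICE, "set", "LINK_TYPE_P1=2"], check=True)

if __name__ == "__main__":
    query_link_type()
    # Uncomment to actually switch the port to Ethernet mode:
    # set_ethernet_mode()
```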

NCCL Test (Bandwidth & Latency)

Layer 2 network, RoCEv2 enabled on the CX664D-N switch, GPU server NICs optimized for RDMA.

| Bandwidth (GB/s) | GPU0 | GPU1 | GPU2 | GPU3 | GPU4 | GPU5 | GPU6 | GPU7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Switch A GPU1-2 | 5.39 | 6.61 | 11.31 | 7.22 | 2.01 | 2.45 | 1.77 | 2.28 |
| Switch A GPU3-4 | 5.97 | 7.17 | 11.21 | 7.76 | 2.4 | 2.81 | 2.05 | 2.66 |
| Asterfusion GPU1-2 | 6.02 | 7.22 | 11.24 | 7.8 | 2.39 | 2.85 | 1.87 | 2.66 |

| Latency (μs) | GPU0 | GPU1 | GPU2 | GPU3 | GPU4 | GPU5 | GPU6 | GPU7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Switch A GPU1-2 | 22.04 | 21.81 | 21.84 | 22.31 | 22.93 | 22.42 | 22.9 | 22.75 |
| Switch A GPU3-4 | 21.63 | 21.65 | 21.98 | 21.71 | 22.46 | 22.51 | 23.1 | 23.05 |
| Asterfusion GPU1-2 | 22.57 | 21.79 | 21.78 | 21.62 | 22.79 | 22.63 | 23.01 | 23.16 |
| | Mellanox CX-5 (Direct connection) | CX664D-N | Switch A |
| --- | --- | --- | --- |
| Latency (ns) | 1400ns | 480ns | 580ns |
| Bandwidth (Gb/s) | 99.4 | 98.14 | 98.14 |

More about Asterfusion AIGC networking solutions: Cost Effective AIGC Network Solution by Asterfusion RoCEv2 Ready Switch

RoCEv2 Switches Test Result in HPC Network

We built an HPC test network using an Asterfusion CX564P-N ultra-low-latency switch and a Mellanox InfiniBand switch to compare their performance.

| Item | Model | Specs | Quantity | Remark |
| --- | --- | --- | --- | --- |
| Switch | Asterfusion CX564P-N (64-port 100G) | \ | 1 | \ |
| Server | X86_64 | CPU: Intel Xeon Gold 6348, Memory: 512G | 2 | \ |
| Optical Module | 100G | QSFP28 | 6 | \ |
| Fiber | Multi-mode | \ | 3 | \ |
| NIC | MCX455A-ECA_AX | Mellanox CX-4 Lx | 6 | Switch to Ethernet Mode |
| OS | CentOS 7.9.2009 | \ | \ | Kernel version > 3.10.0-957.el7 |
| Mellanox NIC Driver | MLNX_OFED-5.7 | \ | \ | \ |
| Switch OS | AsterNOS (SONiC 201911.R0314P02) | \ | \ | \ |
| Kernel | 3.10.0-1160.el7.x86_64 | \ | \ | \ |
| WRF | V4.0 | \ | \ | \ |

Test Result

Layer 2 network, RoCEv2 enabled on the CX564P-N, HPC nodes (Mellanox NICs) optimized for RDMA.

E2E forwarding performance test

| E2E Latency | T1 | T2 | T3 | AVG |
| --- | --- | --- | --- | --- |
| Asterfusion CX564P-N | 1580ns | 1570ns | 1570ns | 1573.3ns |
| Mellanox IB Switch | 1210ns | 1220ns | 1220ns | 1213.3ns |
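
The article does not state which tool produced the end-to-end latency figures above. One common way to measure RDMA end-to-end latency between two hosts is perftest’s ib_send_lat, run as a server on one node and as a client on the other; the sketch below wraps the client side, with the device name, peer address, and message size as placeholders.

```python
# Minimal sketch of the client side of a perftest ib_send_lat run, a common way
# to measure end-to-end RDMA latency between two hosts. The server side is
# started first on the peer node with `ib_send_lat -d <device> -F`. The device
# name, peer address, and message size below are placeholders.
import subprocess

DEVICE = "mlx5_0"        # hypothetical RDMA device name
SERVER = "192.168.1.10"  # hypothetical address of the node running the server side
MSG_SIZE = "2"           # message size in bytes
ITERATIONS = "10000"     # number of ping-pong iterations

def run_latency_client():
    cmd = [
        "ib_send_lat",
        "-d", DEVICE,      # RDMA device to use
        "-s", MSG_SIZE,    # message size
        "-n", ITERATIONS,  # iteration count
        "-F",              # ignore CPU frequency scaling warnings
        SERVER,            # giving a peer address makes this the client side
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    print(result.stdout)   # perftest prints min/typical/avg/max latency in usec

if __name__ == "__main__":
    run_latency_client()
```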

HPC application performance test (WRF)

| WRF/s | T1 | T2 | T3 | AVG |
| --- | --- | --- | --- | --- |
| Single Device 24 core | 1063.22 | 1066.62 | 1066.39 | 1065.41 |
| Dual Device 24 core NIC direct connection | 1106.95 | 1110.73 | 1107.31 | 1108.33 |
| Dual Device 24 core RoCE | 1117.21 | 1114.32 | 1114.34 | 1115.29 |
| Dual Device 24 core IB | 1108.35 | 1110.44 | 1110.55 | 1109.78 |
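
As a quick sanity check on the table above, the snippet below recomputes the per-configuration averages and the relative gap between the RoCE and IB dual-device runs; all figures are copied verbatim from the table.

```python
# Recompute the averages in the WRF table above and compare the RoCE and IB
# dual-device runs. All figures are copied verbatim from the table.
runs = {
    "Single Device 24 core":                     [1063.22, 1066.62, 1066.39],
    "Dual Device 24 core NIC direct connection": [1106.95, 1110.73, 1107.31],
    "Dual Device 24 core RoCE":                  [1117.21, 1114.32, 1114.34],
    "Dual Device 24 core IB":                    [1108.35, 1110.44, 1110.55],
}

averages = {name: sum(values) / len(values) for name, values in runs.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")

roce = averages["Dual Device 24 core RoCE"]
ib = averages["Dual Device 24 core IB"]
print(f"RoCE vs IB gap: {abs(roce - ib) / ib * 100:.2f}%")  # roughly half a percent
```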

RoCEv2 Switches Test Result in Distributed Storage Network

We built a storage network using Asterfusion CX532P-N ultra-low-latency switches and performed stress tests. The tests used the FIO tool to run 4K and 1M read/write workloads and record storage latency, bandwidth, IOPS, and other data.
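
For readers who want to reproduce this kind of measurement, the sketch below shows roughly how such an FIO run might be scripted, using JSON output to collect bandwidth, IOPS, and latency. It assumes FIO 3.x with the libaio engine; the target path, queue depth, and runtime are placeholders and may differ from the job options actually used in the test.

```python
# Minimal sketch: run an fio job with JSON output and pull out bandwidth, IOPS,
# and completion latency, roughly mirroring the metrics reported in the tables
# below. The target path, block size, queue depth, and runtime are placeholders.
import json
import subprocess

def run_fio(rw="randread", bs="4k", target="/mnt/storage/testfile", runtime=60):
    cmd = [
        "fio",
        "--name=rdma_storage_test",
        f"--filename={target}",
        f"--rw={rw}",          # randread / randwrite / read / write
        f"--bs={bs}",          # 4k or 1m block size, as in the tests described above
        "--ioengine=libaio",
        "--iodepth=32",
        "--direct=1",          # bypass the page cache
        f"--runtime={runtime}",
        "--time_based",
        "--group_reporting",
        "--output-format=json",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    job = json.loads(result.stdout)["jobs"][0]
    side = "read" if "read" in rw else "write"
    stats = job[side]
    print(f"{rw}/{bs}: bw={stats['bw'] / 1024:.1f} MiB/s, "
          f"iops={stats['iops']:.0f}, "
          f"lat={stats['clat_ns']['mean'] / 1000:.2f} us")

if __name__ == "__main__":
    run_fio()
```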

| Item | Model | Specs | Quantity | Remark |
| --- | --- | --- | --- | --- |
| Switch | Asterfusion CX532P-N | \ | 1 | 32-port 100G |
| Server | X86_64 | CPU: AMD EPYC 7402P*2, Memory: 256G | 2 | Need 100G NIC and corresponding driver |
| Optical Module | 100G | QSFP28 | 4 | \ |
| Fiber | Multi-mode | \ | 2 | \ |
| NIC | MCX455A-ECA_AX | Mellanox CX-4 Lx | 2 | Switch to Ethernet Mode |
| OS | CentOS 7.9.2009 | \ | \ | Kernel version > 3.10.0-957.el7 |
| Mellanox Driver | MLNX_OFED-5.7 | \ | \ | \ |
| Switch OS | SONiC 201911.R0314P02 | \ | \ | \ |
| FIO | V3.19 | \ | \ | \ |
| Kernel | 3.10.0-1160.el7.x86_64 | \ | \ | \ |

Test Result

| | 1M RandRead | 1M RandWrite | 1M SeqRead | 1M SeqWrite |
| --- | --- | --- | --- | --- |
| Bandwidth | 20.4GB/s | 6475MB/s | 14.6GB/s | 6455MB/s |
| IOPS | 20.9K | 6465 | 20.2K | 6470 |
| Latency | 3060.02us | 9881.59us | 4249.79us | 9912.59us |

| | 4K RandRead | 4K RandWrite | 4K SeqRead | 4K SeqWrite |
| --- | --- | --- | --- | --- |
| Bandwidth | 3110MB/s | 365MB/s | 656MB/s | 376MB/s |
| IOPS | 775K | 101K | 173K | 105K |
| Latency | 79.59us | 683.87us | 380.54us | 664.06us |

More about Asterfusion distributed storage network solutions (NetApp MetroCluster IP compatible): https://cloudswit.ch/blogs/boost-netapp-metrocluster-ip-storage-with-asterfusion-data-center-switch/

Conclusion: Asterfusion RoCEv2 low latency SONiC switches are fully capable of replacing IB switches

Asterfusion low latency network solutions offer the following advantages:

  • Low cost: Less than half the cost of an IB switch.
  • Availability: Asterfusion keeps hundreds of low-latency switches in stock, with a lead time of 2-4 weeks.
  • After-sales service: A professional, patient, and reliable team provides 24-hour remote online technical support.
  • Expansion and upgrading: Based on AsterNOS (Asterfusion’s enterprise-ready SONiC), which supports flexible functionality expansion and online upgrades.

We have summarised more test data from our customers in different industries; if you are interested, please visit our website to download it: https://help.cloudswit.ch/portal/en/home
