RDMA vs. TCP/IP: The Ultimate Guide for AIDC

written by Asterfusion

November 3, 2025

Introduction

Wondering why we’re pitting RDMA against TCP/IP? If you’ve dug into data center tech, you’ve likely seen RDMA crop up constantly—but wrapping your head around it alone is no easy feat. TCP/IP, the foundational network protocol we all know, has deep connections with RDMA. Comparing them side by side makes RDMA much more accessible.

Let’s start with their common ground: both RDMA and TCP/IP are low-level data transport technologies. Their core mission is the same: moving data for upper-layer applications such as SMB or databases. Think of them as two distinct tools for the same job. That is the premise for this head-to-head comparison.

Next, we’ll approach RDMA from a TCP/IP perspective. By breaking down their differences and connections, we’ll show you exactly why RDMA has become such a hit in the industry.

Ⅰ. What are RDMA and TCP/IP?

To understand modern data center networking, a comparative look at RDMA and TCP/IP is essential.

RDMA (Remote Direct Memory Access) is a high-performance networking technology. It lets one host read from or write to another host’s memory directly: the network adapter moves the data, so neither host’s CPU has to copy it.

It’s ideal for scenarios needing fast transfer speeds and low latency—like AI training, where compute nodes exchange large training datasets frequently. RDMA delivers low-latency, high-bandwidth transfer to speed up model training.

RDMA relies on three core protocols: InfiniBand (IB), RDMA over Converged Ethernet (RoCE), and Internet Wide Area RDMA Protocol (iWARP).

  • InfiniBand (IB): A native protocol built specifically for RDMA. It’s the go-to choice for high-performance computing (HPC) and data centers.
  • RoCE (RDMA over Converged Ethernet): An Ethernet-based RDMA protocol, split into RoCE v1 (Layer 2) and RoCE v2 (Layer 3). It offers stronger compatibility with existing Ethernet infrastructures.
  • iWARP (Internet Wide Area RDMA Protocol): A TCP/IP-based RDMA protocol. It supports wide area network (WAN) deployments and offers flexible implementation.

The table below highlights the key differences between these three foundational RDMA protocols:

| Feature | InfiniBand | RoCE v2 | iWARP |
| --- | --- | --- | --- |
| Underlying Network | Dedicated InfiniBand network | Standard Ethernet | Standard Ethernet |
| Protocol Stack | Native IB protocol stack | UDP/IP over Ethernet | TCP/IP over Ethernet |
| Routability | Yes (via IB Subnet Manager) | Yes (IP-based) | Yes (TCP/IP-based) |
| Performance | Highest (lowest latency) | Very high (close to IB) | High (latency higher than the above) |
| Network Requirements | Dedicated IB switching equipment | Lossless Ethernet (PFC + ECN required) | Standard TCP/IP network |
| Deployment Cost | High (new network deployment needed) | Medium (leverages existing Ethernet) | Medium (leverages existing Ethernet) |
| Key Advantages | Extreme performance, stability | High performance, Ethernet compatibility | IP network versatility, easy deployment |

Since TCP/IP is widely understood, we won’t dive deep into the TCP/IP protocol suite here.

Ⅱ. How They Actually Work

When comparing RDMA and TCP/IP, the fundamental difference lies in their data paths:

  • TCP/UDP Traffic: Takes the long way through the kernel. Your data goes through the entire network stack, with the CPU handling every step—encapsulation, segmentation, buffering. It works, but it’s inefficient for high-throughput workloads.
  • RoCE v2 Traffic: Cuts the line. RDMA traffic gets wrapped in UDP/IP packets (UDP destination port 4791) and goes straight from user space to the RNIC, the RDMA-capable network card. The RNIC handles packetization and transport in hardware, freeing up the CPU and slashing latency.
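To make the RoCE v2 encapsulation concrete, here is a minimal sketch that packs a UDP header and the 12-byte InfiniBand Base Transport Header (BTH) that follows it. The function names and the sample queue pair and sequence numbers are illustrative assumptions, not part of any library. A real RDMA WRITE packet also carries an RDMA extended header and a trailing ICRC, both omitted here, and on a real system the RNIC builds all of this in hardware.

```python
import struct

ROCE_V2_UDP_PORT = 4791      # UDP destination port assigned to RoCE v2
RC_RDMA_WRITE_ONLY = 0x0A    # IB opcode: reliable connection, RDMA WRITE Only


def build_bth(opcode: int, dest_qp: int, psn: int,
              pkey: int = 0xFFFF, ack_req: bool = False) -> bytes:
    """Pack a 12-byte Base Transport Header (hypothetical helper)."""
    flags = 0  # SE, M, PadCnt and TVer all left at zero in this sketch
    qp_word = dest_qp & 0xFFFFFF                        # upper 8 bits reserved
    psn_word = (int(ack_req) << 31) | (psn & 0xFFFFFF)  # AckReq bit + 24-bit PSN
    return struct.pack("!BBHII", opcode, flags, pkey, qp_word, psn_word)


def build_udp_header(src_port: int, payload_len: int) -> bytes:
    """Pack an 8-byte UDP header addressed to the RoCE v2 port."""
    length = 8 + payload_len  # the UDP length field covers header + payload
    checksum = 0              # simplification: checksum left unset
    return struct.pack("!HHHH", src_port, ROCE_V2_UDP_PORT, length, checksum)


bth = build_bth(RC_RDMA_WRITE_ONLY, dest_qp=0x12, psn=100, ack_req=True)
udp = build_udp_header(src_port=49152, payload_len=len(bth))
print(len(udp), len(bth))
```

Because the outer headers are ordinary UDP/IP, any IP router can forward the packet. The RNIC on the receiving side recognizes the destination port and hands the BTH to its RDMA transport engine.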

Check the diagram below: The left side shows the traditional TCP/UDP transfer path, while the right side illustrates RDMA’s path. It clearly demonstrates how RDMA accelerates data transmission.

[Figure: traditional TCP/UDP transfer path (left) vs. RDMA transfer path (right)]

Ⅲ. RDMA in AI Data Centers

TCP/IP is hitting a wall in AI workloads. All that data copying and context switching burns CPU cycles that should be doing math. The result? Longer training times and potential convergence issues. This bottleneck is precisely where the RDMA and TCP/IP comparison becomes critical for AI infrastructure.

The competition between RDMA and TCP/IP in the AI data center (AIDC) comes down to this: RDMA bypasses the kernel TCP/IP stack entirely, cutting down on data copy and context switch overhead to boost transfer efficiency. In large model training, this means faster data exchange between compute nodes, less waiting, and accelerated training speeds.
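The copy overhead can be illustrated with a small in-process analogy. This is only a sketch: the two function names are hypothetical, and Python buffers stand in for the user-space, kernel, and NIC buffers of a real network path. The first function copies the payload at every stage, as a socket-based path does. The second hands out a view of the original buffer, which is the idea behind RDMA's registered memory regions.

```python
def send_with_copies(app_buffer: bytearray, stages: int):
    """Copy the payload once per stage and count the bytes moved."""
    bytes_copied = 0
    current = app_buffer
    for _ in range(stages):
        current = bytearray(current)  # new allocation plus a full copy
        bytes_copied += len(current)
    return current, bytes_copied


def send_zero_copy(app_buffer: bytearray):
    """Expose the same memory through a view; nothing is copied."""
    return memoryview(app_buffer), 0


payload = bytearray(b"x" * 1024)
copied, cost = send_with_copies(payload, stages=2)
view, zero_cost = send_zero_copy(payload)

payload[0] = ord("y")        # modify the application buffer afterwards
print(cost, zero_cost)       # bytes moved by each approach
print(copied[0], view[0])    # the copy is stale, the view sees the change
```

The copied path moves the payload once per stage, so CPU work grows with message size. The view costs the same regardless of size.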

For Ethernet deployments, RDMA primarily runs via RoCE v1 and RoCE v2. RoCE v1 is a Layer 2 protocol that enables direct access between hosts in the same broadcast domain, but its reach is limited to a single L2 subnet. RoCE v2, by contrast, is a Layer 3 protocol that carries RDMA over UDP/IP for scalability. Its packets can be routed, making it ideal for large-scale AIDC environments where compute nodes in different subnets need to communicate seamlessly.
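The Layer 2 versus Layer 3 distinction can be sketched as a simple reachability check. The helper below is hypothetical and treats "same IP subnet" as a stand-in for "same broadcast domain", which holds for typical designs but is an assumption. RoCE v1 frames are identified by EtherType 0x8915 and carry no IP header, so they cannot cross a router.

```python
import ipaddress

ROCE_V1_ETHERTYPE = 0x8915  # RoCE v1 frames are plain Ethernet, not IP


def roce_version_needed(src: str, dst: str, prefix_len: int) -> str:
    """Return which RoCE versions can connect two hosts (illustrative)."""
    src_network = ipaddress.ip_interface(f"{src}/{prefix_len}").network
    if ipaddress.ip_address(dst) in src_network:
        return "RoCE v1 or v2"  # same subnet: no routing required
    return "RoCE v2"            # different subnet: packets must be routed


print(roce_version_needed("10.0.1.10", "10.0.1.20", 24))
print(roce_version_needed("10.0.1.10", "10.0.2.20", 24))
```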

Asterfusion’s CX-N series switches are built for this. With 560ns cut-through latency on 800G ports (thanks to the Marvell Teralynx 10 chip and its 200MB on-chip buffer), they’re designed from the ground up for RDMA workloads.
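To put the 560ns figure in context, a back-of-the-envelope calculation helps. The sketch below computes how long it takes to serialize a frame onto an 800G link. The 4096-byte frame size is an assumption chosen for illustration, and the function name is hypothetical.

```python
def serialization_delay_ns(frame_bytes: int, link_gbps: float) -> float:
    """Time to clock a frame onto the wire; 1 Gb/s equals 1 bit per ns."""
    return frame_bytes * 8 / link_gbps


SWITCH_CUT_THROUGH_NS = 560.0  # figure quoted in the text above

wire_time = serialization_delay_ns(4096, 800)
print(f"serialization: {wire_time:.2f} ns")
print(f"switch latency is {SWITCH_CUT_THROUGH_NS / wire_time:.1f}x the wire time")
```

At 800G a 4 KB frame spends roughly 41 ns on the wire, so per-hop switch latency dominates serialization time. This is why cut-through forwarding matters for small RDMA messages.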

By supporting RDMA and RoCE v2, Asterfusion delivers a high-efficiency, stable data transfer solution for AIDC scenarios—empowering the rapid growth of AI applications.

Ⅳ. Beyond Hype: Where RDMA Delivers Real Value

TCP/IP gave us the modern internet—HTTP for the web, HTTPS for security. These application-layer protocols built on TCP/IP are everywhere.

Similarly, RDMA has fostered its own set of application technologies, playing key roles across industries. A prime example is SMB (Server Message Block) Direct. SMB is a network file-sharing protocol widely used in Windows environments for file and printer sharing, and SMB Direct runs it over RDMA. In enterprise data centers, for instance, employees frequently access shared files for collaborative work. SMB Direct speeds up those file reads and writes, cutting wait times and boosting productivity.

RDMA is equally vital in distributed storage systems. With data needing fast, reliable transfer between storage nodes, RDMA delivers low-latency performance to enhance read/write speeds. Ceph, a popular distributed storage system, supports RDMA to enable fast data synchronization and read/write operations between nodes, delivering high-efficiency storage services.

Ⅴ. Why Asterfusion Bets on RoCE

RoCE wins on operational simplicity. Every network engineer already knows Ethernet. Deploying RoCE doesn’t require learning a new fabric technology; it’s an extension of existing skills. InfiniBand, while powerful, introduces operational overhead that most enterprises would rather avoid. This operational ease is a key advantage when evaluating RDMA and TCP/IP solutions for the data center.

Asterfusion recognized RoCE’s potential early and built mature RoCE support into its products, resulting in the CX-N series switches.

The Asterfusion CX-N series fully supports RoCE technology, especially RoCE v2, delivering a high-efficiency, stable data transfer solution for enterprises. Boasting industry-leading ultra-low latency, these switches meet the network demands of latency-sensitive high-performance computing (HPC) scenarios. In high-frequency trading (HFT) in the financial industry, where real-time transaction performance is critical, the CX-N series’ ultra-low latency ensures fast, accurate transmission of trading orders, helping financial institutions gain a competitive edge.

The CX-N series also comes with rich data center features, including VXLAN, BGP EVPN, and INT-Based Routing, addressing enterprises’ diverse network needs. In cloud computing data centers, where efficient communication and flexible migration of virtual machines are essential, these features provide robust network support for cloud computing, ensuring stable operation of cloud services.

Ⅵ. Quick Questions

Here are quick answers to common questions about RDMA and TCP/IP:

  1. What does RDMA stand for?

RDMA is short for Remote Direct Memory Access. It’s an extension of Direct Memory Access (DMA) technology, enabling computers to read from or write to another host’s memory directly—no CPU involvement required—for high-efficiency data transfer.

  2. What is the difference between RDMA and RoCE?

RDMA is a technical concept focused on enabling direct remote memory access. It cuts down on CPU involvement and data transfer latency to boost transmission efficiency. RoCE—short for RDMA over Converged Ethernet—is a way to implement RDMA over Ethernet, essentially a practical deployment method. In short, RoCE is a specific implementation of RDMA, letting enterprises leverage existing Ethernet infrastructure to deploy RDMA at a lower cost.

  3. What is the difference between SMB and RDMA?

SMB and RDMA sit at different layers, so they complement each other rather than compete. RDMA, like TCP/IP, is a communication technology operating at the transport/network layer.

SMB (Server Message Block) is an application-layer protocol, commonly used for file and printer sharing in Windows systems. By default it runs over TCP/IP.

SMB Direct (RDMA SMB) is SMB running over RDMA instead. The application-layer protocol stays the same, while RDMA’s high-speed, low-latency transport boosts SMB’s efficiency in file sharing and data transfer.

  4. Should I enable RDMA?

If your network adapter supports RDMA, enabling it is beneficial in most cases. RDMA-capable adapters can sustain high throughput with far less CPU load than the kernel TCP/IP path, while delivering lower latency. For example, in virtualized environments, RDMA reduces data transfer delays between VMs and boosts overall virtualization platform performance.

That said, whether to enable RDMA depends on factors like hardware compatibility, application requirements, and network configuration complexity.
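A first step is checking whether the host exposes any RDMA devices at all. On Linux, the kernel lists them under /sys/class/infiniband, and RoCE adapters appear there too despite the directory name. The helper below is a minimal sketch with a hypothetical name. It only reports what the kernel has registered and says nothing about whether the switch fabric is configured for lossless operation.

```python
from pathlib import Path


def list_rdma_devices(sysfs_root: str = "/sys/class/infiniband") -> list:
    """Return the RDMA device names the kernel exposes, or an empty list."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # no RDMA driver loaded, or not a Linux host
    return sorted(entry.name for entry in root.iterdir())


devices = list_rdma_devices()
if devices:
    print("RDMA devices found:", ", ".join(devices))
else:
    print("No RDMA devices registered on this host")
```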

  5. Does RDMA use TCP?

Not all RDMA implementations use TCP.

RDMA has three key implementations:

  • iWARP leverages TCP’s reliability to ensure stable data transmission.
  • InfiniBand features a standalone protocol stack—fully redesigned from physical to link and transport layers—optimized for high-speed, low-latency transfer.
  • RoCE is Ethernet-based: RoCE v1 adopts IB specifications at the network layer, while RoCE v2 uses UDP+IP for network and transport layers to enable packet routing.

  6. What protocol does RDMA use?

RDMA isn’t a single protocol—it’s a class of mechanisms that rely on multiple protocols for implementation. The main ones include InfiniBand (IB), RDMA over Converged Ethernet (RoCE), and Internet Wide Area RDMA Protocol (iWARP).

InfiniBand comes with a standalone protocol stack, covering the physical, link, and transport layers.

RoCE is Ethernet-based: RoCE v1 operates at the link layer, while RoCE v2 uses UDP+IP for the network and transport layers.

iWARP is built on the TCP/IP protocol, enabling RDMA deployment over standard Ethernet infrastructure.
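The layering described above can be summarized as a small lookup table. This is an illustrative sketch: the dictionary and helper names are made up for this example, and the layer labels are simplified. The iWARP entry lists the three protocols (MPA, DDP, RDMAP) that it stacks on top of TCP.

```python
# Simplified protocol stacks, lowest layer first (labels are illustrative)
RDMA_STACKS = {
    "InfiniBand": ["IB physical", "IB link", "IB network", "IB transport"],
    "RoCE v1": ["Ethernet", "IB network (GRH)", "IB transport"],
    "RoCE v2": ["Ethernet", "IP", "UDP", "IB transport"],
    "iWARP": ["Ethernet", "IP", "TCP", "MPA", "DDP", "RDMAP"],
}


def uses_tcp(protocol: str) -> bool:
    """True if the implementation runs on top of TCP."""
    return "TCP" in RDMA_STACKS[protocol]


def is_ip_routable(protocol: str) -> bool:
    """True if packets carry an IP header and can cross IP routers."""
    return "IP" in RDMA_STACKS[protocol]


for name in RDMA_STACKS:
    print(f"{name}: TCP={uses_tcp(name)}, IP-routable={is_ip_routable(name)}")
```

Only iWARP answers yes to the TCP question, while both iWARP and RoCE v2 can traverse ordinary IP routers.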
