AI Inference Network
An all-to-all, ultra-low-latency, congestion-free network optimized for MoE-architecture reasoning models (such as DeepSeek), raising the Token Generation Rate, and the revenue it drives, by over 27.5%.
[Infographic: all-to-all, ultra-low-latency, congestion-free network; Token Generation Rate ↑ 27.5%]
The Value of the AI Inference Network

AI Inference Network Topology
- Improve Token Generation Rate: Token Generation Rate (TGR) increases steadily with the number of concurrent users; at 100 users, TGR is 27.5% higher than with InfiniBand.
- Reduce Inference Latency: by minimizing leaf-to-leaf forwarding latency, the average inference time per token is reduced by 20.4% compared with InfiniBand.
- Congestion Free: INT-driven adaptive routing, packet spray, flowlet-based auto load balancing, and WCMP work together to prevent congestion across the fabric (see the flowlet/WCMP sketch after this list).
- Improve GPU & Network Utilization: with these technologies, the network reaches up to 97% utilization, directly improving GPU utilization during parallel computing workloads.
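The sketch below illustrates the flowlet-based, weighted path selection (WCMP-style) idea referenced in the Congestion Free bullet. It is a minimal illustration under assumptions, not Asterfusion's implementation: the flowlet gap threshold, path names, and weights are invented for the example, and in a real fabric the weights would come from INT link telemetry.

```python
import random
import time

# Assumed value: an idle gap of 200 us ends the current flowlet.
FLOWLET_GAP_S = 200e-6

class FlowletBalancer:
    """Pin each flowlet to one path; spread new flowlets across paths by weight (WCMP)."""

    def __init__(self, paths):
        # paths: {path_id: weight}; weights could be derived from INT telemetry.
        self.paths = paths
        self.state = {}  # flow_key -> (last_packet_time, chosen_path)

    def pick_path(self, flow_key, now=None):
        now = time.monotonic() if now is None else now
        last = self.state.get(flow_key)
        if last is not None and (now - last[0]) < FLOWLET_GAP_S:
            # Same flowlet: keep the same path so packets are not reordered.
            path = last[1]
        else:
            # New flowlet: weighted random choice across the available paths.
            ids, weights = zip(*self.paths.items())
            path = random.choices(ids, weights=weights, k=1)[0]
        self.state[flow_key] = (now, path)
        return path

# Example: three spine uplinks with unequal weights (hypothetical names).
balancer = FlowletBalancer({"spine-1": 3, "spine-2": 3, "spine-3": 2})
print(balancer.pick_path(("10.0.0.1", "10.0.1.2", 4791)))  # 4791 = RoCEv2 UDP port
```

Because only flowlet boundaries can change the path, load spreads across links without the packet reordering that per-packet spraying alone would cause.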
Traffic
During MoE model inference, traffic types such as All-to-All, Prefill-Decode Disaggregation, and Pipeline Parallelism are distinguished and routed by In-band Network Telemetry (INT)-driven, topology-aware optimization. Differentiated QoS priorities and ECMP group selection balance high throughput against low latency, ultimately raising network utilization.
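As a rough illustration of that classification, the sketch below maps the three traffic types to QoS markings and ECMP groups. The DSCP values, queue numbers, and group names are assumptions made for the example, not the switch's actual policy.

```python
from dataclasses import dataclass

@dataclass
class TrafficClass:
    name: str
    dscp: int        # DSCP marking applied at the ingress leaf (assumed values)
    queue: int       # egress priority queue, higher = more urgent (assumed)
    ecmp_group: str  # ECMP/WCMP group chosen for this class (hypothetical names)

# Latency-sensitive All-to-All expert exchange gets the highest priority;
# bulk Prefill-Decode transfers are steered to a throughput-oriented group.
POLICY = {
    "all_to_all":        TrafficClass("All-to-All expert dispatch",    46, 7, "low-latency"),
    "prefill_decode":    TrafficClass("Prefill-Decode KV transfer",    26, 5, "high-throughput"),
    "pipeline_parallel": TrafficClass("Pipeline-parallel activations", 34, 6, "low-latency"),
}
BEST_EFFORT = TrafficClass("best-effort", 0, 1, "default")

def classify(traffic_hint):
    # traffic_hint: an application-supplied tag (e.g. carried per RDMA queue pair).
    return POLICY.get(traffic_hint, BEST_EFFORT)

print(classify("all_to_all"))
```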
Test Results
Two servers serve as AI computing nodes running inference on the DeepSeek-R1 671B model. Each server is equipped with 8 x NVIDIA H20 GPUs and 4 x 400G CX-7 NICs and connects to an Asterfusion CX864E-N AI switch to form the computing network. The test records metrics such as per-inference latency and TGR, and a comparative test is run under identical conditions with an IB switch (QM9700).
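For reference, a minimal sketch of how the two reported metrics can be derived from a run log is shown below; the record format and field names are assumed, not the actual test harness.

```python
import statistics

def summarize(records, wall_time_s):
    # records: one entry per concurrent request, assumed shape:
    #   {"tokens_generated": 512, "latency_s": 8.7}
    total_tokens = sum(r["tokens_generated"] for r in records)
    tgr = total_tokens / wall_time_s                       # Token Generation Rate
    p90 = statistics.quantiles([r["latency_s"] for r in records], n=10)[-1]
    return {"tgr_tokens_per_s": tgr, "p90_latency_s": p90}

print(summarize(
    [{"tokens_generated": 256, "latency_s": 7.9},
     {"tokens_generated": 310, "latency_s": 8.4},
     {"tokens_generated": 288, "latency_s": 9.1}],
    wall_time_s=9.1,
))
```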

Across concurrency levels from 20 to 100 inference requests, latency with Asterfusion’s RoCE switch is consistently lower than with the InfiniBand (IB) switch; at 50 concurrent requests, the 90th-percentile inference latency is 20.4% lower.
Over the same range, Asterfusion’s RoCE switch also delivers a consistently higher Token Generation Rate (TGR) than the IB switch. The advantage widens as concurrency increases, reaching a 27.5% TGR improvement at 100 requests.
