
Case Study | Paratera × Asterfusion: Building a Future-Proof AI Inference Network

written by Asterfusion

May 15, 2025

As large AI models and inference workloads grow rapidly, enterprises demand networks with higher throughput, ultra-low latency, high availability, and a scalable architecture. To meet these challenges, Paratera Technology partnered with Asterfusion to build a next-generation AI inference network that is high-performance, scalable, cost-efficient, and easy to operate.

This collaboration not only improved Paratera’s AI performance and operations, but also demonstrated Asterfusion’s strength in AI networking technology.

📌 Project Background

Founded in 2007, Paratera has been deeply rooted in high-performance computing (HPC) and AI services for over 18 years, serving more than 10,000 enterprise clients across the world. In November 2023, Paratera was successfully listed on the Beijing Stock Exchange, becoming the first public company in the A-share market focused exclusively on computing power services. Leveraging its profound expertise in HPC and AI, Paratera has developed multiple full-stack intelligent computing platforms—Paratera AI Cloud, Paratera Smart Manufacturing Cloud, and Paratera Supercomputing Cloud—that are widely applied across cutting-edge fields such as artificial intelligence, intelligent manufacturing, life sciences, and earth sciences.


As AI models grow increasingly complex and inference workloads surge, Paratera sought to build a high-performance, low-latency, and highly available AI inference network from the ground up. To achieve this, they turned to Asterfusion, expecting an end-to-end solution that combined hardware and software innovation with engineering expertise.

🚧 Key Challenges & Asterfusion Solutions

To meet the demanding requirements of AI inference workloads, Paratera faced several critical challenges during the design of its next-generation computing network. Asterfusion provided comprehensive solutions tailored to each need:

1️⃣ High Bandwidth & Ultra-Low Latency: AI inference workloads are highly sensitive to data transfer delays, where even minor latency fluctuations can impact overall performance.

🔧 Asterfusion Solution: Deployed ultra-low latency 100G+ switches and enabled RoCE (RDMA over Converged Ethernet) to maximize RDMA data transmission efficiency.
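RoCE itself is configured on the switches and NICs, and RDMA paths are usually benchmarked with dedicated tools such as perftest. As a minimal host-side illustration of the latency sensitivity described above, here is a plain-UDP round-trip probe; it measures kernel-stack latency, not RDMA, and the peer address and port are hypothetical (the peer must run a simple echo service):

```python
# Minimal UDP round-trip latency probe between two inference nodes.
# Note: this measures kernel network-stack latency, NOT RDMA/RoCE latency;
# RoCE paths are typically benchmarked with tools such as perftest.
# The peer address/port are placeholders; the peer must echo payloads back.
import socket
import time

PEER = ("10.0.1.2", 9000)   # hypothetical peer inference node
SAMPLES = 1000

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(1.0)

rtts = []
for _ in range(SAMPLES):
    t0 = time.perf_counter()
    sock.sendto(b"ping", PEER)
    sock.recvfrom(64)                                  # wait for the echo
    rtts.append((time.perf_counter() - t0) * 1e6)      # microseconds

rtts.sort()
print(f"median RTT: {rtts[len(rtts) // 2]:.1f} us, "
      f"p99: {rtts[int(len(rtts) * 0.99)]:.1f} us")
```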

2️⃣ Horizontal Scalability & Multi-Tenant Isolation: As business demands grow, the compute clusters must scale rapidly while maintaining secure isolation across different user environments.

🔧 Asterfusion Solution: Adopted a VXLAN EVPN architecture to build a highly scalable Spine-Leaf fabric with seamless horizontal expansion and robust multi-tenant logical isolation.
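To make the isolation mechanism concrete: each tenant’s Layer-2 frame is encapsulated in a VXLAN/UDP header carrying a 24-bit VNI, and tenants mapped to different VNIs never share a broadcast domain. The sketch below builds such an encapsulated packet with Scapy purely for illustration; all addresses and the VNI are made-up values, not this deployment’s actual plan:

```python
# Illustrative VXLAN encapsulation: a tenant frame wrapped in UDP port 4789.
# All addresses and the VNI below are invented for demonstration only.
from scapy.layers.inet import IP, UDP
from scapy.layers.l2 import Ether
from scapy.layers.vxlan import VXLAN

TENANT_VNI = 10100  # hypothetical VNI assigned to one tenant

# Inner frame: what the tenant's servers actually exchange
inner = Ether(src="aa:bb:cc:00:00:01", dst="aa:bb:cc:00:00:02") / \
        IP(src="192.168.10.1", dst="192.168.10.2")

# Outer headers: between VTEP loopbacks, i.e. the BGP-advertised endpoints
outer = IP(src="10.255.0.11", dst="10.255.0.12") / \
        UDP(dport=4789) / VXLAN(vni=TENANT_VNI)

packet = outer / inner
packet.show()  # prints the stacked headers: IP / UDP / VXLAN / Ether / IP
```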

3️⃣ High Availability & Failover Assurance: AI workloads are extremely sensitive to network interruptions. Zero packet loss and rapid recovery mechanisms are essential.

🔧 Asterfusion Solution: Deployed an EVPN + MC-LAG architecture with BFD (Bidirectional Forwarding Detection) to ensure sub-second failover and uninterrupted connectivity.

4️⃣ Cost Control & Resource Efficiency: While high-performance equipment is essential, balancing performance and budget is critical for long-term sustainability.

🔧 Asterfusion Solution:

  • Server-Leaf: Deployed CX308P-48Y-N switches to ensure high-density access at optimized cost.
  • Border-Leaf: Utilized CX532P-N to balance high forwarding performance with rich edge features.

5️⃣ Operational Complexity at Scale: Large-scale AI networks involve extensive device counts and intricate topologies, making day-to-day monitoring and troubleshooting a major operational burden.

🔧 Asterfusion Solution: Introduced a visualized monitoring and intelligent alerting platform based on Prometheus + Grafana, enabling real-time global visibility and streamlined network operations. These capabilities provide full observability for lossless RoCE networks, greatly reducing troubleshooting difficulty and enhancing operational efficiency.
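Because Prometheus exposes a standard HTTP query API, fabric-wide metrics can also be pulled programmatically. A minimal sketch, assuming a reachable Prometheus server; the server address and the metric name are placeholders, since actual metric names depend on the exporter running on the switches:

```python
# Query the standard Prometheus HTTP API for a fabric-wide counter view.
# The server address and metric name below are illustrative placeholders;
# real metric names depend on the switch exporter in use.
import requests

PROM = "http://prometheus.example.internal:9090"   # hypothetical address
QUERY = "rate(interface_in_octets_total[5m]) * 8"  # bits/sec, assumed metric

resp = requests.get(f"{PROM}/api/v1/query", params={"query": QUERY}, timeout=5)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    labels = series["metric"]
    _, value = series["value"]                     # [timestamp, "value"]
    print(f'{labels.get("instance", "?")} {labels.get("interface", "?")}: '
          f"{float(value) / 1e9:.2f} Gbps")
```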

🧠 Network Architecture Highlights

Asterfusion AI Inference Network using CX-N Series
  • Topology: Spine-Leaf with distributed EVPN VXLAN gateways
  • Access Reliability: Dual uplinks per node using MC-LAG + bond mode 4 (LACP)
  • Underlay: OSPF-based Layer 3 IP routing
  • Overlay: VXLAN tunnels built using BGP on loopback interfaces
  • Monitoring: Full-fabric deployment of exporters integrated into Prometheus + Grafana

🔧 Key Devices Deployed:

  • Spine: 4 × CX564P-N
  • Server-Leaf: 62 × CX308P-48Y-N
  • Border-Leaf: 2 × CX532P-N
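Assuming the commonly published port configurations for these models (48 × 25G + 8 × 100G on the CX308P-48Y-N, 64 × 100G on the CX564P-N; worth verifying against the datasheets), a quick back-of-the-envelope check of the fabric’s capacity and oversubscription looks like this:

```python
# Back-of-the-envelope fabric math under ASSUMED port configurations
# (verify against the Asterfusion datasheets):
#   CX308P-48Y-N server-leaf: 48 x 25G downlinks + 8 x 100G uplinks
#   CX564P-N spine:           64 x 100G ports
SERVER_PORTS, SERVER_SPEED_G = 48, 25    # per server-leaf
UPLINKS_IN_USE, UPLINK_SPEED_G = 4, 100  # one uplink per spine (assumption)
LEAVES, SPINES = 62, 4                   # device counts from this deployment

down_g = SERVER_PORTS * SERVER_SPEED_G   # 1200 Gbps server-facing per leaf
up_g = UPLINKS_IN_USE * UPLINK_SPEED_G   # 400 Gbps spine-facing per leaf
print(f"per-leaf oversubscription: {down_g / up_g:.1f}:1")          # 3.0:1

# Port budget: each spine terminates one uplink from every leaf
print(f"spine ports used: {LEAVES} / 64 per spine")                 # 62 / 64
print(f"total access capacity: {LEAVES * down_g / 1000:.1f} Tbps")  # 74.4
```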

🌐 AI Inference Traffic Model

| Scenario | Path | Protocol | Use Case |
| --- | --- | --- | --- |
| Node-to-node communication | Server-Leaf ⇌ Spine ⇌ Server-Leaf | TCP/IP or RoCEv2 | Distributed model training / inference |
| External access | Server-Leaf ⇌ Spine ⇌ Border-Leaf ⇌ Firewall | TCP/IP (HTTP/HTTPS) | Model downloads, cloud service access |
| Object storage access | Server-Leaf ⇌ Spine ⇌ Border-Leaf ⇌ Storage | HTTP API | Handling unstructured data like images/videos |
| Block storage access | Server-Leaf ⇌ Storage Network | iSCSI / RDMA | High-frequency structured data I/O |
| Management traffic | Separate management network | SSH / SNMP / HTTP | Remote monitoring and maintenance |

🛠 Technical Challenges & Custom Optimization

Issue: ARP flooding induced CPU strain, threatening network stability.

Background: In VXLAN EVPN distributed gateway environments, best practice is to suppress BUM (broadcast, unknown unicast, multicast) traffic and enable ARP proxying. However, in real-world AI scenarios, fully suppressing broadcast traffic isn’t feasible, due to:

  • IP conflicts triggered by automated provisioning
  • MAC address changes from VM live migration or scaling events
  • PXE-based provisioning, which relies on DHCP broadcasts

Asterfusion’s Tailored Solution:

  • Policy Refinement: Prevented VXLAN BUM traffic from hitting the CPU, reducing processor load
  • Storm Control: Rate-limited broadcast traffic on downlink and peer-link ports to prevent flood storms (a worked threshold conversion follows below)
  • Visualized Monitoring Deployment: Enabled the AsterNOS exporter on both Leaf and Spine switches, feeding metrics into the Prometheus monitoring platform and visualizing them in Grafana for intuitive, user-friendly insight
[Figure: Grafana-integrated visualization interface]
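Storm-control thresholds are usually expressed as a percentage of line rate; translating that into a packets-per-second ceiling makes the counters collected by the monitoring stack easier to interpret. A worked conversion, where the 1% threshold and the average frame size are illustrative assumptions rather than the values configured in this deployment:

```python
# Convert a storm-control threshold (% of line rate) into a pps ceiling.
# The 1% threshold and 128-byte average frame are illustrative assumptions,
# not the values actually configured in this deployment.
LINE_RATE_BPS = 25e9          # a 25G server-facing port
THRESHOLD = 0.01              # suppress broadcast above 1% of line rate
FRAME_BITS = (128 + 20) * 8   # 128B frame + preamble/IFG overhead (approx.)

max_pps = LINE_RATE_BPS * THRESHOLD / FRAME_BITS
print(f"broadcast ceiling: {max_pps:,.0f} pps")   # about 211,000 pps
```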

🌟 Results & Value Delivered

With Asterfusion’s end-to-end network upgrade solution, Paratera successfully built a next-generation AI inference network that delivers outstanding performance, stability, and future scalability. This network not only supports their critical workloads in AI and scientific computing but also lays a solid foundation for scaling beyond 1000+ nodes.

💬 What Our Customer Says

“Thanks to Asterfusion’s open networking platform and RoCE-optimized solution, we were able to deploy a high-performance, low-latency, and highly available AI inference network that’s also cost-effective and easy to manage. Their rapid response and deep technical expertise exceeded our expectations at every stage.” — Head of Network Infrastructure, Paratera Technology

Key Outcomes:

  • ✅ Zero downtime, significantly improved RDMA latency
  • ✅ Maximized bandwidth utilization and network efficiency
  • ✅ Greatly reduced O&M workload with faster issue detection
  • ✅ Achieved 70%+ cost savings compared to traditional InfiniBand solutions

Conclusion:

This project not only addressed key technical challenges in Paratera Tech’s AI network, but also demonstrated Asterfusion’s deep expertise in low-latency networking, scalable VXLAN architecture, and secure multi-tenant environments.

No buzzwords, no guesswork—just proven capabilities and real-world delivery.

Asterfusion is becoming the “network brain” behind more and more AI-driven enterprises.
