In-band Network Telemetry based Routing: The Intelligent Routing for the AI Era

written by Asterfusion

April 21, 2025

In-band Network Telemetry based Routing (INT-Based Routing), developed by Asterfusion, offers a smarter alternative. It uses real-time telemetry to dynamically adapt routing paths based on live traffic and network conditions—enabling ultra-efficient, low-latency communication purpose-built for AI data centers.

Why Traditional Routing Is Obsolete in the AI Era

As AI technology accelerates, the Internet is undergoing a profound transformation. Routing—once considered a core infrastructure component—is inevitably being swept into this wave of change.

AI Traffic Challenge

At the heart of this transformation is a step-function shift in network traffic patterns triggered by AI:

  1. Dramatically Diverse Traffic Patterns
    In AI data centers, a wide variety of flows now coexist—from latency-sensitive “mice” flows to bandwidth-hungry “elephant” flows—unlike anything traditional networks were designed to handle.
  2. Extreme Traffic Volatility
    AI workloads, especially parallel computing tasks, generate highly volatile traffic patterns. A single training epoch can generate traffic equivalent to 2–3 days of internet-wide activity, while a typical inference task may involve over 20,000 messages per second.
  3. Unprecedented Network Congestion
    Intense, bursty traffic leads to severe congestion, especially incast congestion (many senders converging on one receiver), pushing existing networking technologies to their limits and making the network a key bottleneck for AI scalability.
  4. Rapidly Evolving Applications
    AI models are iterating at breakneck speed, with new AI agents and applications emerging constantly, placing new and unpredictable demands on the network.
  5. Emergence of New Traffic Forwarding Paradigms
    To cope with these challenges, traditional flow-based forwarding such as ECMP, together with conventional congestion control, is giving way to finer-grained mechanisms such as flowlet switching and packet spraying.

Routing Must Evolve

To meet these unprecedented demands, routing—the core of the network control plane—must evolve. From early static rules to modern intelligent systems, the evolution of routing protocols can be broadly divided into four phases:

Phase 1: Static Routing (1960s–1970s)
Routing tables were manually configured, suitable only for small-scale networks like ARPANET. No ability to adapt to topology changes.

Phase 2: Dynamic Routing (from 1989)
With the introduction of OSPF and BGPv1, routing protocols gained the ability to adapt to changing topologies using metrics such as link cost derived from bandwidth (OSPF) or AS_PATH length (BGP). Over time, policy control and load balancing features were added.

Phase 3: SDN-based Routing (post-2008)
Traditional routing struggled to keep up with fast-changing traffic. Software-Defined Networking (SDN) emerged, running on centralized controllers to provide global visibility and traffic steering. However, the decoupling of centralized controllers from the switches introduced delays in reacting to real-time traffic changes, limiting SDN’s ability to fully replace traditional dynamic routing.

Phase 4: Hybrid Routing with Controller Collaboration (post-2012)
To support multi-tenant routing within data centers, BGP EVPN overlay was introduced. Later, Segment Routing (SR) provided more flexible and resilient routing. These technologies integrate tightly with controllers, enabling fine-grained traffic engineering—e.g., BGP EVPN with cloud orchestrators for virtual network provisioning, or SR with network controllers for programmable traffic flows.

Toward Intelligence and Adaptivity

The trajectory of routing technology is clearly traffic-driven but constrained by the network’s observability and computational capacity. From static routing to topology-aware, and now toward traffic-aware routing, the field is steadily progressing toward greater intelligence and automation—precisely what’s needed in the AI era.

INT-Based Routing: Intelligent Networking Engine for AI-Driven Data Centers

Can enhanced network awareness and computation capabilities solve the traffic scheduling challenges of the AI era?

The answer is yes. This is where INT-Based Routing (In-band Network Telemetry-based Routing), developed by Asterfusion, comes into play. As a new generation of dynamic routing technology, INT-Based Routing not only detects changes in network topology, but also adapts to dynamic variations in traffic and device load. It represents a truly intelligent and fully dynamic routing approach.

What is In-band Network Telemetry based Routing?

In-band Network Telemetry (INT) marks a major milestone in the evolution of self-aware and self-optimizing networks. Introduced by Barefoot Networks in 2014 and built upon P4 programmable data planes and telemetry-driven architectures, INT has gained traction in large-scale data centers thanks to growing P4 ecosystem support and ASIC vendor adoption.

Curious about how INT works? Learn more here: 👉 What is INT (In-Band Network Telemetry)?

Key Advantages of INT-Based Routing for Modern Networks

Real-Time Traffic Awareness

Unlike traditional network measurement technologies, INT features:

  • Self-recording: It embeds metadata into actual service packets to capture key metrics at each hop, reducing measurement bias.
  • Real-time granularity: INT enables per-packet telemetry, achieving microsecond-level intervals, and with PTP (Precision Time Protocol), even 10ns-level precision.
  • Rich metadata: INT metadata includes fields such as Node ID, Interface ID, Timestamp, Hop Latency, Queue Depth, Buffer Occupancy, and Egress Interface Tx Utilization.
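The self-recording idea above can be sketched in a few lines of Python. This is a minimal illustration, not the INT wire format or any vendor's implementation: the class names, field types, and units (`HopMetadata`, `Packet`, nanosecond timestamps, utilization as a fraction) are assumptions chosen for readability. Only the field list itself comes from the bullet above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HopMetadata:
    """One hop's telemetry record (hypothetical layout; fields follow the list above)."""
    node_id: int
    interface_id: int
    timestamp_ns: int
    hop_latency_ns: int
    queue_depth: int               # unit is an assumption (e.g. cells or bytes)
    buffer_occupancy: int          # unit is an assumption
    egress_tx_utilization: float   # assumed to be a fraction of link capacity, 0.0 to 1.0


@dataclass
class Packet:
    payload: bytes
    int_stack: List[HopMetadata] = field(default_factory=list)


def record_hop(packet: Packet, hop: HopMetadata) -> None:
    """Each transit node appends its own record to the packet it forwards."""
    packet.int_stack.append(hop)


def path_latency_ns(packet: Packet) -> int:
    """Total of the per-hop latencies this packet experienced."""
    return sum(h.hop_latency_ns for h in packet.int_stack)


def worst_hop(packet: Packet) -> HopMetadata:
    """The hop with the deepest queue, a likely congestion point on the path."""
    return max(packet.int_stack, key=lambda h: h.queue_depth)
```

Because the records ride on real service packets, whatever reads `int_stack` at the end of the path sees the conditions that packet actually met, hop by hop, rather than a separate probe's view.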

To enable these capabilities, INT requires implementation via ASICs, DPUs, or server-grade CPUs. Among mainstream switch ASICs, Marvell’s Teralynx provides comprehensive P4-INT support and advanced telemetry capabilities. Broadcom’s Trident family supports similar features via IFA 2.0, while NVIDIA’s Spectrum series leverages WJH (What Just Happened) for deep visibility and diagnostics.

In summary, INT revolutionizes network observability by embedding state awareness into packets themselves—a leap from passive monitoring to active network self-awareness.

Fine-Grained Traffic Scheduling

In traditional networks, routing was performed at the granularity of network prefixes. Later innovations introduced flow-level scheduling, based on transport-layer sessions (e.g., IP 5-tuple). Applications assumed all packets in a flow would follow the same path and remain in order, making traditional TCP flow control effective.

However, “flow” granularity is insufficient for modern AI workloads and persistent sessions (e.g., video, storage, AI training). Two newer techniques address this:

  • Packet spray: Distributes packets from a single flow across multiple paths, potentially causing out-of-order delivery, which must be corrected at the receiver with protocol modifications.
  • Flowlet switching: Divides flows into bursts (flowlets) separated by idle gaps, enabling multi-path routing without reordering, and requires no changes at the transport layer.
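To make the flowlet idea concrete, here is a minimal Python sketch. It assumes a hypothetical `FlowletSwitch` class, microsecond timestamps, and a round-robin path picker standing in for whatever load-aware policy a real switch would use; none of these names come from Asterfusion's implementation.

```python
from typing import Dict, Hashable, Tuple


class FlowletSwitch:
    """Keeps a flow on one path and re-picks only after an idle gap (illustrative sketch)."""

    def __init__(self, num_paths: int, gap_us: int) -> None:
        self.num_paths = num_paths
        self.gap_us = gap_us
        # flow key -> (path index, arrival time of the flow's last packet in microseconds)
        self.table: Dict[Hashable, Tuple[int, int]] = {}
        self.next_path = 0

    def pick_path(self) -> int:
        """Placeholder policy: round-robin. A real switch would consult load data."""
        path = self.next_path
        self.next_path = (self.next_path + 1) % self.num_paths
        return path

    def route(self, flow: Hashable, now_us: int) -> int:
        entry = self.table.get(flow)
        if entry is not None and now_us - entry[1] <= self.gap_us:
            path = entry[0]          # still inside the current flowlet: keep the path
        else:
            path = self.pick_path()  # idle gap exceeded (or new flow): start a new flowlet
        self.table[flow] = (path, now_us)
        return path
```

With `FlowletSwitch(num_paths=4, gap_us=100)`, packets of one flow arriving at 0, 50, and 120 µs stay on the same path, while a packet at 300 µs starts a new flowlet and may move. Reordering is avoided only if the gap threshold is larger than the latency difference between candidate paths, so the threshold is a tuning parameter rather than a fixed constant.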

As switch and NIC compute capacity increases, more refined scheduling becomes feasible. But who decides which packets or flowlets go where?

Telemetry-Driven Intelligent Routing

Given the high volume and rate of flowlets or packets, manual policies are infeasible. SDN-based centralized control also falls short: the controller can’t react fast enough to volatile traffic and device loads. SmartNIC-based scheduling on hosts is similarly limited by lack of global topology awareness and coordination with the network fabric.

In contrast, modern switches with INT capabilities have a comprehensive view of topology, traffic, and device states. These data points are fed into the switch’s brain—its NOS (Network Operating System)—where increasingly powerful control CPUs/DPUs compute optimal scheduling in real time. Though distributed, this computation achieves global optimization, as switches share telemetry and route state via dynamic routing protocols.

AsterNOS, for example, combines OSPF, BGP, and INT to calculate multiple paths between any two endpoints. Each path’s cost is derived from INT-based metrics like latency and congestion. OSPF captures detailed link-level topology, BGP captures AS-level topology, and INT completes the picture with real-time in-path load data.
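As a rough illustration of deriving a path cost from INT metrics, the Python sketch below combines summed hop latency with the utilization of the most loaded hop. The formula, the weights, and the function names are assumptions made for this example; the article does not publish the cost function AsterNOS actually uses.

```python
from typing import Dict, List, Tuple

# One hop as (hop_latency_us, egress_tx_utilization); this pairing is an assumption.
Hop = Tuple[float, float]


def path_cost(
    hops: List[Hop],
    latency_weight: float = 1.0,       # hypothetical weight
    congestion_weight: float = 100.0,  # hypothetical weight
) -> float:
    """Cost = weighted end-to-end latency + weighted utilization of the busiest hop."""
    total_latency = sum(latency for latency, _ in hops)
    worst_utilization = max(util for _, util in hops)
    return latency_weight * total_latency + congestion_weight * worst_utilization


def best_path(paths: Dict[str, List[Hop]]) -> str:
    """Name of the candidate path with the lowest telemetry-derived cost."""
    return min(paths, key=lambda name: path_cost(paths[name]))
```

For two candidates that a topology-only protocol would treat as equal, say `{"via_spine0": [(2.0, 0.9), (3.0, 0.5)], "via_spine1": [(2.0, 0.2), (2.5, 0.3)]}`, this sketch prefers `via_spine1` because both its latency and its peak utilization are lower.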

In-band Network Telemetry based Routing

Take a typical spine-leaf topology: Server0 and Server1 are connected to separate leaf switches, which share four paths. SmartNICs can’t see all these paths, and OSPF alone treats them as equal-cost paths. But with INT, the leaf switch sees differing latencies across paths and can apply adaptive strategies like minimum-latency routing or WCMP (Weighted Cost Multi-Path), improving throughput, reducing tail latency, and maximizing utilization.
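The two strategies named above can be sketched as follows. This is a simplified Python illustration that assumes WCMP weights are set inversely proportional to measured path latency and quantized into a fixed number of table slots; the function names, the `total_slots` parameter, and the inverse-latency rule are assumptions, not a description of the product's algorithm.

```python
from typing import Dict


def min_latency_path(latency_us: Dict[str, float]) -> str:
    """Minimum-latency routing: send everything over the currently fastest path."""
    return min(latency_us, key=lambda path: latency_us[path])


def wcmp_weights(latency_us: Dict[str, float], total_slots: int = 16) -> Dict[str, int]:
    """WCMP sketch: faster paths get proportionally more forwarding-table slots.

    Latencies must be positive. Each path keeps at least one slot, so the
    rounded weights may not always sum exactly to total_slots.
    """
    inverse = {path: 1.0 / latency for path, latency in latency_us.items()}
    scale = total_slots / sum(inverse.values())
    return {path: max(1, round(inv * scale)) for path, inv in inverse.items()}
```

For four paths measured at 10, 10, 20, and 40 µs, `wcmp_weights` yields weights of 6, 6, 3, and 1, so the slowest path still carries some traffic instead of sitting idle, whereas plain ECMP would give each path an equal share regardless of its latency.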

INT-Based Routing supports both packet spray and flowlet-based scheduling. Coupled with OSPF/BGP topology discovery, it applies to arbitrary network topologies.

Compared to traditional ECMP, INT-Based Routing improves utilization to over 90%, increases throughput by 20–45%, and reduces P99 tail latency by over 50% — significantly shortening AI job completion time (JCT).

Intelligent Routing Drives Smart Switch Evolution

AI has shown us that more efficient and scalable distributed computing can reshape the world. The same principle applies to networking. When switches are equipped to perform distributed, real-time computation on topology, traffic, and device load, they can dramatically improve network performance. To realize this, network hardware must evolve—not only to deliver high-performance packet forwarding, but also to integrate powerful computing capabilities. This is the core philosophy behind Asterfusion’s new generation of Smart Switches.

A Smart Switch is built upon three key components: a programmable ASIC data plane, a DPU-enabled control plane, and a high-speed channel between the data and control planes.

For example, the Asterfusion CX864E-N leverages Marvell’s Teralynx 10 programmable ASIC, supporting advanced features such as Flowlet, P4-INT, WCMP, PTP, and multicast replication. The control plane runs on a server-grade Intel Xeon processor and integrates DPU technologies like eBPF, DPDK, and VPP within AsterNOS. This enables millisecond-level awareness and on-the-fly scheduling computation. Furthermore, the switch can be enhanced with AI acceleration modules via M.2 interfaces to perform traffic analytics and predictive scheduling. Between the data and control planes, DMA and high-speed Ethernet channels are used to ensure tightly coupled, high-performance collaboration.

The Asterfusion CX306P-N, a data center leaf switch, uses a Marvell Falcon programmable ASIC paired with a Marvell OCTEON 10 DPU. These components are interconnected via dual 100G Ethernet links. Under the orchestration of AsterNOS and VPP, this switch supports INT-Based Routing alongside next-generation AIDC features such as centralized vRouter and vFirewall.

In-band Network Telemetry-based Routing is the “savior” in the AI era

In essence, the Smart Switch represents a structural leap toward intelligent networking. It eliminates reliance on host-based SmartNICs or centralized controllers by embedding real-time sensing and adaptive scheduling directly into the network’s core physical layer—the switch itself. The result is a network that functions as a distributed computing platform, with self-awareness and self-optimization capabilities that adapt to millisecond-scale traffic fluctuations. This evolution is key to meeting the demands of the AI era.

Built on this foundation, INT-Based Routing emerges as a natural extension—pushing the network control plane further toward intelligent, autonomous operation. It represents a new paradigm in routing technology, boosting AIDC network utilization above 90% and unlocking the full potential of AI cluster compute resources. In many ways, INT-Based Routing is purpose-built for the AI era.

For more information on how INT-based routing has increased network utilization to over 90%, see: 👉 The AI Data Center Revolution: Can Ultra Ethernet Unlock 90%+ Network Utilization?
