
BGP EVPN VXLAN: Unified Spine-Leaf Network Topology for Campus and Data Center Based on SONiC

Written by Asterfusion

January 1, 2026

Introduction

In Asterfusion’s technological vision, network architecture should not be artificially segmented by use case. We use the Spine-Leaf topology, built on the open SONiC network OS, as the cornerstone for all connectivity. This unified architecture, extending from the data center (DC) to the campus, delivers not only a consistent physical topology but also a consistent underlying communication experience.

Based on the unified Spine-Leaf architecture, we have successfully brought core technologies, originally exclusive to data centers, into campus environments:

  • BGP EVPN VXLAN Popularization: Leveraging SONiC’s openness, we have implemented Overlay network construction based on standard protocols within campus environments. This means that the campus network can achieve high business flexibility, multi-tenant isolation, and Layer 2 interconnectivity across Layer 3 boundaries, similar to data centers.
  • From Fragmentation to Integration: The unified technology stack not only reduces overall operational complexity for enterprises but also allows for the standardized scheduling of network resources across different environments.

The main purpose of this article is to introduce the technologies used in our Spine-Leaf network topology for both campus and data center environments. Each section will have corresponding articles for a more detailed explanation.

Three Aspects of Unification Based on SONiC Spine-Leaf Network Topology

Unified Operating System (One OS):

Whether at the core or the edge, all devices run the enterprise-grade SONiC, ensuring consistent operational logic across the network.

More about Enterprise SONiC – AsterNOS: Enterprise Ready SONiC NOS

Unified Topology Architecture (One Architecture):

Unlike the traditional three-tier (core-aggregation-access) architecture, every Leaf-to-Spine link runs Layer 3 Equal-Cost Multi-Path (ECMP), with no STP (Spanning Tree Protocol) and no single point of failure. Both campus and data center share the high bandwidth, low latency, and horizontal scalability of the Spine-Leaf topology.

More about Spine leaf network Architecture: What is Leaf-Spine Architecture and How to Build it?

Unified Technology Stack (One Stack):

Network services are increasingly becoming cloud-like. Multi-tenancy, virtualization, large-scale Layer 2, and cross-region connectivity can no longer rely solely on VLANs and VRRP. In Asterfusion’s solution, both data center and campus networks run BGP EVPN VXLAN, ending its data-center exclusivity and bringing logical isolation (multi-tenancy) and cross-region connectivity to the campus network.

Application of BGP EVPN VXLAN in Campus and Data Centers

The foundation of implementing the Spine-Leaf network topology in both campus and data center environments lies in the deployment of the BGP EVPN VXLAN protocol.

Why BGP EVPN VXLAN is Needed

The emergence of BGP EVPN VXLAN primarily addresses the need for flexible, efficient, and scalable virtualized networks in data centers and large-scale network environments. As demand increases in campus networks, it has also been applied there. Here are some key reasons:

Large-Scale Network Virtualization Requirements With the rise of cloud computing and large-scale data centers, traditional VLAN-based network architectures can no longer meet the complex needs of modern networks. VLAN scalability is limited. VXLAN (Virtual Extensible LAN), an overlay network protocol, extends Layer 2 by encapsulating Ethernet frames. With a 24-bit VNI, VXLAN supports approximately 16 million logical networks, overcoming VLAN’s limitation of 4096 networks, enabling much larger-scale virtualized networks.
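The scale gap above is simple to verify: a VLAN ID is 12 bits while a VNI is 24 bits, so the overlay ID space grows by a factor of 4096.

```python
# ID-space arithmetic behind the VLAN vs. VXLAN comparison.
vlan_ids = 2 ** 12   # 4096 VLAN IDs
vni_ids = 2 ** 24    # 16,777,216 VNIs (~16 million)

print(vlan_ids, vni_ids, vni_ids // vlan_ids)  # → 4096 16777216 4096
```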

Network Isolation and Multi-Tenant Support Multi-tenancy is essential, especially in the cloud era. VXLAN provides fine-grained isolation, both for data centers and campus networks. Each VNI represents a virtual network, with different VNIs isolated from one another. This is ideal for cloud tenants or business domains.

Flexible Routing and Switching Capabilities VXLAN alone is not enough. Without a control plane, VXLAN relies on flooding for learning, which becomes inefficient as the scale grows. BGP EVPN (BGP-based Ethernet VPN) is introduced to provide the control plane, distributing MAC and IP addresses. This eliminates the need for flooding. Each VTEP has a global view, knowing where hosts are located and which tunnel to use. Using BGP EVPN VXLAN provides more flexible routing and switching.

Simplified Network Design The combination of BGP EVPN and VXLAN offers a unified control plane and data plane, simplifying network design.

High Availability and Fault Tolerance BGP EVPN naturally supports ECMP (Equal-Cost Multi-Path), which works perfectly with Spine-Leaf. With multi-path parallelism, if a link fails, traffic is automatically rerouted. VXLAN operates above Layer 3, does not rely on STP for loop prevention, and avoids the bottleneck of a “single tree” in the network.

Optimized Traffic Management The BGP EVPN VXLAN combination can leverage BGP for traffic engineering. For example, multi-path selection, better egress selection, and more efficient east-west traffic routing. In AI training clusters and large-scale east-west traffic, these optimizations are crucial.

For more about the BGP EVPN VXLAN solution, please refer to Asterfusion SONiC based EVPN-VXLAN Solution

In the following sections, this article will introduce VXLAN from the data plane perspective and BGP EVPN from the control plane perspective.

VXLAN – Data Plane Technology for BGP EVPN VXLAN Network

1. What is VXLAN

VXLAN (Virtual Extensible LAN) is a tunneling technology that encapsulates Layer 2 Ethernet frames inside Layer 3 UDP packets: the outer headers travel over a routed IP network, while the inner frame remains the original Layer 2 traffic. This creates a “Layer 2 network across Layer 3,” where business traffic appears to stay within the same Layer 2 domain but actually traverses a Layer 3 physical network.
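As a concrete illustration of the encapsulation, the sketch below packs the 8-byte VXLAN header defined in RFC 7348 (the I-flag plus a 24-bit VNI); the outer UDP (destination port 4789), IP, and Ethernet headers that would precede it are omitted:

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header from RFC 7348.

    Layout: flags byte (I-bit set) + 24 reserved bits,
    then the 24-bit VNI + 8 reserved bits.
    """
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08 << 24        # I flag: "VNI field is valid"
    return struct.pack("!II", flags, vni << 8)

hdr = vxlan_header(10100)
print(hdr.hex())  # → '0800000000277400'
```

The original Ethernet frame is appended directly after these 8 bytes before the packet is handed to the underlay.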

In the overall BGP EVPN VXLAN solution, VXLAN primarily handles data plane forwarding.

2. Why Use VXLAN as Data Plane

As discussed earlier, there are several reasons for using VXLAN as the data plane protocol:

  • Disaggregated Physical and Logical Networks: The underlying Layer 3 topology, such as Spine-Leaf architecture, can remain unchanged. There’s no need to modify physical wiring or IP planning. VXLAN allows logical business networks to be built on top of it, supporting the dynamic addition and removal of tenants and subnets, which is especially important in campus environments.
  • Overcoming VLAN Limitations: Traditional VLANs have a limit of 4096 IDs. With the increase in multi-tenancy and IoT devices, the need for isolation becomes difficult to meet. VXLAN uses a 24-bit VNI (VXLAN Network Identifier), supporting approximately 16 million isolated networks, a vast improvement in scalability. This makes running out of VNIs highly unlikely.
  • Seamless Roaming for End Devices: In campus networks, devices like IP phones and laptops may move between access points. If a traditional Layer 2 network is used, this would often require frequent changes in subnets, leading to a poor user experience. With VXLAN’s “large Layer 2” architecture, devices can move between different access switches while keeping the same IP address and policies unchanged, providing a seamless experience for users. This is a critical feature.
  • Improved Link Utilization: Traditional Layer 2 networks rely on STP (Spanning Tree Protocol) to prevent loops, which leads to many links being blocked and left unused, wasting bandwidth. Under VXLAN architecture, the underlying network uses Layer 3 routing, making full use of ECMP (Equal-Cost Multi-Path) and enabling multiple links to operate simultaneously, significantly improving overall bandwidth utilization.

In simple terms, VXLAN is chosen for the data plane to make the BGP EVPN VXLAN combination more scalable and to fully utilize the underlying network. However, it is not a universal solution and must be evaluated based on the specific scenario.

3. Basic VXLAN Concepts

To help understand the subsequent technical implementations, here are some core components of the VXLAN architecture:

  • VTEP (VXLAN Tunnel End Point): The endpoint of a VXLAN tunnel, responsible for encapsulating and decapsulating the original Ethernet frames. In our architecture, this is typically implemented by physical switches (Leaf nodes) or software-defined endpoints.
  • VNI (VXLAN Network Identifier): Similar to a VLAN ID, it is used to differentiate between different logical tenants or services, with a 24-bit length.
  • VAP (Virtual Access Point): A logical interface on a VTEP (usually a physical port or sub-interface) that maps specific incoming traffic to the corresponding VXLAN instance on the switch.
  • VXLAN Tunnel: A logical tunnel established between two VTEPs, used to transparently transport encapsulated VXLAN frames over the underlay network.
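As a rough illustration of how these pieces map onto a Linux-based NOS such as SONiC, the iproute2 sketch below creates a VTEP for one VNI and bridges an access port into it. Device names, the VNI, and the loopback address are illustrative assumptions; a real deployment drives this through the NOS configuration rather than by hand:

```
# Create a VTEP for VNI 10100, sourcing tunnels from loopback 10.0.0.1.
# "nolearning" defers MAC learning to the EVPN control plane.
ip link add vxlan10100 type vxlan id 10100 local 10.0.0.1 dstport 4789 nolearning

# The VAP role: attach an access port and the VTEP to the same bridge,
# so traffic entering eth1 is mapped into VNI 10100.
ip link add br10100 type bridge
ip link set vxlan10100 master br10100
ip link set eth1 master br10100
ip link set br10100 up && ip link set vxlan10100 up
```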

For more details about VXLAN, please refer to Why VXLAN is Critical for Data Centers?

BGP EVPN – Control Plane Technology for BGP EVPN VXLAN Network

1. What is BGP-EVPN

In the entire BGP EVPN VXLAN solution, VXLAN handles the “data forwarding” part, while BGP EVPN is responsible for “where to forward the data,” similar to how the brain works.

BGP EVPN stands for BGP-based Ethernet VPN. Essentially, it is a BGP-based control plane used to distribute Layer 2 VPN control information. It reuses BGP’s mature routing mechanisms to synchronize MAC addresses, IP addresses, and VTEP locations across the entire network. Information such as which devices belong to which subnet and which VTEP they sit behind is advertised through BGP EVPN, so each VTEP knows where to send traffic, eliminating flooding and trial-and-error learning.

In my understanding, it’s like delegating the “find the path” task to the control plane, where routes are calculated in advance.

2. Why Use BGP EVPN as the Control Plane

EVPN (Ethernet VPN) is a Layer 2 VPN technology specification defined by the IETF. The goal is straightforward: to use the control plane to learn MAC and IP addresses and replace large-scale flooding with precise messages, thus providing deterministic forwarding behavior for VXLAN.

EVPN’s architecture definition does not bind it to a single underlying transport. Over the course of its industry evolution, several transport options have been commonly used:

  • PBB (Provider Backbone Bridging): A traditional operator technology, but its compatibility and scalability are limited in the modern open network ecosystem.
  • MPLS (Multi-Protocol Label Switching): A classic label-switching technology, advantageous in specific scenarios.
  • BGP (Border Gateway Protocol): The foundational protocol of the internet, with excellent routing distribution capabilities and scalability.

In Asterfusion’s technical approach, we ultimately chose BGP-based EVPN, or BGP EVPN, and this choice was not arbitrary.

First, BGP natively supports ECMP (Equal-Cost Multi-Path), which is highly suitable for multi-path scenarios. In a Spine-Leaf architecture, EVPN can evenly distribute traffic across multiple paths, eliminating the single-path limitation between Leaf and Spine, thus significantly improving reliability and bandwidth utilization.

Second, the data plane and control plane can be decoupled efficiently. VXLAN focuses on encapsulation and forwarding, while BGP EVPN focuses on information distribution and convergence.

From data centers to campus endpoints, the paths are clear and predictable.

Lastly, this aligns well with the open network ecosystem. In open-source projects like SONiC and FRR, BGP is the most tested and heavily utilized component. Extending EVPN on this foundation is far more reliable than building a solution from scratch.

Considering these factors, using BGP EVPN for the control plane in the BGP EVPN VXLAN architecture not only aligns with industry standards but also facilitates future alignment and evolution with the SONiC community.

Exploration: About MPLS. Click Here: Asterfusion’s SONiC-Powered Campus Switch Support MPLS

3. Basic Concepts of BGP EVPN

The familiar IBGP and EBGP neighbor concepts are not the focus here. In BGP EVPN networks, the following concepts matter more: Symmetric IRB, Asymmetric IRB, NLRI, and RD/RT.

IRB (Integrated Routing and Bridging): In EVPN networks, a VTEP can perform both Layer 2 bridging and Layer 3 routing, hence the term Integrated Routing and Bridging (IRB). Although the term may seem complicated, it’s commonly used. With distributed gateways, IRB forwarding comes in two flavors: symmetric IRB and asymmetric IRB.

  • Symmetric IRB routes on both the ingress and egress VTEPs: inter-subnet traffic is routed into a shared L3VNI at the ingress and routed out of it again at the egress (traffic within the same subnet is simply bridged). Symmetric IRB introduces two additional concepts, L3VNI and Router MAC.
    • L3VNI: The identifier used for cross-subnet routing within a tenant, associated with a VRF (Virtual Routing and Forwarding) instance.
    • Router MAC: The MAC address used by the VTEP node for Layer 3 forwarding, serving as the inner source MAC address.
  • Asymmetric IRB refers to the case where the ingress VTEP performs both Layer 2 bridging and Layer 3 routing, while the egress VTEP performs only Layer 2 bridging; as a result, every VTEP must carry the L2VNIs of all subnets it routes between.
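A minimal FRR-style sketch of the symmetric IRB mapping, assuming a tenant VRF named Tenant1, L3VNI 10999, and AS 65001 (all illustrative values; FRR is the routing stack used by SONiC):

```
! Map the tenant VRF to an L3VNI for routed (cross-subnet) traffic.
vrf Tenant1
 vni 10999
exit-vrf
!
router bgp 65001
 address-family l2vpn evpn
  advertise-all-vni          ! advertise every locally configured L2VNI
 exit-address-family
!
router bgp 65001 vrf Tenant1
 address-family l2vpn evpn
  advertise ipv4 unicast     ! export VRF routes as EVPN Type 5
 exit-address-family
```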

NLRI (Network Layer Reachability Information): In BGP-4, this carries IP prefixes. In EVPN, BGP not only carries IP addresses but also uses a new NLRI format to carry MAC addresses and Ethernet segment information. For example, information such as which MAC/IP belongs to which VTEP and which Ethernet segment it belongs to is conveyed through NLRI.

RD/RT (Route Distinguisher / Route Target): These are mature mechanisms inherited from MPLS VPN.

  • RD is used to differentiate routes for different tenants, ensuring that even if the prefixes are the same, they become unique with the RD.
  • RT is used to control which routes should be imported into which “VPN instances,” essentially acting as a tagging and distribution process. In a multi-tenant environment, RD/RT ensures that prefixes from different tenants do not interfere with each other.
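In FRR-style configuration, RD and RT can be set explicitly per tenant VRF, as sketched below (the values are illustrative assumptions; in practice they are often auto-derived):

```
router bgp 65001 vrf Tenant1
 address-family l2vpn evpn
  rd 65001:1                      ! makes this tenant's prefixes globally unique
  route-target import 65001:100   ! accept routes carrying this community
  route-target export 65001:100   ! tag exported routes for remote import
 exit-address-family
```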

4. BGP EVPN Message Types

In an EVPN network, there are five BGP EVPN-specific route types, carried in BGP Update messages over MP-BGP using the EVPN address family (AFI=25, SAFI=70).

  • Type 1 (Ethernet Auto-Discovery Route): Advertises reachability to an Ethernet segment, enabling aliasing (load balancing across multihomed links) and fast mass withdrawal on failure.
  • Type 2 (MAC/IP Advertisement Route): Synchronizes MAC-IP-VTEP-VNI mappings, replacing ARP flooding with precise unicast forwarding.
  • Type 3 (Inclusive Multicast Ethernet Tag Route): Builds the distribution list for broadcast/unknown-unicast/multicast traffic, preventing network-wide flooding and reducing bandwidth consumption.
  • Type 4 (Ethernet Segment Route): Discovers VTEPs attached to the same Ethernet segment and elects the Designated Forwarder (DF), preventing duplicate delivery and MAC flapping in multi-active scenarios.
  • Type 5 (IP Prefix Route): Advertises Layer 3 IP prefixes, enabling inter-VNI Layer 3 communication in VXLAN.

Route Type 1 and Route Type 4 for EVPN Multihoming

Route Type 1 and Route Type 4 are used for EVPN multihoming, and this functionality has already been implemented in enterprise campus networks.

Route Type 3 for BUM Traffic in Data Centers

Route Type 3 is used for BUM (Broadcast, Unknown Unicast, and Multicast) traffic in data center scenarios. Its PMSI (Provider Multicast Service Interface) attribute defines two methods for distributing BUM traffic: ingress replication and a multicast underlay. For more details, see VXLAN Multicast BUM Traffic Forwarding: Best Practices for IR and Multicast Underlay

More About Message Interpretation and Interactions

For a detailed analysis, please refer to Mastering BGP EVPN in SONiC: 5 Route Types Analysis

Advanced Features in BGP EVPN VXLAN Networks

Now, let’s introduce the advanced features that we support in BGP EVPN VXLAN networks.

1. EVPN Multihoming

In a BGP EVPN VXLAN network, EVPN Multihoming is a critical feature for redundancy and load balancing. It is typically used in multi-link scenarios from Leaf switches to upper-level Spine devices, and can also be applied for dual-homed access devices. With multiple links working together, there is no longer just one uplink, and if one link fails, other paths can take over, providing some level of fault tolerance and load distribution. However, it’s not a universal solution.

EVPN Multihoming mainly relies on two types of routes to work in coordination:

  • Route Type 1 (Ethernet Auto-Discovery): Used to advertise Ethernet segments and multi-homing information within the EVPN network, helping nodes discover which devices are part of the same group, enabling load sharing over multiple paths.
  • Route Type 4 (Ethernet Segment Route): Used to describe the Ethernet segment (ES) itself, combined with DF (Designated Forwarder) election and other mechanisms, ensuring more orderly traffic forwarding in multi-homing scenarios and providing control information for redundant paths.
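A hypothetical FRR-style sketch of the multihoming configuration: each Leaf sharing the dual-homed bond configures the same ES identifier and system MAC (values illustrative), from which the ESI is derived and the Type 1/Type 4 routes above are generated:

```
! Apply the same ES settings on every leaf attached to this bond.
interface bond1
 evpn mh es-id 1                        ! local discriminator for this ES
 evpn mh es-sys-mac 44:38:39:ff:00:01   ! shared system MAC; with es-id it forms the ESI
```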

Campus Device Support: In Asterfusion’s campus devices, EVPN Multihoming is a fully implemented feature, not just a theoretical function. By combining multiple links and devices for uplink aggregation, we effectively avoid single points of failure. If one link fails, the business won’t go down, and traffic can be distributed across multiple paths, achieving a level of load balancing and fault tolerance.

Summary: The main objectives of EVPN Multihoming are high availability, load balancing, and redundancy, especially when an access node needs to connect to multiple Leafs or Spines. EVPN Multihoming significantly improves the overall stability and reliability of the BGP EVPN VXLAN network.

More about this? Asterfusion Enterprise SONiC NOS Support EVPN Multi-homing

2. ARP Suppression

What is ARP Suppression? In simple terms, it’s about reducing unnecessary ARP broadcasts and making ARP more efficient. In large-scale virtualized environments, as the number of VMs increases, ARP requests become very frequent, causing massive broadcasts asking “who is who,” which over time can lead to ARP storms, consume bandwidth, and slow down the network. This issue is even more pronounced in BGP EVPN VXLAN networks.

Working Principle:

  • By enabling ARP Proxy, the switch intercepts ARP requests from local hosts, looks up its local MAC/IP database (populated via EVPN Type 2 routes), and replies directly to the requester without broadcasting the request across the entire network.
  • This avoids ARP flooding, reducing broadcast traffic and preventing ARP storms.
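On a Linux-based VTEP, this behavior maps to the bridge driver’s neighbor-suppression flag, sketched below (the device name is an illustrative assumption):

```
# Suppress ARP/ND flooding on a VTEP bridge port: the bridge answers ARP
# from its local FDB/neighbor tables instead of flooding the request
# into the VXLAN tunnel.
bridge link set dev vxlan10100 neigh_suppress on
```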

Asterfusion Device Support:

  • On Asterfusion campus devices, when ARP Suppression is enabled, the device uses ARP Proxy functionality to effectively reduce ARP broadcasts and optimize bandwidth usage.

Objectives:

  • Reduce ARP flooding and unnecessary broadcasting.
  • Improve overall network efficiency by focusing bandwidth on actual business traffic.
  • Prevent network congestion or ARP storms caused by excessive ARP broadcasts.

It’s not a magical solution, but in BGP EVPN VXLAN environments, ARP Suppression is almost an essential optimization option.

3. VXLAN Multicast Mode with (S,G) per VNI

VXLAN Multicast Mode with (S,G) per VNI is a technique for optimizing multicast traffic in VXLAN networks. Each VNI (VXLAN Network Identifier) is assigned a unique (S,G) (Source IP, Group IP) combination, reducing the spread of broadcast and multicast traffic, ensuring that only the required devices receive the multicast data.

Working Principle:

  • In VXLAN Multicast Mode, multicast transmission and switching mechanisms are based on unique (S,G) pairs for each VNI, ensuring that only the necessary VMs or hosts receive the multicast data, preventing other unnecessary devices from receiving unrelated multicast traffic.
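On a Linux-based VTEP, per-VNI underlay groups can be expressed with iproute2 as below (VNIs, group addresses, and device names are illustrative; strict (S,G) membership additionally requires PIM-SSM in the underlay):

```
# Each VNI gets its own underlay multicast group, so BUM traffic for a VNI
# reaches only the VTEPs that joined that group.
ip link add vxlan10100 type vxlan id 10100 group 239.1.1.100 dev eth0 dstport 4789
ip link add vxlan10200 type vxlan id 10200 group 239.1.2.200 dev eth0 dstport 4789
```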

Objective:

  • Optimize multicast traffic distribution, reduce unnecessary multicast data flow, and minimize network bandwidth waste.

4. VXLAN Cross-connect

VXLAN Cross-connect refers to connecting two different data centers or network areas through VXLAN tunnels, allowing them to interconnect Layer 2 or Layer 3 networks. Through VXLAN Cross-connect, traffic between different VNIs can be bridged or routed across physical network boundaries, forming a unified virtual network.

Use Case:

  • In cross-data-center environments, VXLAN Cross-connect can interconnect networks in two data centers, allowing VMs in one data center to communicate as if they were in the same data center.

5. ARP to Host

ARP to Host is a mechanism in BGP EVPN networks specifically designed to optimize ARP traffic. It is similar to ARP Suppression but focuses more on the MAC/IP binding advertisement and ARP request routing.

Function:

  • In traditional networks, ARP requests are sent as broadcasts, which may waste network bandwidth. In BGP EVPN, Route Type 2 (MAC/IP Advertisement Route) forms a distributed MAC/IP mapping table across the network. When a device needs another device’s MAC address, it can check the EVPN routing table and directly find the corresponding VTEP, instead of broadcasting ARP.

Optimization:

  • ARP to Host optimizes ARP traffic by injecting host routes into the routing table. ARP requests are directed to the device that contains the relevant host information, reducing ARP broadcasts and lowering network overhead.

Effect:

  • The main goal of ARP to Host is to reduce ARP broadcasts, minimize the risk of ARP storms, increase overall network efficiency, and make the MAC/IP mapping and query process in BGP EVPN more precise.

6. Anycast Gateway

Anycast Gateway is a gateway redundancy and load-balancing technique in BGP EVPN VXLAN networks. It allows multiple gateway devices to share the same gateway IP (and typically the same virtual MAC), so traffic is always handled by the nearest gateway.

Function:

  • Multiple Leaf switches advertise the same gateway IP address via BGP EVPN, and BGP controls the routing. Traffic is dynamically directed to the nearest gateway (such as the nearest VTEP) for processing, providing high availability and fault tolerance in large-scale networks.

Optimization:

  • In Anycast Gateway mode, if a gateway device fails, its route will be removed from the control plane, and the remaining gateways will continue to provide the same gateway IP. Traffic is automatically rerouted to other available gateways without requiring terminal reconfiguration or manual intervention, avoiding single points of failure.
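A hypothetical sketch of the configuration: every leaf carries an identical SVI address, combined with FRR’s gateway advertisement (addresses, VLAN, and ASN are illustrative assumptions):

```
! Identical on every leaf: same SVI IP (and same anycast MAC on the SVI),
! so hosts see one gateway wherever they attach.
interface Vlan100
 ip address 10.1.100.1/24
!
router bgp 65001
 address-family l2vpn evpn
  advertise-default-gw      ! advertise the gateway MAC/IP via EVPN Type 2
 exit-address-family
```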

Objective:

  • Provide high availability and load balancing, ensuring that traffic automatically chooses the optimal path based on the network topology, improving network fault tolerance.

7. ESI-LAG

ESI-LAG is an advanced feature in BGP EVPN, designed to provide redundancy and load balancing by aggregating multiple physical links with LAG (Link Aggregation) and treating them as a single Ethernet segment, ensuring even distribution of traffic across multiple network paths.

Function:

  • In BGP EVPN, the ESI (Ethernet Segment Identifier) identifies the set of links that connect a multihomed device to the network. LAG bundles multiple physical links to increase bandwidth and redundancy. ESI-LAG allows these aggregated links to be recognized as a single Ethernet segment in the control plane, participating in EVPN route selection and convergence.

Optimization:

  • ESI-LAG allows multiple physical links from an access device to different Leaf switches to operate in parallel, avoiding a single point of failure when a link goes down and spreading traffic evenly across the member links, improving bandwidth utilization and stability.

Objective:

  • Provide redundant paths, load balancing, and bandwidth expansion, ensuring that traffic can automatically switch to other links in the event of a failure, enhancing the network’s reliability and scalability.

By implementing these key features in both campus and data center environments, and supporting the advanced BGP EVPN VXLAN features they require, we have extended the capabilities of Enterprise SONiC, making it more scalable than community SONiC and reflecting our R&D investment.

Conclusion: Evolving with SONiC


Asterfusion, based on open networks, uses a unified Spine-Leaf topology across campus and data center environments. On top of the BGP EVPN VXLAN foundation, we not only inherit the capabilities of community SONiC but also expand on advanced features, amplifying the functions and support of Enterprise SONiC (AsterNOS).

Moreover, with the Spine-Leaf architecture unified and the same BGP EVPN VXLAN technology stack, both traditional campus networks and AI clusters or data center scenarios share a consistent operation and management process, eliminating the need to learn new configurations. This is something we highly value.

One more thing to remember: Asterfusion is evolving, and we’ll support even more capabilities in the future. Stay tuned!

Contact Us!
