
MC-LAG and STP Interoperability White Paper
— Building a Highly Available, Loop-Free Campus Network

1. Overview

In campus aggregation and access networks, the coordinated design of Multi-Chassis Link Aggregation (MC-LAG) and the Spanning Tree Protocol (STP) has become the mainstream approach to high availability. Traditional Link Aggregation (LAG) only provides link-level redundancy within a single device, so it cannot eliminate the single point of failure caused by a switch outage. MC-LAG removes this physical device boundary by virtualizing two switches into a single logical aggregation node. This dual-device architecture provides device-level redundancy while making full use of links that would otherwise remain blocked in traditional STP-based topologies.

Even so, Layer 2 loop risks still exist in complex scenarios such as multi-tier topologies, initialization, and device reboot events. As a result, STP remains essential as the underlying loop prevention mechanism.

This white paper provides an in-depth analysis of the MC-LAG architecture in Asterfusion AsterNOS, including internal protocol interactions and data synchronization mechanisms. It also combines STP deployment best practices to provide both theoretical reference and operational guidance for building efficient and resilient campus and cloud networks.

1.1 Deployment Scenarios

Asterfusion AsterNOS provides deep integration and optimization for MC-LAG and STP interoperability. By deploying STP and MC-LAG at the access or aggregation layer, users can achieve the following key capabilities:

  • Physical-Level Redundancy: When one MC-LAG member switch fails, traffic can converge to the peer device within milliseconds, with minimal service impact.
  • Logical Loop Prevention: During MC-LAG deployment or abnormal topology changes, STP can automatically detect and block redundant paths to maintain a loop-free topology.
  • Maximum Bandwidth Utilization: Under normal operating conditions, MC-LAG load-balances traffic across all active physical links, so no link capacity sits idle.

1.2 Basic Concepts

  • Peer-Link: A direct connection between two MC-LAG switches, typically configured as a high-bandwidth Link Aggregation Group (LAG). It is mainly used to exchange protocol packets, such as ICCP messages, and to forward traffic during failure scenarios.
  • Member Interface (Member Port): A link aggregation interface connected to downstream devices. It is used for traffic forwarding, load balancing, and redundancy protection. Member interfaces logically share the same system identifier (system MAC address).
  • ICCP (Inter-Chassis Communication Protocol): A protocol used for inter-device communication. It establishes the MC-LAG peer relationship and synchronizes status information between peer devices.
  • DAD (Dual-Active Detection): A mechanism used to periodically detect peer device reachability and prevent split-brain conditions caused by Peer-Link failures. The heartbeat path can use any reachable Layer 3 IP path, including an out-of-band management network or a production network through Spine switches.

1.3 Core Processes and Interaction Logic

Within AsterNOS, MC-LAG operation mainly relies on the coordination between iccpd, mclagsyncd, and the Linux kernel.

  • iccpd (Inter-Chassis Control Process): This is the core control process of MC-LAG. It is responsible for running the ICCP protocol and establishing and maintaining control sessions with peer devices. iccpd handles role election (Active/Standby), interface MAC address updates, and state machine transitions.
  • mclagsyncd (Table Synchronization Process): This process acts as the synchronization agent between Redis databases (CONFIG_DB, STATE_DB, and APP_DB) and iccpd. It monitors locally learned ARP and FDB events, encapsulates the information, and sends it to iccpd for peer synchronization.
  • mclagdctl (Management Utility): This command-line utility is used to query and troubleshoot MC-LAG operational status. It communicates with iccpd through Unix domain sockets to retrieve real-time session and interface information.

2. Operation Principles

2.1 MC-LAG Protocol Operation

AsterNOS implements a lightweight ICCP protocol in the MC-LAG control plane. It focuses on minimal consistency verification and essential state synchronization while preserving core functionality.

In a typical MC-LAG deployment, ICCP (a standard defined in RFC 7275) establishes a TCP-based session between the MC-LAG peers on port 8888. The two devices become ICCP neighbors by using local_ip and peer_ip as the source and destination addresses of the TCP connection.

Once the session is established, each device sends heartbeat messages to its peer at 1-second intervals. If no heartbeat is received for 15 consecutive intervals (15 seconds), the session is considered timed out and the ICCP connection is brought down.
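The timeout rule above can be illustrated with a minimal sketch. The class and method names are hypothetical and do not reflect the actual iccpd implementation; only the 1-second interval and the 15-interval timeout come from the text.

```python
from dataclasses import dataclass

KEEPALIVE_INTERVAL_S = 1   # heartbeat period stated in the text
TIMEOUT_INTERVALS = 15     # missed intervals before the session is brought down


@dataclass
class IccpSession:
    """Hypothetical model of the keepalive timer on one MC-LAG peer."""
    last_heartbeat_s: float = 0.0
    up: bool = True

    def on_heartbeat(self, now_s: float) -> None:
        # Every heartbeat received from the peer refreshes the timer.
        self.last_heartbeat_s = now_s

    def tick(self, now_s: float) -> bool:
        # Declare the session down after 15 seconds without a heartbeat.
        silence_s = now_s - self.last_heartbeat_s
        if silence_s >= KEEPALIVE_INTERVAL_S * TIMEOUT_INTERVALS:
            self.up = False
        return self.up


session = IccpSession()
session.on_heartbeat(now_s=100.0)
still_up = session.tick(now_s=114.0)   # 14 s of silence, session stays up
timed_out = not session.tick(now_s=115.0)  # 15 s of silence, session goes down
```

In this sketch a session that has timed out stays down; re-establishing it would require a new TCP connection, which is outside the scope of the example.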

Figure: M-LAG topology with peer link and PortChannel

2.2 Core Mechanism Analysis

2.2.1 Port Isolation and Unidirectional Path Control

To prevent loops caused by BUM (Broadcast, Unknown-unicast, and Multicast) traffic between the Peer-Link and member ports, MC-LAG introduces the concept of an Isolation Group.

  • Isolation Logic: When both the local member port and the remote peer member port are operational, the system enforces a unidirectional isolation from the Peer-Link toward the member port. In this state, traffic received from the Peer-Link is not forwarded to the member port. This prevents loop formation such as “Switch A → downstream device → Switch B → Peer-Link → Switch A”.
  • Isolation Release: If the member port on either device fails, the isolation is removed automatically. Traffic arriving over the Peer-Link can then be sent out through the surviving member port, which is now the only available path to the downstream device.
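The isolation rule can be expressed as a small decision function. This is a sketch under simplifying assumptions: the function names and the "peer-link" / "member" port labels are hypothetical, and real hardware applies the rule per member-port pair in the forwarding pipeline.

```python
def peer_link_isolated(local_member_up: bool, peer_member_up: bool) -> bool:
    """Isolation is enforced only while both member ports are operational."""
    return local_member_up and peer_member_up


def may_forward(ingress: str, egress: str,
                local_member_up: bool, peer_member_up: bool) -> bool:
    """Decide whether a BUM frame may go from `ingress` to `egress`."""
    if egress == "member" and not local_member_up:
        # A port that is down cannot transmit, whatever the isolation state.
        return False
    if ingress == "peer-link" and egress == "member":
        # Unidirectional rule: only Peer-Link -> member port is restricted.
        return not peer_link_isolated(local_member_up, peer_member_up)
    return True


# Both member ports up: frames from the Peer-Link are not sent downstream.
blocked = not may_forward("peer-link", "member", True, True)
# Peer's member port down: the local port becomes the only path, so it opens.
released = may_forward("peer-link", "member", True, False)
```

The rule is deliberately one-directional: frames received on a member port may still be sent to the Peer-Link in every state.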

2.2.2 Table Synchronization Mechanism

To present two independent devices as a single logical node externally, key forwarding tables must be synchronized. The ICCP protocol performs configuration consistency checks as well as ARP and MAC table synchronization.

  • MAC Synchronization: When an MC-LAG session is established, an initial FDB (MAC table) synchronization is performed. To avoid FDB instability, MAC learning on the Peer-Link is disabled during MC-LAG operation. When a new MAC entry is learned on a member device, mclagsyncd detects the event and immediately notifies iccpd to synchronize the entry across peers.
  • ARP Synchronization: An initial ARP synchronization is performed when the peer session is established. An aging flag mechanism ensures that an ARP entry is removed only after it has aged out on both devices.
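The aging-flag idea for ARP entries can be sketched as follows. The data structures are hypothetical and only illustrate the rule that an entry is deleted once both sides have reported it as aged; they do not reflect the actual AsterNOS tables or message formats.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ArpEntry:
    mac: str
    aged_local: bool = False   # set when the local device ages the entry
    aged_peer: bool = False    # set when the peer reports the entry as aged


@dataclass
class ArpTable:
    entries: Dict[str, ArpEntry] = field(default_factory=dict)

    def learn(self, ip: str, mac: str) -> None:
        # Learning or refreshing an entry clears both aging flags.
        self.entries[ip] = ArpEntry(mac)

    def mark_aged(self, ip: str, *, by_peer: bool) -> bool:
        """Record an aging event; return True only if the entry was deleted."""
        entry = self.entries.get(ip)
        if entry is None:
            return False
        if by_peer:
            entry.aged_peer = True
        else:
            entry.aged_local = True
        if entry.aged_local and entry.aged_peer:
            del self.entries[ip]
            return True
        return False


table = ArpTable()
table.learn("10.0.0.5", "aa:bb:cc:dd:ee:01")      # example values only
kept = not table.mark_aged("10.0.0.5", by_peer=False)   # aged on one side only
removed = table.mark_aged("10.0.0.5", by_peer=True)     # aged on both sides
```

Keeping the entry until both flags are set avoids deleting an address that the peer is still actively using.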

2.2.3 DAD Dual-Active Detection

The DAD link (Dual-Active Detection link) is a Layer 3 reachable connection used for exchanging dual-active detection messages between MC-LAG peers. When a Peer-Link failure is detected while the peer is still reachable over the DAD link, both devices remain active but can no longer synchronize state. To prevent the resulting network inconsistency, the system places all interfaces on the standby device, except the management and Peer-Link interfaces, into an Error-Down state.
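The decision described above can be sketched as a single function. Two points are assumptions rather than statements from this paper: that the shutdown applies only when the DAD heartbeat shows the peer is still alive, and the interface names and type labels used in the example.

```python
from typing import Dict, List

# Interface types that stay up on the standby device, per the text.
EXEMPT_TYPES = ("management", "peer-link")


def interfaces_to_error_down(role: str, peer_link_up: bool,
                             dad_peer_alive: bool,
                             interfaces: Dict[str, str]) -> List[str]:
    """Return the interfaces to place in Error-Down on this device.

    `interfaces` maps an interface name to a type label (hypothetical labels).
    """
    dual_active = (not peer_link_up) and dad_peer_alive  # assumed condition
    if role != "standby" or not dual_active:
        return []
    return [name for name, kind in interfaces.items()
            if kind not in EXEMPT_TYPES]


example_interfaces = {
    "eth0": "management",
    "PortChannel1": "peer-link",
    "PortChannel10": "member",
    "Ethernet5": "uplink",
}
shut = interfaces_to_error_down("standby", False, True, example_interfaces)
```

With these inputs, `shut` contains `PortChannel10` and `Ethernet5`, while the management and Peer-Link interfaces are left untouched.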

2.2.4 Failure Handling Mechanism

When a link failure occurs on one side of the MC-LAG (member port down), the device proactively updates the MAC and ARP entries associated with the affected endpoint so that they point to the Peer-Link interface. Traffic destined for the downstream device is then forwarded to the peer device through the Peer-Link and delivered over the peer's member port, keeping service impact minimal. Once the fault is cleared, traffic forwarding returns to the normal path automatically.
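The redirection step can be pictured as a rewrite of the forwarding table. The following is a minimal sketch with a hypothetical table layout (MAC address mapped to egress interface) and example interface names; it is not the actual AsterNOS data model.

```python
from typing import Dict


def redirect_on_member_down(fdb: Dict[str, str], failed_port: str,
                            peer_link: str) -> Dict[str, str]:
    """Return a copy of the FDB with entries on the failed port
    repointed to the Peer-Link; other entries are unchanged."""
    return {mac: (peer_link if port == failed_port else port)
            for mac, port in fdb.items()}


normal_fdb = {
    "aa:bb:cc:dd:ee:01": "PortChannel10",   # host behind the failed member port
    "aa:bb:cc:dd:ee:02": "PortChannel20",   # host behind a healthy member port
}
failover_fdb = redirect_on_member_down(normal_fdb, "PortChannel10", "PortChannel1")
# After the fault is cleared, the device can fall back to normal_fdb.
```

Because the function returns a new table and leaves `normal_fdb` intact, restoring the original path after recovery is simply a matter of reinstalling the saved entries.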

3. Deep Integration of STP and MC-LAG

Although MC-LAG eliminates loops at the logical level, Layer 2 loops can still occur in multi-tier topologies or due to misconfiguration (for example, when a Peer-Link is accidentally connected to a third-party switch). In campus networks, this remains a critical risk. Deploying STP together with MC-LAG requires specific adaptations to standard Spanning Tree behavior.

3.1 Shared Bridge MAC Mechanism

In standard STP operation, the Bridge ID is a key factor in topology calculation. It is formed by concatenating the bridge priority with the bridge MAC address:

Bridge_ID = Priority + Bridge_MAC

If the two MC-LAG peers use different Bridge MAC addresses, downstream devices will perceive them as two independent STP nodes. This may lead to one of the aggregated links being blocked, reducing available bandwidth and disrupting traffic symmetry.

AsterNOS addresses this issue by introducing a shared Bridge MAC design. By configuring a consistent bridge MAC across both devices, the two physical switches are presented as a single logical bridge in the STP topology.

As a result, downstream devices treat the Link Aggregation Group (LAG) connected to both switches as a single STP entity. This ensures the aggregated link remains in the Forwarding state and avoids unnecessary blocking caused by STP ambiguity.
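To see why a shared Bridge MAC matters, the Bridge ID can be modeled as an 8-byte value: a 2-byte priority followed by a 6-byte MAC address. The sketch below uses made-up MAC addresses and the common default priority of 32768; the helper name is hypothetical.

```python
def bridge_id(priority: int, mac: str) -> bytes:
    """Build an 8-byte Bridge ID: 2-byte priority followed by 6-byte MAC."""
    return priority.to_bytes(2, "big") + bytes.fromhex(mac.replace(":", ""))


PRIORITY = 32768  # common STP default priority

# Without a shared Bridge MAC, each peer advertises its own identity.
switch_a = bridge_id(PRIORITY, "00:11:22:33:44:0a")
switch_b = bridge_id(PRIORITY, "00:11:22:33:44:0b")
two_bridges = switch_a != switch_b

# With a shared Bridge MAC, both peers advertise the same Bridge ID,
# so a downstream device sees a single logical bridge.
shared_mac = "00:11:22:33:44:ff"
one_bridge = bridge_id(PRIORITY, shared_mac) == bridge_id(PRIORITY, shared_mac)
```

Bridge IDs are compared as unsigned values, with the lower value preferred, which is why the byte order (priority first, then MAC) determines how elections are decided.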

3.2 STP BPDU Filtering (BPDU Filter) Application

In STP + MC-LAG deployments, a critical step is to configure stp bpdu-filter enable on the Peer-Link interface.

Why BPDU Filtering is Required

  • Preventing Port Blocking: The Peer-Link serves as the control and failover channel for MC-LAG, carrying synchronization and traffic rerouting information. Without BPDU filtering, BPDUs from downstream switches may traverse member ports and reach the Peer-Link. STP may incorrectly interpret the Peer-Link as a redundant loop path and place it into the Discarding state. This would directly break the control plane and traffic recovery path between the two peer devices.
  • Management Domain Isolation: BPDU filtering isolates the internal MC-LAG control domain from external STP topology changes. This prevents frequent topology recalculations from affecting MC-LAG stability, while the shared Bridge MAC ensures a loop-free external STP representation.
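The effect of the filter can be modeled in a few lines. This is an illustrative sketch, not the switch's implementation: the function names and port names are hypothetical, and the only external fact used is that STP BPDUs are addressed to the bridge group address 01:80:C2:00:00:00.

```python
from typing import Set

STP_BPDU_DST_MAC = "01:80:c2:00:00:00"  # IEEE 802.1D bridge group address


def is_bpdu(dst_mac: str) -> bool:
    return dst_mac.lower() == STP_BPDU_DST_MAC


def stp_processes_frame(port: str, dst_mac: str,
                        bpdu_filter_ports: Set[str]) -> bool:
    """Return True if STP should process a frame received on `port`."""
    # BPDUs arriving on a filtered port never reach the STP state machine,
    # so STP cannot move that port into the Discarding state because of them.
    return is_bpdu(dst_mac) and port not in bpdu_filter_ports


filtered = {"PortChannel1"}  # hypothetical name of the Peer-Link interface
on_peer_link = stp_processes_frame("PortChannel1", "01:80:C2:00:00:00", filtered)
on_member = stp_processes_frame("PortChannel10", "01:80:C2:00:00:00", filtered)
```

Here `on_peer_link` is false and `on_member` is true: STP keeps running on the member ports while the Peer-Link stays outside its calculations.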

4. Implement MC-LAG and STP on Enterprise SONiC-based Switches

For detailed configuration steps, refer to "MC-LAG and STP Configuration on Enterprise SONiC Switch".
