
Differences between Scale Up, Scale Out and Scale Across

written by Asterfusion

December 10, 2025

Introduction

As AI adoption accelerates, user demand and workload volume continue to increase. AI models are becoming larger and more complex, raising a key question: can current AI data centers provide sufficient scalability, bandwidth, and latency performance? If the infrastructure cannot support these requirements, what approaches can be used to upgrade or evolve an AIDC?

This article introduces three expansion methods: Scale Up, Scale Out and Scale Across. Understanding these approaches is essential when designing or modernizing AI infrastructure.

Basic Concepts in AI Infrastructure

Before discussing Scale Up, Scale Out and Scale Across, it is helpful to review several foundational concepts in AI infrastructure. These concepts are also areas that many readers want to understand first. A clear understanding of the basics makes it easier to study more advanced designs.

Compute node: A compute node is the smallest independent unit in a compute cluster. In practical terms, it can be viewed as a single device. The form factor may be a server, an embedded system, an intelligent terminal, or even a virtualized logical unit that does not resemble a traditional server.

Compute / AI Cluster: A cluster is a group of devices with independent compute capability that work together as a unified system. These devices (physical or virtual) are connected through a high-speed fabric and coordinated by a shared scheduling, communication, and management framework. Cluster-to-cluster communication also relies on high-speed networks such as InfiniBand or RoCE.

Cluster types:

  • By resource scheduling and workload mode: dedicated single-task clusters or shared multi-task clusters.
  • By primary use case: AI clusters, HPC clusters, storage clusters, general-purpose compute clusters, and edge compute clusters.
  • By core hardware type: GPU clusters, NPU clusters, CPU clusters, and DPU clusters.
  • By hardware architecture and coordination model: homogeneous, heterogeneous, distributed, or centralized clusters.

GPU Cluster: This is the most common type of compute cluster in an AI data center. GPUs are first integrated into a server to form a GPU compute node. Multiple nodes are then interconnected through a high-speed network to build a multi-node GPU cluster. Within a single server, multiple GPUs are also linked through board-level high-speed interconnects to form an internal “single-node GPU cluster.” The overall structure is therefore layered: internal GPU interconnect → server as a compute node → cross-node GPU cluster through the data center network.

Scale: You may ask what "scale" means in Scale Up, Scale Out and Scale Across. Scale is an action, not a technology or a physical component. A simple analogy is to think of a compute node in an AI data center as a rubber band: scaling is the act of stretching it, and the three scaling methods differ only in how the rubber band is extended. In practice, each represents a different way of modifying AI infrastructure, so IT architects must evaluate all three carefully to balance performance and cost-efficiency.

Based on these concepts, we can view architectural changes in two directions: reduction and expansion.

  • Scale-in is the contraction of resources when system load decreases. It reflects the “bidirectional” nature of scalability.
  • Scale Up, Scale Out and Scale Across describe expansion when workload demand increases. Scale-up enhances a single node vertically, scale-out adds nodes horizontally, and scale-across extends resources across data centers.

With this foundation, we can now examine Scale Up, Scale Out and Scale Across in detail and outline the differences between them.

Scale Up in AI

1. What It Is

[Figure: Scale-Up topology]

Scale-up refers to increasing compute resources within a single node. In AI workloads, it specifically means adding more GPUs inside one server and connecting them through a high-bandwidth interconnect. This allows 8 or 16 GPUs to share memory and operate as a single large logical GPU. Scale-up is the primary approach for handling models that exceed the memory capacity of a single GPU.
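As a rough illustration of why pooled memory matters, the sketch below estimates how many GPUs a model's weights require. The 80 GB memory size, 2-byte parameters, and 1.2x overhead factor are illustrative assumptions, not vendor figures:

```python
import math

def gpus_needed(params_billion, bytes_per_param=2, gpu_mem_gb=80, overhead=1.2):
    """Minimum number of GPUs whose pooled memory holds the model weights.

    `overhead` loosely covers activations and workspace; full training
    (gradients, optimizer states) would need substantially more.
    """
    model_gb = params_billion * bytes_per_param * overhead  # 1e9 params * bytes ~= GB
    return math.ceil(model_gb / gpu_mem_gb)

print(gpus_needed(70))   # a 70B-parameter FP16 model spills past a single 80 GB GPU
print(gpus_needed(405))  # a 405B model exceeds even an 8-GPU scale-up domain
```

The second case shows where scale-up alone runs out: once the weight footprint exceeds the node's pooled memory, the workload must also scale out.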

2. Architecture (Physical Topology)

  • Core components: GPUs such as H100/H800, NVLink high-speed interconnects, NVSwitch chips, and GPU baseboards such as the HGX module.
  • Connection methods:
    • Point-to-point (early or small setups): GPUs are linked directly through NVLink bridges.
    • Fully connected (mainstream for high performance): The standard NVIDIA HGX design. All GPUs—typically eight—connect to on-board NVSwitch chips.
    • Signaling path: GPU A → NVLink → NVSwitch → NVLink → GPU B.
    • Topology: NVSwitch enables a non-blocking full-mesh structure. Any two GPUs communicate with the same bandwidth.

Note: NVLink links GPUs directly. NVSwitch acts as a switching layer, with each link connecting a GPU to the NVSwitch.
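The note above can be expressed as a toy topology model: with a switching layer in the middle, every ordered GPU pair takes the same two-hop path, which is what gives the full mesh its uniform bandwidth. This is a simplified sketch, not NVIDIA's actual routing logic:

```python
NUM_GPUS = 8  # one HGX-style baseboard

def path(src, dst):
    """Signaling path for GPU-to-GPU traffic through the switch layer."""
    if src == dst:
        return [f"GPU{src}"]
    return [f"GPU{src}", "NVSwitch", f"GPU{dst}"]

# Every distinct ordered pair takes a path of the same length,
# so no pair is topologically disadvantaged.
paths = [path(a, b) for a in range(NUM_GPUS) for b in range(NUM_GPUS) if a != b]
assert all(len(p) == 3 for p in paths)
print(len(paths))  # 56 ordered pairs, all equivalent
```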

3. Advantages

  • High bandwidth: For example, H100 with NVLink Gen4 provides up to 900 GB/s bidirectional bandwidth, far exceeding PCIe Gen5.
  • Low latency: Communication travels only through PCB traces or short internal cables, with nanosecond-level latency.
  • Unified memory capability: GPUs can form a unified memory pool, allowing model parameters to move across devices with minimal overhead. This is important for large-model training.
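To see what this bandwidth gap means for training, the sketch below estimates the ideal ring all-reduce time for a gradient buffer. The 450 GB/s and 50 GB/s per-direction link speeds are rough stand-ins for an NVLink-class link and a 400G NIC, not measured figures:

```python
def allreduce_time_s(size_gb, n_gpus, bw_gb_per_s):
    """Ideal ring all-reduce time: each rank moves 2*(N-1)/N of the
    buffer over its link, ignoring latency and protocol overhead."""
    traffic_gb = 2 * (n_gpus - 1) / n_gpus * size_gb
    return traffic_gb / bw_gb_per_s

buf = 10  # a 10 GB gradient buffer
print(allreduce_time_s(buf, 8, 450))  # NVLink-class link
print(allreduce_time_s(buf, 8, 50))   # 400G NIC-class link, ~9x slower
```

Even in this idealized model the intra-node collective finishes roughly an order of magnitude faster, which is why tightly coupled parallelism is kept inside the scale-up domain.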

4. Limitations

  • Physical constraints: Chassis space, cooling capacity, and power delivery limit the number of GPUs in a single server. Typical designs support 8 or 16 GPUs, with rare cases reaching 32.
  • High cost: NVLink and NVSwitch use proprietary NVIDIA hardware and carry significant cost.
  • Centralized failure domain: A failure in the baseboard or NVSwitch can disrupt all GPUs within the server.

Scale Out in AI

[Figure: Scale-Out topology]

1. What It Is

Scale Out refers to connecting multiple compute nodes through a high-speed network to form a large-scale compute cluster. When a single server (Scale Up) reaches its physical performance ceiling, the only way to increase total compute capacity is by adding more nodes or clusters.

2. Architecture (Physical Topology)

  • Core components: Server NICs such as ConnectX-7, optical transceivers, AOC/DAC cables, top-of-rack (ToR) switches, and spine switches.
  • Connection pattern: A Spine–Leaf topology.
    • Access layer: Each AI server typically carries 4–8 high-speed NICs dedicated to compute traffic. These NICs connect to the ToR switch through optical links.
    • Aggregation/core layer: ToR switches uplink to spine switches.
    • Protocols: RDMA is the dominant communication method, using either InfiniBand or RoCEv2 over Ethernet.
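The spine-leaf pattern above can be sized with simple port arithmetic. The sketch below assumes a non-blocking two-tier Clos in which each leaf devotes half its ports to server NICs and half to uplinks, one per spine; the port counts are illustrative:

```python
import math

def fabric_size(num_nics, switch_ports=64):
    """Leaf and spine counts for a non-blocking two-tier spine-leaf fabric."""
    downlinks = switch_ports // 2              # leaf ports facing servers
    leaves = math.ceil(num_nics / downlinks)   # enough leaves for every NIC
    spines = switch_ports - downlinks          # each leaf uplinks once per spine
    if leaves > switch_ports:
        raise ValueError("exceeds two-tier reach; a third tier is needed")
    return leaves, spines

# e.g. 1024 compute NICs on 64-port switches
print(fabric_size(1024))  # (32, 32)
```

Scaling beyond the reach of two tiers is exactly where the fat-tree and multi-tier designs mentioned below come in.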

3. Advantages

  • Large-scale expandability: With properly designed fat-tree or multi-tier topologies, the cluster can scale to tens of thousands of GPUs.
  • Elasticity: Nodes can be added or removed on demand. A failed node can be isolated without affecting the remaining cluster.
  • Standardized ecosystem: Scale Out relies on standard InfiniBand or Ethernet technologies, offering broad vendor options such as NVIDIA/Mellanox, Broadcom, Cisco, and Huawei.

4. Limitations

  • Network bottlenecks: Even 800G fabrics deliver lower bandwidth than NVLink and introduce microsecond-level latency, several orders of magnitude higher than intra-server GPU links.
  • Tail latency: Very large clusters are sensitive to congestion and packet loss. A single slow node can delay collective operations and degrade overall training efficiency.
  • Operational complexity: Large fabrics require massive numbers of fibers and transceivers, and cabling, power consumption, and maintenance all become major challenges.
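The tail-latency point is easy to demonstrate: a synchronous collective finishes only when the slowest participant does, so a single congested node sets the step time for the entire cluster. The numbers below are synthetic:

```python
import random

random.seed(0)
N_NODES = 1024
step_ms = [random.gauss(100, 5) for _ in range(N_NODES)]  # per-node step time

healthy = max(step_ms)   # the collective finishes with the slowest node
step_ms[0] *= 3          # inject one congested straggler
degraded = max(step_ms)

print(f"healthy cluster step:  ~{healthy:.0f} ms")
print(f"with one slow node:    ~{degraded:.0f} ms")
```

One node out of 1024 triples the effective step time of all 1024, which is why congestion control and slow-node isolation matter so much at scale.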

Scale Across in AI

[Figure: Scale-Across topology]

Rapidly growing AI clusters are moving beyond the boundaries of a single data center. Emerging architectures require a scale-across approach to extend AI workloads seamlessly across multiple data center locations while optimizing compute utilization and resource pooling. This represents a significant evolution in strategy, moving beyond isolated clusters to a holistic view of Scale Up, Scale Out and Scale Across.

As highlighted during the AI Infra Summit panel "Transformative Advancements in Scale-Up, Scale-Out, Scale-In, and Scale-Across Networks," held in Santa Clara on September 10 at 2:40 pm, demand for scale-across capabilities is accelerating in modern AI infrastructure. In parallel, Cisco introduced the 8223 router, designed specifically to enable ultra-large, cross-data-center AI GPU clusters.

1. What It Is

Scale-Across refers to extending compute, storage, and network resources across multiple data centers, facilities, campuses, or geographically distributed sites. It allows several independent data centers to operate as a single logical system that supports distributed AI training, inference, storage, and scheduling. This approach differs from Scale-Up and Scale-Out, which expand capacity within a single data center. Its core idea is to use Data Center Interconnect (DCI), metro and wide-area networks, and optical transport to link multiple sites and enable resource pooling across locations.

2. Architecture

A Scale-Across environment consists of multiple data centers (DC1, DC2, and so on), each potentially using its own Scale-Up and Scale-Out topology. These data centers are interconnected through high-speed optical links and DCI/WAN/metro routers and switches, forming a cross-site backbone. The network may include optical line systems and high-bandwidth channels such as 800G or 1.6T to support large volumes of model traffic and data movement. In some designs, several data centers are treated as one “regional cluster” or “compute region,” allowing tasks to be scheduled across sites.
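Physics puts a floor under cross-site latency: light in fiber propagates at roughly two-thirds of c, about 5 microseconds of one-way delay per kilometre, before any switching or queuing delay is added. A quick estimate:

```python
def fiber_rtt_us(distance_km, us_per_km=5.0):
    """Approximate fiber round-trip time from propagation delay alone."""
    return 2 * distance_km * us_per_km

for km in (1, 10, 80):  # campus, metro, and regional DCI spans
    print(f"{km:>2} km span -> ~{fiber_rtt_us(km):.0f} us RTT")
```

Even a short metro span therefore costs tens of microseconds per round trip, orders of magnitude above intra-server NVLink latency, which shapes which workloads can span sites.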

3. Advantages

  • Overcomes single–data-center limits in power, space, cooling, and capacity.
  • Enables resource pooling and elasticity; workloads can burst across sites, and cold/hot standby designs are easier to implement.
  • Improves fault tolerance; a single data center outage does not take down the entire service.
  • Suitable for large-scale, global, or cross-region AI services that require massive data movement or distributed inference.

4. Challenges

  • High cost and complexity due to optical networks, DWDM systems, long-haul fiber, and specialized routing and switching hardware.
  • Latency, jitter, and bandwidth are significantly worse than intra-data-center or in-server interconnects such as NVLink, limiting strong-coupling workloads like heavy all-reduce.
  • Cross-site scheduling and data synchronization are more complex.
  • Operations become more demanding, including cross-DC monitoring, security policies, traffic engineering, and disaster recovery design.

Below is a comparison table that highlights the key differences between Scale-Up, Scale-Out, and Scale-Across.

| Category | Scale Up | Scale Out | Scale Across |
| --- | --- | --- | --- |
| What It Does | Adds more GPUs, CPUs, memory, or NVLink/NVSwitch bandwidth within a single node/server | Adds more servers to expand resources horizontally within the same data center | Expands AI capacity across multiple data centers, campuses, or metro regions |
| Primary Target | A single node (server / GPU box) | A cluster inside one data center | Multi-data-center compute pool |
| Expansion Method | Vertical enhancement: NVLink, NVSwitch, PCIe, CXL | Horizontal enhancement: 400/800G Ethernet or InfiniBand | DCI-based enhancement: 400/800G optics, routed optical networking, SRv6/EVPN |
| Performance Characteristics | Highest bandwidth and lowest latency (intra-node) | Aggregate performance across nodes; subject to network overhead | High-capacity DCI links optimized for resilience and throughput; latency higher than intra-DC |
| Scalability | Limited by server chassis space, power, cooling, and switch ASIC capacity | Virtually unlimited, constrained by network fabric design | Scales across buildings, campuses, regions; limited by DCI cost and optical reach |
| Resilience | Single point of failure: node failure directly impacts the workload | High fault tolerance; cluster operates even if some nodes fail | Geographic redundancy and cross-DC failover increase systemic resilience |
| Cost Profile | High cost per node due to dense GPU and NVSwitch integration | Lower per-node cost; incremental scaling possible | Highest cost: requires DCI routers, DWDM optics, dark fiber, metro networking |
| Deployment Complexity | Easier to manage (few but powerful nodes) | Requires sophisticated orchestration, NCCL optimizations, RDMA tuning | Highest complexity: multi-DC orchestration, WAN QoS, encryption, optical transport |
| Network Requirements | NVLink, NVSwitch, PCIe Gen5/Gen6, CXL | High-speed Ethernet (RoCEv2), InfiniBand, congestion management (ECN, PFC) | 400/800G DCI, routed optical networks, SRv6, EVPN, Cisco 8200/8000 series routers |
| Typical Use Cases | Model parallelism, tensor parallelism, real-time inference, HPC | Data parallelism, large-scale distributed training, multi-node inference | Multi-DC AI training fabrics, regional compute pooling, cross-campus GPU scheduling |

It is worth noting that although Scale-Up, Scale-Out, and Scale-Across are introduced as separate concepts, mainstream AI system designs typically combine Scale-Up and Scale-Out into a hierarchical scaling architecture. The common approach is to scale up first, then scale out.

Layer 1 (Scale-up): Within a single server node, NVLink and NVSwitch create a high-bandwidth GPU domain (GPU Island / GPU Pod).

Layer 2 (Scale-out): InfiniBand or RoCEv2 switches interconnect multiple GPU Islands to form a large-scale cluster (GPU Supercluster).

In this architecture, a GPU SuperPOD consists of multiple Scale-up nodes connected through a Scale-out fabric.
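The two-layer split also shows up in how GPUs are addressed: a global rank decomposes into a node index (the Scale-Out dimension) and a local index inside the NVLink island (the Scale-Up dimension), and hierarchical collectives reduce within the island before crossing the fabric. A minimal sketch, assuming 8 GPUs per node:

```python
GPUS_PER_NODE = 8  # size of the scale-up domain (GPU Island)

def locate(global_rank):
    """Map a global rank to (node index, local GPU index)."""
    return divmod(global_rank, GPUS_PER_NODE)

def same_island(rank_a, rank_b):
    """True if two ranks can talk over NVLink instead of the network."""
    return locate(rank_a)[0] == locate(rank_b)[0]

print(locate(21))          # (2, 5): node 2, local GPU 5
print(same_island(3, 7))   # True  - same NVLink island
print(same_island(7, 8))   # False - crosses the scale-out fabric
```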

Since Scale-Across is a newer concept, it has not yet been fully integrated into mainstream AI network designs. Several technical challenges still need to be addressed before it can be combined effectively with existing Scale-up and Scale-out architectures.

Asterfusion RoCEv2 and ToR Switch for AI Architecture

Asterfusion’s DC series fully supports RoCEv2 and can operate as either a ToR switch or a Spine–Leaf switch in data-center deployments.

[Figure: Asterfusion Spine-Leaf deployment topology]

High-Performance Data Center Switches with RoCEv2 Support

The CX-N and CX-N-V2 series switches (such as CX308P-48Y-N-V2) ship with the enterprise SONiC distribution AsterNOS. They provide full RoCEv2 support and deliver wire-speed L2/L3 forwarding. The platforms offer large routing capacities (288K IPv4 routes and 144K IPv6 routes) and extensive MAC and ARP table resources, making them suitable for large data-center networks, multi-tenant environments, and multi-Pod architectures.

For AI, HPC, and distributed training workloads, the Asterfusion DC product line supports lossless and congestion-control features such as RoCEv2, PFC, ECN, QCN/DCQCN, and DCTCP. These mechanisms enable reliable, zero-loss, high-efficiency transport for RDMA traffic.
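As a rough intuition for how these mechanisms behave, the sketch below runs one toy control loop in the spirit of DCQCN: the sender cuts its rate multiplicatively when a round carries ECN marks and recovers additively otherwise. The constants are illustrative, not the actual DCQCN parameters:

```python
def next_rate(rate_gbps, ecn_marked, line_rate=400.0, alpha=0.5, step=5.0):
    """One control step of a toy DCQCN-style sender (illustrative constants)."""
    if ecn_marked:
        return rate_gbps * (1 - alpha / 2)   # multiplicative decrease on marks
    return min(line_rate, rate_gbps + step)  # additive recovery toward line rate

rate = 400.0
for marked in (True, True, False, False, False):
    rate = next_rate(rate, marked)
print(rate)  # backed off to 225.0, then recovered to 240.0
```

The asymmetry (fast back-off, slow recovery) is what lets the fabric drain queues before PFC has to pause traffic.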

Comprehensive ToR / Leaf / Spine / DC-Fabric Coverage from 25G to 800G

The CX308P-48Y-N-V2 provides 48×25G + 8×100G interfaces. It is suited for OOB management networks, front-end access networks, or inference clusters as a ToR or Leaf switch connecting servers to upstream Spine layers.

For deployments that require higher bandwidth or larger cluster scale (such as large GPU clusters, high-bandwidth RDMA fabrics, or high-speed storage access), Asterfusion offers whitebox and bare-metal switches across 100G, 200G, 400G, and 800G tiers for Spine-Leaf networks.

Each switch supports standard L2/L3 features, VXLAN/EVPN, BGP-EVPN, multi-tenant overlays, and multi-Pod environments. These capabilities align with cloud-native and AI-infrastructure design principles.

Strong Network Management and Automation Capabilities

AsterNOS, Asterfusion's enterprise SONiC distribution, provides traffic monitoring and visibility. It exposes metrics such as CPU load, traffic counters, packet drops, latency, and RoCE congestion through an exporter, and integrates with monitoring systems such as Prometheus and Grafana for real-time visualization and alerting.

The system also supports ZTP, Python and Ansible-based automation, SPAN/ERSPAN, and in-band telemetry (INT/vINT) for deep diagnostics and traffic analysis.

Conclusion

In Scale Up, Scale Out and Scale Across, Scale-Up and Scale-Out remain the core expansion methods for today’s AI clusters. Scale-Up increases per-node density and efficiency, while Scale-Out builds large, distributed GPU clusters.

Scale-Across is an emerging approach explored by advanced vendors, cloud providers, and supercomputing facilities. As AI models grow and single-data-center limits tighten, cross-DC collaboration becomes more important. This model depends on DCI, optical transport, long-haul high-bandwidth links, and cross-region scheduling.

However, Scale-Across still faces limitations in tightly coupled training due to latency and bandwidth constraints. It is better suited for resource pooling, asynchronous workloads, distributed inference, and mixed parallelism with asynchronous coordination. For tasks that require very low latency and high-bandwidth synchronization, Scale-Up and Scale-Out inside a single data center remain the preferred architecture.

Regardless of the chosen path, Asterfusion provides the high-performance switching fabric necessary to successfully implement Scale Up, Scale Out and Scale Across.
