
The Ultimate Guide to Buying Data Center Switches: Your Comprehensive Resource!

written by Asterfusion

October 30, 2024

As AI and HPC requirements grow, choosing the right data centre switch becomes more complex. Unlike switches used for traditional Layer 2/3 enterprise networks, data centre switches have to cope with more complex networking tasks. This article explains what a data centre switch is and offers tips for buying one.

What is a Data Center Switch?

A data center switch is a high-performance network device specifically designed for data centers, enabling fast data transmission between servers, storage devices, and other network equipment. These switches play a critical role within data centers, ensuring efficient communication between servers and providing a stable network connection.

Today we will look at switches in detail from three perspectives (hardware, software, and brand) to give you buying tips.

Chipset and CPU

When selecting a data center switch, the brand and performance of the chipset (ASIC) and CPU are crucial as they determine the switch’s core data processing capabilities, supported network protocols, energy efficiency, and future scalability.

Switch Chipset (ASIC)

The chipset, often referred to as an ASIC (Application-Specific Integrated Circuit), is responsible for the critical functions of data packet switching and routing. The brand and quality of the ASIC design are vital for several reasons:

  • Forwarding Performance and Throughput: High-performance chipsets excel at processing data packets swiftly, resulting in higher throughput and lower latency—an essential feature in high-traffic data center environments.
  • Buffer: The switch buffer is used to cache packets, especially in bursty traffic or congestion where large buffers can reduce packet loss. In high-traffic or east-west environments, large buffers help improve network throughput and reliability.
  • Latency: Latency is the time a switch takes to forward a packet. Low latency is especially important in the era of AI, because it directly affects the response speed of AI applications and the user experience.
  • Advanced Protocol Support: Different ASIC brands provide varying levels of support for network protocols and features, directly influencing the capabilities and versatility of switches. The switch’s features are capped by what the ASIC can integrate; without specific functionality embedded in the ASIC, software alone cannot deliver the desired performance. For instance, telemetry enables real-time monitoring of traffic, latency, and packet loss, while QoS features prioritize critical traffic such as RoCE (RDMA over Converged Ethernet) to ensure low latency in demanding environments. Additionally, PTP (Precision Time Protocol) support is vital for precise clock synchronization in time-sensitive applications like financial transactions.
  • Energy Efficiency and Thermal Management: Efficient chipset designs not only enhance performance but also minimize power consumption and cooling requirements. In large-scale data centers, energy efficiency and effective heat management are key to reducing operational costs.
  • Programmability: Advanced network functions or user-defined programming capabilities (such as the P4 programming language) are crucial for high-end switches. The introduction of programmable switching chips has significantly enhanced the functionality and flexibility of switches, enabling rapid integration of new features through software. This represents a crucial advancement over the limitations of traditional ASICs.
  • SAI Support: The increasingly popular white-box switches decouple software and hardware, so their ASICs should support the Switch Abstraction Interface (SAI). SAI lets the same software platform run on ASICs from different manufacturers, which provides the hardware and software compatibility that white-box flexibility depends on.
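To make the buffer point above concrete, here is a minimal Python sketch that estimates how long a buffer can absorb a burst when traffic arrives faster than a port can drain it. The function name and the example figures are our own illustrative assumptions, not vendor data, and the sketch assumes the entire buffer is available to one congested port, which shared-buffer ASICs do not guarantee.

```python
def burst_absorption_time_us(buffer_bytes: int,
                             ingress_gbps: float,
                             egress_gbps: float) -> float:
    """Microseconds until the buffer overflows during a sustained burst.

    Assumption: the whole buffer serves a single congested egress port.
    """
    if ingress_gbps <= egress_gbps:
        # The port drains at least as fast as traffic arrives: no queue builds.
        return float("inf")
    # 1 Gbps = 1e9 bits/s = 1,000 bits per microsecond.
    excess_bits_per_us = (ingress_gbps - egress_gbps) * 1_000
    return buffer_bytes * 8 / excess_bits_per_us


if __name__ == "__main__":
    # Hypothetical case: two 100G senders converge on one 100G port, 32 MB buffer.
    print(burst_absorption_time_us(32_000_000, 200, 100))
```

Under these assumptions, a larger buffer buys proportionally more time before packets are dropped, which is why buffer size matters most for short, intense bursts.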

The Role of the CPU

The CPU in a switch manages critical operations, including control plane functions, running network protocol stacks, monitoring network status, and handling administrative tasks. Choosing the right CPU brand and performance level is crucial for several reasons:

  • Control Plane Performance: The CPU is responsible for tasks such as running routing protocols and traffic management. A more powerful CPU enhances the switch’s responsiveness, especially when managing numerous connections or complex protocols.
  • Network Virtualization and Security: As the demand for virtualization and security in data centers grows, switches must efficiently handle complex software-level tasks, including firewall functions and virtual routing. A robust CPU is essential for processing these tasks effectively.
  • Brand Technical Support: Opting for a well-known CPU brand, such as Intel, AMD, or ARM, not only ensures better stability and performance but also provides broader compatibility and access to extensive technical support.

When it comes to enhancing your data center’s performance, selecting the right switch hardware is crucial. Here are essential factors to consider to ensure your investment meets both current needs and future demands:

Seven Key Considerations for Selecting Data Center Switch Hardware:

  1. Port Density and Type
    1. Port Count: Assess your connectivity requirements to determine the appropriate port density. Choose a switch that can accommodate not only your current devices but also potential future expansions.
    2. Port Type: Evaluate the necessary port types (e.g., 10GbE, 25GbE, 40GbE, 100GbE, 200GbE, 400GbE, or 800GbE) to ensure the switch can support the growing bandwidth and throughput needs of your data traffic.
  2. Throughput and Latency
    1. Throughput: Ensure the switch can handle high traffic volumes, especially during large-scale concurrent transmissions, to maintain seamless operations.
    2. Latency: For mission-critical applications like AI/ML training and inference, prioritize low-latency switches that can rapidly forward data packets, enhancing overall processing efficiency.
  3. Bandwidth and Backplane Capacity
    1. Bandwidth: Verify that the switch meets current traffic demands while allowing for scalability to accommodate future growth from new devices and applications.
    2. Backplane Capacity: A robust backplane capacity is vital; it dictates the maximum data traffic the switch can manage. Ensure it is capable of supporting all ports under full load without bottlenecks.
  4. Redundancy and High Availability
    1. Power and Fan Redundancy: Select switches equipped with redundant power supplies and fans to maintain operational continuity in case of component failures.
    2. Hot-Swappable Components: Look for switches with hot-swappable components (like power supplies and fan modules) to facilitate maintenance without downtime, thereby enhancing uptime and availability.
  5. Compatibility and Scalability
    1. Compatibility with Existing Devices: Ensure the switch is compatible with your existing network infrastructure, including servers, storage devices, and other switches.
    2. Box-Based Switches for Clos Architecture: Consider the increasingly popular spine-leaf network topology. Box-based switches offer scalability by allowing new spine or leaf devices to be added in parallel as needed.
  6. Energy Efficiency and Cooling
    1. Power Consumption: Be mindful of the switch’s power consumption, as this significantly impacts operational costs. Energy-efficient switches not only lower electricity expenses but also decrease heat output.
    2. Heat Dissipation Design: In high-density data centers, switches handle large, continuous data workloads, making effective thermal design crucial to prevent overheating, performance drops, and latency issues. Optimized airflow, advanced cooling materials, efficient fans, and smart temperature control together ensure stable, reliable switch operation, extending equipment lifespan and performance.
  7. Vendor Support and After-Sales Service
    1. Manufacturer Support: Choose brands known for reliable after-sales support and long-term firmware updates, ensuring you receive consistent technical assistance throughout the switch’s lifecycle.
    2. Warranty and Service Contracts: Pay close attention to warranty periods and the availability of extended service contracts for timely repairs and replacements when necessary.
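The backplane capacity and port density checks from the list above come down to simple arithmetic. The sketch below is a rough illustration, with function names and the example port layout chosen by us. It assumes capacity is counted full duplex (both directions of every port), as many datasheets do; check how your vendor counts it.

```python
def required_switching_capacity_gbps(ports: dict) -> float:
    """Capacity needed to run every port at line rate in both directions.

    `ports` maps port speed in Gbps to the number of ports at that speed.
    Assumption: capacity is quoted full duplex, hence the factor of 2.
    """
    return 2 * sum(speed * count for speed, count in ports.items())


def oversubscription_ratio(downlink_gbps: float, uplink_gbps: float) -> float:
    """Server-facing bandwidth divided by spine-facing bandwidth on a leaf."""
    if uplink_gbps <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return downlink_gbps / uplink_gbps


if __name__ == "__main__":
    # Hypothetical leaf switch: 48 x 25GbE downlinks, 8 x 100GbE uplinks.
    leaf_ports = {25: 48, 100: 8}
    print(required_switching_capacity_gbps(leaf_ports))  # Gbps, full duplex
    print(oversubscription_ratio(48 * 25, 8 * 100))
```

If the datasheet figure is below the computed capacity, the switch cannot run all ports at full load without a bottleneck. A ratio of 1.0 means the leaf is non-blocking toward the spine.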

By carefully considering these factors, you can select a switch that not only meets your data center’s immediate performance needs but also positions you for future growth and technological advancements.

Choosing the Right Operating System for Data Center Switches

When selecting data center switches, the operating system (OS) is as crucial as the hardware. It influences capabilities, performance, management, security, and future scalability. A stable, feature-rich, and user-friendly OS streamlines operations, reduces management complexity, and enables adaptability to evolving technologies and business demands.

In recent years, open network operating systems have gained significant traction, challenging traditional switches due to their openness, flexibility, cost-effectiveness, and innovative features. Let’s dive into the key factors to consider when choosing a switch OS.

[Figure: Asterfusion Enterprise SONiC Distribution]

Network Functions and Protocol Support:

The OS is the backbone of how a switch manages data flows and executes network protocols. Different OSs offer varying functionalities and protocol support:

  • Comprehensive Protocol Support: In addition to traditional Layer 2 and Layer 3 functions, modern data centre switches must support a range of advanced features to meet the demands of virtualisation and automation. These switches are tasked with managing multiple aspects of the network, including control, management, isolation, security and performance optimisation. By doing so, they help data centres to build efficient, secure and scalable network architectures.
  • Virtualisation Features: Virtualisation features focus mainly on high-performance transmission, resource isolation, tenant management, and traffic optimisation. For example, VXLAN, EVPN-VXLAN, and EVPN multi-homing are essential in data centre environments.
  • Advanced Features: As AI technology advances, data center switches require sophisticated capabilities to handle large-scale AI tasks with low latency and minimal packet loss. For example, RoCEv2 facilitates direct data transfers between servers without CPU intervention, reducing latency and boosting bandwidth.
  • Software Programmability: The switch’s functions and features can be modified and extended through software code, allowing it to adapt to changing network needs and emerging standards. For instance, a programmable switch can support future Ultra Ethernet Consortium (UEC) specifications by updating its operating system or application without replacing the hardware. This flexibility enables network operators to rapidly adapt and deploy new features as technology evolves, enhancing network scalability and return on investment.
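As a small illustration of how VXLAN provides the tenant isolation mentioned above, the sketch below packs and parses the 8-byte VXLAN header defined in RFC 7348. The header carries a 24-bit VXLAN Network Identifier (VNI), which allows roughly 16 million isolated segments. The helper names are our own, and this is only a teaching sketch, not how a switch ASIC implements encapsulation.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: 8 flag bits followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


if __name__ == "__main__":
    header = pack_vxlan_header(5000)  # hypothetical tenant segment
    print(header.hex(), unpack_vni(header))
```

Two tenants with different VNIs can reuse the same inner IP and MAC addresses, because the receiving tunnel endpoint uses the VNI to decide which segment a frame belongs to.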

Stability and Performance

A stable OS is the cornerstone of data center reliability and service continuity:

  • High Availability: A robust OS ensures efficient operation even under heavy loads, minimizing network disruptions and failures. Given the potential economic impact of downtime, the OS must be rigorously tested for high availability.
  • Optimized Performance: The OS should efficiently manage hardware resources to maximize performance, enabling faster data forwarding, reduced latency, and higher throughput.

Management and Monitoring:

  • Centralized Management: Modern data centers demand centralized management systems to oversee thousands of switches. The OS should support API-based management interfaces like REST APIs, allowing administrators to configure and manage devices at scale.
  • Automation and Programmability: Many OSs support network automation tools like Ansible, Puppet, and Chef, which are essential for large-scale data centers. These tools reduce manual configuration errors and facilitate automated network deployment and updates.
  • Monitoring and Troubleshooting: A comprehensive monitoring system is crucial for maintaining optimal network performance. The OS should provide robust monitoring tools that support protocols like SNMP, NetFlow, and sFlow, allowing administrators to keep a close eye on network performance, traffic statistics, and device health in real time. Moreover, advanced features such as In-band Network Telemetry (INT) can track packet delays, drops, and path traversal, empowering smarter network management and improving congestion control.
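Much of the monitoring described above boils down to sampling counters and turning them into rates. The following sketch shows one way to derive link utilization from two samples of a 64-bit octet counter (for example, one polled over SNMP such as ifHCInOctets). The function name and sample values are illustrative assumptions, and the sketch assumes the counter wraps at most once between samples.

```python
COUNTER64_MODULUS = 2**64  # 64-bit counters wrap back to zero at this value


def link_utilization(octets_start: int, octets_end: int,
                     interval_s: float, link_gbps: float) -> float:
    """Fraction of link capacity used between two octet-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    # The modulo handles a single counter wrap between the two samples.
    delta_octets = (octets_end - octets_start) % COUNTER64_MODULUS
    bits_per_second = delta_octets * 8 / interval_s
    return bits_per_second / (link_gbps * 1e9)


if __name__ == "__main__":
    # Hypothetical samples taken 30 seconds apart on a 100GbE port.
    print(f"{link_utilization(0, 37_500_000_000, 30, 100):.1%}")
```

A shorter polling interval reveals bursts that a long average hides, which is one reason streaming telemetry complements periodic polling.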

Security

  • Built-in Security Features: Modern switch OSs typically integrate firewalls, intrusion detection systems (IDS), and access control lists (ACLs) to prevent attacks and data leaks. Support for encryption protocols like IPsec and MACsec is also essential for securing data.
  • Regular Updates and Patches: It’s vital that OS vendors provide ongoing security updates and vulnerability fixes to protect against emerging threats.
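To show what an access control list does conceptually, here is a minimal first-match ACL evaluator in Python. The rule format, the example prefixes, and the implicit deny at the end of the list are assumptions made for illustration; actual ACL syntax and default behavior vary by switch OS, and real switches evaluate ACLs in hardware.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class AclRule:
    action: str                     # "permit" or "deny"
    src: str                        # source prefix, e.g. "10.0.0.0/24"
    dst: str                        # destination prefix
    dst_port: Optional[int] = None  # None matches any destination port


def evaluate(rules: List[AclRule], src_ip: str, dst_ip: str,
             dst_port: int) -> str:
    """Return the action of the first matching rule (first match wins)."""
    for rule in rules:
        if (ip_address(src_ip) in ip_network(rule.src)
                and ip_address(dst_ip) in ip_network(rule.dst)
                and rule.dst_port in (None, dst_port)):
            return rule.action
    return "deny"  # assumed implicit deny when no rule matches


# Hypothetical policy: the web tier may reach the app tier on port 443 only.
EXAMPLE_RULES = [
    AclRule("permit", "10.0.0.0/24", "10.0.1.0/24", 443),
    AclRule("deny", "10.0.0.0/8", "0.0.0.0/0"),
]

if __name__ == "__main__":
    print(evaluate(EXAMPLE_RULES, "10.0.0.5", "10.0.1.9", 443))
    print(evaluate(EXAMPLE_RULES, "10.0.0.5", "10.0.1.9", 22))
```

Because the first match wins, rule order matters: placing the broad deny above the specific permit would block the web traffic as well.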

Scalability and Openness

The scalability of the OS affects the long-term value of your equipment:

  • Modular Design: Some advanced switch OSs feature a modular design, allowing for functionality modules to be added or upgraded as needed. This flexibility means switches can be updated with new capabilities without replacing the entire device.
  • Openness: Open-source OSs like SONiC allow users to customize network behavior according to their business needs, offering high flexibility.

High Reliability:

  • MLAG (Multi-Chassis Link Aggregation): Supports link aggregation between multiple switches to enhance network redundancy and load balancing, ensuring network continuity in the event of link or device failure.
  • EVPN Multihoming: Significantly enhances network resilience and fault tolerance, enabling rapid switching to backup paths in case of failures.
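The load balancing and failover behavior described above can be pictured as hashing each flow onto one of the active member links. The sketch below is a simplified illustration: real switches compute the hash in the ASIC with vendor-specific functions, and the CRC32 hash, link names, and flow tuple here are stand-ins we chose for the example.

```python
import zlib
from typing import List, Tuple


def pick_member(flow: Tuple, members: List[str]) -> str:
    """Map a flow (e.g. its 5-tuple) onto one of the active member links."""
    if not members:
        raise ValueError("no active member links")
    key = "|".join(map(str, flow)).encode()
    # The same flow always hashes to the same link, preserving packet order.
    return members[zlib.crc32(key) % len(members)]


if __name__ == "__main__":
    # Hypothetical flow: src IP, dst IP, protocol, src port, dst port.
    flow = ("10.0.0.1", "10.0.1.1", 6, 49152, 443)
    links = ["leaf1:eth1", "leaf2:eth1"]
    chosen = pick_member(flow, links)
    # Simulate a failure of the chosen link: the flow moves to a survivor.
    survivors = [link for link in links if link != chosen]
    print(chosen, "->", pick_member(flow, survivors))
```

Keeping each flow on a single link avoids packet reordering, while different flows spread across all healthy links.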

Brand Support and Ecosystem:

  • Long-Term Technical Support: Reputable OS vendors typically offer extended technical support and firmware upgrades, ensuring full support throughout the device’s lifecycle.
  • Ecosystem Integration: A powerful OS often integrates seamlessly with other systems in the data center, such as storage, servers, and network management platforms, creating a comprehensive IT infrastructure solution.

Open Network Operating System (NOS)

As open network switches gain traction, an open network operating system (NOS) with in-memory databases, containerization, and event-driven architecture significantly boosts adaptability and scalability, creating a more efficient network.

  • In-Memory Database: Enables ultra-fast data processing, essential for real-time updates on routing and forwarding, reducing latency and improving responsiveness.
  • Containerization: Provides modularity and isolation, allowing quick deployment of microservices. Network operators can easily upgrade or extend features without restarts, ensuring continuous service.
  • Event-Driven Architecture: Automatically responds to real-time events, enhancing system adaptability, reliability, and stability.
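The combination of an in-memory database and an event-driven design can be sketched in a few lines. The toy class below keeps state in memory and notifies subscribers whenever a table entry changes. The class, table, and key names are hypothetical, and a production NOS would use a real database and run its services in separate containers.

```python
from collections import defaultdict
from typing import Any, Callable


class StateDB:
    """Toy in-memory key-value store that publishes change events."""

    def __init__(self) -> None:
        self._tables = defaultdict(dict)
        self._subscribers = defaultdict(list)

    def subscribe(self, table: str, callback: Callable[[str, Any], None]) -> None:
        """Register a callback invoked on every write to `table`."""
        self._subscribers[table].append(callback)

    def set(self, table: str, key: str, value: Any) -> None:
        """Store a value, then notify subscribers of that table."""
        self._tables[table][key] = value
        for callback in self._subscribers[table]:
            callback(key, value)

    def get(self, table: str, key: str) -> Any:
        """Return the stored value, or None if the key is absent."""
        return self._tables[table].get(key)


if __name__ == "__main__":
    db = StateDB()
    # A hypothetical forwarding service reacts to route updates as they occur.
    db.subscribe("ROUTE", lambda key, value: print("program route", key, value))
    db.set("ROUTE", "10.0.0.0/24", {"nexthop": "10.1.1.1"})
```

Because services only communicate through the shared state, one of them can be upgraded or restarted without the others needing to know, which is the modularity benefit described above.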

These features make open NOS a powerful choice for building responsive, future-ready networks.

Data Center Switch Brand Selection: Navigating Your Options

As cloud data centers expand and virtualization technologies evolve, a diverse array of switches has emerged in the market, including branded switches, white-box switches, and bare-metal switches. Choosing the right switch brand is crucial for ensuring seamless upgrades and optimal performance in your data center network. Let’s break down the options available to you.

Branded Switches

Branded switches are offered by well-established network equipment vendors such as Cisco, Huawei, and Juniper. These switches come with both hardware and operating systems developed by the same manufacturer, ensuring high stability and reliability. While they typically carry a higher price tag, the investment often pays off in brand reputation, stable performance, and comprehensive technical support.

White-Box Switches

White-box switches are the best option for flexibility and cost-effectiveness. These switches decouple the hardware from the operating system, giving customers the freedom to choose their data centre switch hardware from a range of hardware vendors and the software that suits them from open-source software vendors. The drawback is that hardware and software after-sales support then come from different vendors, which is not ideal for technical support. That’s why white-box network vendors such as Edgecore, Asterfusion, Celestica, and Delta now provide a one-stop service: a turnkey solution covering both the hardware and an open operating system.

Bare-Metal Switches

Bare-metal switches consist solely of hardware without a pre-installed operating system. Customers have the option to install community-based OS versions, such as SONiC, but these typically lack formal support. Most bare-metal switches are sold by ODMs (original design manufacturers). They are best suited for large enterprises with dedicated technical teams capable of developing their own Network Operating Systems (NOS). This option allows for maximum customization but requires significant technical expertise.

Conclusion of the Data Center Switch Buying Guide

And that wraps up today’s data center switch buying guide! I hope you now feel confident about the essential standards for selecting switch hardware and software. When you’re on the hunt for a switch, remember to kick things off by examining the hardware. It’s crucial to ensure that the ASIC, CPU, and other hardware specifications align with your specific needs. But don’t stop there! Good hardware is only half the battle; you also need a feature-rich, flexible, and powerful operating system to truly unlock its potential. And of course, all of this should fit comfortably within your budget.

In our next issue, we’ll dive into pricing and explore the decision between brand-name switches and white-box switches. This topic can significantly impact your network’s performance and cost-effectiveness. Please refer to: Switching Strategies: Should You Go Branded Switch or Embrace White Box Switch?
